Keywords of Genomics

Population genetics

Population genetics is the study of the distribution and change in frequency of alleles within populations, and as such it sits firmly within the field of evolutionary biology.

The main processes of evolution are natural selection, genetic drift, gene flow, mutation, and genetic recombination, and they form an integral part of the theory that underpins population genetics.

Studies in this branch of biology examine such phenomena as adaptation, speciation, population subdivision, and population structure.

Population stratification

Population stratification refers to differences in allele frequencies between cases and controls that arise from systematic differences in their ancestry rather than from an association of genes with disease.

Diploid genome

A diploid genome contains two complete sets of chromosomes, one inherited from the mother and one from the father.

Coalescent theory

Coalescent theory is a retrospective stochastic model of population genetics that relates genetic diversity in a sample to demographic history of the population from which it was taken.

That is, it is a model of the effect of genetic drift, viewed backwards in time, on the genealogy of antecedents.
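To make the "backwards in time" view concrete, here is a minimal sketch of the standard (Kingman) coalescent, assuming a constant-size, randomly mating population with no selection. Time is measured in coalescent units, and while k lineages remain, the waiting time to the next merger is exponential with rate k(k-1)/2. The function names are my own, not from any library.

```python
import random


def expected_tmrca(n):
    """Expected time to the most recent common ancestor of n sampled lineages.

    Sum of 2 / (k * (k - 1)) for k = 2..n, which simplifies to 2 * (1 - 1/n).
    """
    return 2.0 * (1.0 - 1.0 / n)


def simulate_tmrca(n, rng):
    """Draw one TMRCA by merging lineages backwards in time until one is left."""
    total_time = 0.0
    lineages = n
    while lineages > 1:
        rate = lineages * (lineages - 1) / 2.0  # number of pairs that could coalesce
        total_time += rng.expovariate(rate)     # waiting time to the next merger
        lineages -= 1
    return total_time


rng = random.Random(42)  # fixed seed so the run is repeatable
sample_size = 10
replicates = 20000
simulated_mean = sum(simulate_tmrca(sample_size, rng) for _ in range(replicates)) / replicates

print("theoretical mean TMRCA:", expected_tmrca(sample_size))
print("simulated mean TMRCA:  ", round(simulated_mean, 3))
```

The simulated mean should land close to the theoretical value. Changing the demographic history (for example, a past expansion) changes the waiting times, which is how the model ties sample diversity to population history.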

GitHub

Repository

A repository is usually used to organize a single project. Repositories can contain folders and files, images, videos, spreadsheets, and data sets – anything your project needs. We recommend including a README, or a file with information about your project. GitHub makes it easy to add one at the same time you create your new repository. It also offers other common options such as a license file.

Branch

Branching is the way to work on different versions of a repository at one time.

By default your repository has one branch, named master (main on repositories created since October 2020), which is considered to be the definitive branch. We use branches to experiment and make edits before committing them to the default branch.

Commit

On GitHub, saved changes are called commits.

Pull Request

When you open a pull request, you're proposing your changes and requesting that someone review your contribution and merge it into their branch. Pull requests show diffs, or differences, of the content from both branches. Additions are shown in green and deletions in red.

GitHub Pages

GitHub Pages are public webpages hosted and published through GitHub.

You can create and publish GitHub Pages online using the Automatic Page Generator. If you prefer to work locally, you can use the GitHub Desktop or the command line.

When these notes were written, Pages were served over HTTP, not HTTPS, so you shouldn't use them for sensitive transactions, like sending passwords or credit card numbers. GitHub Pages has since added HTTPS support.

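How to read a paper?

  1. Read the abstract
    The abstract quickly tells us the paper's topic, study subject, and experimental conclusions, which helps us decide whether the paper contains the information we need;
  2. Read the figures
    The figures carry the paper's most condensed information, so they let us get straight to its core conclusions; figures are also easy to understand and help us form a first impression of the paper;
  3. Read selectively
    After the two steps above, pick the parts you are interested in and read them in depth.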

J.Q. Liu

  1. Yak whole-genome resequencing reveals domestication signatures and prehistoric population expansions (2015)
    1. genome variation of wild and domestic yaks
    2. evolution
  2. Genome resequencing: 13 wild yaks and 59 domestic yaks

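Windows installation and configuration

If you have a computer repair shop reinstall the system, take care to find a good one. A reinstall may look the same everywhere, but each shop uses a different installation image and configures some details differently. A system reinstalled by a poor shop will make your own later configuration much harder.

  1. Reinstall the operating system
    Windows 7
  2. Reinstall hardware drivers
    1. Graphics driver
  3. Reinstall software
    1. DirectX
  4. Runtime libraries
  5. Compilers and language toolchains
    1. Java
    2. MinGW
    3. Strawberry Perl
  6. Utilities
    1. DAEMON Tools Lite
    2. PC Hunter
    3. Xming + PuTTY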

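NCBI usage notes and tips

  1. On the join and complement location operators:
    join: the listed ranges are joined end to end into one contiguous sequence (for example, the exons of a spliced gene); it does not by itself indicate a strand;
    complement: the feature lies on the strand complementary to the displayed sequence, so it is read 5′->3′ as the reverse complement of the given range;
    example:
    join:
    As of 2016-04-05, I could no longer find a record annotated with join.
    complement:
    All the sequences I have looked at under the gene category were given as complement.

    gene complement(2872..3195)
    /gene="lacZ'"
    Sequence: NC_000913.3 (363231..366305, complement)

  2. Searching for a target sequence within a specific genome: open the genome, enter the target sequence, and start the search.
  3. For proteins, NCBI provides a tool for viewing their conserved domains (CDs), called "Identify Conserved Domains";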
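The way join and complement locations resolve to an actual sequence can be sketched in a few lines. This is a minimal illustration that assumes 1-based inclusive coordinates, as in the feature lines above. The helper names and the toy sequence are made up for the example and are not part of any NCBI tool.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement, i.e. the other strand read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def extract(seq, start, end, complement=False):
    """Extract start..end (1-based, inclusive); reverse-complement if requested."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("location %d..%d is outside the sequence" % (start, end))
    sub = seq[start - 1:end]
    return reverse_complement(sub) if complement else sub


def extract_join(seq, ranges):
    """Resolve join(a..b,c..d,...): concatenate the ranges in the order given."""
    return "".join(extract(seq, start, end) for start, end in ranges)


toy = "ATGGCCTTAA"  # made-up 10-base sequence
print(extract(toy, 2, 5))                    # TGGC
print(extract(toy, 2, 5, complement=True))   # GCCA
print(extract_join(toy, [(1, 3), (8, 10)]))  # ATGTAA
```

A location such as complement(2872..3195) corresponds to `extract(record_sequence, 2872, 3195, complement=True)`, where `record_sequence` stands for the full sequence of the record.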

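WordPress build & tips

  1. Connecting to the database
    1. Find the database IP address at your hosting provider;
    2. Find WordPress's wp-config.php file in the site root directory;
    3. Change the value of DB_HOST to the database IP address
  2. Backups
    1. Use the Export tool under WordPress's built-in Tools menu to export the site content in a format that WordPress installs generally recognize, so that if something goes wrong you can import all of the site content (excluding images) into any fresh WordPress install.
    2. Use the hosting provider's data download to download the entire site; if the site breaks, re-upload the whole WordPress directory;
  3. Comment spam protection
    1. WordPress itself can require the administrator to approve every comment;
    2. Use Akismet; afterwards it automatically filters comments from the same email address;
  4. Slow loading after moving to a shared host in mainland China:
    Cause: the site uses Google's Fonts and AJAX libraries
    Solution: install two plugins: Disable Google Fonts and WP Acceleration for China
  5. Images with Chinese file names fail to load in WordPress; use English file names where possible
  6. Migrating a whole WordPress site: the configuration file needs to be edited.
  7. Shift+Enter: inserts a line break with less spacing than a plain Enter
  8. If the site's domain name changes, remember to update the site URL in the database's _options table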

Windows10 debug & optimize

 

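The five-part method: how to read a book or an article

I heard this on a podcast from Tsinghua University Radio (清华大学广播台), the "Reading Tips" (读书小贴士) episode of 珊越拾穗, and found it very useful. I am writing it down here so that I can keep reinforcing my understanding of it.

How to read a book

  1. Historicize
    Understand the background against which the book or article was produced.
  2. Dialogue
    When you read a book, read alongside it another book that takes the opposite view or a similar view, and see how other authors discuss the same question. You will find that authors differ in style and that they read differently. The other kind of dialogue is the one between reader and author.
  3. Intellectualize
    Find and develop your own viewpoint in the course of reading.
  4. Transformative
    Through reading, arrive at your own discoveries and innovations.

Reading an article

  1. Find the question the author wants to answer
    What problem is the author trying to solve by writing this book or paper?
  2. Which competing theories does the author present?
    No study is the very first of its kind. The question has certainly been studied before, or other theories can explain it. Whose theories does the author use to explain the question, how are they applied, and where do they fall short in explaining it?
  3. What viewpoint of their own does the author put forward?
    Beyond the earlier theories, how does the author interpret the question?
  4. What materials, what steps, and what data does the author use as proof?
    How does the author support that viewpoint?
  5. Strengths and weaknesses
    Discuss the strengths and weaknesses of the article.

Writing an article

  1. What question am I answering?
  2. How have others answered this question?
  3. How do I differ from them?
  4. How do I show that this difference holds up?
  5. What are my shortcomings, what have I newly discovered, and what are my strengths?

I hereby pay my respects to the podcast host's advisor, who distilled this method.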

TED speaker

Benoit Mandelbrot: (20 November 1924 – 14 October 2010)

TED title: Fractals and the art of roughness
Introduction:  Benoit Mandelbrot was a Polish-born, French and American mathematician. He is recognized for his contribution to the field of fractal geometry, which included coining the word “fractal” as well as for developing a “theory of roughness” and “self-similarity” in nature.

Carter Emmart

TED title: A 3D atlas of the universe
Introduction: The atlas presented in the talk is a standalone 4-dimensional space visualization application built on the programmable Partiview data visualization engine designed by Stuart Levy of the National Center for Supercomputing Applications (NCSA) as an adjunct of the NCSA's Virtual Director virtual choreography project. The Virtual Universe Atlas project was launched by the American Museum of Natural History's Hayden Planetarium with significant programming support from the National Aeronautics and Space Administration as well as Stuart Levy. The database draws on the National Virtual Observatory.

Daniel Kahneman

TED title: The riddle of experience vs. memory.
Introduction: Daniel Kahneman (Hebrew: דניאל כהנמן, born March 5, 1934) is an Israeli-American psychologist notable for his work on the psychology of judgment and decision-making, as well as behavioral economics, for which he was awarded the 2002 Nobel Memorial Prize in Economic Sciences (shared with Vernon L. Smith). His empirical findings challenge the assumption of human rationality prevailing in modern economic theory.

Python tips

  1. Installing pip:
    1. Download: https://bootstrap.pypa.io/get-pip.py
    2. Install: python get-pip.py
  2. Using pip on Windows:
    python -m pip
  3. error: Microsoft Visual C++ 9.0 required (Unable to find vcvarsall.bat).
    Fix: download VCForPython27.msi.
    URL: http://www.microsoft.com/en-us/download/confirmation.aspx?id=44266
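The `python -m pip` form from tip 2 is also handy inside scripts, because it guarantees that pip runs under the same interpreter as the script. Below is a minimal sketch written for a current Python 3, although the notes above date from Python 2.7. `pip_command` and `run_pip` are hypothetical helper names, and `requests` is only an example package.

```python
import subprocess
import sys


def pip_command(*args):
    """Build an argument list that runs pip through the current interpreter."""
    return [sys.executable, "-m", "pip"] + list(args)


def run_pip(*args):
    """Run pip; raises subprocess.CalledProcessError on a non-zero exit status."""
    return subprocess.run(pip_command(*args), check=True)


# Only print the command here; call run_pip("install", "requests") to execute it.
print(pip_command("install", "requests"))
```

Using `sys.executable` instead of a bare `pip` avoids installing into a different Python than the one running the script, which is a common source of confusion on Windows machines with several interpreters.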

Yu Meiren (虞美人)

C130720

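As a youth I listened to the rain in a song-house loft,
red candles dim behind silk curtains.
Last year I listened to the rain aboard a traveler's boat:
the river wide, the clouds low,
a stray goose crying in the west wind.

Now I listen to the rain beneath the eaves;
the autumn leaves have already fallen.
Joys and sorrows, partings and reunions, are always full of feeling.
Where is she now?
Always in the cold, clear autumn.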

Basic information on S. suis

  1. One of the most prevalent pathogens in swine, causing a range of disease syndromes including arthritis, meningitis, pneumonia, septicemia, and endocarditis. [1]
  2. A zoonotic agent able to induce meningitis, endocarditis, and streptococcal toxic shock-like syndrome in humans. [1]
  3. Thirty-three S. suis serotypes have been identified on the basis of antigenic differences in their capsular polysaccharides (CPS). [1]
  4. S. suis 2 mainly infects people who have direct contact with carrier pigs, sick pigs, or raw pork, via wounds on the skin or the mucosa of the mouth or nasal cavity. [1]
  5. 1642 cases of human S. suis infection had been reported worldwide by Dec. 31, 2013. [1]

[1]. Zhang, Y., Ding, D., Liu, M., Yang, X., Zong, B., Wang, X., Chen, H., Bei, W., and Tan, C. (2016). Effect of the glycosyltransferases on the capsular polysaccharide synthesis of Streptococcus suis serotype 2. Microbiological research.

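The elements for building a website

  1. Domain name
    Purpose: lets visitors find your website.
    Where to buy: 西部数码
  2. Web hosting
    Purpose: the place where the body of your website lives (text, images, audio, scripts, and so on).
    Where to buy: hostinger
  3. File manager
    Purpose: uploading, downloading, and editing the site's files
    Recommended: FileZilla
  4. Raster graphics editor
    Purpose: image editing
    Recommended: Photoshop
  5. Browser
    Purpose: previewing how the pages look
    Recommended: Google Chrome, Firefox
  6. CMS (Content Management System)
    Purpose: makes it easy to manage and publish site content;
    Recommended: WordPress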

Key points of the analysis of microarray

  1. biological replicates
    Five or more biological replicates are usually robust for microarray studies.
  2. qPCR validation
    Microarrays may give many false positives, so it is usually necessary to validate the differential expression observed in some of the key genes.

Effect of the glycosyltransferases on the CPS (capsular polysaccharide) synthesis of S. suis 2

  1. Deletion of the cps genes in S. suis 2 SC19 resulted in incomplete CPS;
  2. Deletion of these genes (cps2E, cps2G, cps2J and cps2L) changed the interplay between S. suis 2 SC19 and different cell lines in vitro
  3. More complement C3 from porcine serum was deposited on the mutant strains than on the WT
  4. The cps genes play an essential role in the viability of SC19 in a murine model

Zhang, Y., Ding, D., Liu, M., Yang, X., Zong, B., Wang, X., Chen, H., Bei, W., and Tan, C. (2016). Effect of the glycosyltransferases on the capsular polysaccharide synthesis of Streptococcus suis serotype 2. Microbiological research.

LaTeX notes

  1. On Windows, LaTeX expects "/" as the path separator, whereas paths produced by Perl's File::Spec module use "\";
  2. Generate a DVI: latex filename.tex;
  3. Generate a PDF: dvipdfm filename.dvi;
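The separator mismatch in note 1 can be fixed when a script generates the LaTeX source. The original workflow used Perl. The sketch below shows the same idea in Python with the standard `pathlib` module, and both `to_latex_path` and the example path are hypothetical.

```python
from pathlib import PureWindowsPath


def to_latex_path(windows_path):
    """Convert a backslash-separated Windows path to the "/" form LaTeX accepts."""
    return PureWindowsPath(windows_path).as_posix()


print(to_latex_path(r"C:\thesis\figures\plot.pdf"))  # C:/thesis/figures/plot.pdf
```

`PureWindowsPath` only manipulates the string and never touches the file system, so the conversion behaves the same on any operating system.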

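KEGG usage notes

  1. The number of pathways under bta keeps growing; mixing previously fetched pathway data with current data leads to errors;
  2. When batch-downloading images generated by KEGG Mapper, network conditions can leave downloads incomplete; always check carefully that the counts match and that each image is complete;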
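The check in note 2 is easy to automate. The sketch below assumes the images are PNG files saved as `<pathway id>.png` in one directory; that naming scheme is my assumption and not something KEGG prescribes. A file counts as complete if it starts with the PNG signature and ends with the PNG IEND chunk, which catches truncated downloads but does not validate the image contents.

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file
PNG_TRAILER = b"IEND\xaeB`\x82"       # IEND chunk type plus its fixed CRC


def png_is_complete(path):
    """Heuristic truncation check: correct signature and a final IEND chunk."""
    with open(path, "rb") as handle:
        data = handle.read()
    return data.startswith(PNG_SIGNATURE) and data.endswith(PNG_TRAILER)


def check_downloads(expected_ids, directory):
    """Return (missing, truncated) lists of pathway IDs for the given directory."""
    missing, truncated = [], []
    for pathway_id in expected_ids:
        path = os.path.join(directory, pathway_id + ".png")
        if not os.path.exists(path):
            missing.append(pathway_id)
        elif not png_is_complete(path):
            truncated.append(pathway_id)
    return missing, truncated


# Demonstration with fake files: one complete, one cut off, one never downloaded.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "bta00010.png"), "wb") as handle:
        handle.write(PNG_SIGNATURE + b"fake image data" + PNG_TRAILER)
    with open(os.path.join(tmp, "bta00020.png"), "wb") as handle:
        handle.write(PNG_SIGNATURE + b"cut off")
    missing, truncated = check_downloads(["bta00010", "bta00020", "bta00030"], tmp)

print("missing:", missing)      # ['bta00030']
print("truncated:", truncated)  # ['bta00020']
```

Saving the list of expected pathway IDs together with the date it was fetched also helps with note 1, since it makes it obvious when two data sets come from different KEGG releases.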

KEGG ORTHOLOGY (KO) Database

In KEGG, molecular-level functions are stored in the KO (KEGG Orthology) database and associated with ortholog groups in order to enable extension of experimental evidence in a specific organism to other organisms. Genome annotation in KEGG is ortholog annotation, assigning KO identifiers (K numbers) to individual genes in the GENES database. No updates are made to original data, such as gene names and descriptions given by RefSeq or GenBank, even if they are inconsistent with the KO assignment.

Major efforts have been initiated to associate each KO entry with experimental evidence of functionally characterized sequence data, now shown in the SEQUENCE subfield of the REFERENCE field. Furthermore, the genome-based collection of KEGG GENES (http://www.genome.jp/kegg/genes.html) has been expanded to allow individual protein data to be included in the addendum category. Eventually the KO database will cover all knowledge on functionally characterized protein sequences (see also KEGG Enzyme, http://www.genome.jp/kegg/annotation/enzyme.html).

In general, KO grouping of functional orthologs is defined in the context of KEGG molecular networks (KEGG pathway maps, BRITE hierarchies and KEGG modules), which are in fact represented as networks of nodes identified by K numbers. The relationships between KOs and corresponding molecular networks are represented in the following KO system.

KEGG Orthology (KO)

The fact that functional information is associated with ortholog groups is a unique aspect of the KEGG resource. Sequence-similarity-based inference, as a generalization of a limited amount of experimental evidence, is predefined in KEGG. As implemented in BlastKOALA and other tools, the sequence similarity search against KEGG GENES is a search for the most appropriate K numbers. Once K numbers are assigned to genes in the genome, the KEGG pathway maps, BRITE hierarchies, and KEGG modules are automatically reconstructed, enabling biological interpretation of high-level functions.
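The reconstruction step can be pictured as two table lookups: gene to K number, then K number to the networks that contain it. The sketch below uses small hand-written tables to show the idea. The gene names are invented, the K-number and pathway assignments are illustrative rather than retrieved from KEGG, and `reconstruct_pathways` is a hypothetical helper, not a KEGG API.

```python
import re
from collections import defaultdict

K_NUMBER = re.compile(r"^K\d{5}$")  # K numbers are "K" followed by five digits


def reconstruct_pathways(gene_to_ko, ko_to_pathways):
    """Group genes by pathway using their K number assignments."""
    pathways = defaultdict(set)
    for gene, ko in gene_to_ko.items():
        if not K_NUMBER.match(ko):
            raise ValueError("not a K number: %r" % ko)
        # A K number that appears in no known pathway contributes nothing.
        for pathway in ko_to_pathways.get(ko, ()):
            pathways[pathway].add(gene)
    return {pathway: sorted(genes) for pathway, genes in pathways.items()}


# Illustrative toy tables (not fetched from KEGG).
gene_to_ko = {"geneA": "K00844", "geneB": "K00134", "geneC": "K00844"}
ko_to_pathways = {"K00844": ["ko00010", "ko00052"], "K00134": ["ko00010"]}

result = reconstruct_pathways(gene_to_ko, ko_to_pathways)
for pathway, genes in result.items():
    print(pathway, genes)
```

Because the pathway tables are keyed by K number rather than by organism-specific gene ID, the same `ko_to_pathways` table can be reused for any genome once its genes have K numbers, which is the point of ortholog-based annotation.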

DAVID/DAVID-WS usage tips

DAVID-WS (web service) has been developed to automate user tasks by providing stateful web services to access DAVID programmatically without the need for human interactions. [1]

DAVID-WS is made stateful by keeping the state-related input of a user operation in a session context that can be accessed by subsequent user operations within the same session. Users can add lists, change background populations, select species and categories and reset functional parameters for data analysis, as well as query all tools within the same session and format output as desired. [1]

[1] Jiao, X., Sherman, B.T., Huang da, W., Stephens, R., Baseler, M.W., Lane, H.C., and Lempicki, R.A. (2012). DAVID-WS: a stateful web service to facilitate gene/protein list analysis. Bioinformatics 28, 1805-1806.