Keywords of Genomics

Population genetics

Population genetics is the study of the distribution and change in frequency of alleles within populations, and as such it sits firmly within the field of evolutionary biology.

The main processes of evolution are natural selection, genetic drift, gene flow, mutation, and genetic recombination, and they form an integral part of the theory that underpins population genetics.

Studies in this branch of biology examine such phenomena as adaptation, speciation, population subdivision, and population structure.

Population stratification

Population stratification refers to differences in allele frequencies between cases and controls that are caused by systematic differences in their ancestry rather than by an association of genes with disease.
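A minimal numeric sketch of the effect, using made-up numbers: the allele frequency is identical in affected and unaffected individuals within each subpopulation, yet pooled cases and controls differ because they are sampled from the two subpopulations in different proportions. The subpopulation names, sample sizes, and frequencies are all hypothetical.

```python
def pooled_allele_frequency(groups):
    """Return the allele frequency of a sample pooled from several subpopulations.

    groups: list of (n_individuals, allele_frequency) pairs, one per subpopulation.
    """
    total = sum(n for n, _ in groups)
    return sum(n * p for n, p in groups) / total


# Hypothetical numbers: the allele has frequency 0.1 in subpopulation A and
# 0.5 in subpopulation B, regardless of disease status.
FREQ_A, FREQ_B = 0.1, 0.5

# Cases are drawn mostly from B, controls mostly from A (unequal ancestry).
cases = [(200, FREQ_A), (800, FREQ_B)]
controls = [(800, FREQ_A), (200, FREQ_B)]

case_freq = pooled_allele_frequency(cases)
control_freq = pooled_allele_frequency(controls)
print(f"cases: {case_freq:.2f}, controls: {control_freq:.2f}")
```

The pooled frequencies differ (roughly 0.42 against 0.18) although the allele has no relationship to disease inside either subpopulation, which is the spurious signal that stratification produces.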

Diploid genome

A diploid genome contains two complete sets of chromosomes, one inherited from the mother and one from the father.

Coalescent theory

Coalescent theory is a retrospective stochastic model of population genetics that relates genetic diversity in a sample to demographic history of the population from which it was taken.

That is, it is a model of the effect of genetic drift, viewed backwards in time, on the genealogy of antecedents.
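As a rough illustration of "viewed backwards in time", the sketch below simulates the time to the most recent common ancestor (TMRCA) of a sample. It assumes the standard Kingman coalescent for a neutral population of constant size, which the notes above do not specify; time is measured in coalescent units (2N generations for a diploid population of size N), and the function names are made up for this example.

```python
import random


def expected_tmrca(n):
    """Expected TMRCA of n sampled lineages, in coalescent units.

    Under the Kingman coalescent the waiting time while k lineages remain
    has mean 2 / (k * (k - 1)); the expectations are summed over k = n..2.
    """
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))


def simulate_tmrca(n, rng):
    """Draw one TMRCA by adding exponential waiting times, going back in time."""
    t = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2  # number of lineage pairs that can coalesce
        t += rng.expovariate(rate)
    return t


rng = random.Random(1)
draws = [simulate_tmrca(10, rng) for _ in range(20000)]
print(expected_tmrca(10), sum(draws) / len(draws))
```

The simulated mean should land close to the analytical value 2(1 - 1/n), which is 1.8 for a sample of ten lineages.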



Repository

A repository is usually used to organize a single project. Repositories can contain folders and files, images, videos, spreadsheets, and data sets – anything your project needs. We recommend including a README, or a file with information about your project. GitHub makes it easy to add one at the same time you create your new repository. It also offers other common options such as a license file.


Branch

Branching is the way to work on different versions of a repository at one time.

By default your repository has one branch, named main (master in older repositories), which is considered to be the definitive branch. We use branches to experiment and make edits before merging them into that default branch.


Commit

On GitHub, saved changes are called commits.

Pull Request

When you open a pull request, you’re proposing your changes and requesting that someone review your contribution, pull it in, and merge it into their branch. Pull requests show diffs, or differences, of the content from both branches. Additions are shown in green and deletions in red.

GitHub Pages


GitHub Pages are public webpages hosted and published through GitHub.

You can create and publish GitHub Pages online using the Automatic Page Generator. If you prefer to work locally, you can use GitHub Desktop or the command line.

Pages can be served over HTTPS, but you still shouldn’t use them for sensitive transactions, like sending passwords or credit card numbers.


  1. Read the abstract
  2. Read the figures
  3. Read the rest selectively

J.Q. Liu

  1. Yak whole-genome resequencing reveals domestication signatures and prehistoric population expansions (2015)
    1. genome variation of wild and domestic yaks
    2. evolution
  2. Genome resequencing: 13 wild yaks and 59 domestic yaks

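Windows installation and configuration


  1. Reinstall the operating system
  2. Reinstall the hardware drivers
  3. Reinstall software
    1. DirectX
  4. Runtime libraries
  5. Programming language compilers and tools
    1. Java
    2. MinGW
    3. Strawberry Perl
  6. Utilities
    1. DAEMON Tools Lite
    2. PC Hunter
    3. Xming + PuTTY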
  7. 123

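NCBI usage notes and tips

  1. About the sequence location operators join and complement:

    gene complement(2872..3195)
    /gene="lacZ'"
    Sequence:NC_000913.3 (363231..366305, complement)

  2. To search for a target sequence within a specific genome: open the genome, enter the target sequence, and start the search.
  3. For proteins, NCBI can display their conserved domains (CDs) through a feature named "Identify Conserved Domains".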
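For item 1, a small sketch of how such locations can be read: coordinates are 1-based and inclusive, complement(...) means the feature lies on the opposite strand (so the slice is reverse-complemented), and join(...) concatenates several spans. The helper names and the toy sequence are made up for illustration; this is not an NCBI API.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract(sequence, start, end, complement=False):
    """Return the bases at 1-based, inclusive positions start..end.

    With complement=True the feature lies on the opposite strand, so the
    slice is reverse-complemented, as for complement(start..end).
    """
    segment = sequence[start - 1:end]
    if complement:
        segment = segment.translate(COMPLEMENT)[::-1]
    return segment


def extract_join(sequence, spans, complement=False):
    """Concatenate several (start, end) spans, as for join(a..b,c..d).

    With complement=True the joined sequence is reverse-complemented,
    as for complement(join(a..b,c..d)).
    """
    joined = "".join(extract(sequence, s, e) for s, e in spans)
    return joined.translate(COMPLEMENT)[::-1] if complement else joined


toy = "AAACCCGGGTTT"  # made-up sequence, not a real record
print(extract(toy, 4, 6))                      # CCC
print(extract(toy, 4, 9, complement=True))     # CCCGGG
print(extract_join(toy, [(1, 3), (10, 12)]))   # AAATTT
```

In practice a parser library would normally handle these locations; the sketch only shows what the two operators mean.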

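WordPress build & tips

  1. Connecting to the database
    1. Find the database IP address at the hosting provider;
    2. Find WordPress's wp-config.php file in the site's root directory;
    3. Change the value of DB_HOST to the database IP address.
  2. Backups
    1. Use the Export tool under WordPress's built-in Tools menu to export the site content in a format that WordPress templates generally recognize, so that if something goes wrong all of the content (excluding images) can be imported into any new WordPress template.
    2. Use the hosting provider's data download feature to download the entire site; if the site breaks, re-upload the whole WordPress installation.
  3. Comment spam prevention
    1. WordPress itself requires the administrator to moderate every comment;
    2. Use Akismet, which afterwards automatically filters comments from the same email address.
  4. Slow loading when hosted on a virtual host in mainland China:
    Solution: install two plugins, Disable Google Fonts and WP Acceleration for China.
  5. Images with Chinese file names may fail to load in WordPress; use English file names where possible.
  6. Migrating an entire WordPress site requires editing the configuration file.
  7. Shift+Enter: the spacing after the line break is smaller than after a plain Enter.
  8. If the site's domain name changes, remember to update the site domain in the _options table of the database.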

Windows 10 debugging & optimization





  1. Historicize
  2. Dialogue
  3. Intellectualize
  4. Transform (transformative)


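  1. Find the question the author wants to answer
  2. Which competing theories does the author describe?
  3. What view of their own does the author put forward?
  4. What materials, steps, and data does the author use to support it?
  5. The author's strengths and weaknesses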


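  1. What question am I trying to answer?
  2. How have others answered this question?
  3. How does my answer differ from theirs?
  4. How do I show that this difference holds up?
  5. What are my shortcomings, my new findings, or my strengths?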


TED speaker

Benoit Mandelbrot: (20 November 1924 – 14 October 2010)

TED title: Fractals and the art of roughness
Introduction: Benoit Mandelbrot was a Polish-born, French and American mathematician. He is recognized for his contribution to the field of fractal geometry, which included coining the word “fractal”, as well as for developing a “theory of roughness” and “self-similarity” in nature.

Carter Emmart

TED title: A 3D atlas of the universe
Introduction: The Digital Universe Atlas is a standalone 4-dimensional space visualization application built on the programmable Partiview data visualization engine designed by Stuart Levy of the National Center for Supercomputing Applications (NCSA) as an adjunct of the NCSA’s Virtual Director virtual choreography project. The Virtual Universe Atlas project was launched by the American Museum of Natural History’s Hayden Planetarium with significant programming support from the National Aeronautics and Space Administration as well as Stuart Levy. The database draws on the National Virtual Observatory.

Daniel Kahneman

TED title: The riddle of experience vs. memory
Introduction: Daniel Kahneman (Hebrew: דניאל כהנמן‎; March 5, 1934 – March 27, 2024) was an Israeli-American psychologist notable for his work on the psychology of judgment and decision-making, as well as behavioral economics, for which he was awarded the 2002 Nobel Memorial Prize in Economic Sciences (shared with Vernon L. Smith). His empirical findings challenge the assumption of human rationality prevailing in modern economic theory.

Python tips

  1. Installing pip:
    1. Download:
    2. Install: python
  2. Using pip on Windows:
    python -m pip
  3. Error: Microsoft Visual C++ 9.0 is required (Unable to find vcvarsall.bat).
    Solution: download VCForPython27.msi.








Basic information on S. suis

  1. One of the most prevalent pathogens in swine, causing a range of disease syndromes including arthritis, meningitis, pneumonia, septicemia and endocarditis. [1]
  2. A zoonotic agent able to induce meningitis, endocarditis, and streptococcal toxic shock-like syndrome in humans. [1]
  3. Thirty-three S. suis serotypes have been identified on the basis of antigenic differences in their capsular polysaccharides (CPS). [1]
  4. S. suis 2 mainly infects people who have direct contact with carrier pigs, sick pigs, or raw pork, entering via wounds on the skin or the mucosa of the mouth or nasal cavity. [1]
  5. As of Dec. 31, 2013, 1642 cases of human S. suis infection had been reported worldwide. [1]

[1]. Zhang, Y., Ding, D., Liu, M., Yang, X., Zong, B., Wang, X., Chen, H., Bei, W., and Tan, C. (2016). Effect of the glycosyltransferases on the capsular polysaccharide synthesis of Streptococcus suis serotype 2. Microbiological research.

The elements for building a website

  1. Domain name
  2. Web hosting
  3. File manager
  4. Raster graphics editor
  5. Browser
    Recommended: Google Chrome, Firefox
  6. CMS (Content Management System)

Key points of microarray analysis

  1. Biological replicates
    Five or more replicates are usually robust for microarray studies.
  2. qPCR validation
    Microarrays may give many false positives, so it is usually necessary to validate the differential expression observed in some of the key genes.
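A back-of-the-envelope sketch of why false positives pile up: with thousands of probes each tested at a fixed p-value cutoff, a sizeable number pass by chance alone. The probe count and cutoff below are hypothetical, and the Benjamini-Hochberg procedure is shown only as one common statistical way to limit false discoveries; it is not mentioned in the notes above and does not replace qPCR validation.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of tests passing the cutoff when no gene truly differs."""
    return n_tests * alpha


def benjamini_hochberg(p_values, fdr=0.05):
    """Return indices of tests declared significant by the BH step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its BH threshold.
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical array with 20000 probes tested at p < 0.05.
print(expected_false_positives(20000, 0.05))

# Made-up p-values for ten genes.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p))
```

With these numbers about a thousand probes would pass the raw cutoff by chance, while only the first two of the ten made-up p-values survive the BH correction.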

Effect of the glycosyltransferases on the CPS synthesis of S. suis 2

  1. Deletion of the cps genes in S. suis 2 SC19 resulted in incomplete CPS;
  2. Deletion of cps2E, cps2G, cps2J and cps2L changed the interplay between S. suis 2 SC19
    and different cell lines in vitro
  3. More complement C3 from porcine serum was deposited on the mutant strains
    than on the WT
  4. The cps genes played an essential role in the viability of SC19 in a murine model

Zhang, Y., Ding, D., Liu, M., Yang, X., Zong, B., Wang, X., Chen, H., Bei, W., and Tan, C. (2016). Effect of the glycosyltransferases on the capsular polysaccharide synthesis of Streptococcus suis serotype 2. Microbiological research.

LaTeX notes

  1. On Windows, LaTeX expects "/" as the path separator, whereas paths produced by Perl's File::Spec module use "\";
  2. Generate a DVI file: latex filename.tex;
  3. Generate a PDF: dvipdfm filename.dvi;
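For note 1, one way to normalize a path before writing it into a .tex file, sketched here in Python rather than Perl. The helper name and the example path are made up; PureWindowsPath parses backslash-separated paths on any operating system.

```python
from pathlib import PureWindowsPath


def latex_path(windows_path):
    """Convert a backslash-separated Windows path to the "/" form LaTeX accepts."""
    return PureWindowsPath(windows_path).as_posix()


print(latex_path(r"C:\reports\figures\plot.pdf"))  # C:/reports/figures/plot.pdf
```

The result can then be used in commands such as \includegraphics or \input without the backslashes being read as LaTeX control sequences.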

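KEGG usage notes

  1. The number of pathways for bta (Bos taurus) keeps growing, so mixing previously fetched data with current data leads to errors;
  2. When batch-downloading images generated by KEGG Mapper, network problems can leave downloads incomplete; always check carefully that the number of files matches and that every image is complete.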
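The check in note 2 can be automated along these lines. This is only a sketch: it assumes the images are PNG files saved as <pathway_id>.png in one directory, which is a naming convention invented for this example, and it treats a file as complete when it starts with the PNG signature and ends with the IEND chunk.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
PNG_TRAILER = b"IEND\xaeB`\x82"  # final chunk of every complete PNG file


def check_downloads(expected_ids, directory):
    """Return (missing, incomplete) lists of pathway IDs.

    Assumes each image was saved as <pathway_id>.png in `directory`.
    """
    missing, incomplete = [], []
    for pathway_id in expected_ids:
        path = Path(directory) / f"{pathway_id}.png"
        if not path.is_file():
            missing.append(pathway_id)
            continue
        data = path.read_bytes()
        # A truncated download usually lacks the IEND trailer.
        if not (data.startswith(PNG_SIGNATURE) and data.endswith(PNG_TRAILER)):
            incomplete.append(pathway_id)
    return missing, incomplete
```

Calling check_downloads with the list of pathway IDs that were requested and the download directory gives the IDs to fetch again; both lists are empty when the batch is complete.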


In KEGG, molecular-level functions are stored in the KO (KEGG Orthology) database and associated with ortholog groups in order to enable extension of experimental evidence in a specific organism to other organisms. Genome annotation in KEGG is ortholog annotation, assigning KO identifiers (K numbers) to individual genes in the GENES database. No updates are made to original data, such as gene names and descriptions given by RefSeq or GenBank, even if they are inconsistent with the KO assignment.

Major efforts have been initiated to associate each KO entry with experimental evidence of functionally characterized sequence data, now shown in the SEQUENCE subfield of the REFERENCE field. Furthermore, the genome-based collection of KEGG GENES has been expanded to allow individual protein data to be included in the addendum category. Eventually the KO database will cover all knowledge on functionally characterized protein sequences (see also KEGG Enzyme).

In general, KO grouping of functional orthologs is defined in the context of KEGG molecular networks (KEGG pathway maps, BRITE hierarchies and KEGG modules), which are in fact represented as networks of nodes identified by K numbers. The relationships between KOs and corresponding molecular networks are represented in the following KO system.

KEGG Orthology (KO)

The fact that functional information is associated with ortholog groups is a unique aspect of the KEGG resource. Sequence-similarity-based inference, as a generalization of a limited amount of experimental evidence, is predefined in KEGG. As implemented in BlastKOALA and other tools, the sequence similarity search against KEGG GENES is a search for the most appropriate K numbers. Once K numbers are assigned to genes in the genome, the KEGG pathway maps, BRITE hierarchies, and KEGG modules are automatically reconstructed, enabling biological interpretation of high-level functions.
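To make the gene-to-K-number assignment concrete, here is a small sketch that groups genes from different organisms by the K number they share. It assumes a two-column, tab-separated table of gene ID and "ko:" identifier, which is the layout a KEGG gene-to-KO link table is expected to have; the gene IDs in the sample are invented and the function names are hypothetical.

```python
from collections import defaultdict


def parse_gene_ko_table(text):
    """Map each gene ID to the set of K numbers assigned to it.

    Expects one "gene<TAB>ko:Kxxxxx" pair per line; blank lines are skipped.
    """
    gene_to_ko = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue
        gene, ko = line.split("\t")
        gene_to_ko[gene].add(ko.split(":", 1)[-1])  # drop the "ko:" prefix
    return dict(gene_to_ko)


def group_by_ko(gene_to_ko):
    """Invert the mapping: K number -> set of genes sharing that function."""
    ko_to_genes = defaultdict(set)
    for gene, kos in gene_to_ko.items():
        for ko in kos:
            ko_to_genes[ko].add(gene)
    return dict(ko_to_genes)


# Made-up gene IDs; only the two-column layout matters here.
sample = "bta:gene1\tko:K00001\nhsa:gene2\tko:K00001\nbta:gene3\tko:K00002\n"
print(group_by_ko(parse_gene_ko_table(sample)))
```

Genes that land under the same K number are treated as functional orthologs, which is what lets evidence from one organism be carried over to another.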



DAVID-WS (web service) has been developed to automate user tasks by providing stateful web services to access DAVID programmatically without the need for human interactions. [1]


DAVID-WS is made stateful by keeping the state-related input of a user operation in a session context that can be accessed by subsequent user operations within the same session. Users can add lists, change background populations, select species and categories and reset functional parameters for data analysis, as well as query all tools within the same session and format output as desired. [1]

[1] Jiao, X., Sherman, B.T., Huang da, W., Stephens, R., Baseler, M.W., Lane, H.C., and Lempicki, R.A. (2012). DAVID-WS: a stateful web service to facilitate gene/protein list analysis. Bioinformatics 28, 1805-1806.