Python unit testing

unittest

https://docs.python.org/2/library/unittest.html

A test case is created by subclassing unittest.TestCase. Individual tests are defined with methods whose names start with the letters test; this naming convention tells the test runner which methods represent tests.

The crux of each test is a call to assertEqual() to check for an expected result; assertTrue() or assertFalse() to verify a condition; or assertRaises() to verify that a specific exception gets raised. These methods are used instead of the assert statement so the test runner can accumulate all test results and produce a report.

The setUp() and tearDown() methods let you define instructions that are executed before and after each test method. They are covered in more detail in the section "Organizing test code" of the unittest documentation.

A simple way to run the tests is to call unittest.main(), which provides a command-line interface to the test script.
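Below is a minimal test script, loosely modelled on the string-methods example in the unittest documentation. It shows the pieces described above: a TestCase subclass, test_* methods, setUp()/tearDown(), assertEqual/assertTrue/assertFalse/assertRaises, and unittest.main(). The fixture string is an arbitrary illustration.

```python
import unittest


class TestStringMethods(unittest.TestCase):

    def setUp(self):
        # Runs before *each* test method: build a fresh fixture.
        self.text = "hello world"

    def tearDown(self):
        # Runs after each test method (whenever setUp succeeded).
        self.text = None

    def test_upper(self):
        self.assertEqual(self.text.upper(), "HELLO WORLD")

    def test_isupper(self):
        self.assertTrue("FOO".isupper())
        self.assertFalse("Foo".isupper())

    def test_split(self):
        self.assertEqual(self.text.split(), ["hello", "world"])
        # split() with a non-string separator raises TypeError
        with self.assertRaises(TypeError):
            self.text.split(2)


if __name__ == "__main__":
    unittest.main()
```

Saved as a file (any name) and run with `python <file> -v`, unittest.main() discovers the three test_* methods and reports one line per test.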

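PCA – dimensionality reduction

Original data, 2-D: (3,4), (6,8)

New data, 2-D: (5,0), (10,0)

Finally reduced to 1-D: 5, 10

Geometrically, this is a rotation of the coordinate axes.

Why the dimension can be reduced here: all the points actually lie on the line y = (4/3)x.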
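The rotation above can be reproduced numerically. The sketch below finds the dominant direction of the two points from the eigenvector of the 2×2 matrix Σ x xᵀ and then expresses each point in the rotated frame. One assumption should be stated plainly: to match the numbers in the note, the data are not mean-centred. Standard PCA subtracts the mean first and would give ±2.5 instead of 5 and 10.

```python
import math


def principal_axis(points):
    """Unit eigenvector for the largest eigenvalue of the 2x2 matrix sum(x x^T).

    The data are deliberately NOT mean-centred, to reproduce the numbers
    in the note; standard PCA subtracts the mean first.
    """
    a = sum(x * x for x, _ in points)
    b = sum(x * y for x, y in points)
    c = sum(y * y for _, y in points)
    # Largest eigenvalue of the symmetric matrix [[a, b], [b, c]]
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    # A matching eigenvector is (b, lam - a); valid whenever b != 0
    vx, vy = b, lam - a
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm


def rotate(points, axis):
    """Coordinates of each point in the frame whose first axis is `axis`."""
    ux, uy = axis
    # The second axis is the first one rotated by 90 degrees: (-uy, ux)
    return [(ux * x + uy * y, -uy * x + ux * y) for x, y in points]


points = [(3, 4), (6, 8)]
axis = principal_axis(points)              # direction of the line y = (4/3)x
new_points = rotate(points, axis)          # second coordinate is (numerically) 0
reduced = [round(u, 6) for u, _ in new_points]  # keep only the first coordinate
print(axis, new_points, reduced)
```

The first axis comes out as (0.6, 0.8), which is the direction of y = (4/3)x. Dropping the all-zero second coordinate is exactly the step from 2-D to 1-D.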

related posts:

1. http://www.iro.umontreal.ca/~pift6080/H09/documents/papers/pca_tutorial.pdf

2. https://stats.stackexchange.com/questions/90331/step-by-step-implementation-of-pca-in-r-using-lindsay-smiths-tutorial

3. http://www.cnblogs.com/pangxiaodong/archive/2011/10/15/2212786.html


Git usage

1. Committing

git add .

git commit -m "your comments about this submission"

git push origin master

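2. Parallel development and resolving conflicts

Keep a stable master branch. Everyone who develops a feature creates their own branch and merges it back into master when that work is finished. A likely situation: while one person is developing a feature, others finish their own features first and merge them into master. This raises the following problem:

If two people have both modified the same file, what should the person who commits later do?

The person who commits later must resolve the conflict manually. If they simply overwrite the earlier person's changes, the earlier person's code may break.

Another problem: suppose the later person modifies a file that the earlier person depends on, and the earlier person never modified that file. The later commit will succeed, and the earlier person's code, which used to run correctly, will break. Likewise, suppose a file the later person depends on was left untouched during the later person's development but was modified by the earlier person. The later person's commit will go through, yet code that ran correctly on the later person's machine will break after it is merged.

Therefore, any file that everyone depends on must be maintained by a designated person, and every change to it must take all of its dependents into account.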

Good places

1. Stanford Medical School

(http://med.stanford.edu/: logo encrypted)

2. Harvard Medical School

Purcell lab: in the Department of Psychiatry at Brigham & Women's Hospital, an affiliate of Harvard Medical School (PLINK).

3. Division of Statistical Genetics, Department of Human Genetics, University of Pittsburgh

4. École Polytechnique Fédérale de Lausanne (EPFL)

5. Stowers Institute for Medical Research


EIGENSOFT 7.2.1

1. VCF example

2. Genotype

3. smartpca results

Sample        PC1 (eigenvector 1)   PC2 (eigenvector 2)
eigenvalue    9                     0
YC1.YC_snp    -0.3162               -0.3162
YC2.YC_snp    -0.3162               -0.3162
YC3.YC_snp    -0.3162               -0.3162
YC4.YC_snp    -0.3162               -0.3162
YC5.YC_snp    -0.3162               -0.3162
ZC1.ZC_snp     0.3162                0.3162
ZC2.ZC_snp     0.3162                0.3162
ZC3.ZC_snp     0.3162                0.3162
ZC4.ZC_snp     0.3162                0.3162
ZC5.ZC_snp     0.3162                0.3162
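Here is a quick sanity check on the ±0.3162 loadings. There are 10 samples, and each is given the same magnitude on PC1, so the entries of a unit eigenvector are ±1/√10 ≈ 0.3162. The toy below uses hypothetical genotypes in which 5 YC samples are homozygous 0 and 5 ZC samples are homozygous 2 at three SNPs. It mean-centres each SNP, builds the sample-by-sample matrix X Xᵀ / M, and extracts the leading eigenvector by power iteration. This is a simplified PCA, not smartpca's exact SNP normalization or eigenvalue scaling, so only the loadings (not the eigenvalue 9) are expected to match. The sign of an eigenvector is arbitrary.

```python
import math


def leading_eigenvector(matrix, iterations=100):
    """Power iteration: repeatedly multiply by the matrix and renormalize."""
    n = len(matrix)
    b = [float(i + 1) for i in range(n)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        nb = [sum(matrix[i][j] * b[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nb))
        b = [x / norm for x in nb]
    return b


# Hypothetical toy genotypes (allele counts 0/1/2): rows = samples, columns = SNPs
samples = [f"YC{i}" for i in range(1, 6)] + [f"ZC{i}" for i in range(1, 6)]
genotypes = [[0, 0, 0]] * 5 + [[2, 2, 2]] * 5

n_snps = len(genotypes[0])
means = [sum(row[j] for row in genotypes) / len(genotypes) for j in range(n_snps)]
centred = [[row[j] - means[j] for j in range(n_snps)] for row in genotypes]
# Sample-by-sample matrix X X^T / M (M = number of SNPs)
cov = [[sum(p * q for p, q in zip(ri, rj)) / n_snps for rj in centred] for ri in centred]

pc1 = leading_eigenvector(cov)
for name, value in zip(samples, pc1):
    print(f"{name}\t{value:.4f}")
```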

4. PCA plot

PCA summary

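Methods:

1. Population-scale SNPs were used

2. EIGENSOFT 4.2

Interpretation of results:

Asian wild boars and domestic pigs cluster together; European wild boars and domestic pigs cluster together with the Berkshire pigs; the African warthog clusters with the four wild pig species (are these four wild species also African?).

Reference: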

2014 – Whole-genome sequencing of Berkshire (European native pig) provides insights into its origin and domestication

BMC Genomics – 2017 – Oreochromis niloticus (Nile Tilapia) – sex determination regions

Sex determination regions

The new O_niloticus_UMD1 assembly was used to study sequence differentiation across two sex-determining regions in tilapias. The first region is an XX/XY sex-determination region on LG1 found in many strains of tilapia [9, 34, 44–47]. We previously characterized this region by whole genome Illumina re-sequencing of pooled DNA from males and females [48]. We realigned these sequences to the new O_niloticus_UMD1 assembly and searched for variants that were fixed in the XX female pool and polymorphic in the XY male pool. Figure 4 shows the FST and the sex-patterned variant allele frequencies for the XX/XY O. niloticus comparison across the complete Orenil1.1 and O_niloticus_UMD1 assemblies, while Fig. 5 focuses on the highly differentiated ~9 Mbp region on LG1 with a substantial number of sex-patterned variants, indicative of a reduction in recombination in a sex determination region that has existed for some time [48].

The second sex comparison is for a ZZ/WZ sex-determination region on LG3 in a strain of O. aureus [11,49]. This region has not previously been characterized using whole genome sequencing. For this comparison we identified variant alleles fixed in the ZZ male pool and polymorphic in the WZ female pool. Figure 6 shows the FST and the sex-patterned variant allele frequencies for this comparison across the whole O_niloticus_UMD1 assembly, while Fig. 7 focuses on the differentiated region on LG3. O. aureus LG3 contains a large ~50 Mbp region of differentiated sex-patterned variants, also indicative of a reduction in recombination in the sex determination region. Figure 6 also shows this differentiation pattern on several other LGs (LG7, LG9, LG14, LG16, LG18, LG22 and LG23). It is possible that these smaller regions of sex-patterned differentiation are actually translocations in O. aureus relative to the O. niloticus genome assembly.
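The two filters described above can be sketched per variant from pooled allele frequencies. A variant is sex-patterned when it is fixed (frequency 0 or 1) in the homogametic pool (XX, or ZZ) and polymorphic in the heterogametic pool (XY, or WZ). This excerpt does not say which FST estimator the paper used, so the sketch substitutes the simple heterozygosity form F_ST = (H_T − H_S) / H_T with equal pool weights. The variant names and frequencies are hypothetical.

```python
def fst(p1, p2):
    """Heterozygosity-based FST for one biallelic site from two equally weighted pools."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                          # total expected heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2      # mean within-pool heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def is_sex_patterned(p_homogametic, p_heterogametic, eps=0.0):
    """Fixed in the XX (or ZZ) pool and polymorphic in the XY (or WZ) pool."""
    fixed = p_homogametic <= eps or p_homogametic >= 1 - eps
    polymorphic = eps < p_heterogametic < 1 - eps
    return fixed and polymorphic


# Hypothetical variants: name -> (frequency in XX female pool, frequency in XY male pool)
variants = {"var1": (0.0, 0.5), "var2": (0.3, 0.35), "var3": (1.0, 1.0)}
for name, (p_female, p_male) in variants.items():
    print(name, is_sex_patterned(p_female, p_male), round(fst(p_female, p_male), 4))
```

With real pooled read counts, frequencies are rarely exactly 0 or 1, which is why `eps` is exposed as a tolerance.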

Summary of phylogenetic trees

Tools:

2014-RAxML version 8  ->  2006-RAxML-VI-HPC  ->  2005-RAxML-III

Methods:

=>  1981-Evolutionary Trees from DNA Sequences: A Maximum Likelihood Approach

Theories:

=>  Maximum Likelihood Approach ->  statistics

An application of probability theory to statistics; it is one of the methods of parameter estimation.
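The example below makes "maximum likelihood as parameter estimation" concrete. It chooses the parameter value under which the observed data are most probable. The data are hypothetical (7 successes in 10 trials, for instance 7 differing sites out of 10). The sketch maximizes the binomial log-likelihood over a grid and compares the result with the closed-form estimate k/n.

```python
import math


def log_likelihood(p, k, n):
    """Binomial log-likelihood of k successes in n trials (constant term dropped)."""
    return k * math.log(p) + (n - k) * math.log(1 - p)


k, n = 7, 10                                   # hypothetical observations
grid = [i / 1000 for i in range(1, 1000)]      # candidate values of p in (0, 1)
p_hat = max(grid, key=lambda p: log_likelihood(p, k, n))
print(p_hat, k / n)                            # grid maximum vs. closed-form MLE
```

Tools such as RAxML do the same thing in many dimensions at once: they search over tree topologies, branch lengths and model parameters for the combination that maximizes the likelihood of the alignment.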


Evolutionary model = substitution matrix
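As one concrete instance of "model = substitution matrix", I've picked the simplest model, Jukes-Cantor (JC69), for illustration. The note itself does not name a model. Under JC69, the probability that a site shows the same base after a branch of length d (expected substitutions per site) is 1/4 + 3/4·e^(−4d/3), and each of the three other bases has probability 1/4 − 1/4·e^(−4d/3).

```python
import math

BASES = "ACGT"


def jc69_p_matrix(d):
    """4x4 JC69 transition-probability matrix for branch length d (subs/site)."""
    e = math.exp(-4.0 * d / 3.0)
    same = 0.25 + 0.75 * e    # probability of no visible change
    diff = 0.25 - 0.25 * e    # probability of each specific other base
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


P = jc69_p_matrix(0.1)
for base, row in zip(BASES, P):
    print(base, " ".join(f"{x:.4f}" for x in row))
```

Each row sums to 1. At d = 0 the matrix is the identity, and as d grows every entry tends to 1/4, meaning the sequences become saturated. Richer models (HKY, GTR) replace the single rate with base- and pair-specific parameters.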


concatenation: join the genes of interest end to end and analyze them together as one alignment
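The sketch below shows the concatenation step. It assumes each gene is already aligned, so all sequences of one gene have equal length. Each taxon's sequences are joined gene by gene into a supermatrix, and taxa missing a gene are padded with gaps. The gene and taxon names are hypothetical. The returned gene boundaries are the information a partitioned analysis needs, but writing them in a specific program's partition-file syntax is left out.

```python
def concatenate(alignments, gap="-"):
    """Build a supermatrix from alignments: gene -> {taxon: aligned sequence}."""
    taxa = sorted({t for aln in alignments.values() for t in aln})
    pieces = {t: [] for t in taxa}
    partitions = []  # (gene, start, end), 1-based inclusive
    start = 1
    for gene, aln in alignments.items():
        length = len(next(iter(aln.values())))
        for t in taxa:
            # Taxa that lack this gene get a run of gaps of the same length
            pieces[t].append(aln.get(t, gap * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    return {t: "".join(p) for t, p in pieces.items()}, partitions


alignments = {  # hypothetical, already-aligned genes
    "geneA": {"taxon1": "ATGC", "taxon2": "ATGA", "taxon3": "TTGC"},
    "geneB": {"taxon1": "GGA", "taxon3": "GCA"},
}
matrix, parts = concatenate(alignments)
for taxon, seq in matrix.items():
    print(taxon, seq)
print(parts)
```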

Beavis effect

In a simulation study, William D. Beavis showed that the average estimates of phenotypic variances associated with correctly identified QTL were greatly overestimated if only 100 progeny were evaluated, slightly overestimated if 500 progeny were evaluated, and fairly close to the actual magnitude when 1000 progeny were evaluated.
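A toy simulation can show the same selection bias. This is not Beavis's design: it assumes a single marker sitting on the QTL in a balanced backcross, a QTL that explains about 5% of phenotypic variance, a single-marker regression LOD, and a LOD threshold of 3. Only replicates in which the QTL is detected contribute to the average R², and that conditioning is what inflates the estimate when progeny numbers are small.

```python
import math
import random


def r_squared(x, y):
    """Squared Pearson correlation = fraction of phenotypic variance explained."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def mean_r2_of_detected_qtl(n, effect, reps, lod_threshold=3.0, rng=random):
    """Average R^2 over replicates in which the QTL passes the LOD threshold."""
    x = [0] * (n // 2) + [1] * (n - n // 2)   # balanced backcross genotypes
    detected = []
    for _ in range(reps):
        y = [effect * g + rng.gauss(0.0, 1.0) for g in x]
        r2 = r_squared(x, y)
        lod = -n / 2 * math.log10(1 - r2)     # single-marker regression LOD
        if lod > lod_threshold:
            detected.append(r2)
    mean = sum(detected) / len(detected) if detected else float("nan")
    return mean, len(detected)


effect = 0.4588                               # assumed: QTL explains ~5% of variance
true_r2 = effect ** 2 * 0.25 / (effect ** 2 * 0.25 + 1)
rng = random.Random(1)
for n in (100, 500, 1000):
    mean_r2, hits = mean_r2_of_detected_qtl(n, effect, reps=300, rng=rng)
    print(f"n={n:4d}  detected {hits:3d}/300  "
          f"mean R2 of detected = {mean_r2:.3f}  (true {true_r2:.3f})")
```

With 100 progeny, a QTL of this size can only pass LOD 3 when its sampled R² happens to exceed about 0.13, so every detected estimate overstates the true 5%. At 1000 progeny nearly every replicate is detected, and the average settles near the true value.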

(http://www.genetics.org/content/165/4/2259)

QTL Analysis

a) Quantitative trait locus (QTL) mapping requires parental strains (red and blue plots) that differ genetically for the trait, such as lines created by divergent artificial selection.

b) The parental lines are crossed to create F1 individuals (not shown), which are then crossed among themselves to create an F2, or crossed to one of the parent lines to create backcross progeny. Both of these crosses produce individuals or strains that contain different fractions of the genome of each parental line. The phenotype for each of these recombinant individuals or lines is assessed, as is the genotype of markers that vary between the parental strains.

c) Statistical techniques such as composite interval mapping evaluate the probability that a marker or an interval between two markers is associated with a QTL affecting the trait, while simultaneously controlling for the effects of other markers on the trait. The results of such an analysis are presented as a plot of the test statistic against the chromosomal map position, in recombination units (cM). Positions of the markers are shown as triangles. The horizontal line marks the significance threshold. Likelihood ratios above this line are formally significant, with the best estimate of QTL positions given by the chromosomal position corresponding to the highest significant likelihood ratio. Thus, the figure shows five possible QTL, with the best-supported QTL around 10 and 60 cM.

https://www.nature.com/scitable/topicpage/quantitative-trait-locus-qtl-analysis-53904
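Steps b) and c) can be sketched end to end with simulated data. The simplification should be named plainly: the scan below uses single-marker regression, not the composite interval mapping mentioned in c). The marker map, the QTL position (60 cM), the effect size and the sample size are all hypothetical. Backcross genotypes are simulated along one chromosome, with recombination between adjacent markers following the Haldane map function. The scan computes a LOD score at each marker and flags markers above a threshold of 3.

```python
import math
import random


def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1 - math.exp(-2 * d_cm / 100))


def simulate_backcross(positions, n, rng):
    """Genotypes (0/1) at each marker for n backcross progeny."""
    progeny = []
    for _ in range(n):
        g = [rng.randint(0, 1)]
        for left, right in zip(positions, positions[1:]):
            # Switch parental allele with probability r between adjacent markers
            g.append(1 - g[-1] if rng.random() < haldane_r(right - left) else g[-1])
        progeny.append(g)
    return progeny


def single_marker_lod(x, y):
    """LOD = (n/2) * log10(RSS_null / RSS_marker) = -(n/2) * log10(1 - r^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return -n / 2 * math.log10(1 - sxy * sxy / (sxx * syy))


rng = random.Random(42)
positions = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]   # hypothetical marker map (cM)
qtl_index = positions.index(60)                             # QTL placed at the 60 cM marker
geno = simulate_backcross(positions, n=200, rng=rng)
pheno = [1.0 * g[qtl_index] + rng.gauss(0, 1) for g in geno]

lods = [single_marker_lod([g[i] for g in geno], pheno) for i in range(len(positions))]
threshold = 3.0
for pos, lod in zip(positions, lods):
    print(f"{pos:4d} cM  LOD={lod:5.2f} {'*' if lod > threshold else ''}")
best = positions[max(range(len(lods)), key=lods.__getitem__)]
print("best-supported position:", best, "cM")
```

Markers linked to the QTL also exceed the threshold, but with lower LOD. This is why the position of the highest peak, rather than every significant marker, is taken as the best estimate.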

Researchers

1. Pig

Dr. MIKAWA Satoshi (美川智)

https://researchmap.jp/read0080334/

2. Phylogeny

Gao Fangluan (高芳銮)

https://user.qzone.qq.com/58001704/main

http://blog.sciencenet.cn/home.php?mod=space&uid=460481

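PLINK

# convert a 23andMe-format .snp file to .bed/.bim/.fam (family ID JPT, individual ID ID002)
plink --23file JPT-NA19001.snp JPT ID002 --out JPT-NA19001

# remove problematic SNPs (those listed in merge.missnp after a failed merge)
plink --bfile JPT-NA19001 --exclude merge.missnp --make-bed --out new

# merge a single additional fileset
plink --bfile source1 --bmerge source2_trial --make-bed --out merged_trial

# merge multiple filesets listed in merge_list
plink --merge-list merge_list --make-bed --out merge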
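When many filesets need merging, the merge_list file can be generated rather than typed by hand. This sketch assumes the list format accepted by PLINK 1.9, which is one `.bed .bim .fam` triple per line; check the documentation for your PLINK version. The fileset prefixes are hypothetical. The script only writes the list and prints the command, and running it is left as a commented-out subprocess call.

```python
from pathlib import Path


def write_merge_list(prefixes, path="merge_list"):
    """Write one '<prefix>.bed <prefix>.bim <prefix>.fam' line per fileset."""
    lines = [f"{p}.bed {p}.bim {p}.fam" for p in prefixes]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


prefixes = ["source1", "source2", "source3"]   # hypothetical filesets to merge
write_merge_list(prefixes)
cmd = ["plink", "--merge-list", "merge_list", "--make-bed", "--out", "merge"]
print(" ".join(cmd))
# To run it (requires plink on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```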

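Programming philosophy

1. The essence of object-oriented programming is that each kind of data carries its own operations. Users therefore do not need to know how to manipulate complex data structures; they only need to learn the data's interface.

2. Generic programming lets one algorithm be applied to data of many types, so there is no need to write another overload of a function for each type.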
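Both points can be illustrated in a few lines of Python. The names `Stack` and `largest` are illustrative, not from any library. `Stack` hides its list behind push/pop/peek, which is the object-oriented point. `largest` is one algorithm that works for any element type supporting `<`, which is the generic point. Python is dynamically typed, so no overloads are needed at all; the `TypeVar`/`Generic` annotations only document the intent for type checkers.

```python
from typing import Generic, Iterable, List, TypeVar

T = TypeVar("T")


class Stack(Generic[T]):
    """Object-oriented: the data (a list) and its operations travel together."""

    def __init__(self) -> None:
        self._items: List[T] = []   # internal representation, hidden from users

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()

    def peek(self) -> T:
        return self._items[-1]

    def __len__(self) -> int:
        return len(self._items)


def largest(items: Iterable[T]) -> T:
    """Generic: one algorithm for any element type that supports '<'."""
    iterator = iter(items)
    best = next(iterator)
    for item in iterator:
        if best < item:
            best = item
    return best


s: Stack[int] = Stack()
s.push(1)
s.push(2)
print(s.pop(), len(s))              # 2 1
print(largest([3, 9, 4]))           # 9
print(largest(["pear", "apple"]))   # pear
print(largest([2.5, 1.5]))          # 2.5
```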

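C++ templates and generic programming

"Generic programming aims to write code that is independent of data types." (C++ Primer Plus, 6th ed.)

That is, implement one method that can be used with data of many types.

Output: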