Variety of Formulae

1. Nucleotide Diversity

$$\pi = \sum_{ij} x_i x_j \pi_{ij} = 2 \sum_{i=2}^{n} \sum_{j=1}^{i-1} x_i x_j \pi_{ij}$$

where x_i and x_j are the respective frequencies of the i-th and j-th sequences, π_ij is the number of nucleotide differences per nucleotide site between the i-th and j-th sequences, and n is the number of sequences in the sample.
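As a rough illustration of these definitions, here is a minimal Python sketch. It assumes that nucleotide diversity is the sum of x_i · x_j · π_ij over all ordered pairs of sequences (equivalently, twice the sum over unordered pairs), that the sequences are already aligned and of equal length, and that the frequencies sum to 1. The function names and the toy sequences are made up for the example.

```python
from itertools import combinations


def pairwise_diff_per_site(seq_a, seq_b):
    """pi_ij: fraction of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


def nucleotide_diversity(seqs, freqs):
    """Sum x_i * x_j * pi_ij over all ordered pairs (i != j).

    Each unordered pair is visited once and weighted by 2,
    which equals the sum over ordered pairs because pi_ij = pi_ji.
    """
    if len(seqs) != len(freqs):
        raise ValueError("need exactly one frequency per sequence")
    total = 0.0
    for i, j in combinations(range(len(seqs)), 2):
        total += 2 * freqs[i] * freqs[j] * pairwise_diff_per_site(seqs[i], seqs[j])
    return total


# Toy data (hypothetical): three distinct sequences and their frequencies.
toy_seqs = ["AAAA", "AAAT", "AATT"]
toy_freqs = [0.5, 0.25, 0.25]
print(nucleotide_diversity(toy_seqs, toy_freqs))  # 0.21875
```

In the toy data the per-site differences are 0.25, 0.5 and 0.25 for the three pairs, so the result is 2 × (0.03125 + 0.0625 + 0.015625) = 0.21875.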

2. Variance


3. Average k-mer Coverage

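If k = L, the formula gives C_kmer = 0. This is clearly wrong: the value should be at least 1, because every k-mer (here, every whole read) is covered at least once. In the extreme case k = L, only a small number of k-mers will have a frequency greater than 1. Apart from PCR duplication, which produces exactly identical reads, the probability that two reads are completely identical is generally very small.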
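The k = L case can be checked directly by counting k-mers. The sketch below is only an illustration: the reads are invented, `kmer_counts` is a hypothetical helper, and the last read is a deliberate copy of the first to stand in for a PCR duplicate.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        # A read of length L contributes L - k + 1 k-mers; with k = L that is exactly one.
        for start in range(len(read) - k + 1):
            counts[read[start:start + k]] += 1
    return counts


# Toy reads of length L = 6 (hypothetical); the last one duplicates the first.
reads = ["ACGTAC", "CGTACG", "GTACGT", "ACGTAC"]
counts = kmer_counts(reads, k=6)

print(min(counts.values()))                         # 1: every observed k-mer is covered at least once
print([kmer for kmer, c in counts.items() if c > 1])  # ['ACGTAC']: only the duplicated read
```

Every observed k-mer has a count of at least 1 by construction, so an average coverage of 0 cannot describe the observed k-mers. Only the duplicated read reaches a count above 1.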

