BMC Genomics – 2017 – Oreochromis niloticus (Nile Tilapia) – sex determination regions

Sex determination regions

The new O_niloticus_UMD1 assembly was used to study sequence differentiation across two sex-determining regions in tilapias. The first region is an XX/XY sex-determination region on LG1 found in many strains of tilapia [9, 34, 44–47]. We previously characterized this region by whole genome Illumina re-sequencing of pooled DNA from males and females [48]. We realigned these sequences to the new O_niloticus_UMD1 assembly and searched for variants that were fixed in the XX female pool and polymorphic in the XY male pool. Figure 4 shows the FST and the sex-patterned variant allele frequencies for the XX/XY O. niloticus comparison across the complete Orenil1.1 and O_niloticus_UMD1 assemblies, while Fig. 5 focuses on the highly differentiated ~9 Mbp region on LG1 with a substantial number of sex-patterned variants, indicative of a reduction in recombination in a sex determination region that has existed for some time [48].
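The search for variants fixed in one pool and polymorphic in the other can be sketched as a per-site test on pooled allele frequencies. This is a minimal illustration, not the pipeline used in [48]: the function name, the tolerance `fixed_tol`, and the window `het_low`/`het_high` are assumptions made for the example. The window is centred on 0.5 because a site heterozygous in every individual of the heterogametic sex would show a pooled frequency near one half.

```python
def is_sex_patterned(freq_homogametic, freq_heterogametic,
                     fixed_tol=0.0, het_low=0.3, het_high=0.7):
    """Return True if a biallelic site looks sex-patterned.

    freq_homogametic   -- alternate-allele frequency in the XX (or ZZ) pool
    freq_heterogametic -- alternate-allele frequency in the XY (or WZ) pool
    All thresholds are hypothetical defaults, not values from the paper.
    """
    # "Fixed" means the pool carries only one allele (within the tolerance).
    fixed = (freq_homogametic <= fixed_tol
             or freq_homogametic >= 1.0 - fixed_tol)
    # "Polymorphic" means both alleles are present at intermediate frequency.
    polymorphic = het_low <= freq_heterogametic <= het_high
    return fixed and polymorphic


# Hypothetical pooled frequencies per site: (XX pool, XY pool)
sites = {
    "site_a": (0.0, 0.5),   # fixed in XX, intermediate in XY
    "site_b": (0.4, 0.5),   # polymorphic in both pools
    "site_c": (1.0, 1.0),   # fixed in both pools
}
flagged = [name for name, (xx, xy) in sites.items() if is_sex_patterned(xx, xy)]
print(flagged)
```

Only `site_a` satisfies both conditions in this toy data set.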

The second sex comparison is for a ZZ/WZ sex-determination region on LG3 in a strain of O. aureus [11, 49]. This region has not previously been characterized using whole genome sequencing. For this comparison we identified variant alleles fixed in the ZZ male pool and polymorphic in the WZ female pool. Figure 6 shows the FST and the sex-patterned variant allele frequencies for this comparison across the whole O_niloticus_UMD1 assembly, while Fig. 7 focuses on the differentiated region on LG3. O. aureus LG3 contains a large ~50 Mbp region of differentiated sex-patterned variants, also indicative of a reduction in recombination in the sex determination region. Figure 6 also shows this differentiation pattern on several other LGs (LG7, LG9, LG14, LG16, LG18, LG22 and LG23). It is possible that these smaller regions of sex-patterned differentiation are actually translocations in O. aureus relative to the O. niloticus genome assembly.
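For reference, a per-site FST between two pools can be sketched from the two allele frequencies alone. The version below uses the textbook heterozygosity form, FST = (HT - HS) / HT, with equal weighting of the pools; which estimator the paper actually used is not stated in this excerpt, so treat this as an assumption.

```python
def fst_two_pools(p1, p2):
    """FST for one biallelic site from two pool allele frequencies.

    Uses FST = (H_T - H_S) / H_T with equally weighted pools.
    This is an illustrative estimator, not necessarily the paper's.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)          # expected heterozygosity, pools combined
    if h_total == 0.0:
        return 0.0                                  # site monomorphic in both pools
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


# A fully sex-patterned site: fixed in the ZZ pool, 50% in the WZ pool
print(fst_two_pools(0.0, 0.5))
# Identical frequencies give no differentiation
print(fst_two_pools(0.3, 0.3))
```

Under this particular estimator a perfectly sex-patterned site (0 versus 0.5) gives 1/3 rather than 1, because the heterogametic pool still carries both alleles.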

summary of phylogenetic tree

Tools:

2014-RAxML version 8  ->  2006-RAxML-VI-HPC  ->  2005-RAxML-III

Methods:

=>  1981-Evolutionary Trees from DNA Sequences: A Maximum Likelihood Approach

Theories:

=>  Maximum Likelihood Approach ->  statistics

An application of probability theory in statistics; it is one of the methods of parameter estimation.

 

Evolutionary model = substitution matrix
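As a small worked case of maximum likelihood as parameter estimation, the sketch below estimates the evolutionary distance between two aligned sequences under the Jukes-Cantor (JC69) substitution model, the simplest such model. The two sequences are made up, and the grid search is only for illustration; it is compared with the known closed-form JC69 estimate, d = -3/4 ln(1 - 4p/3), where p is the fraction of differing sites.

```python
import math


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of distance d (substitutions per site) under JC69,
    up to an additive constant that does not depend on d."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e      # probability that a site is identical
    p_diff = 0.75 - 0.75 * e      # probability that a site differs
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)


# Two hypothetical aligned sequences (no gaps)
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGAACGTACCTACGA"
n_sites = len(seq_a)
n_diff = sum(a != b for a, b in zip(seq_a, seq_b))

# Maximum likelihood by brute-force grid search over d in (0, 2]
grid = [i / 1000.0 for i in range(1, 2001)]
d_ml = max(grid, key=lambda d: jc69_log_likelihood(d, n_sites, n_diff))

# Closed-form JC69 estimate for comparison
p = n_diff / n_sites
d_closed = -0.75 * math.log(1.0 - 4.0 * p / 3.0)

print(n_diff, round(d_ml, 3), round(d_closed, 3))
```

Tree-building programs such as RAxML optimize the same kind of likelihood, but jointly over the tree topology, all branch lengths, and richer substitution models.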


concatenation: join the genes of interest end to end and analyze them together as a single alignment
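A minimal sketch of concatenation, assuming each gene alignment is a dict mapping taxon name to an aligned sequence. Padding taxa that lack a gene with gap characters is a common convention but is an assumption here, as are all the names in the example.

```python
def concatenate_alignments(gene_alignments, missing="-"):
    """Join per-gene alignments (taxon -> aligned sequence) into one supermatrix.

    Taxa absent from a gene are padded with `missing` for that gene's length.
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene alignment must have equal length")
        gene_len = lengths.pop()
        for taxon in taxa:
            supermatrix[taxon] += aln.get(taxon, missing * gene_len)
    return supermatrix


# Hypothetical alignments: taxon C has no sequence for the second gene
gene1 = {"A": "ACGT", "B": "ACGA", "C": "ACGG"}
gene2 = {"A": "TT", "B": "TA"}
result = concatenate_alignments([gene1, gene2])
for taxon, seq in result.items():
    print(taxon, seq)
```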

Beavis effect

In a simulation study, William D. Beavis showed that the average estimates of phenotypic variances associated with correctly identified QTL were greatly overestimated if only 100 progeny were evaluated, slightly overestimated if 500 progeny were evaluated, and fairly close to the actual magnitude when 1000 progeny were evaluated.

(http://www.genetics.org/content/165/4/2259)
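The effect can be reproduced with a small simulation. The sketch below is not Beavis's original design: it assumes a single backcross-style marker coded 0/1, a true effect explaining about 2% of the phenotypic variance, a normal approximation to the 5% significance test, and 300 replicates per sample size. It averages the estimated variance explained over only those replicates declared significant.

```python
import math
import random


def simulate_r2(n, effect, rng):
    """Simulate one sample of n progeny; return (estimated r^2, significant?)."""
    g = [rng.randint(0, 1) for _ in range(n)]                 # marker genotype
    y = [effect * gi + rng.gauss(0.0, 1.0) for gi in g]       # phenotype
    mg, my = sum(g) / n, sum(y) / n
    sgg = sum((gi - mg) ** 2 for gi in g)
    syy = sum((yi - my) ** 2 for yi in y)
    sgy = sum((gi - mg) * (yi - my) for gi, yi in zip(g, y))
    if sgg == 0.0 or syy == 0.0:
        return 0.0, False
    r2 = sgy * sgy / (sgg * syy)
    t = math.sqrt(r2 * (n - 2) / (1.0 - r2))
    return r2, t > 1.96        # normal approximation, two-sided 5% level


def mean_significant_r2(n, effect, reps, rng):
    """Average r^2 over the replicates that passed the significance test."""
    hits = [r2 for r2, sig in (simulate_r2(n, effect, rng) for _ in range(reps)) if sig]
    return sum(hits) / len(hits) if hits else float("nan")


effect = 0.29                                   # hypothetical effect, in residual SD units
true_r2 = 0.25 * effect ** 2 / (0.25 * effect ** 2 + 1.0)
rng = random.Random(2024)
results = {n: mean_significant_r2(n, effect, 300, rng) for n in (100, 500, 1000)}

print("true variance explained:", round(true_r2, 4))
for n, estimate in results.items():
    print(n, "progeny:", round(estimate, 4))
```

With 100 progeny only the replicates that happen to overestimate the effect reach significance, so the average among detected QTL is inflated; with 1000 progeny nearly every replicate is significant and the average sits close to the true value.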

QTL Analysis

a) Quantitative trait locus (QTL) mapping requires parental strains (red and blue plots) that differ genetically for the trait, such as lines created by divergent artificial selection.

b) The parental lines are crossed to create F1 individuals (not shown), which are then crossed among themselves to create an F2, or crossed to one of the parent lines to create backcross progeny. Both of these crosses produce individuals or strains that contain different fractions of the genome of each parental line. The phenotype for each of these recombinant individuals or lines is assessed, as is the genotype of markers that vary between the parental strains.

c) Statistical techniques such as composite interval mapping evaluate the probability that a marker or an interval between two markers is associated with a QTL affecting the trait, while simultaneously controlling for the effects of other markers on the trait. The results of such an analysis are presented as a plot of the test statistic against the chromosomal map position, in recombination units (cM). Positions of the markers are shown as triangles. The horizontal line marks the significance threshold. Likelihood ratios above this line are formally significant, with the best estimate of QTL positions given by the chromosomal position corresponding to the highest significant likelihood ratio. Thus, the figure shows five possible QTL, with the best-supported QTL around 10 and 60 cM.

https://www.nature.com/scitable/topicpage/quantitative-trait-locus-qtl-analysis-53904
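The scan in (c) can be illustrated with a much simpler method than composite interval mapping: a single-marker regression at each marker, reported as a LOD score. Everything below is simulated under stated assumptions (a backcross, 11 markers spaced 10 cM apart, the Haldane map function, one QTL located exactly at a marker, and a conventional LOD threshold of 3), so it is a sketch of the idea rather than a substitute for real QTL software.

```python
import math
import random


def simulate_backcross(n_ind, n_markers, spacing_cm, qtl_index, effect, rng):
    """Simulate 0/1 marker genotypes along one chromosome plus a phenotype."""
    r = 0.5 * (1.0 - math.exp(-2.0 * spacing_cm / 100.0))   # Haldane map function
    genotypes, phenotypes = [], []
    for _ in range(n_ind):
        g = [rng.randint(0, 1)]
        for _ in range(n_markers - 1):
            # Switch genotype with probability r (a recombination event)
            g.append(1 - g[-1] if rng.random() < r else g[-1])
        genotypes.append(g)
        phenotypes.append(effect * g[qtl_index] + rng.gauss(0.0, 1.0))
    return genotypes, phenotypes


def lod_single_marker(marker, phenotypes):
    """LOD = (n/2) * log10(RSS_null / RSS_marker) for a one-marker regression."""
    n = len(phenotypes)
    mg, my = sum(marker) / n, sum(phenotypes) / n
    sgg = sum((g - mg) ** 2 for g in marker)
    syy = sum((y - my) ** 2 for y in phenotypes)
    sgy = sum((g - mg) * (y - my) for g, y in zip(marker, phenotypes))
    if sgg == 0.0 or syy == 0.0:
        return 0.0
    r2 = sgy * sgy / (sgg * syy)
    return -(n / 2.0) * math.log10(1.0 - r2)


rng = random.Random(7)
qtl_index = 6                                     # QTL placed at the marker at 60 cM
genotypes, phenotypes = simulate_backcross(300, 11, 10.0, qtl_index, 1.2, rng)
lods = [lod_single_marker([g[m] for g in genotypes], phenotypes) for m in range(11)]

for m, lod in enumerate(lods):
    flag = "*" if lod > 3.0 else ""
    print(f"{m * 10:3d} cM  LOD {lod:6.2f} {flag}")
```

Markers near the QTL also show elevated LOD scores because they are linked to it, which is why the peak position, not every significant marker, is taken as the best estimate of the QTL location.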

researchers

1. pig

MIKAWA Satoshi (Dr. 美川智)

https://researchmap.jp/read0080334/

2. phylogeny

高芳銮

https://user.qzone.qq.com/58001704/main

http://blog.sciencenet.cn/home.php?mod=space&uid=460481

plink

# convert a 23andMe-format SNP file to bed/bim/fam
plink --23file JPT-NA19001.snp JPT ID002 --out JPT-NA19001

# remove problematic SNPs (those listed in merge.missnp)
plink --bfile JPT-NA19001 --exclude merge.missnp --make-bed --out new

# merge with a single fileset
plink --bfile source1 --bmerge source2_trial --make-bed --out merged_trial

# merge multiple filesets
plink --merge-list merge_list --make-bed --out merge
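The merge_list file used in the last command is plain text with one fileset per line; a line holding a single prefix refers to that prefix's .bed/.bim/.fam files. The helper below is a hypothetical convenience, not part of plink: it checks that each fileset is complete before writing the list. The demo runs in a temporary directory with empty placeholder files, and the prefixes source1, source2 and source3 are only examples.

```python
import tempfile
from pathlib import Path

BINARY_EXTENSIONS = (".bed", ".bim", ".fam")


def missing_parts(prefix):
    """Return the binary-fileset extensions that do not exist for this prefix."""
    return [ext for ext in BINARY_EXTENSIONS if not Path(str(prefix) + ext).exists()]


def write_merge_list(prefixes, list_path):
    """Write one fileset prefix per line, refusing incomplete filesets."""
    incomplete = {str(p): missing_parts(p) for p in prefixes if missing_parts(p)}
    if incomplete:
        raise FileNotFoundError(f"incomplete filesets: {incomplete}")
    Path(list_path).write_text("".join(f"{p}\n" for p in prefixes))


# Demo with empty placeholder files (real filesets would come from plink)
with tempfile.TemporaryDirectory() as tmp:
    for name in ("source1", "source2"):
        for ext in BINARY_EXTENSIONS:
            Path(tmp, name + ext).touch()
    prefixes = [str(Path(tmp, "source1")), str(Path(tmp, "source2"))]
    list_path = Path(tmp, "merge_list")
    write_merge_list(prefixes, list_path)
    written = list_path.read_text().splitlines()
    absent = missing_parts(Path(tmp, "source3"))   # never created

print(len(written), absent)
```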

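Programming philosophy

1. The essence of object-oriented programming is that each kind of data carries its own operations, so users do not need to know how to manipulate a complex data structure; they only need to learn that data type's interface.

2. Generic programming lets a single algorithm be used with many types of data, so there is no need to overload the function again for each type.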
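Both points can be shown in a few lines. The sketch below uses Python for illustration; the class `GenomicInterval` and the function `largest` are made-up names. The class exposes its operations through an interface, and the one generic function works on any type that supports `<`.

```python
from typing import Iterable, TypeVar

T = TypeVar("T")


class GenomicInterval:
    """Data that carries its own operations; callers use the interface only."""

    def __init__(self, chrom, start, end):
        self.chrom, self.start, self.end = chrom, start, end

    def length(self):
        return self.end - self.start

    def __lt__(self, other):
        # Intervals are ordered by length for this example
        return self.length() < other.length()


def largest(items: Iterable[T]) -> T:
    """One algorithm for every type that defines '<'."""
    iterator = iter(items)
    best = next(iterator)
    for item in iterator:
        if best < item:
            best = item
    return best


print(largest([3, 7, 2]))
print(largest(["pear", "apple"]))
longest = largest([GenomicInterval("LG1", 100, 150), GenomicInterval("LG3", 10, 30)])
print(longest.chrom, longest.length())
```

The same `largest` handles integers, strings, and a user-defined class without being rewritten for each one.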

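C++ templates and generic programming

"Generic programming aims to write code that is independent of data types." (C++ Primer Plus, 6th ed.)

Implement one method that can be used with many types of data.

Output: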

Coordinates

1-based

software: 1). samtools; 2). annovar;

file format: 1). vcf; 2). sam/bam (SAM text is 1-based; the binary BAM encoding stores 0-based positions); 3). gff (include end); 4). the Description of Sequence Variants (nomenclature)

0-based

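file format: 1). bed; (the end position is not included: for example, "3 5" covers 0-based positions 3 and 4, that is, the 4th and 5th bases in 1-based counting, 2 bases in total);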
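The two conventions convert mechanically: only the start shifts by one, and the length of a BED interval is simply end minus start. A minimal sketch follows; the function names are made up, and zero-length BED features (start equal to end) are deliberately not handled.

```python
def bed_to_one_based(start, end):
    """0-based half-open [start, end) -> 1-based closed [start + 1, end]."""
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    return start + 1, end


def one_based_to_bed(start, end):
    """1-based closed [start, end] -> 0-based half-open [start - 1, end)."""
    if not 1 <= start <= end:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end


def bed_length(start, end):
    """Number of bases covered by a BED interval."""
    return end - start


sequence = "ACGTACGT"
bed_start, bed_end = 3, 5
print(bed_to_one_based(bed_start, bed_end))
print(bed_length(bed_start, bed_end))
# Python slices use the same 0-based half-open convention as BED
print(sequence[bed_start:bed_end])
```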

 

HOX gene

ref: 2013-the regulation of hox gene expression during animal development

 

homeosis: the replacement of part of one segment of an insect or other segmented animal by a structure characteristic of a different segment, especially through mutation.
homeobox: any of a class of closely similar sequences which occur in various genes and are involved in regulating embryonic development in a wide range of species.

 

 

GATK caveat

1. Selection/filtering

VariantFiltration: Filter variant calls based on INFO and/or FORMAT annotations
output: A filtered VCF in which passing variants are annotated as PASS and failing variants are annotated with the name(s) of the filter(s) they failed.
SelectVariants:    Select a subset of variants from a VCF file.
output:
1. If a value is missing, VariantFiltration treats the record containing it as passing the check, whereas SelectVariants treats the record as failing the check.

2.foobar
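The missing-value caveat in item 1 can be modeled in a few lines. This is a plain-Python toy model of the behaviour described in the note, not GATK code and not its API; the annotation key `QD`, the threshold 2.0, and the filter name `lowQD` are example choices.

```python
def variant_filtration_status(info, key, threshold, filter_name):
    """Toy model of filtering: flag the record when info[key] < threshold.

    A missing value never triggers the filter, so the record stays PASS.
    """
    value = info.get(key)
    if value is None:
        return "PASS"
    return filter_name if value < threshold else "PASS"


def select_variants_keeps(info, key, threshold):
    """Toy model of selection: keep the record when info[key] >= threshold.

    A missing value cannot satisfy the expression, so the record is dropped.
    """
    value = info.get(key)
    if value is None:
        return False
    return value >= threshold


# Hypothetical INFO fields for three records; the last one lacks QD
records = [{"QD": 12.0}, {"QD": 1.0}, {}]
statuses = [variant_filtration_status(r, "QD", 2.0, "lowQD") for r in records]
kept = [select_variants_keeps(r, "QD", 2.0) for r in records]
print(statuses)
print(kept)
```

The third record shows the asymmetry: it survives filtering as PASS but is not returned by selection, so the two tools can disagree on the same threshold whenever an annotation is absent.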