Estimation of recombination rates in a dairy cattle population
A. Hampel, F. Teuscher, D. Wittenburg
01.09.2016
Background: Half-sib family
Background: Population parameters
Population structure influences the population parameters:
Population LD
Population recombination rate
Maternal LD
Paternal recombination rate
Background: Linkage disequilibrium
$A$; $a$ … two alleles at a locus (allele frequencies $f_A$, $f_a$)
$B$; $b$ … two alleles at a second locus (allele frequencies $f_B$, $f_b$)
Frequencies of combinations in a population:
        B           b
A       $f_{AB}$    $f_{Ab}$
a       $f_{aB}$    $f_{ab}$
Loci are in linkage equilibrium: $f_{AB} f_{ab} = f_{Ab} f_{aB}$
Loci are in linkage disequilibrium: $f_{AB} f_{ab} \neq f_{Ab} f_{aB}$
$D = f_{AB} f_{ab} - f_{Ab} f_{aB}$ … disequilibrium coefficient
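The definition of the disequilibrium coefficient can be checked numerically. The following is a minimal sketch; the function name `ld_coefficient` and the example frequencies are illustrative assumptions, not part of the presentation.

```python
def ld_coefficient(f_AB, f_Ab, f_aB, f_ab):
    """Disequilibrium coefficient D = f_AB * f_ab - f_Ab * f_aB."""
    total = f_AB + f_Ab + f_aB + f_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    return f_AB * f_ab - f_Ab * f_aB


# Linkage equilibrium: all four combinations equally frequent, so D = 0.
d_equilibrium = ld_coefficient(0.25, 0.25, 0.25, 0.25)

# Linkage disequilibrium: AB and ab are over-represented, so D > 0.
d_disequilibrium = ld_coefficient(0.40, 0.10, 0.10, 0.40)

print(d_equilibrium, d_disequilibrium)
```

A value of $D = 0$ corresponds to the equilibrium condition $f_{AB} f_{ab} = f_{Ab} f_{aB}$; any other value indicates disequilibrium.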
Background: Recombination rate
$\theta$ … recombination rate
Paternal diplotype: A B / a b
Haplotypes passed to the daughter generation, with probability:
A B     $\frac{1}{2}(1 - \theta)$
a b     $\frac{1}{2}(1 - \theta)$
A b     $\frac{1}{2}\theta$
a B     $\frac{1}{2}\theta$
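The gamete probabilities above can be written as a small function. This sketch assumes a sire with diplotype AB/ab, as on the slide; the function names are hypothetical. Inserting the four probabilities into the definition of $D$ gives $D_{\mathrm{sire}} = \frac{1}{4}(1 - 2\theta)$, which the sketch confirms numerically.

```python
def paternal_gamete_probs(theta):
    """Probabilities of the four haplotypes passed on by an AB/ab sire."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    non_recombinant = 0.5 * (1.0 - theta)
    recombinant = 0.5 * theta
    return {"AB": non_recombinant, "ab": non_recombinant,
            "Ab": recombinant, "aB": recombinant}


def sire_ld(theta):
    """D among the sire's gametes, computed from the definition of D."""
    p = paternal_gamete_probs(theta)
    return p["AB"] * p["ab"] - p["Ab"] * p["aB"]


probs = paternal_gamete_probs(0.20)
d_sire = sire_ld(0.20)  # agrees with (1 - 2 * 0.20) / 4
print(probs, d_sire)
```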
Objective: Estimation of LD and recombination rates
EM Method*: maximization approach with high computation time
New Method: minimization approach with less computation time
Both methods were applied to an empirical data set.
Accuracy was verified in simulated half-sib families.
*Gomez-Raya (2012): Maximum likelihood estimation of linkage disequilibrium in half-sib families. Genetics
Parameters (EM Method)
Maternal haplotype frequencies
Genotype counts from offspring
Paternal recombination rate
Solved by applying the EM algorithm
Parameters (New Method)
Empirical covariance between genotype codes for additive/dominant effects at two SNPs
Additive coding, e.g. AA = 1, Aa = 0, aa = -1
Dominance coding, e.g. AA = -1, Aa = 1, aa = -1
Allele frequency in the maternal population
LD of dam; LD of sire
$\theta = \frac{1 - 4 D_{\mathrm{sire}}}{2}$
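The genotype codes and the conversion from sire LD to recombination rate can be sketched as follows. The genotype input format (count of the first allele per SNP), the toy data, and the use of the sample covariance with denominator n - 1 are assumptions made for illustration; the presentation does not specify them.

```python
def additive_code(allele_count):
    """AA -> 1, Aa -> 0, aa -> -1, given the count of the first allele."""
    return allele_count - 1


def dominance_code(allele_count):
    """AA -> -1, Aa -> 1, aa -> -1, given the count of the first allele."""
    return 1 if allele_count == 1 else -1


def empirical_covariance(x, y):
    """Sample covariance of two equally long code vectors (denominator n - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def theta_from_sire_ld(d_sire):
    """Recombination rate from the sire LD: theta = (1 - 4 * D_sire) / 2."""
    return (1.0 - 4.0 * d_sire) / 2.0


# Hypothetical allele counts of six offspring at two SNPs.
snp1 = [2, 1, 0, 1, 2, 0]
snp2 = [2, 1, 0, 0, 1, 0]

cov_add = empirical_covariance([additive_code(g) for g in snp1],
                               [additive_code(g) for g in snp2])
cov_dom = empirical_covariance([dominance_code(g) for g in snp1],
                               [dominance_code(g) for g in snp2])
theta = theta_from_sire_ld(0.15)
print(cov_add, cov_dom, theta)
```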
Empirical data set
Comprised 1295 half-sibs of a dairy cattle population (Fugato-plus “BovIBI” data)
40317 SNP genotypes (29 autosomes)
Estimation on BTA1:
Maternal linkage disequilibrium
Paternal recombination rate
Results: LD on chromosome 1
[Figure: distribution of $\hat{D}_{\mathrm{dam}}$; y-axis: frequency; panels: EM Method, New Method]
Results: $\theta$ on chromosome 1
[Figure: distribution of $\hat{\theta}$; y-axis: frequency; panels: EM Method, New Method]
Simulation: Parameters
LD: $D_{\mathrm{sire}} = 0.15$, $D_{\mathrm{dam}} = 0.05$
Population size: $N \in \{30, 100, 1000\}$
Recombination rate: $\theta \in \{0.01, 0.05, 0.10, 0.20, 0.40, 0.50\}$
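One way such a half-sib family could be simulated is sketched below. This is not the authors' simulation code. It assumes a single sire with diplotype AB/ab, maternal haplotypes drawn independently from population frequencies, and genotypes stored as counts of alleles A and B. The function name and the maternal frequencies (chosen so that $D_{\mathrm{dam}} = 0.05$ at allele frequency 0.5) are illustrative.

```python
import random


def simulate_half_sibs(n, theta, dam_freqs, seed=0):
    """Simulate n half-sib offspring of one AB/ab sire.

    dam_freqs maps maternal haplotypes ("AB", "Ab", "aB", "ab") to their
    population frequencies. Returns a list of (count_A, count_B) tuples.
    """
    rng = random.Random(seed)
    sire_haps = ["AB", "ab", "Ab", "aB"]
    sire_weights = [0.5 * (1 - theta), 0.5 * (1 - theta),
                    0.5 * theta, 0.5 * theta]
    dam_haps = list(dam_freqs)
    dam_weights = [dam_freqs[h] for h in dam_haps]

    offspring = []
    for _ in range(n):
        paternal = rng.choices(sire_haps, weights=sire_weights)[0]
        maternal = rng.choices(dam_haps, weights=dam_weights)[0]
        count_a = (paternal[0] == "A") + (maternal[0] == "A")
        count_b = (paternal[1] == "B") + (maternal[1] == "B")
        offspring.append((count_a, count_b))
    return offspring


# Maternal haplotype frequencies with D_dam = 0.30 * 0.30 - 0.20 * 0.20 = 0.05.
dam_freqs = {"AB": 0.30, "Ab": 0.20, "aB": 0.20, "ab": 0.30}
family = simulate_half_sibs(n=100, theta=0.20, dam_freqs=dam_freqs, seed=1)
print(family[:5])
```

Fixing the seed makes a replicate reproducible; repeating the call with different seeds yields the replicates over which bias and mean squared error can be evaluated.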
Selection of results
Bias of paternal recombination rate
Mean squared error ($D_{\mathrm{dam}}$)
Computation time
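The two accuracy criteria listed here have standard definitions over simulation replicates, sketched below. The estimates in the example are made-up numbers for illustration only and are not results from the presentation.

```python
def bias(estimates, true_value):
    """Mean of the estimates minus the true parameter value."""
    return sum(estimates) / len(estimates) - true_value


def mean_squared_error(estimates, true_value):
    """Average squared deviation of the estimates from the true value."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)


# Hypothetical estimates of theta from four simulated replicates.
theta_hats = [0.18, 0.22, 0.25, 0.19]
theta_bias = bias(theta_hats, true_value=0.20)
theta_mse = mean_squared_error(theta_hats, true_value=0.20)
print(theta_bias, theta_mse)
```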
Results: Bias of $\theta$
[Figure: distribution of $\hat{\theta}$ for the New Method and the EM Method; panels: $\theta = 0.20$, $N = 30$ and $\theta = 0.01$, $N = 30$]
Results: MSE of $D_{\mathrm{dam}}$
[Figure: $\mathrm{MSE}(\hat{D}_{\mathrm{dam}})$ plotted against $\theta$]
Results: Computation time
[Figure: computation time in seconds (axis from 0 to 120) for N = 30, N = 100 and N = 1000; legend: FBN-Method, Gomez-Raya]
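Results: Likelihood function
Two maxima of the likelihood function
Example: $f_1^{\mathrm{dam}} = f_2^{\mathrm{dam}} = 0.5$
Maximum at $D_{\mathrm{sire}} = 0.15$, $D_{\mathrm{dam}} = 0.05$
Maximum at $D_{\mathrm{sire}} = 0.05$, $D_{\mathrm{dam}} = 0.15$
[Figure: likelihood function; axes: $D_{\mathrm{sire}}$, $D_{\mathrm{dam}}$]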
Results: Complementary case
                          Simulation    Complementary case (recalculation)
$D_{\mathrm{sire}}$       0.15          0.05
$D_{\mathrm{dam}}$        0.05          0.15
$\theta$                  0.20          0.40
$f_{AB}$                  0.40          0.30
$f_{aB}$                  0.10          0.20
$f_{ab}$                  0.40          0.30
$f_{Ab}$                  0.10          0.20
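A short numerical check illustrates why the two cases are hard to tell apart. The sketch uses the covariance expressions from the appendix; the source prints the factor $(1 - 2\hat{f}_1^{\mathrm{dam}})$ twice, and taking one maternal allele frequency per locus (`f1_dam`, `f2_dam`) is an assumption of this sketch. With both frequencies at 0.5 that term vanishes either way.

```python
def model_covariances(d_sire, d_dam, f1_dam, f2_dam):
    """Expected additive and dominance covariances for given LD values."""
    cov_add = d_sire + d_dam
    cov_dom = (16 * d_sire * d_dam
               + 4 * d_sire * (1 - 2 * f1_dam) * (1 - 2 * f2_dam))
    return cov_add, cov_dom


def theta_from_sire_ld(d_sire):
    """theta = (1 - 4 * D_sire) / 2."""
    return (1 - 4 * d_sire) / 2


simulated = model_covariances(0.15, 0.05, f1_dam=0.5, f2_dam=0.5)
complementary = model_covariances(0.05, 0.15, f1_dam=0.5, f2_dam=0.5)

# Both cases yield the same pair of covariances ...
print(simulated, complementary)
# ... but imply different recombination rates.
print(theta_from_sire_ld(0.15), theta_from_sire_ld(0.05))
```

Under these assumptions, swapping $D_{\mathrm{sire}}$ and $D_{\mathrm{dam}}$ leaves both covariances unchanged, while the implied recombination rate changes from 0.20 to 0.40.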
Results: Simulation of complementary case
[Figure: distribution of $\hat{\theta}$; panels: $\theta = 0.20$, $n = 100$ and $\theta = 0.40$, $n = 100$]
Outlook and summary
Simulation
The New Method had more accurate estimates for higher recombination rates
New Method
Based on a minimization approach
Empirical data set
Both methods yielded similar distributions of the maternal LD, but the distributions of the recombination rate differed.
Maximization function
Two possible solutions (two maxima), found with both the New Method and the EM Method
Final solution depends on starting values
Next steps
Criterion for distinguishing the two maxima (likelihood values)
Estimation of good starting values (combination of minimization and maximization approach)
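Appendix: New Method
Minimization approach
$\widehat{\mathrm{cov}}_{\mathrm{add}} = D_{\mathrm{sire}} + D_{\mathrm{dam}}$
$\widehat{\mathrm{cov}}_{\mathrm{dom}} = 16 D_{\mathrm{sire}} D_{\mathrm{dam}} + 4 D_{\mathrm{sire}} (1 - 2\hat{f}_1^{\mathrm{dam}})(1 - 2\hat{f}_1^{\mathrm{dam}})$
$g_1 := D_{\mathrm{sire}} + D_{\mathrm{dam}}$
$g_2 := 16 D_{\mathrm{sire}} D_{\mathrm{dam}} + 4 D_{\mathrm{sire}} (1 - 2\hat{f}_1^{\mathrm{dam}})(1 - 2\hat{f}_1^{\mathrm{dam}})$
$Q(D_{\mathrm{sire}}, D_{\mathrm{dam}}) = (\widehat{\mathrm{cov}}_{\mathrm{add}} - g_1)^2 + (\widehat{\mathrm{cov}}_{\mathrm{dom}} - g_2)^2$
$(\hat{D}_{\mathrm{sire}}, \hat{D}_{\mathrm{dam}}) = \arg\min_{D_{\mathrm{sire}}, D_{\mathrm{dam}}} Q(D_{\mathrm{sire}}, D_{\mathrm{dam}})$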
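The minimization of $Q$ can be illustrated with a plain grid search. This is a sketch under stated assumptions, not the authors' implementation: the optimizer actually used is not given in the presentation, the search is restricted to $D \in [0, 0.25]$ for both parameters, and the dominance term is evaluated with one maternal allele frequency per locus (`f1_dam`, `f2_dam`). All function names are hypothetical.

```python
def q_value(d_sire, d_dam, cov_add_hat, cov_dom_hat, f1_dam, f2_dam):
    """Q = (cov_add_hat - g1)^2 + (cov_dom_hat - g2)^2."""
    g1 = d_sire + d_dam
    g2 = (16 * d_sire * d_dam
          + 4 * d_sire * (1 - 2 * f1_dam) * (1 - 2 * f2_dam))
    return (cov_add_hat - g1) ** 2 + (cov_dom_hat - g2) ** 2


def grid_minima(cov_add_hat, cov_dom_hat, f1_dam, f2_dam,
                steps=50, d_max=0.25, tol=1e-12):
    """Return every grid point whose Q lies within tol of the smallest Q."""
    evaluated = []
    for i in range(steps + 1):
        for j in range(steps + 1):
            d_sire = d_max * i / steps
            d_dam = d_max * j / steps
            q = q_value(d_sire, d_dam, cov_add_hat, cov_dom_hat,
                        f1_dam, f2_dam)
            evaluated.append((q, d_sire, d_dam))
    q_min = min(q for q, _, _ in evaluated)
    return [(round(ds, 6), round(dd, 6))
            for q, ds, dd in evaluated if q - q_min <= tol]


# Covariances implied by D_sire = 0.15, D_dam = 0.05 at allele frequency 0.5.
minima = grid_minima(cov_add_hat=0.20, cov_dom_hat=0.12,
                     f1_dam=0.5, f2_dam=0.5)
thetas = [(1 - 4 * d_sire) / 2 for d_sire, _ in minima]
print(minima, thetas)
```

In this example the grid search returns two minimizers, the simulated pair and its complement, which is the ambiguity the summary slides describe.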
Leibniz-Institut für Nutztierbiologie FBN
Wilhelm-Stahl-Allee 2
18196 Dummerstorf
Contact: Hampel, Alexander
Phone: +49 38208 68 908
E-mail: hampel@fbn-dummerstorf.de
Website: www.fbn-dummerstorf.de