Through simulations, I have developed several alternatives to random sib-matings that dramatically accelerate the creation of RILs, by as much as 16 generations. These include the judicious use of parental backcrossing and the selection of mating pairs based on genotypes from genome-wide SNPs. Both techniques, when applied after the point of peak diversity has been reached, result in a negligible reduction in the number of haplotype segments. I also propose an advanced intercross variant in which MAI is applied during the early generations to increase the number of haplotype segments for better mapping resolution.
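The genotype-based pair selection described above can be illustrated with a minimal Python sketch. It scores every male-female pair by the expected fraction of SNPs at which a single offspring would be homozygous. It assumes genotypes coded 0/1/2 (the count of the alternate allele), independent loci, and Mendelian transmission. The functions `offspring_homozygosity` and `best_pair` are hypothetical illustrations, not the scoring used in the simulations.

```python
from itertools import product


def offspring_homozygosity(male, female):
    """Expected fraction of SNPs at which one offspring of this pair is homozygous.

    Hypothetical helper. Genotypes are coded per SNP as 0 (homozygous reference),
    1 (heterozygous) or 2 (homozygous alternate); loci are treated as independent.
    """
    if len(male) != len(female):
        raise ValueError("genotype vectors must cover the same SNPs")
    total = 0.0
    for gm, gf in zip(male, female):
        # Probability that each parent transmits the alternate allele.
        pm, pf = gm / 2, gf / 2
        # Offspring is homozygous if both gametes carry alt, or both carry ref.
        total += pm * pf + (1 - pm) * (1 - pf)
    return total / len(male)


def best_pair(males, females):
    """Return the (male_id, female_id) pair with the highest expected homozygosity.

    `males` and `females` map animal ids to genotype vectors (hypothetical inputs).
    """
    return max(
        product(males, females),
        key=lambda mf: offspring_homozygosity(males[mf[0]], females[mf[1]]),
    )
```

Note that a 0 × 2 cross scores 0 at that SNP, because every offspring is heterozygous there. This is why pairs fixed for opposite alleles are penalized even though each parent is individually homozygous.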
In simulation I also have the luxury of assuming uniform litter sizes and equal sex ratios, but in reality the fecundity of a RIL and the sex balance of litters are complicating issues. As lines become more inbred, fertility generally decreases [53]. One way to address this is to use backcrosses, as discussed previously. However, fertility issues might override the choice of the “best breeding pair”. To address this problem, I calculate backup breeding pairs; using them, however, may extend the number of generations required to achieve fixation.
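A ranked list of candidate pairs gives a natural set of backups: if the top pair fails to breed, the next eligible pair is used. The sketch below assumes precomputed pair scores (higher is better). It also assumes a hypothetical set of animals known to be unavailable, such as infertile ones. `rank_pairs` is an illustrative name and is not part of the simulation software.

```python
def rank_pairs(scores, unavailable=frozenset()):
    """Order candidate breeding pairs from best to worst (hypothetical helper).

    `scores` maps (male_id, female_id) -> score, e.g. expected offspring
    homozygosity. Pairs involving any animal in `unavailable` are skipped.
    The first entry is the preferred pair; the rest are backups in order.
    """
    eligible = [
        (pair, s)
        for pair, s in scores.items()
        if pair[0] not in unavailable and pair[1] not in unavailable
    ]
    # Highest score first; sort is stable, so ties keep insertion order.
    eligible.sort(key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in eligible]
```

In this sketch, marking an animal as unavailable simply promotes the next-best pairs that do not involve it. This mirrors how falling back to a backup can trade some breeding efficiency for keeping the line alive.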
Taking fertility into account and prioritizing the preservation of the lines, how should the final breeders be selected? WSM optimizes for becoming inbred in one generation, but in the early MAI generations it might be more advantageous to select animals that maximize the probability of becoming inbred within two or more generations. However, in simulations, the two-generation metric generally chooses the same breeding pairs as the single-generation model, leading to the same number of generations to achieve fixation. Once lines reach low levels of residual heterozygosity, it might also be advantageous to maintain multiple breeding pairs selected to produce compatible offspring, which are more like sib-pairs than cousin-pairs. This provides more pair options, as well as a chance to compensate for uneven sex ratios or small litter sizes. Although it seems best to choose the optimal breeding pairs early on, finding good pairs in the end-game (to fix the last 1%-2% of the genome) is a harder problem. Each residual heterozygous region may take 1-2 generations to become fixed, so the last few regions can take several generations to fix if no compatible breeding pair exists. It is unlikely that two compatible breeders will exist that can produce offspring in which every remaining region is fixed.
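The difficulty of the end-game can be made concrete with a simplified calculation. Assume, purely for illustration, that both parents are heterozygous (Aa) at each of k residual regions and that the regions segregate independently. A single offspring is then homozygous at one region with probability 1/2. Two siblings are homozygous for the same allele with probability 1/4 · 1/4 + 1/4 · 1/4 = 1/8. Both chances therefore fall off geometrically with k. The helper names below are hypothetical.

```python
def prob_offspring_fixed(k):
    """P(one offspring of an Aa x Aa pair is homozygous at all k regions).

    Simplifying assumption: k independent (unlinked) regions, both parents Aa.
    """
    return 0.5 ** k


def prob_sibs_fixed_same(k):
    """P(two offspring are homozygous for the same allele at all k regions).

    At one region: both AA (1/4 * 1/4) or both aa (1/4 * 1/4), i.e. 1/8.
    """
    return 0.125 ** k
```

For k = 5, `prob_offspring_fixed(5)` is 0.03125 and `prob_sibs_fixed_same(5)` is about 3.1e-05. Under these assumptions, a single generation rarely yields a breeding pair fixed at every remaining region.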
The simulation software used in this analysis is available for download from http://sourceforge.net/p/breedingsim/. It has been adapted for many uses other than marker-assisted inbreeding, such as estimating the significance of measured statistics in the developing CC [17].
side of this work, MUGA and MegaMUGA. I will discuss their design principles as well as their performance metrics.
CHAPTER 4: DESIGNING MICRO-ARRAYS FOR MAXIMUM INFORMATIVENESS

Genotyping arrays have long been used to characterize the underlying DNA within particular regions of interest in model organisms. More recently, it has become cost-effective to use full-genome genotyping arrays rather than targeted arrays. These microarrays need to be designed only once but can be used for many experiments. When designed properly, these genotyping arrays can distinguish most of the population's diversity within each region of the genome. However, to be useful in most experiments, the arrays need to be cost-effective and widely available as well as informative. To be cost-effective, an array can contain only a limited number of SNPs, determined by the cost of the technology at the time of design.
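Choosing a fixed number of SNPs that capture as much diversity as possible can be framed as a greedy coverage problem. Each SNP distinguishes the founder pairs whose alleles differ at that site, and SNPs are added one at a time to cover the most not-yet-distinguished pairs. This is a generic greedy set-cover sketch under assumed inputs: biallelic calls written as 0/1 strings, one character per founder, applied within a single genomic region. It is not a description of the MUGA or MegaMUGA design procedure, and the function names are hypothetical.

```python
from itertools import combinations


def distinguished_pairs(pattern):
    """Founder pairs (i, j) that carry different alleles at this SNP.

    `pattern` is a string of 0/1 allele calls, one character per founder strain.
    """
    return {
        (i, j)
        for i, j in combinations(range(len(pattern)), 2)
        if pattern[i] != pattern[j]
    }


def greedy_select(patterns, budget):
    """Greedily pick up to `budget` SNPs that distinguish the most new founder pairs.

    Returns (indices of chosen SNPs, set of founder pairs they distinguish).
    """
    coverage = [distinguished_pairs(p) for p in patterns]
    covered, chosen = set(), []
    remaining = set(range(len(patterns)))
    for _ in range(budget):
        if not remaining:
            break
        # Ties are broken by the lowest SNP index.
        best = max(sorted(remaining), key=lambda i: len(coverage[i] - covered))
        if not coverage[best] - covered:
            break  # no remaining SNP adds new information
        chosen.append(best)
        covered |= coverage[best]
        remaining.remove(best)
    return chosen, covered
```

For four hypothetical founders with candidate patterns `["0001", "0011", "0101"]` and a budget of 2, the sketch picks SNPs 1 and 2, which together separate all six founder pairs. The third SNP would add nothing further.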
The laboratory mouse is a popular model organism in biomedical research that complements the strengths of many human studies. As a result, a number of these arrays have been designed for use with the mouse [64, 14, 52, 34]. However, these arrays have either too few markers to be informative genome-wide [14, 52], too many markers to be cost-effective for large experiments [64], or are not widely available [34]. In each case, one of the crucial components of an ideal genotyping platform was missing.
Existing mouse strains exhibit evidence of population structure, which makes them less than ideal for genetic mapping [65]. The Collaborative Cross (CC) [17], described in Chapter 2, is an ongoing effort to create a more genetically diverse panel of inbred mouse strains that provides a more useful model for mapping complex genetic traits. In the later generations of inbreeding of the CC, progeny are genotyped to select breeders with the least residual heterozygosity. This genotyping requires an efficient, low-cost platform for determining residual heterozygosity genome-wide, as well as the CC founder origin of fixed regions of the genome.
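The residual-heterozygosity screen can be sketched as follows. The sketch assumes calls coded 0/1/2, with 1 meaning heterozygous and `None` marking a failed call. The function names are hypothetical. The sketch also ignores sex and fertility, which a real choice of breeders would have to consider.

```python
def residual_heterozygosity(genotypes):
    """Fraction of successfully called SNPs that are heterozygous.

    Calls are coded 0/1/2 (1 = heterozygous); None marks a failed call and is ignored.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no successful genotype calls")
    return sum(1 for g in called if g == 1) / len(called)


def least_heterozygous(animals, n=2):
    """Return the ids of the n animals with the lowest residual heterozygosity.

    `animals` maps hypothetical animal ids to genotype-call lists.
    """
    return sorted(animals, key=lambda a: residual_heterozygosity(animals[a]))[:n]
```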
In response to the need for a genotyping platform to use with the CC, two cost-effective, maximally informative, widely available full-genome genotyping arrays were designed. At the time of the original design in 2010, cost-effective was defined as a price point of $100/sample, which allowed for the selection of 9,000 SNPs. Two years later, when the second-generation genotyping array was designed, 80,000 SNPs could be chosen for the same cost. The first-generation array is called the Mouse Universal Genotyping Array (MUGA), and the second-generation array is called MegaMUGA, as it has roughly ten times as many SNPs as MUGA. Both custom arrays were developed using the Illumina iSelect platform for the Infinium system. In this chapter, I describe the design criteria for each of these two genotyping arrays, as well as the number of samples genotyped on each and the performance of the arrays on these samples.
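The two design points can be compared directly. The calculation below treats the whole $100 per-sample price as if it scaled with SNP count, which is a simplifying assumption. Under it, the SNP ratio and the implied per-SNP cost are:

```latex
\frac{80{,}000}{9{,}000} \approx 8.9, \qquad
\frac{\$100}{9{,}000\ \text{SNPs}} \approx \$0.011\ \text{per SNP}, \qquad
\frac{\$100}{80{,}000\ \text{SNPs}} = \$0.00125\ \text{per SNP}
```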