1. Introduction
1.3. Genetic models
1.4.1. Linkage analysis methods
When preparing to undertake a linkage study, it is important that an appropriate
population is collected for analysis and that the appropriate type of analysis and
parameters are selected. This section discusses the different choices available and when
they are typically used.
1.4.1.1. Population choice
Depending upon the type of genetic disease being investigated, different types of
populations are collected for linkage analysis. When trying to find a simple Mendelian
disease gene, large extended families with several affected individuals are preferable for
linkage analysis. For a dominant disease, every affected individual within an extended
pedigree will share one chromosomal region that is not present in any of the unaffected
members of the pedigree (after allowing for age of onset and incomplete penetrance). For a
recessive disease, every affected individual will share both chromosomal copies of the disease
gene region, whereas unaffected individuals will share only one or none. Potentially, one
large family is sufficient to locate the region where a Mendelian disease gene is located.
However, because a single pedigree may contain too few informative recombinants, and
because some recessive traits have few affected individuals, more than one family is
typically required.
In theory, almost any population containing a few pedigrees with at least two affected
individuals can be used to find a monogenic disease. However, when the aim of a study
is to locate genes involved in complex, polygenic diseases such as IBD, many pedigrees
containing affected relative pairs (ARPs) are required. Within a pedigree containing
multiple affected individuals several disease susceptibility genes could be segregating.
Different combinations of disease genes could be causing the same disease in different
individuals. Thus, to provide sufficient power to detect linkage to a gene in the
presence of interference from affected individuals with other disease genes, multiple
pairs of affected individuals sharing the same gene are required. In complex disease it
is typically hard to find extended pedigrees containing numerous affected individuals.
Yet, because first-degree relatives are at increased risk for most genetic diseases, it is
relatively quick and easy to collect nuclear families containing affected sibling pairs
(ASPs). Thus, ASP populations have become the population of choice for complex
polygenic linkage studies.
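The reasoning behind ASP collections can be illustrated with a simple mean allele-sharing test. The sketch below is not taken from any particular linkage package; the function name `mean_sharing_z` and the input format (counts of fully informative sibling pairs sharing 0, 1 or 2 alleles identical by descent) are assumptions made for illustration. Under no linkage, a sibling pair shares 0, 1 or 2 alleles with probabilities 1/4, 1/2 and 1/4, so the proportion of alleles shared has mean 0.5 and per-pair variance 1/8.

```python
import math

def mean_sharing_z(ibd_counts):
    """Z statistic for excess allele sharing among affected sibling pairs.

    ibd_counts maps an IBD state (0, 1 or 2 alleles shared) to the number of
    fully informative pairs observed in that state. This input format is a
    hypothetical one chosen for the example.
    """
    n = sum(ibd_counts.get(k, 0) for k in (0, 1, 2))
    if n == 0:
        raise ValueError("no sibling pairs supplied")
    # Total alleles shared IBD across all pairs (each pair can share at most 2).
    shared = sum(k * ibd_counts.get(k, 0) for k in (0, 1, 2))
    proportion = shared / (2 * n)
    # Null expectation 0.5; variance of the mean proportion is 1 / (8n).
    return (proportion - 0.5) / math.sqrt(1 / (8 * n))

# Illustrative (made-up) counts: 100 pairs with excess sharing.
z = mean_sharing_z({0: 10, 1: 50, 2: 40})
```

A large positive value suggests that affected siblings share the region more often than expected by chance. Pairs that are not fully informative would need their sharing status estimated, which this sketch does not attempt.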
Still, some scientists have argued that collection of only ASPs limits the power of the
study and ignores data that may be easy to collect and generate from other types of
ARPs. Collecting and analysing all available affected individuals from every pedigree
increases the power of the study. However, if appropriate weighting or correction is
not applied, large pedigrees with several ARPs can mask the information from
smaller single-ASP pedigrees and lead to overestimation of the effect of genes
segregating within the extended pedigrees. Thus, it is more important to collect a linkage population with
pedigrees of uniform number and type of ARPs rather than worrying about whether to
1.4.1.2. Parametric vs. non-parametric analysis
To perform linkage analysis on genotype data, the appropriate set of analysis conditions
needs to be selected. Parametric analysis allows for inputs such as disease mode of
inheritance and penetrance to be set prior to analysis. Once the analysis program has
been given this information about the disease, it weights certain results more heavily
than it would without such parameters. For example, if the disease were a completely
penetrant recessive disorder, the program would increase the significance of regions
where two affected siblings share two chromosomes. Thus, for single-gene disorders with
known modes of inheritance and penetrance values, parametric analysis is better for detecting linkage.
However, for complex, polygenic disease where the number of genes involved and their
associated modes of inheritance and penetrance values are not known, non-parametric
analysis is better for detecting valid regions of linkage. Still, some scientists prefer to
use several types of parametric analysis rather than non-parametric analysis when
analysing polygenic disease. However, if positive linkage is detected after several types
of parametric analysis have been conducted, it is not known whether the result is real
and the chosen parameters appropriate, or whether it is a false positive arising from
multiple testing and from parameters that happen by chance to fit the input data.
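As a concrete illustration of a parametric calculation, the sketch below computes a two-point LOD score in the simplest textbook setting: phase-known meioses, full penetrance, and a count of recombinants between marker and disease locus. These simplifying assumptions, and the function names `lod_score` and `max_lod`, are introduced for the example and are not drawn from the text above. The score is the base-10 logarithm of the likelihood at recombination fraction theta divided by the likelihood under no linkage (theta = 0.5).

```python
import math

def lod_score(recombinants, meioses, theta):
    """Two-point LOD score for phase-known, fully penetrant data."""
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    non_recombinants = meioses - recombinants
    if theta == 0.0:
        # Any observed recombinant is impossible at theta = 0.
        if recombinants > 0:
            return float("-inf")
        log_linked = 0.0
    else:
        log_linked = (recombinants * math.log10(theta)
                      + non_recombinants * math.log10(1.0 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked

def max_lod(recombinants, meioses, steps=50):
    """Return (best LOD, theta) over an evenly spaced grid on [0, 0.5]."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max((lod_score(recombinants, meioses, t), t) for t in grid)

best_lod, best_theta = max_lod(1, 10)
```

With one recombinant in ten meioses, the maximum LOD is about 1.6 at theta = 0.1. Repeating such a maximisation under several different disease models and reporting only the best result is the multiple-testing situation described above, so the largest score cannot be interpreted as if a single model had been tested.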
1.4.1.3. Two-point vs. multipoint analysis
In the past, the only analysis available for scientists conducting linkage studies was two-
point analysis, where each marker genotyped in a genome scan is analysed for linkage
individually. However, today scientists also have the choice of using multipoint
analysis to analyse genotype data. Multipoint analysis uses the data from all markers
and the distances between them to calculate the linkage across the area between the
markers rather than solely at single marker loci. The results of multipoint analysis of
verified with data from adjacent markers. In contrast, the LOD scores from two-point
analysis of neighbouring markers can vary greatly due to unverified inferences of
sharing status. Only when both parents are heterozygous for different alleles is a family
fully informative, so that no assumptions need be made about sharing status. Because
markers differ in heterozygosity, the proportion of fully informative families differs
from marker to marker. Thus, analysing data from markers with varying heterozygosities will
give varying results, with the results from more heterozygous markers being more
accurate. Furthermore, since different markers are informative in different families, the
variation in two-point results between adjacent markers can be accentuated even further.
Consequently, with the recent availability of multipoint analysis, the use of two-point
analysis has decreased sharply.
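The link between marker heterozygosity and the proportion of fully informative families can be made concrete with a small calculation. The sketch below assumes Hardy-Weinberg proportions and random mating, and it reads "fully informative" as both parents being heterozygous with non-identical genotypes; both are assumptions made for this example, as are the function names and the allele frequencies used.

```python
from itertools import combinations

def heterozygosity(freqs):
    """Expected heterozygosity of a marker: 1 minus the sum of squared allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)

def fully_informative_fraction(freqs):
    """Probability that both parents are heterozygous with different genotypes.

    Assumes Hardy-Weinberg proportions and random mating (illustrative
    assumptions, not stated in the text).
    """
    het = heterozygosity(freqs)
    # Probability that both parents carry the same heterozygous genotype,
    # e.g. AB x AB, where parental origin of alleles cannot be resolved.
    same_genotype = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return het * het - same_genotype

# Hypothetical markers: two equally frequent alleles versus four.
biallelic = fully_informative_fraction([0.5, 0.5])
four_allele = fully_informative_fraction([0.25, 0.25, 0.25, 0.25])
```

Under these assumptions a marker with two equally frequent alleles is never fully informative, whereas a marker with four equally frequent alleles is fully informative in just under half of families, which is why two-point results from markers of differing heterozygosity are not directly comparable.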