1.4.1. Linkage analysis methods

When preparing to undertake a linkage study, it is important that an appropriate

population is collected for analysis and that the appropriate type of analysis and

parameters are selected. This section discusses the different choices available and when

they are typically used.

1.4.1.1. Population choice

Depending upon the type of genetic disease being investigated, different types of

populations are collected for linkage analysis. When trying to find a simple Mendelian

disease gene, large extended families with several affected individuals are preferable for

linkage analysis. For a dominant disease, every affected individual within an extended

pedigree will share one chromosomal region that is not present in any of the unaffected

members of the pedigree (taking into account age of incidence and penetrance). For a

recessive disease, every affected individual will share two chromosomes in the disease

gene region and unaffected individuals will only share one or none. Potentially, one

large family is sufficient to locate the region where a Mendelian disease gene is located.

However, due to the lack of informative recombinants within one pedigree and the lack

of affected individuals in some recessive traits, typically more than one family is required.

In theory, almost any population containing a few pedigrees with at least two affected

individuals can be used to find a monogenic disease. However, when the aim of a study

is to locate genes involved in complex, polygenic diseases such as IBD, many pedigrees

containing affected relative pairs (ARPs) are required. Within a pedigree containing

multiple affected individuals several disease susceptibility genes could be segregating.

Different combinations of disease genes could be causing the same disease in different

individuals. Thus, to provide sufficient power to detect linkage to a gene in the

presence of interference from affected individuals with other disease genes, multiple

pairs of affected individuals sharing the same gene are required. In complex disease it

is typically hard to find extended pedigrees containing numerous affected individuals.

Yet, due to the increased first-degree relative risk for most genetic diseases, it is relatively

quick and easy to collect nuclear families containing affected sibling pairs (ASPs). Thus, ASP populations

have become the population of choice for complex polygenic linkage studies.
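
The need for many affected pairs can be illustrated with a simple mean allele-sharing test for ASPs. This is only a minimal sketch, not the method used in this work: it assumes that identity-by-descent (IBD) status is known without error for every pair, that pairs are independent, and that under no linkage siblings share 0, 1 or 2 alleles IBD with probabilities 1/4, 1/2 and 1/4. The function name and the example counts are hypothetical.

```python
import math


def asp_mean_test(n0, n1, n2):
    """Mean IBD-sharing test for affected sibling pairs (illustrative only).

    n0, n1, n2 are the numbers of pairs sharing 0, 1 and 2 alleles IBD.
    Returns the mean proportion of alleles shared and a z-statistic
    against the no-linkage expectation of 0.5.
    """
    n = n0 + n1 + n2
    if n == 0:
        raise ValueError("at least one sibling pair is required")
    # Proportion of alleles shared IBD, averaged over all pairs.
    pi_hat = (n1 + 2 * n2) / (2 * n)
    # Under no linkage the per-pair proportion has mean 0.5 and variance 1/8.
    z = (pi_hat - 0.5) / math.sqrt(0.125 / n)
    return pi_hat, z


# Same excess sharing (0.55) observed in 100 pairs and in 400 pairs.
small = asp_mean_test(20, 50, 30)
large = asp_mean_test(80, 200, 120)
print(small, large)
```

With the same modest excess sharing, quadrupling the number of pairs doubles the z-statistic, which is why a weak effect diluted by pairs carrying other disease genes needs a large ASP collection to be detected.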

Still, some scientists have argued that collection of only ASPs limits the power of the

study and ignores data that may be easy to collect and generate from other types of

ARPs. Collecting and analysing all available affected individuals from every pedigree

increases the power of the study. However, if the appropriate weighting or correction is

not applied, large pedigrees with several ARPs can further mask the information from

smaller single ASP pedigrees and overestimate the effect caused by genes within the

extended pedigrees. Thus, it is more important to collect a linkage population with

pedigrees of uniform number and type of ARPs rather than worrying about whether to

1.4.1.2. Parametric vs. non-parametric analysis

To perform linkage analysis on genotype data, the appropriate set of analysis conditions

needs to be selected. Parametric analysis allows for inputs such as disease mode of

inheritance and penetrance to be set prior to analysis. Once the analysis

program has been informed about the disease, it weights certain results more than it would without

parameters. For example, if the disease were a completely penetrant recessive disorder

then the program would increase the significance of regions where two affected siblings

share two chromosomes. Thus, for single gene disorders with known modes of

inheritance and penetrance values, parametric analysis is better for detecting linkage.

However, for complex, polygenic disease where the number of genes involved and their

associated modes of inheritance and penetrance values are not known, non-parametric

analysis is better for detecting valid regions of linkage. Still, some scientists prefer to

use several types of parametric analysis rather than non-parametric analysis when

analysing polygenic disease. However, if positive linkage is detected after several types

of parametric analysis are conducted, it is unclear whether the result is real

and the chosen parameters appropriate, or whether it is a false positive arising from

multiple testing and a chance match between the selected parameters and the input data.
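
As an illustration of what a parametric analysis computes, the sketch below evaluates a two-point LOD score in the simplest textbook case. It assumes a fully penetrant trait and phase-known meioses, so that each informative meiosis can be scored directly as recombinant or non-recombinant; real programs handle unknown phase, reduced penetrance and missing data. The function names and example counts are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """LOD score at recombination fraction theta (phase-known meioses).

    LOD(theta) = log10[ theta^k * (1 - theta)^(n - k) / 0.5^n ]
    where k is the number of recombinants among n informative meioses.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if meioses <= 0 or not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses and meioses > 0")
    k, n = recombinants, meioses
    if theta == 0.0:
        # Any observed recombinant excludes complete linkage.
        return float("-inf") if k > 0 else n * math.log10(2)
    log_linked = k * math.log10(theta) + (n - k) * math.log10(1 - theta)
    log_unlinked = n * math.log10(0.5)
    return log_linked - log_unlinked


def max_lod(recombinants, meioses):
    """Return (theta_hat, LOD at theta_hat), with theta_hat capped at 0.5."""
    theta_hat = min(recombinants / meioses, 0.5)
    return theta_hat, lod_score(recombinants, meioses, theta_hat)


# One recombinant among ten informative meioses.
print(max_lod(1, 10))
```

Because the score depends on the model that is assumed, repeating such an analysis under several models and reporting the best result inflates the chance of a false positive, which is the multiple-testing concern raised above.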

1.4.1.3. Two-point vs. multipoint analysis

In the past, the only analysis available for scientists conducting linkage studies was two-

point analysis, where each marker genotyped in a genome scan is analysed for linkage

individually. However, today scientists also have the choice of using multipoint

analysis to analyse genotype data. Multipoint analysis uses the data from all markers

and the distances between them to calculate the linkage across the area between the

markers rather than solely at single marker loci. The results of multipoint analysis of

verified with data from adjacent markers. In contrast, the LOD scores from two-point

analysis of neighbouring markers can vary greatly due to unverified inferences of

sharing status. Only when both parents are heterozygous for different alleles is a family

fully informative, so that no assumptions need to be made about sharing status. Varying

heterozygosities of markers lead to different proportions of families fully informative

for each marker. Thus, analysing data from markers with varying heterozygosities will

give varying results, with the results from more heterozygous markers being more

accurate. Furthermore, since different markers are informative in different families, the

differences between two-point results from adjacent markers can be accentuated even further.

Consequently, with the recent availability of multipoint analysis, the use of two-point

analysis has decreased sharply.
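
The link between marker heterozygosity and the proportion of fully informative families can be made concrete with a short calculation. The sketch below is illustrative only and rests on stated assumptions: Hardy-Weinberg proportions, random mating, and an interpretation of "fully informative" as both parents being heterozygous with non-identical genotypes (a mating such as AB x AB is excluded because the parental origin of alleles in an AB child is ambiguous). The function names and allele frequencies are hypothetical.

```python
from itertools import combinations


def heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


def prob_fully_informative(freqs):
    """Probability that both parents are heterozygous with different genotypes."""
    h = heterozygosity(freqs)
    # Probability that both parents carry the same heterozygous genotype.
    same_genotype = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return h * h - same_genotype


# Hypothetical markers with 2, 4 and 8 equally frequent alleles.
for n_alleles in (2, 4, 8):
    freqs = [1.0 / n_alleles] * n_alleles
    print(n_alleles, heterozygosity(freqs), prob_fully_informative(freqs))
```

Under these assumptions a two-allele marker is never fully informative, and the informative proportion rises steeply as heterozygosity increases, which is consistent with two-point results differing between neighbouring markers of unequal heterozygosity.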