1.5. Identifying disease genes
1.5.6 Physical mapping
If there are no candidate genes in the region, or no mutation can be found in the candidate genes identified, the next step is to physically map the region in an attempt to identify new genes within it. Human genomic DNA from the region can be isolated and cut with restriction endonucleases into small fragments, which are then cloned into a vector. A collection of clones covering a genome is known as a library. Clones containing DNA that maps to the disease interval are identified by screening the library with a fragment of DNA known to map to the interval. In most cases, the fragment used is one of the microsatellite markers known to be linked to the disease. Any fragment of DNA for which the sequence is known is called a sequence tagged site (STS). Every time a clone is identified, human DNA sequenced from the clone can be used as the STS for the next round of library screening. A set of overlapping clones is called a contig (Davies & Read, 1992).
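The way repeated rounds of STS screening link clones into a contig can be illustrated with a minimal sketch. It assumes that two clones overlap whenever they test positive for the same STS. The clone names, STS names and the function build_contigs are hypothetical and serve only as an illustration; a real project would also have to deal with false positives and chimaeric clones.

```python
from collections import defaultdict

def build_contigs(clone_sts):
    """Group clones into contigs: two clones are joined if they share an STS."""
    # Map each STS to the set of clones that tested positive for it.
    sts_to_clones = defaultdict(set)
    for clone, markers in clone_sts.items():
        for sts in markers:
            sts_to_clones[sts].add(clone)

    contigs = []
    unvisited = set(clone_sts)
    while unvisited:
        # Start a new contig from a clone not yet placed, then walk outwards.
        start = min(unvisited)
        unvisited.remove(start)
        contig, frontier = {start}, [start]
        while frontier:
            clone = frontier.pop()
            for sts in clone_sts[clone]:
                for neighbour in sts_to_clones[sts]:
                    if neighbour in unvisited:
                        unvisited.remove(neighbour)
                        contig.add(neighbour)
                        frontier.append(neighbour)
        contigs.append(sorted(contig))
    return sorted(contigs)

# Hypothetical screening results: clone name -> STSs detected in that clone.
hits = {
    "cloneA": {"STS1", "STS2"},
    "cloneB": {"STS2", "STS3"},
    "cloneC": {"STS3", "STS4"},
    "cloneD": {"STS9"},
}
contigs = build_contigs(hits)
print(contigs)
```

In this toy data set cloneA, cloneB and cloneC are chained together through shared STSs, whereas cloneD shares no STS with any other clone and remains a contig of one.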
The choice of vector (YAC, PAC, BAC, cosmid) depends on the size of the interval and the size of the fragments of genomic DNA. The yeast artificial chromosome (YAC) has been the vector of choice for contig mapping. It can hold large inserts of up to 2 million base pairs, allowing cloning over a long range (Burke et al., 1987). Restriction mapping with the aid of pulsed-field gel electrophoresis (Schwartz & Cantor, 1984) has made it possible to fingerprint such clones and to estimate the physical distances involved. There has been an effort to construct an integrated physical map of the human genome using YACs. Presently these contigs cover up to 90% of the genome (Chumakov et al., 1995). The main problem with YACs is that some human DNA is unclonable in yeast. YACs also tend to be chimaeric (containing fragments of human DNA from more than one genomic region). This has led to the use of additional vector systems that have smaller insert sizes and are less likely to be chimaeric. These vector systems include P1 artificial chromosomes (PACs; ~100 kb inserts), bacterial artificial chromosomes (BACs; ~120 kb inserts), and cosmids (~35 kb inserts).
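The trade-off between insert size and the number of clones required can be made concrete with a rough calculation. The sketch below uses the insert sizes quoted above; the 1 Mb interval and the fivefold redundancy are hypothetical values chosen purely for illustration, and the function clones_needed is not taken from any published protocol.

```python
import math

# Approximate insert sizes in kb, taken from the text above
# (the YAC figure is the stated upper limit, not a typical insert).
INSERT_KB = {"YAC": 2000, "BAC": 120, "PAC": 100, "cosmid": 35}

def clones_needed(interval_kb, insert_kb, redundancy=1.0):
    """Minimum number of clones to span interval_kb at the given fold redundancy."""
    return math.ceil(redundancy * interval_kb / insert_kb)

interval_kb = 1000  # hypothetical 1 Mb critical interval
for vector, size in INSERT_KB.items():
    # Fivefold redundancy is an assumed working figure, not a fixed rule.
    print(vector, clones_needed(interval_kb, size, redundancy=5))
```

The calculation ignores unclonable regions and chimaerism, so it gives only a lower bound on the amount of screening involved.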
Clones containing DNA of interest can be mapped back to the region of linkage by a variety of techniques. The cloned fragment can be radioactively labelled and hybridised directly to a spread of metaphase chromosomes (in situ hybridisation). Its cytogenetic location can then be determined by examining the spread under a microscope. Fluorescent labelling has largely superseded radiolabelling, and fluorescent in situ hybridisation (FISH) is a powerful technique for physical mapping.
Somatic cell hybrids are another widely used mapping tool. When rodent and human cells in culture are fused in the presence of polyethylene glycol, the resulting heterokaryons are unstable and tend to lose human chromosomes at random. Eventually stable cell lines are produced which contain a full set of rodent chromosomes plus a few human chromosomes. Once the human chromosomes present have been characterised, these stable cell lines can be used to test where a cloned piece of DNA maps (Davies & Read, 1992). Radiation hybrids are even more useful for mapping. They are produced in the same way as somatic cell hybrids, except that the human cells are irradiated before fusion to fragment their DNA. After fusion, the fragmented human DNA integrates into the rodent genome, producing a much finer mapping tool (Wicking & Williams, 1991).
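The logic of assigning a clone to a chromosome with a characterised hybrid panel can be sketched as a concordance test: the clone is assigned to the chromosome whose presence or absence matches the hybridisation or PCR signal in every cell line. The panel composition, the results and the function concordant_chromosomes below are hypothetical, and the sketch assumes error-free typing of each line.

```python
def concordant_chromosomes(panel, results):
    """Return chromosomes whose presence matches the probe signal in every hybrid."""
    chromosomes = set().union(*panel.values())
    matches = []
    for chrom in chromosomes:
        # A chromosome is concordant if it is present in exactly the
        # cell lines that gave a positive signal for the clone.
        if all((chrom in panel[line]) == results[line] for line in panel):
            matches.append(chrom)
    return sorted(matches)

# Hypothetical panel: cell line -> human chromosomes it retains.
panel = {
    "hybrid1": {"1", "7", "X"},
    "hybrid2": {"7", "12"},
    "hybrid3": {"1", "12"},
    "hybrid4": {"X"},
}
# Hypothetical test results: True where the clone gave a signal.
results = {"hybrid1": True, "hybrid2": True, "hybrid3": False, "hybrid4": False}

assigned = concordant_chromosomes(panel, results)
print(assigned)
```

With these made-up data only chromosome 7 is present in both positive lines and absent from both negative lines, so the clone would be assigned there.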
The most difficult part of any positional cloning project is the identification of coding regions in the critical interval. A variety of strategies have been used in the past. One is random sequencing of clones from the contig, followed by analysis of the sequence for features that may belong to genes. Various computer programs have been designed for this purpose (Mallon & Strivens, 1998); the GRAIL program is one such exon prediction program (Uberbacher et al., 1991).
Traditional methods of gene detection include CpG island mapping and exon trapping (Boehm, 1998). CpG island mapping takes advantage of the fact that CpG islands tend to lie in the vicinity of genes (Larsen et al., 1992). CpG islands are short dispersed regions of unmethylated DNA with a high frequency of CpG dinucleotides relative to the whole genome. They are associated with the 5' ends of almost all housekeeping genes and 40% of tissue-specific genes (Bird, 1986). Exon trapping takes advantage of the fact that most mammalian genes are interrupted by introns, which are removed from RNA transcripts by splicing. This method involves the detection of splice sites and is useful for identifying genes that are not marked by CpG islands or whose level of expression is low (Duyk et al., 1990; Buckler et al., 1991).
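The sequence signature of a CpG island can also be searched for computationally. The sketch below scans a sequence with a sliding window and reports windows with a high GC content and a high ratio of observed to expected CpG dinucleotides. The thresholds (200 bp window, GC content above 50%, observed/expected ratio above 0.6) are commonly used working values and are an assumption here, not figures from the studies cited above. The function names and the test sequence are hypothetical, and sequence composition alone says nothing about methylation status.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were distributed independently.
    expected = c * g / n if n else 0.0
    ratio = cpg / expected if expected else 0.0
    return gc_fraction, ratio

def island_windows(seq, window=200, step=1, min_gc=0.5, min_ratio=0.6):
    """Return start positions of windows that meet both (assumed) thresholds."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc, ratio = cpg_stats(seq[start:start + window])
        if gc > min_gc and ratio > min_ratio:
            hits.append(start)
    return hits

# Artificial test sequence: a CpG-rich stretch followed by an AT-rich stretch.
seq = "CG" * 100 + "AT" * 100
starts = island_windows(seq, window=200, step=50)
print(starts)
```

Only the windows lying mostly within the CpG-rich stretch pass both thresholds; windows reaching further into the AT-rich stretch fail on GC content.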
Other methods are based on the identification of cDNAs mapping to the region. Direct screening of cDNA libraries uses the genomic clones identified as radioactive hybridisation probes. This method is useful when the candidate region is large, and was used to isolate cDNAs from the Huntington disease region (Snell et al., 1993). Direct cDNA selection (Lovett et al., 1991) uses genomic DNA to capture cDNAs by hybridisation, and PCR to amplify the captured transcripts. This allows cDNAs from the disease interval to be identified.
None of these techniques used in modern cloning would be possible without the polymerase chain reaction (PCR). Since its discovery (Mullis & Faloona, 1987), PCR has had a profound effect on molecular biology and can be considered the foundation of many modern molecular biology techniques.