Omics and Arrays
3.2 Structural Genomics
3.2.2 Physical mapping
Physical mapping entails constructing a physical map, which consists of continuous overlapping fragments of cloned DNA that have the same linear order as found in the chromosomes from which they are derived.
A series of overlapping clones or sequences that collectively span a particular chromosomal region and form a contiguous segment is called a contig. Recommended references for physical mapping include Zhang and Wing (1997), Brown (2002), Meyers et al. (2004) and Lolle et al. (2005).
DNA libraries
Large-insert DNA libraries are one of the key components in genome research. They are especially useful for genome studies in large and complex genomes. These libraries can be used in a variety of research projects such as physical mapping of chromosomes, map-based cloning of important genes, genome organization and evolution, comparative genomics and molecular breeding programmes.
A gene or DNA library is a collection of all the genes for an organism so that there is a high probability of finding any particular segment of the source DNA in the collection. To contain a colony of bacteria for every gene, a library will consist of tens of thousands of colonies or clones. The collection is represented in the form of recombinants between DNA fragments from the organism and the vector. The library has to be ordered so that each clone has been placed in a precise physical location relative to others (such as in wells of microtitre plates).
Various highly efficient cloning vectors have been used to construct DNA libraries. The most frequently used vectors are λ phages, cosmids, P1 phages and artificial chromosomes. There are various types of artificial chromosomes including yeast artificial chromosome (YAC), bacterial artificial chromosome (BAC), binary BAC (BIBAC), P1-derived artificial chromosome (PAC), transformation-competent artificial chromosome (TAC), mammalian artificial chromosome (MAC), human artificial chromosome (HAC) and plant artificial chromosome.
When the DNA is simply ligated to the vector and packaged in the phage particles, the library is said to be unamplified. In an amplified library, the original DNA has been subsequently increased by replication in bacteria.
Which DNA is cloned in libraries depends on the purpose of the research.
Genomic libraries are constructed from the total nuclear DNA of an organism. In making these libraries, the DNA must be cut into clonable-size pieces as randomly as possible. Shearing or partial digestion with a frequently cutting restriction endonuclease is often used. Chromosome-specific libraries are made from the DNA of purified isolated chromosomes. A cDNA library contains a collection of cDNA clones reverse-transcribed from mRNAs collected from a specific tissue or organ at a specific growth or developmental stage under a specific environment. Therefore, a cDNA library only contains the genes that are expressed in the specific conditions. Furthermore, cDNAs do not contain introns or promoters.
Functionally, gene libraries can be classified into cloning and expression libraries. Cloning libraries are constructed by cloning vectors which contain replicons, multiple cloning sites and selection markers. Clones can be multiplied by bacterial culture. Expression libraries are constructed by expression vectors which contain specific sequences that control gene expression such as promoters, Shine-Dalgarno sequences, ATG and stop codons, etc. in addition to those contained in cloning vectors. The coding products of clones can be expressed in host cells.
cDNA libraries are often expression libraries in which clone construction is such that part or all of the encoded protein is expressed in bacteria harbouring the cloned DNA. Such expression is needed in screening libraries using antibodies or enzyme activities.
In order to be confident that virtually all regions of the genome are represented at least once in a library, considerable redundancy of cloned DNA must be included in the library. The number of DNA clones (n) needed for a certain probability (P) of finding a target clone is calculated by the formula $n = \ln(1 - P) / \ln(1 - L/m)$, where L is the average insert size of the clones in kb and m is the haploid genome size in kb. As a rule of thumb, a library containing DNA inserts which collectively add up to three times the amount of DNA in a single gamete of the organism will provide about 95% confidence that any DNA element in the genome is represented at least once in the library. A library that has 'five genome-equivalent' coverage (rather than three) will provide about 99% confidence of including the target element. For example, the number of BACs of an average size of 150 kb required for 5 × coverage of Arabidopsis (m = 125,000 kb) is 3835. When DNA fragments are randomly distributed, the probability of obtaining any DNA sequence from this library is no lower than 0.99.
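The library-size calculation can be checked numerically. The following is a minimal sketch, assuming the standard relation n = ln(1 − P) / ln(1 − L/m), where L (a symbol assumed here) is the average insert size in kb and m is the haploid genome size in kb. The function names are illustrative only. The second function uses the approximation P ≈ 1 − e^(−c) for a library of c genome equivalents, which holds when the insert is small relative to the genome.

```python
import math


def clones_needed(p, insert_kb, genome_kb):
    """Clones required so that a given sequence is present with probability p."""
    return math.log(1.0 - p) / math.log(1.0 - insert_kb / genome_kb)


def coverage_probability(coverage):
    """Approximate probability of finding a sequence at c genome equivalents."""
    return 1.0 - math.exp(-coverage)


# Arabidopsis example from the text: 150-kb BACs, m = 125,000 kb, P = 0.99
n = clones_needed(0.99, 150, 125_000)
print(round(n))                            # 3835
print(round(coverage_probability(3), 3))   # 0.95
print(round(coverage_probability(5), 3))   # 0.993
```

The two rule-of-thumb figures (about 95% at three genome equivalents and about 99% at five) follow directly from the second function.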
Construction of large insert genomic libraries
Construction of large-insert genomic libraries includes three steps: (i) development of the cloning vector; (ii) isolation of high molecular weight DNA; and (iii) preparation of insert DNA.
DEVELOPMENT OF LARGE-INSERT CLONING VECTORS. Developing a vector which can accommodate a large DNA fragment has been a difficult task. Ten kb is the maximum insert size of most plasmid vectors. As the insert size increases, the ligation and transformation efficiency decreases significantly.
The first such vector was the bacteriophage λ vector, in which the size of the largest DNA insert is about 25 kb. This is because the fixed capacity of the phage head prevents genomes that are too long being packaged into progeny particles. Cosmids are one type of hybrid vector that replicate like a plasmid but can be packaged in vitro into λ phage coats. The vector can accommodate DNA inserts as large as 45 kb.

Table 3.1. Characteristics of artificial chromosome vectors.
Vector     Host                        Maximum size (kb)  Stability  Chimerism  DNA preparation  Plant transformation
YAC        Yeast                       ∼1000              −          +          Difficult        No
P1         E. coli                     ∼100               +          −          Easy             No
BAC        E. coli                     ∼300               +          −          Easy             No
PAC        E. coli                     ∼300               +          −          Easy             No
BIBAC/TAC  E. coli and A. tumefaciens  ∼300               +          −          Easy             Yes

The YAC vector was then developed, in which an insert of up to 1000 kb can be maintained. The YAC cloning system includes Tel (yeast telomeres), ARS1 (autonomously replicating sequence), CEN4 (centromere from yeast chromosome 4), URA3 (uracil) and TRP1 (tryptophan), which are yeast selection marker genes, Amp (ampicillin-resistance gene) and Ori (origin of replication of pBR322). Although YAC clones played a major role in several genome projects and in map-based cloning of many genes in the early 1990s, the following four problems have prevented their further use in genome studies: (i) high percentage of chimaeric clones; (ii) difficulty in DNA preparation and storage; (iii) low transformation efficiency; and (iv) instability of some inserts in yeast. In the rice cultivar Nipponbare, for example, 40% of the clones in the YAC library were chimaeric, thus limiting its use for genome sequencing or map-based cloning.
The BAC cloning system is based on the E. coli single copy F factor (Shizuya et al., 1992). The cloned DNA is easy to manipulate, screen and maintain. The system is non-chimaeric and has high transformation efficiency.
To facilitate gene identification in plant species, second-generation BAC vectors such as BIBAC were constructed (Hamilton et al., 1996). A 150-kb human DNA fragment in the BIBAC vector was transferred into the tobacco genome by Agrobacterium-mediated transformation. A similar vector, called TAC, was developed and used to complement a mutant phenotype in Arabidopsis (Liu et al., 1999). Table 3.1 provides characteristics of several artificial chromosome vectors.
ISOLATION OF HIGH MOLECULAR WEIGHT DNA. Preparing quality high molecular weight (HMW) DNA (most of the DNA > 1 Mb) suitable for large-insert library construction can be one of the most difficult steps in constructing a large-insert plant genomic library. There are four predominant problems involved in isolating plant nuclear DNA: (i) plant cell walls must be physically broken or enzymatically digested without damaging nuclei; (ii) chloroplasts must be separated from nuclei and/or preferentially destroyed, an important process since copies of the chloroplast genome may comprise the majority of the DNA within a plant cell; (iii) volatile secondary compounds such as polyphenols must be prevented from interacting with the nuclear DNA; and (iv) carbohydrate matrices that often form after tissue homogenization must be prevented from trapping nuclei.
Several different isolation methods have been developed. The first method was to isolate protoplasts from leaf tissue and then embed them in low-melting-point agarose in the form of a plug or bead. This method is expensive and time consuming. In addition, chloroplast DNA is not separated. The development of methods to isolate nuclei from leaf tissue has dramatically improved the procedure and the quality of the HMW DNA for library construction.
PREPARATION OF INSERT DNA FOR LIGATION. The average size of DNA fragments produced by complete digestion with restriction enzymes with four- or six-base recognition sequences is too small for large-insert library construction. To obtain relatively HMW restriction fragments (100–300 kb), the popular method is to partially digest the target DNA with a four-base-cut enzyme. Partial DNA digestion not only yields fragments of the desired size but also fragments the genome randomly without exclusion of any sequence.
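A rough calculation shows why complete digestion is unsuitable. The sketch below assumes a random DNA sequence with equal frequencies of the four bases, so that a recognition site of k bases is expected once every 4^k bp. Real genomes deviate from this assumption, and the function names are illustrative only.

```python
def expected_fragment_bp(site_length):
    """Expected fragment size (bp) after complete digestion of random DNA."""
    return 4 ** site_length


def fraction_of_sites_cut(site_length, target_bp):
    """Approximate fraction of sites that must be cut to average target_bp."""
    return expected_fragment_bp(site_length) / target_bp


print(expected_fragment_bp(4))   # 256 bp for a four-base cutter
print(expected_fragment_bp(6))   # 4096 bp for a six-base cutter

# For an average fragment of 150 kb, a four-base cutter may cut only about
# one site in every 586 (150,000 / 256), hence the need for partial digestion.
print(round(1 / fraction_of_sites_cut(4, 150_000)))   # 586
```

Under these assumptions, both enzyme classes give fragments far below the 100–300 kb target when digestion runs to completion.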
To determine the conditions that yield a maximum percentage of fragments between 100 and 300 kb, a series of partial digestions are carried out by using different amounts of restriction enzyme for a specific digestion period. Once the optimal conditions for producing fragments between 100 and 300 kb are determined, a mass digestion using several plugs is carried out to obtain sufficient DNA for size selection. Partially digested HMW DNA is then subjected to pulsed field gel analysis.
If there is no size selection of partially digested DNA, a random library will have a preponderance of small inserts, since small fragments ligate more efficiently and clones with small inserts transform with higher efficiency. Contour-clamped homogeneous electrical field (CHEF) electrophoresis is the most common method for separating large DNA molecules. It uses a hexagonal array of fixed electrodes to generate a homogeneous electrical field, which enhances DNA resolution. After two size selections using a CHEF Mapper, the HMW restriction fragments must be removed from the surrounding agarose before they can be used in ligation reactions. After developing the large-insert library, a number of random clones can be selected to confirm the successful cloning of the inserts and to estimate the average insert size. The average insert size will then determine how many clones are needed to achieve the desired amount of genome coverage.
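The last step can be illustrated with a short sketch: estimate the average insert size from a sample of random clones, then compute how many clones give a chosen number of genome equivalents (clones = coverage × genome size / average insert size). The insert sizes below are hypothetical values chosen for illustration, and the function name is not from any library.

```python
import math
from statistics import mean


def clones_for_coverage(insert_sizes_kb, genome_kb, coverage):
    """Return (average insert size, clones needed for the given coverage)."""
    average_kb = mean(insert_sizes_kb)
    clones = math.ceil(coverage * genome_kb / average_kb)
    return average_kb, clones


# Hypothetical insert sizes (kb) measured on five randomly picked clones
sampled_inserts = [120, 135, 150, 160, 185]
average_kb, clones = clones_for_coverage(sampled_inserts, 125_000, 5)
print(average_kb, clones)   # average 150 kb, 4167 clones
```

Exactly five genome equivalents therefore needs somewhat more clones than the 3835 quoted earlier for P = 0.99, which corresponds to about 4.6 genome equivalents.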
Physical mapping
There are five physical mapping methods: optical mapping; restriction fragment fingerprinting; chromosome walking; sequence-tagged site (STS) mapping; and fluorescent in situ hybridization (FISH). In restriction fragment fingerprinting, individual clones are first digested with different restriction enzymes. The digested DNA is then labelled with a radioactive or fluorescent dye and run on a sequencing gel. The fingerprint data are collected and analysed for contig assembly.
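The idea behind fingerprint-based contig assembly can be sketched as follows: two clones that share many restriction fragments of the same size probably overlap. This is a simplified illustration, not the algorithm of any particular assembly program. The fragment sizes, the size tolerance and the 0.5 score cutoff are all hypothetical, and real tools use statistical cutoffs rather than a fixed fraction.

```python
from itertools import combinations


def shared_bands(bands_a, bands_b, tolerance=2):
    """Count bands in A that match a not-yet-used band in B within tolerance."""
    remaining = sorted(bands_b)
    shared = 0
    for size in sorted(bands_a):
        for index, other in enumerate(remaining):
            if abs(size - other) <= tolerance:
                shared += 1
                del remaining[index]   # each band may be matched only once
                break
    return shared


def overlap_score(bands_a, bands_b, tolerance=2):
    """Shared bands as a fraction of the smaller fingerprint."""
    return shared_bands(bands_a, bands_b, tolerance) / min(len(bands_a), len(bands_b))


# Hypothetical fingerprints: fragment sizes (bp) for three clones
fingerprints = {
    "clone1": [310, 455, 620, 810, 1200],
    "clone2": [455, 621, 809, 1500, 2100],
    "clone3": [150, 275, 990, 1750],
}

for a, b in combinations(fingerprints, 2):
    score = overlap_score(fingerprints[a], fingerprints[b])
    if score >= 0.5:   # arbitrary cutoff for this sketch
        print(a, b, score)   # clone1 clone2 0.6
```

Pairs passing the cutoff would then be chained into contigs.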
During the procedure, markers with known map position are used as probes to screen the large-insert library. Clones hybridized with the same single copy marker are considered to be overlapping. PCR amplification of DNA pools using primers derived from DNA markers with known position can also be used for physical map construction. The disadvantages of this method are that it is labour intensive and filling the gaps is difficult.
STS mapping uses a sequence-tagged site (STS), which is a short region of DNA about 200–300 bases long whose exact sequence is found nowhere else in the genome. Two or more clones containing the same STS must overlap, and the overlap must include the STS. There are two disadvantages to this method: it is still very labour intensive and the primer synthesis is expensive.
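The overlap rule for STS mapping lends itself to a simple grouping procedure: clones that share an STS are merged into the same contig, and merging is transitive. The following is a minimal sketch using a union-find structure. The clone and STS names are hypothetical, and the sketch only groups clones; it does not order them within a contig.

```python
from collections import defaultdict


def sts_contigs(clone_to_sts):
    """Group clones into contigs: clones sharing any STS are merged."""
    parent = {clone: clone for clone in clone_to_sts}

    def find(clone):
        # Follow parent links to the group representative, compressing the path
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]
            clone = parent[clone]
        return clone

    # Index clones by the STS markers they contain
    sts_to_clones = defaultdict(list)
    for clone, markers in clone_to_sts.items():
        for sts in markers:
            sts_to_clones[sts].append(clone)

    # Clones that carry the same STS must overlap, so merge their groups
    for clones in sts_to_clones.values():
        for other in clones[1:]:
            root_a, root_b = find(clones[0]), find(other)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(set)
    for clone in clone_to_sts:
        groups[find(clone)].add(clone)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical PCR screening results: which STSs amplify from which clone
hits = {
    "BAC1": {"sts1", "sts2"},
    "BAC2": {"sts2", "sts3"},
    "BAC3": {"sts3"},
    "BAC4": {"sts9"},
    "BAC5": {"sts9", "sts10"},
}
print(sts_contigs(hits))   # [['BAC1', 'BAC2', 'BAC3'], ['BAC4', 'BAC5']]
```

BAC1 and BAC3 share no STS directly but fall in one contig because both overlap BAC2.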
FISH uses synthetic polynucleotide strands that bear sequences known to be complementary to specific target sequences at specific chromosomal locations. The polynucleotides are bound via a series of linked molecules to a fluorescent dye that can be detected with a fluorescence microscope.
In addition, physical mapping can be achieved by a combination of fingerprinting, molecular linkage mapping, STS mapping, end sequencing and FISH mapping. A by-product of physical mapping is the integration of genetic, physical and sequence maps as shown in Fig. 3.6.