DNA BASED METHODS
TECHNIQUES FOR SCREENING THE NUCLEOTIDE CHANGES IN CANDIDATE GENES
Mutation screening of candidate genes in patients is the most sensitive and widely adopted method for confirming linkage data for a disease. PCR-based approaches are the most common: the genomic DNA or cDNA of interest is amplified and then screened by one of several techniques.
Known sequence changes can be detected by allele-specific PCR, the amplification refractory mutation system (ARMS), which relies on the difference in annealing of primers designed specifically for the wild-type or mutant sequence, followed by gel analysis.
Unknown mutations can be detected rapidly from differences in electrophoretic mobility between wild type and mutant: either between the single strands after denaturation, which is governed by their conformation (single-strand conformation polymorphism), or through heteroduplex formation on reannealing (heteroduplex analysis).
Table 2.3: Preparation of stacking and separating gel for PAGE
Reagents                       Separating gel
Distilled water (mL)           3.5
Tris HCl, pH 8.8 (mL)          2.5
30% Acrylamide (mL)            4.0
10% Ammonium persulfate (μL)   50
TEMED (μL)                     5
Denaturing high-performance liquid chromatography (DHPLC), a more advanced, chromatographic column-based method built on a similar principle, is used to analyze samples for known and unknown nucleotide variations. Other rapid techniques for detecting known mutations include restriction enzyme digestion, in which the alteration of a restriction enzyme recognition site by the mutation is exploited to detect the nucleotide change.
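As an illustration of the restriction-digestion approach, the following minimal Python sketch checks whether a point mutation abolishes a recognition site and how the fragment pattern changes as a result. The amplicon sequences and function names are hypothetical; EcoRI (GAATTC, cut after the first G) is used only as a familiar example, and only one strand of a linear fragment is considered.

```python
def site_positions(sequence: str, site: str) -> list[int]:
    """Return the 0-based start position of every occurrence of a recognition site."""
    sequence, site = sequence.upper(), site.upper()
    return [i for i in range(len(sequence) - len(site) + 1)
            if sequence[i:i + len(site)] == site]


def fragment_lengths(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after complete digestion of a linear fragment.

    cut_offset is the number of bases into the site at which the enzyme cuts
    (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = [p + cut_offset for p in site_positions(sequence, site)]
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical 30-bp amplicons: the mutant carries an A>G change inside the site.
wild_type = "ATGCCTAGGAATTCTTAGCCGATAGCTAAC"
mutant = "ATGCCTAGGAGTTCTTAGCCGATAGCTAAC"

print(fragment_lengths(wild_type, "GAATTC", 1))  # two fragments: site present
print(fragment_lengths(mutant, "GAATTC", 1))     # one uncut fragment: site lost
```

On a gel, the wild-type amplicon would therefore appear as two bands and the mutant as a single uncut band; a heterozygote would show all three.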
Direct sequencing of the genomic DNA or cDNA is the gold standard for screening unknown mutations, despite being an expensive technique.
Sanger Dideoxy Sequencing
DNA polymerases copy single-stranded DNA templates by adding nucleotides to a growing chain. Chain elongation occurs at the 3’ end of the primer, an oligonucleotide that anneals to the template. The extension product grows by the formation of a phosphodiester bond between the 3’ hydroxyl group at the growing end of the primer and the 5’ phosphate group of the incoming deoxynucleotide. The growth is in the 5’→3’ direction.
DNA polymerases can also incorporate analogues of nucleotide bases. The dideoxy method of DNA sequencing developed by Sanger et al (1981) takes advantage of this ability by using 2’,3’-dideoxynucleotides as substrates.
When a dideoxynucleotide is incorporated at the 3’ end of the growing chain, chain elongation is terminated selectively at A, T, G or C because the chain then lacks a 3’ hydroxyl group.
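The logic of chain termination can be sketched in a few lines of Python. This is a deliberately simplified model, not a description of any instrument's software: it assumes one terminated fragment for every position of the new strand, writes the template 3’→5’ so that its complement reads 5’→3’, and ignores the primer length. The function names and the example template are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_fragments(template_3_to_5: str) -> list[tuple[int, str]]:
    """Return every possible extension product as (length, terminating base).

    The template is written 3'->5', so its base-by-base complement is the
    newly synthesized strand read 5'->3'.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3_to_5.upper())
    return [(n, new_strand[n - 1]) for n in range(1, len(new_strand) + 1)]


def read_sequence(fragments: list[tuple[int, str]]) -> str:
    """Order fragments by length, as electrophoresis does, and read the terminal bases."""
    return "".join(base for _, base in sorted(fragments))


fragments = terminated_fragments("TACGGT")   # hypothetical template, 3'->5'
fragments.reverse()                          # fragments are produced in no particular order
print(read_sequence(fragments))              # sequence of the new strand, 5'->3'
```

Sorting by length plays the role of the size separation; the terminal base of each fragment is what the dye label reports.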
In automated fluorescent sequencing, fluorescent dye labels are incorporated into DNA extension products using 5’-dye-labeled primers or 3’-dye-labeled dideoxynucleotide triphosphates (dye terminators; Table 2.4). The most appropriate labelling method depends on the sequencing objectives, the performance characteristics of each method, and personal preference. The PE Applied Biosystems DNA sequencer detects fluorescence from four different dyes that are used to identify the A, G, T and C extension reactions. Each dye emits light at a different wavelength when excited by an argon ion laser. All four colours, and therefore all four bases, can be detected and distinguished in a single gel lane or capillary injection.
Table 2.4: The four nucleotide bases with the respective acceptor dyes and color emission
Terminator   Acceptor dye   Color of raw data on ABI PRISM 310 electrophoretogram
A            dR6G           Green
C            dROX           Red
G            dR110          Blue
T            dTAMRA         Black
The ABI PRISM 310 Genetic Analyzer (Figures 2.5A and B), an automated instrument, was used for analyzing fluorescently labeled DNA fragments by capillary electrophoresis. The sequencing reaction sample tubes are placed in an autosampler tray that holds 48 samples. The autosampler successively brings each sample into contact with the cathode electrode and one end of a glass capillary filled with a separation polymer. An anode electrode at the other end of the capillary is immersed in buffer.
The sample enters the capillary as current flows from the cathode to the anode. The short period of electrophoresis conducted while the capillary and the cathode are immersed in the sample is called electrokinetic injection.
The end of the capillary near the cathode is then placed in buffer. Current is applied again to continue electrophoresis.
When the DNA fragments reach the detector window in the capillary, a laser excites the fluorescent dye labels. Emitted fluorescence from the dyes is collected once per second by a cooled, charge-coupled device (CCD) camera at particular wavelength bands (virtual filters) and stored as digital signals on a Power Macintosh computer for processing. The Sequencing Analysis software interprets the results, calling the bases from the fluorescence intensity at each data point. The protocol and reaction conditions for the cycle sequencing reaction are given in Tables 2.5 and 2.6.
Purification of Extension Products
The extension products were purified to remove the unincorporated dye terminators before subjecting the samples to capillary electrophoresis.
Excess dye terminators in sequencing reactions obscure data in the early part of the sequence and can interfere with base calling.
Procedure
2 μL of 125 mM EDTA and 2 μL of 3 M sodium acetate (pH 4.6) were added to the cycle-sequenced products, followed by 50 μL of absolute ethanol. The mixture was incubated at room temperature for 15 minutes and then centrifuged at 8000 rpm for 20 minutes to precipitate the amplified product and remove the unutilized ddNTPs, primers (short molecules), etc. The pellet was washed twice with 75% ethanol and air dried. The purified samples were suspended in formamide and subjected to capillary electrophoresis in the ABI PRISM 310/3100/Z genetic analyzer. The sequences were then analyzed in Sequence Navigator software (version 1.0.1; ABI Prism 310) or SeqScape Manager (version 2.1; ABI Prism 3100 Avant) (Figure 2.6).
Figs 2.5A and B: (A) Region of laser emission that detects the fluorescent signals, (B) processed sequence of a gene segment
Table 2.5: Protocol for cycle sequencing reaction
Components               Volume (μL)
Amplified products       2.0
Sequence buffer          2.0
Primer (2 pmoles/μL)     2.0
RRMIX                    1.0
Water                    3.0
The amounts of buffer and RRMIX are varied according to the product size.
Table 2.6: PCR conditions for cycle sequencing for 25 cycles
PCR step               Temperature (°C)   Time
Initial denaturation   96                 1 min
Denaturation           96                 10 sec
Annealing              50                 5 sec
Extension              60                 4 min
Fig. 2.6: Electrophoretogram of OPTN sequence with the IVS7+24G>A polymorphism, run in the ABI Prism 3100 Avant Genetic Analyzer. (A) WT (G/G) (FP), (B) Heterozygous (G/A) (FP), (C) Homozygous (A/A) (FP), (D) Homozygous (T/T) (RP)
Genetic Linkage
Linkage analysis is a technique aimed at finding the approximate chromosomal location of a disease gene of any type. Two loci that are on the same chromosome are said to be syntenic. Alleles at two loci that lie close together on the same parental chromosome, whether maternal or paternal in origin, tend to pass to the same gamete (sperm or egg) and hence are transmitted together to an offspring, resulting in cosegregation at the two loci. However, when the homologous chromosomes pair up during meiosis, the cell division that forms the gametes, portions of the paternal and maternal chromosomes (in each parent) are exchanged by a process known as crossing over.
Then the alleles received by the offspring at the two loci from one parent are no longer identical to those on either of that parent’s chromosomes; they have recombined. The closer together the loci are, the lower the probability of recombination and thus the higher the probability of cosegregation, a phenomenon called genetic linkage.
The fraction of gametes in which recombination occurs between two loci is the recombination fraction, usually denoted θ. If the two loci are far apart, segregation at one locus is independent of that at the other, and θ = 1/2; all four types of gametes are produced in equal frequencies. When the loci are linked, 0 < θ < 1/2, and during gametogenesis the parental-type (maternal and paternal) gametes are produced more frequently than the recombinant gametes.
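The relation between θ and the four gamete types can be made concrete with a short Python sketch. It assumes a doubly heterozygous parent, so that each of the two parental-type gametes has frequency (1 - θ)/2 and each of the two recombinant gametes has frequency θ/2; the function and key names are illustrative only.

```python
def gamete_frequencies(theta: float) -> dict[str, float]:
    """Expected gamete frequencies from a doubly heterozygous parent.

    theta is the recombination fraction between the two loci (0 to 0.5).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie between 0 and 0.5")
    parental = (1.0 - theta) / 2.0   # each of the two parental-type gametes
    recombinant = theta / 2.0        # each of the two recombinant gametes
    return {
        "parental_1": parental,
        "parental_2": parental,
        "recombinant_1": recombinant,
        "recombinant_2": recombinant,
    }


print(gamete_frequencies(0.5))   # unlinked loci: four equal frequencies
print(gamete_frequencies(0.1))   # linked loci: parental types predominate
```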
In linkage analysis, two loci are tested to see whether they are linked: a genetic marker locus, whose genotype is measured and known with a high degree of certainty, and a disease locus, whose genotype is unknown and is inferred only through the disease or trait phenotype. When marker loci at known locations are genotyped, each marker can be tested for linkage to a disease or trait, and the disease or trait can be localized approximately to the chromosomal region harboring the linked marker. The further apart two loci are, the higher the probability of recombination between them, and this is the measure of genetic distance. Genetic distance is correlated with physical distance, but no simple function relates them because recombination frequencies vary across the chromosomes. The unit of genetic distance is the Morgan, defined as the distance over which one crossover is expected to occur on average per meiosis. A Morgan is divided into centiMorgans (cM), where 1 Morgan = 100 cM. As recombination is the measure of genetic distance, identifying recombinants and estimating recombination fractions is the only obvious way to construct a genetic map. This requires markers that identify double heterozygotes, in which recombinants can be recognized. In principle any Mendelian trait can be a marker, but disease-disease or trait-trait mapping is not possible in humans.
Researchers have used different markers for the identification of disease genes, starting from blood groups and serum proteins, which are not DNA markers. Initially gene-mapping studies used restriction fragment length polymorphisms (RFLP), the first-generation DNA markers, but the advent of PCR and the enormous amount of information provided by the human genome project allowed microsatellite regions to be used extensively as DNA markers. Microsatellite DNA sequences consist of short repeats of one to five base pairs; together with satellites and minisatellites they belong to a large family known as tandemly repetitive sequences.
Microsatellites, or STRs (short tandem repeats), comprise simple mononucleotide to pentanucleotide repeats, varying in length from a few tens of bases up to typically one hundred.1 They are spread almost evenly across the entire human genome. Microsatellites are thought to arise by a process that has been variously referred to as “DNA slippage”, “polymerase slippage”, or “slipped strand mispairing”. In essence, this slippage is thought to occur within the complex of proteins that mediates DNA replication, as a consequence of mispairing (by one repeat unit or occasionally more) between the original template strand and the newly synthesized DNA strand. Microsatellites are the most widely used genetic markers for studies of human diseases. They are abundant, highly polymorphic and, more importantly, spread across the entire euchromatic part of the genome. As a result they have largely replaced minisatellites and are the markers of choice for many genetic applications. Because they are used in so many different applications, a universally accepted system for naming and cataloguing them has evolved. Human microsatellite markers are named in a standard format, for example D20S324, where 20 is the chromosome on which the marker is located and 324 is a unique identifier.2 Important information pertaining to each marker, such as cytogenetic location, heterozygosity, allele frequencies, assay conditions, primer sequences and product size, can be obtained from online public databases.
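The naming format can be checked mechanically. The short Python sketch below splits a marker name such as D20S324 into its chromosome and identifier. The regular expression is an assumption made for illustration (a leading "D", a chromosome given as a one- or two-digit number or X/Y, an "S", then a numeric identifier); it is not an official validator of the nomenclature.

```python
import re

# Assumed pattern: D<chromosome>S<identifier>, e.g. D20S324.
MARKER_PATTERN = re.compile(r"^D(\d{1,2}|X|Y)S(\d+)$")


def parse_marker(name: str) -> tuple[str, int]:
    """Split a marker name into (chromosome, unique identifier)."""
    match = MARKER_PATTERN.match(name.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised marker name: {name!r}")
    return match.group(1), int(match.group(2))


print(parse_marker("D20S324"))   # chromosome 20, identifier 324
```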
Some of the major applications of microsatellite genotyping are linkage analysis, mainly used for gene mapping; triplet repeat expansion studies for anticipatory diagnostic tests; loss of heterozygosity, a technique used to identify the possible locations of tumor suppressor genes; and gene mapping diagnostics.
The basic concept of genetic linkage analysis is relatively simple, relying fundamentally on two assumptions:
1. At a given locus, each parental allele has an equal probability of being transmitted to a child as a result of random meiotic recombination.
2. The smaller the genetic distance between two loci, the higher the chance that they will be co-inherited.
Particular microsatellite marker alleles that are inherited preferentially by the family members with the disease are therefore said to be linked to, and hence co-localize with, the disease gene. In contrast, the alleles of unlinked markers located far away from the disease gene locus should be inherited equally by affected and unaffected family members. One of the most important steps in gene mapping by the positional cloning approach is therefore the genotyping of marker alleles (the description here is restricted to microsatellites). A decade ago microsatellite genotyping was done using traditional techniques only: DNA fragments separated in polyacrylamide gels were visualized by techniques such as autoradiography or silver staining. Such gels had several drawbacks and limitations, which have now been resolved by the use of fluorescence technology and robust fragment-calling software. In this technology the reference size standards and the PCR primers used to amplify the microsatellite sequences are covalently labeled with different fluorophores. These fluorophores emit fluorescence after absorbing the light energy supplied by a laser; the fluorescent signal is captured by a CCD camera, interpreted by a computer and displayed as different colors in real time.
After automated fragment sizing, allele calling is the task of relabelling fragments of the same size as the same allele. This is called the binning process. The software sizing algorithm generates non-integer values, which have to be rounded to integer values uniformly.3 The technology described above is now standard in many laboratories, but it is predicted that in due course, at least in certain applications, it may change, and microsatellites may themselves become outdated as the markers of choice.
There is debate about whether single nucleotide polymorphisms may possess certain advantages over microsatellites for linkage analysis and could ultimately replace them when haplotyped.
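The binning step mentioned above, in which non-integer fragment sizes are relabelled as integer alleles, can be sketched as follows. This is a minimal illustration rather than the algorithm of any particular genotyping software: the bin centers, the 0.5 bp tolerance and the function name are all assumptions, and real software may apply more elaborate rules.

```python
from typing import Optional


def bin_alleles(sizes: list[float], bin_centers: list[int],
                tolerance: float = 0.5) -> list[Optional[int]]:
    """Assign each measured fragment size to the nearest allele bin.

    A size further than `tolerance` base pairs from every bin center is left
    uncalled (None) so that it can be reviewed by hand.
    """
    calls: list[Optional[int]] = []
    for size in sizes:
        nearest = min(bin_centers, key=lambda center: abs(center - size))
        calls.append(nearest if abs(nearest - size) <= tolerance else None)
    return calls


# Hypothetical bins for one marker and measured sizes from three runs.
bins = [142, 144, 160]
print(bin_alleles([142.15, 142.30, 159.77], bins))  # same allele called consistently
print(bin_alleles([143.0], bins))                   # falls between bins: left uncalled
```

Using fixed bins rather than rounding each run independently keeps the same fragment from being labelled as two different alleles in different samples.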
Significance of LOD Scores
In 1955, Morton developed a statistical method to calculate the overall likelihood of the pedigree on the alternative assumptions that the loci are linked (recombination fraction θ < 0.5) or not linked (recombination fraction θ = 0.5). The ratio of these two likelihoods gives the odds of linkage, and the logarithm of the odds is the LOD score. Morton demonstrated that LOD scores represent the most efficient statistic for evaluating linkage in pedigrees and derived formulae to give the LOD score. LOD scores thus provide a measure of the likelihood that any preferential inheritance seen for a given marker is the result of genuine linkage as opposed to simple chance.
In linkage analysis, the LOD score (Z) is used as a measure of the strength of linkage between loci. The LOD score is the log10 of the odds for linkage and is calculated for a given value of the recombination fraction (θ) between the loci. It compares the likelihood of the observed data if the loci are linked and separated by a recombination fraction θ with the likelihood of the data if the loci are unlinked (θ=0.5).
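LOD = $Z(\theta) = \log_{10}\left[\dfrac{L(\theta)}{L(0.5)}\right]$, the log10 of the likelihood of the data if the loci are linked at θ divided by the likelihood of the data if the loci are unlinked (θ=0.5). The likelihood is $L(\theta) = \theta^{R}(1-\theta)^{\mathrm{NR}}$, where R = number of recombinants and NR = number of non-recombinants. So

$$Z(\theta) = \log_{10}\left[\frac{\theta^{R}(1-\theta)^{\mathrm{NR}}}{0.5^{R}\,0.5^{\mathrm{NR}}}\right]$$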
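The formula above translates directly into code. The following Python sketch evaluates Z for given counts of recombinants (R) and non-recombinants (NR) and scans a grid of θ values for the maximum. It assumes that recombinants can be counted directly, as the formula itself does; the function name, the example counts and the grid step are illustrative choices.

```python
import math


def lod_score(recombinants: int, non_recombinants: int, theta: float) -> float:
    """Two-point LOD score Z(theta) = log10[L(theta) / L(0.5)]."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    r, nr = recombinants, non_recombinants
    if theta == 0.0:
        # A single recombinant is impossible at theta = 0, so L(0) = 0.
        return float("-inf") if r > 0 else nr * math.log10(2.0)
    return (r * math.log10(theta)
            + nr * math.log10(1.0 - theta)
            - (r + nr) * math.log10(0.5))


# Hypothetical family data: 1 recombinant among 10 informative meioses.
z = lod_score(1, 9, 0.1)
print(round(z, 3))

# Scan theta = 0.00, 0.01, ..., 0.50 and keep the value with the highest LOD.
best_z, best_theta = max((lod_score(1, 9, t / 100), t / 100) for t in range(51))
print(best_theta, round(best_z, 3))
```

For these counts the maximum falls at θ = R/(R + NR), the observed proportion of recombinants, which is the expected behaviour of this likelihood.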
Standard LOD score analysis is usually unsuitable for complex/non-Mendelian diseases, as they do not have a specific inheritance pattern.
Standard LOD score analysis, also called parametric or model-based analysis, requires a precise genetic model detailing the mode of inheritance, gene frequencies and penetrance of each genotype. As long as a suitable model is available, parametric linkage analysis provides a powerful method for scanning the genome. For Mendelian diseases, specifying an adequate model is usually not a problem; for non-Mendelian conditions and complex diseases, however, an adequate model is much harder to specify. Many programs, such as LIPED, LINKAGE, MLINK (LCP), MERLIN, GENEHUNTER and FASTLINK, are available for calculating the LOD score.
Genotyping of Microsatellite Markers in ABI Prism 310/3100 AVANT Genetic Analyzer
The primers were designed to amplify the polymorphic microsatellites and were labelled with the dyes NED (yellow), FAM (blue) or VIC (green) (Figure 2.7). The amplified products are separated according to their length and are differentiated by both fluorescent label and product size.
The reaction protocol: 16 ng/μL of DNA (a 1 in 3 dilution of the 50 ng/μL DNA), 40 nM dNTP, 15 mM MgCl2, 10x assay buffer and the respective primers were used to set up the GeneScan PCR in a total reaction volume of 5.0 μL.
The GeneScan PCR protocol and the reaction conditions are given in Tables 2.7 and 2.8.
The amplified GeneScan products are then pooled with the molecular sizing marker, namely the LIZ500 (orange dye) or ROX400 (red dye) size standard. The initial PCR products were diluted with water (made up to 20 μL). These products are further diluted with 12.0 μL of injection mix (prepared from 12.0 μL of Hi-Di formamide and 0.5 μL of size standard), denatured at 100°C and then subjected to capillary electrophoresis. After electrophoresis the sizes of the products were analyzed with GeneScan (version 3.1; ABI Prism 310) or GeneMapper (version 3.5; ABI Prism 3100) software (Figure 2.7) and then binned for the linkage analysis.
The LINKAGE programs require two input files namely “pedfile” that contains the pedigree data, and a “datafile” which describes information on the loci, locus order, affection status, etc.
Table 2.7: PCR protocol for amplifying microsatellite marker for Genescan analysis
Reagents                          Volume (μL)
dNTPs                             0.5
Tris buffer with 15 mM MgCl2      0.5
25 mM MgCl2                       0.2
Primer                            0.2
Taq (5 U/μL)                      0.1
DNA                               1.0
Water                             2.5
Table 2.8: Genescan PCR conditions
Temperature   Time     Cycles
95°C          12 min
94°C          15 sec   10
55°C          15 sec
72°C          30 sec
89°C          15 sec   20
55°C          15 sec
72°C          30 sec
72°C          10 min
Generation of pre file: The first step in the linkage analysis is the creation of a pedigree file using a text editor (word processor) capable of producing ASCII files. One row of information is entered for each individual, with the following information in successive columns:
• Pedigree name (or number)
• ID name (or number) of given individual
• ID name (or number) of that individual’s father (0 if father is not in pedigree)
• ID name (or number) of that individual’s mother (0 if mother is not in pedigree; either both or no parents must be given)
• Sex of individual: 1 = male, 2 = female
• Phenotype at locus 1
• Phenotype at locus 2, etc.
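The column layout listed above can be checked with a small Python sketch that reads one row of such a file. It assumes that the columns are separated by whitespace; the class and function names and the example row are hypothetical, and this is not part of the LINKAGE package.

```python
from dataclasses import dataclass


@dataclass
class PedigreeRow:
    pedigree: str
    individual: str
    father: str            # "0" if the father is not in the pedigree
    mother: str            # "0" if the mother is not in the pedigree
    sex: int               # 1 = male, 2 = female
    phenotypes: list[str]  # phenotype columns for locus 1, locus 2, ...


def parse_pre_line(line: str) -> PedigreeRow:
    """Parse one whitespace-separated row of a pre file."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("expected at least six columns")
    pedigree, individual, father, mother, sex, *phenotypes = fields
    if (father == "0") != (mother == "0"):
        raise ValueError("either both parents or no parents must be given")
    if sex not in ("1", "2"):
        raise ValueError("sex must be 1 (male) or 2 (female)")
    return PedigreeRow(pedigree, individual, father, mother, int(sex), phenotypes)


# Hypothetical row: pedigree 14, individual 3, parents 1 and 2, female.
row = parse_pre_line("14 3 1 2 2 2 2 3")
print(row)
```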
Fig. 2.7: Electrophoretogram of PCR-amplified microsatellite markers with VIC (green) fluorescent-tagged primers, run in the DNA sequencer and analyzed by GeneScan software; homozygous genotype for D13S795 with fragment sizes (101.41/101.41 bp) and heterozygous genotypes with fragment sizes (142.15/159.77 bp) and (204.82/206.73 bp) for D11S902 and D13S1280, respectively
Dye/Sample Peak   Minutes   Size   Peak Height   Peak Area   Data Point
B, 1 13.38 89.41 26 80 3649
B, 2 20.50 307.91 49 317 5590
B, 3 20.56 362 2749 5606
G, 1 13.67 97.69 70 322 3728
G, 2 13.74 99.69 202 1130 3747
G, 3 13.78 100.66 55 325 3757
G, 4 13.81 101.41 458 2626 3765
G, 5 13.88 103.22 284 1556 3784
G, 6 15.30 142.15 425 2536 4173
G, 7 15.24 142.98 134 881 4155
G, 8 15.63 155.62 44 258 4261
G, 9 15.69 157.70 125 714 4279
G, 10 15.76 159.77 226 1417 4297
G, 11 17.26 204.82 143 940 4705
G, 12 17.32 206.73 538 3832 4722
G, 13 17.50 212.48 56 325 4773
G, 14 17.54 213.50 38 104 4782
G, 15 17.56 214.29 144 915 4789
The file thus generated has the extension *.pre, like the example given in Table 2.9.
Table 2.9: Pre file (*.pre) generated in the text document
14 1 0 0 1 2 1 2
14 2 0 0 2 1 0 0
14 3 1 2 2 2 2 3
14 4 1 2 2 2 2 3
14 5 1 2 2 2 2 3
14 6 1 2 2 1 1 2
14 7 1 2 1 2 2 3
14 8 1 2 1 2 2 2
14 9 1 2 1 2 2 3
14 10 1 2 1 2 2 2
Conversion of pre file to pedigree file with MAKEPED program
The information in the *.pre file was then converted with the MAKEPED command into a *.ped file compatible with the MLINK program, which is used to calculate the two-point LOD scores for each marker. This step also takes account of consanguinity, if present in the pedigree (Table 2.10).
Table 2.10: Pedigree file (*.ped) generated through the “MAKEPED” command
14 1 0 0 3 0 0 1 1 2 1 2     Ped: 14 Per: 1
14 2 0 0 3 0 0 2 0 1 0 0     Ped: 14 Per: 2
14 3 1 2 0 4 4 2 0 2 2 3     Ped: 14 Per: 3
14 4 1 2 0 5 5 2 0 2 2 3     Ped: 14 Per: 4
14 5 1 2 0 6 6 2 0 2 2 3     Ped: 14 Per: 5
14 6 1 2 0 7 7 2 0 1 1 2     Ped: 14 Per: 6
14 7 1 2 0 8 8 1 0 2 2 3     Ped: 14 Per: 7
14 8 1 2 0 9 9 1 0 2 2 2     Ped: 14 Per: 8
14 9 1 2 0 10 10 1 0 2 2 3   Ped: 14 Per: 9
14 10 1 2 0 0 0 1 0 2 2 2    Ped: 14 Per: 10
Table 2.11: Data file generated through the “PREPLINK” command
1 2
1 2 << AFFECTION, No. of ALLELES
0.999999 0.000001 << GENE FREQUENCIES
1 << No. of LIABILITY CLASSES
0 1.0000 1.0000 << PENETRANCES
3 3 << ALLELE NUMBERS, No. of ALLELES
0.333333 0.333333 0.333333 << GENE FREQUENCIES
0 0 << SEX DIFFERENCE, INTERFERENCE (IF 1 OR 2)
0.1000 << RECOMBINATION VALUES
1 0.10000 0.45000 << REC VARIED, INCREMENT, FINISHING VALUE
DATA file and PREPLINK program:
The datafile (entered as a *.dat file) is generated with the PREPLINK program to give the information on the loci for each individual. This command provides the total number of alleles, the gene frequency depicting the type of inheritance, the allele frequencies and the recombination fractions needed for MLINK analysis. An example of a pedigree file and a data file is given in Tables 2.10 and 2.11. The other details that are entered in the PREPLINK program vary according to the pedigree details and are given