Materials & Methods
2.1 Cloning on a computer
2.1.1 An introduction to “in silico” cloning
Until the late 1990s, the conventional approach to identifying the orthologues (the same gene in a different species) and paralogues (related genes in the same species) of a particular gene was hybridisation screening of cDNA libraries, or of Southern blots of bacterial or yeast artificial chromosomes (BAC or YAC respectively), to detect colonies or restriction fragments containing sequence complementary to the gene of interest. Alternatively, “shotgun” polymerase chain reaction (PCR) experiments using degenerate oligonucleotide primers and low annealing temperatures have been used to amplify homologues or splice variants of different genes. Whilst successful in identifying a plethora of genes from many species, these techniques are labour intensive and time consuming, and success depends upon the researcher optimising each experimental variable, particularly when attempting to clone a gene expressed at low levels or with poor homology to the probes or primers being used.
The more modern approach to identifying novel genes applies the relatively new techniques of bioinformatics. Bioinformatics is the science of developing computer databases and algorithms for the purpose of speeding up and enhancing biological research. An integration of mathematical, statistical and computer methods is applied to analyse biological, biochemical and biophysical data. Bioinformaticians who apply their skills in the field of molecular biology can use computer software to query the sequence of a known gene, cDNA (complementary DNA) fragment, protein or polypeptide against the wealth of DNA and protein sequences stored in the databases of the International Nucleotide Sequence Database Collaboration (comprising the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at the National Center for Biotechnology Information (NCBI)), and predict the sequences of entire novel genes or proteins from the fragments of DNA or protein shown to have homology to the query sequence, all without performing a single experiment in a laboratory. The phrase “in silico” cloning has been coined to describe this technique.
Chapter 2: Materials & Methods
Naturally, conventional molecular biology techniques are still required to physically clone cDNAs and genes, but in silico cloning facilitates this process by providing a full or partial sequence of a predicted gene that can be used as a template to design primers that will specifically amplify the intended target by PCR. Alternatively, the BAC, YAC, or cDNA library clone in which a sequence of interest has been determined to reside can be ordered from the relevant source for use in subsequent experiments.
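As a minimal illustration of how a predicted sequence can serve as a primer template, the sketch below takes a forward primer from the 5' end of a sequence and a reverse primer as the reverse complement of its 3' end. The function names and the 20-nucleotide default length are assumptions made for this example only; real primer design would also weigh melting temperature, GC content and specificity, none of which is modelled here.

```python
from typing import Tuple

# Hypothetical helpers: the names and the default primer length are
# illustrative assumptions, not part of the method described in the text.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def end_primers(predicted: str, length: int = 20) -> Tuple[str, str]:
    """Pick a forward primer from the 5' end and a reverse primer from the 3' end.

    The reverse primer is the reverse complement of the last `length` bases,
    so that it anneals to the sense strand and primes towards the 5' end.
    """
    if len(predicted) < 2 * length:
        raise ValueError("sequence too short for two non-overlapping primers")
    forward = predicted[:length]
    reverse = reverse_complement(predicted[-length:])
    return forward, reverse
```

Both primers are returned in the 5' to 3' orientation, which is the convention used when ordering oligonucleotides.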
A prime example of the successful application of in silico cloning in neuroscience was the cloning of the previously elusive Cav3 family of T-type VDCC α1 subunits (Perez-Reyes, 1998). In a similar fashion, this investigation utilised in silico techniques to identify a family of putative human neuronal VDCC γ subunits, a group of proteins that, like the T-type α1 subunits, had escaped being cloned by conventional molecular biological approaches since the discovery of the skeletal-muscle γ1 subunit many years before (Bosse et al., 1990; Powers et al., 1993).
2.1.2 In silico cloning of a family of human VDCC γ subunits
2.1.2.1 Identification of human ESTs related to the mouse cacng2 by in silico analysis
The GenBank and EMBL databases were searched for sequences possessing homology to the full-length sequence of the mouse cacng2 gene and γ2 protein (AF077739) (Letts et al., 1998) using the BLASTn and tBLASTn alignment programs (Altschul et al., 1990). BLAST (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA. BLASTn directly compares a nucleotide query sequence against a nucleotide sequence database, whilst tBLASTn compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (Altschul et al., 1990). Multiple human expressed sequence tags (ESTs) (Boguski et al., 1993; Soares et al., 1994) and genomic sequences from both databases with more than 40% identity to cacng2 were clustered to identify overlapping identical sequence belonging to the same transcript, using the GCG package (Wisconsin Package Version 9.0, Genetics Computer Group) and a program developed by the GlaxoSmithKline Research and Development Bioinformatics group, termed ESTBlast (Gill et al., 1997). Multiple alignments of in silico nucleotide sequences or conceptually translated proteins were generated using the Clustal W algorithm (Omiga 1.1, Oxford Molecular Group, Oxford, UK).
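The filtering and clustering step can be pictured with the simplified sketch below. It is not the GCG or ESTBlast procedure, whose internals are not described here. It only assumes that each hit carries a percent identity and the query coordinates it aligns to, keeps hits above the 40% threshold, and groups hits whose aligned regions overlap as candidates for the same transcript. The `Hit` type, the function name and the example accessions are all hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    est_id: str      # hypothetical accession label
    start: int       # first aligned query position (1-based, inclusive)
    end: int         # last aligned query position (inclusive)
    identity: float  # percent identity reported for the alignment


def cluster_hits(hits: List[Hit], min_identity: float = 40.0) -> List[List[str]]:
    """Keep hits above the identity threshold and group overlapping ones.

    Overlap on the query coordinates is used here as a stand-in for the
    sequence-level overlap check performed by dedicated clustering software.
    """
    kept = sorted((h for h in hits if h.identity > min_identity),
                  key=lambda h: h.start)
    clusters: List[List[str]] = []
    cluster_end = 0
    for hit in kept:
        if clusters and hit.start <= cluster_end:
            # Overlaps the current cluster: extend it.
            clusters[-1].append(hit.est_id)
            cluster_end = max(cluster_end, hit.end)
        else:
            # No overlap: start a new cluster.
            clusters.append([hit.est_id])
            cluster_end = hit.end
    return clusters


example = [
    Hit("EST_A", 1, 300, 85.0),
    Hit("EST_B", 250, 600, 78.0),
    Hit("EST_C", 900, 1200, 55.0),
    Hit("EST_D", 100, 200, 35.0),  # below threshold, discarded
]
print(cluster_hits(example))  # [['EST_A', 'EST_B'], ['EST_C']]
```

In practice, each candidate cluster would still need to be checked at the sequence level, since two ESTs can align to the same region of the query without being identical to one another.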
2.1.2.2 Identification and analysis of a human genomic sequence related to CACNG7
The in silico-derived 487 bp EST cluster, subsequently determined to be part of the CACNG7 nucleotide sequence, was used as a query with the tBLASTx program (which compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database) to identify related human genes in the genomic sequence division of GenBank. BACs identified as containing sequence with homology to the query were analysed with the gene prediction program Genscan (Burge and Karlin, 1997), and each predicted gene was submitted to the tBLASTn program to identify which of these genes was related to the CACNG7 sequence.
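The six-frame conceptual translation that tBLASTx applies to both query and database (and tBLASTn to the database alone) can be sketched as follows. This is an illustration of the translation step only, not of BLAST itself. It assumes the standard genetic code, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, built with bases in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame(seq: str) -> dict:
    """Return the three forward and three reverse-strand translations."""
    forward = seq.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


frames = six_frame("ATGGCCTAA")
print(frames["+1"])  # MA*
print(frames["-1"])  # LGH
```

Comparing translations rather than nucleotides allows related genes to be detected even when synonymous codon changes have reduced their nucleotide identity.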
2.1.2.3 Identification and analysis of mouse genomic sequences related to CACNG5 and CACNG7
The nucleotide and protein sequences of putative human γ5 and γ7 were compared to the high throughput genomic sequence (HTGS) subset of GenBank using BLASTn and tBLASTn respectively, with default parameters. Genscan identified the putative genomic structure of the murine orthologues within the BAC clones in which they had been identified. The complete in silico genes were assembled from these data.