
THE OPEN READING FRAME AND THE IDENTIFICATION OF GENES

In document Modern Industrial Microbiology (Page 67-70)

Aspects of Molecular Biology and Bioinformatics of Relevance

3.5 THE OPEN READING FRAME AND THE IDENTIFICATION OF GENES

Regions of DNA that encode proteins are first transcribed into messenger RNA and then translated into protein. By examining the DNA sequence alone we can determine the putative sequence of amino acids that will appear in the final protein. In translation, codons of three nucleotides determine which amino acid will be added next to the growing protein chain. The start codon is usually AUG, while the stop codons are UAA, UAG, and UGA. The open reading frame (ORF) is that portion of a DNA segment which putatively codes for a protein; it begins with a start codon and ends with a stop codon.
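The definition of an ORF given above can be illustrated with a minimal Python sketch. It assumes the sequence is given as DNA (so AUG, UAA, UAG and UGA appear as ATG, TAA, TAG and TGA) and looks only at the strand supplied; the function name `first_orf` is a hypothetical helper, not part of any published program.

```python
# DNA equivalents of the RNA codons named in the text.
START_CODON = "ATG"                   # AUG
STOP_CODONS = {"TAA", "TAG", "TGA"}   # UAA, UAG, UGA


def first_orf(dna):
    """Return the first stretch that runs from a start codon to an
    in-frame stop codon on the given strand, or None if there is none."""
    dna = dna.upper()
    start = dna.find(START_CODON)
    while start != -1:
        # Step through the sequence codon by codon, in frame with the start.
        for i in range(start, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return dna[start:i + 3]
        # No in-frame stop for this start codon: try the next ATG.
        start = dna.find(START_CODON, start + 1)
    return None


print(first_orf("ccatgaaataggg"))
```

For the short illustrative sequence used here, the function returns the nine nucleotides from the ATG to the in-frame TAG.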


Once a gene has been sequenced, it is important to determine the correct open reading frame. Every region of DNA has six possible reading frames, three in each direction, because a codon consists of three nucleotides. The reading frame that is used determines which amino acids will be encoded by a gene. Typically only one reading frame is used in translating a gene (in eukaryotes), and this is often the longest open reading frame. Once the open reading frame is known, the DNA sequence can be translated into its corresponding amino acid sequence.

For example, the sequence of DNA in Fig. 3.9 can be read in six reading frames: three in the forward and three in the reverse direction. The three reading frames in the forward direction are shown with the translated amino acids below each DNA sequence. Frame 1 starts with the ‘a’, Frame 2 with the ‘t’ and Frame 3 with the ‘g’. Stop codons are indicated by an ‘*’ in the protein sequence. The longest ORF is in Frame 1.

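[Fig. 3.9: a DNA sequence written 5' to 3', with its three forward reading frames and their translated amino acids]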
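The six-frame reading described above can be sketched in Python as follows. This is a minimal illustration, not the program used to produce Fig. 3.9: it assumes the standard genetic code, marks stop codons with ‘*’ as in the figure, and uses a short made-up sequence rather than the one in the figure. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Map frame names (+1..+3 forward, -1..-3 reverse) to translations."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def longest_orf(frames):
    """Return (frame, peptide) for the longest run from 'M' to a stop."""
    best = ("", "")
    for name, protein in frames.items():
        # The last piece after splitting is not terminated by a stop codon.
        for segment in protein.split("*")[:-1]:
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best[1]):
                best = (name, segment[start:])
    return best


frames = six_frames("atgaaatagcc")
for name, protein in frames.items():
    print(name, protein)
print("longest ORF:", longest_orf(frames))
```

Choosing the longest ORF mirrors the rule of thumb in the text; it is a heuristic and does not by itself prove that the frame is the one used in the cell.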

Genes can be identified in a number of ways, which are discussed below.

i. Using computer programs

As was shown above, the open reading frame (ORF) is deduced from the start and stop codons. In prokaryotic cells, which do not have many introns (intervening non-coding regions of the chromosome), the ORF will in most cases indicate a gene. However, it is tedious to determine ORFs manually, and many computer programs now exist which will scan the base sequences of a genome and identify putative genes. Some of the programs are given in Table 3.2. In scanning a genome or DNA sequence for genes (that is, in searching for functional ORFs), the following are taken into account in the computer programs:

a. usually, functional ORFs are fairly long and do not usually contain fewer than 100 codons (that is, 300 nucleotides);

b. if the types of codons found in the ORF being studied are also found in known functional ORFs, then the ORF being studied is likely to be functional;

c. the ORF is also likely to be functional if its sequences are similar to functional sequences in genomes of other organisms;

d. in prokaryotes, ribosomal translation does not necessarily start at the first possible (earliest 5’) start codon. Instead it starts at the start codon immediately downstream of the Shine-Dalgarno sequence. The Shine-Dalgarno sequence is a short sequence of nucleotides upstream of the translational start site that binds to ribosomal RNA and thereby brings the ribosome to the initiation codon on the mRNA. The computer program searches for a Shine-Dalgarno sequence, and finding it helps to indicate not only which start codon is used, but also that the ORF is likely to be functional;

e. if the ORF is preceded by a typical promoter, it is likely to be functional (if consensus promoter sequences for the given organism are known, check for the presence of a similar upstream region);

f. if the ORF has the GC content, codon frequency, or oligonucleotide composition typical of known protein-coding genes from the same organism, then it is likely to be a functional ORF.

Table 3.2 Some Internet tools for the gene discovery in DNA sequence bases (modified from Fickett, 1996)

| Category | Services | Organism(s) | Web address |
|---|---|---|---|
| Database search | BLAST; search sequence bases | Any | [email protected] |
| | FASTA; search sequence bases | Any | [email protected] |
| | BLOCKS; search for functional motifs | Any | [email protected] |
| | Profilescan | Any | http://ulrec3.unil.ch. |
| | MotifFinder | Any | [email protected] |
| Gene identification | FGENEH; integrated gene identification | Human | [email protected].edu |
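Some of the criteria (a) to (f) above can be sketched as simple checks in Python. This is a toy illustration, not how any of the programs in Table 3.2 work. Several values are assumptions introduced here: the Shine-Dalgarno core `AGGAGG`, the 20-nucleotide upstream window, the partial-match length and the GC tolerance. Only the 100-codon minimum comes from the text. All function names are hypothetical.

```python
SD_CORE = "AGGAGG"  # assumed consensus core; real sites vary between organisms


def gc_content(seq):
    """Fraction of G and C nucleotides in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def has_shine_dalgarno(genome, orf_start, window=20, min_match=4):
    """True if any min_match-long piece of SD_CORE lies in the window
    immediately upstream of the ORF start (criterion d)."""
    upstream = genome.upper()[max(0, orf_start - window):orf_start]
    pieces = {SD_CORE[i:i + min_match]
              for i in range(len(SD_CORE) - min_match + 1)}
    return any(piece in upstream for piece in pieces)


def orf_evidence(genome, orf_start, orf_end, genome_gc,
                 min_codons=100, gc_tolerance=0.10):
    """Collect simple yes/no evidence that an ORF may be functional."""
    orf = genome[orf_start:orf_end]
    return {
        # Criterion a: at least 100 codons (300 nucleotides) by default.
        "long_enough": len(orf) // 3 >= min_codons,
        # Criterion d: ribosome binding site upstream of the start codon.
        "shine_dalgarno": has_shine_dalgarno(genome, orf_start),
        # Criterion f: GC content close to that of known genes.
        "typical_gc": abs(gc_content(orf) - genome_gc) <= gc_tolerance,
    }


# Made-up example: a very short ORF, so min_codons is lowered for the demo.
genome = "TTTAGGAGGTTTTTT" + "ATG" + "GCC" * 4 + "TAA"
print(orf_evidence(genome, 15, 33, genome_gc=0.7, min_codons=5))
```

Returning each criterion separately, rather than a single verdict, reflects the way the text presents them: as independent pieces of evidence that together make an ORF more or less likely to be functional.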

ii. Comparison with Existing Genes

Sometimes it may be possible to deduce not only whether a gene is functional (i.e. whether it is a functional ORF), but also the function of the gene. This can be done by comparing an unknown sequence with the sequences of known genes available in databases such as that of The Institute for Genomic Research (TIGR) in Maryland.
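The idea of comparing an unknown sequence with known genes can be sketched with a small local-alignment score in Python. This is a simplified Smith-Waterman scoring scheme with assumed match, mismatch and gap values. It is not the algorithm or scoring matrix used by BLAST or FASTA, and the "database" entries are made-up sequences with hypothetical names.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two sequences
    (Smith-Waterman with a linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def best_hit(query, database):
    """Return the name of the database entry most similar to the query."""
    return max(database.items(),
               key=lambda item: local_alignment_score(query, item[1]))[0]


# Hypothetical protein sequences, for illustration only.
known_genes = {
    "hypothetical_kinase": "MKTAYIAKQRQISFVK",
    "hypothetical_transporter": "MSDNELLPWW",
}
print(best_hit("MKTAYIAKQR", known_genes))
```

A high-scoring hit suggests, but does not prove, that the unknown sequence shares the function of the known gene.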

3.6 METAGENOMICS

Metagenomics is the genomic analysis of the collective genome of an assemblage of organisms, or ‘metagenome’. It covers the functional and sequence-based analysis of the collective microbial genomes contained in an environmental sample (Fig. 3.10). Other terms have been used to describe the same method, including environmental DNA libraries, zoolibraries, soil DNA libraries, eDNA libraries, recombinant environmental libraries, whole genome treasures, community genome, and whole genome shotgun sequencing. The definition applied here excludes studies that use PCR to amplify gene cassettes, or random PCR primers to access genes of interest, since these methods do not provide genomic information beyond the genes that are amplified.


Many environments have been the focus of metagenomics, including soil, the oral cavity, feces, and aquatic habitats, as well as the hospital metagenome, a term intended to encompass the genetic potential of organisms in hospitals that contribute to public health concerns such as antibiotic resistance and nosocomial infections.

Uncultured microorganisms comprise the majority of the Earth’s biological diversity. In many environments, as many as 99% of the microorganisms cannot be cultured by standard techniques, and the uncultured fraction includes diverse organisms that are only distantly related to the cultured ones. Therefore, culture-independent methods are essential to understand the genetic diversity, population structure, and ecological roles of the majority of microorganisms in a given environmental situation. Metagenomics, or the culture-independent genomic analysis of an assemblage of microorganisms, has the potential to answer fundamental questions in microbial ecology. It can also be applied to determining organisms which may be important in a new industrial process still under study. Several markers have been used in metagenomics, including 16S rRNA and the genes encoding DNA polymerases, because these are highly conserved (i.e., because they remain relatively unchanged in many groups). The marker most commonly used, however, is the sequence of 16S rRNA.

The procedure in metagenomics is described in Fig. 3.10.
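The use of a conserved marker to place an environmental sequence near a known group can be illustrated with a toy Python sketch. It compares the short subsequences (k-mers) shared between a sequence fragment and a set of reference marker sequences. The k-mer approach, the value of k, and the reference names and sequences are all assumptions made for illustration; they are not real 16S rRNA data and not the procedure of Fig. 3.10.

```python
def kmers(seq, k=4):
    """Set of all overlapping subsequences of length k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Shared fraction of two sets (0 = nothing shared, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_read(read, references, k=4):
    """Return (name, similarity) of the reference closest to the read."""
    read_kmers = kmers(read, k)
    scores = {name: jaccard(read_kmers, kmers(seq, k))
              for name, seq in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up marker fragments standing in for reference sequences.
references = {
    "taxon_A": "ACGTACGTTAGC",
    "taxon_B": "GGGCCCTTTAAA",
}
print(assign_read("GGGCCCTTT", references))
```

A fragment from an uncultured organism would rarely match a reference exactly; the similarity value only indicates which known group it most resembles.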
