Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST: Individual Portion
Name: ______________________________ Date: _________________ Period: _______________
Access the lab manual on the class web page in the Evolution folder. Open it and read the background. There are several questions to answer before you get to the procedure.
Human Genome Project
Between 1990 and 2003, scientists working on an international research project known as the Human Genome Project were able to identify and map the 20,000-25,000 genes that define a human being. This was government-funded research. What is this branch of government called? ____________________________________________________________________
The Human Genome Project also mapped the genomes of other species, including the fruit fly, mouse and Escherichia coli (E. coli). The location and complete sequences of the genes in each of these species are available for anyone in the world to access via the Internet.
Why is this information important? ____________________________________________________________________
_________________________________________________________________________________________________
Bioinformatics
Suppose you identify a single gene that is responsible for a particular disease in fruit flies. Is that same gene found in humans? Does it cause a similar disease? It would take you nearly 10 years (!!!) to read through the entire human genome to try to locate the same sequence of bases as that in fruit flies. This isn’t practical. So what technique do we have at our disposal that greatly speeds up human search and computational abilities?
What does the field of Bioinformatics do?
_________________________________________________________________________________________________
_________________________________________________________________________________________________
What is BLAST?
Entire genomes can be quickly compared in order to detect genetic similarities and differences. To do that we use BLAST.
What does BLAST stand for?
________________________________________________________________________
Name 3 types of information you can learn from using BLAST.
1.
2.
3.
Using BLAST, you can input a gene sequence of interest and search entire genomic libraries for identical or similar sequences in a matter of seconds.
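To make "identical or similar sequences" concrete, here is a minimal sketch of how a percent-identity score can be counted once two sequences are already lined up. This is not how BLAST itself searches (BLAST uses a much faster alignment method); the function name and the two short sequences are made up for illustration.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions that match between two equal-length, pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length (already aligned)")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Compare the sequences base by base, ignoring upper/lower case.
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)

# Hypothetical 10-base sequences, invented for this example only.
query = "ATGGCCATTG"
subject = "ATGGCGATTG"
print(percent_identity(query, subject))  # 9 of the 10 bases match
```

The higher the score, the more similar the two sequences are at the positions compared.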
Now, are you ready to have your mind blown? Take a look at a cladogram made by David Hillis, Derrick Zwickl and Robin Gutell at the University of Texas (yes, it does look like a slice of a tree). They used small subunit rRNA from 3,000 species. This is approximately the square root of the number of species thought to exist on Earth (3,000 mapped of the 9 million believed to exist). It is also about 0.18% of the 1.7 million species that have been formally described and named.
http://www.zo.utexas.edu/faculty/antisense/DownloadfilesToL.html
Click on the picture and then enlarge the PDF file to zoom in where it says “You are here.” Mind. Blown.
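The numbers quoted for the tree can be checked with a few lines of arithmetic. This is only a quick sketch using the figures given in the text (3,000 species in the tree, 9 million estimated, 1.7 million described); the variable names are made up for illustration.

```python
import math

species_in_tree = 3000         # species included in the cladogram
species_estimated = 9_000_000  # species thought to exist on Earth
species_described = 1_700_000  # species formally described and named

# Square root of the estimated number of species.
sqrt_estimated = math.sqrt(species_estimated)

# Share of the described species that appear in the tree, as a percentage.
percent_of_described = 100 * species_in_tree / species_described

print(sqrt_estimated)
print(round(percent_of_described, 2))
```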
Purpose of this lab
In this lab you will use BLAST to compare several genes, and then use the information to construct a cladogram.
1. Let’s start with drawing a simple cladogram given physical characteristics (the old school way of doing it!)
Major plant groups
Organisms           Vascular Tissue   Flowers   Seeds
Mosses              0                 0         0
Pine Trees          1                 0         1
Flowering plants    1                 1         1
Ferns               1                 0         0
Total               3                 1         2
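The table above can also be read by a short program. The sketch below assumes a simple rule of thumb: a trait shared by more groups appeared earlier, and a group with fewer of these traits branches off earlier. That rule works for a small, neatly nested table like this one but is not a general method for building cladograms. Check your own drawing against the output rather than copying it.

```python
# Character table copied from the worksheet: 1 = trait present, 0 = trait absent.
traits = ["Vascular tissue", "Flowers", "Seeds"]
table = {
    "Mosses": [0, 0, 0],
    "Pine trees": [1, 0, 1],
    "Flowering plants": [1, 1, 1],
    "Ferns": [1, 0, 0],
}

# Column totals: how many groups have each trait (this is the "Total" row).
totals = {t: sum(row[i] for row in table.values()) for i, t in enumerate(traits)}

# Assumed rule: traits shared by the most groups go nearest the base of the tree.
trait_order = sorted(traits, key=lambda t: totals[t], reverse=True)

# Assumed rule: groups with the fewest traits branch off first.
branch_order = sorted(table, key=lambda g: sum(table[g]))

print(totals)
print(trait_order)
print(branch_order)
```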
2. GAPDH (glyceraldehyde 3-phosphate dehydrogenase) is an enzyme that catalyzes the sixth step in glycolysis, an important reaction that produces molecules used in cellular respiration. The following data table shows the percentage similarity of this gene, and the protein it expresses, in humans versus other species. For example, according to the table the GAPDH gene in chimpanzees is 99.6% identical to the gene found in humans, while the protein is identical.
Percentage similarity between the GAPDH Gene and Protein in Humans and Other Species
Species                                 Gene percentage similarity   Protein percentage similarity
Chimpanzee (Pan troglodytes)            99.6%                        100%
Dog (Canis familiaris)                  91.3%                        95.2%
Fruit fly (Drosophila melanogaster)     72.4%                        76.7%
Roundworm (Caenorhabditis elegans)      68.2%                        74.3%
a. Why is the percentage similarity in the gene always lower than the percentage similarity in the protein for each of the species? (Hint: recall how a gene is expressed to produce a protein.)
_________________________________________________________________________________________
_________________________________________________________________________________________
b. Draw a cladogram depicting the evolutionary relationships among all five species (including humans).
Procedure
A team of scientists has uncovered the fossil specimen in the figure on the left near Liaoning Province, China. They used the fossil to create a depiction of what it might have looked like:
Make some general observations about the morphology (physical structure) of the organism.
Little is known about the fossil. It appears to be a new species. Upon careful examination of the fossil, small amounts of soft tissue have been discovered. Normally, soft tissue does not survive fossilization; however, rare situations of such preservation do occur. Scientists were able to extract DNA nucleotides from the tissue and use the information to sequence several genes. Your task is to use BLAST to analyze these genes and determine the most likely placement of the fossil species on the cladogram that follows:
1. Form an initial hypothesis as to where you believe the fossil specimen should be placed on the cladogram based on the morphological observations you made earlier. Draw your hypothesis on Figure 4.
2. Using BLAST: Locate and download gene files. Supplemental Information for Procedure pg S45 Step 2:
Below are the gene files to use for Investigative Lab 3 (gene 1, gene 2, gene 3, gene 4). Note that these “zip” files will not open after you download them.
Gene 1 Gene 2 Gene 3 Gene 4
SO…
1. You must click on each file, download it by clicking save, and then open it.
2. Highlight the file, then click extract all files (top left menu bar).
If the gene 1, gene 2, etc. files above won’t open for you, you can always find them at the link below. Scroll way down to find another version of these gene files:
http://apcentral.collegeboard.com/apc/members/courses/teachers_corner/218954.html.
Now you are ready for Step 3 pg S45: you should be able to follow the instructions from this point.
Gene 1: ___________________________
1. What species in the BLAST result has the most similar gene sequence to the gene of interest?
2. Where is that species located on your cladogram?
3. How similar is that gene sequence?
4. What species has the next most similar gene sequence to the gene of interest?
Gene 2: ___________________________
1. What species in the BLAST result has the most similar gene sequence to the gene of interest?
2. Where is that species located on your cladogram?
3. How similar is that gene sequence?
4. What species has the next most similar gene sequence to the gene of interest?
Gene 3: ___________________________
1. What species in the BLAST result has the most similar gene sequence to the gene of interest?
2. Where is that species located on your cladogram?
3. How similar is that gene sequence?
4. What species has the next most similar gene sequence to the gene of interest?
Evaluating Results: Revisit your cladogram on page 4. Did you place your new fossil correctly or are there changes you would like to make now? Compare your new placement with others in your group. Discuss in the space below.
Go back to the main page of BLAST: http://Blast.ncbi.nlm.nih.gov/Blast.cgi
Click on “List all genomic databases.” How many genomes are currently available for making comparisons using BLAST? ______
How does this impact the proper analysis of the gene data used in this lab? ________________________________________________________________
What other data could be gathered to help properly identify the fossil’s evolutionary history?
__________________________________________________________________________________________________
Group Portion: Group members’ names: ________________________________ Table # _________
Designing your own investigation: FIRST watch this Bozeman video, starting at 4:48, to get an idea of a program called FASTA (steps are similar but not identical): https://www.youtube.com/watch?v=OSKwuOccAak
Your group will explore another tool called FASTA that is often used in conjunction with BLAST to determine evolutionary relationships. In your group, use the table of suggested genes to explore, and assign 1 gene to each group member to investigate. See the example procedure on pg S49. Complete steps 1-12.
1. Go to https://www.ncbi.nlm.nih.gov/gene and type in human actin in the search bar.
2. Click on the first link to open it. Scroll way down to “NCBI Reference Sequences.”
3. Under mRNA and protein, click on the first file name. It will be named NM_001100.3 (or something similar). It’s just a number to make cataloging easier. Click on the abbreviation GSN (or something similar). A new page will open with more detail about the protein. Just below the gene name, click on FASTA.
4. Click on “Run BLAST” on the right side bar.
5. Another page will open with the whole gene sequence. Scroll down to “program selection” and choose “quickBlast.”
6. Then click BLAST. BE PATIENT: the program is using algorithms to search thousands of sequences.
Scroll down to “Descriptions.” Look in the Identical column and find the genus and species name of 2 other primates that have 99% similarity to Homo sapiens.
1. Write down genus and species name ____________________ and the common name _________________
2. Write down genus and species name ____________________ and the common name _________________
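The FASTA view opened in step 3 is plain text: a header line that starts with ">" followed by one or more lines of sequence. As a rough sketch of how a program might read that text, the example below uses a hypothetical helper, parse_fasta, and a made-up record (not a real NCBI sequence).

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its joined sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]     # a new record begins at each '>' line
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {h: "".join(parts) for h, parts in records.items()}

# Made-up record for illustration; not copied from NCBI.
example = """>example_gene hypothetical record
ATGGCCATTG
TTAACGGA
"""
records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```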
As you look at your gene, answer the following questions. When each member is finished, discuss your findings and compile answers for each bullet point below. Turn in 1 typed copy for the group portion of this grade.