BLAST Lab – Official Lab!
PART B. Does the human GULO gene produce a functional protein?
We will now use protein BLAST to search for GULO proteins in cows, pigs, humans and chimpanzees.
1. Copy the mouse GULO protein below.
>mouse GULO protein
MVHGYKGVQFQNWAKTYGCSPEMYYQPTSVGEVREVLALARQQNKKVKVVGGGHSPSDIACTDG FMIHMGKMNRVLQVDKEKKQVTVEAGILLTDLHPQLDKHGLALSNLGAVSDVTVGGVIGSGTHNT GIKHGILATQVVALTLMKADGTVLECSESSNADVFQAARVHLGCLGVILTVTLQCVPQFHLLETSFPS TLKEVLDNLDSHLKKSEYFRFLWFPHSENVSIIYQDHTNKEPSSASNWFWDYAIGFYLLEFLLWTSTY LPRLVGWINRFFFWLLFNCKKESSNLSHKIFSYECRFKQHVQDWAIPREKTKEALLELKAMLEAHPK VVAHYPVEVRFTRGDDILLSPCFQRDSCYMNIIMYRPYGKDVPRLDYWLAYETIMKKFGGRPHWAK AHNCTRKDFEKMYPAFHKFCDIREKLDPTGMFLNSYLEKVFY
2. Go to www.ncbi.nlm.nih.gov and select BLAST from the menu at the top of the page (Resources >> DNA & RNA >> BLAST).
3. On the main BLAST page, select “protein BLAST”.
4. Copy the mouse GULO sequence into the box under “Enter Query Sequence”.
5. Scroll down until you see “Organism”.
6. In the “Organism” window, type in “cow”. When you see cow appear in the box, select it (don’t select Cowdria or Cow Parsnips!).
7. Scroll to the bottom and select the “BLAST” button.
8. When BLAST is done with its search, you can scroll down and see a chart of your results. Note your result in the chart below.
9. Now start again and do BLAST searches for pig (Sus scrofa), human (Homo sapiens), cow (Bos taurus), guinea pig (Cavia porcellus), and chimpanzee (Pan troglodytes) GULO proteins. Record your data in the chart.
10. Look for proteins with the same name (L-gulonolactone oxidase). If the GULO protein is not present, other, more distantly related proteins may come up. They will have a much lower score and a higher E-value. Note that the E-value is the number of matches with a score at least this good that you would expect to find by chance alone in a database of this size, so the smaller it is, the less likely the result is due to random matching of some amino acid sequences from both proteins. An E-value of 0 means the match is far too strong to be explained by chance. A good E-value should be much lower than 1e-4 (that is, 0.0001).
11. Record your data in the chart.
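To make the E-value rule in step 10 concrete, here is a minimal Python sketch of how you might sort your recorded results into "convincing" and "weak" matches. The function names and the example numbers are hypothetical placeholders, not real BLAST output; substitute the values from your own chart.

```python
# Hypothetical helpers for interpreting the chart; not part of BLAST itself.
E_VALUE_CUTOFF = 1e-4  # cutoff suggested in step 10; smaller E-values are better


def is_convincing_hit(e_value, cutoff=E_VALUE_CUTOFF):
    """Return True when the E-value falls below the cutoff."""
    return e_value < cutoff


def summarize_hits(hits):
    """hits maps a species name to the E-value of its best hit (None if no hit)."""
    summary = {}
    for species, e_value in hits.items():
        if e_value is None:
            summary[species] = "no hit"
        elif is_convincing_hit(e_value):
            summary[species] = "convincing match"
        else:
            summary[species] = "weak match"
    return summary


if __name__ == "__main__":
    # Placeholder values only; replace with the E-values you recorded.
    my_chart = {"species A": 0.0, "species B": 2.5, "species C": None}
    for species, verdict in summarize_hits(my_chart).items():
        print(species, "->", verdict)
```

A strong hit alone does not settle the question; you still need to check that the hit is actually named L-gulonolactone oxidase, as step 10 says.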
Part C: Select a protein and create your own phylogeny
Step 1. You can watch the pre-lab tutorial posted on iTunesU.
But I will also walk you through how to search a database (UniProt) and create a phylogenetic tree from selected protein sequences below.
Step 2. Choose a protein (see suggestions below) that interests you and search for that protein name in UniProt. www.uniprot.org
For example, I chose Keratin and entered it in the search box at the top of the UniProt page. It gave me a long list of keratin-related proteins (10,000+), and since I don’t want to scroll down 10,000 listings to find one particular keratin protein for 15 different species, I selected one specific keratin protein (type II, cytoskeletal 1) and re-entered it into the search box.
Hmm, I still have 1000+ listings, and the search function didn’t do too well with “1”. Okay, so I look down the list and notice that Keratin, type II cytoskeletal 75 has multiple species just on the first page, so I do a search with that term. Ah, better luck – only 233 listings.
You should take a minute to look at one of the proteins in detail – just click on the Entry number. Remember, someone in a lab somewhere had to find this information and enter it into the database; this entire database is the culmination of years and years of research and work by thousands of researchers.
Now, you want to have an empty document of your choosing open and ready to copy and paste into (you have to be careful of programs that alter the formatting; a plain-text editor is safest, because if the formatting changes, the program won’t be able to read the data). For each different species of Keratin, type II cytoskeletal 75, I open the Entry number in a different tab. This brings you to the page for just that protein.
You will see a long page filled with information about this protein (rather amazing, actually), but what you want is at the top. There are some blue tabs (BLAST, Align, Format, Add to Basket, and History); click on Format. In the drop-down menu, select FASTA format. You will get a text document with the protein sequence that you can copy and paste into your document (side question: what do the letters in the sequence mean?). You want to copy ALL of the text, including the first line with the name and species information. You should change the name to something easier like so:
>tr|M7BY28|M7BY28_CHEMY Keratin, type II cytoskeletal 75 OS=Chelonia mydas GN=UY3_05865 PE=3 SV=1
>Green sea turtle
As long as you preserve the “>” at the beginning of the first row, you can delete the rest of the identifying information and put a simple name down.
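If you would rather not rename each header by hand, the renaming can be sketched in a few lines of Python. This is only an illustration: `rename_fasta_header` is a hypothetical helper, and the short sequence in the example is a placeholder, not the real sea turtle keratin sequence.

```python
def rename_fasta_header(record_text, new_name):
    """Replace the header line of one FASTA record with a short name.

    The header is the first line and must begin with '>'; the sequence
    lines that follow are left untouched.
    """
    lines = record_text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must start with a '>' header line")
    return "\n".join([">" + new_name] + lines[1:])


if __name__ == "__main__":
    # Header copied from the example above; the sequence is a made-up placeholder.
    record = (
        ">tr|M7BY28|M7BY28_CHEMY Keratin, type II cytoskeletal 75 "
        "OS=Chelonia mydas GN=UY3_05865 PE=3 SV=1\n"
        "PLACEHLDERSEQ\n"
    )
    print(rename_fasta_header(record, "Green sea turtle"))
```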
Step 3. Select a total of 15 different organisms to use in the construction of your phylogeny. Copy and paste these protein sequences into a single document in FASTA format. It is crucial that the sequences are in FASTA format. You don’t need to worry about the spacing in between the sequences, just hit “enter” a few times in between each one.
Again, make sure to change the names since a long name/data will make your phylogenetic tree very hard to read.
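Before you submit, it can help to check that your document really contains 15 well-formed records. The following is a minimal sketch under simple assumptions (plain text, one “>” header per sequence); `parse_fasta` and `check_records` are hypothetical helper names, not part of any of the websites used in this lab.

```python
def parse_fasta(text):
    """Split multi-FASTA text into a list of (name, sequence) pairs.

    Blank lines between records are ignored, and spaces inside
    sequence lines are removed.
    """
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # extra blank lines between records are harmless
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence text found before the first '>' header")
        else:
            chunks.append(line.replace(" ", ""))
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_records(records, expected_count=15):
    """Return a list of human-readable problems (empty if all looks fine)."""
    problems = []
    if len(records) != expected_count:
        problems.append(f"expected {expected_count} sequences, found {len(records)}")
    names = [name for name, _ in records]
    for name in sorted(set(names)):
        if names.count(name) > 1:
            problems.append(f"duplicate name: {name}")
    for name, sequence in records:
        if not sequence:
            problems.append(f"empty sequence: {name}")
    return problems
```

To use it, paste the contents of your document into a string, call `parse_fasta` on it, and pass the result to `check_records`; an empty list means the basic checks passed.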
Step 4. Go to www.phylogeny.fr and under “Phylogeny Analysis,” select “one-click” analysis to create your tree. Feel free to name your analysis. Paste all 15 sequences into the window and click “submit.”
Step 5. Generate a PDF of your phylogeny and print it out (and save it to send to me if needed). I only did a few species, but you can see from my quick tree below why you want to change the names to something simpler than the long list that UniProt gives you:
Step 6. Go to the T-Coffee website: http://tcoffee.vital-it.ch/apps/tcoffee/do:regular
Paste your FASTA file of sequences into the “sequences to align” window and submit to generate a nice-looking alignment of the sequences. Take a screenshot (or several) of the alignment and paste it into the bottom of this document as well. Click here to see the results from my 6 organisms above. The T-Coffee website isn’t doing anything very fancy; all it is doing is lining up the sequences and using color coding to show which positions match and which are completely different. We could do the very same thing in an Excel sheet, but it would be extremely tedious and time-consuming. Just envision typing out each sequence for the same protein in a single line, each species underneath the other, and then trying to determine which positions match and which are completely different: a headache-inducing endeavor, to be sure. It is nice to have T-Coffee do this for us, but it is crucial to realize that it is just grunt work; it isn’t anything special or magical.
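To see how mechanical the comparison step is, here is a small Python sketch of the "one sequence per row" exercise described above. It assumes the sequences have already been aligned to the same length (with “-” marking gaps), so it only does the column-by-column comparison, not the alignment itself. The function names are hypothetical and the sequences in the example are made-up placeholders.

```python
def conservation_line(aligned):
    """Mark each alignment column with '*' if every sequence has the
    same residue there (and no gap), otherwise with a space."""
    if not aligned:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        if "-" not in column and len(set(column)) == 1:
            marks.append("*")
        else:
            marks.append(" ")
    return "".join(marks)


def percent_identity(seq_a, seq_b):
    """Percent of gap-free aligned positions where two sequences match."""
    compared = matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip positions where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Made-up placeholder rows, already aligned; not real keratin data.
    rows = ["MKT-A", "MRT-A", "MKTGA"]
    for row in rows:
        print(row)
    print(conservation_line(rows))
    print(percent_identity(rows[0], rows[2]))
```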
Which brings us back to BLAST. BLAST is doing more than just aligning the sequences; it is also searching a huge library of sequences for the query sequence to find other instances of it. When it finds them, it also aligns the sequences to determine how they match up and what has been conserved over time. If you were to do a BLAST search on a sequence that is fundamental to all living things (like a mitochondrial protein or replicase, a protein involved with copying DNA), you would find that sequence present in lifeforms from bacteria to humans, and indeed, this is something that evolutionary scientists use in their research every day.
Step 7. For your lab report, answer the following questions:
- What does your phylogeny suggest about conserved core processes and what inferences can you make about the common ancestry of the organisms you selected to analyze?
- What did your initial search results suggest about how widely distributed this gene was within and across the domains of life?
- What can you tell about the degree of sequence conservation in the alignment that you generated in T-Coffee? Does this alignment corroborate your tree results? Does anything stand out in the alignment as odd?
- What other data, either morphological, genetic, or both, could you add to your analyses that could improve or extend the phylogenetic tree?
- Elaborate on what else you learned or felt was most interesting about this lab.
Suggested proteins to explore
Cellulose synthase (plants)
Callose synthase (plants)
Critical Thinking Exercises
1. Why do you think that primates (monkeys, apes and humans) have lost the ability to produce vitamin C? (Hint: think about the diet of early primates)
2. Explain why the GULO gene in humans may be considered vestigial.
3. What can you infer about the GULO BLAST results between humans and chimps?
4. Your new pet food company is designing healthy foods for dogs, pigs, cows, mice and guinea pigs. From your results, to which types of feed will you suggest that the manufacturer add supplemental vitamin C? Justify your suggestion with a conclusive and specific result.