Stewart F.J., Ulloa O. & DeLong E.F. (2011) Microbial metatranscriptomics in a permanent
marine oxygen minimum zone Environmental Microbiology doi:10.1111/j.1462- 2920.2010.02400.x
Stookey L.L. (1970) Ferrozine – A New Spectrophotometric Reagent for Iron Analytical Chemistry 42(7): 779-781
Sun S., Chen J., Li W., Altintas I., Lin A., Peltier S., Stocks K., Allen E.E., Ellisman M., Grethe J. & Wooley J. (2011) Community cyberinfrastructure for Advanced Microbial Ecology
Research and Analysis: the CAMERA resource Nucleic Acids Research 39: D546-D551
Suzuki S., Ishida T., Kurokawa K. & Akiyama Y. (2012) GHOSTM: A GPU-Accelerated
Homology Search Tool for Metagenomics PLoS ONE 7(5): e36060
Tamaki H., Hanada S., Kamagata Y., Nakamura K., Nomura N., Nakano K. & Matsumura M. (2003) Flavobacterium limicola sp. nov., a psychrophilic, organic-polymer-degrading
bacterium isolated from freshwater sediments International Journal of Systematic and
Evolutionary Microbiology 53: 519-526
Teske A., Brinkhoff T., Muyzer G., Moser D.P., Rethmeier J. & Jannasch H.W. (2000)
Diversity of Thiosulfate-Oxidizing Bacteria from Marine Sediments and Hydrothermal Vents
Appl Environ Microbiol 66(8): 3125-3133
Thamdrup B., Finster K., Fossing H, Hansen J.K. & Jørgensen B.B. (1994) Thiosulfate and
sulfite distributions in porewater of marine sediments related to manganese, iron and sulfur geochemistry Geochimica et Cosmochimica Acta 58: 67-73
Thamdrup B and Dalsgaard T (2002). Production of N2 through Anaerobic Ammonium
Oxidation Coupled to Nitrate Reduction in Marine Sediments. Applied and Environmental
Microbiology 68(3): 1312-1318
Thomas T., Gilbert J. & Meyer F. (2012) Metagenomics – a guide from sampling to data
analysis Microbial Informatics and Experimentation 2: 3
Tringe S.G., von Mering C., Kobayashi A., Salamov A.A., Chen K., Chang H.W., Podar M., Short J.M., Mathur E.J., Detter J.C., Bork P., Hugenholtz P. & Rubin E.M. (2005)
Comparative Metagenomics of Microbial Communities Science 308: 554-557
Tuttle J.H. & Jannasch H.W. (1977) Thiosulfate Stimulation of Microbial Dark Assimilation
of Carbon Dioxide in Shallow Marine Waters Microbial Ecology 4: 9-25
Tyson G.W., Chapman J., Hugenholtz P., Allen E.E., Ram R.J., Richardson P.M.,
Solovyev V.V., Rubin E.M., Rokhsar D.S. & Banfield J.F. (2004) Community structure and
metabolism through reconstruction of microbial genomes from the environment Nature 428: 37- 43
The UniProt Consortium (2011) Ongoing and future developments at the Universal Protein
Resource Nucleic Acids Research 39: D214-D219
Vick T.J., Dodsworth J.A., Costa K.C., Shock E.L and Hedlund B.P. (2009) Microbiology
and geochemistry of Little Hot Creek, a hot spring environment in the Long Valley Caldera.
Geobiology 8(2): 140-154
Von Trappen S., Vandecandelaere I., Mergaert J. & Swings J. (2004) Gillisia limnaea gen.
nov., sp. nov., a new member of the family Flavobacteriaceae isolated from a microbial met in Lake Fryxell, Antarctica International Journal of Systematic and Evolutionary Microbiology 54: 445-448
Von Trappen S., Vandecandelaere I., Mergaert J. & Swings J. (2005) Flavobacterium
fryxellicola sp. nov and Flavobacterium psychrolimnae sp. nov., novel psychrophilic bacteria isolated from microbial mats in Antarctic lakes International Journal of Systematic and
Evolutionary Microbiology 55: 769-772
Wagner C., Mau M., Schlomann M., Heinicke J. & Koch. U. (2007) Characterization of the
bacterial flora in mineral waters in upstreaming fluids of deep igneous rock aquifers. Journal of
Geophysical Research 112: G01003
Wang Q., Garrity G.M., Tiedje J.M. & Cole J.R. (2007) Naive Bayesian classifier for rapid
assignment of rRNA sequences into the new bacterial taxonomy Appl Environ Microbiol 73(16): 5261-5267
Wang J., Vollrath S., Behrends T., Bodelier P.L.E., Muyzer G., Meima-Franke M., Den Oudsten F., Van Cappellen P. & Laanbroek H.J. (2011) Distribution and Diversity of
Gallionella-Like Neutrophilic Iron Oxidizers in a Tidal Freshwater Marsh Appl Environ
Microbiol 77(7): 2337-2344
Walsh D.A., Zaikova E., Howes C.G., Song Y.C., Wright J.J., Tringe S.G., Tortell P.D. & Hallam S.J. (2009) Metagenome of a Versatile Chemolithoautotroph from Expanding Ocean
Dead Zones. Science 326: 578-582
Watanabe K., Watanabe K., Kodama Y., Syutsubo K. & Harayama S. (2000) Molecular
Characterization of Bacterial Populations in Petroleum-Contaminated Groundwater Discharged from Underground Crude Oil Storage Cavities Appl Env Microbiology 66(11): 4803-4809
Wommack K.E., Bhavsar J. & Ravel J. (2008) Metagenomics: Read Length Matters Appl Environ Microbiol 74(5): 1453-1463
Weatherburn M.W. (1967) Phenol-Hypochlorite Reaction for Determination of Ammonia Analytical Chemistry 39(8): 971-974
Williamson M.A. & Rimstidt J.D. (1992) Correlation between structure and thermodynamic
Xie W., Wang F., Guo L., Chen Z., Sievert S.M., Meng J., Huang G., Li Y., Yan Q., Wu S., Wang X., Chen S., He G., Xiao X. & Xu A. (2011) Comparative metagenomics of microbial
communities inhabiting deep-sea hydrothermal vent chimneys with contrasting chemistries.
ISME J 5: 414-426
Yamamoto M., Nakagawa S., Shimamura S., Takai K. & Horikoshi K. (2010) Molecular
characterization of inorganic sulfur-compound metabolism in the deep-sea
epsilonproteobacterium Sulfurovum sp. NBC37-1 Environmental Microbiology 12(5): 1144- 1153
Yamamoto M. & Takai K. (2011) Sulfur metabolisms in epsilon- and gamma-Protebacteria in
deep-sea hydrothermal fields Frontiers in Microbiology 2: Article 192 doi: 10.3389/fmicb.2011.00192
Ziemert N., Podell S., Penn K., Badger J.H., Allen E. & Jensen P.R. (2012) The Natural
Product Domain Seeker NaPDoS: A Phylogeny Based Bioinformatic Tool to Classify Secondary Metabolite Gene Diversity PLoS ONE 7(3): e34064
Zhang J.-Z. & Millero F.J. (1993) The products from the oxidation of H2S in seawater
Geochimica et Cosmochimica Acta 57: 1705-1718
Zopfi J., Ferdelman T.G. & Fossing H. (2004) Distribution and fate of sulfur intermediates –
sulfite, tetrathionate, thiosulfate, and elemental sulfur – in marine sediments Sulfur
Biogeochemistry – Past and Present. Geological Society of America Special Paper 379: 97-116 Zumft WG (1997) Cell Biology and Molecular Basis of Denitrification. Microbiology and Molecular Biology Reviews 61(4): 533-616
APPENDIX 1 PYTHON SCRIPT grab_seqs.py
"""Program grab_seqs.py
written by [email protected] version 2.7
12 November 2011
Documentation
Purpose of the program is to enable the user to quickly retrieve specific DNA, RNA or amino acid sequences from a much larger fasta file of sequences. The program reads an input file containing the list of
unique sequence identifiers (seqID) that the user wants to retrieve. It then searches the larger fasta file of sequences for each identifier in turn, and when it
is found, copies the sequence, and its identifier, into a new file. The new file is in fasta format ready to use in BLAST. The larger file is not altered (ie target sequences are copied not removed).
The program can cope with an input file that contains duplicates of
the same sequence identifier (eg if using to grab seqs for genome assembly) and it provides a report (a log file) if a sequence is not found, or if it is found more than once (log file is empty if each sequence identifier is only included once in the input sequence identifier list, and only appears once in the target fasta file).
The output fasta file contains sequences in the same order as the list of sequence identifiers in the input file.
If a seqID is included only once in input sequence identifier file, and the sequence is only
included once in the target fasta file, then there will be no message in the log file. If a sequence is not found at all, there will be a message saying that. If any seqID is included more than once in the input sequence identifier file (ie is included x times) then there will be x messages for that seqID. If target sequence is only found once in the target fasta file then each message will say the sequence is found x times. If the messages say that the sequence was found more than x times (eg y times) then it appears more than once (ie y/x times) in the target fasta file; that should never happen, each sequence identifier should only be attached to a single sequence in the target fasta file. So if y does not equal x, further investigation is needed to tell if the input fasta file has duplicate entries for that sequence or, worse, if the same sequence identifier is attached to two different sequences.
the same directory, and you need to be in this directory in the terminal. Also need to make sure that the directory doesn't already contain files with the name output_seqs.txt or you will end up adding the new seqs to that file, instead of creating new file (so once program has run, be sure to re-name that file with a more descriptive, and unique, name). Similarly need to ensure directory doesn't already contain a file named log_file.txt
Program reports when it has finished in the terminal.
Program is run from the command line with the command:
python grab_seqs.py argument1 argument2
leave a space (no comma) between program name and the arguments (and between arguments)
argument1 is the file name of the file containing the sequence identifiers (including suffix such as .txt) argument2 is the target file of fasta seqs (including suffix such as .txt)
The sequence identifier input file should have each sequence identifier on a separate line,
with no spaces before or after, and no carriage return after the last entry. The sequence identifiers should not have the > sign before them.
IMPORTANT - it makes a huge difference whether the carraige return \r is used (program assumes this) or if the carriage return \n is used (program doesn't work). If the sequence identifiers were in a single column on an Excel spreadsheet and copied to a txt file they should be in the correct format automatically. Can use the script check_seq_ident_input.py to check seq ID input files are in the correct format (it splits seqIDs into a list, using the \r to separate, just like this program does, and then prints the list - so it is possible to see if the list seqs1 has been
created correctly ie one list item = one seqID or not). If the return button was used to create a new line this will leave a \n at the end - so amend grab_seqs.py to recognize \n instead of \r
The target fasta file (from which the sequences are going to be copied) MUST have a carriage return at the end of the file, after the last sequence.
"""
import sys # sys module allows program to read command line arguments arg1 = sys.argv[1]
# imports 1st command line argument (input file of sequence identifiers) and assigns it a variable (string) name
arg2 = sys.argv[2]
# imports 2nd command line argument (target file of fasta sequences) and assigns it a variable name
myfile1 = open(arg1,'r') # opens file for reading myfile2 = open(arg2,'r')
# creates new file with the name output_seqs.txt to which output (ie sequences program has copied) will be written
# if there is already a file with the name output_seqs.txt the sequences will be added to the end of that file
myfile4 = open('log_file.txt','a')
# creates a new file (or opens existing file if it exists) with name log_file.txt s1 = myfile1.read() # program reads file into 1 string, with variable name s1 s2 = myfile2.read()
seqs1 = s1.split('\r')
# creates a list where each sequence identifier is a list entry (function splits fasta string at \r) # As this is a list, if it is printed all entries will be in ''
seqs2 = s2.split('>')
# creates a list where each sequence in fasta string (split at >) is a list entry, in '' with > removed output1 = []
for seqID in seqs1: for seq in seqs2: if seqID in seq:
output1.append(seq) else:
pass
output1.insert(0,'')
output2 = '>'.join(output1) # re-joins list into string, removing '' and replacing > for seqID in seqs1:
a = output2.count(seqID) if a == 0:
myfile4.write(seqID)
myfile4.write(' was not found\n') elif a == 1:
pass elif a >= 1: b = str(a)
myfile4.write(seqID)
myfile4.write(' was copied ') myfile4.write(b)
myfile4.write(' times\n')
# reports on whether each sequence was found once (no message), not at all, or multiple times myfile3.write(output2) # writes output2 into the file 'output_seqs.txt'
myfile1.close() # closes file myfile2.close()
myfile3.close() myfile4.close()
APPENDIX 2 PYTHON SCRIPT check_seq_ident_input.py
"""check_seq_ident_input.py
written by [email protected] version 2.0
12 November 2011
Purpose of the program is the check the format of a file of sequence identifiers
which will be used as an input file for grab_seqs.py (as the first argument for that program), ie the file which contains the sequence identifiers of the sequences which one wants to copy from a large fasta file of sequences (the second argument for grab_seqs.py).
The sequence identifier input file should have each sequence identifier on a separate line, with no spaces before or after, and no carriage return after the last entry.
This program will check that format by printing a list of each sequence identifier. If the input file is formatted correctly, each sequence identifier will be a separate list item, there will be no extra spaces or extra characters and no empty list items (will get an empty list item at the end of the printed list if there is a carriage return after the last sequence identifier in the input file)
To run from the command line, command is:
python check_seq_ident_input.py name_of_seq_ident_input_file.txt
leave a space between the program name and the name of the file to be checked
"""
import sys # sys module allows program to read command line arguments arg1 = sys.argv[1]
# imports 1st command line argument (input file of sequence identifiers) and assigns it a variable (string) name
myfile1 = open(arg1,'r') # opens file for reading
s1 = myfile1.read() # program reads file into 1 string, with variable name s1 seqs1 = s1.split('\r')
# creates a list where each sequence identifier is a list entry (function splits fasta string at carriage return ie '\r')
# As this is a list, if it is printed all entries will be in '' print seqs1