

APPENDIX 1 PYTHON SCRIPT grab_seqs.py

"""Program grab_seqs.py

written by [email protected] version 2.7

12 November 2011

Documentation

Purpose of the program is to enable the user to quickly retrieve specific DNA, RNA or amino acid sequences from a much larger fasta file of sequences. The program reads an input file containing the list of unique sequence identifiers (seqID) that the user wants to retrieve. It then searches the larger fasta file of sequences for each identifier in turn and, when it is found, copies the sequence and its identifier into a new file. The new file is in fasta format, ready to use in BLAST. The larger file is not altered (i.e. target sequences are copied, not removed).

The program can cope with an input file that contains duplicates of the same sequence identifier (e.g. if it is being used to grab seqs for genome assembly). It writes a report (a log file) if a sequence is not found, or if it is found more than once. The log file is empty if each sequence identifier is included only once in the input sequence identifier list and appears only once in the target fasta file.

The output fasta file contains sequences in the same order as the list of sequence identifiers in the input file.

If a seqID is included only once in the input sequence identifier file, and the sequence is included only once in the target fasta file, then there will be no message in the log file. If a sequence is not found at all, there will be a message saying that. If any seqID is included more than once in the input sequence identifier file (i.e. is included x times) then there will be x messages for that seqID. If the target sequence is found only once in the target fasta file then each message will say the sequence was found x times. If the messages say that the sequence was found more than x times (e.g. y times) then it appears more than once (i.e. y/x times) in the target fasta file. That should never happen: each sequence identifier should be attached to only a single sequence in the target fasta file. So if y does not equal x, further investigation is needed to tell whether the target fasta file has duplicate entries for that sequence or, worse, whether the same sequence identifier is attached to two different sequences.

The program, the sequence identifier input file and the target fasta file need to be in the same directory, and you need to be in this directory in the terminal. Also make sure that the directory does not already contain a file named output_seqs.txt, or the new seqs will be added to that file instead of to a new file (so once the program has run, be sure to rename that file with a more descriptive, and unique, name). Similarly, ensure the directory does not already contain a file named log_file.txt.

Program reports when it has finished in the terminal.

Program is run from the command line with the command:

python grab_seqs.py argument1 argument2

Leave a space (no comma) between the program name and the arguments (and between the arguments).

argument1 is the name of the file containing the sequence identifiers (including suffix such as .txt)

argument2 is the target file of fasta seqs (including suffix such as .txt)

The sequence identifier input file should have each sequence identifier on a separate line, with no spaces before or after, and no carriage return after the last entry. The sequence identifiers should not have the > sign before them.

IMPORTANT - it makes a huge difference whether lines end with the carriage return \r (program assumes this) or with the line feed \n (program doesn't work). If the sequence identifiers were in a single column on an Excel spreadsheet and copied to a txt file they should be in the correct format automatically. The script check_seq_ident_input.py can be used to check that seq ID input files are in the correct format: it splits the seqIDs into a list, using \r as the separator just like this program does, and then prints the list, so it is possible to see whether the list seqs1 has been created correctly (i.e. one list item = one seqID). If the return key was used to create each new line this will leave a \n at the end of each line, so amend grab_seqs.py to split on \n instead of \r.

The target fasta file (from which the sequences are going to be copied) MUST have a carriage return at the end of the file, after the last sequence.

"""

import sys  # sys module allows program to read command line arguments

arg1 = sys.argv[1]
# imports 1st command line argument (input file of sequence identifiers) and assigns it a variable (string) name
arg2 = sys.argv[2]
# imports 2nd command line argument (target file of fasta sequences) and assigns it a variable name

myfile1 = open(arg1,'r')  # opens file for reading
myfile2 = open(arg2,'r')
myfile3 = open('output_seqs.txt','a')
# creates new file with the name output_seqs.txt to which output (ie sequences program has copied) will be written
# if there is already a file with the name output_seqs.txt the sequences will be added to the end of that file
myfile4 = open('log_file.txt','a')
# creates a new file (or opens existing file if it exists) with name log_file.txt

s1 = myfile1.read()  # program reads file into 1 string, with variable name s1
s2 = myfile2.read()

seqs1 = s1.split('\r')
# creates a list where each sequence identifier is a list entry (function splits string at \r)
# As this is a list, if it is printed all entries will be in ''
seqs2 = s2.split('>')
# creates a list where each sequence in fasta string (split at >) is a list entry, in '' with > removed

output1 = []
for seqID in seqs1:
    for seq in seqs2:
        if seqID in seq:
            output1.append(seq)
        else:
            pass
output1.insert(0,'')
output2 = '>'.join(output1)  # re-joins list into string, removing '' and replacing >

for seqID in seqs1:
    a = output2.count(seqID)
    if a == 0:
        myfile4.write(seqID)
        myfile4.write(' was not found\n')
    elif a == 1:
        pass
    elif a > 1:
        b = str(a)
        myfile4.write(seqID)
        myfile4.write(' was copied ')
        myfile4.write(b)
        myfile4.write(' times\n')
# reports on whether each sequence was found once (no message), not at all, or multiple times

myfile3.write(output2)  # writes output2 into the file 'output_seqs.txt'

myfile1.close()  # closes file
myfile2.close()
myfile3.close()
myfile4.close()
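
A note on the matching step in grab_seqs.py: the test `seqID in seq` is a substring test, so an identifier such as seq1 also matches a record whose header (or any other part of the record) contains seq10. The sketch below is not the script above but an alternative arrangement of the same idea, written for Python 3 and offered only as an illustration. It assumes that the identifier is the first whitespace-delimited word of each header line, and it uses str.splitlines() so that \r, \n and \r\n line endings are all accepted. The function names parse_fasta and grab are hypothetical.

```python
def parse_fasta(text):
    """Map each identifier to the list of fasta records that carry it.

    Assumption: the identifier is the first word of the header line.
    """
    records = {}
    # Anything before the first '>' is not a record, so skip element 0.
    for chunk in text.split('>')[1:]:
        lines = chunk.splitlines()          # accepts \r, \n and \r\n
        if not lines or not lines[0].split():
            continue                        # no usable header line
        seq_id = lines[0].split()[0]
        record = '>' + '\n'.join(lines) + '\n'
        records.setdefault(seq_id, []).append(record)
    return records


def grab(id_text, fasta_text):
    """Return (fasta output, log messages) for the requested identifiers."""
    records = parse_fasta(fasta_text)
    wanted = [line.strip() for line in id_text.splitlines() if line.strip()]
    output, log = [], []
    for seq_id in wanted:                   # keeps the order of the input list
        hits = records.get(seq_id, [])      # exact match, not substring match
        if not hits:
            log.append(seq_id + ' was not found')
        elif len(hits) > 1:
            log.append('%s appears %d times in the target file'
                       % (seq_id, len(hits)))
        output.extend(hits)
    return ''.join(output), log


# Small usage example with made-up data
fasta = ">seq1 first\nACGT\nAC\n>seq10\nGGGG\n>seq2\nTTTT\n"
ids = "seq2\rseq1\rmissing"
out, log = grab(ids, fasta)
print(out)
print(log)
```

Unlike grab_seqs.py, this version returns the output text and the log messages rather than writing output_seqs.txt and log_file.txt, so the caller decides where they are saved. It also logs only identifiers that are missing or duplicated in the target file; an identifier repeated in the input list is simply copied once per occurrence.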

APPENDIX 2 PYTHON SCRIPT check_seq_ident_input.py

"""check_seq_ident_input.py

written by [email protected] version 2.0

12 November 2011

Purpose of the program is to check the format of a file of sequence identifiers which will be used as an input file for grab_seqs.py (as the first argument for that program), i.e. the file which contains the sequence identifiers of the sequences which one wants to copy from a large fasta file of sequences (the second argument for grab_seqs.py).

The sequence identifier input file should have each sequence identifier on a separate line, with no spaces before or after, and no carriage return after the last entry.

This program checks that format by printing a list of the sequence identifiers. If the input file is formatted correctly, each sequence identifier will be a separate list item, and there will be no extra spaces, no extra characters and no empty list items. (An empty list item appears at the end of the printed list if there is a carriage return after the last sequence identifier in the input file.)

To run from the command line, command is:

python check_seq_ident_input.py name_of_seq_ident_input_file.txt

leave a space between the program name and the name of the file to be checked

"""

import sys  # sys module allows program to read command line arguments

arg1 = sys.argv[1]
# imports 1st command line argument (input file of sequence identifiers) and assigns it a variable (string) name

myfile1 = open(arg1,'r')  # opens file for reading
s1 = myfile1.read()  # program reads file into 1 string, with variable name s1
myfile1.close()  # closes file

seqs1 = s1.split('\r')
# creates a list where each sequence identifier is a list entry (function splits string at carriage return ie '\r')
# As this is a list, if it is printed all entries will be in ''
print(seqs1)
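
The format rules described in the docstring above can also be checked automatically rather than by reading the printed list. The following is a minimal sketch under that assumption, written for Python 3; line_ending_counts, check_ids and check_file are hypothetical names, not part of the scripts in these appendices.

```python
def line_ending_counts(text):
    """Count CRLF, lone CR and lone LF line endings in text."""
    crlf = text.count('\r\n')
    return {
        'CRLF': crlf,
        'CR': text.count('\r') - crlf,   # carriage returns not followed by \n
        'LF': text.count('\n') - crlf,   # line feeds not preceded by \r
    }


def check_ids(text):
    """Return a list of format problems; an empty list means none were found."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line == '':
            problems.append('line %d is empty' % number)
        elif line != line.strip():
            problems.append('line %d has leading or trailing spaces' % number)
        elif line.startswith('>'):
            problems.append('line %d starts with >' % number)
    if text.endswith(('\r', '\n')):
        problems.append('file ends with a line break')
    return problems


def check_file(path):
    """Read a sequence identifier file and report on its format."""
    # newline='' stops Python 3 from translating \r and \r\n to \n on reading
    with open(path, newline='') as handle:
        text = handle.read()
    return line_ending_counts(text), check_ids(text)
```

A file suitable for grab_seqs.py as written would show only CR line endings and an empty problem list. A file showing only LF endings is the case in which grab_seqs.py needs to split on \n instead.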