2. Material and Methods

2.2. Methods

2.2.6. Bioinformatic analysis

2.2.6.1. Calculation of genetic linkage

LOD scores and genetic linkage were calculated with the Kosambi function using the JoinMap program (van Ooijen and Voorrips, 2001).
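The two quantities named above can be illustrated with a minimal sketch. It uses the textbook Kosambi mapping function and a two-point LOD score for phase-known data; it does not reproduce the internal calculations of JoinMap, and the function names as well as the example counts are hypothetical.

```python
import math


def kosambi_cm(r):
    """Kosambi map distance in centiMorgan for a recombination fraction r."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    # d [Morgan] = 1/4 * ln((1 + 2r) / (1 - 2r)); multiplied by 100 for cM
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def lod_score(recombinants, total, r):
    """Two-point LOD score: log10 likelihood at r versus r = 0.5.

    Assumption: phase-known meioses, each scored as recombinant or not.
    """
    if not 0.0 < r <= 0.5:
        raise ValueError("recombination fraction must satisfy 0 < r <= 0.5")
    if not 0 <= recombinants <= total:
        raise ValueError("recombinants must lie between 0 and total")
    non_recombinants = total - recombinants
    log_l_r = (recombinants * math.log10(r)
               + non_recombinants * math.log10(1.0 - r))
    log_l_null = total * math.log10(0.5)
    return log_l_r - log_l_null


# Hypothetical example: 10 recombinants among 100 scored meioses.
r_hat = 10 / 100
distance_cm = kosambi_cm(r_hat)
lod = lod_score(10, 100, r_hat)
```

For small recombination fractions the Kosambi distance is close to 100·r cM, and it grows faster than r as r approaches 0.5 because double crossovers are taken into account.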

2.2.6.2. Analysis of the Oenothera plastid genomes

2.2.6.2.1. Repeat analysis

Two types of repeats, palindromes and direct tandems, were analyzed with the programs palindrome and etandem of the EMBOSS suite (Rice et al., 2000). The minimal cut-off identity between two copies was set to 90% for both repeat types. If a tandem repeat had multiple copies, each copy was required to have at least one other member matching this constraint. Copy sizes of 16 to 100 bp for palindromic repeats and 10 to 100 bp for tandem repeats were investigated. The gap size between palindromes was restricted to a maximal length of 3 kb. Overlapping repeats with sequence similarity were grouped into one repeat motif. For palindromes, both the direct and the inverted part of two repeats had to overlap. For each repeat motif, the longest element is given as its representative. Inversion breakpoints were analyzed separately in the plastomes of the subsection Oenothera.
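The grouping step can be sketched as follows. This is a simplified illustration that groups repeats by positional overlap only and then selects the longest element; the sequence similarity criterion and the special rule for palindromes are not implemented, and the coordinates are hypothetical.

```python
def group_overlapping(repeats):
    """Group (start, end) intervals (end exclusive) that overlap in position."""
    groups = []
    current = []
    current_end = None
    for start, end in sorted(repeats):
        if current and start < current_end:
            # Overlaps the running group: extend it.
            current.append((start, end))
            current_end = max(current_end, end)
        else:
            # No overlap: close the running group and start a new one.
            if current:
                groups.append(current)
            current = [(start, end)]
            current_end = end
    if current:
        groups.append(current)
    return groups


def representative(group):
    """Return the longest element of a group."""
    return max(group, key=lambda interval: interval[1] - interval[0])


# Hypothetical repeat coordinates.
repeats = [(100, 120), (10, 30), (25, 60)]
motifs = group_overlapping(repeats)
representatives = [representative(group) for group in motifs]
```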

2.2.6.2.2. Analysis of variable amino acid sites

The degree of conservation of single amino acid sites among 30 reference species, covering di- as well as several monocotyledonous plant species, was investigated to estimate the impact of single amino acid exchanges on PGI within Oenothera. To exclude variable positions that were computationally deduced from misaligned indel regions, the multiple alignments were also inspected manually. In general, mutations exchanging amino acids with highly different biochemical properties have an increased likelihood to alter or destroy protein function. In contrast, sites that are highly conserved in alignments of the reference species are less likely to undergo a drastic change. The distribution of biochemical properties of each Oenothera site was compared with the property distribution of the corresponding site in all reference species, [...] particular site were derived. Statistically significant differences between the two distributions were tested by a non-parametric Wilcoxon rank sum test (http://www.r-project.org/index.html). P-values ≤ 0.05 were considered significant. Sites representing an Oenothera-specific adaptation, i.e. sites that are similar within each data set but dissimilar between both sets, are excluded by the test. To gather additional evidence for selected sites, it was checked whether the Oenothera mutations are located within known functional regions. Protein domains were detected with HMMER 2.3.1 and the PFAM database, release of July 22, 2005 (Bateman et al., 2004). Only alignments with e-values ≤ 1e-10 were considered. Whether the corresponding domain position is highly conserved was checked by manual inspection of HMM-logos of the PFAM domain (http://www.pfam.org). Transmembrane domains were deduced from the InterPRO database (http://www.ebi.ac.uk/interpro) and analyzed with the online DAS server (Cserzo et al., 1997). Since an alignment of single amino acids is not possible in regions of size variation, variant Oenothera proteins were inspected manually for functional domains.
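The rank sum comparison of two distributions described above can be sketched in a few lines. The analysis itself was run in R; the version below is only an illustration that uses the normal approximation without tie correction, so its p-values can differ from the exact test for small samples. The numeric property scores in the example are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical property scores for one site.
oenothera_scores = [1, 2, 3, 4, 5]
reference_scores = [6, 7, 8, 9, 10]
z, p_value = rank_sum_test(oenothera_scores, reference_scores)
significant = p_value <= 0.05
```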

2.2.6.2.3. Computational prediction of sigma factor and T7 binding sites

To search for polymerase binding sites, multiple alignments of intergenic regions were delimited either by the 5’ neighbouring gene or by a maximum size of 600 bp. No attempt was made to differentiate between individual σ-factors, due to the overlapping binding specificity of the different σ-factors of the bacterial-like polymerase (PEP) in plastids (Liere and Börner, 2006). Candidate sites were therefore predicted with the consensus sequence TYRMNN(N)16-20WANNWT, a search pattern that covers a wide range of experimentally reported sites (here and below, a number range following a nucleotide gives its allowed number of repetitions). This pattern is similar to, but less specific than, the consensus suggested by Homann and Link (2003) and Kanamaru and Tanaka (2004). Matches to the consensus ATA0-1N0-1GAA(N)15-23YRT (Silhavy and Maliga, 1998; Kapoor and Sugiura, 1999) were defined as binding sites of the phage-type polymerase (NEP), representing type Ib NEP promoters. The sequences were not investigated for type Ia and type II NEP promoters. The consensus of type Ia (YRTa), which can be considered a derivative of the type Ib consensus, is too unspecific for computational prediction. The type II NEP promoter element is known from a single case only (Liere and Börner, 2006). Candidate binding sites were positioned within the multiple alignments and edited under manual supervision to correct for misaligned regions, e.g. due to small repeats.
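The two consensus patterns can be translated into regular expressions, for instance as sketched below. The reading of the number ranges as repeat counts of the preceding nucleotide is an assumption, the helper names are hypothetical, and the test sequences are constructed for illustration only. The search runs on ungapped sequence, whereas the analysis described above worked on multiple alignments with manual correction.

```python
import re

# IUPAC nucleotide codes used in the two consensus sequences.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "W": "[AT]",
    "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Replace IUPAC letters by character classes; keep {m,n} quantifiers."""
    return "".join(IUPAC.get(character, character) for character in pattern)


# Assumed reading: TYRMNN (N)16-20 WANNWT
PEP_REGEX = iupac_to_regex("TYRMNNN{16,20}WANNWT")
# Assumed reading: AT A(0-1) N(0-1) GAA (N)15-23 YRT
NEP_IB_REGEX = iupac_to_regex("ATA{0,1}N{0,1}GAAN{15,23}YRT")


def find_sites(regex, sequence):
    """Return (start, match) for every start position with a match.

    A lookahead is used so that overlapping candidates are reported;
    only one match (the first found by the regex engine) is kept per start.
    """
    lookahead = re.compile("(?=(" + regex + "))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence.upper())]


# Constructed test sequences (not taken from any plastome).
pep_example = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"
nep_example = "ATAGAA" + "C" * 15 + "CAT"
pep_sites = find_sites(PEP_REGEX, pep_example)
nep_sites = find_sites(NEP_IB_REGEX, nep_example)
```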

2.2.6.2.4. Prediction of Shine-Dalgarno sequences

To search for candidate Shine-Dalgarno regions, sequences 50 bp upstream of the start codon were investigated using the program free2bind (Starmer et al., 2006) and the 3’ sequence of the Oenothera 16S rRNA. A minimum free energy of 4.4 kcal and a maximum distance to the start codon of 23 bp were required for the reported matches.

2.2.6.2.5. Calculation of phylogenetic trees

Phylogenetic trees were generated from multiple codon-based alignments of the 47 genes that are variable among the five Oenothera plastomes. Including the corresponding sequences of the Lotus japonicus plastome (Kato et al., 2000) as outgroup for tree rooting, the dataset comprises 44,472 aligned characters in six species. Trees were inferred with the Neighbor-Joining (NJ), Maximum-Likelihood (ML) and Maximum Parsimony (MP) methods of the PHYLIP package (Felsenstein, 1993). Bootstrap analysis for NJ and ML was performed with 1000 random samples each. In addition, gene-specific phylogenetic trees for all variable genes were determined by NJ and ML. A species tree was then built from the individual gene trees using the consense program of PHYLIP. Trees for non-coding sequences were derived from 76 intergenic regions that showed nucleotide substitutions between the five Oenothera plastomes.
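The resampling behind the bootstrap analysis can be illustrated as follows. The sketch draws alignment columns with replacement, which is the general principle of the non-parametric bootstrap; it is not the PHYLIP implementation, and the toy alignment, the sequence labels and the function name are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Draw one bootstrap replicate by sampling columns with replacement.

    alignment: dict mapping a sequence name to an aligned sequence;
    all sequences must have the same length.
    """
    lengths = {len(sequence) for sequence in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_columns = lengths.pop()
    columns = [rng.randrange(n_columns) for _ in range(n_columns)]
    return {
        name: "".join(sequence[c] for c in columns)
        for name, sequence in alignment.items()
    }


# Toy alignment with placeholder labels.
toy_alignment = {
    "plastome_1": "ATGGCA",
    "plastome_2": "ATGGTA",
    "outgroup": "ATGACA",
}
rng = random.Random(1)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(1000)]
```

Each replicate would then be passed to the tree-building method, and the support of a branch is the fraction of replicate trees that contain it.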

2.2.6.2.6. Determination of Ka/Ks-values

Synonymous and non-synonymous substitution rates were estimated with the yn00 program of the PAML package (Yang, 1997). Ka and Ks were determined by the Nei-Gojobori method as implemented in yn00, and F3x4 was selected as substitution matrix. Rates were estimated from pairwise codon-based alignments for protein-coding genes variable among at least two of the five Oenothera plastomes. For five plastomes there are 10 pairwise combinations per gene, resulting in a total of 780 rates for all genes and 470 rates for variable genes. The ratio ω = Ka/Ks could not always be computed (e.g. for Ks = 0) and could therefore be determined for only 215 pairwise combinations. To compare average Ka and Ks rates between species, a concatenated alignment of the individual protein-coding regions was analyzed.
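The counts given above follow from simple combinatorics, as the short sketch below shows. The plastome labels are placeholders, the total of 78 genes is inferred from the 780 reported rates and is not stated explicitly, and the omega helper is a hypothetical illustration of the Ks = 0 case rather than part of yn00.

```python
from itertools import combinations


def omega(ka, ks):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ks == 0:
        return None
    return ka / ks


plastomes = ["P1", "P2", "P3", "P4", "P5"]  # placeholder labels
pairs = list(combinations(plastomes, 2))
n_pairs = len(pairs)  # 5 choose 2

variable_genes = 47
all_genes = 780 // n_pairs  # inferred from the reported 780 rates

rates_variable = variable_genes * n_pairs
rates_all = all_genes * n_pairs
```

Pairs with Ks = 0 return None and have to be left out of any summary of ω, which is why fewer ω values than rates are available.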

In document Greiner, Stephan (2008): Oenothera, a unique model to study the role of plastids in speciation. Dissertation, LMU München: Fakultät für Biologie (Page 63-66)