
3. Results: Isolation and Characterisation of a Gene Cluster for Ergovaline Biosynthesis

3.1 Isolation and Analysis of the NRPS Gene PS12 and Upstream Sequence

3.1.2 LpsB, a Single-Module NRPS

The predicted translated product, LpsB, is 1352 amino acids in length, with a predicted unmodified molecular mass of 148 kDa. Analysis of the predicted amino acid sequence by InterProScan revealed that the single module contains an A-domain, a T-domain and a C-domain (Fig. 3.3). A fourth, uncharacterised domain of 301 amino acids is present at the amino-terminal end of the protein. A query of this region against the Conserved Domain Database (CDD) returned a weakly significant match (E-value 0.00001) to the carboxy end of C-domains, but no conserved C-domain motifs were discernible in this region. Manual analysis showed that all ten sequence motifs conserved among A-domains are present in the predicted LpsB protein, as is the conserved T-domain motif (Table 3.1). Although C-domain motifs are typically not as conserved as those for the other domains, the C3, C5 and C6 motifs were detected (Table 3.1).

Figure 3.3. Domain structure of LpsB. A, A-domain; T, thiolation (PCP)-domain; C, condensation domain. Triangles indicate position of conserved motifs described in Table 3.1.

Table 3.1. Conserved motifs found in LpsB

Motif   LpsB                    Consensus*
A1      LYSEEL                  L(T/S)YxEL
A2      IKAGGAFVLLDP            LKAGxAYL(V/L)P(L/I)D
A3      LAEDLVTSIVEVGDDK        LAYxxYTSG(S/T)TGxPKG
A4a     FQFS, FDVT, FDTL        FDxS
A5      HLYGASE                 NxYGPTE
A6      GELVIEGTIVGRGYL         GELxIxGxG(V/L)ARGYL
A7      YKSGDL                  Y(R/K)TGDL
A8      GRKDTQVKLRGQRVELGEVE    GRxDxQVKIRGXRIELGEIE
A9      LPSYMVP                 LPxYM(I/V)P
A10     NRKLLR                  NGK(V/L)DR
T       DDHFFQRGGDSL            DxFFxxLGG(H/D)S(L/I)
C1      nd                      SxAQxR(L/M)(W/Y)Xl
C2      nd                      RHExLRTxF
C3      VHHAVYDGYT              MHHxISDG(W/V)S
C4      nd                      YxD(F/Y)AVW
C5      TIPTVATIPCR             (I/V)GxFVNT(Q/L)(C/A)XR
C6      STDAWFE                 (H/N)QD(Y/V)PFE
C7      nd                      RDxSRNPL

*From Marahiel et al. (1997). a, May be any of three different potential A4 motifs. nd, not detected.
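The consensus notation in Table 3.1 can be read position by position: a parenthesised group lists the residues accepted at that position, and x (or X) accepts any residue. The following is a minimal sketch of how agreement between an LpsB motif and its consensus could be counted. It is not the manual analysis described above; the function names parse_consensus and agreement are hypothetical, and the comparison is ungapped, so it assumes that the motif and the consensus have the same number of positions.

```python
import re


def parse_consensus(pattern):
    """Split a consensus such as 'L(T/S)YxEL' into one entry per position.

    Each entry is a set of accepted residues, or None for the wildcard
    'x'/'X'. Lower-case residue letters are treated as upper case.
    """
    positions = []
    for group, single in re.findall(r"\(([^)]*)\)|(.)", pattern):
        if group:
            # Parenthesised alternatives, e.g. 'T/S' -> {'T', 'S'}
            positions.append({res.upper() for res in group.split("/")})
        elif single in "xX":
            positions.append(None)
        else:
            positions.append({single.upper()})
    return positions


def agreement(observed, pattern):
    """Return (matches, constrained positions) for an ungapped comparison."""
    positions = parse_consensus(pattern)
    if len(positions) != len(observed):
        raise ValueError("motif and consensus differ in length")
    constrained = [
        (res, allowed)
        for res, allowed in zip(observed, positions)
        if allowed is not None  # wildcard positions are not scored
    ]
    matches = sum(res in allowed for res, allowed in constrained)
    return matches, len(constrained)


if __name__ == "__main__":
    # Two rows transcribed from Table 3.1 (illustrative selection only).
    for name, motif, consensus in [
        ("A7", "YKSGDL", "Y(R/K)TGDL"),
        ("A9", "LPSYMVP", "LPxYM(I/V)P"),
    ]:
        hits, total = agreement(motif, consensus)
        print(f"{name}: {hits}/{total} constrained positions agree")
```

Because the comparison is ungapped and counts exact residue matches only, it will understate the agreement for motifs that align to the consensus with an offset or that carry conservative substitutions.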

If the LpsB protein were involved in the synthesis of ergovaline, its substrate would be D-lysergic acid. Bioinformatic methods for predicting A-domain substrates rely on the crystal structure of the GrsA PheA domain, for which the amino acids lining the binding site are known (Conti et al., 1997). The sequence of the 10 amino acids lining the binding site of an A-domain constitutes a so-called A-domain specificity code (Stachelhaus et al., 1999; Challis et al., 2000). The LpsB A-domain sequence was aligned with that of PheA, and the 10 amino acids likely to line the binding pocket were extracted, using the NRPS Predictor website. These amino acids were then aligned with those in a database of NRPSs with known substrates, which showed that 9 out of 10 of the N. lolii LpsB amino acids matched those found in the C. purpurea LpsB binding pocket (Table 3.2). No other significant matches were identified using this software. As lysergic acid contains an indole ring derived from tryptophan, the binding-pocket amino acids of tryptophan-activating A-domains from the characterised bacterial NRPSs CdaI, CdaIII (Hojati et al., 2002) and ComA (Chiu et al., 2001) were manually aligned with those of LpsB (Table 3.2). This showed a level of identity between the LpsB sequence and the tryptophan-activating A-domain sequences similar to that among the three tryptophan-activating A-domain sequences themselves. The corresponding sequence from an alanine-activating A-domain of HC-toxin synthetase (Scott-Craig et al., 1992) showed no such similarity (Table 3.2). This may suggest that the indole ring is the moiety of the lysergyl substrate that is recognised by the LpsB enzyme.

Table 3.2. Substrate-specificity determining amino acids

               Position in A-domain*
Protein†       235  236  239  278  299  301  322  330  331  517   Substrate
NlLpsB         D    V    F    S    V    G    L    I    M    K
CpLpsB         D    V    F    S    V    G    L    V    M    K     LSA
CdaI (III)     D    A    W    S    V    G    L    T    T    K     Trp
CdaIII (II)    D    G    W    A    V    A    R    T    T    K     Trp
ComA (II)      D    V    A    V    V    G    E    V    V    K     Trp
HTS1 (II)      D    A    G    G    C    A    M    V    A    K     Ala

*Based on the sequence of the phenylalanine-activating domain of GrsA (Conti et al., 1997).
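The comparison of specificity codes in Table 3.2 amounts to counting identical residues at the 10 binding-pocket positions. The following is a minimal sketch of that count, using the residues transcribed from Table 3.2. The names CODES, identity and pairwise_identities are hypothetical, and the sketch does not reproduce the NRPS Predictor software or its database search.

```python
from itertools import combinations

# Binding-pocket residues transcribed from Table 3.2, in the order of GrsA
# PheA positions 235, 236, 239, 278, 299, 301, 322, 330, 331, 517.
CODES = {
    "NlLpsB": "DVFSVGLIMK",
    "CpLpsB": "DVFSVGLVMK",
    "CdaI (III)": "DAWSVGLTTK",
    "CdaIII (II)": "DGWAVARTTK",
    "ComA (II)": "DVAVVGEVVK",
    "HTS1 (II)": "DAGGCAMVAK",
}


def identity(code_a, code_b):
    """Count positions at which two specificity codes carry the same residue."""
    if len(code_a) != len(code_b):
        raise ValueError("specificity codes must have the same length")
    return sum(a == b for a, b in zip(code_a, code_b))


def pairwise_identities(codes):
    """Return {(name_a, name_b): identical positions} for every pair."""
    return {
        (name_a, name_b): identity(codes[name_a], codes[name_b])
        for name_a, name_b in combinations(codes, 2)
    }


if __name__ == "__main__":
    for (name_a, name_b), score in pairwise_identities(CODES).items():
        print(f"{name_a} vs {name_b}: {score}/10 identical")
```

This is a simple identity count that weights all 10 positions equally; it takes no account of the chemical similarity of substituted residues.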

Figure 3.4. Multiple sequence alignments with representative bacterial COM domains.

A. Alignment of N. lolii (Nl) and C. purpurea (Cp) LpsB carboxy terminal amino acid sequences with bacterial donor COM domains. The TPSD motif at the junction between bacterial epimerisation and COM domains is underlined. Invariant residues in bacterial domains are marked with an asterisk.

B. Alignment of N. lolii and C. purpurea LpsA amino-terminal sequences with bacterial acceptor COM domains. The PLS motif at the junction between bacterial condensation and COM domains is underlined. An invariant leucine residue in bacterial domains is marked with an asterisk.

Bb, Brevibacillus brevis; Bl, Bacillus licheniformis; Sp, Streptomyces pristinaespiralis.

Accessions: GrsA, CAA33603; GrsB, CAA43838; BacB, AAC06347; BacC, AAC06348; SnbC, CAA72311; SnbDE, CAA72312

3.1.2.1 Analysis of Potential COM Domains in LpsB and LpsA

As the LpsB NRPS is predicted to interact with a second NRPS, LpsA, in order to catalyse formation of lysergyl peptide lactam, the N. lolii and C. purpurea LpsB carboxy-terminal sequences were aligned with those of bacterial NRPSs involved in multi-enzyme complexes, which contain short COM domains required for protein-protein interaction (Fig. 3.4). No significant alignment was found with these domains, although proline and leucine residues are found in conserved positions between LpsB proteins and bacterial COM domains.

The amino termini of the LpsA protein sequence from N. lolii and the LpsA-1 and LpsA-2 protein sequences from C. purpurea were aligned with acceptor COM domains from bacterial systems (Fig. 3.4). This analysis showed no significant alignment between the three LpsB-interacting proteins and COM domains. Interestingly, the three LpsA homologues also had very little identity with each other.