4.3.1 Level of Conservation
The median identity between CSP protein sequences is 34%. The median and the range of identities found within species are displayed in figure 4.1. These levels of identity are well above those observed for OBPs, which are typically around 20% or below (see chapter 3 and [63]).
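As an illustration of how such a summary statistic can be obtained, the sketch below computes the median pairwise identity over a set of aligned sequences. It is a minimal example under stated assumptions: the function names and the toy alignment are hypothetical, and identity is taken as the fraction of matching residues over columns where neither sequence has a gap, which may differ from the definition used for figure 4.1.

```python
from itertools import combinations
from statistics import median

# Gap symbols: ASCII hyphen and the Unicode minus sign.
GAPS = set("-\u2212")


def percent_identity(seq_a, seq_b):
    """Percent identity over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a in GAPS or res_b in GAPS:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def median_pairwise_identity(aligned):
    """Median of percent identity over all unordered pairs of sequences."""
    pairs = combinations(aligned.values(), 2)
    return median(percent_identity(a, b) for a, b in pairs)


# Toy alignment (hypothetical data, not taken from the CSP dataset).
toy = {"s1": "ACDE-F", "s2": "ACDQ-F", "s3": "AC-EGF"}
toy_median = median_pairwise_identity(toy)
```

The choice of denominator matters for gappy alignments; dividing by the full alignment length instead would give lower values.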
4.3.2 Phylogeny
Figure 4.3 shows a neighbour-joining tree based on all the CSP sequences from the genomes of A. gambiae, A. mellifera, B. mori, D. melanogaster and T. castaneum for which the full coding sequence could be annotated. Also included were 22 unique CSP sequences from L. migratoria available from GenBank, as representatives of the non-eumetabolan Neoptera, and the non-insect sequences that were assembled from the traces of D. pulex (Arthropoda, Mandibulata, Crustacea) and I. scapularis (Arthropoda, Chelicerata).

AmelCSP2 1 −−MASAIKALLIVCALFIYTVT−−−−−−−−−−−−−−−−−−−AETEEGQSGRSRVSDEQLNMALSDQRYLRRQLKCALGEA 59
AmelCSP3 1 −−−−−−MKVSIICLVLMAAIV−−−−−−−−−−−−−−−−−−−LVAARPDESYTSKFDNINVDEILHSDRLLNNYFKCLMDEG 55
AmelCSP4 1 −−−−−−MKTI−LIALVPVCFL−−−−−−−−−−−−−−−−−−−LGEVFSEDKYTTKYDNVDIDVVLNTERLLNAYVNCLLDQG 54
AmelCSP5 1 −−−−−−MKIKILLFFTILALIN−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−VKAQDDISKFLKDRPYVQKQLHCILDRG 44
AmelCSP6 1 −−−−−−MKIY−−−ILLFVLVT−−−−−−−−−−−−−−−−−−−−ITCVIAEDYTTKYDDMDIDRILQNGRILTNYIKCMLDEG 51
DmelOS-D 1 MGQPGFRRAIGHVSLVVALMCTTCFQVEGLPHPPATSPSPMMERMVEQAYDDKFDNVDLDEILNQERLLINYIKCLEGTG 80
DmelPhk2 1 −−−−−−MKMI−−−−LALVVLG−−−−−−−−−−−−−−−−−−−LVLVAAEDKYTTKYDNIDVDEILKSDRLFGNYFKCLVDNG 51
DmelPhk3 1 −−−−−−MKAS−LALVFCVCVG−−−−−−−−−−−−−−−−−−−LAAAAPEKTYTNKYDSVNVDEVLGNNRVLGNYLKCLMDKG 54
DmelCSP4 1 −−−MLLLNKNRVISLVVNFIF−−−−−−−−−−−−−−−−−−−−LIILISS−−SVQADERNINKLLNNQVVVSRQIMCILGKS 55
AgamCSP1 1 −−−−−−MKLF−−−−VVVALAL−−−−−−−−−−−−−−−−−−−VAAVAAQDKYTSKYDNINVDEILKSDRLFGNYYKCLLDQG 51
AgamCSP2 1 −−−−−−MKLF−−−−VAIAFAL−−−−−−−−−−−−−−−−−−−LALAAAQEQYTTKYDGIDLDEILKSDRLFNNYFKCLMDEG 51
AgamCSP3 1 −−−−−−MKFF−−−−VVVALAL−−−−−−−−−−−−−−−−−−−VAAVAAQDKYTTKYDGVDLDEILKSDRLFNNYYKCLMDTG 51
AgamCSP4 1 −−−−−−MERF−LLLLLFVAIV−−−−−−−−−−−−−−−−−−−LGETANET−YVTKYDNIDLEEIFSSKRLMDNYMNCLKNVG 53
AgamCSP5 1 −−−−−−MRKVWLLASVVLAFLDFVK−−−−−−−−−−−−−−−−SQEVARTLYSTRYDNLDIDTILASNRLVTNYVDCLLSRK 58
AgamCSP6 1 −−−−−−MKHLTMVAIFAMVVV−−−−−−−−−−−−−−−−−−−LASA−−−QKYTDKFDNIDVDRVLSNDRILNNYLKCLLDKG 52
AgamCSP7 1 MSSKALPNLFMLSAAVIVVMA−−−−−−−−−−−−−−−−−−−−ALVIVGPQPAAANDSQNINRLLNNQVIVSRQIMCVLEKS 60

AmelCSP1 54 SCLTPDSVFFKSHITEAFQTQCKKCTEIQKQNLDKLAEWFTTNEPEKWNHFVEIMIKKKDEGA−−−−−−−−−−−−−−−−− 116
AmelCSP2 60 PCD−PVGRRLKSLAPLVLRGACPQCSPEETRQIKKVLSHIQRTYPKEWSKIVQQYAGVS−−−−−−−−−−−−−−−−−−−−− 117
AmelCSP3 56 RCT−AEGNELKRVLPDALATDCKKCTDKQREVIKKVIKFLVENKPELWDSLANKYDPDKKYRVKFEEEA−−−−−−KKLGI 128
AmelCSP4 55 PCT−PDAAELKRNLPDALENECSPCSEKQKKIADKVVQFLIDNKPEIWVLLEAKYDPTGAYKQHYLQN−−−−−RVKEESY 128
AmelCSP5 45 HCD−VIGKKIKELLPEVLNNHCNRCTSRQIGIANTLIPFMQQNYPYEWQLILRRYKIMKYY−−−−−−−−−−−−−−−−−−− 104
AmelCSP6 52 PCT−NEGRELKKILPDALSTGCNKCNEKQKHTANKVVNYLKTKRPKDWERLSAKYDSTGEYKKRYEHGL−−−−−−QFAKN 124
DmelOS-D 81 PCT−PDAKMLKEILPDAIQTDCTKCTEKQRYGAEKVTRHLIDNRPTDWERLEKIYDPEGTYRIKYQEMK−−−−−−SKANE 153
DmelPhk2 52 KCT−PEGRELKKSLPDALKTECSKCSEKQRQNTDKVIRYIIENKPEEWKQLQAKYDPDEIYIKRYRATA−−−−−−EASGI 124
DmelPhk3 55 PCT−AEGRELKRLLPDALHSDCSKCTEVQRKNSQKVINYLRANKAGEWKLLLNKYDPQGIYRAKHEGH−−−−−−−−−−−− 121
DmelCSP4 56 ECD−QLGLQLKAALPEVITRKCRNCSPQQAQKAQKLTTFLQTRYPDVWAMLLRKYDSA−−−−−−−−−−−−−−−−−−−−−− 112
AgamCSP1 52 RCT−PDGNELKRILPDALQTNCEKCSEKQRDGAIKVINYLIQNRKDQWDVLQKKFDPENKYLEKYRGQA−−−−−−QKEGI 124
AgamCSP2 52 RCT−PDGNELKKILPEALQTNCEKCSEKQRSGAIKVINYVIENRKEQWDALQKKYDPENLYVEKYREEA−−−−−−KKEGI 124
AgamCSP3 52 RCT−PDGNELKRILPDALKTDCAKCSEKQKSGTEKVINYLIDNRKDQWENLQKKYDPENIYVNKYREDA−−−−−−KKKGI 124
AgamCSP4 54 PCT−PDGRELKDNLPDALMSDCVKCSEKQRIGSDKVIKFIVANRPDDFAILEQLYDPTGEYRRKYMQSDALAEHVKQEDR 132
AgamCSP5 59 PCP−PEGKDLKRILPEALRTKCARCSPIQKENALKIITRLYYDYPDQYRALRERWDPSGEYHRRFEEYLRGLQFNQIGGS 137
AgamCSP6 53 PCT−QEGRELKKTLPDALKTNCEKCSEKQRTSSRKVIAHLEERKPQEWKKLLDKYDPEGIYKSKFEKIN−−−−−−KRS−− 123
AgamCSP7 61 PCD−QLGRQLKAALPEVIQRNCRNCSPQQAQNAQKLTNFLQTRYPEVWAMLIRKYGAV−−−−−−−−−−−−−−−−−−−−−− 117

Figure 4.2: Alignment of the predicted CSP polypeptides of A. mellifera, D. melanogaster and A. gambiae. Conserved residues are highlighted, and the signal peptides are in yellow boxes. The intron position is indicated by the red separator.
Overall, the phylogenetic information content of these sequences is rather poor: a likelihood-mapping analysis conducted with the TREE-PUZZLE software shows that more than a third of 10,000 random quartets are unresolved or only partially resolved. As a result, a number of clades in this tree have little bootstrap support. The phylogeny is nevertheless informative, and the global structure of the tree offers a novel view of the evolution of the CSP gene family.
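Bootstrap support is obtained by rebuilding the tree from many pseudo-replicate alignments in which columns are resampled with replacement, and counting how often each clade is recovered. The sketch below shows only the resampling step; the function name and toy alignment are hypothetical, and the tree-building step and the software actually used are not reproduced here.

```python
import random


def bootstrap_alignment(aligned, rng):
    """Return one bootstrap replicate: columns drawn with replacement."""
    names = list(aligned)
    length = len(aligned[names[0]])
    if any(len(aligned[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as there are columns, with replacement.
    columns = [rng.randrange(length) for _ in range(length)]
    # Apply the same column sample to every sequence to keep columns intact.
    return {name: "".join(aligned[name][c] for c in columns) for name in names}


# Toy alignment (hypothetical data); a seeded generator keeps runs repeatable.
toy = {"a": "MKLF", "b": "MKFF", "c": "MRKV"}
replicate = bootstrap_alignment(toy, random.Random(0))
```

A clade with low support is one that appears in few of the trees built from such replicates, which is consistent with sequences carrying little phylogenetic signal.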
The majority of CSPs share a common motif at their N-termini (YTTKYDN[VI][ND][LV]DEIL). CSPs containing this motif are found in all the insects considered here and in the tick I. scapularis, suggesting an older origin than the split between Chelicerata and Mandibulata.
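The motif above is already written in a pattern syntax that maps directly onto a regular expression. The sketch below searches for it in two ungapped N-terminal sequences taken from the alignment in figure 4.2; the function name is hypothetical, and a strict match is only one possible criterion for deciding that a sequence carries the motif.

```python
import re

# Consensus N-terminal motif quoted in the text, used as a regular expression.
CSP_MOTIF = re.compile(r"YTTKYDN[VI][ND][LV]DEIL")


def find_csp_motif(sequence):
    """Return the 0-based start of the first strict motif match, or None."""
    match = CSP_MOTIF.search(sequence)
    return match.start() if match else None


# N-terminal parts of two sequences from figure 4.2, with gaps removed.
dmel_phk2 = "MKMILALVVLGLVLVAAEDKYTTKYDNIDVDEILKSDRLFGNYFKCLVDNG"
dmel_csp4 = "MLLLNKNRVISLVVNFIFLIILISSSVQADERNINKLLNNQVVVSRQIMCILGKS"

phk2_hit = find_csp_motif(dmel_phk2)
csp4_hit = find_csp_motif(dmel_csp4)
```

Several sequences in the alignment differ from the consensus at one or two positions (AgamCSP1, for instance, reads YTSKYDN), so a strict search of this kind would under-count motif-bearing CSPs; a tolerant match or a profile would be needed for a real survey.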
A group of CSPs has clearly departed from this pattern and forms a monophyletic clade well supported by bootstrap. This clade contains CSPs from most of the insects under study and from the crustacean D. pulex. This group is characterised by a truncated C-terminus, lacks what would normally be the sixth alpha helix in the 3-D structure, and has a glutamine residue conserved at the third position before the first conserved cysteine residue (QxxC), a position normally occupied by a highly conserved tyrosine (YxxC).
Figure 4.3: Phylogeny of the CSP protein family in arthropods. An unrooted tree was constructed with aligned protein sequences from A. mellifera, T. castaneum, D. melanogaster, A. gambiae, B. mori, L. migratoria, D. pulex and I. scapularis using neighbour-joining. The red circles indicate nodes with more than 70% bootstrap support. The clade of orthologous divergent CSPs is highlighted in yellow, and paralogous expansions are highlighted in blue.
Most of the other clades supported by bootstrap are lineage-specific expansions. Such expansions have occurred in L. migratoria, T. castaneum and, to a lesser extent, in A. gambiae. In these insects, genes grouped in such clades are typically found adjacent in the genome, suggesting an evolution by tandem duplications. Interestingly, honey bee genes organised in tandem and D. melanogaster CSPs from the same cytological band do not show a close phylogenetic relationship, and may be the product of very ancient duplications.
4.3.3 Tests for Positive Selection
The predicted binding sites of vertebrate olfactory receptors have a higher amino acid substitution rate than the rest of the protein [124]. It has also been shown that some of the genes involved in chemosensation are under positive selection pressure [187]. These include gustatory receptors in vertebrates [151, 187], olfactory receptors in nematodes [161] and OBPs in the honey bee (see section 3.3.2). In summary, the binding sites of proteins implicated in chemosensation seem to evolve faster as a result of positive selection pressure.
The question of whether positive selection has been shaping the sequences of CSP genes was therefore assessed. Two datasets were used, namely the large expansion specific to L. migratoria and that of T. castaneum identified in the previous section. Positive selection pressure on these sequences was not detected; it therefore appears that these genes are evolving neutrally.
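The text does not state which test was applied. One widely used approach, shown here purely as an assumption, compares two nested codon models (one allowing a class of sites with dN/dS > 1, one not) with a likelihood ratio test. The sketch below computes the test statistic and its p-value for two degrees of freedom, for which the chi-square survival function has the closed form exp(-x/2). The function name and the log-likelihood values are hypothetical.

```python
import math


def lrt_pvalue_df2(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by two parameters.

    Returns (statistic, p_value). With 2 degrees of freedom the chi-square
    survival function is exactly exp(-statistic / 2).
    """
    # The alternative model should fit at least as well as the null model;
    # clamp small negative values caused by incomplete optimisation.
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return statistic, math.exp(-statistic / 2.0)


# Hypothetical log-likelihoods, not results from the CSP datasets.
statistic, p_value = lrt_pvalue_df2(lnl_null=-1234.5, lnl_alt=-1233.9)
```

With these illustrative values the alternative model does not fit significantly better, which is the kind of outcome that would be read as no detectable positive selection.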
4.3.4 Evolution of the Binding Pocket
The rate of amino acid substitution in the binding pocket (illustrated in figure 4.4) was compared to that of the rest of the protein using the alignment of all the CSPs described in the previous section. The divergence at positions in the core of the protein was found to be significantly higher than at positions in the binding pocket (P = 0.002, Kruskal-Wallis test). A similarly significant difference is also observed when the sequences of an individual species are considered.
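The comparison above can be sketched as a Kruskal-Wallis test on two sets of per-position divergence scores. The example below is a minimal standard-library implementation for two groups, with tie correction and the usual chi-square approximation (one degree of freedom). The function names and the toy scores are hypothetical, and the divergence measure used in the original analysis is not specified here.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0  # average of 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(group_a, group_b):
    """Return (H, p) for two groups, using the chi-square approximation."""
    pooled = list(group_a) + list(group_b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_sum_a = sum(ranks[:len(group_a)])
    rank_sum_b = sum(ranks[len(group_a):])
    h = 12.0 / (n * (n + 1)) * (
        rank_sum_a ** 2 / len(group_a) + rank_sum_b ** 2 / len(group_b)
    ) - 3.0 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0.0:
        raise ValueError("all values are identical; the test is undefined")
    h = max(0.0, h / correction)
    # Chi-square survival function with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value


# Toy per-position divergence scores (hypothetical data).
pocket_scores = [1, 2, 3]
core_scores = [4, 5, 6]
h_stat, p_val = kruskal_wallis_two_groups(pocket_scores, core_scores)
```

The chi-square approximation is rough for samples as small as this toy example; with the number of alignment positions available in a real CSP alignment it is more reasonable.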