Considering our own and previous biophysical binding data, it is probable that FOXM1 exhibits some degree of DNA binding to the consensus FKH motif. Indeed, we observe that while FOXM1/FOXM1-GFP ChIP-seq reveals a majority of peaks at non-FKH motifs, the FKH motif is still present in a small but significantly enriched set (14%, P = 10^−144). Future experiments, such as genetic deletion of discrete motifs by CRISPR, would be needed to establish unambiguously which FKH or other motifs FOXM1 directly associates with. Nonetheless, our data support the previously proposed model 1 as well as model 3, namely that FOXM1 binding in chromatin operates through a mechanism dependent on a functional DBD assisted by local protein-protein recruitment regardless of sequence content. In support of this, mutant FOXM1 is unable to induce transactivation of known FOXM1 target genes, all of which lack a canonical FKH consensus within the FOXM1 binding site. It is also notable that binding studies with WT FOXM1 confirmed that the protein additionally binds non-consensus sequences, further supporting an assisted model of FOXM1 binding whereby protein recruitment stabilizes the association.
There are many potential uses for PCP-consensus sequences in virology, for example in classifying strains, identifying functional alterations, and designing novel, multivalent antigens for vaccines and diagnostics. Here we will show some applications based on data stored in our Flavitrack database (http://carnot.utmb.edu/flavitrack), which is a compendium of annotated Flavivirus sequences [9,10]. Flaviviruses (FV), which include yellow fever (YFV), DENV, and West Nile viruses (WNV), are important human and animal pathogens which typically require insect vectors to infect mammalian hosts [30-35]. While mosquito control can be effective, antiviral agents and wide-spectrum vaccines are being sought to protect those in endemic areas [36-43]. To design effective vaccines, the areas of the viral proteins required for virus function or infectivity should be targeted by antibodies. Flaviviruses are variable, with many sequence variants found even in single virus isolates from the same patient, so-called “quasispecies”. However, when catalogued, the strains appear redundant
quence(s) outside of the intergenic consensus sequence affects coronavirus transcription. An MHV mutant virus, MHV-S no. 8, which was isolated from cells persistently infected with MHV-S, possesses the 3′ half of the genomic leader sequence inserted in the 5′ region of gene 7 (44). This insertion results in the presence of two consensus sequences separated by 0.1 kb within gene 7. Interestingly, the amount of “larger” mRNA 7, which is synthesized from the upstream consensus sequence, is only 5% of that of the “smaller” mRNA 7, which is synthesized from the downstream consensus sequence (44). Perhaps, in MHV-S no. 8, the insertion of the 3′ half of the genomic leader sequence into gene 7 inhibits transcription of the larger mRNA 7 (44). Hofmann et al. (9) demonstrated that in bovine coronavirus, a subgenomic mRNA is not synthesized from the predicted intergenic consensus sequence; instead, a subgenomic mRNA is synthesized from another sequence, located 15 nt downstream of the predicted intergenic consensus sequence. Their data led us to speculate that subgenomic mRNA transcription from the upstream consensus sequence is inhibited by the presence of a downstream cryptic transcription consensus sequence. From these studies of MHV-S no. 8 and bovine coronavirus, we hypothesized that two coronavirus intergenic consensus sequences that are located in close proximity may interact in such a way that the presence of a downstream consensus sequence may inhibit transcription of subgenomic mRNA from an upstream consensus sequence. We examined this possibility and present new aspects of coronavirus transcription regulation.
ERIC-PCR was able to differentiate strains of M. tuberculosis that infected the same patient (SS70, SS71, and SS73), whereas IS6110 fingerprinting was not able to discriminate among these strains. These strains were also confirmed to be different by PCR-GTG and PCR-ribotyping methods (8, 13, 19). This example shows how valuable the use of different methods of following M. tuberculosis infections can be. The results obtained indicate that the ERIC marker could be used as an independent marker, since the relationship between the profiles obtained by ERIC-PCR and those obtained by IS6110 fingerprinting was not always evident. In our experience the best way of differentiating M. tuberculosis strains is to use different markers such as IS6110, GTG, and ERIC sequences to increase the accuracy of epidemiological studies and the
potential as a therapeutic target. Another study reported that the upregulation of METTL3 caused an increase in miR-25-3p and that miR-25-3p could promote the expression of its target protein PHLPP2 to promote pancreatic ductal adenocarcinoma occurrence. Similar to DGCR8, the protein PHLPP2 also shows potential for use in targeted therapy. Targeting PHLPP2 specifically could reduce its expression, thereby inhibiting the occurrence and proliferation of tumor cells in pancreatic ductal adenocarcinoma. In addition, it was speculated that noncoding RNAs could be used as breakthrough points in targeted therapy. Targeting the consensus sequence RRACH might block the binding of m6A to noncoding RNAs. The m6A modification of MALAT1 could regulate gene expression. Studies have confirmed that MALAT1 is upregulated in various tumor tissues, such as NSCLC, breast cancer, cervical cancer, and bladder cancer, and is closely associated with the occurrence, development, and metastasis of tumors. A recent study also revealed that altering the modification levels of m6A in lncRNA 1281 could significantly affect ESC differentiation. Therefore, noncoding RNAs have the potential to become new therapeutic targets. By acting specifically on the consensus sequences of noncoding RNAs, the levels of m6A-modified transcripts are decreased, which affects the expression of downstream genes, thereby regulating the biological functions of tumor cells. Thus, as a potential target, noncoding RNAs can provide new possibilities for clinical treatment through their associations with m6A modifications. However, the specific mechanism of noncoding RNAs for use in targeted therapy needs to be further confirmed.
senting the amino acids valine, leucine, arginine and lysine, respectively; Bardwell et al. define a consensus MAPK-binding site sequence of (R/K)_2 X_(2–6) (L/I)X(L/I), with I representing the amino acid isoleucine; and Kornfeld and colleagues reported two consensus sequences for the DEJL domain: (K/R)X(X/K/R)(K/R)X_(1–4)(L/I)X(L/I) and (K/R)(K/R)(K/R)X_(1–5)(L/I)X(L/I). In the present studies we use the term D-domain and the consensus sequences reported by Kornfeld and colleagues. Sharrocks and colleagues report that D-domains are characterized by a cluster of basic residues positioned amino-terminal to an (L/I)X(L/I) motif followed by a triplet of hydrophobic amino acids that precedes a series of proline residues [17,21]. These investigators assessed the role of each of these regions in the binding of ERK2 and p38 to the transcription factors MEF2A, SAP-1, and Elk-1. They determined that mutation of the basic region of the transcription factors reduced their phosphorylation by both phospho-ERK2 and phospho-p38. This suggests that the basic residues are important for both ERK2 and p38 targeting of MAPK substrates. Mutation of the (L/I)X(L/I) motif (also called the LXL motif) diminished phosphorylation of substrates by phospho-ERK, whereas the motif was not required for phosphorylation of substrates by the MAPK phospho-p38. It was also determined that the hydrophobic patch plays an important role in phosphorylation of the substrates by both phospho-ERK and phospho-p38; however, this patch is more important for p38 binding than ERK2 binding. Barsyte-Lovejoy et al. concluded that the proline residues were not important in specificity determination of MAPK substrates. Therefore, the authors hypothesize that the proline residues may play a structural role within the motif.
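The two DEJL consensus sequences quoted above map naturally onto regular expressions. The following is a minimal sketch, not the method used in the study: it assumes that X stands for any residue (so the (X/K/R) position is also unrestricted), and the function name and test peptide are hypothetical.

```python
import re

# The two DEJL / D-domain consensus sequences quoted in the text, written as
# regular expressions. Assumption: "X" means any amino acid, so (X/K/R) is
# also "any residue" and becomes ".".
DEJL_PATTERNS = {
    # (K/R) X (X/K/R) (K/R) X(1-4) (L/I) X (L/I)
    "form_1": re.compile(r"[KR]..[KR].{1,4}[LI].[LI]"),
    # (K/R)(K/R)(K/R) X(1-5) (L/I) X (L/I)
    "form_2": re.compile(r"[KR]{3}.{1,5}[LI].[LI]"),
}


def find_d_domains(protein_seq):
    """Return the first match of each consensus form, or None if absent."""
    seq = protein_seq.upper()
    hits = {}
    for name, pattern in DEJL_PATTERNS.items():
        match = pattern.search(seq)
        hits[name] = match.group(0) if match else None
    return hits


# Hypothetical peptide, used only to illustrate the call.
print(find_d_domains("GGKRKPTLNLGG"))
# -> {'form_1': None, 'form_2': 'KRKPTLNL'}
```

Because the spacer X(1–4) or X(1–5) has variable length, a single peptide can satisfy a pattern in more than one register; `search` reports only the first match that the regex engine finds.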
consensus sequences showed that Ptep_Hel2Ca was evolutionarily different from other Hel1 elements, and insects of the same order were not clustered together. The incongruence of Hel1 elements and host phylogeny as well as the patchy distribution and high sequence similarity of Hel1 elements among distantly related lineages suggest the recurrence of HT and that multiple mechanisms may underlie the horizontal spread of Hel1. Notably, Lepidopteran Prap_Hel1Aa and Hymenopteran Cves_Hel1Aa, Dipteran Cvic_Hel1Ca and Lepidopteran Bmor_Hel1Ca, Hemipteran Hvit_Hel1Ga and Lepidopteran Pgla_Hel1Ga were clustered into distinct clades, which diverged 325, 272 and 358 million years ago, respectively (http://www.timetree.org/) (Fig. 7). Furthermore, several paralogous and orthologous empty sites were also detected in these insect genomes (Additional file 1: Figure S11). It is also noteworthy that the genetic distance between species of the same cluster was less than 0.1, indicating that these elements have spread horizontally among these species within a relatively narrow timeframe.
Abstract: We reply to two recently published, multi-authored opinion papers by opponents of sequence-based nomenclature, namely Zamora et al. (IMA Fungus 9: 167–175, 2018) and Thines et al. (IMA Fungus 9: 177–183, 2018). While we agree with some of the principal arguments brought forward by these authors, we address misconceptions and demonstrate that some of the presumed evidence presented in these papers has been wrongly interpreted. We disagree that allowing sequences as types would fundamentally alter the nature of types, since a similar nature of abstracted features as type is already allowed in the Code (Art. 40.5), namely an illustration. We also disagree that there is a high risk of introducing artifactual taxa, as +" X errors. Contrary to apparently widespread misconceptions, sequence-based nomenclature cannot be based on similarity-derived OTUs and their consensus sequences, but must be derived from rigorous, multiple alignment-based phylogenetic methods and quantitative, single-marker species recognition algorithms, using original sequence reads; it is therefore identical in its approach to single-marker studies based on physical types, an approach allowed by the Code. We recognize the limitations of the ITS as a single fungal barcoding marker, but point out that these result in a conservative approach, with “false negatives” surpassing “false positives”; a desirable feature of sequence-based nomenclature. Sequence-based nomenclature does not aim at accurately resolving species, but at naming sequences that represent unknown fungal lineages so that these can serve as a means of communication, so ending the untenable situation of an exponentially growing obtained by a reference library of named sequences spanning the full array of fungal diversity. Finally, we elaborate provisions in addition to our original proposal to amend the Code that would take care of the issues brought forward by opponents to this approach. 
In particular, taking up the idea of the Candidatus status of invalid, provisional names in prokaryote nomenclature, we propose a compromise that would allow valid publication of voucherless, sequence-based names in a consistent manner, but with the obligate designation as “nom. seq.” (nomen sequentiae). Such names would not have priority over specimen- or culture-based committee of the ##'!0 following evaluation based on strict quality control of the underlying studies based on established rules or recommendations.
A phylogenetic reconstruction of the full set of ERV9_XII sequences (106 insertions) by the NJ method, after excluding CpG positions, is shown in Fig. 3. It can be seen that insertions belonging to the groups previously established according to shared diagnostic differences tended to lie close together in the tree, thus lending further support to our classification. A basal cluster, including insertions from the oldest groups A to C, lying close to the ERV9_XI outgroup, was clearly separated from the other members of the subfamily. The topology of this tree, however, differs in several respects from the tree of the consensus sequences depicted in Fig. 2, mainly affecting groups J, M1, and M2, which now appear to be more closely related to the lineage leading to G than to L. This is due to the loss of several phylogenetically informative sites associated with the CpG positions that we removed prior to this analysis (shown by underlining in Fig. 1), precisely to reduce noise in the groupings of insertions and to avoid overestimating the ages of some particular groups. The importance of these sites for correctly inferring the phylogeny of ERV9_XII groups is manifest in the results of our maximum parsimony analyses. If all nucleotide sites are included in the analysis, only 3 equally most parsimonious trees are obtained (Fig. 2), whereas this number increases to 681 if CpG positions are removed from the alignment (average of 10 MP tree searches; data not shown).
In this section, we present our algorithm for prediction of the PSSM matrices of kinases based on their catalytic domains. The idea is that catalytic domains in different kinases which have similar SDRs tend to have similar patterns in the phosphosite regions. To quantify the similarity of catalytic domains of kinases we perform multiple sequence alignment (MSA) of catalytic domains using the ClustalW algorithm. The result of the MSA is not quite accurate, as it has many gaps; therefore, the alignments were manually modified. We perform this alignment on 488 catalytic domains of the typical protein kinases. The length of each kinase catalytic domain after MSA is 247. For 224 domains in the alignment we compute consensus sequences using 6,515 confirmed kinase–phosphosite pairs. Figure 1 represents portions of the catalytic domain after MSA of some of the best characterized kinases for which the most phosphosites have been identified. To generate the consensus sequence of each kinase, the profile matrix of each kinase is computed using the confirmed phosphosite regions of that kinase. For each position in the consensus sequence, the amino acid with the maximum probability in that position is selected. If the probability is greater than 15%, a capital letter is used to represent that amino acid; if it is less than 15% and greater than 8%, a small letter is used; and if it is less than 8%, the symbol ‘x’ is used in that position of the consensus sequence. Here ‘x’ is a “don't care” letter, meaning that any amino acid can appear in that position of the phosphosite region of a kinase. Therefore, those kinases that have more ‘x’ in their consensus sequence are more general and can phosphorylate more sites than the others. In Figure 2, consensus sequences of some of the well studied kinases are presented.
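The thresholding rule described above can be sketched in a few lines. This is an illustrative reading of the procedure under stated assumptions, not the authors' implementation: the function name is hypothetical, the phosphosite regions are assumed to be equal-length strings, the profile is taken as plain per-position frequencies, and the treatment of probabilities exactly equal to 15% or 8% (which the text leaves open) is an arbitrary choice.

```python
from collections import Counter


def consensus_from_sites(sites, upper=0.15, lower=0.08):
    """Build a consensus string from equal-length phosphosite regions.

    For each position the most frequent residue is taken; it is written in
    upper case if its frequency exceeds `upper`, in lower case if it exceeds
    `lower`, and as 'x' ("don't care") otherwise.
    Assumption: values exactly at a threshold fall into the lower class.
    """
    if not sites:
        raise ValueError("at least one phosphosite region is required")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all phosphosite regions must have the same length")

    consensus = []
    for position in range(length):
        counts = Counter(site[position].upper() for site in sites)
        # Ties are resolved in favour of the residue seen first.
        residue, count = counts.most_common(1)[0]
        probability = count / len(sites)
        if probability > upper:
            consensus.append(residue)
        elif probability > lower:
            consensus.append(residue.lower())
        else:
            consensus.append("x")
    return "".join(consensus)


# Toy input (not real phosphosite data): arginine dominates position 0,
# serine is invariant at position 1, and position 2 is weakly conserved.
print(consensus_from_sites(["RSP", "RSA", "RSP", "KSL"]))  # -> RSP
```

With small toy inputs almost every position exceeds 15%, so lower-case letters and ‘x’ only appear once a kinase has enough confirmed sites for the most frequent residue to fall below the thresholds.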
LCMV shows higher sensitivity to FU mutagenesis than foot-and-mouth disease virus (FMDV) in cell culture when the two viruses are subjected to similar FU doses (41, 65, 67, 72, 79). The basis for this difference is not known, but it may relate either to the size of the essential polymerase gene, which occupies 62.6% and 17.4% of the total genetic information of LCMV and FMDV, respectively; to a larger number of essential genomic sites susceptible to FU-induced mutations in LCMV than in FMDV; or to other replicative differences between the two viruses still to be elucidated (different affinities of the two polymerases for FU-triphosphate, etc.). The comparison of complete genomic LCMV sequences reported here has indicated that neither an abrupt decrease in infectivity associated with FU treatment nor the ensuing recovery of the capacity to produce infectious progeny upon replication in the absence of mutagen modified the consensus nucleotide sequence of the virus. Likewise, no variations were detected among LCMV populations subjected to one or several rounds of replication in the absence or presence of FU, or among populations that underwent alternations between the two passage regimens. In these cases the consensus sequences analyzed involved three genomic regions, one of which was a part of the polymerase (L) that included premotif A and motifs A to E (11, 24, 52, 70, 86, 87). The possibility that FU, in addition to its mutagenic activity documented by increases in mutant-spectrum complexity, could exert some mutagenesis-independent, inhibitory activity (e.g., directly on the viral polymerase or indirectly through alteration of nucleotide pools) on LCMV replication cannot be excluded. This would mean that FU could contribute to decreases in viral load, thereby favor-
Figure 32 Bayesian phylogeny of consensus sequences of the R3 ‘ID’ motif of the PAP genes. The R3 ‘ID’ motif is responsible for MYB-bHLH protein-protein interaction (Zimmerman et al., 2004) and is therefore highly conserved, likely providing an accurate phylogeny of MYB genes. AtMYB82, the most closely related Arabidopsis thaliana MYB gene to the PAP genes, was used as an outgroup. The gene sequence was taken from www.arabidopsis.org.
Figure 33 Bayesian phylogeny of consensus sequences of R2R3 MYB domain sequences of the PAP genes with the R3 ‘ID’ motif removed to determine the level of unique information in the R3 ‘ID’ motif compared with the remainder of the MYB domain. AtMYB82, the most closely related
s in which notions of “sequence pattern” or “profile” and PSSM are used, e.g., [23,24]. These notions imply an “average state” of sequences, whereas the network approach simultaneously considers a complete list, often of samples rather than different sequences linked to a common specific structure and function. This list may correspond to different “average states” yet be connected together through chains (walks) of pair-wise close similarities, which suggests evolutionary connections between the “average states”. The same function for a given segment may be encoded by different consensus sequences. Moreover, in some cases a consensus may not exist at all.
Figure 2. (A) Sequence of IES6649 in wild-type and AIM-2 micronuclear DNA and AIM-2 macronuclear DNA. The IES sequence is uppercase and the macronuclear sequence is lowercase. A base at either end of IES6649 is numbered for reference. The base change is bold and starred; the deletion is shown as a series of dots in the AIM-2 macronuclear sequence. The numbers in parentheses indicate the number of bases not included in the figure. The micronuclear sequence of AIM-2 was directly determined from beyond the left end of IES6649 into the 29-bp internal IES. The remainder of the sequence was deduced from the AIM-2 macronuclear DNA. (B) The terminal inverted-repeat consensus sequences (5′-3′) for mariner/Tc1 transposons and Paramecium IESs are shown along with the corresponding terminal sequences from wild-type IES6649, AIM-2 IES6649, and the internal 29-bp IES. Left and right indicate the ends of the deleted elements. The black and gray shading indicates nucleotides that are identical and similar to the Paramecium IES consensus, respectively. K, G or T; S, C or G; N, any base; Y, C or T; R, G or A.
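The degenerate-base legend at the end of the caption (K, S, N, Y, R) can be applied mechanically when comparing a terminal sequence with a consensus. The sketch below is an illustration only: the function name and the example consensus string are hypothetical and are not taken from the figure, and only the ambiguity codes listed in the caption are supported.

```python
import re

# Ambiguity codes exactly as listed in the caption, plus the four plain bases.
AMBIGUITY_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "K": "[GT]",    # K: G or T
    "S": "[CG]",    # S: C or G
    "Y": "[CT]",    # Y: C or T
    "R": "[AG]",    # R: G or A
    "N": "[ACGT]",  # N: any base
}


def consensus_to_regex(consensus):
    """Translate a degenerate consensus into a regular-expression string."""
    try:
        return "".join(AMBIGUITY_CODES[symbol] for symbol in consensus.upper())
    except KeyError as error:
        raise ValueError(f"unsupported symbol: {error.args[0]}") from None


def matches_consensus(sequence, consensus):
    """True if `sequence` fits the consensus at every position."""
    pattern = consensus_to_regex(consensus)
    return re.fullmatch(pattern, sequence.upper()) is not None


# Hypothetical consensus, used only to show the calls.
print(consensus_to_regex("TAYAGN"))           # -> TA[CT]AG[ACGT]
print(matches_consensus("TACAGT", "TAYAGN"))  # -> True
print(matches_consensus("TAGAGT", "TAYAGN"))  # -> False
```

A yes/no match does not reproduce the figure's distinction between identical and similar nucleotides; that would need a per-position comparison rather than a single regular expression.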
In order to investigate the potential presence of compensatory mutations in the six reassortants, consensus sequences of all eight pdm/H1N1 and A/H1N1 gene segments were generated using “early” pandemic (April-June 2009) or “late” seasonal (June 2008 to June 2009) H1N1 isolates (see Materials and Methods). Ten protein sequences were used for amino acid comparisons: PB2, PB1, PA, HA, NA, NP, M1, M2, NS1, and NS2. The A/H1N1 isolate A/New Zealand/3958/2009 (A/H1N1/3958) and the pdm/H1N1 isolate A/New Zealand/2047/2009 (A/H1N1/2047), along with reference sequences from Oceania, were also compared to the reassortant viruses (Table 4). Overall, nine positions of interest were identified that were present in the reassortants but absent from the worldwide consensus sequences, the New Zealand A/H1N1 and pdm/H1N1 viruses, or other available sequences from Oceania during this time period (Table 4).
The most important codons in each gene for differentiating human from 2009 pandemic H1N1, avian, and swine viruses are displayed in a single figure (Figures 1-10). The comparison of amino acid markers in  and nucleotide markers found in this study revealed several shared sites in each protein/gene, illustrating their significance as host markers. The consensus nucleotides (codons) comprising these sites in each gene are presented in Tables 2-11, which could also serve as a confirmation and refinement of the results in .
In this study, we examined the genetic diversity of 24 hMPV isolates collected in Canada. Phylogenetic analysis based on the nucleotide sequences of the G ORF of hMPV isolates revealed the presence of two genetic clusters (1 and 2) and that cluster 1 evolved with multiple lineages that cocirculated during epidemics. The results also showed a low level of amino acid identity between the G proteins of the two hMPV clusters (32.5 to 37.2%). Furthermore, sequence analysis of 22 hMPV cluster 1 isolates, all collected in 2002, revealed extensive variability of the G protein within cluster 1, with identity levels as low as 61.4%, which clearly indicates that two different genetic lineages of cluster 1 isolates were cocirculating in Canada during 2002. These results confirm data from recent studies showing levels of amino acid identity ranging from 31 to 37% between the G proteins of the two hMPV clusters and the presence of additional diversity within clusters (3, 13). As with HRSV, the G protein of which is involved in neutralizing and protective immunity, the high percentage of nucleotide changes that resulted in amino acid changes suggests that there may be a selective advantage to G protein changes (5, 10, 16). Another similarity with HRSV is the use of different stop codons, resulting in G proteins of different lengths and the correlation observed between the G protein length and the position of the isolate in the phylogenetic tree (6, 8, 15). Changes in stop codon usage have been found in HRSV escape mutants selected with monoclonal antibodies that recognize strain-specific epitopes (9). Furthermore, reports show that, for HRSV, carbohydrates in the C-terminal third of the G protein influence the expression of certain epitopes by either masking or contributing to antibody recognition (11). 
Although the N-linked and O-linked glycosylation sites were conserved in some positions for the hMPV G protein, sequence analysis also revealed lineage-specific patterns. Therefore, as with HRSV, the extensive glycosylation of the hMPV G protein may help the virus to evade the immune system. Although strain-specific monoclonal antibodies are not available to determine the precise antigenic and genetic relatedness of hMPV isolates, these results taken together suggest that hMPV may evade the immune system through epitope modification like HRSV. The highly specific reactivity of antisera raised against viruses from the two clusters of hMPV supports this hypothesis.
Manual performance of the assays, interpretation of the electropherograms, and data analysis all require careful and skilled evaluation. An added problem in reporting is the nonstandardization of the databases used to associate mutations with particular drugs. Many of the discrepancies among the mutations reported by the VG, AB, and reference laboratory assays appeared to be due to differences in databases. The Los Alamos HIV database (http://hiv-web.lanl.gov/) and the Stanford database (http://hivdb.stanford.edu/hiv/) are often referenced, and there are periodic consensus statements concerning notable mutations and patterns (11, 12); however, there is no standardized interpretation of significant mutations. Even more complexity is added to interpretation when reporting goes beyond just providing mutation lists to specifying resistance or susceptibility to a given drug. Interpretation of individual mutations and mutation patterns to provide information about drug resistance currently depends on the interpretation rules used in individual laboratories. As a consequence, commercially available assays are using diverse databases and interpretation rules, and these are continually being updated as new information becomes available. Clinical interpretation of drug resistance reports should take into account that variations in the databases can affect the reported mutations and that variations in interpretation rules can affect the identification of resistance.