Whole genome sequencing, assembly and partial annotation of a novel Entomopathogenic nematode Oscheius sp TEL-

3.3 Protein coding genes

Prediction of protein-coding genes was performed using Augustus and BLASTx. Augustus predicted 49947 genes in both nematode lines. Most of them were complete models with both a start and a stop codon, whereas some were incomplete open reading frames, as seen in the GFF files obtained. The Augustus and BLASTx findings revealed important proteins and conserved domains crucial for gene transcription and translation; some proteins also have a role in DNA replication. Other proteins have been hypothesised to be involved in desiccation tolerance. Protein names and functions were obtained by using BLASTp to align the protein sequences generated by Augustus against the protein databases at NCBI. The domain diagrams were obtained courtesy of NCBI Conserved Protein Domain.
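The distinction between complete and incomplete gene models can be illustrated with a minimal sketch. It assumes Augustus output in its GTF-like GFF form, where start_codon, stop_codon and CDS features carry a transcript_id attribute in the ninth column; the function name and the toy records are hypothetical and are not taken from the files produced in this study.

```python
import re
from collections import defaultdict

# Assumption: start_codon, stop_codon and CDS features carry an attribute of
# the form transcript_id "..." in the ninth column of the GFF file.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def classify_gene_models(gff_lines):
    """Return (complete, incomplete) sets of transcript IDs.

    A model counts as complete here when both a start_codon and a stop_codon
    feature were reported for it; any other model with a CDS is incomplete.
    """
    features = defaultdict(set)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a feature record
        match = TRANSCRIPT_ID.search(cols[8])
        if match:
            features[match.group(1)].add(cols[2])
    complete = {t for t, f in features.items()
                if {"start_codon", "stop_codon"} <= f}
    incomplete = {t for t, f in features.items() if "CDS" in f} - complete
    return complete, incomplete


# Hypothetical toy records for illustration only.
example = [
    'contig1\tAUGUSTUS\tstart_codon\t1\t3\t.\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";',
    'contig1\tAUGUSTUS\tCDS\t1\t300\t0.9\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";',
    'contig1\tAUGUSTUS\tstop_codon\t298\t300\t.\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";',
    'contig2\tAUGUSTUS\tCDS\t10\t250\t0.8\t+\t0\ttranscript_id "g2.t1"; gene_id "g2";',
    'contig2\tAUGUSTUS\tstop_codon\t248\t250\t.\t+\t0\ttranscript_id "g2.t1"; gene_id "g2";',
]
complete, incomplete = classify_gene_models(example)
print(len(complete), "complete;", len(incomplete), "incomplete")
```

In the toy input, g1.t1 has both codon features and is counted as complete, whereas g2.t1 lacks a start codon and is counted as incomplete.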

A hypothetical protein CAEBREN_17421, found in Caenorhabditis brenneri, was also present in inbred line 13 of Oscheius sp. TEL-2014. According to the Augustus findings, this protein was predicted from position 1 (start) to 1249 (end).
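Database matches of this kind can also be summarised programmatically. The sketch below assumes that the BLAST searches were exported in BLAST+ tabular format (-outfmt 6) with the default twelve columns, which is not stated in this study; the function name, query IDs, subject IDs and scores are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle):
    """Map each query ID to its highest-bitscore hit (a dict of fields)."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        hit["bitscore"] = float(hit["bitscore"])
        hit["evalue"] = float(hit["evalue"])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Hypothetical rows for illustration only; IDs and scores are invented.
sample = io.StringIO(
    "g1.t1\tsubjectA\t45.0\t110\t60\t1\t1\t110\t5\t114\t1e-20\t95.1\n"
    "g1.t1\tsubjectB\t60.0\t112\t44\t0\t1\t112\t1\t112\t1e-35\t140.0\n"
    "g2.t1\tsubjectC\t38.5\t80\t49\t2\t3\t82\t10\t89\t2e-08\t52.4\n"
)
hits = best_hits(sample)
for query, hit in hits.items():
    print(query, hit["sseqid"], hit["evalue"], hit["bitscore"])
```

Ranking by bit score is one common convention; ranking by E-value would be an equally reasonable choice for this purpose.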

Fig. 21 A 112-amino-acid-long hypothetical protein CAEBREN_17421 predicted by Augustus and BLASTx. Image taken from NCBI BLASTx.

A WD40 repeat domain was predicted at position 5208 (start) to 5385 (end). This protein was identified from Haemonchus contortus, also known as the barber's pole worm, which also belongs to the phylum Nematoda (class Chromadorea, order Rhabditida).

Fig. 22 WD40 repeat domain in line 13 of Oscheius sp. TEL-2014. Information from NCBI Conserved Protein Domain

A hypothetical protein Y032_0569g73, found in Ancylostoma ceylanicum (Eukaryota; Metazoa; Ecdysozoa; Nematoda; Chromadorea; Rhabditida), was also identified in line 13 of Oscheius sp. TEL-2014.

Fig. 23 Helix-loop-helix domain, highlighting the specific hits to which the query from the line 13 Oscheius sp. TEL-2014 contigs had high affinity.

The helix-loop-helix domain is found in specific DNA-binding proteins that act as transcription factors. This domain is 60-100 amino acids long and is present in most eukaryotes. It was predicted at position 1072 (start) to 1145 (end) on the genome.

A topoisomerase II large subunit, originally found in Escherichia phage PBECO 4, was predicted to be present in the genome of the Oscheius nematodes. The histidine kinase-like ATPase domain is one of the domains present in this protein.

Fig. 24 TOPRIM superfamily found in line 13 of Oscheius sp. TEL-2014

A histidine kinase-like ATPase domain was predicted to be present in the Oscheius nematodes. Histidine kinase-like ATPases comprise numerous ATP-binding proteins, such as histidine kinase, DNA gyrase B, topoisomerases, heat shock protein HSP90, phytochrome-like ATPases and DNA mismatch repair proteins. Heat shock protein HSP90 may be hypothesised to be involved in the desiccation tolerance of these entomopathogenic nematodes.

An uncharacterized protein CELE_C24H12.4, originally found in Caenorhabditis elegans, was predicted to be present in Oscheius sp. TEL-2014 at position 2795 (start) to 2969 (end). A P-loop_NTPase domain superfamily was found to be associated with this protein.

Fig. 25 A P-loop_NTPase domain superfamily observed in line 13 of Oscheius sp. TEL-


One of the specific domain hits on this protein was the helicase superfamily C-terminal domain, which is associated with DEXDc, DEAD and DEAH box proteins. The DEAD-box helicases are a diverse family of proteins involved in ATP-dependent RNA unwinding. Members of this family are the DEAD and DEAH box helicases.
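The names DEAD and DEAH refer to the conserved motif Asp-Glu-Ala-Asp (D-E-A-D) or Asp-Glu-Ala-His (D-E-A-H) in these helicases. As a minimal sketch, a predicted protein sequence can be scanned for these motifs as shown below; the function name and the toy peptide are hypothetical, and the presence of the four-residue string alone is not sufficient evidence that a protein is a helicase.

```python
import re

# Conserved motif of these helicases: D-E-A-D or D-E-A-H.
MOTIF = re.compile(r"DEA[DH]")


def find_dead_box_motifs(protein):
    """Return (1-based position, motif) for every DEAD or DEAH match."""
    return [(m.start() + 1, m.group())
            for m in MOTIF.finditer(protein.upper())]


# Hypothetical peptide, invented for illustration only.
toy = "MKTLDEADRMLAAGDEAHKL"
print(find_dead_box_motifs(toy))
```

A domain search such as the one reported here remains the more reliable approach, because it evaluates the whole domain rather than a single short motif.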

Non-specific hits include DEXDc, which belongs to the DEAD-like helicases superfamily, and Helicase_C, a conserved C-terminal helicase domain.

The superfamilies and multi-domains identified include SrmB, which comprises the Superfamily II DNA and RNA helicases involved in replication, repair and recombination.

The provisional domains identified are PRK11192, annotated as an ATP-dependent RNA helicase SrmB; PTZ00424, which was reported to be part of the helicase; and PLN00206, classified as a DEAD-box ATP-dependent RNA helicase. The DECH_helic helicase had no specific function or association mentioned.

A hypothetical protein Y032_0043g788, present in the Ancylostoma ceylanicum genome, was found at position 4335 (start) to 4335 (end) in the genome of the most inbred line of the Oscheius nematodes.

Fig. 26 NAD(P) binding domain of glutamate dehydrogenase, subgroup 1, linked with a hypothetical protein Y032_0043g788 seen in line 13 of Oscheius sp. TEL-2014

According to NCBI Conserved Protein Domain, the amino acid dehydrogenases (DH) are a widely distributed family of enzymes that catalyse the oxidative deamination of an amino acid to its keto acid and ammonia, with concomitant reduction of NADP+.
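For the glutamate dehydrogenase named in Fig. 26, this oxidative deamination can be written as the following reaction. It is given here as a textbook illustration of the NADP+-dependent enzyme and is not a result of this study.

```latex
\text{L-glutamate} + \mathrm{H_2O} + \mathrm{NADP^+}
\;\rightleftharpoons\;
\text{2-oxoglutarate} + \mathrm{NH_3} + \mathrm{NADPH} + \mathrm{H^+}
```

Here 2-oxoglutarate is the keto acid and NADPH is the reduced cofactor referred to in the description above.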

ELFV_dehydrog_N is one of the specific hits; it is described simply as a dimerization domain and is also known as Glutamate/Leucine/Phenylalanine/Valine dehydrogenase. The NAD(P) binding site was also found.

The ELFV_dehydrog Glu/Leu/Phe/Val dehydrogenase was categorised under superfamilies, along with NADB_Rossmann.

Multi-domains such as glutamate dehydrogenase/leucine dehydrogenase (GdhA) have been implicated in amino acid transport and metabolism. PLN02477, a glutamate dehydrogenase, and ELFV_dehydrog, a Glutamate/Leucine/Phenylalanine/Valine dehydrogenase, were also identified.

Provisional multi-domains such as PTZ00079, an NADP-specific glutamate dehydrogenase, and PRK14030, a glutamate dehydrogenase, were found.

Fig. 27 Zinc-finger domains in line 13 of Oscheius sp. TEL-2014

Zinc-finger domains were identified; they are involved in gene expression, cell adhesion and protein folding. The predicted zinc finger protein 271-like is present in the Acyrthosiphon pisum genome. This protein is found at position 2196 (start) to 2196 (end), based on the GFF file report provided by Augustus.

A hypothetical protein Y032_0004g2073, found in Ancylostoma ceylanicum (Eukaryota; Metazoa; Ecdysozoa; Nematoda; Chromadorea; Rhabditida), was also identified in line 7 of Oscheius sp. TEL-2014.

Fig. 28 Trehalase (EC: is recognized as an enzyme that recycles trehalose to glucose. Trehalose is a biological hallmark of the heat-shock response in yeast and protects it against a variety of stresses.

The eLRR (extracellular leucine-rich repeat) domain, originally identified in Caenorhabditis elegans, was found in line 7 of Oscheius sp. TEL-2014.

Fig. 29 Leucine-rich repeat domains are often seen to have a function in stabilising protein structures

A hypothetical protein CAEBREN_28360, found in the Caenorhabditis brenneri genome, was also identified. This protein may be further hypothesised to be involved in nematode chemotaxis and behaviour.

Fig. 30 Neurotransmitter-gated ion-channel ligand binding domain in line 7 of Oscheius sp. TEL-2014. It has been reported that members of the LIC family of ionotropic neurotransmitter receptors are found only in vertebrate and invertebrate animals.

