3. CHAPTER 3: Genome level analysis of Ugandan isolates TgCkUg8 and 9

3.1.1. Genome structure and genome sequencing

Understanding how the genome of T. gondii is structured and organized is essential to understanding how cellular processes are regulated. Genomic DNA is bound to a complex of nuclear proteins to form chromatin. Chromatin exists in two forms: gene-rich, decondensed chromatin (euchromatin), where transcription is enhanced, and gene-poor, condensed chromatin (heterochromatin), which is refractory to transcription. Generally, euchromatin is the active chromatin, while heterochromatin represents the constitutive chromatin. These epigenetic structures are conserved from yeast to humans, including T. gondii (Gissot et al., 2007; Gissot et al., 2012). In Toxoplasma, histones and histone variants are considered to be epigenetic markers of both euchromatin and heterochromatin (Dalmasso et al., 2009).

Heterochromatin has two identified domains, the telomeres and the centromere. The telomeres are located at the ends of the chromosome and consist of the telomeric repeats and the subtelomeric region (Telomeric Associated Sequence, TAS). The subtelomere consists of repetitive elements and, in some cases, subtelomeric genes (Ottaviani et al., 2008; Pryde and Louis, 1999).

The subtelomeric genes play a significant role in many protozoan pathogens, such as Plasmodium falciparum, Trypanosoma brucei, Trypanosoma cruzi and Leishmania major. The genomes of Trypanosoma brucei (Berriman et al., 2005), Trypanosoma cruzi (El-Sayed et al., 2005) and Leishmania major (Ivens et al., 2005) were sequenced and published in 2005. Although the lifestyles of these parasites differ considerably, they were shown to share a conserved core gene set and a high level of diversity in the subtelomeres, which included several of the surface antigen genes. The difference in lifestyles was also reflected in the fact that the intracellular parasites, Trypanosoma cruzi and Leishmania major, exhibited a higher level of similarity to each other than to the extracellular parasite, Trypanosoma brucei. This pattern of telomeric diversity and central genetic conservation has been shown in many other species, such as the fungal parasite Pneumocystis carinii (Stringer and Cushion, 1998) and the apicomplexan parasite Babesia bovis (Brayton et al., 2007).

Although the influence of telomeres and subtelomeric regions on the expression of surrounding genes is well established for many pathogens, the telomeres of Toxoplasma have not been studied. This is probably because the genes at chromosome ends have not been analysed in this parasite and because the majority of current genome assemblies lack completely assembled chromosome ends.

frequently located in subtelomeric regions. This feature may enable higher rates of recombination among members of these gene families. Recombination between centrally located genes is restricted, whereas subtelomeric genes recombine more readily, which generates a high level of diversity in these antigen genes (Barry et al., 2005).

In T. brucei, genome sequencing showed that only 5% of the genes encoding variant surface glycoproteins (VSGs) were complete; the majority were pseudogenes, which might produce new variants via recombination (Marcello and Barry, 2007). This pattern has also been shown in other protozoans such as Babesia spp. (Brayton et al., 2007).

In P. falciparum, the VAR family, which includes about 60 genes, has a significant role in the production of antigenic variation. Sequencing of the complete genome of this parasite clarified the organization of this family by showing that its members are not randomly distributed but are more likely to be localized near telomeres. It also revealed the association of these genes with the regulation of antigenic variation (Gardner et al., 2002).

A common characteristic of parasitism is the ability to obtain nutrients from the host, so genes that are no longer required are lost. This loss appears as a reduction in genome size: some obligate intracellular parasites have smaller genomes than their free-living relatives (Sakharkar et al., 2004). For instance, reduced genome sizes have been shown in two obligate intracellular microsporidian parasites, Trachipleistophora hominis (Heinz et al., 2012) and Encephalitozoon cuniculi (Katinka et al., 2001). In contrast, the Trichomonas vaginalis genome is an exception to this pattern and is considered the largest sequenced protozoan genome, with a size of 160 Mb (Carlton et al., 2007).

Although the generation of many genome sequences has improved our understanding of the genomic structure of many parasites, high-quality reference genomes are still lacking. For instance, the genome of Trypanosoma brucei was sequenced and reported with 30 contigs (Berriman et al., 2005), while the genome of the related species, Trypanosoma congolense, was sequenced with 3181 contigs (Jackson et al., 2012). This increase in fragmentation reflects the falling cost of generating whole genome sequence data, while the cost of closing gaps in raw assemblies remains high. Recently, a new technology called single molecule sequencing has been applied to generate complete, closed bacterial genomes from shotgun sequencing (Koren et al., 2013). As parasite genomes are more complex than bacterial ones, further development of this technology would facilitate the generation of complete parasite genomes at a reduced cost.

Genomic analysis was usually performed on only a single representative strain of a given species. With the development of next generation sequencing methods, sequencing many strains of a species has improved our understanding of population structure. For example, whole genome sequencing has been applied to detect recombination in Trypanosoma brucei rhodesiense (Goodhead et al., 2013) and in T. gondii, where ancestral recombination events were indicated by the pattern of polymorphism at the genome level (Boyle et al., 2006).