The Major Histocompatibility Complex Class I Genes of Zebrafish.

(1)

ABSTRACT

DIRSCHERL, HAYLEY LAUREN. The Major Histocompatibility Complex Class I Genes of Zebrafish. (Under the direction of Dr. Jeffrey A. Yoder).

While the genes of the major histocompatibility complex (MHC) were named after the discovery that they encode the primary antigens responsible for determining transplant compatibility, MHC molecules also play a central role in adaptive immunity through

presenting antigens to T cells. Mammalian MHC class I molecules present cytosolic antigens to CD8+ cytotoxic T cells and serve as inhibitory ligands to natural killer (NK) cells. MHC class I genes can be classified as classical or nonclassical where the classical MHC class I genes encode molecules that present antigens to T cells, are highly polymorphic, and are ubiquitously expressed. The nonclassical MHC class I genes encode molecules that are structurally similar to classical molecules but exhibit different functions, have low-level polymorphism, and display tissue-specific expression.

Genes encoding MHC class I molecules have been identified in all jawed vertebrate species examined, including cartilaginous fish, but are absent from jawless fish. After the MHC had been described in several mammalian species, researchers became interested in identifying orthologous genes in bony fish, or teleosts, in order to gain perspective on the evolutionary origins of adaptive immunity. Zebrafish provide an ideal animal model in which to study the MHC class I genes as they have a complete reference genome, though the

description of all the zebrafish MHC class I genes is incomplete. The goal of this research was to characterize the sequence diversity of the zebrafish MHC class I genes and

hypothesize about the functional consequences of this diversity.

(2)

lineage proteins are predicted to bind peptides in a manner similar to mammalian classical MHC class I. Zebrafish encode a U lineage locus on chromosome 19 that displays significant haplotypic variation and is the only zebrafish locus that shares conserved synteny with the mammalian MHC core locus. In chapter 3, two additional MHC class I U lineage genes encoded on chromosome 22 are characterized that differ from the other genes in the U lineage in that they do not exhibit many of the characteristics of classical MHC class I genes. A second, alternate haplotype at this locus is also described with a ~30 kb deletion that completely removes the MHC class I genes. In chapter 2 twelve unique MHC class I genes of the Z lineage are introduced that are encoded on zebrafish chromosomes 1 and 3. This

chapter provides genomic and experimental evidence that these twelve Z genes represent at least two different haplotypes for each locus. The functional significance of the Z lineage MHC class I genes remains unclear as they exhibit features of both classical and nonclassical molecules. In chapter 4 a preliminary analysis of the zebrafish L lineage is provided

identifying 14 unique genes in the latest draft of the reference genome encoded on

(3)

(4)

The Major Histocompatibility Complex Class I Genes of Zebrafish

by

Hayley Dirscherl

A dissertation submitted to the Graduate Faculty of North Carolina State University

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Biomedical Engineering

Raleigh, North Carolina 2015

APPROVED BY:

________________________________ ________________________________ Dr. Jeffrey A. Yoder Dr. Michael Gamcsik

Committee Co-Chair Committee Co-Chair

________________________________ ________________________________ Dr. Albert Banes Dr. Jeffrey Macdonald

(5)

ii DEDICATION

(6)

iii BIOGRAPHY

Hayley Dirscherl was born in Dunedin, FL and grew up in the sunny town of

Clearwater, FL. She is a proud graduate of the International Baccalaureate program at Palm Harbor University High School where her love for science was made clear. For her

undergraduate studies, she moved to New York City to attend the Columbia University School of Engineering and Applied Science. Outside of her studies in Biomedical

Engineering she was a member of the Columbia University Dance Team for four years and captain for two. Some of her favorite memories from this time include performing at Madison Square Garden and participating with her classmates in a midnight bicycle ride all across Manhattan.

In 2009 she came to Raleigh to join the Joint Department of Biomedical Engineering at UNC and NC State where as a fully-funded Royster Fellow, she was able to join the lab of her choice. Having recently learned about the zebrafish model, she was lucky enough to find the Yoder Lab using zebrafish as a model of human immunity. She plans to continue

(7)

iv ACKNOWLEDGMENTS

(8)

v TABLE OF CONTENTS

LIST OF TABLES ... ix

LIST OF FIGURES ... x

CHAPTER 1: Introduction and literature review ... 1

Discovery of the Major Histocompatibility Complex ... 1

Structure and function of MHC genes ... 2

Classical and nonclassical MHC class I molecules ... 5

Evolutionary context of the MHC ... 6

Analysis of MHC class I in fish ... 7

The U lineage ... 8

The Z lineage... 11

The nonclassical L lineage... 12

The S and P lineage genes are absent from zebrafish ... 13

Outline of dissertation ... 14

References ... 15

CHAPTER 2: Characterization of the Z lineage Major histocompatibility complex class I genes in zebrafish... 20

Abstract ... 20

Introduction ... 20

Materials and methods ... 21

Gene nomenclature ... 21

Zebrafish lines ... 22

Genomic annotation and data mining... 22

Phylogenetic analyses ... 22

Amplification of zebrafish cDNAs ... 22

Genotyping PCR ... 23

(9)

vi

Results and discussion ... 24

Identification of 12 MHC class I Z loci in zebrafish ... 24

Loci encode highly similar MHC class I Z proteins ... 25

Phylogenetic analyses ... 27

Variable expression of MHC class I Z genes ... 27

Evidence for polymorphic, haplotypic, and transcriptional variation ... 29

Zebrafish Z genes primarily show ubiquitous expression ... 30

Z gene polymorphism and genetic background ... 30

Concluding remarks ... 32

Acknowledgements ... 32

CHAPTER 3: MHC class I gene loss through haplotypic variation ... 47

Abstract ... 47

Materials and methods ... 51

Zebrafish and cell lines ... 51

Transcriptome analyses ... 52

Data mining and sequence analyses ... 53

Amplification of zebrafish mhc1ula and mhc1uma sequences ... 54

Southern blot ... 55

Chromosome walking... 56

Chromosome 22 U locus genotyping ... 56

Evaluation of tissue expression ... 57

Results ... 57

Zebrafish mhc1ula and mhc1uma map to chromosome 22 ... 57

Transcript variants... 61

Divergent residues at locations of functional consequence ... 63

(10)

vii

A genotyping strategy for the chromosome 22 U locus ... 69

Tissue-specific expression ... 71

Discussion... 72

Acknowledgements ... 75

CHAPTER 4: Preliminary Characterization of the L Lineage MHC class I genes in zebrafish ... 84

Methods ... 86

Database queries ... 86

Sequence analyses ... 86

Transcriptome analysis of a CG2 individual ... 87

Amplification of L cDNAs ... 87

Results ... 88

Improved sequence coverage in GRCz10 reveals fewer L genes on chromosome 25 ... 88

Genomic organization of the zebrafish L genes ... 89

Phylogenetic analyses of the L lineage ... 91

Sequence features of the zebrafish L genes... 94

Evidence for L gene expression ... 97

Discussion... 99

CHAPTER 5: Summary and future directions ... 103

Summary: The genetic diversity of the three lineages of zebrafish MHC class I genes ... 103

The Z lineage... 103

The U lineage ... 104

The L lineage... 105

(11)

viii

β2M-binding assay ... 106

Cytotoxicity assay ... 111

Characterize the cytotoxic cells within the mixed effector population ... 114

Concluding thoughts ... 115

APPENDICES ... 118

Appendix A – Published review article ... 119

Appendix B – β2M-binding assay protocol... 132

Appendix C – Cytotoxicity assay protocol ... 136

(12)

ix LIST OF TABLES

CHAPTER 2

Table 1 Primer sequences and PCR cycling parameters for cloning

full-length cDNAs ... 23 Table 2 Primer sequences and PCR cycling parameters for genotyping PCR ... 24 Table 3 Zebrafish Z gene information and accession numbers ... 25 Table S1 Zebrafish Z genes identified in the de novo genomic assemblies

from double haploid homozygous AB (DHAB) and Tübingen (DHTu2)

individuals ... 41 Table S2 Predicted sizes of Southern blot fragments containing Z gene

α1 sequences based on HindIII restriction sites ... 41 CHAPTER 3

Table S1 Summary of primer sequences and PCR conditions for all figures ... 79 CHAPTER 4

Table 1 Zebrafish MHC class I L gene nomenclature ... 89 Table 2 Summary of L genes detected in the CG2 zebrafish transcriptome ... 98 Table 3 Primer sequences and PCR cycling parameters for cloning

(13)

x LIST OF FIGURES

CHAPTER 1

Figure 1 Structure and function of MHC class I and class II proteins ... 3

CHAPTER 2 Figure 1 Genomic organization of zebrafish Z genes ... 23

Figure 2 Z protein sequences ... 26

Figure 3 Phylogenetic analyses of MHC class I Z genes ... 27

Figure 4 Variable expression of Z genes across individual zebrafish... 28

Figure 5 Genotyping PCR and Southern blot analysis ... 29

Figure 6 Z gene expression ... 31

Figure 7 Allelic variation of Z genes ... 31

Figure S1 Full-length zebrafish MHC class I Z proteins ... 34

Figure S2 Organization and classification of zebrafish Z sequences ... 39

Figure S3 Alignment of transmembrane and cytoplasmic domains ... 40

Figure S4 Sequence variation of MHC class I Z proteins... 42

CHAPTER 3 Figure 1 Genomic annotation of mhc1ula and mhc1uma on chromosome 22 ... 58

Figure 2Phylogenetic comparison of Ula and Uma to representative MHC class I sequences of teleosts ... 60

Figure 3Zebrafish MHC class I U lineage protein sequence alignment ... 64

Figure 4Presence or absence of mhc1ula and mhc1uma from individual zebrafish ... 67

Figure Sequencing an alternate haplotype ... 68

Figure 6 Genotyping strategy for the chromosome 22 locus ... 70

Figure 7 Tissue-specific expression of mhc1ula and mhc1uma ... 71

Figure S1 Full-length zebrafish Ula and Uma protein sequences ... 81

(14)

xi

Figure S3Genomic PCR to identify regions for chromosomal walking ... 83

CHAPTER 4 Figure 1 Genomic map of the zebrafish L genes ... 90

Figure 2 Phylogenetic analysis of the L lineage ... 92

Figure 3 Phylogenetic analysis of individual α domains ... 95

Figure 4 Alignment of the zebrafish L proteins ... 96

Figure 5 Variability plot of the L lineage MHC class I proteins ... 97

CHAPTER 5 Figure 1 Strategy for expressing recombinant proteins in the β2M-binding assay ... 108

Figure 2 Preliminary β2M-binding assay results ... 110

Figure 3 Validation of the fluorescence-based cytotoxicity assay ... 113

(15)

1 CHAPTER 1: Introduction and literature review1

Discovery of the Major Histocompatibility Complex

Since the early 1900s there has been a great clinical interest in achieving successful

tissue transplantations and blood transfusions between related and non-related individuals

(Hakim and Danovitch 2001). Research done by Peter Medawar in the 1940s involving skin

grafts between rabbits helped establish the fact that a specific adaptive immune response was

the cause of transplant rejection (Gibson and Medawar 1943). This encouraged researchers to

identify immunosuppressive therapies to improve transplantation success rates. The therapies

they developed included whole-body irradiation as well as chemical treatments like

cyclosporine, but these methods had many side effects and often still resulted in chronic

transplant rejection (Hakim and Danovitch 2001). In 1951, research published by George

Snell was the first to identify a genetic locus, termed the histocompatibility-2 locus, that was

responsible for determining susceptibility or resistance to tumor transplantation between

congenic mouse strains (Snell and Higgins 1951; Klein 2001). It wasn’t until the early 1970s

that this locus was designated as the major histocompatibility complex, or MHC, after the

identification of an orthologous gene cluster in humans (Bach et al. 1972).

Having an understanding of the genetic determinants of transplant compatibility was

critical to enabling many of the medical procedures we have today. Matching at MHC loci

between donors and recipients is important to ensure successful transplant outcomes by

(16)

2

minimizing acute graft rejection and/or graft-versus-host disease in the case of bone marrow

transplantation (Groth et al. 2000). However, this wasn’t the only impact of Snell’s research.

As Bach et al. wisely pointed out in 1976, researchers who were drawn to the

histocompatibility antigens never could have anticipated the complexity of the genes

encoding them, as the MHC also plays a central role in the immune response and in

determining disease susceptibility (Bach et al. 1976). While detailing the complete history of

this field of research is beyond the scope of this dissertation, I will include some of the

interesting and historical findings that gave us the understanding of the MHC we have today.

Structure and function of MHC genes

When the membrane-bound histocompatibility antigens were solubilized by

enzymatic digestion and analyzed by chromatography, two classes of protein fragments were

revealed that differed in molecular weight: class I fragments ranged from 58 to 65 kilodaltons

(kDa) while class II fragments were only about 33 to 37 kDA (Stimpfling 1971). As we

know today, this is because MHC class I molecules are heterodimers of a membrane-bound

heavy chain with three immunoglobulin-like α domains (α1, α2, and α3) and a

non-covalently associated light chain encoded outside the MHC (β2-microglobulin) while MHC

class II molecules are heterodimers of α and β chains, each with two immunoglobulin-like

domains (α1 and α2; β1 and β2) tethered to the cell surface (Figure 1). The original MHC

characterization studies were limited in methodology with one of the most prevalent

(17)

3

technique led to the key observation that some histocompatibility antigens (class I) are

present on nearly all cells while others (class II) are only encoded by a subset of leukocytes

(Hakim and Danovitch 2001).

Another method that was frequently used was the mixed leukocyte culture (MLC) or

mixed leukocyte reaction (MLR) in which stimulating lymphocytes from one individual are

prevented from proliferating by treatment with a DNA cross-linker and co-cultured for

several hours with responding lymphocytes from another individual (Bach et al. 1976). If the

individuals encode different MHC molecules, then the responder lymphocytes will undergo

proliferation as measured by the incorporation of tritiated thymidine. In humans it was

determined that three class II loci are responsible for determining a positive MLC response

(18)

4

(Hakim and Danovitch 2001). This is because the class II molecules present extracellular

antigens, such as bacterial peptides, to CD4+ helper T cells which can then stimulate the

increased uptake by macrophages of extracellular pathogens and/or activate B cells to elicit a

humoral response (Alberts et al. 2002). In contrast, the class I molecules present cytosolic

antigens, such as viral peptides, to CD8+ cytotoxic T cells which can directly kill the infected

target cell by causing it to undergo apoptosis, or programmed cell death.

More recently, an additional role for MHC class I molecules as inhibitory ligands for

natural killer (NK) cells has been described (Raulet and Vance 2006). NK cells are

lymphocytes that are generally considered part of the innate immune system due to their lack

of recombining antigen receptors. Instead they express an assortment of inhibitory and

activating receptors, the stimulation of which is balanced when an NK cell comes in contact

with a normal cell that is recognized as “self”. As even normal cells express a certain level of

activating ligands, the expression of MHC class I as the primary inhibitory ligand of NK cells

is critical to protect normal cells from lysis. In contrast, cells that have been infected or

transformed can be recognized as “non-self” and targeted for lysis by NK cells if they have

been induced to express additional activating cell surface ligands (stress proteins, virally

encoded proteins, etc.) or recognized as “missing self” and targeted for lysis by NK cells if

they are lacking or expressing low levels of the appropriate MHC class I allele(s). The later

scenario is thought to have evolved to counter the immune-evasion strategy of MHC class I

down-regulation employed by many viruses.

The human MHC locus comprises a 3.6 Mb section of chromosome 6 and is

(19)

5

(Vandiedonck and Knight 2009). This level of polymorphism results in diverse MHC class I

sequences that can present diverse peptides for interrogation by T cells. There is evidence

that MHC loci are subject to over-dominant selection in which heterozygosity is favored over

homozygosity, likely due to the associated ability to respond to diverse pathogens (Shiina et

al. 2009). The human MHC remains of great interest to many researchers as it has been

associated with over 100 diseases, mostly of the autoimmune or infectious variety (Shiina et

al. 2009).

Classical and nonclassical MHC class I molecules

The focus of this dissertation is on the characterization of MHC class I genes and the

molecules they encode. MHC class I molecules can be further classified as being classical or

nonclassical. Classical MHC class I molecules are defined based on their function of

presenting peptide antigens to CD8+ T cells in order to initiate an immune response. Because

of this function, classical MHC class I molecules have extremely high levels of

polymorphism, often found as specific patterns of substitutions among residues associated

with the peptide binding site in the α1 and α2 domains (Bjorkman et al. 1987). Additional

characteristics of classical MHC class I molecules include nearly ubiquitous expression,

conservation of peptide anchor residues, and conservation of surface residues that promote

binding to other molecules such as 2M and CD8 (Rodgers and Cook 2005; Hee et al. 2013).

Nonclassical MHC class I molecules have a similar structure as the classical

(20)

6

with other immune cells or present non-peptide antigens while others have non-immune

functions (Watanabe et al. 2004; Rodgers and Cook 2005). The nonclassical MHC class I

genes are frequently at least an order of magnitude less polymorphic than their classical

counterparts and are often expressed in limited tissues (Shiina et al. 2009). Interestingly,

some MHC molecules considered classical, such as HLA-C, also have nonclassical

characteristics, and are thus capable of initiating an immune response by more than one

mechanism (Colonna et al. 1993). It has been hypothesized that classical MHC genes are

periodically duplicated within the genome and thus over time may be free to be selected upon

to acquire diverse roles (Klein et al. 2007). Of note, a large fraction of the nonclassical MHC

genes are not found within the core MHC locus but are instead scattered on additional

chromosomal loci in humans and many other vertebrate species (Horton et al. 2004).

Evolutionary context of the MHC

While they have been best characterized in humans, mice, and other mammals, MHC

molecules have been found in all jawed vertebrate species examined including cartilaginous

fishes (Flajnik and Kasahara 2010). It is thought that the adaptive immune system that relies

on the existence of MHC molecules, B and T cells, immunoglobulins, and T cell receptors

came about in a “big bang” as no jawed vertebrates have been found with an intermediate

version of the system (Parham 2005). Its emergence could have been made possible by the

incorporation of a transposon encoding the recombination activating gene (RAG) capable of

(21)

7

(WGD) that produced an abundance of raw genetic material (Flajnik and Kasahara 2010).

After the MHC had been described in several mammalian species, researchers became

interested in trying to identify orthologous genes in bony fish, or teleosts, in order to gain

perspective on the evolutionary origins of adaptive immunity. The teleost species have been

diversifying for ~350 million years and represent the largest group of living vertebrates (Stet

and Egberts 1991).

Analysis of MHC class I in fish

The first experiments looking for evidence of an MHC in a teleost relied on the

indirect methods of allograft rejection and the MLR. Due to the popularization of the

polymerase chain reaction (PCR), researchers were eventually able to find direct evidence for

MHC-like genes. First, a genomic locus encoding an MHC class I-like sequence called

TLAIα-1 was amplified from Cyprinus carpio, the common carp (Hashimoto et al. 1990).

The first evidence of MHC class I expression from any teleost was the identification of a

full-length class I transcript from a cDNA library from Salmo salar, the Atlantic salmon

(Grimholt et al. 1993). Eventually, this first MHC class I transcript was designated as being a

part of the U sequence lineage (for “uno” meaning “one”), the lineage that is best conserved

throughout diverse teleost families (Pinto et al. 2013). Four other sequence lineages have

been identified in teleosts which are designated Z, L, S, and P. The focus of this dissertation

is on the characterization of the MHC class I genes of zebrafish which include the U, Z, and

(22)

8

time of the zebrafish make it a suitable model for evaluating the effects of selection on an

immunologically important gene such as MHC class I (Takeuchi et al. 1995). I will

summarize what was known about each of these sequence lineages prior to the start of my

dissertation research with an emphasis on zebrafish data.

The current zebrafish MHC class I gene nomenclature follows the guidelines

proposed for all species (Klein et al. 1990). If the zebrafish “mhc1uaa” gene symbol is dissected as an example, the “mhc1” is included to define the sequence as class I. The first

letter “u” (or “z” or “l”) corresponds to the MHC class I lineage. The variable second letter

“a” (or “b” or “c” etc.) designates the gene, named alphabetically in order of discovery. The

third letter “a” indicates the alpha chain. For MHC class II genes that form heteromeric

complexes of an alpha and a beta chain, the beta chain gene is designated with “b”. When

zebrafish MHC class I genes are compared to sequences from other species, then the

inclusion of a species identifier is crucial. In this case, the four letter prefix, Dare (which refers to the first two letters of the genus and species, Danio rerio), is added to the gene symbol (Dare-mhc1uaa).

The U lineage

There is abundant evidence to suggest that the MHC class I molecules of the U

lineage exhibit classical function in teleosts. First, there is direct experimental evidence of

peptide presentation for U molecules from studies performed in grass carp

(23)

9

binding of peptide to U molecules, 3) association of U molecules with 2M, and 4) the

interaction of labeled U complexes with T cells (Chen et al. 2010). Notably, cytotoxic T cell

function appears to be highly conserved in teleosts (Nakanishi et al. 2011), which likely

relies on antigen presentation via U molecules to activate the immune response of CD8+

cells. It has been demonstrated in rainbow trout (Oncorhynchus mykiss) that a U protein known as UBA is expressed in similar cell types as mammalian classical MHC class I

molecules (Dijkstra et al. 2003). Specific alleles of the single classical U locus, UBA, in

Atlantic salmon have been implicated in pathogen resistance (Grimholt et al. 2003; Kjøglum

et al. 2008). Finally, at the genomic level, only the U lineage genes are consistently

associated with the core MHC locus within teleosts, meaning that they maintain linkage to

additional class I pathway genes such as psmb8, tapbp, and abcb3/tap2 (Matsuo et al. 2002). Full-length transcripts and genomic sequence for β2M from zebrafish were reported

in 1993 (Ono et al. 1993) followed shortly by the cloning of three MHC class I genes of the

U lineage (Takeuchi et al. 1995). The sequences identified from a cDNA library were

designated mhc1uaa and mhc1uba while a third sequence identified from a genomic library was designated mhc1uca. Linkage studies in haploid zebrafish embryos showed that the class I loci are closely linked on chromosome 19 (Bingulac-Popovic et al. 1997). In zebrafish,

haplotypic variation greatly contributes to the diversity of U gene loci. Genomic sequences

have been assembled for two different haplotypes on zebrafish chromosome 19 (Michalova

et al. 2000). Chromosome walking on BAC and PAC clones revealed one haplotype

(24)

10

linked flanking genes, with each distinct haplotype carrying one, two, or three unique U

lineage genes (McConnell et al. 2014). Southern blot analyses of genomic DNA from

zebrafish that are homozygous for each of the six haplotypes support the conclusion that each

haplotype contains one to three U genes.

The zebrafish has been increasingly used in transplantation assays due to its several

advantages over the more traditional mouse including higher fecundity, optical clarity at the

embryonic and larval stages, and ease of manipulation for forward or reverse genetics

(Taylor and Zon 2009). Zebrafish transplantation assays have been especially useful for the

study of hematopoietic stem cells (Li et al. 2011). However their use in this type of assay

highlights a current limitation: attempts at generating inbred zebrafish lines met with only

limited success (Shinya et al. 2011), resulting in standard laboratory “lines” of zebrafish

harboring a multitude of polymorphisms as well as haplotypic variation especially at immune

loci (Patowary et al. 2013). As immunologically compatible zebrafish donors and recipients

are not readily available, genotyping strategies have been and are being developed for

identifying MHC-matched zebrafish. It has been shown that MHC matching at the U locus

on chromosome 19 significantly increases engraftment in hematopoietic transplantation

assays (de Jong et al. 2011). This result represents the only functional data for MHC class I

genes in zebrafish. The researchers subsequently performed hematopoietic transplantations

between individual zebrafish that were matched at an additional MHC class I locus on

chromosome 1 (encoding Z lineage genes) and the MHC class II locus on chromosome 8 in

addition to the U locus on chromosome 19 but saw no significant increase in engraftment

(25)

11

transplants between individuals matched at the chromosome 19 U locus still fail to

successfully engraft, it remains to achieve a better understanding of this and other MHC class

I loci to achieve optimal success rates in zebrafish transplantation assays. Through my

doctoral research I have identified two additional genes of the U lineage that map to a novel

locus on chromosome 22. These genes are described in detail in Chapter 3 of this

dissertation.

The Z lineage

The Z lineage genes in zebrafish have historically been referred to as ZE lineage

genes (Kruiswijk et al. 2002); however, the zebrafish nomenclature committee

(http://zfin.org/) recently agreed to employ a nomenclature similar to that used for the

zebrafish U genes as well as the Z gene nomenclature in Atlantic salmon. In order to

understand the history of the Z gene nomenclature in zebrafish one needs to first understand

the origins of the nomenclature in carp. The first Z lineage genes to be reported were

described from common carp and ginbuna crucian carp (Carassius auratus langsdorfii) and named ZA, ZB, ZC, and ZD (Okamura et al. 1993). Approximately ten years later, an

additional MHC class I gene was described from zebrafish, common carp, and barbus

(Barbus intermedius), that is phylogenetically distinct from the U and ZA/ZB/ZC/ZD gene lineages. This gene shares more similarity with the ZA/ZB/ZC/ZD sequences than the U

lineage sequences, and thus was named ZE (Kruiswijk et al. 2002). However, differences

between ZE and the ZA/ZB/ZC/ZD genes soon became apparent, as the ZA/ZB/ZC/ZD

(26)

12

(Kruiswijk et al. 2002). Also, while the ZA/ZB/ZC/ZD genes have properties consistent with

nonclassical MHC genes, ZE genes are distinguished by having features of both classical and

nonclassical MHC genes. Since the only MHC class I Z genes identified outside of carp

species are the ZE genes, the zebrafish ZE genes are now referred to simply as Z genes in

accordance with MHC class I nomenclature guidelines.

The initial full-length Z transcripts from zebrafish were named Dare-ZE*0101 and

Dare-ZE*0102 not knowing if they represented different genes or different alleles of a single gene (Kruiswijk et al. 2002). However, it was clear then that these sequences were likely not

derived from a single copy gene as 14 unique Z sequences were amplified from only five

individual zebrafish. Further evidence of the Z lineage being a multi-gene family came in

2005 when a genome-wide survey of zebrafish MHC loci placed one full length Z gene on an

unplaced scaffold as well as four similar genes on chromosome 1 (based on the fourth

zebrafish genome assembly, Zv4) but no sequence data was provided (Sambrook et al. 2005).

My in-depth analysis of the Z lineage MHC class I genes is provided in Chapter 2.

The nonclassical L lineage

A third lineage of MHC class I genes in teleosts was first reported in 2007 and

designated the L lineage due to the linkage of at least one of its members with MHC class II

(Dijkstra et al. 2007). This linkage of MHC class I and class II genes in zebrafish was in

contrast to the previously described absence of linkage between these genes in teleosts

(Bingulac-Popovic et al. 1997). While this lineage was most extensively characterized in

(27)

13

Atlantic salmon, and fathead minnow (Pimephales promelas) but not in species outside the teleosts (Dijkstra et al. 2007). The L lineage molecules are thought to be nonclassical in

function mostly due to the fact that they lack most of the residues thought to be important for

anchoring the ends of bound peptide. My preliminary analysis of the L lineage genes in

zebrafish is presented in Chapter 4.

S and P lineage genes are absent from zebrafish

A fourth MHC class I lineage in bony fish is referred to as the S lineage as it was first

described in salmonid species, including Atlantic salmon and rainbow trout, although a

related cDNA also has been identified from channel catfish (Ictalurus punctatus) which is a siluriforme species (Lukacs et al. 2010). Atlantic salmon appear to encode a single S lineage

gene, mhc1saa, which was renamed from mhc1uaa based on its lack of identity to other U lineage genes (Lukacs et al. 2010). We have strengthened the claim that the S lineage is

absent in zebrafish (Grimholt et al. 2015) by showing that there are no sequences in the

zebrafish reference genome (Zv9) with significant identity to Atlantic salmon SAA and that

the sequences with the greatest similarity are more similar to genes of the U lineage

(Dirscherl et al. 2014). A fifth MHC class I lineage was recently described from teleosts

named P for “penta” (Grimholt et al. 2015). P lineage sequences were identified in seven

species including Atlantic salmon but were not found in zebrafish. Both the S and P lineages

are thought to encode nonclassical MHC class I molecules due to their lack of

peptide-binding residues (Grimholt et al. 2015). For the P lineage this assertion is supported by the

(28)

14 Outline of dissertation

While our understanding of the MHC class I genes in zebrafish will continue to grow

in the coming years, I have made significant contributions in this regard through my doctoral

research. Chapter 2 shows my published work characterizing the unique

classical/nonclassical Z lineage encoded on chromosomes 1 and 3. In Chapter 3, which has been submitted for publication, I report my finding of two additional genes of the U lineage

that map to a novel locus on chromosome 22. Chapter 4 presents my initial analysis of the L lineage MHC class I genes which map to chromosomes 3, 8, and 25. Finally, in Chapter 5 I highlight some of the future directions of this research geared toward the functional

characterization of the three different lineages of MHC class I molecules in zebrafish. This

includes my development of a β2M-binding assay as well as a fluorescence-based

cytotoxicity assay and some preliminary results. It remains of interest to determine if the

MHC class I molecules of an early vertebrate species such as the zebrafish carry out the

diverse array of functions as do the MHC class I molecules that have been well characterized

(29)

15 References

Alberts B, Johnson A, Lewis J, et al. (2002) Molecular Biology of the Cell, 4th Edition. Garland, New York

Bach FH, Bach ML, Sondel PM (1976) Differential function of major histocompatibility complex antigens in T-lymphocyte activation. Nature 259:273–281. doi:

10.1038/259273a0

Bach FH, Widmer MB, Segall M, et al. (1972) Genetic and immunological complexity of major histocompatibility regions. Science 176:1024–1027. doi:

10.1126/science.176.4038.1024

Bingulac-Popovic J, Figueroa F, Sato A, et al. (1997) Mapping of mhc class I and class II regions to different linkage groups in the zebrafish, Danio rerio. Immunogenetics 46:129–134. doi: 10.1007/s002510050251

Bjorkman PJ, Saper MA, Samraoui B, et al. (1987) The foreign antigen binding site and T cell recognition regions of class I histocompatibility antigens. Nature 329:512–518. doi: 10.1038/329512a0

Chen W, Jia Z, Zhang T, et al. (2010) MHC class I presentation and regulation by IFN in bony fish determined by molecular analysis of the class I locus in grass carp. J Immunol 185:2209–2221. doi: 10.4049/jimmunol.1000347

Colonna M, Borsellino G, Falco M, et al. (1993) HLA-C is the inhibitory ligand that determines dominant resistance to lysis by NK1- and NK2-specific natural killer cells. Proc Natl Acad Sci U S A 90:12000–12004. doi: 10.1073/pnas.90.24.12000

De Jong JLO, Burns CE, Chen AT, et al. (2011) Characterization of immune-matched hematopoietic transplantation in zebrafish. Blood 117:4234–4242. doi: 10.1182/blood-2010-09-307488

De Jong JLO, Zon LI (2012) Histocompatibility and hematopoietic transplantation in the zebrafish. Adv Hematol. doi: 10.1155/2012/282318

(30)

16 Dijkstra JM, Köllner B, Aoyagi K, et al. (2003) The rainbow trout classical MHC class I

molecule Onmy-UBA*501 is expressed in similar cell types as mammalian classical MHC class I molecules. Fish Shellfish Immunol 14:1–23. doi: 10.1006/fsim.2001.0407 Dirscherl H, McConnell SC, Yoder JA, de Jong JLO (2014) The MHC class I genes of

zebrafish. Dev Comp Immunol 46:11–23. doi: 10.1016/j.dci.2014.02.018

Flajnik MF, Kasahara M (2010) Origin and evolution of the adaptive immune system: genetic events and selective pressures. Nat Rev Genet 11:47–59. doi: 10.1038/nrg2703 Gibson T, Medawar PB (1943) The fate of skin homografts in man. J Anat 77:299–310.4. Grimholt U, Hordvik I, Fosse VM, et al. (1993) Molecular cloning of major

histocompatibility complex class I cDNAs from Atlantic salmon (Salmo salar). Immunogenetics 37:469–473.

Grimholt U, Larsen S, Nordmo R, et al. (2003) MHC polymorphism and disease resistance in Atlantic salmon (Salmo salar); facing pathogens with single expressed major

histocompatibility class I and class II loci. Immunogenetics 55:210–219. doi: 10.1007/s00251-003-0567-8

Grimholt U, Tsukamoto K, Azuma T, et al. (2015) A comprehensive analysis of teleost MHC class I sequences. BMC Evol Biol. doi: 10.1186/s12862-015-0309-1

Groth CG, Brent LB, Calne RY, et al. (2000) Historic landmarks in clinical transplantation: Conclusions from the consensus conference at the University of California, Los

Angeles. World J Surg 24:834–843. doi: 10.1007/s002680010134

Hakim NS, Danovitch GM (2001) Transplantation Surgery. Springer, New York Hashimoto K, Nakanishi T, Kurosawa Y (1990) Isolation of carp genes encoding major

histocompatibility complex antigens. Proc Natl Acad Sci U S A 87:6863–6867. Hee CS, Beerbaum M, Loll B, et al. (2013) Dynamics of free versus complexed

β2-microglobulin and the evolution of interfaces in MHC class I molecules. Immunogenetics 65:157–172. doi: 10.1007/s00251-012-0667-4

Horton R, Wilming L, Rand V, et al. (2004) Gene map of the extended human MHC. Nat Rev Genet 5:889–899. doi: 10.1038/nrg1489

(31)

17 Klein J (2001) George Snell’s first foray into the unexplored territory of the major

histocompatibility complex. Genetics 159:435–439.

Klein J, Bontrop RE, Dawkins RL, et al. (1990) Nomenclature for the major

histocompatibility complexes of different species: a proposal. Immunogenetics 31:217– 219. doi: 10.1007/BF00204890

Klein J, Sato A, Nikolaidis N (2007) MHC, TSP, and the origin of species: from immunogenetics to evolutionary genetics. Annu Rev Genet 41:281–304. doi: 10.1146/annurev.genet.41.110306.130137

Kruiswijk CP, Hermsen TT, Westphal AH, et al. (2002) A novel functional class I lineage in zebrafish (Danio rerio), carp (Cyprinus carpio), and large barbus (Barbus intermedius) showing an unusual conservation of the peptide binding domains. J Immunol 169:1936– 1947. doi: 10.4049/jimmunol.169.4.1936

Li P, White RM, Zon LI (2011) Transplantation in Zebrafish, Third Edit. Methods Cell Biol 105:403–417. doi: 10.1016/B978-0-12-381320-6.00017-5

Lukacs MF, Harstad H, Bakke HG, et al. (2010) Comprehensive analysis of MHC class I genes from the U-, S-, and Z-lineages in Atlantic salmon. BMC Genomics 11:154. doi: 10.1186/1471-2164-11-154

Matsuo MY, Asakawa S, Shimizu N, et al. (2002) Nucleotide sequence of the MHC class I genomic region of a teleost, the medaka (Oryzias latipes). Immunogenetics 53:930–940. doi: 10.1007/s00251-001-0427-3

McConnell SC, Restaino AC, de Jong JLO (2014) Multiple divergent haplotypes express completely distinct sets of class I MHC genes in zebrafish. Immunogenetics 66:199– 213. doi: 10.1007/s00251-013-0749-y

Nakanishi T, Toda H, Shibasaki Y, Somamoto T (2011) Cytotoxic T cells in teleost fish. Dev Comp Immunol 35:1317–1323. doi: 10.1016/j.dci.2011.03.033

Okamura K, Nakanishi T, Kurosawa Y, Hashimoto K (1993) Expansion of genes that encode MHC class I molecules in cyprinid fishes. J Immunol 151:188–200.

Ono H, Figueroa F, O’hUigin C, Klein J (1993) Cloning of the β2-microglobulin gene in the zebrafish. Immunogenetics 38:1–10. doi: 10.1007/BF00216384

(32)

18 Patowary A, Purkanti R, Singh M, et al. (2013) A sequence-based variation map of zebrafish.

Zebrafish 10:15–20. doi: 10.1089/zeb.2012.0848

Pinto RD, Randelli E, Buonocore F, et al. (2013) Molecular cloning and characterization of sea bass (Dicentrarchus labrax, L.) MHC class I heavy chain and  β2-microglobulin. Dev Comp Immunol 39:234–254. doi: 10.1016/j.dci.2012.10.002

Raulet DH, Vance RE (2006) Self-tolerance of natural killer cells. Nat Rev Immunol 6:520– 531. doi: 10.1038/nri1863

Rodgers JR, Cook RG (2005) MHC class Ib molecules bridge innate and acquired immunity. Nat Rev Immunol 5:459–471. doi: 10.1038/nri1635

Sambrook JG, Figueroa F, Beck S (2005) A genome-wide survey of Major

Histocompatibility Complex (MHC) genes and their paralogues in zebrafish. BMC Genomics 6:152. doi: 10.1186/1471-2164-6-152

Scharsack JP, Kalbe M, Schaschl H (2007) Characterization of antisera raised against stickleback (Gasterosteus aculeatus) MHC class I and class II molecules. Fish Shellfish Immunol 23:991–1002. doi: 10.1016/j.fsi.2007.03.011

Shiina T, Hosomichi K, Inoko H, Kulski JK (2009) The HLA genomic loci map: expression, interaction, diversity and disease. J Hum Genet 54:15–39. doi: 10.1038/jhg.2008.5 Shinya M, Sakai N, Andrews BJ (2011) Generation of Highly Homogeneous Strains of

Zebrafish Through Full Sib-Pair Mating. Genes|Genomes|Genetics 1:377–386. doi: 10.1534/g3.111.000851

Snell GD, Higgins GF (1951) Alleles at the histocompatibility-2 locus in the mouse as determined by tumor transplantation. Genetics 36:306–10.

Stet RJM, Egberts E (1991) The histocompatibility system in teleostean fishes: From multiple histocompatibility loci to a major histocompatibility complex. Fish Shellfish Immunol 1:1–16. doi: 10.1016/S1050-4648(06)80016-1

Stimpfling JH (1971) Recombination within a histocompatibility locus. Annu Rev Genet 5:121–142. doi: 10.1146/annurev.ge.05.120171.001005

Takeuchi H, Figueroa F, O’hUigin C, Klein J (1995) Cloning and characterization of class I Mhc genes of the zebrafish, Brachydanio rerio. Immunogenetics 42:77–84. doi:

(33)

19 Taylor AM, Zon LI (2009) Zebrafish tumor assays: the state of transplantation. Zebrafish

6:339–346. doi: 10.1089/zeb.2009.0607

Vandiedonck C, Knight JC (2009) The human Major Histocompatibility Complex as a paradigm in genomics research. Briefings Funct Genomics Proteomics 8:379–394. doi: 10.1093/bfgp/elp010

(34)

20 CHAPTER 2: Characterization of the Z lineage Major histocompatibility complex class I

(35)

(36)

(37)

(38)

(39)

(40)

(41)

(42)

(43)

(44)

(45)

(46)

(47)

(48)

(49)

(50)

(51)

(52)

(53)

(54)

(55)

(56)

(57)

(58)

(59)

(60)

(61)

47 CHAPTER 3: MHC class I gene loss through haplotypic variation1

Abstract

Three sequence lineages of MHC class I genes have been described in zebrafish (Danio rerio): U, Z, and L. The U lineage genes encoded on zebrafish chromosome 19 are predicted to provide the classical function of antigen presentation. This MHC class I locus displays significant haplotypic variation and is the only MHC class I locus in zebrafish that shares conserved synteny with the core mammalian MHC. Here we describe two MHC class I U lineage genes, mhc1ula and mhc1uma, that map to chromosome 22. Unlike the U lineage proteins encoded on chromosome 19, Ula and Uma may play a nonclassical role as they lack conservation of key peptide binding residues, display limited polymorphic variation, and exhibit tissue-specific expression (primarily in the gills). We also describe a second

haplotype at this chromosome 22 locus in which the mhc1ula and mhc1uma genes are absent due to a ~30 kb deletion with no other MHC class I sequences present. Functional and non-functional transcript variants of mhc1ula and mhc1uma were identified; however, mhc1uma

transcripts were often not amplified or amplified at low levels from individuals encoding the gene. Ula and Uma show sequence similarity to U proteins recently predicted from the genome of the blind cavefish (Astyanax mexicanus) suggesting that these distinct U lineage genes may be restricted to the superorder Ostariophysi.

(62)

48 Introduction

While the genes of the major histocompatibility complex (MHC) were named after the discovery that they encode the primary antigens responsible for determining transplant compatibility (Klein 2001), MHC molecules also play a central role in adaptive immunity through presenting antigens to T cells. MHC class I molecules present cytosolic antigens to CD8+ cytotoxic T cells while MHC class II molecules present exogenous peptides to CD4+ helper T cells (Horton et al. 2004). MHC class I molecules also serve as inhibitory ligands to natural killer (NK) cells to prevent the killing of normal “self” cells (Raulet and Vance 2006). MHC class I molecules can be further classified as classical or nonclassical where the classical MHC class I molecules are defined by their function of peptide presentation to CD8+ T cells. The genes encoding classical MHC class I molecules are highly polymorphic and ubiquitously expressed. The nonclassical MHC class I molecules have a similar structure to the classical molecules but exhibit different, often immune-related functions. They’re also characterized as having low-level polymorphism and tissue-specific expression. The classical MHC class I molecules serve as the primary marker of “self” to NK cells, though many nonclassical MHC class I proteins are also recognized by different classes of NK receptors (Halenius et al. 2014).

(63)

49 evolutionary origin falls shortly after that of the MHC class I genes themselves, especially in comparison to other common model species. However, it has yet to be determined if the zebrafish MHC class I molecules carry out a similar repertoire of functions as the well-characterized human HLA molecules or if these functions arose later in mammalian evolution. This question is especially relevant since the genomic organization of the genes encoding MHC class I molecules in zebrafish differs greatly from that of mammals.

Five phylogenetic lineages of MHC class I genes, termed U, Z, L, S, and P, have been identified from multiple bony fish (Dirscherl et al. 2014; Grimholt et al. 2015). The U and Z lineage proteins are predicted to bind peptides in a manner similar to classical MHC class I; however, only the U lineage genes share conserved synteny with the mammalian MHC core locus (Michalová et al. 2000). Although no functional data has been reported on the MHC class I Z lineage, there is direct experimental evidence of peptide presentation by U lineage MHC class I proteins (Scharsack et al. 2007; Chen et al. 2010). In contrast the L, S and P lineage proteins are considered nonclassical due to their lack of a classical peptide binding groove, although virtually nothing has been reported on their function (Dijkstra et al. 2007; Lukacs et al. 2010; Dirscherl et al. 2014; Grimholt et al. 2015).

Zebrafish possess only the MHC class I U, Z and L lineages and have been reported to encode a single U lineage locus on chromosome 19 that displays significant haplotypic variation (McConnell et al. 2014). The zebrafish Z lineage genes are encoded on

(64)

50 It is unclear to what extent the different zebrafish MHC class I genes encoded across multiple chromosomes contribute to the immune system’s ability to differentiate “self” from “non-self”. Using tissue transplantation as a model for evaluating the role of MHC class I matching in allorecognition, de Jong et al. have shown that matching the donor and recipient zebrafish at the MHC class I U locus on chromosome 19 significantly increases the percent of donor cells that successfully engraft; however, matching at this locus alone is insufficient to predict successful engraftment 100% of the time (de Jong et al. 2011). Additionally, it has been shown that matching individuals at the MHC class I and class II loci on chromosomes 1 and 8, respectively, in addition to the chromosome 19 locus does not significantly improve the success of engraftment (de Jong and Zon 2012). This suggests that additional MHC class I loci may be important for allorecognition.

Recent work characterizing the haplotypes of the U genes on chromosome 19 identified a putative U gene on chromosome 22 which was designated mhc1ula in accordance with nomenclature conventions (McConnell et al. 2014). McConnell et al. predicted that this chromosome 22 locus may display haplotypic variation due to the occasional presence of a band matching the predicted size of the fragment containing the

mhc1ula gene in a Southern blot hybridized with a mhc1uda probe. We recently reported the presence of a second putative U gene adjacent to mhc1ula which has been designated

mhc1uma (Dirscherl et al. 2014). This U locus was also depicted in a recent analysis of MHC class I sequences predicted from various teleost genomic databases (Grimholt et al. 2015).

Here, we report the cloning and characterization of full-length transcripts for both

(65)

51 ~30 kb deletion that completely removes the mhc1ula and mhc1uma genes. In addition, we show that mhc1ula and mhc1uma differ from the other genes in the U lineage in that they do not exhibit many of the characteristics of classical MHC class I genes including polymorphic variation, conservation of key residues, and ubiquitous expression. A better understanding of the recently identified U MHC class I locus on chromosome 22 and its relation to the core MHC class I locus on chromosome 19 will inform the evolutionary history of this dynamic gene family and may improve the success of experimental tissue transplants in zebrafish.

Materials and methods

Zebrafish and cell lines

All experiments involving live zebrafish were performed in accordance with relevant institutional and national guidelines and regulations and were approved by the North

(66)

52 Chicago (Mizgirev and Revskoy 2010). The zebrafish ZF4 fibroblast cell line was purchased from ATCC and grown at 28°C with 5% CO2 in DMEM:F-12 medium (ATCC) with 10% fetal bovine serum (Atlanta Biologicals).

Transcriptome analyses

A single CG2 zebrafish was euthanized and the kidney, intestine, gills and spleen were dissected and combined for RNA extraction (TRIzol Reagent, Life Technologies). RNA was prepared for sequencing with the TruSeq RNA kit (Illumina) and sequenced (2 x 100 bp paired end reads) on a single lane of a HiSeq2000 (Illumina). In effort to detect rare

(67)

53

Data mining and sequence analyses

Protein domains were identified using SMART software (Letunic et al. 2012). Ula and Uma α domains were used as queries to search (BLASTp) the zebrafish non-redundant protein database as well as to search (tBLASTn) the zebrafish non-redundant nucleotide database, the zebrafish EST database, and the de novo genomic assemblies from double haploid homozygous AB and TU individuals generated by the Wellcome Trust Sanger Institute (www.sanger.ac.uk). Protein sequences were aligned by Clustal Omega (Sievers et al. 2011) and phylogenetic trees constructed using MEGA4 software (Tamura et al. 2007). MHC class I sequences were from blind cavefish (Asme, Astyanyx mexicanus), zebrafish (Dare, Danio rerio), Atlantic cod (Gamo, Gadus morhua), rainbow trout (Onmy,

Oncorhynchus mykiss), Atlantic salmon (Sasa, Salmo salar), pufferfish (Taru, Takifugu rubripes), and green pufferfish (Teni, Tetraodon nigroviridis). MHC class I sequences used for comparisons (and their GenBank accession identities) are as follows: Dare-Laa

(NP_001017904.1), Dare-Lba (XP_001920787.3), Dare-Lca (XP_001340413.4), Dare-Lda (XP_001920290.4), Dare-Uba (AAH74095.1), Dare-Uca (XP_005159526.1), Dare-Uda (AAI28863.1), Dare-Uea (AAH53140.1), Dare-Ufa (AAH66754.1), Dare-Uga

(AAH56726.1), Dare-Uha (AAI24258.1), Dare-Uia (AAH93149.1), Dare-Uja

(AAH54592.1), Dare-Uka (AAI22402.1), Dare-Ula (KR086339), Dare-Uma (KR086346), Dare-Zaa (AHA37369.1), Dare-Zba (AHA37371.1), Dare-Zca (AHA37375.1), Dare-Zda (NP_001274026.1), Zea (AHA37389.1), Zfa (AHA37394.1), Zga and Dare-Zha (XP_001919291.4, two sets of α domains in one predicted transcript), Dare-Zia

(68)

54 (AHA37403.1), Gamo-Paa (GW844691.1), Onmy-Laa (ABI21842.1), Onmy-Lba

(ABI21844.1), Onmy-Lca (ABI21845.1), Onmy-Lda (BAF37937.1), Onmy-Saa (AAB57877.1), Onmy-Uba (AAG25197.1), Onmy-Uca (BAD89552.1), Onmy-Uda (AAS93695.1), Onmy-Uea (BAD89553.1), Sasa-Saa (ACY30362.1), Sasa-Uba (AAN75116.1), Sasa-Uda (ACY30371.1), Sasa-Uga (ACX35601.1), Sasa-Uha1 (ACY30367.1), Sasa-Uha2 (ACY30368.1), Sasa-Ula (ABO13870.1), Sasa-Zaa (ACX35596.1), Sasa-Zba (ACX35613.1), and Sasa-Zca (ACX35618.1). Additional

sequences include those predicted in a recent publication characterizing teleost MHC class I sequences (Grimholt et al. 2015). The sequence identifiers used by Grimholt et al. are included in the phylogenetic analyses.

Amplification of zebrafish mhc1ula and mhc1uma sequences

Adult zebrafish were euthanized, frozen in liquid nitrogen, and pulverized by mortar and pestle. Total RNA was purified from half of each individual as well as from ZF4 cells (TRIzol® Reagent, Invitrogen) and reverse transcribed into cDNA (SuperScript® III Reverse Transcriptase, Invitrogen). Primers pairs were designed to amplify the entire coding

sequences of mhc1ula and mhc1uma by RT-PCR. A primer pair that amplifies a portion of the β-actin transcript was used as a positive control for cDNA quality as previously described (Dirscherl and Yoder 2014).

(69)

55 span a single intron. The same β-actin primer pair described above was used as a positive control for genomic DNA quality.

Primer sequences and cycling parameters are provided in Online Resource 1, Table S1. All primer pairs were validated by sequencing of representative amplicons. New sequence data including transcript variants have been deposited with GenBank under accession numbers: KR086338- KR086356.

Southern blot

(70)

56

Chromosome walking

PCR primer pairs were designed to amplify 150 to 250 bp flanking genomic

amplicons (FGAs) at intervals upstream (-) and downstream (+) of mhc1ula and mhc1uma on chromosome 22 based on the reference genome (Zv9). Primer pairs were used with genomic DNA from different individual zebrafish to identify conserved FGAs. Using the Universal GenomeWalker 2.0 kit (Clontech), adapter-ligated libraries were generated by digesting genomic DNA with four different blunt-end restriction endonucleases. Chromosome walking was then accomplished by performing long-range nested PCR with gene-specific and

adapter-specific primers starting from the genomic regions encoding the -5 kb and +13 kb FGAs. FGA and chromosome walking primer sequences are provided in Online Resource 1, Table S1.

Chromosome 22 U locus genotyping

Based on the sequence results from chromosome walking, primers were designed to genotype the chromosome 22 U locus by a duplex PCR assay (primer sequences in Online Resource 1, Table S1). In this assay a single forward primer upstream of the MHC class I locus is used with two different reverse primers that amplify a 634 bp fragment from the chromosome 22 haplotype in the reference genome encoding mhc1ula and mhc1uma

(haplotype A) or a 747 bp fragment from a haplotype in which the mhc1ula and mhc1uma

sequences have been deleted (haplotype B). The assay was applied to the same panel of genomic DNA from individual zebrafish described above. The assay was also used to

(71)

57 used to genotype representative offspring at three days post fertilization (dpf) in order to define inheritance ratios.

Evaluation of tissue expression

Additional primer pairs were designed to amplify transcript segments encoding the α1 to α3 domains of mhc1ula, mhc1uma, and mhc1uga from a panel of tissue-specific cDNAs derived from an EKW individual that had been genotyped as described above. β-actin primers were used as a control for tissue cDNA quality. Primer sequences and cycling parameters are provided in Online Resource 1, Table S1.

Results

Zebrafish mhc1ula and mhc1uma map to chromosome 22

In an effort to characterize novel MHC class I sequences, transcriptome (RNA-Seq) analysis was conducted from pooled tissues (spleen, kidney, intestine, and gills) of a single zebrafish from the double haploid, clonal CG2 line (Mizgirev and Revskoy 2010). BLASTn searches of this transcriptome database with mhc1uga, which is the only U lineage MHC class I gene encoded on chromosome 19 in CG2 zebrafish (McConnell et al. 2014), as a query identified mhc1uga as well as additional transcripts encoding two novel MHC class I sequences. The genes encoding these sequences have since been named mhc1ula and

(72)

58 map to a single scaffold on chromosome 22 (Zv9_scaffold3011; GenBank:

NW_001878374.3). Using the predicted CG2 transcripts as a reference, primer pairs (Online Resource 1, Table S1) were designed to amplify transcripts encoding the entire open reading frames of mhc1ula and mhc1uma (discussed in detail below). Recovered transcripts indicate that each gene possesses six exons and that the genes are separated by 248 bp in a head to head configuration in the reference genome (Figure 1).

As all previously identified U lineage genes map to a single locus on zebrafish chromosome 19, it was unexpected that mhc1ula and mhc1uma map to a region of

chromosome 22 with no apparent shared synteny to previously described MHC class I loci (Figure 1). The nearest identifiable genes, s1pr4 and zgc:171566, are located 200 to 300 kb from mhc1ula and mhc1uma in either direction. It is noted that stx6 and ier5, which

consistently flank the MHC class I-related gene MR1 in mammals (Tsukamoto et al. 2013),

Figure 1 Genomic annotation of mhc1ula and mhc1uma on chromosome 22. Exons are represented as rectangles and introns are represented as arrows pointing in the direction of transcription. Black triangles indicate the position of the genes flanking mhc1ula and

mhc1uma. Gray triangles indicate the approximate location of stx6 and ier5 which, in mammals, flank the MHC class I-related gene MR1. Annotation is based on

(73)

59 are present on zebrafish chromosome 22 (Figure 1); however there is no identifiable MHC class I-like gene between stx6 and ier5 in zebrafish and they are located approximately 11 Mb from mhc1ula and mhc1uma.

A phylogenetic comparison of Ula and Uma sequences to those representing all known MHC class I lineages in bony fish confirms that they are most similar to genes of the MHC class I U lineage (Figure 2) and validates their nomenclature. This comparison includes MHC class I sequences of interest that were predicted from a recent analysis of various teleost genomic databases (Grimholt et al. 2015). Ula and Uma show similarity to three U sequences predicted from a single scaffold (GenBank: KB882234.1) of the blind cavefish (Asme, Astyanyx mexicanus), a teleost species that diverged ~150 million years before the zebrafish (Carlson et al. 2014). However, the genes that are reported to flank this predicted U locus in cavefish display conserved synteny with zebrafish chromosome 5 rather than

chromosome 22 (Grimholt et al. 2015).

Multiple loci encoding U lineage genes have been described in the genome of Atlantic salmon (Lukacs et al. 2010), but orthologs of the genes that flank the salmon U loci are not found near the U locus on zebrafish chromosome 22. An additional lineage of MHC class I genes has been described in salmonids and has thus been termed the S lineage. The Atlantic salmon SAA gene was originally named UAA but renamed to indicate its

(74)

60 Figure 2 Phylogenetic comparison of Ula and Uma to representative MHC class I sequences of teleosts. A minimum evolution tree was constructed from the alignment of the α1-α3 domain sequences from the different MHC class I lineages of blind cavefish (Asme, Astyanyx mexicanus), zebrafish (Dare, Danio rerio), Atlantic cod (Gamo, Gadus morhua), rainbow trout (Onmy, Oncorhynchus mykiss), Atlantic salmon (Sasa, Salmo salar), pufferfish (Taru,

(75)

61 or P lineages. While the minimum evolution tree in Figure 2 shows that Ula and Uma are the most divergent members of the group, they do fall within the U lineage.

Transcript variants

Full-length transcripts of mhc1ula and mhc1uma were sequenced from each individual zebrafish from which they were detected. The amino acid sequences of each functional transcript variant (tv) are included in Online Resource 1, Figure S1. While almost no polymorphic variation was detected for either gene, both mhc1ula and mhc1uma are subject to alternative splicing as summarized in Online Resource 1, Figure S2. All identified splice variants of mhc1ula are presumably non-functional due to a frameshift caused by the retention of an intron (tv3) or by deletions at the beginning of the α1 domain and/or the α3 domain (tv4-7). A splice variant of mhc1uma was identified that lacks the α1 domain but with the reading frame maintained (tv4-5), possibly representing a functional isoform. All other recovered mhc1uma splice variants were presumably non-functional as they retained one or two introns resulting in frameshifts (tv6-11).

Predicted nucleotide and protein sequences were identified from GenBank that correspond to some of the transcript variants detected by PCR. The zebrafish non-redundant protein database includes one predicted protein corresponding to Ula (XP_001336749.4) and three predicted protein isoforms corresponding to Uma (isoform X1, XP_005161940.1; isoform X2, XP_005161941.1; isoform X3, XP_009294249.1). The majority of the

(76)

62 three nucleotide deletion at the beginning of exon 5, a polymorphism that was observed in Uma_tv5 and Uma_tv7. The sequence encoding Uma isoform X3 has a six nucleotide deletion at the beginning of exon 6, a polymorphism that was observed in Uma_tv3. A non-coding predicted transcript corresponding to mhc1ula (XR_659472.1) is present in the non-redundant nucleotide database with a deletion of 22 nucleotides at the beginning of the α3 domain, the same deletion that was observed in Uma_tv6 and Uma_tv7.

No full-length transcripts encoding Ula or Uma can be identified in the current EST database. However, the complete coding sequence of mhc1ula can be accounted for by the joining of two ESTs (start to α3 domain, EH452054.1; α3 domain to stop, EH477854.1). The

mhc1uma coding sequence is only supported by one EST that spans the start codon through part of the α1 domain (EB901873.1). In the de novo genomic assembly of a double haploid homozygous TU individual (www.sanger.ac.uk), contigs can be found that encode all three α domains for both mhc1ula and mhc1uma. In a similar de novo assembly from an AB

individual (www.sanger.ac.uk), all three α domains of mhc1uma are present, but only the α1 domain, transmembrane, and cytoplasmic exons of mhc1ula are present. It is unclear if this represents an alternate haplotype in which a functional copy of only mhc1uma is present or if this is due to insufficient sequence coverage in the assembly.

Searching outside the zebrafish database did not reveal any definitive orthologs of

mhc1ula or mhc1uma. When the α1 to α3 domains of Ula or Uma are compared to the

(77)

63 match U proteins encoded on chromosome 19. As discussed above, a phylogenetic

comparison of Ula and Uma to U proteins that were recently predicted from the genome of cavefish revealed three putative cavefish genes similar to zebrafish Ula and Uma (Figure 2).

Divergent residues at locations of functional consequence

(78)

64 Figure 3 Zebrafish MHC class I U lineage protein sequence alignment. The zebrafish U proteins included in Figure 2 were organized into domains and aligned. The leader and α domains are each encoded by a single exon while the transmembrane (TM) and cytoplasmic (Cyt) domains are encoded by a variable and sometimes unknown number of exons.

Numbering of amino acids is based on the Uba sequence starting with the first residue of the α1 domain. Positions that are at least 50% identical are shaded in black and similar residues are shaded in gray. Conserved features are indicated with the following symbols below the alignment: cysteine residues likely involved in the Ig-fold (*), putative anchor residues implicated in binding the amino- (n) or carboxy-terminus (c) of peptides, acidic residues in the α3 domain predicted to bind CD8 (a), and the putative transmembrane domains (/). Note: the indicated transmembrane region does not accurately portray the borders of all

(79)