Differential Gene Expression


The Punchline

The selective production of different proteins within cells creates cellular diversity.

As the single-celled zygote divides to start the generation of all the cells making up an organism, differences in the expression of genes in these cells govern maturation toward distinct cell types. Many regulatory mechanisms targeting DNA access, RNA production and processing, and protein synthesis and modification lead to this differential gene expression. They include using a specific repertoire of transcription factors that bind gene promoters to enhance or repress transcription, modifying histones to modulate the accessibility of chromatin, and degrading and alternative splicing of RNA to change the coded message for different protein construction. In addition, translational controls and posttranslational modifications of proteins as well as changes in protein transport affect what proteins are created and where they function. Use of these numerous mechanisms at different times and in different cells fuels the creation of different cell types as the embryo develops.

Differential Gene Expression

Mechanisms of Cell Differentiation

From one cell come many, and of many different types. That is the seemingly miraculous phenomenon of embryonic development. How is it possible that such a diversity of cell types within a multicellular organism can be derived from a single cell, the fertilized egg? Cytological studies done at the start of the twentieth century established that the chromosomes in each cell of an organism’s body are the mitotic descendants of the chromosomes established at fertilization (Wilson 1896; Boveri 1904). In other words, each somatic cell nucleus has the same chromosomes, and therefore the same set of genes, as all other somatic cell nuclei. This fundamental concept, known as genomic equivalence, presented a significant conceptual dilemma. If every cell in the body contains the genes for hemoglobin and insulin, for example, why are hemoglobin proteins made only in red blood cells and insulin proteins only in certain pancreatic cells? Based on the embryological evidence for genomic equivalence (as well as on bacterial models of gene regulation), a consensus emerged in the 1960s that the answer lies in differential gene expression.

Defining Differential Gene Expression

Differential gene expression is the process by which cells become different from one another based upon the unique combination of genes that are active or “expressed.” By expressing different genes, cells can create different proteins that lead to the differentiation of different cell types. There are three postulates of differential gene expression:

1. Every somatic cell nucleus of an organism contains the complete genome established in the fertilized egg. In molecular terms, the DNAs of all differentiated cells are identical.

2. The unused genes in differentiated cells are neither destroyed nor mutated; they retain the potential for being expressed.

3. Only a small percentage of the genome is expressed in each cell, and a portion of the RNA synthesized in each cell is specific for that cell type.

By the late 1980s, it was established that gene expression can be regulated at four levels such that different cell types synthesize different sets of proteins:

1. Differential gene transcription regulates which of the nuclear genes are transcribed into nuclear RNA.

2. Selective nuclear RNA processing regulates which of the transcribed RNAs (or which parts of such a nuclear RNA) are able to enter into the cytoplasm and become messenger RNAs.

3. Selective messenger RNA translation regulates which of the mRNAs in the cytoplasm are translated into proteins.

4. Differential protein modification regulates which proteins are allowed to remain and/or function in the cell.

Some genes (such as those coding for the globin protein subunits of hemoglobin) are regulated at all these levels.

Dev Tutorial: Differential Gene Expression. In this tutorial, Dr. Michael Barresi discusses the basics of gene regulation and how differences in this regulation can lead to unique developmental patterns.

Web Topic 3.1 Does the Genome or the Cytoplasm Direct Development?

The geneticists versus the embryologists. Geneticists were certain that genes controlled development, whereas embryologists generally favored the cytoplasm. Both sides had excellent evidence for their positions.

Web Topic 3.2 The Origins of Developmental Genetics. The first hypotheses for differential gene expression came from C. H. Waddington, Salome Gluecksohn-Waelsch, and other scientists who understood both embryology and genetics.

Quick Primer on the Central Dogma

To properly comprehend all the mechanisms regulating the differential expression of a gene, you must first understand the principles of the central dogma of biology. The central dogma pertains to the sequence of events that enables the use and transfer of information to make the proteins of a cell (Figure 3.1). Central to this theory is the sequenced order of deoxyribonucleotides in double-stranded DNA that provides the informative code or blueprints for the precise combination of amino acids needed to build specific proteins. Proteins are not made directly from DNA, however; rather, the information laid out in the sequence of DNA bases is first copied or transcribed into a single-stranded polymer of similar molecules called a nuclear ribonucleic acid (nRNA).

The process of copying DNA into RNA is called transcription, and the RNA produced from a given gene is often referred to as a transcript. Although the transcribed nRNA includes the information to code for a protein, it can also hold non-protein-coding (simply called “noncoding”) information. The nRNA strand will undergo processing to excise the noncoding domains and protect the ends of the strand to yield a messenger RNA (mRNA) molecule. mRNA is transported out of the nucleus into the cytoplasm where it can interact with a ribosome and convey its message for the synthesis of a specific protein. mRNA unveils the complementary sequence of DNA three bases at a time, each triplet being called a codon. Each codon calls for a specific amino acid that will be covalently attached to its neighboring amino acid denoted by the codon next in line. In this manner, translation leads to the synthesis of a polypeptide chain that will undergo protein folding and potential modification by the addition of various functional moieties such as carbohydrates, phosphates, or cholesterol groups. The completed protein is now ready to carry out its specific function serving to support the structural or functional properties of the cell. Cells that express different proteins will therefore possess different structural and functional properties, making each a distinct type of cell.
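The flow of information described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the function names, the made-up template strand, and the abbreviated codon table (which lists just the codons used here, taken from the standard genetic code) are not from the text, and real transcription and translation involve far more machinery.

```python
# Toy illustration of the central dogma: DNA -> RNA -> polypeptide.
# All names and the example sequence are hypothetical.

# Partial codon table from the standard genetic code (only codons used below).
CODON_TABLE = {
    "AUG": "Met", "GUG": "Val", "CAC": "His", "CUG": "Leu",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}

# Base-pairing rules for copying a DNA template into RNA.
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_dna: str) -> str:
    """Return the RNA complementary to a template strand.

    The template is written 3'->5', so the RNA produced reads 5'->3'.
    """
    return "".join(COMPLEMENT[base] for base in template_dna)


def translate(mrna: str) -> list[str]:
    """Read codons three bases at a time from the first AUG to a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "Stop":
            break
        peptide.append(residue)
    return peptide


template = "TACCACGTGGACATT"  # made-up template strand, written 3'->5'
mrna = transcribe(template)
peptide = translate(mrna)
print(mrna, peptide)
```

Processing, nuclear export, folding, and modification are deliberately left out; the sketch covers only the two copying steps (transcription and translation) and the reading of codons in triplets.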

Evidence for Genomic Equivalence

Until the mid-twentieth century, genomic equivalence was not so much proved as it was assumed (because every cell is the mitotic descendant of the fertilized egg). One of the first tasks of developmental genetics was to determine whether every cell of an organism indeed does have the same genome—that is, the same set of genes—as every other cell.

Evidence that every cell in the body has the same genome originally came from the analysis of Drosophila chromosomes, in which the DNA of certain larval tissues undergoes numerous rounds of DNA replication without separation such that the structure of the chromosomes can be seen. In these polytene (Greek, “many strands”) chromosomes, no structural differences were seen between cells; however, different regions of the chromosomes were “puffed up” at different times and in different cell types, which suggested that these areas were actively making RNA (Figure 3.2A; Beermann 1952).

When Giemsa dyes allowed such observations to be made in mammalian chromosomes, it was also found that no chromosomal regions were lost in most cells. These observations, in turn, were confirmed by nucleic acid in situ hybridization studies, a technique that enables the visualization of the spatial and temporal pattern of specific gene (mRNA) expression in the embryo (see Figure 3.35). For instance, the mRNA of the odd-skipped gene is present in cells that display a segmented pattern in the Drosophila embryo, a pattern that changes over time (Figure 3.2B). Similarly, the mouse homolog of odd-skipped, called odd-skipped related 1, is differentially expressed in cells of specific structures such as the segmented branchial arches, the limb buds, and the heart (Figure 3.2C).

Figure 3.1 The central dogma of biology. A simplified schematic of the key steps in the process of gene and protein expression. (1) Transcription. In the nucleus, a region of the genomic DNA is seen accessible to an RNA polymerase, which transcribes an exact complementary copy of the gene in the form of a single-stranded nuclear RNA molecule. The gene is now said to be “expressed.” (2) Processing. The nRNA transcript undergoes processing to make a finalized messenger RNA strand, which is transported out of the nucleus (3). (4) Translation. mRNA complexes with a ribosome, and its information is translated into an ordered polymer of amino acids. (5) Protein folding and modification. This polypeptide adopts secondary and tertiary structures through proper folding and potential modifications (such as the addition of a carbohydrate group as seen here). (6) Carry out function. The protein is now said to be “expressed” and can carry out its specific function (such as functioning as a transmembrane receptor).

Is the DNA in an organism’s cells that is now expressing different genes truly still the same, however? Does it still possess the same potential to make any cell? The ultimate test of whether the nucleus of a differentiated cell has undergone irreversible functional restriction is to have that nucleus generate every other type of differentiated cell in the body. If each cell’s nucleus is identical to the zygote nucleus, each cell’s nucleus should also be capable of directing the entire development of the organism when transplanted into an activated enucleated egg. Although such experiments had been proposed in the 1930s, the first demonstration that a nucleus from an adult mammalian somatic cell could direct the development of an entire animal didn’t come until 1997, when Dolly the sheep was cloned.

Ian Wilmut and colleagues took cells from the mammary gland of a 6-year-old pregnant ewe and placed them in culture (Figure 3.3A; Wilmut et al. 1997). The culture medium was formulated to keep the cell nuclei at the intact diploid stage (G1) of the cell cycle; this cell-cycle stage turned out to be critical. The researchers then obtained oocytes from a different strain of sheep and removed their nuclei. These oocytes had to be in the second meiotic metaphase, the stage at which they are usually fertilized. The donor cell and the enucleated oocyte were brought together, and electric pulses were sent through them, thereby destabilizing the cell membranes and allowing the cells to fuse.

The same electric pulses that fused the cells activated the egg to begin development. The resulting embryos were eventually transferred into the uteri of pregnant sheep.

Web Topic 3.3 The 2012 Nobel Prize for Physiology or Medicine: Cloning and Nuclear Equivalence. The final “proof” of genomic equivalence was the demonstration that the nuclei of differentiated somatic cells could generate any cell type in the body.

Of the 434 sheep oocytes originally used in this experiment, only one survived: Dolly¹ (Figure 3.3B). DNA analysis confirmed that the nuclei of Dolly’s cells were derived from the strain of sheep from which the donor nucleus was taken (Ashworth et al. 1998; Signer et al. 1998). Cloning of adult mammals has been confirmed in guinea pigs, rabbits, rats, mice, dogs, cats, horses, and cows. In 2003, a cloned mule became the first sterile animal to be so reproduced (Woods et al. 2003). Thus, it appears that the nuclei of vertebrate adult somatic cells contain all the genes needed to generate an adult organism. No genes necessary for development have been lost or mutated in the somatic cells; their nuclei are equivalent.²

¹ The creation of Dolly was the result of a combination of scientific and social circumstances. These circumstances involved job security, people with different areas of expertise meeting one another, children’s school holidays, international politics, and who sits near whom in a pub. The complex interconnections giving rise to Dolly are told in The Second Creation (Wilmut et al. 2000), a book that should be read by anyone who wants to know how contemporary science actually works. As Wilmut acknowledged (p. 36), “The story may seem a bit messy, but that’s because life is messy, and science is a slice of life.”

Figure 3.2 Gene expression. (A) Transmission electron micrograph of a polytene chromosome from a salivary gland cell of Chironomus tentans showing three giant puffs indicating active transcription in these regions (arrows). (B) mRNA expression of the odd-skipped gene in a stage 5 and a stage 9 Drosophila embryo (blue). (C) mRNA expression of the odd-skipped related 1 gene in an 11.5 days post-conception mouse embryo (blue). (A from Daneholt 1975; B from Weiszmann et al. 2009; C from So and Danielian 1999.)

² Although all the organs were properly formed in the cloned animals, many of the clones developed debilitating diseases as they matured (Humphreys et al. 2001; Jaenisch and Wilmut 2001; Kolata 2001). As we will see shortly, this problem is due in large part to the differences in methylation between the chromatin of the zygote and the differentiated cell.

Figure 3.3 Cloning a mammal using nuclei from adult somatic cells. (A) Procedure used for cloning sheep. Udder cells are removed from the nuclear donor (Finn-Dorset strain), placed in culture, and grown in G1 stage. Eggs are removed from the oocyte donor (Scottish blackface strain), and the meiotic spindle is removed. A cultured cell is transferred into the enucleated egg by micropipette, and the egg and cell are fused with electric current. The embryo is cultured for 7 days until a blastocyst forms and is then transferred to a surrogate mother (Scottish blackface), leading to the birth of Dolly (a Finn-Dorset lamb genetically identical to the nuclear donor). (B) Dolly, the adult sheep on the left, was derived by fusing a mammary gland cell nucleus with an enucleated oocyte, which was then implanted in a surrogate mother (of a different breed of sheep) that gave birth to Dolly. Dolly later gave birth to a lamb (Bonnie, at right) by normal reproduction. (A after Wilmut et al. 2000; B photograph by Roddy Field © Roslin Institute.)

Scientists Speak 3.1 Listen to Sir Ian Wilmut discuss cloning and cellular reprogramming.

Modulating Access to Genes

So how does the same genome give rise to different cell types? To address this question, we need to first understand the anatomy of genes. A fundamental difference distinguishing most eukaryotic genes from prokaryotic genes is that eukaryotic genes are contained within a complex of DNA and protein called chromatin. The protein component constitutes about half the weight of chromatin and is composed largely of histones. The nucleosome is the basic unit of chromatin structure (Figure 3.4A,B). It is composed of an octamer of histone proteins (two molecules each of histones H2A, H2B, H3, and H4) wrapped with two loops containing approximately 147 base pairs of DNA (Kornberg and Thomas 1974). Histone H1 is bound to the 60 to 80 or so base pairs of “linker” DNA between the nucleosomes (Weintraub 1984, 1985). There are more than a dozen contacts between the DNA and the histones (Luger et al. 1997; Bartke et al. 2010), which function to enable the remarkable packaging of more than 6 feet of DNA into the approximately 6 micrometer (in diameter) nucleus of each human cell (Schones and Zhao 2008).
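The numbers in this paragraph lend themselves to a rough calculation of the scale of the packaging problem. The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes a rise of about 0.34 nm per base pair for B-form DNA (a standard value that the text does not state) and treats “more than 6 feet” as exactly 6 feet; the variable and function names are illustrative.

```python
# Back-of-the-envelope arithmetic for DNA packaging in one human nucleus.
# Assumption (not from the text): B-form DNA rises about 0.34 nm per base pair.
RISE_PER_BP_M = 0.34e-9

DNA_LENGTH_M = 6 * 0.3048        # "more than 6 feet" of DNA, taken as 6 feet
NUCLEUS_DIAMETER_M = 6e-6        # approximately 6 micrometers

CORE_BP = 147                    # DNA wrapped around one histone octamer
LINKER_BP_RANGE = (60, 80)       # linker DNA between nucleosomes


def base_pairs(length_m: float) -> float:
    """Convert a DNA contour length in meters to a base-pair count."""
    return length_m / RISE_PER_BP_M


def nucleosome_count(total_bp: float, linker_bp: int) -> float:
    """Estimate nucleosomes if every repeat is one core plus one linker."""
    return total_bp / (CORE_BP + linker_bp)


total_bp = base_pairs(DNA_LENGTH_M)
fewest = nucleosome_count(total_bp, LINKER_BP_RANGE[1])   # long linkers
most = nucleosome_count(total_bp, LINKER_BP_RANGE[0])     # short linkers
length_to_diameter = DNA_LENGTH_M / NUCLEUS_DIAMETER_M

print(f"base pairs: {total_bp:.2e}")
print(f"nucleosomes: {fewest:.2e} to {most:.2e}")
print(f"DNA length / nuclear diameter: {length_to_diameter:.0f}")
```

Under these assumptions the estimate comes to several billion base pairs and on the order of 25 million nucleosomes per nucleus, with the DNA roughly 300,000 times longer than the nucleus is wide.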

Whereas classical geneticists have likened genes to “beads on a string,” molecular geneticists liken genes to “string on the beads,” an image in which the beads are nucleosomes. Much of the time, the nucleosomes appear to be wound into tight structures called solenoids that are stabilized by histone H1 (Figure 3.4C). This H1-dependent conformation of nucleosomes inhibits the transcription of genes in somatic cells by packing adjacent nucleosomes together into tight arrays that prevent transcription factors and RNA polymerases from gaining access to the genes (Thoma et al. 1979; Schlissel and Brown 1984). Chromatin regions that are tightly packed are called heterochromatin, and regions loosely packed are called euchromatin. One way to achieve differential gene expression is by regulating how tightly packed a given region of chromatin may be, thereby regulating whether genes are even accessible for transcription.

Loosening and tightening chromatin: Histones as gatekeepers

Histones are critical because they appear to be responsible for either facilitating or forbidding gene expression (Figure 3.4D). Repression and activation are controlled to a large extent by modifying the “tails” of histones H3 and H4 with two small organic groups: methyl (CH₃) and acetyl (COCH₃) residues. In general, histone acetylation (the addition of acetyl groups to histones) neutralizes the positive charge of lysine and loosens the histones, which activates transcription. Enzymes known as histone acetyltransferases place acetyl groups on histones (especially on lysines in H3 and H4), destabilizing the nucleosomes so that they come apart easily (become more euchromatic). As might be expected, then, enzymes that remove acetyl groups (histone deacetylases) stabilize the nucleosomes (which become more heterochromatic) and prevent transcription.

Histone methylation is the addition of methyl groups to histones by enzymes called histone methyltransferases. Although histone methylation more often results in heterochromatic states and transcriptional repression, it can also activate transcription depending on the amino acid being methylated and the presence of other methyl or acetyl groups in the vicinity (see Strahl and Allis 2000; Cosgrove et al. 2004). For instance, acetylation of the tails of H3 and H4 along with the addition of three methyl groups on the lysine at position four of H3 (i.e., H3K4me3; remember that K is the abbreviation for lysine) is usually associated with actively transcribed chromatin. In contrast, a combined lack of acetylation of the H3 and H4 tails and methylation of the lysine in the ninth position of H3 (H3K9) is usually associated with highly repressed chromatin (Norma et al. 2001). Indeed, lysine methylations at H3K9, H3K27, and H4K20 are often associated with highly repressed chromatin. Figure 3.5 depicts a nucleosome with lysine residues on its H3 tail. Modifications of such residues regulate transcription.
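The shorthand used for histone marks (histone, then K for lysine, then the residue position, then the modification and the number of groups) is regular enough to parse mechanically. The following sketch decodes a mark name and looks up the general tendency stated in the paragraph above. The function names and the lookup table are hypothetical, and the table encodes only the usual associations listed here; as the text notes, the real effect depends on neighboring modifications.

```python
import re

# Usual associations of lysine methylation, as listed in the paragraph above.
# These are tendencies, not rules; chromatin state is context dependent.
METHYLATION_TENDENCY = {
    ("H3", 4): "active",
    ("H3", 9): "repressed",
    ("H3", 27): "repressed",
    ("H4", 20): "repressed",
}

# Matches names such as H3K4me3, H3K27me3, H4K20me1, or H3K9ac.
MARK_PATTERN = re.compile(r"^(H\d[AB]?)K(\d+)(me|ac)(\d?)$")


def parse_mark(mark: str) -> dict:
    """Split a histone mark name into its components."""
    match = MARK_PATTERN.match(mark)
    if match is None:
        raise ValueError(f"unrecognized histone mark: {mark!r}")
    histone, position, kind, count = match.groups()
    return {
        "histone": histone,
        "lysine": int(position),
        "modification": "methyl" if kind == "me" else "acetyl",
        "groups": int(count) if count else 1,
    }


def usual_state(mark: str) -> str:
    """Return the transcriptional state usually associated with a mark."""
    info = parse_mark(mark)
    if info["modification"] == "acetyl":
        return "active"  # acetylation generally loosens chromatin
    key = (info["histone"], info["lysine"])
    return METHYLATION_TENDENCY.get(key, "not listed")


for name in ("H3K4me3", "H3K9me3", "H3K27me3", "H4K20me1", "H3K9ac"):
    print(name, parse_mark(name), usual_state(name))
```

A lookup like this is useful only as a mnemonic for the nomenclature. It cannot capture combinatorial effects, such as the joint requirement for acetylation and H3K4 trimethylation described for active chromatin.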

If methyl groups at specific places on histones repress transcription, getting rid of these methyl moieties should be expected to permit transcription. That has been shown to be the case in the activation of the Hox genes, a family of genes that are critical in giving cells their identities along the anterior-posterior body axis. In early development, Hox genes are repressed by H3K27 trimethylation (the lysine at position 27 on histone H3 has three methyl groups: H3K27me3). In differentiated cells, however, a demethylase specific for H3K27me3 is recruited to these regions, eliminating the methyl groups and enabling access to the gene for transcription (Agger et al. 2007; Lan et al. 2007). The effects of methylation in controlling gene transcription are extensive.

Figure 3.4 Nucleosome and chromatin structure. (A) Model of nucleosome structure as seen by X-ray crystallography at a resolution of 1.9 Å. Histones H2A and H2B are yellow and red, respectively; H3 is purple; and H4 is green. The DNA helix (gray) winds around the protein core. The histone “tails” that extend from the core are the sites of acetylation and methylation, which may disrupt or stabilize, respectively, the formation of nucleosome assemblages. (B) Histone H1 can draw nucleosomes together into compact forms. About 147 base pairs of DNA encircle each histone octamer, and about 60 to 80 base pairs of DNA link the nucleosomes together. (C) Model for the arrangement of nucleosomes in the highly compacted solenoidal chromatin structure. Histone tails protruding from the nucleosome subunits allow for the attachment of chemical groups. (D) Methyl groups condense nucleosomes more tightly, preventing access to promoter sites and thus preventing gene transcription. Acetylation loosens nucleosome packing, exposing the DNA to RNA polymerase II and transcription factors that will activate the genes. (A after Davey et al. 2002.)

Maintaining a memory of methylation

The modifications of histones can also signal the recruitment of proteins that retain the memory of the transcriptional state from generation to generation as cells go through mitosis. They are the proteins of the Trithorax and Polycomb families. When bound to the nucleosomes of active genes, Trithorax proteins keep these genes active, whereas Polycomb proteins, which bind to condensed nucleosomes, keep the genes in a repressed state.

The Polycomb proteins fall into two categories that act sequentially in repression. The first set has histone methyltransferase activities that methylate lysines H3K27 and H3K9 to repress gene activity. In many organisms, this repressed state is stabilized by the activity of a second set of Polycomb factors, which bind to the methylated tails of histone 3 and keep the methylation active and also methylate adjacent nucleosomes, thereby forming tightly packed repressive complexes (Grossniklaus and Paro 2007; Margueron et al. 2009).

The Trithorax proteins help retain the memory of activation; they act to counter the effect of the Polycomb proteins. Trithorax proteins can modify the nucleosomes or alter their positions on the chromatin, allowing transcription factors to bind to the DNA previously covered by them. Other Trithorax proteins keep the H3K4 lysine trimethylated (preventing its demethylation into a dimethylated, repressed state; Tan et al. 2008).

Anatomy of the Gene

So far, we have documented that modulating the access to a gene, largely by histone methylation, affects gene expression. Later in this chapter, we will discuss the exciting research on the direct control of transcription by DNA methylation. Now that we understand that modifying histones can grant access to regions of the genome, we can ask, what mechanisms exist to influence gene transcription more directly? More simply, once a gene is accessible, how can it be turned on and off? Before we answer, we need a basic understanding of the parts that make up a gene and how those parts can influence gene expression.

Exons and introns

A fundamental feature that distinguishes eukaryotic from prokaryotic genes (along with eukaryotic genes being contained within chromatin) is that eukaryotic genes are not co-linear with their peptide products. Rather, the single nucleic acid strand of eukaryotic mRNA comes from noncontiguous regions on the chromosome. Exons are the regions of DNA that code for parts of a protein³; between exons, however, are intervening sequences called introns that have nothing whatsoever to do with the amino acid sequence of the protein. To help illustrate the structural components of a typical eukaryotic gene, we highlight the anatomy of the human β-globin gene (Figure 3.6).

Figure 3.5 Histone methylations on histone H3. The tail of histone H3 (its amino-terminal sequence, at the beginning of the protein) sticks out from the nucleosome and is capable of being methylated or acetylated. Here, lysines can be methylated and recognized by particular proteins. Methylated lysine residues at positions 4, 38, and 79 are associated with gene activation, whereas methylated lysines at positions 9 and 27 are associated with repression. The proteins binding these sites (not shown to scale) are represented above the methyl group. (After Kouzarides and Berger 2007.)

Figure 3.6 Steps in the production of β-globin and hemoglobin. Transcription of the β-globin gene creates a nuclear RNA containing exons and introns as well as the cap, tail, and 3′ and 5′ untranslated regions. Processing the nuclear RNA into messenger RNA removes the introns. Translation on ribosomes uses the mRNA to encode a protein. The β-globin protein is inactive until it is modified and complexed with α-globin and heme to become active hemoglobin (bottom).

³ The term exon refers to a nucleotide sequence whose RNA “exits” the nucleus. It has taken on the functional definition of a protein-encoding nucleotide sequence. Leader sequences and 3′ UTR sequences are also derived from exons, even though they are not translated into protein.

⁴ By convention, upstream, downstream, 5′, and 3′ directions are specified in relation to the RNA. Thus, the promoter is upstream of the gene, near to and “before” its 5′ end.

The human β-globin gene, which encodes part of the hemoglobin protein of the red blood cells, consists of the following elements:

• A promoter region, where RNA polymerase II binds to initiate transcription. The promoter region of the human β-globin gene has three distinct units and extends from 95 to 26 base pairs before (“upstream from”)⁴ the transcription initiation site (i.e., from –95 to –26). Some promoters have the DNA sequence TATA (called the “TATA-box”), which binds the basal or general transcription factor (TATA-binding protein, TBP) that helps anchor RNA polymerase II to the promoter.

• The transcription initiation site, which for human β-globin is ACATTTG. This site is often called the cap sequence because it is the DNA sequence that will code for the addition of a modified nucleotide “cap” at the 5′ end of the RNA soon after it is transcribed. The specific cap sequence varies among genes. This cap sequence begins the first exon.

• The 5′ untranslated region (5′ UTR), also called the leader sequence. In the human β-globin gene, it is the sequence of 50 base pairs intervening between the initiation points of transcription and translation. The 5′ UTR can determine the rate at which translation is initiated.

• The translation initiation site, ATG. This codon (which becomes AUG in mRNA) is located 50 base pairs after the transcription initiation site in the human β-globin gene (this distance differs greatly among different genes). The ATG translation start sequence is the same in every gene.

• The protein-encoding portion of the first exon, which contains 90 base pairs coding for amino acids 1–30 of human β-globin protein.

• An intron containing 130 base pairs with no coding sequences for β-globin.

The structure of this intron, however, is important in enabling the RNA to be processed into mRNA and exit from the nucleus.

• An exon containing 222 base pairs coding for amino acids 31–104.

• A large intron—850 base pairs—having nothing to do with β-globin protein structure.

• An exon containing 126 base pairs coding for amino acids 105–146 of the protein.

• A translation termination codon, TAA. This codon becomes UAA in the mRNA. When a ribosome encounters this codon, the ribosome dissociates, and the protein is released. Translation termination can also be represented by the TAG or TGA codon sequences in other genes.

• A 3′ untranslated region (3′ UTR) that, although transcribed, is not translated into protein. This region includes the sequence AATAAA, which is needed for polyadenylation, the insertion of a “tail” of some 200–300 adenylate residues on the RNA transcript, about 20 bases downstream of the AAUAAA sequence. This polyA tail (1) confers stability on the mRNA, (2) allows the mRNA to exit the nucleus, and (3) permits the mRNA to be translated into protein.

• A transcription termination sequence. Transcription continues beyond the AATAAA site for about 1000 nucleotides before being terminated.
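The lengths quoted in the list above can be checked against one another with simple arithmetic: each coding segment should contain three base pairs per amino acid. The sketch below tabulates only the figures given in the text; the variable names and layout are illustrative, and segments whose lengths are not stated (such as the 3′ UTR) are left out.

```python
# Lengths in base pairs, as quoted in the list above for human beta-globin.
# Names and layout are illustrative only.
SEGMENTS = [
    ("5' UTR (leader)", 50, "untranslated"),
    ("exon 1, coding part", 90, "coding"),
    ("intron 1", 130, "intron"),
    ("exon 2", 222, "coding"),
    ("intron 2", 850, "intron"),
    ("exon 3, coding part", 126, "coding"),
]

# Amino acid ranges assigned to the three coding segments in the text.
AMINO_ACID_RANGES = [(1, 30), (31, 104), (105, 146)]


def total_bp(kind: str) -> int:
    """Sum the lengths of all segments of one kind."""
    return sum(length for _, length, k in SEGMENTS if k == kind)


coding_bp = total_bp("coding")
intron_bp = total_bp("intron")
codons = coding_bp // 3

# Each coding segment should hold exactly three base pairs per amino acid.
coding_lengths = [length for _, length, k in SEGMENTS if k == "coding"]
consistent = all(
    (last - first + 1) * 3 == length
    for (first, last), length in zip(AMINO_ACID_RANGES, coding_lengths)
)

print(f"coding: {coding_bp} bp = {codons} codons; introns: {intron_bp} bp")
print(f"segment lengths match amino acid ranges: {consistent}")
```

The three coding segments add up to 438 base pairs, or 146 codons, matching the 146 amino acids listed, while the two introns together (980 base pairs) are more than twice as long as the coding sequence they interrupt.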

The original transcription product is called nuclear RNA (nRNA) or, sometimes, heterogeneous nuclear RNA (hnRNA) or pre-messenger RNA (pre-mRNA). Nuclear RNA contains the cap sequence, the 5′ UTR, exons, introns, and the 3′ UTR. Both ends of these transcripts are modified before these RNAs leave the nucleus. A cap consisting of methylated guanosine is placed on the 5′ end of the RNA in opposite polarity to the RNA itself, which means that there is no free 5′ phosphate group on the nRNA.

The 5′ cap is necessary for the binding of mRNA to the ribosome and for subsequent translation (Shatkin 1976). The 3′ terminus is usually modified in the nucleus by the addition of a polyA tail. The adenylate residues in this tail are added to the transcript enzymatically; they are not part of the gene sequence. Both the 5′ and 3′ modifications may protect the mRNA from exonucleases that would otherwise digest it (Sheiness and Darnell 1973; Gedamu and Dixon 1978). The modifications thus stabilize the message and its precursor.

Before the nRNA leaves the nucleus, its introns are removed and the remaining exons spliced together. In this way, the coding regions of the mRNA—that is, the exons—are brought together to form a single uninterrupted transcript, and this transcript is translated into a protein. The protein can be further modified to make it functional (see Figure 3.6).
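Splicing can be pictured as cutting the exon intervals out of the nuclear RNA and joining them in order. The sketch below does exactly that on a made-up transcript. The sequences, coordinates, and function name are hypothetical, and the code says nothing about how the splicing machinery actually recognizes intron boundaries.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon intervals (0-based, end-exclusive) in the order given."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Made-up transcript: two short exons separated by one intron.
exon1 = "AUGGCC"
intron = "GUAAGUUUUCAG"
exon2 = "UUUUAA"
pre_mrna = exon1 + intron + exon2

exon_coords = [
    (0, len(exon1)),
    (len(exon1) + len(intron), len(pre_mrna)),
]

mrna = splice(pre_mrna, exon_coords)
print(pre_mrna, "->", mrna)
```

The resulting string is a single uninterrupted reading frame (AUG ... UAA) that did not exist as a contiguous sequence in the unprocessed transcript.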


Cis regulatory elements: The on, off, and dimmer switches of a gene

In addition to the protein-encoding region of the gene, regulatory sequences can be located on either end of the gene (or even within it). These regulatory sequences—the promoter, enhancers, and silencers—are necessary for controlling where, when, and how actively a particular gene is transcribed. When located on the same chromosome as the gene (and they usually are), they can be referred to as cis-regulatory elements.⁵

Promoters are sites where RNA polymerase II binds to the DNA sequence to initiate transcription. Promoters of genes that are transcribed into messenger RNAs (i.e., those genes that encode proteins⁶) are typically located immediately upstream from the site where RNA polymerase II initiates transcription. Most of these promoters contain a stretch of about 1000 base pairs that is rich in the sequence CG, often referred to as CpG (a C and a G connected through the normal phosphate bond). These regions are called CpG islands (Down and Hubbard 2002; Deaton and Bird 2011). The reason transcription is initiated near CpG islands is thought to involve proteins called basal transcription factors, which are present in every cell and specifically bind to the CpG-rich sites. These basal transcription factor proteins form a “saddle” that can recruit RNA polymerase II and position it appropriately for the polymerase to begin transcription (Kostrewa et al. 2009).
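How CpG-rich a stretch of DNA is can be estimated directly from its sequence. The sketch below is illustrative only. The two test sequences are invented, and the observed/expected CpG ratio it computes (the CpG count multiplied by the sequence length, divided by the product of the C and G counts) is a widely used bioinformatic measure that this chapter does not itself describe.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # a C followed directly by a G on the same strand
    gc_content = (c + g) / n if n else 0.0
    # Number of CpGs expected if the C's and G's were scattered at random.
    expected = c * g / n if n else 0.0
    ratio = cpg / expected if expected else 0.0
    return gc_content, ratio


# Invented toy sequences, far shorter than a real CpG island of ~1000 base pairs.
cpg_rich = "GCGCGGCGCGACGCGTCGCG"
cpg_poor = "ATGCATTAGCATTTAGGCAT"

for name, seq in [("CpG-rich", cpg_rich), ("CpG-poor", cpg_poor)]:
    gc, ratio = cpg_stats(seq)
    print(f"{name}: GC content = {gc:.2f}, CpG observed/expected = {ratio:.2f}")
```

A ratio near or above 1 means the CG dinucleotide is at least as common as chance would predict. Over most of a vertebrate genome the ratio is far lower, which is what makes the islands stand out.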

RNA polymerase II does not bind to every promoter in the genome at the same time, however. Rather, it is recruited to and stabilized on the promoters by DNA sequences called enhancers that signal where and when a promoter can be used and how much gene product to make. In other words, enhancers control the efficiency and rate of transcription from a specific promoter (see Ong and Corces 2011). In contrast, DNA sequences called silencers can prevent promoter use and inhibit gene transcription.

Transcription factors are proteins that bind DNA with precise sequence recognition for specific promoters, enhancers, or silencers. Transcription factors that bind enhancers can activate a gene by (1) recruiting enzymes (such as histone acetyltransferases) that break up the nucleosomes in the area or (2) stabilizing the transcription initiation complex as described above. Thus, transcription factors usually work in two nonexclusive ways:

1. Once bound, transcription factors can bind cofactors that recruit nucleosome-modifying proteins (such as histone methyltransferases and acetyltransferases) that make that area of the genome accessible for RNA polymerase II to bind and enable the chromatin in that vicinity to be unwound and transcribed.

2. Transcription factors can form bridges, looping the chromatin such that the transcription factors (and their histone-modifying enzymes) on enhancers can be brought into the vicinity of the promoter. In the activation of mammalian β-globin genes, such a bridge uniting the promoter and enhancer is formed by proteins that bind to transcription factors on both the enhancer and promoter sequences. These proteins recruit the nucleosome-modifying enzymes and transcription-associated factors (TAFs) that stabilize RNA polymerase II (FIGURE 3.7; Gurdon 2016; Deng et al. 2012; Noordermeer and Duboule 2013).

5 Cis- and trans-regulatory elements are so named by analogy with E. coli genetics and organic chemistry. Therefore, cis-elements are regulatory elements that reside on the same chromosome (cis-, “on the same side as”), whereas trans-elements are those that could be supplied from another chromosome (trans-, “on the other side of”). The term cis-regulatory elements now refers to those DNA sequences that regulate a gene on the same stretch of DNA (i.e., the promoters and enhancers). Trans-regulatory factors are soluble molecules whose genes are located elsewhere in the genome and that bind to the cis-regulatory elements. They are usually transcription factors or microRNAs. Some evidence points to the ability of an enhancer to activate a trans-promoter (i.e., a promoter on another chromosome), but such cases appear to be exceptional and rare events (Noordermeer et al. 2011).

6 In the case of protein-encoding genes, RNA polymerase II is used for transcription. There are several types of RNA that do not encode proteins, including the ribosomal RNAs and transfer RNAs (which are used in protein synthesis) and the small nuclear RNAs (which are used in RNA processing). In addition, there are regulatory RNAs (such as the microRNAs and long noncoding RNAs that we will discuss later in this chapter) that are involved in regulating gene expression and are not translated into peptides. These regulatory RNAs often are transcribed by other RNA polymerases.

THE MEDIATOR COMPLEX: LINKING ENHANCER AND PROMOTER In many genes, a bridge between enhancer and promoter is made by a large, multimeric complex called the Mediator, whose nearly 30 protein subunits connect RNA polymerase II to enhancer regions that relay developmental signals (Malik and Roeder 2010). This bridge forms the pre-initiation complex at the promoter. In this way, the Mediator helps create a chromatin loop, bringing the enhancers to the promoter. This chromatin loop is stabilized by the protein cohesin, which, once the Mediator has been bound by transcription factors, associates with the Mediator and wraps around portions of the loop like a ring (FIGURE 3.8).

Although the Mediator may help bring the RNA polymerase II to the promoter, for transcription to take place the connection between the Mediator and the RNA polymerase II has to be broken, and RNA polymerase II must be released from the promoter.

The release of RNA polymerase II is accomplished by a transcription elongation complex (TEC), which is made up of several transcription factors and enzymes (e.g., Ikaros, NuRD, and P-TEFb⁷; Bottardi et al. 2015). This release coincides with the capping of the transcript, phosphorylation of the polymerase, and elongation of the transcript. In some instances (discussed later in the chapter), however, the RNA polymerase II either does not dissociate from the Mediator, or it dissociates but only transcribes a short stretch of nucleotides before it pauses. In the latter case, a transcription elongation suppressor (such as NELF) functions to prevent the TEC from associating with the polymerase, and the RNA polymerase II is paused, held in readiness for a new developmental signal.

ENHANCER FUNCTIONING One of the principal methods of identifying enhancer sequences is to clone DNA sequences flanking the gene of interest and fuse them to reporter genes whose products are both readily identifiable and not usually made in the organism being studied. Researchers can insert constructs of possible enhancers with reporter genes into embryos and then monitor the spatial and temporal pattern of expression displayed by the visible protein product of the reporter gene (such as green fluorescent protein, GFP; FIGURE 3.9A). If the sequence contains an enhancer, the reporter gene should become active at particular times and places. For instance, the E. coli gene for β-galactosidase (the lacZ gene) can be used as a reporter gene and fused to (1) a promoter that can be activated in any cell and (2) an enhancer that directs expression of a particular gene (Myf5) only in mouse muscles. When the resulting transgene is injected into a newly fertilized mouse egg and becomes incorporated into its DNA, β-galactosidase protein reveals the expression pattern of that muscle-specific gene (FIGURE 3.9B). More recently, genomic techniques such as ChIP-Seq (discussed later in the chapter) have enabled researchers to identify enhancer elements by sequencing the DNA regions bound by specific transcription factors.

7 Ikaros is a type of zinc-finger transcription factor that binds the histone deacetylase NuRD, which recruits P-TEFb (Positive transcription elongation factor b) to form a complex that breaks transcriptional pausing and promotes nRNA elongation (Bottardi et al. 2015). Interestingly, the repertoire of bound factors can be gene specific. For example, progenitor blood cells expressing high levels of Ikaros differentiate into various types of white blood cells, and those expressing low levels differentiate mostly into red blood cells (Frances et al. 2011).

FIGURE 3.7 The bridge between enhancer and promoter can be made by transcription factors. Certain transcription factors bind to DNA on the promoter (where RNA polymerase II will initiate transcription), whereas other transcription factors bind to the enhancer (which regulates when and where transcription can occur). Other transcription factors do not bind to the DNA; rather, they link the transcription factors that have bound to the enhancer and promoter sequences. In this way, the chromatin loops to bring the enhancer to the promoter. The example shown here is the mouse β-globin gene. (A) Transcription factors assemble on the enhancer, but the promoter is not used until the GATA1 transcription factor binds to the promoter. (B) GATA1 can recruit several other factors, including Ldb1, which forms a link uniting the enhancer-bound factors to the promoter-bound factors. (After Deng et al. 2012.)
[Diagram labels: (A) enhancer, promoter, Ldb1, no transcription; (B) enhancer, promoter, GATA1, Ldb1, RNA polymerase, transcription.]

FIGURE 3.8 The role of the Mediator complex in forming the transcription pre-initiation complex. (A) Relatively open chromatin is composed of DNA coiled around nucleosomes. (B) Transcription factors bind to the enhancer and bind to nucleosome-modifying enzymes that remove nucleosomes from the area, including the enhancer and promoter. (C) The transcription factors also bind a large protein complex called the Mediator. (D) The Mediator is able to recruit and stabilize RNA polymerase II (RNA PII) and its cofactors (TAFs IIA, IIB, etc.) at the promoter site. These factors, bound together with RNA polymerase II, are called the pre-initiation complex. The chromatin looping is further stabilized by cohesin. (E) After RNA polymerase II leaves the promoter, there are generally two outcomes. One outcome (right) is that it can associate with the transcription elongation complex (TEC) to elongate the nRNA while the Mediator continues to recruit new RNA polymerase II proteins to the complex. Alternatively (left), RNA polymerase II can be instructed to stop elongation by a repressive transcription factor (NELF) that prevents the assembly of the TEC. When given a second developmental signal, NELF can be removed and transcription elongation continued. (After Malik and Roeder 2010; Ohlsson 2010.)
[Diagram labels: enzymes, transcription factors, acetyl groups, methyl groups, enhancer, promoter, Mediator, cohesin, RNA PII, TAFs IIA, IIB, IID, IIE, IIF, IIH, NELF (pausing; nascent transcript blocked), TEC (elongation; transcript elongates).]

Enhancers generally activate only cis-linked promoters (i.e., promoters on the same chromosome); therefore, they are sometimes called cis-regulatory elements. Because of DNA folding, however, enhancers can regulate genes at great distances (some as great as a million bases away) from the promoter (Visel et al. 2009). Moreover, enhancers do not need to be on the 5′ (upstream) side of the gene; they can be at the 3′ end and can be located in the introns (Maniatis et al. 1987). As we will see in Chapter 19, an important enhancer for a gene involved in specifying the “pinky” of each of our limbs is found in an intron of another gene, some one million base pairs away from its promoter (Lettice et al. 2008). In each cell, the enhancer becomes associated with particular transcription factors, binds nucleosome regulators and the Mediator complex, and engages with the promoter to transcribe the gene in that particular type of cell (FIGURE 3.10A).

ENHANCER MODULARITY The enhancer sequences on the DNA are the same in every cell type; what differs is the combination of transcription factor proteins that the enhancers experience. Once bound to enhancers, transcription factors are able to enhance or suppress the ability of RNA polymerase II to initiate transcription. Several transcription factors can bind to an enhancer, and it is the specific combination of transcription factors present that allows a gene to be active in a particular cell type. That is, the same transcription factor, in conjunction with different combinations of factors, will activate different promoters in different cells. Moreover, the same gene can have several enhancers, with each enhancer binding transcription factors that enable that same gene to be expressed in different cell types.

The mouse Pax6 gene (which is expressed in the lens, cornea, and retina of the eye, in the neural tube, and in the pancreas) has several enhancers (FIGURE 3.10B,C). The 5′ regulatory regions of the mouse Pax6 gene were discovered by taking regions from its 5′ flanking sequence and introns and fusing them to a lacZ reporter gene. Each of these transgenes was then microinjected into the pronuclei of newly fertilized mouse eggs, and the resulting embryos were stained for β-galactosidase (FIGURE 3.10D; Kammandel et al. 1998; Williams et al. 1998). Analysis of the results revealed that the enhancer farthest upstream from the promoter contains the regions necessary for Pax6 expression in the pancreas, whereas a second enhancer activates Pax6 expression in surface ectoderm (lens, cornea, and conjunctiva). A third enhancer resides in the leader sequence; it contains the sequences that direct Pax6 expression in the neural tube. A fourth enhancer, located in an intron shortly downstream of the translation initiation site, determines the expression of Pax6 in the retina. The Pax6 gene illustrates the principle of enhancer modularity, wherein genes having multiple, separate enhancers allow a protein to be expressed in several different tissues but not expressed at all in others.

FIGURE 3.9 The genetic elements regulating tissue-specific transcription can be identified by fusing reporter genes to suspected enhancer regions of the genes expressed in particular cell types. (A) The GFP gene is fused to a zebrafish gene that is active only in certain cells of the retina. The result is expression of green fluorescent protein in the larval retina (below left), specifically in the cone cells (below right). (B) The enhancer region of the gene for the muscle-specific protein Myf5 is fused to a β-galactosidase reporter gene and incorporated into a mouse embryo. When stained for β-galactosidase activity (darkly staining region), the 13.5-day mouse embryo shows that the reporter gene is expressed in the muscles of the eye, face, neck, and forelimb and in the segmented myotomes (which give rise to the back musculature). (A from Takechi et al. 2003, courtesy of S. Kawamura, T. Hamaoka, and M. Takechi; B courtesy of A. Patapoutian and B. Wold.)

FIGURE 3.10 Enhancer region modularity. (A) Model for gene regulation by enhancers. (i) The top diagram shows the exons, introns, promoter, and enhancers of a hypothetical gene A. In situ hybridization (left) shows that gene A is expressed in limb and brain cells. (ii) In developing brain cells, brain-specific transcription factors bind to the brain enhancer, causing it to bind to the Mediator, stabilize RNA polymerase II at the promoter, and modify the nucleosomes in the region of the promoter. The gene is transcribed in the brain cells only; the limb enhancer does not function. (iii) An analogous process allows for transcription of the same gene in the cells of the limbs. The gene is not transcribed in any cell type whose transcription factors the enhancers cannot bind. (B) The Pax6 protein is critical in the development of several widely different tissues. Enhancers direct Pax6 gene expression (yellow exons 1–7) differentially in the pancreas, the lens and cornea of the eye, the retina, and the neural tube. (C) A portion of the DNA sequence of the pancreas-specific enhancer element. This sequence has binding sites for the Pbx1 and Meis transcription factors; both must be present to activate Pax6 in the pancreas. (D) When the β-galactosidase reporter gene is fused to the Pax6 enhancers for expression in the pancreas and lens/cornea, the enzyme is seen in those tissues. (A after Visel et al. 2009; D from Williams et al. 1998, courtesy of R. A. Lang.)
[Diagram labels: (A) gene A with brain-specific and limb-specific enhancers; brain-expressed or limb-expressed transcription factors, Mediator, RNA PII; mRNA expressed in brain, in limb, or in both; the enhancer not used in each tissue is marked. (B) 5′ to 3′: pancreas enhancer, lens and cornea enhancer, promoter, neural tube enhancer, retina enhancer; exons 1, 2, 3, 4, 5, 5a, 6, 7. (C) …CCCTTTATTGATTGACAGAAGCTGG…, with a Pbx1-binding sequence and a Meis-binding sequence. (D) β-galactosidase staining.]

COMBINATORIAL ASSOCIATION Although there is modularity among enhancers, there are codependent units within each enhancer. Enhancers contain regions of DNA that bind transcription factors, and it is this combination of transcription factors that activates the gene. For instance, the pancreas-specific enhancer of the Pax6 gene has binding sites for the Pbx1 and Meis transcription factors (see Figure 3.10C). Both need to be present for the enhancer to activate Pax6 in the pancreas cells (Zhang et al. 2006).

Moreover, the Pax6 gene encodes a transcription factor that works in combinatorial partnerships with other transcription factors. Figure 3.11 shows two gene enhancer regions that bind Pax6. The first is that of the chick δ1 lens crystallin gene (FIGURE 3.11A; Cvekl and Piatigorsky 1996; Muta et al. 2002). This gene encodes crystallin, a lens protein that is transparent and allows light to reach the retina. A promoter in the crystallin gene contains binding sites for TBP and Sp1 (basal transcription factors that recruit RNA polymerase II to the DNA). The gene also has an enhancer in its third intron that controls the time and place of crystallin expression. This enhancer has two Pax6-binding sites. The Pax6 protein works with the Sox2 and L-Maf transcription factors to activate the crystallin gene only in those head cells that are going to become lens. As we will see in Chapter 16, this means that the cell (1) must be head ectoderm (which expresses Pax6), (2) must be in the region of the ectoderm capable of forming eyes (expressing L-Maf), and (3) must be in contact with the future retinal cells (which induce Sox2 expression; see Kamachi et al. 1998).

Meanwhile, Pax6 also regulates the transcription of the genes encoding insulin, glucagon, and somatostatin in the pancreas (FIGURE 3.11B). Here, Pax6 works in cooperation with other transcription factors such as Pdx1 (specific for the pancreatic region of the endoderm) and Pbx1 (Andersen et al. 1999; Hussain and Habener 1999). So, in the absence of Pax6, the eye fails to form, and the endocrine cells of the pancreas do not develop properly; these improperly developed endocrine cells produce deficient amounts of their hormones (Sander et al. 1997; Zhang et al. 2002).

Other genes are activated by Pax6 binding, and one of them is the Pax6 gene itself. Pax6 protein can bind to a cis-regulatory element of the Pax6 gene (Plaza et al. 1993). So, once the Pax6 gene is turned on, it will continue to be expressed, even if the signal that originally activated it is no longer present.
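This self-sustaining behavior can be captured in a toy on/off model. The sketch below is an assumption-laden illustration, not a measured circuit: time is divided into arbitrary steps, expression is treated as simply on or off, and the function name `simulate_pax6` and the `autoregulation` switch are hypothetical.

```python
def simulate_pax6(signal, autoregulation=True):
    """Return the on/off state of Pax6 at each time step.

    signal: sequence of booleans, True while the external activating signal is present.
    """
    pax6_on = False
    history = []
    for signal_present in signal:
        # Pax6 is on if the signal is present, or if Pax6 protein made in the
        # previous step binds the gene's own cis-regulatory element.
        pax6_on = signal_present or (autoregulation and pax6_on)
        history.append(pax6_on)
    return history


transient_signal = [False, True, True, False, False, False]
with_feedback = simulate_pax6(transient_signal)
without_feedback = simulate_pax6(transient_signal, autoregulation=False)

print("with autoregulation:   ", with_feedback)
print("without autoregulation:", without_feedback)
```

With the feedback in place, expression persists after the signal disappears; without it, expression simply tracks the signal.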

Developing Questions

What are the consequences of enhancer modularity to a developing individual? To a species? How might a mutation in an enhancer affect development? For instance, what might occur in an embryo if there were a mutation in the enhancer region of the Pax6 gene? Could such a mutation have evolutionary importance? Hint: it does, and it’s profound!

FIGURE 3.11 Modular transcriptional regulatory regions using Pax6 as an activator. (A) Promoter and enhancer of the chick δ1 lens crystallin gene. Pax6 interacts with two other transcription factors, Sox2 and L-Maf, to activate this gene. The protein δEF3 binds factors that permit this interaction; δEF1 binds factors that inhibit it. (B) Promoter and enhancer of the rat somatostatin gene. Pax6 activates this gene by cooperating with the Pbx1 and Pdx1 transcription factors. (A after Cvekl and Piatigorsky 1996; B after Andersen et al. 1999.)
[Diagram labels: (A) crystallin gene (lens): promoter (+1; Sp1, TBP); enhancer (+1706, +2218; δEF3, δEF1, Pax6, Sox2, L-Maf). (B) somatostatin gene (pancreas): enhancer (–450, –45; Pax6, Pdx1, CREB, Pbx1); TATA, TBP, +1.]

SILENCERS Silencers are DNA regulatory elements that actively repress the transcription of a particular gene. They can be viewed as “negative enhancers,” and they can silence gene expression spatially (in particular cell types) or temporally (at particular times). In the mouse, for instance, there is a DNA sequence that prevents a promoter’s activation in any tissue except neurons. This sequence, given the name neural restrictive silencer element (NRSE), has been found in several mouse genes whose expression is limited to the nervous system: those encoding synapsin I, sodium channel type II, brain-derived neurotrophic factor, Ng-CAM, and L1. The protein that binds to the NRSE is a transcription factor called neural restrictive silencer factor (NRSF, sometimes called REST). NRSF appears to be expressed in every cell that is not a mature neuron (Chong et al. 1995; Schoenherr and Anderson 1995). When NRSE is deleted from particular neural genes, these genes are expressed in non-neural cells (FIGURE 3.12; Kallunki et al. 1995, 1997). Thus, neural-specific genes are actively repressed in non-neural cells.
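The logic of the NRSE deletion experiment can be written as a small truth table. This is a deliberately simplified sketch: it assumes the promoter would otherwise be usable in every cell and that NRSF is present in exactly those cells that are not mature neurons. The function name is hypothetical.

```python
def neural_gene_expressed(cell_is_mature_neuron, gene_has_nrse):
    """Toy rule for a neural gene whose only tissue restriction comes from its NRSE silencer."""
    # Assumption taken from the text: NRSF is made in every cell that is not a mature neuron.
    nrsf_present = not cell_is_mature_neuron
    # Repression requires both the silencer sequence and the factor that binds it.
    repressed = nrsf_present and gene_has_nrse
    return not repressed


for neuron in (True, False):
    for nrse in (True, False):
        cell = "neuron" if neuron else "non-neural cell"
        construct = "with NRSE" if nrse else "NRSE deleted"
        state = "expressed" if neural_gene_expressed(neuron, nrse) else "silent"
        print(f"{cell:16s} {construct:13s} -> {state}")
```

Only one of the four combinations is silent: a non-neural cell carrying the intact silencer. Deleting the NRSE releases the gene in those cells, which is the result described for the transgenic embryos.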

A recently discovered “temporal silencer” may play a role in regulating the human globin genes. In most people, a fetal globin gene is active from about week 12 until birth. Then, around the time of birth, the fetal globin gene is turned off, and the adult globin gene is activated. Some families, however, show a hereditary persistence of fetal hemoglobin, with the fetal globin genes remaining active in the adults. Some of these families have a mutation in a region of DNA that usually silences the fetal globin gene at birth. In most people, this silencer contains binding sites for the transcription factors GATA1 and BCL11A, whose combination on the DNA recruits histone modification enzymes. This action causes the formation of deacetylated and repressive (H3K27me3-containing) nucleosomes (Sankaran et al. 2011).

GENE REGULATORY ELEMENTS: SUMMARY Enhancers and silencers enable genes for specific proteins to use numerous transcription factors in various combinations to control their expression. Thus, enhancers and silencers are modular such that, for example, the Pax6 gene is regulated by enhancers that enable it to be expressed in the eye, pancreas, and nervous system, as seen in Figure 3.10B; this is the Boolean “OR” function. But within each cis-regulatory module, transcription factors work in a combinatorial fashion such that Pax6, L-Maf, and Sox2 proteins are all needed for the transcription of crystallin in the lens (see Figure 3.11A); that is the Boolean “AND” function. The combinatorial association of transcription factors on enhancers leads to the spatiotemporal output of any particular gene (see Peter and Davidson 2015; Zinzen et al. 2009). This “AND” function may be extremely important in activating entire groups of genes simultaneously.
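The two Boolean functions can be made explicit in a few lines of code. In this sketch, the crystallin requirement (Pax6, Sox2, and L-Maf) comes from the text; the two-enhancer "gene A" follows the hypothetical gene of Figure 3.10A, and the factor names `brain_TF` and `limb_TF` are placeholders invented for illustration.

```python
def module_active(required_factors, factors_in_cell):
    """AND: an enhancer module fires only if every factor it requires is present."""
    return set(required_factors) <= set(factors_in_cell)


def gene_active(modules, factors_in_cell):
    """OR: the gene is transcribed if any one of its enhancer modules fires."""
    return any(module_active(required, factors_in_cell) for required in modules.values())


# One module, three required factors (from the text).
crystallin = {"lens_enhancer": {"Pax6", "Sox2", "L-Maf"}}

# Two independent modules; the factor names are placeholders.
gene_a = {"brain_enhancer": {"brain_TF"}, "limb_enhancer": {"limb_TF"}}

print(gene_active(crystallin, {"Pax6", "Sox2"}))           # one factor missing
print(gene_active(crystallin, {"Pax6", "Sox2", "L-Maf"}))  # all three present
print(gene_active(gene_a, {"limb_TF"}))                    # limb module alone suffices
print(gene_active(gene_a, {"liver_TF"}))                   # no module can fire
```

Representing each module as a set of required factors keeps the AND test to a single subset comparison. Real enhancers also integrate repressors and factor concentrations, which this on/off sketch ignores.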

Transcription factor function

FAMILIES AND OTHER ASSOCIATIONS The science journalist Natalie Angier (1992) wrote that “a series of new discoveries suggests that DNA is more like a certain type of politician, surrounded by a flock of protein handlers and advisers that must vigorously massage it, twist it, and on occasion, reinvent it before the grand blueprint of the body can make any sense at all.” These “handlers and advisers” are the transcription factors. During development, transcription factors play essential roles in every aspect of embryogenesis, controlling differential gene expression leading to differentiation. When in doubt, it is usually a transcription factor’s fault, a sentiment that is often used by politicians, too.

FIGURE 3.12 A silencer represses gene transcription. (A) Mouse embryo containing a transgene composed of the L1 promoter, a portion of the neuron-specific L1 gene, and a lacZ gene fused to the L1 second exon, which contains the NRSE sequence. (B) Same-stage embryo with a similar transgene but lacking the NRSE sequence. Dark areas reveal the presence of β-galactosidase (the lacZ product). (Photographs from Kallunki et al. 1997.)
[Diagram labels: (A) L1 promoter, NRSE sequence, lacZ; (B) L1 promoter, no NRSE sequence, lacZ.]

Transcription factors can be grouped together in families based on similarities in DNA-binding domains (TABLE 3.1). The transcription factors in each family share a common framework in their DNA-binding sites, and slight differences in the amino acids at the binding site can cause the binding site to recognize different DNA sequences.

As we have already seen, DNA regulatory elements such as enhancers and silencers function by binding transcription factors, and each element can have binding sites for several transcription factors. Transcription factors bind to the DNA of the regulatory element using one site on the protein and other sites to interact with other transcription factors and proteins, leading to the recruitment of histone-modifying enzymes.

For example, the association of the Pax6, Sox2, and L-Maf transcription factors in lens cells recruits a histone acetyltransferase that can transfer acetyl groups to the histones and dissociate the nucleosomes in that area (Yang et al. 2006). Similarly, when MITF,⁸ a transcription factor essential for ear development and pigment production, binds to its specific DNA sequence, it also binds to a (different) histone acetyltransferase that facilitates the dissociation of nucleosomes (Ogryzko et al. 1996; Price et al. 1998). In addition, the Pax7 transcription factor that activates muscle-specific genes binds to the enhancer region of these genes within the muscle precursor cells. Pax7 then recruits a histone methyltransferase that methylates the lysine in the fourth position of histone H3 (H3K4), resulting in the trimethylation of this lysine and the activation of transcription (McKinnell et al. 2008). The displacement of nucleosomes along the DNA makes it possible for other transcription factors to find their binding sites and regulate expression (Adkins et al. 2004; Li et al. 2007).

In addition to recruiting histone-modifying enzymes, transcription factors can also work by stabilizing the transcription pre-initiation complex that enables RNA polymerase II to bind to the promoter (see Figures 3.7 and 3.8). For instance, MyoD, a transcription factor that is critical for muscle cell development, stabilizes TAF IIB, which supports RNA polymerase II at the promoter site (Heller and Bengal 1998). Indeed, MyoD plays several roles in activating gene expression because it also can bind histone acetyltransferases that initiate nucleosome remodeling and dissociation (Cao et al. 2006).

8MITF stands for microphthalmia-associated transcription factor.

TABLE 3.1 Some major transcription factor families and subfamilies

Family                          Representative transcription factors    Some functions
------------------------------  --------------------------------------  ----------------------------------------
Homeodomain:
  Hox                           Hoxa1, Hoxb2, etc.                      Axis formation
  POU                           Pit1, Unc-86, Oct-2                     Pituitary development; neural fate
  Lim                           Lim1, Forkhead                          Head development
  Pax                           Pax1, 2, 3, 6, etc.                     Neural specification; eye development
Basic helix-loop-helix (bHLH)   MyoD, MITF, daughterless                Muscle and nerve specification;
                                                                        Drosophila sex determination;
                                                                        pigmentation
Basic leucine zipper (bZip)     C/EBP, AP1, MITF                        Liver differentiation; fat cell
                                                                        specification
Zinc-finger:
  Standard                      WT1, Krüppel, Engrailed                 Kidney, gonad, and macrophage
                                                                        development; Drosophila segmentation
  Nuclear hormone receptors     Glucocorticoid receptor, estrogen       Secondary sex determination;
                                receptor, testosterone receptor,        craniofacial development;
                                retinoic acid receptors                 limb development
Sry-Sox                         Sry, SoxD, Sox2                         Bend DNA; mammalian primary sex
                                                                        determination; ectoderm differentiation


One important consequence of the combinatorial association of transcription factors is coordinated gene expression. The simultaneous expression of many cell-specific genes can be explained by the binding of transcription factors by the enhancer elements. For example, many genes that are specifically activated in the lens contain an enhancer that binds Pax6. So, all the other transcription factors might be assembled at the enhancer, but until Pax6 binds, they cannot activate the gene. Similarly, many of the coexpressed muscle-specific genes contain enhancers that bind the Mef2 transcription factor, and the enhancers on genes encoding pigment-producing enzymes bind MITF (see Davidson 2006). In some instances, entire ensembles of transcription factors appear to direct simultaneous gene transcription. Junion and colleagues have shown, for example, that a particular ensemble of five transcription factors is bound on hundreds of enhancers that are active in the developing Drosophila heart muscle cells (Junion et al. 2012).

TRANSCRIPTION FACTOR DOMAINS Transcription factors have three major domains. The first is a DNA-binding domain that recognizes a particular DNA sequence in the enhancer. There are several different types of DNA-binding domains, and they often designate the major family classifications for transcription factors. Some of the most common protein domains that convey DNA binding are the Homeodomain, Zinc Finger, Leucine Zipper, Helix-Loop-Helix, and Helix-Turn-Helix (see Table 3.1). For instance, the homeodomain transcription factor Pax6⁹ uses its paired DNA-binding sites to recognize the enhancer sequence, CAATTAGTCACGCTTGA (Askan and Goding 1998; Wolf et al. 2009). In contrast, the MITF transcription factor involved in ear and pigment cell development contains both leucine zipper and helix-loop-helix domains, and it recognizes shorter DNA sequences called the E-box (CACGTG) and the M-box (CATGTG; Pogenberg et al. 2012).¹⁰ These sequences for MITF binding have been found in the regulatory regions of genes encoding several pigment-cell-specific enzymes of the tyrosinase family (Bentley et al. 1994; Yasumoto et al. 1994, 1997). Without MITF, these proteins are not synthesized properly, and melanin pigment is not made.
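Because the E-box and M-box are short, fixed sequences, candidate MITF-binding sites can be located with a simple text search. The sketch below uses the two motifs given in the text; the test sequence is invented, and the function name `find_motifs` is hypothetical. It scans only the strand it is given, which is a simplification, since a site may lie on either strand of the DNA.

```python
import re

# Motif sequences as given in the text.
MOTIFS = {"E-box": "CACGTG", "M-box": "CATGTG"}


def find_motifs(seq, motifs=MOTIFS):
    """Return a sorted list of (0-based position, motif name) for every match in seq."""
    seq = seq.upper()
    hits = []
    for name, motif in motifs.items():
        # The lookahead (?=...) reports every start position, including overlapping ones.
        for match in re.finditer(f"(?={motif})", seq):
            hits.append((match.start(), name))
    return sorted(hits)


# Invented regulatory sequence; motifs are shown in upper case for readability.
regulatory_region = "ttgaCACGTGatccgtCATGTGaa"

for position, name in find_motifs(regulatory_region):
    print(f"{name} at position {position}")
```

A six-base motif occurs by chance roughly once every few thousand bases, so a match on its own is weak evidence of a functional site. Binding is confirmed experimentally, for example by the reporter and ChIP-Seq approaches described in this chapter.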

The second domain is a trans-activating domain that activates or suppresses the transcription of the gene whose promoter or enhancer it has bound. Usually, this trans-activating domain enables the transcription factor to interact with the proteins involved in binding RNA polymerase II (such as TAF IIB or TAF IIE; see Sauer et al. 1995) or with enzymes that modify histones. MITF contains such a domain of amino acids in the center of the protein. When the MITF dimer is bound to its target sequence in the enhancer, the trans-activating region is able to bind a transcription-associated factor (TAF), p300/CBP. The p300/CBP protein is a histone acetyltransferase enzyme that can transfer acetyl groups to each histone in the nucleosomes (Ogryzko et al. 1996; Price et al. 1998). Acetylation of the nucleosomes destabilizes them and allows the genes for pigment-forming enzymes to be expressed.

Finally, there is usually a protein-protein interaction domain that allows the transcription factor’s activity to be modulated by TAFs or other transcription factors. MITF has a protein-protein interaction domain that enables it to dimerize with another MITF protein (Ferré-D’Amaré et al. 1993). The resulting homodimer (i.e., two identical protein molecules bound together) is the functional protein that binds to enhancer DNA of certain genes and activates transcription (FIGURE 3.13).

9 Pax stands for “paired box,” and “box” refers to its DNA-binding domain. Pax proteins are homeodomain transcription factors that contain a paired domain for binding to DNA. Studies on Drosophila have shown that the loss of a homeodomain transcription factor causes dramatic homeotic transformations in structures, such as the transformation of an antenna into a leg.

10 E-box and M-box refer to “Enhancer” and “Myc” respectively, with “box” meaning DNA-binding domain.

INSULATORS The boundaries of gene expression appear to be set by DNA sequences called insulators. Insulator sequences limit the range in which an enhancer can activate gene expression. They thereby “insulate” a promoter from being activated by another gene’s enhancers. Some insulator DNA regions have been found to bind a zinc-finger transcription factor called CTCF,¹¹ which functions to alter the three-dimensional conformation of chromatin and thereby separate (or insulate) enhancer elements from the promoter (Yusufzai et al. 2004; Kim and Kaang 2015). CTCF is ubiquitously expressed in eukaryotes and has been mapped to tens of thousands of binding sites in the genome (Chen et al. 2012). Mechanistically, CTCF physically interacts with cohesin, a ring-shaped complex of multiple subunits that functions to stabilize chromatin loop structures (see the discussion of the Mediator complex on p. 56). It is hypothesized that CTCF uses its 11 zinc-finger domains to selectively bind DNA, often at insulator elements, to create loop structures that distance enhancers from promoters. For instance, at the chick β-globin gene, CTCF has been shown to form a complex with cohesin (Wendt et al. 2008; Wood et al. 2010). This CTCF-cohesin complex may bind to the enhancer-bound Mediator, thereby preventing the enhancer from activating the adjacent promoter.
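The enhancer-blocking idea described above can be illustrated with a deliberately simplified sketch. It assumes a purely linear chromosome in which an enhancer can act on a promoter only when no insulator lies between them. The element names, coordinates, and the functions `is_blocked` and `reachable_promoters` are hypothetical and chosen only for illustration; real insulation depends on three-dimensional chromatin looping, and the sketch ignores the cases in which CTCF brings enhancers and promoters together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    kind: str      # "enhancer", "promoter", or "insulator"
    position: int  # hypothetical coordinate along the chromosome (base pairs)


def is_blocked(enhancer, promoter, elements):
    """True if any insulator lies strictly between the enhancer and the promoter."""
    lo, hi = sorted((enhancer.position, promoter.position))
    return any(e.kind == "insulator" and lo < e.position < hi for e in elements)


def reachable_promoters(enhancer, elements):
    """Names of promoters the enhancer could activate in this toy linear model."""
    return [
        e.name
        for e in elements
        if e.kind == "promoter" and not is_blocked(enhancer, e, elements)
    ]


# Hypothetical locus: one enhancer, two promoters, one insulator between them.
enhancer = Element("enhancer_1", "enhancer", 1_000)
locus = [
    enhancer,
    Element("geneA_promoter", "promoter", 5_000),
    Element("insulator_1", "insulator", 8_000),
    Element("geneB_promoter", "promoter", 12_000),
]

print(reachable_promoters(enhancer, locus))
```

In this made-up locus the insulator sits between the enhancer and the second promoter, so only the first promoter remains within the enhancer's range.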

PIONEER TRANSCRIPTION FACTORS: BREAKING THE SILENCE Finding an enhancer is not easy because the DNA is usually so tightly wound that the enhancer sites are not accessible. Given that the enhancer might be covered by nucleosomes, how can a transcription factor find its binding site? That is the job of certain transcription factors that penetrate repressed chromatin and bind to their enhancer DNA sequences (Cirillo et al. 2002; Berkes et al. 2004). They have been called “pioneer” transcription factors, and they appear to be critical in establishing certain cell lineages. One of these transcription factors is FoxA1, which binds to certain enhancers and opens up the chromatin to allow other transcription factors access to the promoter (Lupien et al. 2008; Smale 2010). FoxA1 is extremely important in specifying liver cells. It remains bound to the DNA during mitosis, providing a mechanism to reestablish normal transcription in presumptive liver cells (Zaret et al. 2008). Another pioneer transcription factor is the Pax7 protein mentioned above. It activates muscle-specific gene transcription in a population of muscle stem cells by binding to its DNA recognition sequence and being stabilized there by dimethylated H3K4 on the nucleosomes. It then recruits the histone methyltransferase that converts the dimethylated H3K4 into the trimethylated H3K4 associated with active transcription (McKinnell et al. 2008).

MASTER REGULATORY TRANSCRIPTION FACTORS The phrase “master regulator” has been used to describe certain transcription factors that seem to have the power to control

11 CTCF stands for CCCTC-binding Factor. Although we highlight its role as an insulating factor, CTCF can also contribute to chromatin architecture and in some cases activate transcription by bringing enhancers in contact with promoters. (See Kim and Kaang 2015.)

[Figure 3.13 labels: carboxyl termini; amino termini; protein-protein interaction domain; DNA-binding domains]

FIGURE 3.13 Three-dimensional model of the homodimeric transcription factor MITF (one protein shown in red, the other in blue) binding to a promoter element in DNA (white). The amino termini are located at the bottom of the figure and form the DNA-binding domains that recognize an 11-base-pair sequence of DNA having the core sequence CATGTG. The protein-protein interaction domain is located immediately above. MITF has the basic helix-loop-helix structure found in many transcription factors. The carboxyl end of the molecule is thought to be the trans-activating domains that bind the p300/CBP transcription-associated factor (TAF). (From Steingrímsson et al. 1994, courtesy of N. Jenkins.)
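As a rough illustration of what recognizing a core sequence means, the following sketch scans a DNA string for the CATGTG core named in the Figure 3.13 caption, checking both strands. The example sequence and the function names `reverse_complement` and `find_core_sites` are hypothetical. Exact string matching is only a stand-in here: actual MITF binding depends on the full 11-base-pair site, chromatin context, and binding affinity, none of which this toy captures.

```python
# Toy motif scan: exact matches to a core sequence on either DNA strand.
CORE = "CATGTG"  # core sequence given in the Figure 3.13 caption

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_core_sites(dna, core=CORE):
    """Return (0-based start, strand) for every exact match of `core` in `dna`.

    Strand "+" means the core reads 5'->3' on the given strand; strand "-"
    means it reads 5'->3' on the complementary strand.
    """
    dna = dna.upper()
    rc = reverse_complement(core)
    hits = []
    for i in range(len(dna) - len(core) + 1):
        window = dna[i:i + len(core)]
        if window == core:
            hits.append((i, "+"))
        if window == rc and rc != core:
            hits.append((i, "-"))
    return hits


# Hypothetical sequence invented for this example; not a real enhancer.
example = "GGTCATGTGCTAACACATGAT"
print(find_core_sites(example))  # -> [(3, '+'), (13, '-')]
```

Because CATGTG is not its own reverse complement (that would be CACATG), a site can be found on either strand, which is why the scan reports a strand for each hit.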

The precise binding of transcription factors to cis-regulatory elements drives differential gene expression both spatially and temporally in the developing embryo. Is a cell’s identity determined by one transcription factor complex binding to one regulatory element, leading to the expression of one gene? How many genes are required to establish a specific cell’s fate?

Developing Questions