CRISPR-MEDIATED KNOCKOUT OF TRANSCRIPTION
FACTOR FOXC1 IN BREAST CANCER CELLS
University of North Carolina at Chapel Hill
Biology Honors Thesis
Koby Amankwah
Dr. Charles Perou
Abstract
Breast cancer is the most commonly occurring cancer in women in the US and accounts for about 40,000 deaths each year. Through gene expression profiling, the Perou Lab has categorized breast cancer into five unique subtypes, some of which are sensitive to chemotherapy (basal-like) and some of which are not (luminal A). Gene expression profiling has also revealed that certain transcription factors, such as FOXC1, are highly expressed in basal-like cancers. The goal of this project is to study the functional role of FOXC1 in basal-like breast cancers. We propose to do this by using CRISPR/Cas9 to modify the genome of basal-like breast cancer cell lines and delete the transcription factor FOXC1. We hypothesize that deletion of FOXC1 will cause changes in the expression of other genes important for maintaining cells in the basal-like phenotype. To test this hypothesis, we transfected three basal-like breast cancer cell lines with a CRISPR/Cas9 plasmid to induce a knockout at the FOXC1 locus. We validated knockout clones for modification of the FOXC1 gene by sequencing genomic DNA and showed loss of FOXC1 protein expression by western blot. We then used DNA microarrays to measure global changes in gene expression when FOXC1 is knocked out. Analysis of gene expression after inactivation of FOXC1 identified hundreds of differentially expressed genes and suggested a possible role for FOXC1 in the maintenance of mesenchymal features. By better understanding what genetic factors determine a given cancer subtype, we believe we can manipulate a tumor to be more basal-like and therefore possibly more sensitive to chemotherapy.
Introduction
To further investigate the genetic features of breast cancer, the Perou Lab has used gene expression profiling and categorized breast cancer into five distinct subtypes: claudin-low, basal-like, HER2-enriched, luminal A, and luminal B.10 This genomics-based classification predicts patient prognosis and may also indicate which therapies a given tumor will respond to. For example, the basal-like subtype has the highest sensitivity to chemotherapy, with a pathological complete response (pCR) rate of approximately 40%. In contrast, the luminal subtypes have the lowest response to chemotherapy, with a pCR rate of about 5% (Figure 1).
To better understand what defines the basal-like subtype, the Perou lab undertook gene expression analysis of primary human tumors, human breast cancer cell lines, and mouse mammary tumors. All of these analyses consistently revealed high FOXC1 expression within the basal-like subtype of breast cancer. FOXC1 is a member of the forkhead box family of transcription factors, which means it binds to specific regions of DNA and regulates the transcription of other genes. Through this mechanism, FOXC1 has been found to regulate the epithelial to mesenchymal transition (EMT).4 EMT is a cellular process by which cells lose their adhesion to neighboring cells and begin to migrate and invade other regions. For people with breast cancer, EMT could allow the tumor to metastasize, causing cancer growth in other regions of the body.
showed loss of FOXC1 protein expression by western blot. We then used microarrays to measure global changes in gene expression when FOXC1 is knocked out and used flow cytometry to examine the change in cell-surface markers to see if the cell changed phenotype.
Results
Sanger sequencing revealed modification at the FOXC1 locus
To begin the investigations of the role of FOXC1 in basal-like breast cancer, we chose three human breast cancer cell lines for this project – HCC1143, SUM149PT, and MDA-MB-468 – because they have all been previously classified by the Perou lab as basal-like cells with high expression of FOXC1.3 Each cell line was individually transfected with a plasmid to deliver a single guide RNA and Cas9 as well as a selectable marker of antibiotic resistance.5 The antibiotic puromycin was used to select cells successfully transfected with the plasmid and single cell clones were isolated in each line.
26 nucleotides (including the start codon). The results reported here were from experiments performed on clones isolated from the HCC1143, SUM149PT, and MDA-MB-468 cell lines.
Off-target site sequencing confirmed sites remained wild-type
Off-target sites are positions within the genome that have similar sequences to our FOXC1 CRISPR targeting locus. This means that CRISPR/Cas9 in theory could target these alternative regions and cause unwanted mutations. In silico analysis of the FOXC1 gRNA generated a list of possible off-target sites (Table 1). The top four off-target sites based on off-target score were next sequenced. This score is the inverse probability of off-target Cas9 binding. So the higher the score, the lower the probability of binding. Sequencing of these four off-target sites in all knockout clones showed that all these loci remained wild type in the presence of FOXC1 knockout (data not shown).
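The similarity between the guide and each candidate off-target site can be made concrete by counting mismatched bases. The following is a minimal sketch, not the in silico tool used in this project. It assumes the 20-nucleotide protospacers as transcribed from Table 1 (with the PAM removed), and the function name mismatch_positions is hypothetical.

```python
# Protospacers (20 nt, PAM removed) as transcribed from Table 1.
# The transcription itself is an assumption; check against the original table.
GUIDE = "CACGGAGTAGCGCGCCTGCA"

OFF_TARGETS = {
    "CLN8": "CTCGGACCAGCCCGCCTGCA",
    "ZNRF2": "CGCGGAGCGGCGCGGCTGCA",
    "JAGN1": "CACGGAGCCGCGCGGCTGCG",
    "SIRT1": "CCCGGAGGAGCGCGGCTGAA",
}


def mismatch_positions(guide, site):
    """Return the 1-based positions at which the site differs from the guide."""
    if len(guide) != len(site):
        raise ValueError("sequences must have equal length")
    return [i + 1 for i, (g, s) in enumerate(zip(guide, site)) if g != s]


# Mismatch positions for every candidate site, keyed by gene name.
MISMATCHES = {
    gene: mismatch_positions(GUIDE, seq) for gene, seq in OFF_TARGETS.items()
}

if __name__ == "__main__":
    for gene, positions in MISMATCHES.items():
        print(gene, len(positions), positions)
```

With these transcriptions, each of the four sites differs from the guide at four positions, which is why they are plausible but unlikely Cas9 targets. Off-target scoring tools weight mismatches by position as well as by count, so a plain count like this is only a first approximation.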
Western blot confirmed loss of FOXC1 protein
Microarray analysis determined change in global gene expression
Microarray analysis measures global gene expression and the changes that may arise from manipulations of cell lines. To assess the changes in gene expression resulting from FOXC1 knockout, we analyzed all knockout clones from all three cell lines together, which mitigates idiosyncratic results coming from any one cell line. For this supervised gene expression analysis we used Significance Analysis of Microarrays (SAM), a statistical method that identifies genes whose expression is significantly altered in the knockout cells compared to wild-type cells (Figure 6). SAM also estimates the False Discovery Rate (FDR) using 1000 random permutations of the data, thus providing a data-driven FDR for this exact data matrix. The analysis identified 1496 differentially expressed genes. Genes were classified as upregulated or downregulated according to which side of the confidence interval they fell outside. The 607 genes in red (top right) were upregulated (fold change >1.0) and the 889 genes in green (bottom left) were downregulated (fold change <1.0).
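The idea behind a permutation-based FDR can be illustrated with a short sketch. This is a simplified stand-in and not the full SAM procedure: SAM compares ordered statistics against their permutation expectations and tunes a delta cutoff, whereas the sketch below applies a fixed threshold to a SAM-style relative difference. The function names, the fudge factor s0, the threshold, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def d_statistic(x, labels, s0=0.1):
    """SAM-style relative difference per gene: mean difference / (pooled SE + s0)."""
    a, b = x[:, labels == 1], x[:, labels == 0]
    na, nb = a.shape[1], b.shape[1]
    pooled_var = (
        a.var(axis=1, ddof=1) * (na - 1) + b.var(axis=1, ddof=1) * (nb - 1)
    ) / (na + nb - 2)
    se = np.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return (a.mean(axis=1) - b.mean(axis=1)) / (se + s0)


def permutation_fdr(x, labels, threshold, n_perm=1000, seed=0):
    """Count genes with |d| >= threshold and estimate the FDR by label permutation."""
    rng = np.random.default_rng(seed)
    observed = d_statistic(x, labels)
    n_called = int(np.sum(np.abs(observed) >= threshold))
    false_calls = np.empty(n_perm)
    for p in range(n_perm):
        permuted = rng.permutation(labels)  # shuffle knockout / wild-type labels
        false_calls[p] = np.sum(np.abs(d_statistic(x, permuted)) >= threshold)
    if n_called == 0:
        return 0, 0.0
    # Median number of calls under the null, relative to the observed calls.
    return n_called, min(1.0, float(np.median(false_calls)) / n_called)


def demo(n_perm=200):
    """Synthetic example: 200 genes, 6 knockout and 6 wild-type arrays."""
    rng = np.random.default_rng(1)
    labels = np.array([1] * 6 + [0] * 6)
    x = rng.normal(size=(200, 12))
    x[:20, labels == 1] += 5.0  # first 20 genes are truly upregulated
    return permutation_fdr(x, labels, threshold=3.0, n_perm=n_perm)


if __name__ == "__main__":
    print(demo())
```

Because the null distribution comes from reshuffling the sample labels of the same matrix, the estimated FDR reflects the correlation structure and noise of the actual data rather than a theoretical distribution.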
Gene signatures analyzed a cohort of the significantly altered genes
the signature within a given patient, can prognostically identify patients that will have a better outcome.
In Figure 8A, the dataset had a p-value of 0.0018 and was therefore significant. Patients with the highest expression of downregulated genes had the highest 10-year survival (~60%), those with the lowest expression had the lowest survival (~50%), and those with medium expression fell in between (~55%). In Figure 8B, the dataset had a p-value of 0.074 and was therefore trending toward significance. Patients with the highest expression of upregulated genes had the lowest survival (~50%), those with the lowest expression had the highest survival (~60%), and those with medium expression fell in between (~55%).
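The text does not state how the survival curves in Figure 8 were computed. Assuming they are Kaplan-Meier estimates, which is the usual choice for overall survival data with censoring, the percent survival at a given time can be sketched as follows. The function names and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier curve as a list of (time, survival) at each event time.

    times  : follow-up time for each patient
    events : 1 if the patient died at that time, 0 if censored
    """
    order = sorted(zip(times, events))
    n_at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(order) and order[i][0] == t:
            deaths += order[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


def survival_at(curve, t):
    """Read the step function: survival probability at time t."""
    value = 1.0
    for event_time, s in curve:
        if event_time <= t:
            value = s
        else:
            break
    return value


if __name__ == "__main__":
    toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
    print(toy_curve, survival_at(toy_curve, 2.5))
```

Applied separately to the High, Medium, and Low expression groups, survival_at(curve, 10) would give the 10-year values quoted above. A log-rank test across the groups is the typical source of a p-value for such a comparison, although the test used here is not specified.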
Flow cytometry revealed changing cell phenotype
Cell surface proteins CD49f and EpCAM have been previously used in the Perou lab to help phenotypically define the basal-like cell population, as well as mammary stem cells and mature luminal cells.3 Cells with high expression of CD49f and EpCAM are basal-like whereas cells with high expression of CD49f but low expression of EpCAM can be classified as claudin-low. Flow cytometry analysis of our knockout cells in the SUM149PT cell line showed a dramatic shift in the cell population profile. The EpCAM-negative/CD49f-positive cell population (23.1% in parental) was decreased to 0.1% in both knockout cell isolates (Figure 6). The EpCAM-positive/CD49f-positive cell population (68.8% in parental) increased to 98.6% and 96.8% in KOc11 and KOc24, respectively. Therefore, the SUM149PT knockout cells seemed to be losing the claudin-low phenotype and becoming more basal-like.
Discussion
Identifying the genetic drivers of key breast tumor phenotypes is both biologically and clinically important. The basal-like subtype, which is the major subtype present within Triple Negative Breast Cancer (TNBC), is typically an aggressive tumor subtype with poor patient outcomes. Thus, identifying the drivers of the phenotype could lead to new treatment approaches. One dominant and recurrent feature of basal-like breast cancers is the expression of FOXC1. Therefore, we performed genetic ablation experiments in basal-like cell lines, to begin to characterize the molecular functions of FOXC1 in human TNBC biology.
Our CRISPR-mediated knockout experiments disrupted the FOXC1 coding sequence in all three cell lines (HCC1143, SUM149PT, and MDA-MB-468). The HCC1143_c2 isolate lost 32 nucleotides, including the start codon, which implies that the protein will not be translated. The HCC1143_c4 isolate gained a single adenine (A) nucleotide, which resulted in an early stop codon (amino acid 82 out of 554). This means that only about 15% of the protein was translated, which would not be enough for the FOXC1 protein to function. Even if a cryptic start codon exists downstream in either of these sequences, the frameshift introduces multiple stop codons that would inevitably produce a nonfunctional protein. Our western blot assays validated these sequencing results (Figure 5).
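The way a single-nucleotide insertion produces an early stop codon can be illustrated with a toy example. The sequence below is hypothetical and is not the FOXC1 sequence; it was chosen only so that inserting one A shifts the reading frame onto a TGA stop codon. The codon table is the standard genetic code, and the helper names are assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate from the first base up to (not including) the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def translated_fraction(stop_codon_index, full_length):
    """Fraction of the protein made when the stop falls at a given codon."""
    return (stop_codon_index - 1) / full_length


# Hypothetical toy open reading frame, NOT the FOXC1 sequence.
WILD_TYPE = "ATGGCCCTTGACCGTAAATAA"
# Insert a single A after the second codon to shift the reading frame.
MUTANT = WILD_TYPE[:6] + "A" + WILD_TYPE[6:]

if __name__ == "__main__":
    print(translate(WILD_TYPE))   # full toy peptide
    print(translate(MUTANT))      # truncated by the frameshift
    print(translated_fraction(82, 554))
```

In the toy sequence the wild-type frame encodes six residues, while the shifted frame stops after three. By the same arithmetic, a stop at codon 82 of a 554-residue protein leaves 81 residues, or roughly one-seventh of the full length.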
The SUM149PT cells went through very similar CRISPR-mediated genetic modifications:
The MDA-MB-468 cells also went through similar modifications: 468_c15 lost 26 nucleotides (including the start codon) and 468_c10 inserted an ‘A’ nucleotide, which resulted in an early stop codon (82 out of 554), leading to nonfunctional FOXC1 protein.
These results mean that two completely different cell lines were repaired with the same modifications. One possible explanation for this surprising phenomenon is that the repair of double strand breaks induced by CRISPR/Cas9 can occur by identical mechanisms in different cell lines, resulting in the same nucleotide insertion seen in both HCC1143 clone 4 and SUM149PT clone 24. The mechanism by which identical insertions occur is not clear, but the idea is supported by the observation that different cells can delete the same number of nucleotides in order to dimerize and repair the region immediately adjacent to the CRISPR cut site.11
After verifying loss of FOXC1 protein within these cell lines, a flow cytometry assay was used to look for phenotypic changes in our cell lines. Protein expression of EpCAM and CD49f on breast epithelial cells has been used as a means of identifying functionally distinct subsets of epithelial cells including
sensitivity assay can also be performed to see if the FOXC1- cells, which are more basal-like and less claudin-low like, are more responsive to chemotherapy, supporting the hypothesis that they have become more basal-like.
Analysis of microarrays identified 1496 genes as either significantly upregulated or downregulated. Upregulated genes include TRIB1 and STAT2, both involved in invasion.17,19 Genes that are downregulated are likely to include some of the direct targets of FOXC1. A significantly downregulated gene of particular interest is CLDN6, which plays an important role in suppressing migration and invasion in vitro and is often found at lower levels in human breast cancers.18 ApoM was another downregulated gene. It is known to promote proliferation and invasion in non-small cell lung cancers.20 A gene ontology profile of these genes showed that a significant number of downregulated genes are involved in cell adhesion (data not shown). Future experiments could examine this further with in vitro migration and invasion assays or with tumor transplants of our cell lines in an animal model. Further investigation into these genes and the several others that changed expression will be crucial in identifying whether or not these basal-like cells are beginning to change phenotype.
The SAM analysis provided a compilation of upregulated and downregulated genes. In order for this information to be used to help patients, a gene signature was developed based upon the top 300 upregulated and top 300 downregulated genes. An average value for each of these gene sets was determined for each patient from a publicly available dataset that contains overall survival for 1584 patients. The data presented in Figure 8A show that the highest expression of downregulated genes has the highest percent survival, the lowest expression has the lowest survival, and the medium falls in between. This result is statistically significant (p-value of 0.0018), and the signature can therefore prognostically identify patients that will have better overall survival. In Figure 8B, the highest expression of upregulated genes has the lowest percent survival, the lowest expression has the highest survival, and the medium falls in between. Although this signature does not provide statistical significance in this analysis, it does trend toward significance (p-value of 0.074). It is possible that further development of this up signature could refine it so that it does provide valuable prognostic ability in the clinical setting. However, because the down signature already appears significantly prognostic, we would like to apply it to other datasets to show that it has the same power in those cohorts. If the results are reproducible, it is possible to imagine designing a clinical assay in which a patient sample (e.g. blood) is sequenced prior to treatment and our gene signature is applied to these sequencing results to help determine if the patient has a favorable clinical outcome or perhaps may require a more aggressive course of therapy.
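The signature scoring described above can be sketched in a few lines. The sketch assumes an expression matrix with genes as rows and patients as columns, takes the signature score as the mean expression of the signature genes, and assumes that the High, Medium, and Low groups are tertiles of that score. The text does not state how the groups were cut, and the function names and toy values are hypothetical.

```python
import numpy as np


def signature_scores(expr, gene_names, signature):
    """Mean expression of the signature genes for each patient (column)."""
    wanted = set(signature)
    rows = [i for i, gene in enumerate(gene_names) if gene in wanted]
    if not rows:
        raise ValueError("no signature genes found in the expression matrix")
    return np.asarray(expr, dtype=float)[rows, :].mean(axis=0)


def tertile_groups(scores):
    """Label each patient Low, Medium, or High by score tertile (an assumption)."""
    low_cut, high_cut = np.quantile(scores, [1 / 3, 2 / 3])
    return np.where(
        scores <= low_cut, "Low", np.where(scores <= high_cut, "Medium", "High")
    )


if __name__ == "__main__":
    # Toy matrix: 3 genes x 6 patients; only genes A and B are in the signature.
    expr = [[1, 2, 3, 4, 5, 6], [3, 4, 5, 6, 7, 8], [9, 9, 9, 9, 9, 9]]
    scores = signature_scores(expr, ["A", "B", "C"], ["A", "B"])
    print(scores, tertile_groups(scores))
```

Each group's survival can then be compared across the cohort. Averaging over several hundred genes makes the score less sensitive to noise in any single gene, which is one reason signatures of this kind tend to transfer between datasets better than individual markers.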
Conclusion
Comprehensive analysis of gene expression after inactivation of FOXC1 in basal-like cell lines identified a robust set of genes that are likely reflective of the activity of this transcription factor. Our preliminary data suggest that FOXC1 is driving cells into a more mesenchymal/claudin-low phenotype, which has potential clinical implications in terms of metastatic potential. Additional clinical implications include using this gene signature to better categorize patients who will respond well to chemotherapy versus those who should seek alternative treatments. Hopefully, by better understanding what genetic factors determine breast cancer subtype classification, we can manipulate a tumor to be more basal-like and therefore more sensitive to chemotherapy.
Three cell lines were cultured independently. MDA-MB-468 cells were maintained in DMEM medium with 10% fetal bovine serum (FBS) and 1% penicillin-streptomycin (P/S). HCC1143 were maintained in RPMI 1640 medium with 10% FBS and 1% P/S. SUM149PT cells were maintained in HuMEC medium with 5% FBS plus HuMEC supplement. Cells were cultured in T-75 plates and transferred to 24-well plates before transfection of CRISPR/Cas9.
CRISPR/Cas9
Three basal-like breast cancer cell lines, HCC1143, MDA-MB-468, and SUM149PT, were individually transfected with a CRISPR/Cas9 plasmid to induce a knockout at the FOXC1 locus. The CRISPR guide RNA (gRNA) binds Cas9 to form the CRISPR/Cas9 complex. The gRNA locates the particular site of DNA that needs to be cut, and Cas9, which acts as a molecular scissor, cuts the DNA. After the DNA is cut, a gene can be inserted to create a “knock-in,” or the DNA can be left to repair itself, which is the approach we used. When the cell repairs the break, it makes errors that cause mutations and lead to “knockout” of this particular part of the genome. Individual clones/colonies were isolated for each of the three cell lines. PCR was used to specifically amplify the region surrounding the CRISPR site in FOXC1 from genomic DNA, and the isolated PCR products were Sanger sequenced. Each knockout clone was validated to have a disruption of the FOXC1 coding sequence, and Sanger sequencing of off-target positions was used to ensure they had not been accidentally modified by CRISPR/Cas9.
Western Blot
FOXC1 antibody was used to perform western blot analysis. 30 micrograms of protein was separated on
transfer, the membrane was blocked using Odyssey imaging buffer (Li-Cor) for one hour. Primary antibody was added and incubated overnight at 4 °C. Primary antibodies were used as recommended by the manufacturer: FOXC1 rabbit monoclonal antibody (Cell Signaling Technology #8578) and beta-actin mouse monoclonal antibody (Cell Signaling Technology #3700). After overnight incubation, the membrane was washed three times in TBS-T. Secondary antibody (Li-Cor) was added as recommended by the manufacturer and incubated at room temperature for 2 hours. The membrane was again washed three times in TBS-T. The membrane was visualized on the Odyssey 9120 Infrared Imaging System (Li-Cor).
DNA Microarrays
Custom human microarrays (Agilent Technologies) were used to assay global gene expression in all cell lines. These microarrays have over 40,000 features, which cover the expressed genes in the human genome as well as internal controls. For each experiment, total RNA is isolated from a cell line, reverse transcribed into cDNA, then transcribed into cRNA to incorporate a fluorescent labelling dye. All samples are run in parallel with a sample of control RNA that receives a different fluorescent labelling dye. Both control sample and experimental sample are then fragmented and hybridized on the same array overnight at 65 °C. The following day, the array is washed twice and treated with stabilization buffer (Agilent), then scanned on a SureScan DNA Microarray Scanner (Agilent) with two-color scanning capability. The fluorescent intensity for each feature is measured and the ratio of experimental sample to control sample fluorescent intensity is calculated. This process was done for all
triplicate, yielding 27 total microarrays. Downstream analyses of the resulting data were done in Microsoft Excel and R/Bioconductor.13
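The per-feature ratio of experimental to control intensity is usually analyzed on a log2 scale, so that a two-fold increase and a two-fold decrease are symmetric around zero. The sketch below is illustrative only: the log2 transform, the intensity floor, and the averaging of replicate arrays are assumptions about a typical two-color workflow, not a record of the processing used here, and the function names are hypothetical.

```python
import numpy as np


def log2_ratios(sample, control, floor=1.0):
    """log2(experimental / control) per feature, with a floor to avoid log(0)."""
    sample = np.maximum(np.asarray(sample, dtype=float), floor)
    control = np.maximum(np.asarray(control, dtype=float), floor)
    return np.log2(sample / control)


def average_replicates(log_ratio_matrix):
    """Average log2 ratios across replicate arrays (rows: features, columns: arrays)."""
    return np.asarray(log_ratio_matrix, dtype=float).mean(axis=1)


if __name__ == "__main__":
    # Toy intensities for three features on one array.
    print(log2_ratios([200, 50, 100], [100, 100, 100]))
```

On this scale a value of 1 means the feature is twice as bright in the experimental channel as in the control channel, a value of -1 means it is half as bright, and 0 means no change.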
Flow Cytometry
Flow cytometry was used to detect the change in expression of cell-surface markers of the basal-like subtype. EpCAM and CD49f are two cell surface proteins whose expression can define the basal subset of breast cancers.3 Cells were washed in PBS once, then incubated with antibody for 1 hour at 4 °C. Antibody dilutions were determined empirically before the experiment and were used at 1:125 for FITC-EpCAM (BD Biosciences) and 1:1500 for Alexa Fluor CD49f (BD Biosciences). Cells were then washed three times with chilled PBS and finally resuspended in PBS with 2% FBS for analysis. Cells were analyzed on an LSRFortessa machine (Becton Dickinson).
Gene Signatures
Figures
Figure 1. The molecular subtypes of breast cancer and their relationship to chemotherapy
Table 1. Off-target site data. Off-target sites were sequenced to make sure that CRISPR/Cas9 did not modify other portions of the genome with similar sequences. Off-target score refers to the likelihood that that position would be targeted. The letters in red refer to the differences in sequence from the wildtype.
CRISPR Sequence | Off-Target Position | Off-Target Score | Off-Target Sequence
CACGGAGTAGCGCGCCTGCATGG | CLN8 (chr8-:1780242-1780264) | 0.26 | CTCGGACCAGCCCGCCTGCA TGG
CACGGAGTAGCGCGCCTGCATGG | ZNRF2 (chr7-:30284617-30284639) | 0.0267 | CGCGGAGCGGCGCGGCTGCA GGG
CACGGAGTAGCGCGCCTGCATGG | JAGN1 (chr3+:9890615-9890637) | 0.022 | CACGGAGCCGCGCGGCTGCG GGG
CACGGAGTAGCGCGCCTGCATGG | SIRT1 (chr10-:67885283-67885305) | 0.0123 | CCCGGAGGAGCGCGGCTGAA
Figure 5. Validation of knockout clones with western blot. Western blot analysis validated the sequencing results in the HCC1143, SUM149PT, and MDA-MB-468 cell lines. FOXC1 protein is 75 kDa in size. Beta-actin (42 kDa) was used as a loading control.
[Blot image not reproduced. Recoverable labels: MDA-MB-468 panel; size markers at 75, 50, and 37 kDa; FOXC1 band.]
Figure 6. Microarray analysis of all knockout clones. Global gene expression after deletion of FOXC1 in all three cell lines. Significance Analysis of Microarrays (SAM) provides a confidence interval. Genes that fall outside of this interval are either upregulated or downregulated. This graph shows that 607 genes in red (upper right) were upregulated and 889 genes in green (bottom left) were downregulated. The mean false discovery rate was 0.00%.
[Plot not reproduced. Axes: Expected Value (x), Observed Value (y).]
Figure 8. FOXC1 signatures on METABRIC. The expression of genes that had been upregulated and downregulated by loss of FOXC1 was compiled and graphed against 10-year survival data, and the percent survival for each expression group is shown. (A) The highest expression of downregulated genes has the highest percent survival, the lowest expression has the lowest survival, and the medium falls in between. (B) The highest expression of upregulated genes has the lowest percent survival, the lowest expression has the highest survival, and the medium falls in between.
[Plots not reproduced. Axes: Years (x), Percent survival (y). Groups: High, Medium, Low.]