3.3. Results

3.3.1. Quality control statistics

For this study, 60 MM patients were recruited at diagnosis and at relapse, with all clinical information collected at recruitment. Bone marrow (BM) samples were obtained either at the Hammersmith Hospital, Imperial College Healthcare NHS Trust, London, or at the AHEPA University Hospital of Thessaloniki, Greece. Patients were consented by Alexia Katsarou (Department of Haematology) and Evdoxia Hatjiharissi at the respective facilities. Written informed consent and research ethics committee approval were obtained (Research Ethics Committee reference: 11/H0308/9).

ATAC-seq was performed on 60 samples and RNA-seq on 54, as specified in section 2.1. The amount of input material for ATAC-seq and RNA-seq was decided after purification, aiming for 50,000 and 100,000 paired-end reads for ATAC and RNA respectively (when available). The samples were then analysed bioinformatically post hoc, retaining only those that complied with the guidelines mentioned in the introduction to this chapter.

Samples were excluded for any of the following reasons: too low an assigned fraction, a low number of single ends entering the peak caller, low quality on manual inspection of a sample's peaks, unavailable ATAC – RNA pairing, non-comparable RNA type, or RNA-seq contamination. As a result, only 38 paired ATAC – RNA samples were used (see Table 2-1), of which 33 are primary samples and 5 are MM CL samples. Cytogenetic information was obtained for some samples prior to sequencing and was later confirmed by in-house methods by Philippa May. Cytogenetics is available for primary MM samples, creating subgroups based on translocations involving the IgH locus and the MMSET/NSD2, MAF and CCND1 genes, and on Hyperdiploidy (HD) status (chapter 4).

Since the ATAC-seq sequencing depth was significantly lower for MM CLs than for PC and MM primary samples, the CL samples were used only qualitatively (not quantitatively), and correspondingly different quality control criteria were applied. Primary samples were used to determine the candidate enhancer regions regulating genes. Only primary samples with more than around 30M single ends entering the peak caller (except sample A26.11 with 28,278,284) and, in general, an assigned fraction greater than 10% (except sample A26.6B with 7%) were considered. Cell lines were only required to have an assigned fraction of around 10% or more and were used to identify primary MM interactions that are also active in CLs.
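The inclusion criteria described above can be sketched as a small filter. This is only an illustration under stated assumptions, not the pipeline used in the study: the class and function names, the `manual_exceptions` mechanism and all sample records are hypothetical, and the "around" thresholds from the text are treated as hard cut-offs for simplicity.

```python
from dataclasses import dataclass

# Thresholds taken from the text; the constant names are illustrative assumptions.
MIN_SINGLE_ENDS = 30_000_000    # primary samples: "more than around 30M" single ends
MIN_ASSIGNED_FRACTION = 0.10    # all samples: assigned fraction of around 10% or more


@dataclass
class SampleQC:
    name: str
    is_cell_line: bool
    single_ends: int            # unique single ends entering the peak caller
    assigned_fraction: float    # expressed between 0 and 1


def passes_qc(sample, manual_exceptions=frozenset()):
    """Return True if a sample would be kept under the criteria sketched above."""
    # Samples retained by hand despite missing a threshold (hypothetical mechanism).
    if sample.name in manual_exceptions:
        return True
    if sample.assigned_fraction < MIN_ASSIGNED_FRACTION:
        return False
    # Cell lines are only checked on assigned fraction, not on depth.
    if sample.is_cell_line:
        return True
    return sample.single_ends > MIN_SINGLE_ENDS


# Hypothetical records, not data from the study.
samples = [
    SampleQC("primary_deep", False, 55_000_000, 0.21),
    SampleQC("primary_shallow", False, 12_000_000, 0.25),
    SampleQC("cell_line_shallow", True, 15_000_000, 0.12),
    SampleQC("primary_low_fraction", False, 50_000_000, 0.05),
]
kept = [s.name for s in samples if passes_qc(s)]
print(kept)
```

In this sketch, the two samples retained despite falling just below a threshold would be passed by name through `manual_exceptions` rather than by loosening the cut-offs for every sample.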

The 38 samples consisted of 5 primary PC samples from 3 donors (with CD19+ and CD19- variant samples for two of them and a technical replicate for one of these), 28 MM primary samples and 5 MM CL samples. In total, 3,124,720,754 RNA-seq read pairs were generated, with an average mapping rate of 83% when quantifying reads in transcripts, giving an average of 67,937,531 mapped reads per sample. For ATAC-seq, 3,287,274,493 read pairs were sequenced and a total of 2,258,363,658 unique single ends (an average of 59,430,623 per sample) were input into the peak caller, generating a total of 2,350,508 narrow and 2,645,283 broad sample peaks. The average sample assigned fraction is 21%. The table with all samples and details is available at: MM_vs_PC_supervised_analysis/ATAC_and_RNA-seq_stats.xlsx
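As a quick consistency check, the per-sample averages quoted above can be recomputed from the totals. The sketch below assumes that the averages are taken over the 38 retained samples and that mapped reads are counted in the same units as the RNA-seq read pairs. The pooled ratio it computes is not necessarily identical to the average of per-sample mapping rates reported in the text.

```python
# Totals and averages as quoted in the text; variable names are illustrative.
N_SAMPLES = 38
ATAC_SINGLE_ENDS_TOTAL = 2_258_363_658
RNA_READ_PAIRS_TOTAL = 3_124_720_754
RNA_MAPPED_MEAN = 67_937_531

# Average number of unique single ends entering the peak caller per sample.
atac_mean = round(ATAC_SINGLE_ENDS_TOTAL / N_SAMPLES)

# Pooled mapping rate implied by the average number of mapped reads per sample.
implied_mapping = RNA_MAPPED_MEAN * N_SAMPLES / RNA_READ_PAIRS_TOTAL

print(atac_mean)
print(f"{implied_mapping:.3f}")
```

Under these assumptions the ATAC-seq average reproduces the quoted 59,430,623, and the implied pooled mapping rate lies just below the reported 83% average.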

To confirm that the ATAC-seq reads were piling up in patterns reflecting open chromatin, consistent with previous studies (Alasoo et al., 2017), the number of single ends entering peak calling, the number of peaks produced and the assigned fraction were examined for each group of samples (PC, MM and MM CL). The number of peaks per sample after filtering for areas of high and low mappability is greater for PC than for MM or MM CL, with medians of 80,000, 60,000 and 40,000 respectively (Figure 3-1 A). The number of single ends (one read pair has two single ends) input into the peak caller per sample within each category is only slightly higher for PC than for MM (median of 55M vs. 50M respectively), but considerably lower for MM CL (around 15M) (Figure 3-1 B). This is consistent with the strong correlation between the number of single ends used for peak calling and the number of peaks produced (Figure 3-1 D).
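The relationship between sequencing depth and peak count can be quantified with a correlation coefficient. The sketch below computes a Pearson coefficient using only the standard library. The text does not state which correlation measure was used, so this choice is an assumption, and the per-sample values are hypothetical placeholders rather than data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-sample values (millions of single ends, thousands of peaks).
single_ends_millions = [15, 30, 50, 55, 60]
peaks_thousands = [40, 52, 60, 78, 80]

r = pearson_r(single_ends_millions, peaks_thousands)
print(f"Pearson r = {r:.2f}")
```

A rank-based measure such as Spearman's coefficient would be a reasonable alternative if the relationship is monotonic but not linear, for example if peak counts saturate at high depth.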

The assigned fraction is above the required 10% threshold in the great majority of cases, which complies with the guidelines (Alasoo et al., 2017) (Figure 3-1 C). In MM samples, the variability in the assigned fraction is very high. This may be due to the nature of the different subgroups, for example in terms of overall chromatin accessibility, and also to different Hyperdiploid states, which may alter how chromatin accessibility signal piles up in certain areas (studied in greater detail in chapter 4). PC samples have a higher number of peaks than MM despite a similar number of reads entering peak calling. This could mean that, in PC, the chromatin accessibility signal is concentrated in regions that are more spread out, which would explain the lower assigned fraction in the healthy condition.

Figure 3-1: Quality control of ATAC-seq data.

A-C: Distribution of ATAC-seq statistics for different groups of all samples used in the study (Table 2-1): MM and ND (PC) primary samples, MM CL (MM cell line). A) Broad peaks called by MACS2. B) Number of filtered reads used for peak calling. C) Assigned fraction. D) Relationship between the number of filtered broad peaks called and the number of filtered reads used for peak calling. E) Relationship between the number of filtered reads used for peak calling and the sample assigned fraction.

In PC, it may be that for a fraction of the signal there is insufficient sequencing material to surpass the threshold for being called an accessible region, or that there is more background noise. The higher assigned fraction in MM may be caused by duplications (at the gene, chromosomal arm or whole-chromosome level). Since amplification of these regions is likely to contribute to the disease state, it is possible that they are chromatin-active (and accessible) regions that accumulate more signal. However, since the ATAC-seq processing pipeline removes PCR duplicate fragments, the magnitude of this effect would be reduced. A higher number of input reads produces more peaks (Figure 3-1 D).

Surprisingly, the number of single ends used to call peaks in each sample does not correlate with the resulting assigned fraction (Figure 3-1 E). If reads from accessible chromatin were randomly placed, samples with a higher number of peaks would be expected to have a higher assigned fraction. However, as seen between MM and PC, this is not the case, which likely points to a genuine sample-specific distribution of open chromatin. As mentioned before, since MM CLs have a significantly lower sequencing depth, it is difficult to compare their assigned fraction with that of the other groups of samples.