Deep profiling of multitube flow cytometry data Supplemental information



Kieran O’Neill et al.


Table S1: Markers in simulated multitube data. The data was split into three tubes, each containing CD3, CD4 and CD8 in addition to FSC and SSC. The remaining nine markers were distributed across the tubes, three per tube.

Marker Type                  Tube 1   Tube 2   Tube 3
Common (scatter)             FSC      FSC      FSC
Common (scatter)             SSC      SSC      SSC
Common (fluorescent)         CD3      CD3      CD3
Phenotyping (fluorescent)    KI67     CD57     CD27
Phenotyping (fluorescent)    CD28     CCR5     CCR7


Figure S1: Overview of the flowBin pipeline, applied to one multitube sample. 1) FCM data from individual aliquot tubes is quantile normalised in terms of the common population markers present in every tube. 2) The tubes are then binned in terms of these population markers, using either K-means or flowFP. 3) The bins from the first tube are mapped to the other tubes (by nearest-neighbour mapping for K-means bins, or directly for flowFP bins). 4) The expression of each bin in terms of each phenotyping marker (those markers differing across tubes) is measured. This may be done by taking median fluorescent intensity, normalised median fluorescent intensity, or proportion of cells exceeding the 98th percentile of a negative control. The final result is a high-dimensional matrix containing expression levels for each bin in terms of each unique marker.
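Steps 2 to 4 of this pipeline can be illustrated with a minimal Python sketch. It assumes that the population markers have already been quantile normalised, that K-means binning is used, and that expression is summarised as median fluorescent intensity. All function names and the toy data are hypothetical; this is not the flowBin implementation or its API.

```python
import statistics

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (toy choice)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old centroid if a bin ends up empty
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def bin_medians(tube, centroids):
    """Median of each phenotyping marker per bin for one tube.

    tube: list of (population_marker_values, {phenotyping_marker: value}) events.
    """
    per_bin = {}
    for common, pheno in tube:
        b = nearest(common, centroids)  # nearest-neighbour mapping to a bin
        for marker, value in pheno.items():
            per_bin.setdefault(b, {}).setdefault(marker, []).append(value)
    return {b: {m: statistics.median(v) for m, v in markers.items()}
            for b, markers in per_bin.items()}

def combine_tubes(tubes, k):
    """Bin the first tube, map the bins to every tube, collect per-bin medians."""
    centroids = kmeans([common for common, _ in tubes[0]], k)
    combined = {b: {} for b in range(k)}
    for tube in tubes:
        for b, medians in bin_medians(tube, centroids).items():
            combined[b].update(medians)
    return combined

# Toy data: two population markers per event, one phenotyping marker per tube.
tube1 = [((0.1, 0.2), {"KI67": 1.0}), ((0.9, 0.8), {"KI67": 5.0}),
         ((0.2, 0.1), {"KI67": 2.0}), ((0.8, 0.9), {"KI67": 7.0})]
tube2 = [((0.15, 0.15), {"CD57": 3.0}), ((0.85, 0.85), {"CD57": 9.0})]
profile = combine_tubes([tube1, tube2], k=2)
```

Each entry of `profile` corresponds to one row of the final matrix: a bin described by every phenotyping marker, even though no single tube measured all of them.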


[Figure S2 flowchart: bone marrow → aliquot, stain, run flow cytometry → FCS data (129 patients, 7-10 tubes/patient, 20,000 cells, 3-4 markers/tube) → combine tubes using flowBin → cell clusters (128 clusters/patient, 17 markers each) → type clusters using flowType (1:6-combinations) → cell type proportions (616,285 cell types per patient) → Wilcoxon rank sum vs patient NPM1, with Holm correction → 801 cell types associated with NPM1. Markers shown: FSC, SSC, CD45, HLA-DR, CD13, CD34, CD20, CD19, CD10, CD61, CD56, CD33, CD64, CD117, CD14, CD7, CD2, CD4, CD3.]

Figure S2: Pipeline used to determine NPM1-associated immunophenotypes in AML. Steps taken are denoted by arrows, while the data consumed/produced is indicated in boxes. FCM was performed in the clinic historically; all other steps were computational. The end result was a list of 801 cell types which showed a significant
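The final testing step of this pipeline (a Wilcoxon rank sum test per cell type, followed by Holm correction) can be sketched in plain Python. The sketch assumes a normal approximation to the rank sum statistic with mid-ranks for ties and no tie or continuity correction, which may differ from the software actually used; the function names are hypothetical.

```python
import math

def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank sum p-value via the normal approximation.

    Ties receive mid-ranks; no tie or continuity correction is applied
    (a simplification made for this sketch).
    """
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    ranks_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        ranks_x += mid_rank * sum(1 for _, g in pooled[i:j] if g == 0)
        i = j
    n, m = len(x), len(y)
    mean = n * (n + m + 1) / 2
    sd = math.sqrt(n * m * (n + m + 1) / 12)
    z = (ranks_x - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))

def holm(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    adjusted = [0.0] * len(p_values)
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = (len(p_values) - rank) * p_values[idx]
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = min(1.0, running_max)
    return adjusted

# Toy proportions of one cell type in wild-type vs mutated patients.
wt = [0.10, 0.20, 0.30]
mt = [0.40, 0.50, 0.60]
p_raw = rank_sum_p(wt, mt)
p_adjusted = holm([0.01, 0.04, 0.03])
```

In the pipeline, one raw p-value would be computed per cell type, and the whole vector would then be passed through the Holm adjustment before thresholding.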


Figure S3: One-, two- and three-dimensional representations of quantile normalisation of population markers. Empirical cumulative distribution function (ECDF) plots are shown for all tubes and for forward scatter (FS), the most variable marker. Following normalisation, the ECDFs for all tubes are identical, as is expected from quantile normalisation. Two-dimensional scatter plots for representative tubes show the improvement in two-dimensional registration. Lastly, flowFP plots show the improvement in three-dimensional registration, measured by the standard deviation of the number of cells falling within each bin, after bins have been fitted to the consensus of all tubes.
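The reason the ECDFs coincide after normalisation can be seen in a minimal Python sketch of quantile normalisation for a single marker. It assumes that every tube has the same number of events (real tubes would need interpolation of quantiles) and that ties are broken by event order; the function name is hypothetical.

```python
def quantile_normalise(tubes):
    """Quantile-normalise one marker across tubes of equal event count.

    tubes: list of lists of values, one list per tube.
    Each value is replaced by the mean, across tubes, of the values that
    share its rank.
    """
    n = len(tubes[0])
    if any(len(t) != n for t in tubes):
        raise ValueError("this sketch assumes equal event counts per tube")
    sorted_tubes = [sorted(t) for t in tubes]
    # Reference distribution: mean of the k-th smallest value over all tubes.
    reference = [sum(col) / len(tubes) for col in zip(*sorted_tubes)]
    normalised = []
    for t in tubes:
        order = sorted(range(n), key=lambda i: t[i])  # event indices, low to high
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalised.append(out)
    return normalised

# Toy forward scatter values for two tubes.
fs = [[5.0, 1.0, 3.0], [4.0, 8.0, 6.0]]
normalised = quantile_normalise(fs)
```

After normalisation every tube contains exactly the same set of values, so the sorted values, and hence the ECDFs, are identical, while each event keeps its rank within its own tube.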


Figure S4: The two options for binning within flowBin, K-means and flowFP, as applied to a 7-tube sample. a. and b. show comparisons between the bin labels themselves. K-means creates roughly spherical bins, which conform to the location of cell populations. flowFP creates grid-like bins, which may not conform to the true underlying shape of cell populations. c. shows the number of cells per bin across all tubes, for every bin. flowFP shows approximately the same variation in bin density across tubes as K-means (mean SD: 24.6 vs 28.5). However, flowFP has a much more nearly constant number of cells per bin across bins (SD of means: 0.07 vs 255).
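The near-constant bin occupancy of grid-like bins follows from splitting at medians. The Python sketch below illustrates the idea with recursive median bisection. It is only an illustration of this style of binning, not the flowFP algorithm or its API: the split axis simply cycles with depth, and the function name and toy data are hypothetical.

```python
import statistics

def median_split_bins(points, levels, depth=0):
    """Recursively split points at the median, cycling through the dimensions.

    Returns 2**levels bins (lists of points) of near-equal size.
    Simplification: the split axis is chosen by depth, not from the data.
    """
    if levels == 0:
        return [points]
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    half = len(ordered) // 2
    return (median_split_bins(ordered[:half], levels - 1, depth + 1)
            + median_split_bins(ordered[half:], levels - 1, depth + 1))

# Toy data: 16 two-dimensional events split into 4 bins.
points = [(i * 7 % 16, i * 5 % 16) for i in range(16)]
bins = median_split_bins(points, levels=2)
sizes = [len(b) for b in bins]
size_sd = statistics.pstdev(sizes)
```

Because every split halves the events, the bin sizes differ by at most a few events regardless of where the cell populations lie, which is the trade-off against K-means described in the caption.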


Figure S5: Comparison between nearest-neighbours merging and flowBin for two tubes computationally sampled from a real data set. a. Raw data (compensated, transformed and filtered for debris), gated for CD3+ cells, and showing the true CD4 and CD8 distribution. b. The two sampled tubes, one containing CD4 and the other CD8. The CD4+ population has slightly higher average CD3 than the CD8+, but both have substantially overlapping CD3 distributions. c. Results of merging by nearest neighbours and by flowBin, including the proportion of resulting “cells” falling within each quadrant. The nearest-neighbours merging created a substantial CD4+CD8+ population not present in the original sample. Both nearest neighbours and flowBin slightly overestimate the CD4−CD8− population. flowBin is more accurate at reproducing the CD4+CD8− and CD4−CD8+ populations than nearest neighbours. d. and e. This analysis was repeated 100 times for each number of bins, each time with a separate sampling of 5,000 events. d. Representative results (those with median RMSD) for selected numbers of bins. e. All results for all numbers of bins and NN merging. The best result (lowest RMSD) was for 128 bins, whereafter increasing bin number caused RMSD to tend towards that of NN.
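One way such a comparison could be scored is sketched below in Python. The caption does not state exactly how RMSD was computed; this sketch assumes it is taken over the four quadrant proportions of the merged data against those of the original sample. The gate positions, function names and toy events are hypothetical.

```python
import math

def quadrant_proportions(events, cd4_gate, cd8_gate):
    """Proportion of (cd4, cd8) events in each quadrant, keyed like 'CD4+CD8-'."""
    counts = {"CD4-CD8-": 0, "CD4+CD8-": 0, "CD4-CD8+": 0, "CD4+CD8+": 0}
    for cd4, cd8 in events:
        key = (("CD4+" if cd4 > cd4_gate else "CD4-")
               + ("CD8+" if cd8 > cd8_gate else "CD8-"))
        counts[key] += 1
    return {k: c / len(events) for k, c in counts.items()}

def rmsd(estimated, truth):
    """Root-mean-square deviation between two sets of quadrant proportions."""
    return math.sqrt(sum((estimated[k] - truth[k]) ** 2 for k in truth) / len(truth))

# Toy example: the merged data contain a spurious CD4+CD8+ event.
true_events = [(1, 0), (1, 0), (0, 1), (0, 0)]
merged_events = [(1, 0), (1, 1), (0, 1), (0, 0)]
truth = quadrant_proportions(true_events, 0.5, 0.5)
merged = quadrant_proportions(merged_events, 0.5, 0.5)
score = rmsd(merged, truth)
```

A merging method that invents double-positive "cells" is penalised twice under this score: once for the spurious quadrant and once for the quadrant the events were taken from.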


Figure S6: nu-SVM separation of normal and abnormal cell populations in AML samples. a. Heatmap of all populations within the AML samples that were predicted to be normal. Most can readily be identified as having the properties of common blood and bone marrow cell populations: myeloid cells expressing CD16 and/or CD64, lymphoid cells (dominated by CD3-expressing T-lymphocytes/precursors), and erythroid cells not expressing any of the markers in the panel, including CD45. b. Heatmap of all populations predicted to be abnormal. In contrast to the cells predicted to be normal, many of these express CD34 and CD117, primitive markers typical of stem cells and of AML.

[Classifier schematic, panels a. and b.: training data (marker expression values for each population of each patient) and training classes (AML or healthy per population) are passed to a classifier training algorithm to give a trained classifier. Test data (all bins from one patient) are passed to the trained classifier to give predicted classes per population; a vote is then taken to give the predicted class of the patient.]


[Figure S8 schematic, panels a. and b.: training data and training classes are subsampled (sample 1, sample 2, ...); each subsample is passed to the classifier training algorithm to give trained classifier 1, trained classifier 2, and so on. Test data are passed to every trained classifier to give predicted classes; a vote across classifiers gives the final classes, and a second vote gives the patient class.]

Figure S8: Schema for a voting classifier for flowBin output incorporating balanced bagging. a. Training. This is similar to the base classifier (Fig. S7), except that multiple classifiers are trained, each on a bootstrap subsample of patients. Each bootstrap sample is set to contain equal numbers of patients from each class. b. Prediction. To predict the class of a new patient, predictions for each bin from that patient are made by each of the trained classifiers. Final per-bin predictions are taken by majority vote of those predictions. Then, the prediction for the patient is made based on a majority vote of the per-bin predictions.
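The two levels of voting, and the balanced bootstrap used in training, can be sketched in Python as follows. The sketch assumes that each bootstrap sample draws as many patients per class as the smallest class contains, and it stands in simple threshold functions for trained classifiers; both choices, and all names, are hypothetical rather than taken from the method itself.

```python
import random
from collections import Counter

def balanced_bootstrap(patients_by_class, rng):
    """Draw, with replacement, the same number of patients from every class.

    Assumption: the per-class sample size is the size of the smallest class.
    """
    n = min(len(p) for p in patients_by_class.values())
    return {cls: [rng.choice(p) for _ in range(n)]
            for cls, p in patients_by_class.items()}

def majority(labels):
    """Most common label; ties resolve to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def predict_patient(bins, classifiers):
    """Vote across classifiers for each bin, then across bins for the patient."""
    per_bin = [majority([clf(b) for clf in classifiers]) for b in bins]
    return majority(per_bin), per_bin

# Stand-ins for classifiers trained on three different bootstrap samples.
classifiers = [
    lambda b: "AML" if b[0] > 0.5 else "healthy",
    lambda b: "AML" if b[0] > 0.3 else "healthy",
    lambda b: "AML" if b[0] > 0.7 else "healthy",
]
bins = [(0.6,), (0.8,), (0.2,)]  # one toy feature per bin
patient_class, bin_classes = predict_patient(bins, classifiers)

sample = balanced_bootstrap(
    {"AML": ["a1", "a2", "a3"], "healthy": ["h1"]}, random.Random(0))
```

In a full implementation each bootstrap sample would be used to train one classifier on the bins of the selected patients; only the sampling and voting logic is shown here.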


[Figure S9 graphic: hierarchy of cell types. Nodes: All Cells; CD34+; CD34+CD61−; CD34+CD61−CD14−; CD34+CD10−CD61−CD14−; CD34+CD20−CD61−CD14−; CD34+CD20−CD10−CD61−CD14−; CD34+CD20−CD10−CD61−CD14−CD3+. Edge labels: CD34+, CD61−, CD14−, CD10−, CD20−, CD3+. Scale: -log10(P-value), 1 to 9.]

Figure S9: An example of RchyOptimyx analysis of one cluster of cell types. As 801 cell types are too many to visualise meaningfully with RchyOptimyx, we clustered the cell types and visualised each cluster in turn. In this example, the addition of CD10- or CD20- makes little difference to the P-value of the cell type CD34+CD61−CD14−. As this was a general trend and in line with reported AML biology, we chose to exclude cell types defined over these markers from further analysis.


[Figure S10 plots, panels a. to d.: proportion of all cells (0 to 1) in wt and mt samples. Plot titles, in order of appearance: CD34-CD13+ (P=0.00221); CD34-CD33+ (P=0.00225); CD34- (P>0.05); CD13+CD34−CD33+ (P=0.00138); CD34-CD2- (P=0.0265); CD34-CD2-CD4+ (P=0.000149); CD13+CD34-CD2-CD4+ (P=1.94e-06); CD34+CD61-CD14- (P=0.015); CD34+CD61-CD14-CD2+ (P=0.000114); CD34+CD61-CD2+CD4- (P=0.00548); HLA+CD34+CD33-CD64- (P=0.0235); HLA+CD34+CD4-CD64- (P=0.0119).]

Figure S10: Selected classes of cell types showing significant differences in abundance between NPM1-mt and NPM1-wt. P-values are given after Holm correction. a. Gating for the presence of myeloid lineage markers CD13 and CD33 within the CD34- compartment yields much stronger differences in abundance between NPM1-wt and NPM1-mt than CD34- alone. b. Gating for CD2- within the CD34- compartment yields a slightly better separation than CD34- alone, but gating down further to CD4- and CD13+ yields a cell type that, while present in most NPM1-mt, is absent or below 20% abundance in nearly all NPM1-wt. c. Gating for CD61- and CD14- within the CD34+ compartment leads to a cell type which is common in NPM1-wt but almost entirely absent in NPM1-mt. d. Gating for HLA-DR+ and CD64- within the CD34+ compartment leads to a cell type that occurs in a subset of NPM1-wt but is entirely absent in