Chapter 3: A single-cell atlas of adult murine haematopoiesis

3.8. Ordering cells along differentiation trajectories using STREAM

The diffusion map is a useful non-linear dimensionality reduction method for visualising continuous differentiation processes within the scRNA-seq dataset. The analysis provided information on gene and surface marker expression across the HSPC atlas and was used to identify three differentiation trajectories within the data using pseudotime ordering. However, the visualisation lacks information about cell density and cell type composition.

Recently, a trajectory inference tool called STREAM (Single-cell Trajectories Reconstruction, Exploration and Mapping) was developed to reconstruct differentiation trajectories and capture gene expression changes during differentiation using pseudotime ordering (Trapnell et al. 2014; H. Chen et al. 2018). STREAM uses a non-linear dimensionality reduction method called Modified Locally Linear Embedding (MLLE) and infers trajectories using a novel method called ElPiGraph (Lever, Krzywinski, and Altman 2017; Z. Zhang and Wang 2006). ElPiGraph differs from other methods in that it does not require drastic dimensionality reduction or pre-clustering to infer trajectories. In addition to capturing trajectories within a dataset, STREAM uses a visualisation method that includes density information throughout pseudotime. This is useful for tracking changes in cell population composition along a trajectory.

The Pinello lab used our scRNA-seq dataset to demonstrate STREAM and their interactive web tool. The online interface allows the user to manipulate the data to interrogate branching and gene expression patterns. STREAM is available at the following link:

http://stream.pinellolab.org/MNoPZ/

HSCs were selected as the start of the branching structure and the scRNA-seq dataset was visualised on a “subway plot” (Fig. 3.9A) and a “stream” plot (Fig. 3.9B). The subway plot orders cells according to their pseudotime score and distance from their assigned branch. The purpose of the subway plot is to visualise the branching structure of the data to understand the pseudotime progression. The stream plot also orders cells based on their pseudotime score but incorporates information on the density and composition of cell types along the different trajectories. The stream plot visualisation requires the user to input cell type information and is made from the subway plot using a sliding window approach. The length of the plot represents a cell’s location along pseudotime, whereas the width of the plot is proportional to the number of cells.
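The sliding window idea behind the stream plot can be illustrated with a minimal sketch. This is not STREAM's implementation; the function name, the window and step parameters, and the input format (one pseudotime value and one cell type label per cell on a single branch) are assumptions made for illustration only.

```python
from collections import Counter


def sliding_window_composition(pseudotime, cell_types, window=0.2, step=0.1):
    """Count cells of each type in overlapping windows along pseudotime.

    Hypothetical helper, not part of STREAM. Returns a list of
    (window_centre, Counter) pairs. The total of each Counter corresponds
    to the width of the branch at that position, and the per-type counts
    correspond to the coloured bands within it.
    """
    if len(pseudotime) != len(cell_types):
        raise ValueError("pseudotime and cell_types must have the same length")
    if not pseudotime:
        return []

    t_min, t_max = min(pseudotime), max(pseudotime)
    # Number of window centres from t_min up to (at most) t_max.
    n_windows = int((t_max - t_min) / step + 1e-9) + 1

    profile = []
    for i in range(n_windows):
        centre = t_min + i * step
        lower, upper = centre - window / 2, centre + window / 2
        counts = Counter(
            label
            for t, label in zip(pseudotime, cell_types)
            if lower <= t <= upper
        )
        profile.append((centre, counts))
    return profile
```

Calling the function with the pseudotime scores and cell type labels of the cells on one branch gives, for each window, the values that would be drawn as the width and colour composition of that branch.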

The subway plot showed that STREAM analysis identified three lineages in the data: erythroid, myeloid, and lymphoid. The analysis suggests the lymphoid cells entered their trajectory before the myeloid and erythroid cells. The stream plot showed that the lymphoid branch was composed mostly of LMPPs, the erythroid branch of MEPs, and the myeloid branch of CMPs and GMPs. The lymphoid trajectory stopped before the other lineages because there were fewer lymphoid cells in the analysis, as more mature lymphoid cells were excluded from the sorting gates. The expression of the genes Procr, Klf1, Ctsg and Dntt was visualised on stream plots to represent the HSCs and the erythroid, myeloid, and lymphoid lineages, respectively (Fig. 3.9C). The branching and gene expression patterns were consistent with the diffusion map visualisation. Dntt expression was observed in the lymphoid branch as well as in cells heading towards the erythroid and myeloid lineages. The stream plot (Fig. 3.9B) showed that LMPPs are present in the trajectory at this stage (S1-S3 on the subway plot), accounting for the observed Dntt expression pattern.

STREAM analysis also detects genes important in defining branching points in the data (Fig. 3.9D). The user can select the branches to investigate using the annotations marked on the subway plot, and STREAM identifies genes differentially expressed between the diverging branches. Cd63 and Hlf were highly expressed on the HSC branch compared to cells after the first bifurcation event. Cd63 encodes an endosome-associated protein that has previously been identified as a marker of HSCs in cultured human CD34+ HSCs, and Hlf has recently been shown to be a key regulator of HSC quiescence (Komorowska et al. 2017; Beckmann et al. 2007). Conversely, Il12a and Cst7 were more highly expressed after the bifurcation event than in HSCs. Il12a encodes a subunit of the IL-12 cytokine, a main activator of natural killer cells, and Cst7 is involved in normal eosinophil function (Seaman 2000; Halfon et al. 1998). Ltb and Uhrf1 were identified as differentially expressed between the lymphoid and erythroid/myeloid lineages, respectively. Ltb is involved in the development of normal lymphoid tissue, whereas Uhrf1 is an epigenetic regulator required for establishing DNA methylation patterns of erythroid genes (Koni et al. 1997; J. Zhao et al. 2017). STREAM also found genes marking the second bifurcation event. Gimap6, which encodes a protein required for T-cell maintenance, was more highly expressed in cells before the differentiation point (Pascall et al. 2018). Conversely, Sdsl and Rab44, which are associated with the erythroid and myeloid lineages, respectively, were more highly expressed in their respective lineages (Poczobutt et al. 2016; Khoramian Tusi et al. 2018). Finally, when the erythroid and myeloid lineages were directly compared, Mfsd2b and Hk3 were differentially expressed between the two trajectories. Mfsd2b is involved in red cell morphology, whereas Hk3 is involved in neutrophil differentiation, supporting the notion that these genes may mark a branching point between the lineages (Vu et al. 2017; Federzoni et al. 2012).
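As a rough illustration of comparing two diverging branches, the sketch below ranks genes by the log2 fold change of their mean expression between the branches. This is a deliberately simple stand-in and is not the statistic used by STREAM; the function names, the pseudocount, the threshold and the input format (a dictionary mapping each gene to its expression values in the cells of one branch) are all assumptions.

```python
import math


def log2_fold_change(expr_a, expr_b, pseudocount=1.0):
    """Log2 ratio of mean expression in branch A over branch B.

    The pseudocount (an assumed value) avoids division by zero for
    genes that are not expressed on one of the branches.
    """
    mean_a = sum(expr_a) / len(expr_a)
    mean_b = sum(expr_b) / len(expr_b)
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


def diverging_genes(branch_a, branch_b, min_abs_lfc=1.0):
    """Return (gene, log2 fold change) pairs that differ between branches.

    Hypothetical helper: branch_a and branch_b map gene names to lists of
    expression values for the cells assigned to each branch. Genes are
    sorted by decreasing absolute fold change, then by name.
    """
    shared = branch_a.keys() & branch_b.keys()
    scores = {g: log2_fold_change(branch_a[g], branch_b[g]) for g in shared}
    hits = [(g, s) for g, s in scores.items() if abs(s) >= min_abs_lfc]
    return sorted(hits, key=lambda item: (-abs(item[1]), item[0]))
```

A positive score indicates higher expression on the first branch and a negative score higher expression on the second. In practice a rank-based test with multiple-testing correction would be preferable to a fold change threshold alone.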

STREAM analysis was also used to look for transition genes, defined as genes for which the expression correlated with the pseudotime ordering on a given branch (Fig. 3.9E) (H. Chen et al. 2018). These genes were selected by the STREAM interface based on their differential expression across the stream plot. This analysis can also give insight into cell-fate decision making and has the potential to discover novel genes. Tgm2 had increasing expression towards the tip of the HSC branch, whereas Tespa1 showed increased expression moving away from HSCs. Tgm2 is an extracellular matrix protein previously suggested to be a regulator of LT-HSCs (Forsberg et al. 2005). Tespa1, on the other hand, is a signalling molecule that plays a wide range of roles in more differentiated cells, including T-cells and mast cells (Liang et al. 2017; D. Wang et al. 2012). Igsf6, which is involved in myeloid differentiation, and Ctla2a were inversely correlated with the myeloid branch (Stein and Baldwin 2013). The expression of Smim1, which encodes an erythroid transmembrane protein, marked the tip of the erythroid trajectory, while Coro1a expression was higher in the less differentiated cells (Storry et al. 2013). Finally, Tyms, a gene involved in DNA replication and repair, was identified as a transition gene moving away from the lymphoid trajectory (Ozer et al. 2015). Ltb (Fig. 3.9D), involved in normal lymphoid organogenesis, was the transition gene identified towards the lymphoid trajectory (Koni et al. 1997). Except for Ltb, these genes were not identified in previous analyses, such as hierarchical clustering, demonstrating that STREAM can be used to find novel genes of interest.
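The definition of a transition gene as one whose expression correlates with pseudotime on a branch can be sketched as follows. Spearman rank correlation is used here as an assumed choice of correlation measure, and the function names are hypothetical; this is not STREAM's own code.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def pseudotime_correlation(pseudotime, expression):
    """Spearman correlation between pseudotime and one gene's expression."""
    return pearson(average_ranks(pseudotime), average_ranks(expression))
```

A value close to 1 indicates expression that increases along the branch, a value close to -1 indicates expression that decreases, and values near 0 indicate no monotonic relationship with pseudotime.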

Figure 3.9. STREAM analysis reveals information about pseudotime ordering in the HSPC differentiation landscape. (a) Subway plot of the scRNA-seq data visualised based on its pseudotime ordering. Cells are coloured based on cell type. The trajectories are ordered and coloured for the user to easily manipulate the data. S2-S1 (blue) – HSC to first branching point; S1-S0 (green) – lymphoid trajectory; S1-S3 (orange) – branch into myeloid and erythroid trajectories; S3-S4 (purple) – myeloid trajectory; S3-S5 (red) – erythroid trajectory. (b) Stream plot of the scRNA-seq data visualised based on its pseudotime ordering. The width of each branch is proportional to the total number of cells. Branches are coloured based on cell type composition. HSCs – turquoise; CMP – red; MEP – blue; MPP – yellow; GMP – green; LMPP – brown. (c) Stream plots of all cells coloured based on the expression of selected genes. The genes chosen were previously used in Figure 3.4 to mark branches within the diffusion map. (d) Stream plots of genes identified to be differentially expressed between erythroid and myeloid branches. (e) Stream plots of genes identified to be correlated with the pseudotime ordering on the HSC branch. Genes are ordered based on which branch they are associated with (moving from HSCs to erythroid).