
3.1.1 Spatial gene expression data

Experimental advances have enabled assaying RNA and protein abundances of single cells in spatial contexts, thereby allowing the study of single cell variation in tissues. Already, these technologies have delivered new insights into tissue systems and the sources of transcriptional variation (Bodenmiller, 2016; Battich et al., 2013), with a potential use as biomarkers for human health (Bodenmiller, 2016).

Different technologies allow for generating spatially resolved expression profiles. Imaging Mass Cytometry (IMC) (Giesen et al., 2014; Chang et al., 2017) and Multiplexed Ion Beam Imaging (MIBI) (Angelo et al., 2014) rely on protein labelling with antibodies coupled to metal isotopes of specific masses, followed by high-resolution tissue ablation and ionisation. IMC currently allows for the profiling of up to 37 targeted proteins with subcellular resolution. Other methods such as MxIF and CycIF use immunofluorescence for protein quantification of dozens of markers in single cells (Gerdes et al., 2013; Lin et al., 2015). Increasingly, there also exist fluorescence-based assays to measure single cell RNA levels in spatial context (Strell et al., 2018). MERFISH and seqFISH use a combinatorial approach of fluorescence-labelled small RNA probes to identify and localise single RNA molecules (Shah et al., 2016; Chen et al., 2015; Gerdes et al., 2013; Lin et al., 2015), which allows for a larger number of readouts (currently between 130 and 250). Even higher-dimensional expression profiles can be obtained from spatial expression profiling techniques such as Spatial Transcriptomics (Ståhl et al., 2016). However, these techniques currently do not offer single cell resolution and are therefore not sufficient to study cell-to-cell variation.

3.1.2 Modelling the spatial context

The availability of spatially resolved expression profiles from a population of cells provides new opportunities to disentangle the sources of gene expression variation. Spatial context can for example be utilised to distinguish intrinsic sources of variation due to differences in cell types or states (Buettner et al., 2015), e.g. cell cycle stage (Scialdone et al., 2015), from sources of variation which relate to the spatial structure of the tissue, such as microenvironmental effects linked to the cell position (Fukumura, 2005), access to glucose or other metabolites (Meugnier et al., 2007; Lyssiotis and Kimmelman, 2017), or cell-cell interactions. To perform their function, proximal cells may interact via direct molecular signals (Sieck, 2014), adhesion proteins (Franke, 2009), or other types of physical contacts (Varol et al., 2015). In addition, certain cell types such as immune cells may migrate to specific locations in a tissue to perform their function in interaction with local cells (Moreau et al., 2018). In this thesis, cell-cell interactions is used as a general term to designate any of these phenomena.

More specific biological interpretations are discussed in Section 3.5.3 and Section 3.6.3.

While intrinsic sources of variation have been extensively studied, the cell-cell interaction component is arguably less well understood, yet it is one of the most important, as it promises to reveal how genes are expressed in cells that participate in different tissue-level functions. Although spatial omics profiles can already be generated experimentally with high throughput, the computational strategies required to interpret the resulting data are only beginning to emerge. Only a few methods quantify the impact of spatial features on the variance of individual genes, and even fewer specifically measure the effect of cell-cell interactions.

On the one hand, there exist methods to link the spatial position of cells to their expression profile. For example, clustering methods infer groups of cells from the same spatial location, solely based on their expression profiles (Achim et al., 2015). Other methods implement statistical tests of differential expression in space, which provide an overall assessment of the effect of the spatial topology on gene expression (Svensson et al., 2018a). However, neither of these approaches allows for directly quantifying cell-cell interactions.

On the other hand, there exist methods which study cell-cell interactions, but only qualitatively or by relying on discretisation steps which limit their interpretability or applicability. For example, some methods study tissue organisation by looking at the spatial co-occurrence of discrete cell types in predefined cellular neighbourhoods (Schapiro et al., 2017; Schulz et al., 2018). These approaches provide qualitative insights into interactions between cell types, but they do not allow for quantifying their impact on individual genes. In contrast, some regression-based models assess interaction effects on individual gene expression levels, based on predefined features of the cell neighbourhood (Goltsev et al., 2018; Battich et al., 2015). However, the prior engineering of microenvironmental features relies on discretisation steps which are arbitrary and not always directly interpretable (see Section 3.2.7).
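To illustrate why such discretisation steps can be arbitrary, the following minimal sketch computes one possible engineered neighbourhood feature: the number of neighbours of each discrete cell type within a fixed radius. It is not taken from any of the cited methods. The function name `neighbour_type_counts`, the toy coordinates, the cell type labels and both radii are assumptions made purely for illustration.

```python
import numpy as np

def neighbour_type_counts(positions, cell_types, n_types, radius):
    """Count, for each cell, the neighbours of each discrete type within `radius`.

    Hypothetical helper for illustration only, not part of any cited method.
    """
    # Pairwise Euclidean distances between all cells.
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # A neighbour lies within the radius and is not the cell itself.
    is_neighbour = (dist <= radius) & ~np.eye(len(positions), dtype=bool)
    # One-hot encode the cell types, then sum the encodings over neighbours.
    one_hot = np.eye(n_types, dtype=int)[cell_types]
    return is_neighbour.astype(int) @ one_hot

# Toy data (assumed): four cells in 2D with two discrete cell types.
positions = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
cell_types = np.array([0, 1, 1, 0])

# The same tissue yields different features under two arbitrary radii.
counts_small = neighbour_type_counts(positions, cell_types, n_types=2, radius=1.5)
counts_large = neighbour_type_counts(positions, cell_types, n_types=2, radius=2.5)
```

In this toy example the first cell has one neighbour of type 1 at the smaller radius and two at the larger one, so any regression on these counts depends on both the radius and the cell type assignment chosen beforehand.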

3.1.3 SVCA: Spatial Variance Component Analysis

Here, we present Spatial Variance Component Analysis (SVCA), a computational framework to model spatial sources of variation of individual genes. SVCA allows for decomposing gene expression variation into intrinsic effects, environmental effects and, most importantly, an explicit cell-cell interaction component. In contrast to previous modelling approaches, the model uses the spatial coordinates directly and the continuous expression profiles of individual cells as inputs, thereby avoiding the need to define discrete cell types and microenvironmental variables.
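The idea of a variance decomposition over continuous inputs can be sketched as follows. This is a minimal illustration under stated assumptions, not the SVCA implementation: the covariance forms (a linear kernel on the expression of other genes, a squared exponential kernel on cell positions, and a neighbour-weighted expression kernel), the length scale and the component weights are all hypothetical choices, and the weights are fixed by hand rather than fitted to data.

```python
import numpy as np

def gower_factor(K):
    """Average sample variance implied by covariance K: trace(P K P) / (n - 1)."""
    n = K.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    return np.trace(P @ K @ P) / (n - 1)

def variance_fractions(kernels, weights):
    """Fraction of total variance attributed to each weighted covariance term."""
    contributions = {name: weights[name] * gower_factor(K)
                     for name, K in kernels.items()}
    total = sum(contributions.values())
    return {name: c / total for name, c in contributions.items()}

# Toy inputs (assumed): random positions and expression of the other genes.
rng = np.random.default_rng(0)
n_cells, n_genes = 50, 5
positions = rng.uniform(0, 10, size=(n_cells, 2))
expression = rng.normal(size=(n_cells, n_genes))

# Squared exponential kernel on positions (length scale is an assumption).
sq_dist = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(axis=-1)
length_scale = 2.0
spatial = np.exp(-sq_dist / (2 * length_scale ** 2))

# Neighbour weights: spatial proximity with self-influence removed.
neighbour_weights = spatial.copy()
np.fill_diagonal(neighbour_weights, 0.0)

linear = expression @ expression.T
kernels = {
    "intrinsic": linear,                       # cell state via other genes
    "environmental": spatial,                  # smooth effect of position
    "interaction": neighbour_weights @ linear @ neighbour_weights.T,
    "noise": np.eye(n_cells),                  # residual variation
}
# Hypothetical component weights; in a real analysis these would be estimated.
weights = {"intrinsic": 1.0, "environmental": 0.5, "interaction": 0.05, "noise": 0.3}

fractions = variance_fractions(kernels, weights)
```

No cell type labels or neighbourhood thresholds appear anywhere in this sketch: every term is built from the raw coordinates and the continuous expression values, which is the property emphasised above.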

We validate our model using simulated data, by showing that SVCA yields more accurate estimates of cell-cell interactions than alternative methods. We also illustrate the flexibility of SVCA by showing that it is more robust to confounding factors such as cell mis-segmentation.

We then illustrate SVCA using two real datasets from different technologies and biological domains: IMC proteomic profiles from human breast cancer tissue (Schapiro et al., 2017) and spatial single-cell RNA profiles from the mouse hippocampus generated using seqFISH (Shah et al., 2017). Across these applications, we find that our model, and in particular the cell-cell interaction component, explains a major share of expression variability and facilitates the identification of biologically relevant genes and gene families participating in cell-cell interactions, such as glutamate receptors or cell junction genes in the brain.
