Protein Expression Models - Literature review

Chapter 2 Literature review

2.3 Protein Expression Models

Building accurate models of protein expression requires not only the chemical properties of the molecules involved, but also their spatial distributions. This is especially important for proteins because the subcellular location of a protein is so critical to its function that the same protein can have different functions at different locations [174]. In addition, for some proteins such as β-catenin [175] and NF-κB [176], the extent of localisation in the nucleus can be used as a biomarker to predict cancer patient prognosis. A number of studies have been concerned with simulating protein expression within a single cell. Most of these models consider the dynamic behaviour of interacting molecules over time. Simplified models analyse protein-protein interaction networks using homogeneous methods in which chemical species are assumed to be well mixed. Such methods include systems of ordinary differential equations (ODEs) and the Gillespie method [177, 178]. These methods can be extended to compartmental models, in which one defines homogeneous computational compartments that determine which molecules can interact with each other. The high efficiency of these methods has made them very popular for modelling systems in which the copy number of each species is large and compartments are expected to be reasonably well mixed.
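To make the well-mixed assumption concrete, the following is a minimal sketch of the Gillespie method for a hypothetical one-species birth-death model (constant production, first-order degradation), compared against the steady state of the corresponding ODE. The model, the rate constants and the function names are illustrative assumptions and are not taken from the cited studies.

```python
import random


def gillespie_birth_death(k_prod, k_deg, n0, t_end, rng):
    """Simulate 0 -> P (rate k_prod) and P -> 0 (rate k_deg * n) until t_end.

    Returns the copy number of P at time t_end for one stochastic run.
    """
    t, n = 0.0, n0
    while True:
        a_prod = k_prod          # propensity of production
        a_deg = k_deg * n        # propensity of degradation
        a_total = a_prod + a_deg
        if a_total == 0.0:
            return n             # no reaction can fire any more
        # Waiting time to the next reaction is exponentially distributed.
        t += rng.expovariate(a_total)
        if t > t_end:
            return n
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1


def ode_steady_state(k_prod, k_deg):
    """Steady state of dn/dt = k_prod - k_deg * n."""
    return k_prod / k_deg


# Hypothetical rate constants chosen only for illustration.
rng = random.Random(0)
samples = [gillespie_birth_death(10.0, 0.1, 0, 100.0, rng) for _ in range(200)]
mean_copy_number = sum(samples) / len(samples)
print(mean_copy_number, ode_steady_state(10.0, 0.1))
```

With these illustrative rates the copy number is large, so the average over stochastic runs lies close to the ODE steady state; this is the regime in which the homogeneous methods described above are expected to work well.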

However, for some proteins the number of molecules found in a cell can be very low and vary greatly between cells [179]. In addition, the heterogeneous nature of cells is critical to their function [180]. As a result, significant efforts have been made to develop spatial models for these biochemical systems. For example, a simple model of an idealised cell demonstrated how the eccentricity of the cell affects plasma membrane signalling [181]. The Virtual Cell project [182] enables the formulation of both compartmental and spatial partial differential equation models, the latter with either idealised or experimentally derived geometries in one, two or three dimensions. Similarly, Monte Carlo Cell (MCell) and Smoldyn [183, 184, 185] use agent-based methods which simulate each molecule individually and evaluate its diffusion and probability of interaction on a per-particle basis at each time step. Although extremely computationally expensive, these methods have very high spatial resolution and are very successful at modelling interactions of small numbers of heterogeneously distributed molecules. However, as these methods are stochastic, they require multiple random initialisations of the simulation in order to determine the expected behaviour of the system, further adding to the computational cost of these simulations.
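The per-particle cost and the need for repeated runs can be illustrated with a minimal sketch of Brownian diffusion in two dimensions. It covers only the diffusion step, with no reactions or cell geometry, and it does not use the MCell or Smoldyn interfaces; the function name and all parameter values are assumptions made for illustration.

```python
import math
import random


def brownian_msd(n_particles, diff_coeff, dt, n_steps, seed):
    """Return the mean squared displacement of freely diffusing 2D particles.

    Every particle is advanced individually at every time step, which is
    what makes particle-based simulation expensive.
    """
    rng = random.Random(seed)
    # Standard deviation of one displacement step along each axis.
    sigma = math.sqrt(2.0 * diff_coeff * dt)
    total = 0.0
    for _ in range(n_particles):
        x = y = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, sigma)
            y += rng.gauss(0.0, sigma)
        total += x * x + y * y
    return total / n_particles


# One replicate per random seed; hypothetical parameter values.
replicates = [brownian_msd(20, 1.0, 0.01, 100, seed) for seed in range(50)]
mean_msd = sum(replicates) / len(replicates)
expected_msd = 4.0 * 1.0 * (0.01 * 100)  # 4 * D * t for free 2D diffusion
print(min(replicates), max(replicates), mean_msd, expected_msd)
```

Individual replicates with only 20 particles scatter noticeably around the theoretical value, whereas the average over all replicates lies much closer to it, which is why the expected behaviour has to be estimated from many independent runs.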

Nevertheless, the majority of cellular modelling continues to assume a homogeneous spread of molecules, despite the development of these spatially resolved simulation tools. This is due in part to the limited number of realistic geometries available for simulation, which are often either hand segmented or manually fabricated, both of which can be very time-consuming. In addition, there is still a need for efficient ways to study cellular response using targeted geometries and organisations, as these simulation tools currently require a large amount of training to use properly. Furthermore, while these methods can be useful for studying the dynamics of protein interaction, they do little to simulate the corresponding microscopy image data, which is necessary for validation of image analysis methods such as cell-compartment classification methods [186, 161, 162, 167, 163]. To address this issue, Zhao and Murphy [164] presented a machine learning method to generate realistic cells with labelled nuclei, membranes and a protein expressed in a cell organelle. Parameters for these models were learned from real images of cells in culture. However, these generative models are restricted to individual cells in culture and only one protein of interest at a time. Hence, this method struggles to capture the dynamic interplay between cells.

Chapter Summary

In this chapter, we have reviewed the existing literature on multiplex imaging. The review covered quantitative data mining methods developed for analysis of the TIS imaging data. These include pixel-level analyses both with and without thresholding the intensity values. Due to the general nature of the analysis framework presented in the next chapter, we also briefly reviewed studies that have been performed with other multiplex techniques such as MALDI, Raman, multi-spectral imaging, MxIF and imaging mass cytometry. The chapter also included a review of frameworks for the generation of synthetic image data. Currently, the majority of these methods focus on the generation of homogeneous cell populations in culture. We have also briefly reviewed current methods for simulating protein expression.

Chapter 3

DiSWOP: A Novel Measure for Cell-Level Protein Network Analysis in Localised Proteomics Image Data

In this chapter, we propose a framework for analysing multiplex image data. As discussed in Chapter 2, the standard way of analysing image data obtained using TIS is to threshold it and then cluster CMPs into CMP motifs. While the lead proteins identified using this approach have been shown to be of functional significance, thresholding the data discards a lot of potentially important information. On the other hand, if one considers the raw protein expression profiles without thresholding, the data first needs to be normalised in a robust manner. This is due to inter-sample and inter-protein intensity variations that could result from small differences in sample preparation, imaging and antibody concentrations. This could be a very difficult issue to address due to the lack of controls and ground truth data. Here we instead focus on obtaining the protein interaction networks from the protein-protein dependence profile (PPDP) of the cells rather than from the raw protein expression profiles. In Section 3.3 we present several measures that could be used to calculate the PPDP and demonstrate why some of them, as well as the raw expression profiles, fail.
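The information loss caused by thresholding can be illustrated with a minimal sketch in which each pixel's expression profile is binarised into a binary code. The intensities, the common threshold of 0.5 and the function name are hypothetical; the actual threshold selection used for TIS data is not described in this section.

```python
def binarise_profile(intensities, thresholds):
    """Turn one expression profile into a binary code, one bit per protein."""
    return tuple(int(v >= t) for v, t in zip(intensities, thresholds))


# Hypothetical normalised intensities of three proteins at three pixels.
pixels = [
    (0.90, 0.10, 0.55),
    (0.51, 0.49, 0.99),
    (0.20, 0.80, 0.05),
]
thresholds = (0.5, 0.5, 0.5)  # assumed per-protein thresholds

cmps = [binarise_profile(p, thresholds) for p in pixels]
print(cmps)  # [(1, 0, 1), (1, 0, 1), (0, 1, 0)]
```

The first two pixels have clearly different raw profiles but receive the same binary code, so any analysis performed on the codes alone can no longer distinguish them.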

Figure 3.1: Overview of the proposed framework.

Furthermore, we perform the analysis at cell level rather than pixel level. This minimises noise from unspecific binding of the protein antibodies to the extracellular matrix, stroma and lumen. In addition, the pixel size is not of any biological relevance. Hence, clustering of pixels gives large amounts of noisy data of little biological meaning. Our approach phenotypes the cells according to their PPDP. This enables us to gain a better understanding of the heterogeneity within the cancer cell population. We can also compare cell phenotypes present in healthy tissue and different cancers.
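As a minimal sketch of moving from pixel level to cell level, the code below averages one protein's intensity over the pixels of each segmented cell and ignores background pixels. It assumes that a cell segmentation is already available as one integer label per pixel, with 0 marking background; the use of the mean, the function name and the example values are illustrative assumptions and not the PPDP computation itself.

```python
from collections import defaultdict


def mean_expression_per_cell(labels, intensities, background=0):
    """Average pixel intensities within each labelled cell.

    labels:      one integer cell label per pixel (background = 0 by assumption)
    intensities: one intensity value per pixel for a single protein
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, value in zip(labels, intensities):
        if label == background:
            continue  # skip pixels outside any cell, e.g. stroma or lumen
        sums[label] += value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


# Hypothetical flattened image: two cells plus bright background pixels.
labels = [0, 1, 1, 2, 2, 2, 0]
intensities = [9.0, 2.0, 4.0, 1.0, 1.0, 4.0, 7.0]

per_cell = mean_expression_per_cell(labels, intensities)
print(per_cell)  # {1: 3.0, 2: 2.0}
```

The bright background pixels, which stand in for unspecific binding outside the cells, do not contribute to any cell's value, and the result has one entry per cell rather than one per pixel.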

Lastly, two new measures are proposed to enable us to infer small-scale protein networks. These new measures highlight protein pairs whose interactions differ markedly between cancer and normal tissue. An overview of the approach is presented in Figure 3.1.

In document Modelling and analysis of the tumour microenvironment of colorectal cancer (Page 40-44)