Chapter 4 Experimental data analysis
4.5 Population levels
In this Section we introduce some prior information, drawn from the literature and from the knowledge of our collaborators, about the total molecular population of Nrf2 reporter; this will allow us to constrain the parameter space of the proportionality constants $\kappa_N^{(i)}$ and $\kappa_C^{(i)}$.
Biggin (2011) presents a survey of several reliable literature estimates of transcription factor (TF) abundance in humans and animals, and indicates that most animal TFs are expressed at 10,000-300,000 molecules per nucleus. More specifically for Nrf2 protein, our biological collaborators believe the population of our TF to be between 5,000 and 50,000 molecules (personal communication with Prof. Paul Thornalley). In fact, Nrf2 is a low-copy regulatory TF, which is present in fewer molecules than high-abundance housekeeping proteins.
Xue et al. (2015b) estimate that, on the same data we analyse, the reporter Nrf2 only induces a minor increase in the total Nrf2 pool of 4-7%. This increase refers to the entire population of cells considered; however, the mean overall transfection was only 40% (personal communication with two of the authors, Prof. Paul Thornalley and Dr Hiroshi Momiji). By overall transfection, we refer to the transient transfection process, described in Section 2.2, in which we insert into cells an engineered version of the DNA that is able to transcribe reporter mRNA, which is then translated into the fluorescent reporter protein that we observe. This process is not always successful and, sometimes, we fail to observe the reporter Nrf2. In particular, in our data, the overall transfection is successful in about 40% of the cells. Therefore, if we only consider the fraction of cells where transfection was successful, which are the ones we analyse in this study, the Nrf2 reporter induces an increase of 10-17.5% of the total molecular population (that is, 4-7% divided by 0.40).
We can use these pieces of information to gain an understanding of the range of possible values for the total number of molecules of Nrf2 reporter in a cell, i.e. $X_t^{N(i)} + X_t^{C(i)}$ at time $t$ for the $i$-th cell. While there is more uncertainty regarding a plausible upper bound for this interval, we can more easily formulate a conservative lower bound. First, we consider 5,000 as the lower bound for the original Nrf2 population in an entire cell, which is the minimum of the two lower bounds described above. This corresponds to 200-350 molecules of Nrf2 reporter in a cell, assuming the 4-7% proportion, and to 500-875, for the more realistic 10-17.5% estimate.
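The arithmetic behind these ranges can be reproduced with a short sketch. The figures are those quoted in the text; the variable and function names are illustrative only and do not come from the original analysis.

```python
# Back-of-envelope reproduction of the reporter population bounds quoted above.
total_nrf2_lower = 5_000        # conservative lower bound on the original Nrf2 population per cell
pool_increase = (0.04, 0.07)    # reporter-induced increase, measured over all cells
transfection_rate = 0.40        # fraction of cells in which transfection succeeded

# Restricting to transfected cells rescales the increase: 4-7% / 0.40 = 10-17.5%.
increase_transfected = tuple(p / transfection_rate for p in pool_increase)


def reporter_range(total, fractions):
    """Number of reporter molecules implied by each fractional increase."""
    return tuple(round(total * f) for f in fractions)


all_cells = reporter_range(total_nrf2_lower, pool_increase)
transfected_only = reporter_range(total_nrf2_lower, increase_transfected)

print(all_cells)         # (200, 350)
print(transfected_only)  # (500, 875)
```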
Furthermore, the light intensities in the available cells, as visible in Figure 2.3, are very homogeneous and one cannot distinguish single molecules by eye; this indicates that each cell contains many molecules of reporter Nrf2, probably of the order of hundreds, which together produce a smooth light intensity when stimulated by a laser.
Say we assume a general lower bound, which we call $\min\tilde{X}$, and require the cellular population of Nrf2 reporter to comprise at least $\min\tilde{X}$ proteins in each cell that we analyse and in which the transfection process was successful. This translates into having at least $\min\tilde{X}$ molecules, on average over the observational time, for each latent process $X_t^{(i)} = (X_t^{N(i)}, X_t^{C(i)})^T$, that is, $E_t(X_t^{N(i)} + X_t^{C(i)}) > \min\tilde{X}$.
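We can re-formulate this constraint in terms of the observed processes, $Y_t^{(i)}$, by inverting the measurement equation (3.21) and exploiting the fact that the error has zero mean:

$$E_t\left(X_t^{N(i)} + X_t^{C(i)}\right) \simeq E_t\left(\frac{Y_t^{N(i)}}{\kappa_N^{(i)}} + \frac{Y_t^{C(i)}}{\kappa_C^{(i)}}\right) > \min\tilde{X}, \tag{4.7}$$

where $\kappa_C^{(i)}$ is obtained as $\frac{\kappa_N^{(i)}}{c^{(i)}} = \frac{\kappa^{(i)}}{\Omega_C^{(i)}} = \kappa_C^{(i)}$.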
To implement this constraint, we simply restrict the parameters $\kappa_N^{(i)}$ and $c^{(i)}$ to values that respect (4.7).
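As a minimal sketch of how a check of constraint (4.7) might look in practice, the function below evaluates the time average of the rescaled observations for one cell. The function name, its arguments and the toy numbers are hypothetical; the two proportionality constants are passed in directly, and how such a check would be wired into the sampler (for example, by rejecting proposals that violate it) is an assumption rather than something stated here.

```python
import numpy as np


def satisfies_population_constraint(y_nuc, y_cyt, kappa_n, kappa_c, min_x):
    """Return True if the time-averaged implied population exceeds min_x.

    Implements the right-hand side of (4.7): the mean over time of
    Y_N / kappa_N + Y_C / kappa_C, compared with the lower bound min_x.
    """
    y_nuc = np.asarray(y_nuc, dtype=float)
    y_cyt = np.asarray(y_cyt, dtype=float)
    if y_nuc.shape != y_cyt.shape:
        raise ValueError("nuclear and cytoplasmic series must have equal length")
    if kappa_n <= 0 or kappa_c <= 0:
        raise ValueError("proportionality constants must be positive")
    implied_total = y_nuc / kappa_n + y_cyt / kappa_c
    return bool(implied_total.mean() > min_x)


# Toy intensities (not real data): the implied totals are 500, 560 and 620,
# so their time average is 560 molecules.
y_nuc = [50.0, 60.0, 70.0]
y_cyt = [100.0, 110.0, 120.0]
ok_500 = satisfies_population_constraint(y_nuc, y_cyt, 0.5, 0.25, min_x=500)
ok_1000 = satisfies_population_constraint(y_nuc, y_cyt, 0.5, 0.25, min_x=1000)
```

In this toy case the candidate constants pass a bound of 500 but fail a bound of 1,000, which illustrates how a larger lower bound shrinks the admissible region for the proportionality constants.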
We consider some plausible values for $\min\tilde{X}$: 100, 200, 500 and 1,000. We compare how well the parameters inferred under each constraint are able to mimic the observed data; in particular, we consider the autocorrelation function (ACF) as a proxy for the oscillatory behaviour.
Study of the constraint
We use the methodology described in Chapter 3 to infer the posterior distributions of the parameters from the experimental data, under both conditions. We repeat the full analysis on both our experimental data sets four times, each time using a different constraint in equation (4.7): 100, 200, 500 and 1,000. In this way, we obtain, for every experimental condition, four complete posterior densities for each parameter. All four analyses produce similar results with respect to the difference between the two conditions.
For each constraint and condition, we select 100 parameter values from the
MCMC posterior chains of every cell, excluding burn-in; values are chosen to be
equally spaced along the chains in order to diminish their dependence and obtain almost independent draws from their posterior distributions.
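The selection of equally spaced draws can be sketched as follows. This is an illustration only: `thin_chain` and its arguments are hypothetical names, and the chain is assumed to be stored as an array with one row per MCMC iteration.

```python
import numpy as np


def thin_chain(chain, burn_in, n_keep=100):
    """Pick n_keep equally spaced draws from a chain after discarding burn-in."""
    chain = np.asarray(chain)
    post = chain[burn_in:]
    if n_keep < 2:
        raise ValueError("n_keep must be at least 2")
    if len(post) < n_keep:
        raise ValueError("not enough post-burn-in draws")
    # Integer arithmetic gives indices running from 0 to len(post) - 1 inclusive.
    idx = np.arange(n_keep) * (len(post) - 1) // (n_keep - 1)
    return post[idx]


# Toy chain whose values equal their iteration index, so the selection is easy to inspect.
toy_chain = np.arange(10_000)
selected = thin_chain(toy_chain, burn_in=1_000, n_keep=100)
```

Spacing the retained draws as far apart as the chain allows reduces their serial dependence, although how close they come to independent draws depends on the mixing of the chain.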
From each of the 100 selected parameter vectors, we simulate, via the DA and measurement equation (3.21) as in Section 3.7, a process for $Y^{(i)}$. Therefore, for every constraint, we obtain 100 simulated processes per cell, for a total of 3,500 processes for the basal condition and 3,600 for the stimulated one.
In every cell, we compare the estimated autocorrelation function (ACF) of each simulated process with the ACF of the original data; in particular, we compute the sum, over nucleus and cytoplasm, of the absolute differences between the ACFs of the original and simulated data at lags from 1 to 60 minutes. Therefore, for each cell and simulation, we obtain a number estimating how well the simulated data mimic the oscillatory pattern of the experimental data. We then average these quantities over the 100 simulations and over the cells, and obtain, for every constraint, one value for the basal and one for the stimulated condition, indicating how closely, overall, the simulations emulate the ACFs of the observed data.
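The discrepancy measure described above can be sketched as below. The names `acf` and `acf_discrepancy` are illustrative, the standard biased sample autocorrelation is assumed as the estimator, and `max_lag` is expressed in sampling steps: the series are assumed to be regularly sampled, and the number of steps covering 60 minutes depends on the sampling interval, which is not restated in this Section.

```python
import numpy as np


def acf(x, max_lag):
    """Sample autocorrelation at lags 1..max_lag (biased estimator)."""
    x = np.asarray(x, dtype=float)
    if not 1 <= max_lag < len(x):
        raise ValueError("max_lag must be between 1 and len(x) - 1")
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])


def acf_discrepancy(obs_nuc, obs_cyt, sim_nuc, sim_cyt, max_lag):
    """Sum of absolute ACF differences, added over nucleus and cytoplasm."""
    d_nuc = np.abs(acf(obs_nuc, max_lag) - acf(sim_nuc, max_lag)).sum()
    d_cyt = np.abs(acf(obs_cyt, max_lag) - acf(sim_cyt, max_lag)).sum()
    return float(d_nuc + d_cyt)


# Toy series (not real data): a simulation identical to the data scores zero.
t = np.arange(120)
obs_n, obs_c = np.sin(t / 5.0), np.cos(t / 5.0)
same = acf_discrepancy(obs_n, obs_c, obs_n, obs_c, max_lag=30)
shifted = acf_discrepancy(obs_n, obs_c, np.sin(t / 9.0), np.cos(t / 9.0), max_lag=30)
```

Averaging `acf_discrepancy` over the simulations of a cell, and then over cells, would give one summary value per constraint and condition, in the spirit of the comparison described in the text.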
Figure 4.11 shows the sum of absolute differences for the various constraints: both conditions exhibit similar patterns with a clear minimum at 200.
In the next Section, we show inferential results obtained from the available experimental data assuming, via constraint (4.7), that at least $\min\tilde{X} = 200$ molecules
[Figure 4.11 plot. Horizontal axis: Constraint (ticks from 200 to 1,000); vertical axis ticks from 0.94 to 1.06.]
Figure 4.11: Sum of absolute differences of autocorrelations for the basal (blue line) and stimulated (red line) conditions, for constraint (4.7) equal to 100, 200, 500 and 1,000 (horizontal axis).