
2.4 Diffusive hidden Markov method

2.4.3 Implementation

Obtaining step distribution functions

It would be a daunting task to determine the appropriate step distribution functions Tunl and Tloop appearing in Eq. (2.4.2) a priori, directly from theory. For example, bead–wall hydrodynamic interactions depend on the bead's height, which is not observed; the tether couples the unobserved bead orientational fluctuations to the observed position fluctuations; and so on. These difficulties are circumvented by empirically determining Tunl and Tloop from experimental control data for the two states. After these functions are obtained, the simple model of tethered-particle dynamics constructed from them is confirmed to reproduce some nontrivial features of the real control data (Figs. 2.3 and 2.5). Finally, dynamic-looping data are examined, and the two remaining free parameters τLB and τLF in Eq. (2.4.2) are adjusted until the log-likelihood, ln[Ptot(O)], is a maximum.

In order to obtain Tunl(r|r0) from the unlooped control data, note that this function must be symmetric under rotations of both r and r0 about the attachment point by a common angle. Thus the function only needs to be determined for r0 on the x̂-axis, at some radial distance ρ0. Starting from a time series for unlooped DNA (no repressor protein present), all of the points in the time series for which the bead center's distance from the anchor point, ρ0, lies in a particular range are selected. Next, the rotation in the plane that brings r0 to the x̂-axis is determined and is also applied to ∆r = r − r0, the bead's vector displacement to its position on the following video frame. Finally, a two-dimensional histogram of the observed displacements ∆r is constructed and normalized to obtain Tunl(r|r0). The process is repeated, producing histograms for all observed initial distances ρ0. Using data obtained from about 30 minutes of bead observation, the observed range of ρ0 could be divided into 30 intervals and still have reasonable statistics in the histograms; Figs. 2.8–2.9 show typical examples for two values of ρ0. A similar procedure is applied to the permanently looped control data to obtain Tloop(r|r0).
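The selection, rotation, and histogram steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the code used for the analysis: the function names, the representation of the time series as a list of (x, y) pairs relative to the anchor point, and the fixed bin width are all assumptions.

```python
import math
from collections import Counter

def rotated_displacements(track, rho_lo, rho_hi):
    """Frame-to-frame displacements whose starting radius rho0 lies in
    [rho_lo, rho_hi), rotated so that the starting point r0 is on the +x axis.
    `track` is a list of (x, y) positions relative to the anchor point."""
    out = []
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        rho0 = math.hypot(x0, y0)
        if not (rho_lo <= rho0 < rho_hi):
            continue
        # Cosine and sine of the polar angle of r0; rotating by minus this
        # angle brings r0 onto the +x axis. At rho0 = 0 no rotation is needed.
        c, s = (x0 / rho0, y0 / rho0) if rho0 > 0 else (1.0, 0.0)
        dx, dy = x1 - x0, y1 - y0
        out.append((c * dx + s * dy, -s * dx + c * dy))
    return out

def normalized_histogram(displacements, bin_width):
    """2D histogram of displacements as {(ix, iy): fraction}; fractions sum to 1."""
    counts = Counter(
        (math.floor(dx / bin_width), math.floor(dy / bin_width))
        for dx, dy in displacements
    )
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}
```

Calling `rotated_displacements` once per ρ0 interval and passing each result to `normalized_histogram` gives one empirical step distribution per interval, mirroring the 30-interval division mentioned in the text.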

The step-probability distributions (Fig. 2.9) obtained in this way show that at small ρ0 there is no preferred direction for the next time step; for larger ρ0, the tether is stretched and exerts a restoring force on the bead, so the step distribution shows a bias to diffuse toward the attachment point (the −x̂ direction). Next, a convenient analytical representation of these distributions is described, both for computing the likelihood function Ptot(O) and for simulation purposes.

Figure 2.8: 2D histograms of unlooped control data at two values of ρ0, near (left) and far (right) from the anchor point (0, 0), after rotation onto the x̂ and ŷ axes (see text). Axes: x (nm) and y (nm). The dots on the upper x-y plane indicate the initial position ρ0 (full circle) and the mean midpoint of the final position µ(ρ0) (open circle). For ρ0 near the anchor point the two dots coincide.

Figure 2.9: Projections of the two distributions in Fig. 2.8 onto the x̂ and ŷ axes, together with Gaussian distributions chosen to idealize them. Axes: x (nm) and y (nm). Left: the vertical lines represent two choices for the initial bead position; dots represent the corresponding distributions of bead positions on the next video frame. Note the shift in the mean along x̂ at larger ρ0 (open) compared to smaller ρ0 (full). Right: No such shift is observed in the ŷ direction.

Figure 2.10: Empirical fit functions for the mean (left) and variance (right) of the 2D histograms (see e.g. Fig. 2.8) for experimental control data corresponding to unlooped (solid symbols) and looped (open symbols) tether states. Corresponding fit parameters for the functions of Eq. (2.4.3) are located in Sect. 2.4.3.

Each distribution is seen to be roughly a 2D Gaussian, with one principal axis along the radial direction to the attachment point. After rotating r0 to lie along the x̂-axis as described above, the principal axes of the distribution are the x̂- and ŷ-axes. The center point also lies on the x̂-axis, and is increasingly shifted from ρ0 x̂ toward the attachment point (0, 0) as ρ0 increases, due to the tether's entropic elasticity. Accordingly, Tunl(r|r0) is characterized for each fixed r0 = (ρ0, 0) by finding its mean ⟨x⟩ and the variances in the x and y directions. The mean ⟨y⟩ equals zero (see for example Fig. 2.9), as it must by rotational invariance. Thus three functions of ρ0 that characterize the histograms are required: a third-order polynomial for the mean µ(ρ0) ≡ ⟨x⟩_ρ0, and sigmoids for the variances σx²(ρ0) and σy²(ρ0) (see Fig. 2.10):

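$$
\begin{aligned}
\mu(\rho) &= a_0 + a_1\rho + a_2\rho^2 + a_3\rho^3 \\
\sigma_x^2(\rho) &= \frac{b_0}{1 + e^{(\rho - b_1)/b_2}} + b_3 \\
\sigma_y^2(\rho) &= \frac{c_0}{1 + e^{(\rho - c_1)/c_2}} + c_3
\end{aligned}
\tag{2.4.3}
$$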

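The step distribution is then represented as a product of Gaussian distributions in x and y, starting from the point r0 = (ρ0, 0):

$$
T_{\mathrm{unl}}(\mathbf{r}\,|\,\mathbf{r}_0) = G_x(x\,|\,\rho_0)\cdot G_y(y\,|\,\rho_0), \tag{2.4.4}
$$

where

$$
G_x(x\,|\,\rho_0) = \left(2\pi\sigma_x^2(\rho_0)\right)^{-1/2} \exp\!\left(-\frac{(x-\mu(\rho_0))^2}{2\sigma_x^2(\rho_0)}\right), \tag{2.4.5}
$$

$$
G_y(y\,|\,\rho_0) = \left(2\pi\sigma_y^2(\rho_0)\right)^{-1/2} \exp\!\left(-\frac{y^2}{2\sigma_y^2(\rho_0)}\right). \tag{2.4.6}
$$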

Examples of these functions for illustrative values of ρ0 appear in Fig. 2.10. For arbitrary r0 (not necessarily on the x̂-axis), the probability is evaluated by rotating r0 to the x̂-axis, rotating r by the same amount, and evaluating Eq. (2.4.4) on the components of the rotated r.
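The rotate-then-evaluate rule just described can be written compactly. The sketch below is illustrative only: the function names are hypothetical, and the callables `mean_x`, `var_x`, and `var_y` stand in for µ(ρ0), σx²(ρ0), and σy²(ρ0), which the caller must supply in whatever convention the fits use.

```python
import math

def gaussian(z, mean, var):
    """Normalized 1D Gaussian density with the given mean and variance."""
    return math.exp(-(z - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def step_density(r, r0, mean_x, var_x, var_y):
    """Evaluate Gx(x | rho0) * Gy(y | rho0), as in Eq. (2.4.4), for arbitrary r0.

    r and r0 are (x, y) pairs relative to the attachment point. Both are
    rotated by the angle that brings r0 onto the +x axis, and the product of
    Gaussians is evaluated on the components of the rotated r."""
    x0, y0 = r0
    rho0 = math.hypot(x0, y0)
    c, s = (x0 / rho0, y0 / rho0) if rho0 > 0 else (1.0, 0.0)
    x = c * r[0] + s * r[1]      # rotated x component of r
    y = -s * r[0] + c * r[1]     # rotated y component of r
    return gaussian(x, mean_x(rho0), var_x(rho0)) * gaussian(y, 0.0, var_y(rho0))
```

Because only ρ0 and the rotated components of r enter, rotating r and r0 by a common angle leaves the result unchanged, which is the symmetry the construction relies on.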

Truncated Gaussian approximation

The procedure summarized in Eqs. (2.4.4–2.4.6) is conceptually simple. The accuracy of the calculations, however, can be improved with a small elaboration. Like any Gaussian, the distribution defined above is nonzero for any x and y. In reality, however, the DNA tether sets an absolute limit on ρ beyond which the probability must be exactly zero. Not surprisingly, following the procedure outlined above yielded simulated time series that occasionally violated this limit. Although the effect of this error may be minor for the unlooped step distribution, for the looped distribution it could interfere with looping state identification.

Accordingly, the formula for Tunl(r|r0) is modified to account for the limit in an approximate (and computationally inexpensive) way: Eq. (2.4.5) is replaced by a truncated Gaussian function. That is, Gx(x|ρ0) is set to zero for x > ρmax, and to a Gaussian with modified parameters for x < ρmax. The modified parameters were chosen in such a way that the truncated Gaussian would again have the mean µ(ρ0) and variance σx²(ρ0) shown in Fig. 2.10. That is, for each value of ρ0, with its µ and σx², a new Gaussian with modified parameters µ̃ and σ̃x² is found, which has mean µ and variance σx² when the probability of values greater than ρmax is set to zero.

Ideally G should be chosen to be a function that vanishes when x² + y² exceeds (ρmax)², and falls smoothly to zero as that boundary is approached. To make the calculations tractable, G is taken to be the product of a cutoff, shifted 1D Gaussian in x times an ordinary Gaussian in y. Examination of many graphs like Fig. 2.8 indicates that this simplification adequately represents the empirical histograms. Moreover, since the axes are rotated to make the initial position lie along the x-axis, an extra excursion along x is more likely to violate the tether condition than one along y. Small changes in the choice of the empirical function G have little effect on the final results (see Sect. 2.6).
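One way to find modified parameters with the required property is to use the closed-form moments of a normal distribution truncated from above and to solve for the untruncated mean and width by bisection. The sketch below is an assumption about how this could be done, not the procedure used in the source; the function names and the bracketing interval are hypothetical.

```python
import math

def _phi(a):
    """Standard normal density."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)

def _Phi(a):
    """Standard normal cumulative distribution."""
    return 0.5 * math.erfc(-a / math.sqrt(2.0))

def truncated_moments(m, s, cutoff):
    """Mean and variance of N(m, s^2) restricted to x <= cutoff."""
    a = (cutoff - m) / s
    lam = _phi(a) / _Phi(a)
    return m - s * lam, s * s * (1.0 - a * lam - lam * lam)

def match_truncated_gaussian(mean, var, cutoff):
    """Return (m, s) such that N(m, s^2), set to zero above `cutoff` and
    renormalized, has the requested mean and variance."""
    d = cutoff - mean
    if d <= 0:
        raise ValueError("target mean must lie below the cutoff")
    target = var / (d * d)

    def ratio(a):
        # variance / (cutoff - mean)^2 of the truncated Gaussian; it depends
        # only on a = (cutoff - m) / s and decreases as a grows.
        lam = _phi(a) / _Phi(a)
        return (1.0 - a * lam - lam * lam) / (a + lam) ** 2

    lo, hi = -8.0, 40.0          # assumed bracket for a
    if not (ratio(hi) < target < ratio(lo)):
        raise ValueError("no solution inside the assumed bracket")
    for _ in range(200):         # plain bisection on a
        mid = 0.5 * (lo + hi)
        if ratio(mid) > target:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    s = d / (a + _phi(a) / _Phi(a))
    return cutoff - s * a, s
```

For example, `match_truncated_gaussian(0.0, 1.0, 1.5)` returns an untruncated mean above 0 and a width above 1, as expected when the upper tail is removed. Tabulating the result over a grid of ρ0 values gives a look-up table of modified parameters.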

To implement efficient calculation of the truncated Gaussian Gx(x|ρ0), a look-up table is evaluated for the Gaussian with mean, variance, and normalization (µ̃(ρ0), σ̃²(ρ0), and Ñ) such that this Gaussian is zero when ρ > ρmax and satisfies the mean and variance (µ(ρ0) and σx²(ρ0)) determined empirically from data in [86]:

$$
\int_{-\infty}^{\rho_{\max}} dx\, \frac{1}{\tilde N}\, e^{-(x-\tilde\mu)^2/(2\tilde\sigma^2)} = 1, \qquad
\int_{-\infty}^{\rho_{\max}} dx\, \frac{x}{\tilde N}\, e^{-(x-\tilde\mu)^2/(2\tilde\sigma^2)} = \mu(\rho_0), \tag{2.4.7}
$$

$$
\int_{-\infty}^{\rho_{\max}} dx\, \frac{x^2}{\tilde N}\, e^{-(x-\tilde\mu)^2/(2\tilde\sigma^2)} = \sigma_x^2(\rho_0). \tag{2.4.8}
$$

Such look-up tables were evaluated at 100 values of ρ0 for both (unlooped, looped) tether states using Eq. (2.4.3) and the parameters a0 = (0, 0), a1 = (−0.068, −0.238), a2 = (−5.0e-4, −7.9e-4), a3 = (1.52e-7, 6.30e-8), b0 = (35.1, 30.6), b1 = (161.75, 107.95), b2 = (242.3, 173.8), b3 = (100.16, 66.42), c0 = (37.11, 11.43), c1 = (159.8, 126.3), c2 = (444.88, 177.06), c3 = (180.72, 67.86).

Figure 2.11: Evaluation of ln[Ptot(O)] on a logarithmically spaced grid of τLF, τLB lifetimes, corresponding to data from Fig. 2.2.
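As a worked check of Eq. (2.4.3), the following sketch evaluates the three fit functions with the (unlooped, looped) parameter pairs quoted in this section and tabulates them at 100 values of ρ0. The sigmoid exponent is taken with the sign shown in Eq. (2.4.3). The upper end of the ρ0 grid (`rho_top`) and all function names are assumptions made for illustration, since the tabulated range is not stated.

```python
import math

# Parameter sets for Eq. (2.4.3), split from the (unlooped, looped) pairs in the text.
A = {"unlooped": (0.0, -0.068, -5.0e-4, 1.52e-7),
     "looped":   (0.0, -0.238, -7.9e-4, 6.30e-8)}
B = {"unlooped": (35.1, 161.75, 242.3, 100.16),
     "looped":   (30.6, 107.95, 173.8, 66.42)}
C = {"unlooped": (37.11, 159.8, 444.88, 180.72),
     "looped":   (11.43, 126.3, 177.06, 67.86)}

def mean_x(rho, state):
    """Third-order polynomial for the mean, first line of Eq. (2.4.3)."""
    a0, a1, a2, a3 = A[state]
    return a0 + a1 * rho + a2 * rho ** 2 + a3 * rho ** 3

def _sigmoid(rho, p):
    """p0 / (1 + exp((rho - p1) / p2)) + p3, the sigmoid form of Eq. (2.4.3)."""
    p0, p1, p2, p3 = p
    return p0 / (1.0 + math.exp((rho - p1) / p2)) + p3

def var_x(rho, state):
    return _sigmoid(rho, B[state])

def var_y(rho, state):
    return _sigmoid(rho, C[state])

def tabulate(state, rho_top, n=100):
    """Rows (rho0, mean, var_x, var_y) at n evenly spaced rho0 in [0, rho_top]."""
    step = rho_top / (n - 1)
    return [(i * step, mean_x(i * step, state), var_x(i * step, state),
             var_y(i * step, state)) for i in range(n)]
```

With these parameters the unlooped mean at ρ0 = 500 evaluates to about −140, and the unlooped x sigmoid at ρ0 = b1 evaluates to b0/2 + b3 = 117.71.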

Optimization

The optimum lifetimes are found by evaluating Ptot(O) on an evenly spaced logarithmic grid of values for τLF and τLB and locating the maximum of Ptot(O). The resulting likelihood surface is smooth (Fig. 2.11), so the peak likelihood can be determined more precisely by fitting a 2D quadratic in the neighborhood of the optimum lifetimes. The uncertainties of the optimum lifetimes corresponding to ln[Ptot(O)] − 2, i.e., enclosing 97% of the probability, were estimated along the principal axes of this 2D quadratic to account for any correlation between the estimated lifetimes. In order to facilitate this iterative process, an automated simplex solver routine was implemented to find the maximum [99].
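The grid search and local quadratic refinement can be illustrated as follows. This is a sketch under stated assumptions: `log_like` is a hypothetical callable returning ln[Ptot(O)] for a pair of lifetimes, and the quadratic is obtained from central finite differences on the 3×3 block of grid points around the grid maximum, which is a simple stand-in for a fitted 2D quadratic. The uncertainty estimate and the simplex solver mentioned in the text are not reproduced here.

```python
import math

def log_grid(lo, hi, n):
    """n values evenly spaced in log10 between lo and hi, inclusive."""
    a, b = math.log10(lo), math.log10(hi)
    return [10 ** (a + (b - a) * i / (n - 1)) for i in range(n)]

def refine_peak(log_like, taus_lf, taus_lb):
    """Grid maximum of log_like(tau_lf, tau_lb), refined by one Newton step on
    the local quadratic in (log10 tau_lf, log10 tau_lb)."""
    table = [[log_like(tf, tb) for tb in taus_lb] for tf in taus_lf]
    cells = ((i, j) for i in range(len(taus_lf)) for j in range(len(taus_lb)))
    i, j = max(cells, key=lambda ij: table[ij[0]][ij[1]])
    if i in (0, len(taus_lf) - 1) or j in (0, len(taus_lb) - 1):
        raise ValueError("maximum lies on the grid boundary; widen the grid")
    u0, v0 = math.log10(taus_lf[i]), math.log10(taus_lb[j])
    hu = math.log10(taus_lf[i + 1]) - u0      # grid spacing in log10 tau_lf
    hv = math.log10(taus_lb[j + 1]) - v0      # grid spacing in log10 tau_lb
    f = lambda di, dj: table[i + di][j + dj]
    # Gradient and Hessian of the local quadratic from central differences.
    gu = (f(1, 0) - f(-1, 0)) / (2 * hu)
    gv = (f(0, 1) - f(0, -1)) / (2 * hv)
    huu = (f(1, 0) - 2 * f(0, 0) + f(-1, 0)) / hu ** 2
    hvv = (f(0, 1) - 2 * f(0, 0) + f(0, -1)) / hv ** 2
    huv = (f(1, 1) - f(1, -1) - f(-1, 1) + f(-1, -1)) / (4 * hu * hv)
    det = huu * hvv - huv ** 2
    # Stationary point of the quadratic: offset = -H^{-1} g.
    du = -(hvv * gu - huv * gv) / det
    dv = -(huu * gv - huv * gu) / det
    return 10 ** (u0 + du), 10 ** (v0 + dv)
```

The cross term `huv` is what captures correlation between the two lifetimes; its presence tilts the principal axes of the quadratic away from the grid axes.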

Simulation strategy

Simulations of bead motion, with and without dynamic looping, were performed in Mathematica to test the DHMM model. Each step of the simulation (a) first determined whether or not to remain in the current loop state, and then (b) determined the next spatial position appropriate for that loop state. In more detail: (a) If the initial state was looped, a pseudorandom number was used to determine whether to transition to the unlooped state, with probability ∆t/τLB. If the initial state was unlooped, and if ρ < ρmax, then a transition to the looped state was allowed with probability ∆t/τLF. (b) Next, a (∆x, ∆y) pair was drawn from the appropriate distribution obtained in Sect. 2.4.3. That is, ∆y was Gaussian distributed, and similarly for ∆x, except that steps resulting in ρ > ρmax were discarded and the step repeated in order to achieve a truncated Gaussian, as discussed earlier.
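The two-part simulation step can be sketched in Python (the original simulations were written in Mathematica, and this is not a port of that code). Everything named here is an assumption for illustration: `p_unloop` and `p_loop` are the per-step transition probabilities, `rho_max` maps each state to its tether limit, and `mean_dx`, `sd_x`, and `sd_y` are caller-supplied callables standing in for the fitted step statistics. The looped-state limit is assumed to be the one that gates loop formation.

```python
import math
import random

def simulate(n_steps, p_unloop, p_loop, rho_max, mean_dx, sd_x, sd_y, seed=0):
    """Two-state tethered-bead simulation; returns (positions, states)."""
    rng = random.Random(seed)
    x, y, state = 0.0, 0.0, "unlooped"
    track, states = [(x, y)], [state]
    for _ in range(n_steps):
        rho = math.hypot(x, y)
        # (a) Decide whether to leave the current loop state.
        if state == "looped":
            if rng.random() < p_unloop:
                state = "unlooped"
        elif rho < rho_max["looped"] and rng.random() < p_loop:
            state = "looped"
        # (b) Draw a step in the frame where the current position lies on the
        # +x axis, rotate it back, and redraw until the tether limit holds.
        c, s = (x / rho, y / rho) if rho > 0 else (1.0, 0.0)
        while True:
            dx = rng.gauss(mean_dx(rho, state), sd_x(rho, state))
            dy = rng.gauss(0.0, sd_y(rho, state))
            nx = x + c * dx - s * dy
            ny = y + s * dx + c * dy
            if math.hypot(nx, ny) <= rho_max[state]:
                break
        x, y = nx, ny
        track.append((x, y))
        states.append(state)
    return track, states
```

Discarding and redrawing steps that end beyond the limit samples a truncated version of the underlying Gaussian, so no position in the returned track violates the tether limit of its state.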