INFERENCE OF EPIDEMIC PARAMETERS 69

Inverse Dynamics in epidemics: the SIR model

[Figure: box plots of the inferred λ and µ versus the noise fraction ν]

Figure 6.4.2. Inferred epidemic parameters for different observational noise rates ν. The forward epidemic is simulated until observation time T = 10. Each box refers to M = 1000 instances of random regular graphs with N = 1000 nodes and degree g = 4. Box edges mark the 25th and 75th percentiles, and the central red line is the median. Whiskers extend to cover 99.3% of the data for a Gaussian distribution. Outliers are marked as red points outside the whiskers.
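The 99.3% figure in the caption matches the common box-plot convention of placing the whiskers 1.5 interquartile ranges beyond the quartiles (the default in MATLAB's `boxplot`, for instance). Assuming that convention was used here, the coverage for Gaussian data can be checked with the Python standard library:

```python
from statistics import NormalDist

z = NormalDist()  # standard Gaussian: mean 0, sigma 1
q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)  # 25th and 75th percentiles (box edges)
iqr = q3 - q1
# Whisker limits, assuming the 1.5 * IQR rule.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
coverage = z.cdf(upper) - z.cdf(lower)
print(f"whiskers at +/-{upper:.3f} sigma cover {coverage:.1%} of the data")
```

The whiskers sit at roughly ±2.7σ, so for Gaussian data only about 0.7% of the points would be flagged as outliers.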

Once the free energy is expressed as a function of the parameters, its minimum (the maximum of the log-likelihood) can be found with a simple gradient descent procedure, using the updates

$$\lambda \leftarrow \lambda - \epsilon \, \frac{\partial f}{\partial \lambda} \tag{6.4.6}$$

$$\mu \leftarrow \mu - \epsilon \, \frac{\partial f}{\partial \mu} \tag{6.4.7}$$

with ε a free convergence parameter (note that the minus sign comes from the definition of the free energy). A detailed derivation of the derivatives ∂f/∂λ and ∂f/∂µ of the Bethe free energy is reported in Appendix B.2. In principle, the expressions obtained from the Bethe free energy are valid only at the BP fixed point, so one should let the BP updates converge before each gradient descent step. In practice, it is sufficient to interleave BP and gradient descent updates to obtain equivalent results.

To validate the method, I performed extensive simulations over a wide range of parameters and found that, for a reasonable fraction of infected nodes at the observation time, this method simultaneously identifies the patient-zero perfectly and yields good estimates of the epidemic parameters. Examples of inferred parameters are shown in Fig. 6.4.1 for six different configurations of the (λ, µ) parameters, with each pair of box plots referring to M = 1000 samples.
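The control flow of the interleaved scheme can be sketched as follows. This is a minimal illustration, not the BP implementation used in the thesis: `sweep` and `grad_f` are hypothetical stand-ins for one round of BP message updates and for the derivatives derived in Appendix B.2, and the toy instance replaces the Bethe free energy with a quadratic whose minimum lies at (0.6, 0.5). Clipping λ and µ to [0, 1] is an added assumption, motivated by both being probabilities.

```python
from typing import Callable, Tuple

Messages = Tuple[float, float]


def interleaved_descent(
    sweep: Callable[[Messages, float, float], Messages],
    grad_f: Callable[[Messages, float, float], Tuple[float, float]],
    messages: Messages,
    lam: float,
    mu: float,
    eps: float = 0.1,
    n_iter: int = 500,
) -> Tuple[float, float]:
    """Alternate one message-passing sweep with one gradient step, eqs. (6.4.6)-(6.4.7)."""
    for _ in range(n_iter):
        # One (not necessarily converged) round of message updates at the current parameters.
        messages = sweep(messages, lam, mu)
        # Gradient of the free energy evaluated on the current messages.
        d_lam, d_mu = grad_f(messages, lam, mu)
        # Gradient descent step; clipping to [0, 1] is an assumption, not part of the text.
        lam = min(max(lam - eps * d_lam, 0.0), 1.0)
        mu = min(max(mu - eps * d_mu, 0.0), 1.0)
    return lam, mu


# Hypothetical toy stand-ins: the "messages" relax towards (0.6, 0.5), and the
# toy free energy is f = ((lam - m1)^2 + (mu - m2)^2) / 2 at the current messages.
def toy_sweep(m: Messages, lam: float, mu: float) -> Messages:
    return (0.5 * m[0] + 0.5 * 0.6, 0.5 * m[1] + 0.5 * 0.5)


def toy_grad(m: Messages, lam: float, mu: float) -> Tuple[float, float]:
    return (lam - m[0], mu - m[1])


lam_hat, mu_hat = interleaved_descent(toy_sweep, toy_grad, (0.0, 0.0), lam=0.3, mu=0.3)
print(round(lam_hat, 3), round(mu_hat, 3))
```

Even though the gradient is always evaluated on messages that have not yet converged, the parameters settle at the toy minimum, approximately (0.6, 0.5), which is the behavior the interleaving relies on.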

The inference of parameters can also be performed in the presence of observational noise. Fig. 6.4.2 shows an example of inference for increasing levels of noise in the observation, as defined in Section 6.3. In this case too, the patient-zero is detected with probability 1, and the inferred parameters are good estimators of the true values, even for a significant fraction of noise.

[Figure: distributions of the inferred λ and µ for the BP, mean and median methods]

Figure 6.4.3. Comparison of the inference of epidemic parameters between BP and a naive method, for 200 random realizations with λ = 0.6, µ = 0.5. The naive method selects the pair (λ∗, µ∗) that is closest, in Euclidean distance, in terms of the mean (resp. median) numbers of infected and recovered individuals. The distributions for the inference with BP correspond to the fifth example reported in Fig. 6.4.1 and the first in Fig. 6.4.2.

To quantify the performance of the new method, I compared it with a naive procedure. Suppose that, given a graph, the distributions of the numbers of infected, I(λ, µ), and recovered, R(λ, µ), individuals are known for each value of λ and µ. When confronted with a realization of the epidemic process, one may then choose the values of λ and µ whose statistical features are closest (in some sense to be defined) to the observed ones. In practice, I ran 1000 random epidemics for each combination of values λ ∈ {0.05, 0.1, . . . , 0.95}, µ ∈ {0, 0.05, . . . , 1} and computed the mean numbers of infected, I_mean(λ, µ), and recovered, R_mean(λ, µ), individuals. Given an observation with I infected and R recovered individuals, the result of the inference is simply

$$(\lambda^*, \mu^*) = \arg\min_{\lambda,\mu} \left[ \left(I - I_{\mathrm{mean}}(\lambda,\mu)\right)^2 + \left(R - R_{\mathrm{mean}}(\lambda,\mu)\right)^2 \right].$$
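The naive estimator can be sketched in pure Python as follows. This is an illustrative sketch under stated assumptions, not the code behind Fig. 6.4.3: the discrete-time SIR convention (each infected node independently infects each susceptible neighbor with probability λ, then recovers with probability µ, starting from a single random patient zero) and the pairing-model graph generator are assumed choices, and all function names are hypothetical. The usage at the bottom is scaled down (N = 100 and 10 runs per grid point instead of N = 1000 and 1000 runs) so that it finishes quickly.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

S, I, R = 0, 1, 2  # node states
Table = Dict[Tuple[float, float], Tuple[float, float]]


def random_regular_graph(n: int, g: int, rng: random.Random) -> List[List[int]]:
    """Simple g-regular graph via the pairing model, restarting on self-loops or multi-edges."""
    assert (n * g) % 2 == 0 and g < n
    while True:
        stubs = [v for v in range(n) for _ in range(g)]
        rng.shuffle(stubs)
        edges: Set[Tuple[int, int]] = set()
        for a, b in zip(stubs[::2], stubs[1::2]):
            e = (min(a, b), max(a, b))
            if a == b or e in edges:
                break  # not a simple graph: restart
            edges.add(e)
        else:
            adj: List[List[int]] = [[] for _ in range(n)]
            for a, b in edges:
                adj[a].append(b)
                adj[b].append(a)
            return adj


def simulate_sir(adj: List[List[int]], lam: float, mu: float, T: int,
                 rng: random.Random) -> Tuple[int, int]:
    """Run T synchronous SIR steps from one random patient zero; return (#I, #R) at time T."""
    state = [S] * len(adj)
    state[rng.randrange(len(adj))] = I
    for _ in range(T):
        new_state = state[:]
        for i, s in enumerate(state):
            if s != I:
                continue
            for j in adj[i]:
                # Each infected neighbor independently transmits with probability lam.
                if state[j] == S and rng.random() < lam:
                    new_state[j] = I
            if rng.random() < mu:
                new_state[i] = R
        state = new_state
    return state.count(I), state.count(R)


def mean_table(adj: List[List[int]], lams: Sequence[float], mus: Sequence[float],
               T: int, runs: int, rng: random.Random) -> Table:
    """Monte Carlo estimate of (I_mean, R_mean) for every (lam, mu) on the grid."""
    table: Table = {}
    for lam in lams:
        for mu in mus:
            samples = [simulate_sir(adj, lam, mu, T, rng) for _ in range(runs)]
            table[(lam, mu)] = (sum(s[0] for s in samples) / runs,
                                sum(s[1] for s in samples) / runs)
    return table


def naive_estimate(i_obs: int, r_obs: int, table: Table) -> Tuple[float, float]:
    """Grid point minimising the squared Euclidean distance to the observed (I, R)."""
    return min(table, key=lambda k: (i_obs - table[k][0]) ** 2 + (r_obs - table[k][1]) ** 2)


rng = random.Random(0)
adj = random_regular_graph(100, 4, rng)
lams = [round(0.05 * k, 2) for k in range(1, 20)]  # 0.05, ..., 0.95
mus = [round(0.05 * k, 2) for k in range(0, 21)]   # 0.0, ..., 1.0
table = mean_table(adj, lams, mus, T=10, runs=10, rng=rng)
i_obs, r_obs = simulate_sir(adj, 0.6, 0.5, 10, rng)
print(naive_estimate(i_obs, r_obs, table))
```

Replacing the two means with medians of `samples` gives the median variant mentioned in the text. Because the estimator only matches two summary counts, many (λ, µ) pairs can produce similar averages, which is one plausible reason for its wider spread compared with BP.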

Fig. 6.4.3 shows the distributions of λ∗ and µ∗ found by the above procedure for 200 epidemic realizations with λ = 0.6 and µ = 0.5, together with the corresponding distributions obtained by interleaving BP with gradient ascent on the likelihood. The BP-based procedure infers the true parameters λ = 0.6 and µ = 0.5 with much higher accuracy. The same procedure using the median instead of the mean (thus computing I_median and R_median) yields very similar results.


6.5. Summary

In this Chapter, I discussed the problem of inferring the origin of an epidemic propagation on a network from a single snapshot of its collective state, and described a generalization of a previously developed inference scheme to the more realistic scenario of noisy observations. As I pointed out in Section 6.3.2, the effectiveness of the proposed inference strategy is constrained by the amount of information available in the snapshot, which is generally a function of the delay time and the noise level.

Belief Propagation performs well even when observations are uncertain or completely confused, for instance when some of the observed states cannot be distinguished from one another. When coupled to a gradient ascent procedure, the BP equations also provide a variational strategy for simultaneously inferring the epidemic parameters. The method described in this chapter has been presented in Ref. [85].

In the presence of multiple epidemic cascades on a given graph, the present approach can be extended to infer the infection probabilities of any putative link, providing an efficient method for reconstructing the entire network: this will be the subject of Chapter 8. In the next chapter, I will discuss a further generalization of our inference method, capable of dealing with real contact data in continuous time without resorting to a time discretization.

CHAPTER 7

Inverse dynamics in continuous-time contact networks