Centralized Copula-based Sequential Probability Ratio Test

In this section, we consider the centralized sequential test. In this case, corrupted raw observations yli = zli+ wli, l = 1, 2, . . . , L, i = 1, 2, . . . are received at the FC sequentially. The

FC performs a copula-based SPRT to obtain the final decision. More specifically, we solve the following binary hypothesis testing problem at the FC:

H0 : yi ∼ f0cen(yi), i = 1, 2, . . . ,

H1 : yi ∼ f1cen(yi), i = 1, 2, . . . ,

(5.1) where yi = [y1i, . . . , yLi], f1cen and f0cen denote the joint PDFs for the centralized case under

alternative and null hypotheses, respectively. We take the existing dependence into account and design a copula-based SPRT at the FC.

Using Sklar’s theorem, i.e., Theorem 1.1 in Section 1.1.1, the joint PDFs in Equation (5.1) for i = 1, 2, . . . , n can be expressed in terms of the marginal distributions and the copula densities, ccen₁ and ccen₀ , respectively, under H1 and H0, as

f₀cen(y) = n Y i=1 L Y l=1 f_0,lcen(yli) × ccen0 (F 0,cen (yi)|φcen0 ), (5.2) f₁cen(y) = n Y i=1 L Y l=1

f_1,lcen(yli) × ccen1 (F1,cen(yi)|φcen1 ),

where fcen

k,l(yli) are the marginal PDFs and Fk,cen(yi) = [F k,cen

1 (y1i), . . . , F k,cen

L (yLi)] are the

marginal CDFs for all the sensors at time instant i under hypothesis k, k = 0, 1. Moreover, φcen₀ and φcen₁ are the parameters of copula ccen₀ and ccen₁ , respectively.

For known ccen₀ (·|φcen₀ ) and ccen 1 (·|φ

cen

1 ), the centralized copula-based SPRT at the FC fol-

lows the following procedure: for n = 1, 2, . . .,               

Λn,cen_{(y) ≥ A,} _{decide H} 1,

Λn,cen_{(y) ≤ −B,} _{decide H} 0,

−B < Λn,cen_{(y) < A,} _{take another observation,}

(5.3)

constants such that PF ≤ α and PM ≤ β. Also, Λn,cen(y) is given as Λn,cen(y) = n X i=1 L X l=1 logf cen 1,l(yli) fcen 0,l(yli) + n X i=1 logc cen 1 (F1,cen(yi)|φcen1 ) ccen 0 (F0,cen(yi)|φcen0 ) . (5.4)

In general, for given α and β, exact analytical expressions of the optimal thresholds A and −B are intractable. One may use the approximated thresholds obtained from Wald’s SPRT [107], which are given by

A ≈ log1 − β

α , −B ≈ log β

1 − α, (5.5) where if α, β ∈ (0,1₂), we have −B < A.

Let N be the stopping time for the centralized scheme. Since the messages received at the FC are i.i.d., we have P (N < ∞|Hk) = 1, k = 0, 1 under condition A1 [99, Lemma 3.1.1].

The goal for the above centralized copula-based SPRT is to minimize the average stopping time Ek[N ], k = 0, 1 such that PF ≤ α, PM ≤ β. In Theorem 5.1, we show the asymptotic

optimality of the centralized copula-based SPRT as A, B → ∞. Note that in this work, we assume that when the test stops, we have Λn,cen_{(y) = A or Λ}n,cen_{(y) = −B by ignoring the}

overshoots. Therefore, we use ≈ while evaluating PF, PM and the expected stopping time

EHk[N ], k = 0, 1 in Theorems 5.1 and 5.3.

Theorem 5.1. For the centralized copula-based SPRT in Equation (5.3), as A → ∞ and B → ∞, we have PF ≈ e−AandPM ≈ e−B. Moreover, the average stopping times under the

two hypotheses admit the following asymptotic expressions: EH0[N ] ≈ B Dcen 0 , _EH1[N ] ≈ A Dcen 1 , (5.6)

asA, B → ∞, and where D₀cen=

l=1

D f_0,lcen(·)||f_1,lcen(·) + D (ccen₀ (·|φcen₀ )||ccen₁ (·|φcen₁ )) , D₁cen=

l=1

D f_1,lcen(·)||f_0,lcen(·) + D (ccen₁ (·|φcen₁ )||ccen₀ (·|φcen₀ )) .

Proof: See Appendix C. Remark 5.4. The asymptotic performance obtained in Equation (5.6) shows that the average detection time depends on the KL distance provided by each sensor as well as the KL distance due to the spatial dependence amongL sensors. By including the spatial dependence in our analysis, we can reduce the detection time on an average.

Typically, the copula density function ccen_k (·|φcen_k ) and its corresponding parameter set φcen_k under hypothesis Hk, k = 0, 1 are not known and need to be estimated. Using maximum

likelihood estimates in place of the true copula density functions and the true parameters, the centralized copula-based SPRT in Equation (5.3) becomes a generalized copula-based SPRT. In the following, we present the estimation of the best copula model.

Since the FC has no knowledge of the dependence structure of the received messages, we assume that the FC waits for N0messages before starting the copula-based SPRT. Note that N0

can determined by the goodness-of-fit tests for copula models [36]. Hence, the copula density functions and their corresponding parameters can be estimated. The estimation of optimal copula density functions under the two hypotheses is similar. Therefore, the hypothesis index k will be omitted for now to simplify notations. Also, the superscript “cen" for the centralized sequential test is omitted for now. Note that the marginal CDFs at each time instant i are needed to evaluate the copula log likelihood ratios as shown in Equation (5.4).

We apply R-Vine copula to estimate the multivariate copula function ccen k (·|φ

cen

k ) under hy-

the R-Vine tree structure, the choice of bivariate copulas and the estimation of their corresponding parameters. To select the optimal R-Vine tree structure, we adopt the sequential maximum spanning tree algorithm in [27]. This sequential method is based on Kendall’s τ . The sample estimate of Kendall’s τ , for N0 observations, can be calculated using Equation (1.10).

The estimation of the R-Vine copula model is presented in Algorithm 2.1 with m = L and data sequence (y1j, . . . , yLj), j = 1, 2, . . . , N0. The main computational complexity of this

algorithm is from the fit of the bivariate copulas. For a d-dimensional system, the number of bivariate copulas needed to fit in the tree is d(d−1)/2. Therefore, the computational complexity is O(d2). In the following, we describe the sequential estimation in detail. The sequential method starts with the selection of the first tree T1and continues tree by tree up to the last tree

TL−1. The trees are selected in a way that the chosen bivariate copula models the strongest

pair-wise dependencies present which are characterized by Kendall’s τ .

After the R-Vine tree structure is obtained, we need to define a bivariate copula set and estimate the optimal bivariate copulas that best characterize the pair-wise dependencies. Con- sider a library of copulas, C = {cm : m = 1, . . . , M }. Before estimating the optimal bivariate

copula, the copula parameter set φ is first obtained using MLE as ˆ φ = arg max φ N0 X i=1 log c ˆFl1(yl1i), ˆFl2(yl2i)|φ , (5.7) where (l1, l2), l1, l2 ∈ [L] is a connected pair in the selected R-Vine trees and for simplification

of notation, we omit the conditioning elements for conditional marginal CDFs. Note that the conditional marginal CDFs can be obtained recursively using Equation (1.12).

Once the copula parameter set is obtained, the best copula c∗ is selected from the copula library C using the Akaike Information Criterion (AIC) [3] as the criterion, given as

AICm = − N0 X i=1 log cm ˆFl1(yl1i), ˆFl2(yl2i)| ˆφm + 2q(m), (5.8)

where q(m) is the number of parameters in the mth copula model. Also, the conditioning elements for conditional marginal CDFs are omitted.

The best copula c∗ is

c∗ = arg min

cm∈C

AICm. (5.9)

Now, we have the optimal copula density functions ccen,∗₁ (·| ˆφcen,∗₁ ) and ccen,∗₀ (·| ˆφcen,∗₀ ) under the alternative and null hypotheses, respectively. The log-likelihood ratio test statistics Λn,cen(y) in Equation (5.4) now becomes

Λn,cen(y) = n X i=1 L X l=1 log f cen 1,l(yli) fcen 0,l(yli) + n X i=1 logc cen,∗ 1 (F1,cen(yi)| ˆφ cen,∗ 1 ) ccen,∗₀ (F0,cen_(y i)| ˆφ cen,∗ 0 ) . (5.10)

Note that the centralized copula-based SPRT incurs substantial data transmission overhead between the sensors and the FC. In the following section, we proceed to develop the distributed copula-based SPRT.

5.4 Distributed Copula-based Sequential Probability Ra-

In document Copula-based Multimodal Data Fusion for Inference with Dependent Observations (Page 88-93)