arXiv:1208.4161v1 [cs.IT] 21 Aug 2012
Robust Distributed Maximum Likelihood Estimation with Quantized Data
Xiaojing Shen, Member, IEEE, Pramod K. Varshney, Fellow, IEEE, Yunmin Zhu∗†
August 22, 2012
Abstract
In this paper, distributed maximum likelihood estimation (MLE) with quantized data is considered under the assumption that the structure of the joint probability density function (pdf) is known, but it contains unknown deterministic parameters. The parameters may include different vector parameters corresponding to marginal pdfs and parameters that describe dependence of observations across sensors.
We first discuss the regularity conditions which should be satisfied by the pdf and vector quantizers such that the MLE with quantized data is asymptotically efficient. Then, the relationship between the asymptotic variance of the MLE and the number of quantization bits is analytically derived. Since the optimal MLE scheme based on quantized data cannot be obtained when the joint pdf of observations is not completely known, a robust distributed MLE scheme is designed for a fixed number of quantization bits. Its asymptotic efficiency is proved under some regularity conditions and the asymptotic variance is derived so that the robustness can be analytically assessed. A numerical example with a bivariate Gaussian pdf is considered. Simulations demonstrate the robustness of the proposed MLE scheme.
keywords: Maximum likelihood estimation; distributed estimation; quantized data; asymptotic variance;
Fisher information matrix; wireless sensor networks
1 Introduction
Wireless sensor networks have attracted much attention and have become a fast-growing research area over the past years. Many advances have been made in distributed detection, estimation, tracking and control
∗This work was supported in part by U.S. Air Force Office of Scientific Research (AFOSR) under Grants FA9550-10-1-0263 and FA9550-10-1-0458 and in part by the NNSF of China (# 61004138 and 61273074).
†Xiaojing Shen and Pramod K. Varshney (corresponding author, [email protected]) are with the Department of Electrical Engineering and Computer Science, Syracuse University, NY, 13244, USA. Yunmin Zhu is with the Department of Mathematics, Sichuan University, Chengdu, Sichuan 610064, China. Xiaojing Shen ([email protected]) is on leave from the Department of Mathematics, Sichuan University, Chengdu, Sichuan 610064, China.
(see e.g., [1, 2, 3, 4, 5, 6] and references therein). Due to limited communication bandwidth and energy, the sensors usually quantize the measurements and send them to a fusion center which makes the final inference for decision, estimation, tracking or control tasks. The focus of this paper is on robust distributed MLE with quantized data. Distributed estimation and quantization problems have been considered in a number of studies. The parameters to be estimated are modeled as random and deterministic in different situations.
For random parameters, there exist various prior studies under the assumption of a known joint pdf of parameters and sensor measurements (see [7, 8, 9, 10, 11, 12]). The work in [9] presented necessary conditions for optimal quantizers and optimal estimation. Optimum quantizers are difficult to obtain because coupled nonlinear equations need to be solved. The optimum estimator takes the form of a conditional mean that is usually hard to compute. To simplify computation, an efficient vector quantization algorithm based on the best linear unbiased estimator was proposed in [10, 11]. The convergence of the algorithm in [11] is guaranteed. The authors of [12] proposed a quantization approach where only a training sequence is available.
For deterministic parameters, several universal distributed estimation schemes have been proposed [13, 14, 15] in the presence of unknown, additive sensor noises that are bounded and identically distributed. These universal distributed estimation schemes have a low bandwidth requirement. The work in [16, 17, 18, 19, 20, 21] addressed various design and implementation issues under the assumption of a scalar parameter and using scalar quantizers. An approach consisting of multiple non-identical thresholds is employed in [16, 17].
The authors of [17] studied estimation of a scalar mean location parameter in the presence of zero-mean additive white Gaussian noise. Methods to design quantizers by minimizing the worst case performance in bounded parameter sets were proposed in [18]. In [19], the performance limit that a distributed estimation scheme with identical quantizers can achieve was found, as well as the set of optimal noise distribution functions and quantizers. In [20, 21], a quantization approach that adaptively adjusts the thresholds from sensor to sensor was proposed. The work of [22, 23, 24] proposed vector quantization design for distributed estimation under the assumption of an additive observation noise model. The authors of [22] proposed a hyperplane-based heuristic approach, where the vector quantization problem can be converted to scalar quantization problems.
In [24], a class of hyperplane-based vector quantizers was proposed which linearly convert the observation vector into a scalar by using a compression vector and then carry out scalar quantization.
When the structure of the pdf is known, previous works have extensively used the MLE with quantized data to estimate the deterministic parameters. In this paper, robust distributed MLE with quantized data is considered. Our work differs from previous studies in several aspects. Prior results concentrate on the problem of how to design the quantization schemes for estimating a deterministic parameter where each sensor makes one noisy observation. The observations are usually assumed to be independent across sensors, and the relationship between MLE performance and the number of sensors is then discussed. Here, we focus on the problem of how to design estimation schemes for the unknown parameter vector corresponding to the joint pdf of the observations where the number of sensors is fixed. These observations may be dependent across sensors. The unknown parameters may include different vector parameters corresponding to marginal pdfs and parameters that describe dependence of observations across sensors. Actually, the dependence between sensors is very important in multisensor fusion systems; see, for example, the recent work on distributed Neyman-Pearson detection fusion, hypothesis testing using heterogeneous data, and distributed location estimation with dependent sensor observations [25, 26, 27]. We also derive the relationship between MLE performance, the number of quantization bits and the number of observations. It is worth noting that our work requires neither knowledge of observation models, nor assumptions of Gaussianity and independence of noises across sensors, nor scalar quantizers and scalar estimated parameters.
In this paper, we first determine the regularity conditions which should be satisfied by the joint pdf and quantizers such that the MLE with quantized data is asymptotically efficient. Then, the relationship between the asymptotic variance of MLE and the number of quantization bits is analytically derived. We shall prove that the asymptotic variance of MLE with quantized data is monotone decreasing with the number of quantization bits and has a lower limit, which is equal to the asymptotic variance of MLE with raw measurements. When the number of quantization bits is given, a robust distributed MLE scheme is designed by employing J different quantizers. Its asymptotic efficiency is proved under some regularity conditions and the asymptotic variance is derived to be the inverse of a convex linear combination of Fisher information matrices based on J different quantizers. Thus, the robustness can be analytically verified. A numerical example with a bivariate Gaussian pdf with an unknown parameter vector is considered. Simulations show that the new MLE scheme is robust and much better than that based on the worst quantization scheme from among the groups of quantizers. Another interesting phenomenon is that the asymptotic variance of the estimates of parameters of marginal pdfs is almost independent of the dependence between the sensors.
The rest of the paper is organized as follows. Problem formulation is given in Section 2. In Section 3, the performance analysis of MLE with quantized data is given. The robust MLE scheme is proposed and the asymptotic results are derived. In Section 4, numerical examples are given and discussed. In Section 5, conclusions are made.
2 Problem formulation
The basic L-sensor distributed estimation system is considered (see Figure 1). Each sensor has a $k_i$-dimensional observation population $Y_i$, $i = 1, \ldots, L$. Suppose that the joint observation population $Y \triangleq (Y_1', \ldots, Y_L')'$ has a given family of joint pdfs:
$$\{p(y_1, \ldots, y_L|\theta)\}_{\theta \in \Theta \subseteq \mathbb{R}^k} \tag{1}$$
where $'$ denotes the transpose and $\theta$ is the unknown k-dimensional deterministic parameter vector, which may include marginal parameters and dependence parameters. Here, we do not assume independence across sensors, knowledge of measurement models or Gaussianity of the joint pdf. Let the N independently and identically distributed (i.i.d.) sensor observation samples and joint observation samples be
$$\vec{Y}_i = (Y_{i1}, \ldots, Y_{iN}), \quad i = 1, \ldots, L; \tag{2}$$
$$\vec{Y} = (\vec{Y}_1', \ldots, \vec{Y}_L')'. \tag{3}$$
Suppose the sensors and the fusion center wish to jointly estimate the unknown parameter vector $\theta$ based on the spatially distributed observations. If there is sufficient communication bandwidth and power, the fusion center can obtain asymptotically efficient estimates with the complete observation samples based on the MLE procedure under some regularity conditions on the joint pdf.
[Figure 1: Distributed MLE fusion system with quantized data. Sensor i (i = 1, ..., L) has population $Y_i \in \mathbb{R}^{k_i}$ and samples $Y_{in}$, n = 1, ..., N; its $r_i$-bit quantizer $I_{i1}(y_i), \ldots, I_{ir_i}(y_i)$ produces the quantized samples $U_{in} = (U_{in1}, \ldots, U_{inr_i})$, n = 1, ..., N, which are sent to the fusion center; the fusion center computes $\theta_{est} = \mathrm{MLE}(U_{in}, i = 1, \ldots, L, n = 1, \ldots, N)$.]
In many practical situations, however, to reduce the communication requirement from sensors to the fusion center due to limited communication bandwidth and power, the i-th sensor quantizes the observation vector to $r_i$ bits ($r_i \geq 1$) by $r_i$ measurable indicator quantization functions:
$$I_{i1}(y_i) : y_i \in \mathbb{R}^{k_i} \to \{0, 1\}; \; \ldots; \; I_{ir_i}(y_i) : y_i \in \mathbb{R}^{k_i} \to \{0, 1\}, \tag{4}$$
for $i = 1, \ldots, L$. Here, each quantizer $I_{it}(y_i)$ may have one or multiple thresholds, and its 0/1 quantization region may be a continuous region or a union of discontinuous regions. Moreover, we denote by
$$I(y|r) \triangleq (I_1(y_1)', \ldots, I_L(y_L)')' \in \mathbb{R}^r, \quad r = \sum_{i=1}^{L} r_i, \tag{5}$$
where
$$I_i(y_i) \triangleq (I_{i1}(y_i), \ldots, I_{ir_i}(y_i))', \quad i = 1, \ldots, L, \tag{6}$$
and r is the total number of bits available to transmit observations from the sensors to the fusion center.
Once the $r_i$-bit binary quantized samples $I_i(Y_{in})$, $n = 1, \ldots, N$, are generated at sensor i, $i = 1, \ldots, L$, they are transmitted to the fusion center. The fusion center is then required to estimate the true parameter vector $\theta^*$ with the quantized data. Usually, the MLE is used to estimate the parameter vector by maximizing the log likelihood function
$$\begin{aligned}
l(\theta|\vec{Y}) &\triangleq \log P(I(\vec{Y}|r)|\theta) \\
&= \log \prod_{n=1}^{N} P(I_1(Y_{1n}), \ldots, I_L(Y_{Ln})|\theta) \\
&= \sum_{n=1}^{N} \log P(I_1(Y_{1n}), \ldots, I_L(Y_{Ln})|\theta) \\
&= \sum_{n=1}^{N} \log \int_{\{I_1(y_1) = I_1(Y_{1n}), \ldots, I_L(y_L) = I_L(Y_{Ln})\}} p(y_1, \ldots, y_L|\theta) \, dy_1 \cdots dy_L.
\end{aligned} \tag{7}$$
In the framework of such a distributed multisensor fusion system, one may question whether this method can estimate the parameters efficiently, since the raw measurements $y_i$ may be compressed to as low as 1-bit $I_i(y_i)$, $i = 1, \ldots, L$, so that a lot of information can be lost. Indeed, there are examples where MLE with quantized data cannot estimate parameters well, which will be given in Remark 3.3. On the other hand, there are also examples where the MLE method with quantized data yields good estimation results (see, e.g., [25, 27]). Thus, in the present paper, we shall concentrate on analytically investigating the following basic problems:
• What conditions should the pdf $p(y_1, \ldots, y_L|\theta)$ and the quantizers $I(y|r)$ satisfy to guarantee that MLE with quantized data will be an asymptotically efficient estimator?
• What is the relationship between the asymptotic variance of MLE and the total number of quantization bits r?
• How should $I(y|r)$ be designed to derive a robust and asymptotically efficient MLE for a given number of bits r?
3 Performance Analysis of Maximum Likelihood Estimation with Quantized Data
3.1 Asymptotic efficiency of maximum likelihood estimation with quantized data
By the definition of observation samples and quantizers, we define
$$\vec{U} \triangleq (\vec{U}_1', \ldots, \vec{U}_L')', \tag{8}$$
$$\vec{U}_i \triangleq (U_{i1}, \ldots, U_{iN})', \quad i = 1, \ldots, L, \tag{9}$$
$$U_{in} \triangleq (U_{in1}, \ldots, U_{inr_i}), \quad n = 1, \ldots, N, \tag{10}$$
$$U_{in1} \triangleq I_{i1}(Y_{in}), \; \ldots, \; U_{inr_i} \triangleq I_{ir_i}(Y_{in}). \tag{11}$$
If we take $\vec{U}$ as the joint quantized observation samples and denote the quantized observation population by $U \triangleq I(Y|r) = (I_1(Y_1), \ldots, I_L(Y_L))'$, we know that U has a discrete/categorical distribution. Based on the pdf of Y and the quantizers $I(y|r)$, the probability mass function (pmf) of the quantized observation population U is
$$f_U(u_1, u_2, \ldots, u_L|\theta) = P_{u_1, u_2, \ldots, u_L} \quad \text{for } U = (u_1, u_2, \ldots, u_L), \tag{12}$$
where
$$(u_1, \ldots, u_L) \in S_u = \Big\{(u_1, \ldots, u_L) \in \mathbb{R}^r : u_i \text{ is an } r_i\text{-dimensional binary row vector}, \; i = 1, \ldots, L, \; r = \sum_{i=1}^{L} r_i\Big\}, \tag{13}$$
$$P_{u_1, u_2, \ldots, u_L} = \int_{\Xi(u_1, u_2, \ldots, u_L)} p(y_1, y_2, \ldots, y_L|\theta) \, dy_1 dy_2 \cdots dy_L, \tag{14}$$
$$\Xi(u_1, u_2, \ldots, u_L) = \{(y_1, y_2, \ldots, y_L) : I_1(y_1) = u_1, I_2(y_2) = u_2, \ldots, I_L(y_L) = u_L\}. \tag{15}$$
Note that $f_U(u_1, u_2, \ldots, u_L|\theta)$ is determined by $p(y_1, y_2, \ldots, y_L|\theta)$ and the sensor quantizers $I_1(y_1), \ldots, I_L(y_L)$.
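Thus, the quantized observation population U has a family of joint pmfs $\{f_U(u_1, u_2, \ldots, u_L|\theta)\}_{\theta \in \Theta \subseteq \mathbb{R}^k}$, which yields the following log likelihood function of the samples $\vec{U}$ by (7), (12)-(15):
\begin{align}
l(\theta|\vec{U}) &\triangleq \log \prod_{n=1}^{N} f_U(U_{1n}, U_{2n}, \ldots, U_{Ln}|\theta) \tag{16} \\
&= \sum_{n=1}^{N} \log \int_{\{I_1(y_1) = U_{1n}, \ldots, I_L(y_L) = U_{Ln}\}} p(y_1, \ldots, y_L|\theta) \, dy_1 \cdots dy_L \notag \\
&= \sum_{n=1}^{N} \log \int_{\{I_1(y_1) = I_1(Y_{1n}), \ldots, I_L(y_L) = I_L(Y_{Ln})\}} p(y_1, \ldots, y_L|\theta) \, dy_1 \cdots dy_L \tag{17} \\
&= l(\theta|\vec{Y}). \tag{18}
\end{align}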
Therefore, MLE with quantized data for the family $\{p(y_1, \ldots, y_L|\theta)\}_{\theta \in \Theta \subseteq \mathbb{R}^k}$ is equivalent to MLE for the family $\{f_U(u_1, u_2, \ldots, u_L|\theta)\}_{\theta \in \Theta \subseteq \mathbb{R}^k}$. Consequently, if MLE for the latter is an asymptotically efficient estimator, then MLE for the former is also asymptotically efficient. Thus, we only need to provide conditions on $p(y_1, y_2, \ldots, y_L|\theta)$ and the sensor quantizers $I_1(y_1), \ldots, I_L(y_L)$ such that the classical regularity conditions for $f_U(u_1, u_2, \ldots, u_L|\theta)$ are satisfied; the asymptotic efficiency of MLE with quantized data is then guaranteed.
Moreover, by (16) and (13), the log likelihood function is also equal to
$$l(\theta|\vec{U}) = \sum_{n=1}^{N} \log f_U(U_{1n}, U_{2n}, \ldots, U_{Ln}|\theta) = \sum_{j=1}^{2^r} n_j \log f_U(\vec{u}_j|\theta) \tag{19}$$
where $n_j = \#\{(U_{1n}, U_{2n}, \ldots, U_{Ln}) = \vec{u}_j \in S_u, \; n = 1, \ldots, N\}$, $\sum_{j=1}^{2^r} n_j = N$, and $\#\{\cdot\}$ is the cardinality of the set. The parameter vector $\theta$ is estimated by maximizing the log likelihood function (19) or, equivalently, by solving the equation
$$\frac{\partial}{\partial \theta} l(\theta|\vec{U}) = 0. \tag{20}$$
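Equation (19) says that the quantized log likelihood depends on the data only through the cell counts $n_j$. The following is a minimal Python sketch of that bookkeeping, assuming the quantized samples are stored as an N × r array of bits and that the cell probabilities $f_U(\vec{u}_j|\theta)$ are supplied as a vector of length $2^r$. The function names, the data layout and the toy numbers are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def cell_counts(bits):
    """Count how many samples fall in each of the 2**r quantization cells.

    bits: (N, r) integer array of 0/1 quantizer outputs, one row per sample.
    Returns a length-2**r array n with n[j] = number of rows whose bits,
    read as a binary number (first column most significant), equal j.
    """
    bits = np.asarray(bits, dtype=np.int64)
    _, r = bits.shape
    weights = 2 ** np.arange(r - 1, -1, -1)      # [2**(r-1), ..., 2, 1]
    cell_index = bits @ weights                   # integer label of each sample's cell
    return np.bincount(cell_index, minlength=2 ** r)

def quantized_log_likelihood(counts, pmf):
    """Evaluate sum_j n_j * log f_U(u_j | theta), as in (19).

    pmf: length-2**r array of cell probabilities for one candidate theta.
    Cells with zero count are skipped so that 0 * log(0) is treated as 0.
    """
    counts = np.asarray(counts)
    pmf = np.asarray(pmf, dtype=float)
    occupied = counts > 0
    return float(np.sum(counts[occupied] * np.log(pmf[occupied])))

# Toy data (hypothetical): N = 5 samples, r = 2 bits, cells 00, 01, 10, 11.
bits = np.array([[0, 0], [0, 1], [1, 1], [0, 1], [1, 1]])
counts = cell_counts(bits)
loglik = quantized_log_likelihood(counts, [0.2, 0.3, 0.1, 0.4])
```

In an actual estimator the `pmf` vector would be recomputed from (14) for every candidate $\theta$, and the maximizer of `quantized_log_likelihood` over $\theta$ would play the role of the solution of (20).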
Based on the classical asymptotic properties of MLE (see, e.g., [28, 29]), we have the following two lemmas.
Lemma 3.1. Assume that $p(y_1, y_2, \ldots, y_L|\theta)$ and the sensor quantizers $I_1(y_1), \ldots, I_L(y_L)$ generate the quantized samples and that $f_U(u_1, u_2, \ldots, u_L|\theta)$ satisfies the regularity conditions:
(C1) $I_i(Y_{i1}), \ldots, I_i(Y_{iN})$ are i.i.d., $i = 1, \ldots, L$;
(C2) The model parameter is identifiable:
$$\theta \neq \theta' \Leftrightarrow f_U(u_1, u_2, \ldots, u_L|\theta) \neq f_U(u_1, u_2, \ldots, u_L|\theta'); \tag{21}$$
(C3) $f_U(u_1, u_2, \ldots, u_L|\theta)$ is differentiable with respect to the parameter vector $\theta$;
(C4) The parameter space contains an open set of which the true parameter $\theta^*$ is an interior point.
Then, Equation (20) has a solution, denoted by $\hat{\theta}$, which is a consistent estimator of $\theta^*$, i.e., $\hat{\theta} \to \theta^*$, a.s.
The next three conditions and (C1)–(C4) are sufficient to guarantee asymptotic normality and efficiency of MLE.
Lemma 3.2. Assume that $p(y_1, y_2, \ldots, y_L|\theta)$ and the sensor quantizers $I_1(y_1), \ldots, I_L(y_L)$ generate the quantized samples and that $f_U(u_1, u_2, \ldots, u_L|\theta)$ satisfies the regularity conditions (C1)–(C4) and
(C5) For every $(u_1, \ldots, u_L) \in S_u$, $f_U(u_1, u_2, \ldots, u_L|\theta)$ is three times differentiable with respect to the parameter vector $\theta$, and $\int f_U(u_1, u_2, \ldots, u_L|\theta) \, du_1 \cdots du_L$ can be differentiated three times under the integral sign;
(C6) For any $\theta^*$ in the parameter space, there exist a positive number $\epsilon$ and a function $M(u_1, \ldots, u_L)$ such that
$$\left| \frac{\partial^3}{\partial \theta_i \partial \theta_j \partial \theta_t} \log f_U(u_1, u_2, \ldots, u_L|\theta) \right| \leq M(u_1, \ldots, u_L) \quad \text{for all } (u_1, \ldots, u_L) \in S_u, \; i, j, t = 1, \ldots, k, \tag{22}$$
for $\theta$ in an $\epsilon$-neighborhood of $\theta^*$, with $E_{\theta^*}[M(U_1, \ldots, U_L)] < \infty$;
(C7) The Fisher information matrix exists and is nonsingular:
$$I(\theta^*, I(\cdot|r)) = E_{\theta^*}\left[ \frac{\partial \log f_U(U_1, U_2, \ldots, U_L|\theta)}{\partial \theta} \left( \frac{\partial \log f_U(U_1, U_2, \ldots, U_L|\theta)}{\partial \theta} \right)' \right] \succ 0. \tag{23}$$
Then,
$$\sqrt{N}(\hat{\theta} - \theta^*) \longrightarrow N(0, I^{-1}(\theta^*, I(\cdot|r))) \tag{24}$$
where $I^{-1}(\theta^*, I(\cdot|r))$ is the Cramér-Rao lower bound for one quantized sample, which depends on the form and the number of bits of the quantizer $I(y|r)$. That is, $\hat{\theta}$ is a consistent and asymptotically efficient estimator of $\theta^*$.
Remark 3.3. Note that the identifiability of $f_U(u_1, u_2, \ldots, u_L|\theta)$ implies that $p(y_1, y_2, \ldots, y_L|\theta)$ is identifiable. If not, there are $\theta \neq \theta'$ such that $p(y_1, y_2, \ldots, y_L|\theta) = p(y_1, y_2, \ldots, y_L|\theta')$ so that, by (12) and (14), $f_U(u_1, u_2, \ldots, u_L|\theta) = f_U(u_1, u_2, \ldots, u_L|\theta')$, which yields a contradiction. Conversely, the identifiability of $p(y_1, y_2, \ldots, y_L|\theta)$ does not imply that $f_U(u_1, u_2, \ldots, u_L|\theta)$ is identifiable. For example, let
$$(Y_1, Y_2) \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & \theta \\ \theta & 1 \end{bmatrix} \right), \quad |\theta| < 1$$
and
$$I_1(y_1) = I_2(y_2) = I(x) = \begin{cases} 1, & \text{if } x \geq 0, \\ 0, & \text{if } x < 0. \end{cases}$$
We know that the normal pdf $p(y_1, y_2|\theta)$ is identifiable. However, by (12), the pmf of $U = (I_1(Y_1), I_2(Y_2))$ is
$$f_U(u_1, u_2|\theta) \equiv 1/4, \quad \text{for all } |\theta| < 1,$$
which is not identifiable. In this case, we would not be able to distinguish between two parameters by MLE even with an infinite amount of data. For the above example, however, it is easy to make $f_U(u_1, u_2|\theta)$ identifiable by choosing asymmetric $I_1(y_1)$ and $I_2(y_2)$.
3.2 MLE with multiple bit quantized data and Cramér-Rao lower bound
In this subsection, the relationship between the Cramér-Rao lower bound (i.e., the asymptotic variance of MLE with quantized data) and the number of bits for quantization is analytically determined. We first derive a useful lemma.
Lemma 3.4. If $b = [b_1, b_2, \ldots, b_k]'$, $c = [c_1, c_2, \ldots, c_k]'$, $a = b + c$ and $g = d + e$, where d and e are positive constants, then
$$\frac{aa'}{g} \preceq \frac{bb'}{d} + \frac{cc'}{e} \tag{25}$$
where the equality is attained at $be = cd$.
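Proof. Since $(cd - be)(cd - be)'$ is a non-negative definite matrix, we have
$$\begin{aligned}
& (cd - be)(cd - be)' \succeq 0 \\
\Leftrightarrow \; & cc'd^2 + bb'e^2 \succeq cb'de + bc'de \\
\Leftrightarrow \; & cc'd^2 + bb'e^2 + bb'de + cc'de \succeq cb'de + bc'de + bb'de + cc'de \\
\Leftrightarrow \; & (bb'e + cc'd)(d + e) \succeq (b + c)(b + c)'de \\
\Leftrightarrow \; & \frac{aa'}{d + e} \preceq \frac{bb'}{d} + \frac{cc'}{e}.
\end{aligned}$$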
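The matrix inequality of Lemma 3.4 is easy to sanity-check numerically. The sketch below (an illustration only, with randomly drawn vectors and constants that are not taken from the paper) forms the gap $bb'/d + cc'/e - aa'/g$ and checks that its smallest eigenvalue is non-negative up to rounding. It also checks the case $b/d = c/e$, for which the first line of the proof holds with equality.

```python
import numpy as np

def lemma_gap(b, c, d, e):
    """Return bb'/d + cc'/e - aa'/g with a = b + c and g = d + e."""
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    c = np.asarray(c, dtype=float).reshape(-1, 1)
    a, g = b + c, d + e
    return b @ b.T / d + c @ c.T / e - a @ a.T / g

rng = np.random.default_rng(0)
min_eigs = []
for _ in range(1000):
    k = int(rng.integers(1, 6))                  # dimension of the vectors
    b, c = rng.normal(size=k), rng.normal(size=k)
    d, e = rng.uniform(0.1, 5.0, size=2)         # positive constants
    min_eigs.append(np.linalg.eigvalsh(lemma_gap(b, c, d, e)).min())
worst = min(min_eigs)                            # non-negative up to rounding error

# Equality case: b = 2v, c = 3v, d = 2, e = 3 gives b/d = c/e = v.
v = np.array([1.0, -2.0, 0.5])
gap_eq = lemma_gap(2.0 * v, 3.0 * v, 2.0, 3.0)
```

The gap equals $(cd - be)(cd - be)'/(de(d + e))$, a rank-one non-negative definite matrix, which is why a single eigenvalue check per draw suffices.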
Theorem 3.5. If we denote the Fisher information matrix with one raw measurement by $I(\theta)$, then the relationship between the number of bits and the Cramér-Rao lower bound (the asymptotic variance of MLE with quantized data) satisfies
$$I^{-1}(\theta^*, I(\cdot|r)) \succeq I^{-1}(\theta^*, I(\cdot|r + 1)), \tag{26}$$
and
$$\lim_{r \to \infty} I^{-1}(\theta^*, I(\cdot|r)) = I(\theta)^{-1}, \tag{27}$$
where $r \to \infty$ means $r_i \to \infty$, $i = 1, \ldots, L$. The inequality in terms of Fisher information matrices can be obtained from (26) by taking the inverses and flipping the direction of the inequality.
Proof. We denote the r-bit and (r + 1)-bit quantized observation populations by $U_r$ and $U_{r+1}$, respectively. The pmfs of $U_r$ and $U_{r+1}$ are defined in a similar manner to (12) and are denoted by $f_{U_r}(u_1, \ldots, u_r|\theta)$ and $f_{U_{r+1}}(u_1, \ldots, u_r, u_{r+1}|\theta)$, respectively, where $u_i = 0/1$, $i = 1, \ldots, r + 1$. Thus, we have
$$f_{U_r}(u_1, \ldots, u_r|\theta) = f_{U_{r+1}}(u_1, \ldots, u_r, 0|\theta) + f_{U_{r+1}}(u_1, \ldots, u_r, 1|\theta). \tag{28}$$
Moreover,
$$\frac{\partial}{\partial \theta_i} f_{U_r}(u_1, \ldots, u_r|\theta) = \frac{\partial}{\partial \theta_i} f_{U_{r+1}}(u_1, \ldots, u_r, 0|\theta) + \frac{\partial}{\partial \theta_i} f_{U_{r+1}}(u_1, \ldots, u_r, 1|\theta), \tag{29}$$
for $i = 1, \ldots, k$, where k is the dimension of the parameter vector $\theta$.
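The Fisher information matrices for $U_r$ and $U_{r+1}$ can be derived respectively as follows:
\begin{align}
I(\theta^*, I(\cdot|r)) &= E\left[ \left( \frac{\frac{\partial}{\partial \theta_1} f_{U_r}(U_r)}{f_{U_r}(U_r)}, \ldots, \frac{\frac{\partial}{\partial \theta_k} f_{U_r}(U_r)}{f_{U_r}(U_r)} \right)' \left( \frac{\frac{\partial}{\partial \theta_1} f_{U_r}(U_r)}{f_{U_r}(U_r)}, \ldots, \frac{\frac{\partial}{\partial \theta_k} f_{U_r}(U_r)}{f_{U_r}(U_r)} \right) \right] \notag \\
&= \sum_{u_i = 0/1, \; i = 1, \ldots, r} \frac{a_{u_1, \ldots, u_r} \, a_{u_1, \ldots, u_r}'}{g_{u_1, \ldots, u_r}}, \tag{30}
\end{align}
where
$$a_{u_1, \ldots, u_r} \triangleq \left( \frac{\partial}{\partial \theta_1} f_{U_r}(u_1, \ldots, u_r|\theta), \ldots, \frac{\partial}{\partial \theta_k} f_{U_r}(u_1, \ldots, u_r|\theta) \right)', \tag{31}$$
$$g_{u_1, \ldots, u_r} \triangleq f_{U_r}(u_1, \ldots, u_r|\theta); \tag{32}$$
\begin{align}
I(\theta^*, I(\cdot|r + 1)) &= E\left[ \left( \frac{\frac{\partial}{\partial \theta_1} f_{U_{r+1}}(U_{r+1})}{f_{U_{r+1}}(U_{r+1})}, \ldots, \frac{\frac{\partial}{\partial \theta_k} f_{U_{r+1}}(U_{r+1})}{f_{U_{r+1}}(U_{r+1})} \right)' \left( \frac{\frac{\partial}{\partial \theta_1} f_{U_{r+1}}(U_{r+1})}{f_{U_{r+1}}(U_{r+1})}, \ldots, \frac{\frac{\partial}{\partial \theta_k} f_{U_{r+1}}(U_{r+1})}{f_{U_{r+1}}(U_{r+1})} \right) \right] \notag \\
&= \sum_{u_i = 0/1, \; i = 1, \ldots, r} \left( \frac{b_{u_1, \ldots, u_r, 0} \, b_{u_1, \ldots, u_r, 0}'}{d_{u_1, \ldots, u_r, 0}} + \frac{c_{u_1, \ldots, u_r, 1} \, c_{u_1, \ldots, u_r, 1}'}{e_{u_1, \ldots, u_r, 1}} \right), \tag{33}
\end{align}
where
$$b_{u_1, \ldots, u_r, 0} \triangleq \left( \frac{\partial}{\partial \theta_1} f_{U_{r+1}}(u_1, \ldots, u_r, 0|\theta), \ldots, \frac{\partial}{\partial \theta_k} f_{U_{r+1}}(u_1, \ldots, u_r, 0|\theta) \right)', \tag{34}$$
$$c_{u_1, \ldots, u_r, 1} \triangleq \left( \frac{\partial}{\partial \theta_1} f_{U_{r+1}}(u_1, \ldots, u_r, 1|\theta), \ldots, \frac{\partial}{\partial \theta_k} f_{U_{r+1}}(u_1, \ldots, u_r, 1|\theta) \right)', \tag{35}$$
$$d_{u_1, \ldots, u_r, 0} \triangleq f_{U_{r+1}}(u_1, \ldots, u_r, 0|\theta), \quad e_{u_1, \ldots, u_r, 1} \triangleq f_{U_{r+1}}(u_1, \ldots, u_r, 1|\theta). \tag{36}$$
By Lemma 3.4 and Equations (28)-(36), we have
$$I(\theta^*, I(\cdot|r)) \preceq I(\theta^*, I(\cdot|r + 1)).$$
Thus,
$$I^{-1}(\theta^*, I(\cdot|r)) \succeq I^{-1}(\theta^*, I(\cdot|r + 1)).$$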
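Furthermore, quantization using r bits means that the measurement space is divided into $2^r$ regions. When r is large enough, for notational simplicity, we consider $\mathbb{R}^2$ to be divided into $2^r$ regions: $\Delta_1 = (-\infty, -M_r] \times \mathbb{R}$, $\Delta_2 = (M_r, +\infty) \times \mathbb{R}$, $\Delta_3 = (-M_r, M_r] \times (-\infty, -M_r]$, $\Delta_4 = (-M_r, M_r] \times (M_r, +\infty)$, and the square $(-M_r, M_r] \times (-M_r, M_r]$ is further divided into $2^r - 4$ small regions $\Delta_i$, $i = 5, 6, \ldots, 2^r$, where $M_r$ is large enough. By the definition of $I(\theta^*, I(\cdot|r))$ in (23), the mean value theorem for integration and the existence of $I^{-1}(\theta)$, we have
\begin{align*}
\lim_{r \to \infty} I(\theta^*, I(\cdot|r)) &= \lim_{r \to \infty} \sum_{i=1}^{2^r} \frac{1}{\int_{\Delta_i} p(y|\theta) dy} \left( \frac{\partial}{\partial \theta_1} \int_{\Delta_i} p(y|\theta) dy, \ldots, \frac{\partial}{\partial \theta_k} \int_{\Delta_i} p(y|\theta) dy \right)' \left( \frac{\partial}{\partial \theta_1} \int_{\Delta_i} p(y|\theta) dy, \ldots, \frac{\partial}{\partial \theta_k} \int_{\Delta_i} p(y|\theta) dy \right) \\
&= \lim_{r \to \infty} \sum_{i=1}^{2^r} \frac{1}{\int_{\Delta_i} p(y|\theta) dy} \left( \int_{\Delta_i} \frac{\partial}{\partial \theta_1} p(y|\theta) dy, \ldots, \int_{\Delta_i} \frac{\partial}{\partial \theta_k} p(y|\theta) dy \right)' \left( \int_{\Delta_i} \frac{\partial}{\partial \theta_1} p(y|\theta) dy, \ldots, \int_{\Delta_i} \frac{\partial}{\partial \theta_k} p(y|\theta) dy \right) \\
&= \lim_{r \to \infty} \sum_{i=5}^{2^r} \frac{1}{p(y_i|\theta) \Delta_i} \left( \frac{\partial}{\partial \theta_1} p(y_i|\theta) \Delta_i, \ldots, \frac{\partial}{\partial \theta_k} p(y_i|\theta) \Delta_i \right)' \left( \frac{\partial}{\partial \theta_1} p(y_i|\theta) \Delta_i, \ldots, \frac{\partial}{\partial \theta_k} p(y_i|\theta) \Delta_i \right) + 4 o(\epsilon_r) \\
&= \lim_{r \to \infty} \sum_{i=5}^{2^r} \frac{1}{p(y_i|\theta)} \left( \frac{\partial}{\partial \theta_1} p(y_i|\theta), \ldots, \frac{\partial}{\partial \theta_k} p(y_i|\theta) \right)' \left( \frac{\partial}{\partial \theta_1} p(y_i|\theta), \ldots, \frac{\partial}{\partial \theta_k} p(y_i|\theta) \right) \Delta_i + 4 o(\epsilon_r) \\
&= \int_{\mathbb{R}^2} \frac{1}{p(y|\theta)} \left( \frac{\partial}{\partial \theta_1} p(y|\theta), \ldots, \frac{\partial}{\partial \theta_k} p(y|\theta) \right)' \left( \frac{\partial}{\partial \theta_1} p(y|\theta), \ldots, \frac{\partial}{\partial \theta_k} p(y|\theta) \right) dy \\
&= I(\theta),
\end{align*}
where, in the third and fourth lines, $y_i \in \Delta_i$, $\epsilon_r \to 0$, and $\Delta_i$ also denotes the area of the region $\Delta_i$. For the high-dimensional case $\mathbb{R}^m$, $m > 2$, the proof is similar. Therefore, we have the theorem.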
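To make the monotonicity and the limit in Theorem 3.5 concrete, the following Python sketch evaluates the quantized Fisher information for a deliberately simple case that is not the paper's setting: a single sensor observing $Y \sim N(\theta, 1)$ with scalar $\theta$, quantized by nested uniform thresholds on an assumed range $(-4, 4)$. The raw-data Fisher information is 1 in this model, so the quantized values should increase with r and stay below 1. All names and the choice of thresholds are assumptions made for illustration.

```python
import math

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def quantized_fisher_info(theta, thresholds):
    """Fisher information about theta in one quantized sample of Y ~ N(theta, 1).

    The cells are the intervals between consecutive sorted thresholds, and the
    information is sum_j (d p_j / d theta)**2 / p_j, the scalar form of (30).
    """
    edges = [-math.inf] + sorted(thresholds) + [math.inf]
    info = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        p = std_normal_cdf(hi - theta) - std_normal_cdf(lo - theta)
        dp = std_normal_pdf(lo - theta) - std_normal_pdf(hi - theta)
        if p > 0.0:
            info += dp * dp / p
    return info

def nested_thresholds(r, half_width=4.0):
    """2**r - 1 equally spaced thresholds in (-half_width, half_width).

    Each extra bit halves the spacing, so every r-bit cell is split by the
    (r+1)-bit quantizer, matching the refinement used in the proof.
    """
    cells = 2 ** r
    step = 2.0 * half_width / cells
    return [-half_width + k * step for k in range(1, cells)]

theta = 0.3
infos = [quantized_fisher_info(theta, nested_thresholds(r)) for r in range(1, 9)]
one_bit_at_mean = quantized_fisher_info(0.0, [0.0])   # equals 2 / pi
```

The list `infos` is non-decreasing in r, in line with (26). Because the assumed thresholds never refine the two tails beyond ±4, the values approach the raw-data information 1 closely but not exactly; letting the range grow with r, as $M_r$ does in the proof, removes that residual gap.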
3.3 Robust maximum likelihood estimation with quantized data
When the number of bits $r_i$ for each sensor is given, a natural question is how the quantizers $I(\cdot|r)$ should be designed such that the asymptotic variance $I^{-1}(\theta^*, I(\cdot|r))$ of MLE with quantized data is as small as possible. The true parameter $\theta^*$, however, is not known, i.e., the pdf is not known. To the best of our knowledge, most of the existing work on optimal quantizers depends on the pdf or on signal models. When neither of them is known, the optimal quantizer design cannot be derived in general. In the framework of this paper, we assume neither a Gaussian pdf nor knowledge of measurement models. Thus, it is impossible to derive the optimal quantizer. However, we can choose multiple groups of quantizers to guard against the risk of a poor quantizer. Therefore, we have the following robust MLE design scheme.
For notational simplicity, let $r_i = 1$, $i = 1, \ldots, L$, and $r = L$.
1. Choose J groups of different quantizers (see Remark 3.7 for a discussion on the choice of the J groups of quantizers)
$$I^{(j)}(y) \triangleq (I_1^{(j)}(y_1), \ldots, I_L^{(j)}(y_L))' \in \mathbb{R}^L, \quad j = 1, \ldots, J, \tag{37}$$
where
$$I_i^{(j)}(y_i) : y_i \in \mathbb{R}^{k_i} \to \{0, 1\}, \quad i = 1, \ldots, L. \tag{38}$$
2. Observe $N_j$ joint observation samples $\{(Y_{1n_j}, \ldots, Y_{Ln_j})\}_{n_j=1}^{N_j}$ which are quantized by the j-th group of quantizers for $j = 1, \ldots, J$. We denote $N \triangleq \sum_{j=1}^{J} N_j$. The quantized observation samples $\{(I_1^{(j)}(Y_{1n_j}), \ldots, I_L^{(j)}(Y_{Ln_j}))\}_{n_j=1}^{N_j}$ are denoted by $\{(U_{1n_j}^{(j)}, \ldots, U_{Ln_j}^{(j)})\}_{n_j=1}^{N_j}$. Moreover, we denote $\vec{U}^{(j)} \triangleq \{(U_{1n_j}^{(j)}, \ldots, U_{Ln_j}^{(j)})\}_{n_j=1}^{N_j}$. The corresponding quantized observation population based on $I^{(j)}(y)$ is denoted by $U^{(j)}$, whose pmf is
$$f_{U^{(j)}}(u_1, u_2, \ldots, u_L|\theta), \quad j = 1, \ldots, J, \tag{39}$$
which can be obtained similarly to (12).
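3. Estimate the parameter $\theta$ with the J groups of quantized samples by maximizing the log likelihood function:
\begin{align}
l(\theta|\vec{U}^{(1)}, \ldots, \vec{U}^{(J)}) &= \log \prod_{j=1}^{J} \prod_{n_j=1}^{N_j} f_{U^{(j)}}(U_{1n_j}^{(j)}, U_{2n_j}^{(j)}, \ldots, U_{Ln_j}^{(j)}|\theta) \tag{40} \\
&= \sum_{j=1}^{J} \sum_{n_j=1}^{N_j} \log f_{U^{(j)}}(U_{1n_j}^{(j)}, U_{2n_j}^{(j)}, \ldots, U_{Ln_j}^{(j)}|\theta) \notag \\
&= \sum_{j=1}^{J} \sum_{n_j=1}^{N_j} \log \int_{\{I_1^{(j)}(y_1) = U_{1n_j}^{(j)}, \ldots, I_L^{(j)}(y_L) = U_{Ln_j}^{(j)}\}} p(y_1, \ldots, y_L|\theta) \, dy_1 \cdots dy_L \tag{41} \\
&\triangleq \sum_{j=1}^{J} l(\theta|\vec{U}^{(j)}), \tag{42}
\end{align}
where $l(\theta|\vec{U}^{(j)})$ is the log likelihood function of the j-th group of quantized data $\vec{U}^{(j)}$. Equivalently, we solve the equation
$$\frac{\partial}{\partial \theta} l(\theta|\vec{U}^{(1)}, \ldots, \vec{U}^{(J)}) = 0, \tag{43}$$
whose solution is called the robust MLE and is denoted by $\hat{\theta}_R$.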
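The three steps above can be sketched in a few lines of Python for a deliberately reduced case: one sensor ($L = 1$), $Y \sim N(\theta, \sigma^2)$ with known $\sigma$, and J one-bit threshold quantizers. This is an illustrative simplification, not the paper's general multi-sensor setting; the threshold values, sample sizes, function names and the grid search that stands in for solving (43) are all assumptions.

```python
import math
import numpy as np

def upper_prob(theta, threshold, sigma):
    """P(Y >= threshold) for Y ~ N(theta, sigma**2)."""
    return 0.5 * math.erfc((threshold - theta) / (sigma * math.sqrt(2.0)))

def pooled_log_likelihood(theta, thresholds, ones, totals, sigma):
    """Sum over quantizer groups of the one-bit log likelihoods, as in (42)."""
    value = 0.0
    for c, m, n in zip(thresholds, ones, totals):
        # Clamp to keep the logarithms finite for extreme candidate values.
        p = min(max(upper_prob(theta, c, sigma), 1e-300), 1.0 - 1e-16)
        value += m * math.log(p) + (n - m) * math.log1p(-p)
    return value

def robust_mle(thresholds, ones, totals, sigma, grid):
    """Grid-search maximizer of the pooled log likelihood."""
    scores = [pooled_log_likelihood(t, thresholds, ones, totals, sigma) for t in grid]
    return float(grid[int(np.argmax(scores))])

rng = np.random.default_rng(1)
true_theta, sigma = 5.0, 6.0
thresholds = [0.0, 5.0, 10.0, 15.0]      # Step 1: J = 4 one-bit quantizers
totals = [2000] * 4                       # Step 2: N_j samples per quantizer
ones = [int(np.sum(rng.normal(true_theta, sigma, n) >= c))
        for c, n in zip(thresholds, totals)]
grid = np.linspace(-10.0, 20.0, 3001)     # Step 3: candidate values of theta
theta_hat = robust_mle(thresholds, ones, totals, sigma, grid)
```

Each group contributes only its count of ones, so the fusion center needs J pairs `(ones[j], totals[j])` rather than the individual bits. A gradient-based solver could replace the grid search when $\theta$ is a vector.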
Obviously, the N quantized samples are not identically distributed due to the J different quantizers. One may ask whether the new estimator based on the different quantizers is still asymptotically efficient, what its asymptotic variance is, and why it is robust compared to using one group of quantizers. These questions can be answered analytically by the following theorem.
Theorem 3.6. Consider J groups of different sensor quantizers $I^{(j)}(y)$ defined by (37), $j = 1, \ldots, J$. Assume that $p(y_1, y_2, \ldots, y_L|\theta)$ and the quantizers $I^{(j)}(y)$ generate the quantized samples and that the quantized pmf $f_{U^{(j)}}(u_1, u_2, \ldots, u_L|\theta)$ defined by (39) satisfies the regularity conditions (C1)–(C7). Then,
$$\sqrt{N}(\hat{\theta}_R - \theta^*) \longrightarrow N(0, I^{-1}(\theta^*; I^{(1)}(\cdot), \ldots, I^{(J)}(\cdot))) \tag{44}$$
where $N = \sum_{j=1}^{J} N_j$, $N_j \to \infty$, $j = 1, \ldots, J$,
\begin{align}
I^{-1}(\theta^*; I^{(1)}(\cdot), \ldots, I^{(J)}(\cdot)) &\triangleq \left( \sum_{j=1}^{J} \frac{N_j}{N} I(\theta^*; I^{(j)}(\cdot)) \right)^{-1} \tag{45} \\
&= N \left( \sum_{j=1}^{J} N_j I(\theta^*; I^{(j)}(\cdot)) \right)^{-1}. \tag{46}
\end{align}
Here $\left( \sum_{j=1}^{J} N_j I(\theta^*; I^{(j)}(\cdot)) \right)^{-1}$ is the Cramér-Rao lower bound for N quantized samples, where $I(\theta^*; I^{(j)}(\cdot))$ is the Fisher information matrix for one quantized sample of $U^{(j)}$. That is, $\hat{\theta}_R$ is a consistent and asymptotically efficient estimator of $\theta^*$.
Moreover, without loss of generality, we assume that the first group of quantizers corresponds to the worst asymptotic variance
$$I^{-1}(\theta^*; I^{(1)}(\cdot)) \succeq I^{-1}(\theta^*; I^{(j)}(\cdot)), \quad j = 2, \ldots, J, \tag{47}$$
then
$$I^{-1}(\theta^*; I^{(1)}(\cdot), \ldots, I^{(J)}(\cdot)) \preceq I^{-1}(\theta^*; I^{(1)}(\cdot)). \tag{48}$$
That is, $\hat{\theta}_R$ is a robust estimator of $\theta^*$.
Proof. The regularity of $p(y_1, y_2, \ldots, y_L|\theta)$ and the quantizers $I^{(j)}(y)$ ensures that the quantized samples and the corresponding pmf $f_{U^{(j)}}(u_1, u_2, \ldots, u_L|\theta)$ defined by (39) satisfy the regularity conditions (C1)–(C4). It is then easy to prove that Equation (43) has a solution, denoted by $\hat{\theta}_R$, which is a consistent estimator of $\theta^*$, i.e., $\hat{\theta}_R \to \theta^*$, a.s. The proof is similar to that of Lemma 3.1 (see, e.g., [28, 29]). However, the N quantized samples are independent but not identically distributed due to the use of different quantizers.
Thus, to prove the asymptotic normality, we will use the Lyapunov central limit theorem by checking the Lyapunov condition (see, e.g., [29]). In addition, the Cramér-Wold device (see, e.g., [29]) will be used to deal with the high-dimensional estimated parameters.
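First, we expand the first derivative of the log likelihood function (40) around the true value $\theta^*$,
$$\frac{\partial l(\theta|\vec{U}^{(1)}, \ldots, \vec{U}^{(J)})}{\partial \theta} = \left. \frac{\partial l(\theta|\vec{U}^{(1)}, \ldots, \vec{U}^{(J)})}{\partial \theta} \right|_{\theta^*} + \left. \frac{\partial^2 l(\theta|\vec{U}^{(1)}, \ldots, \vec{U}^{(J)})}{\partial \theta^2} \right|_{\theta^*} (\theta - \theta^*) + \cdots \tag{49}$$
where higher-order terms can be ignored under the regularity conditions (see, e.g., [28]). Substituting $\hat{\theta}_R$ for $\theta$ and using (43), we have
$$0 = \left. \frac{\partial l(\theta|\vec{U}^{(1)}, \ldots, \vec{U}^{(J)})}{\partial \theta} \right|_{\hat{\theta}_R} = \left. \frac{\partial l(\theta|\vec{U}^{(1)}, \ldots, \vec{U}^{(J)})}{\partial \theta} \right|_{\theta^*} + \left. \frac{\partial^2 l(\theta|\vec{U}^{(1)}, \ldots, \vec{U}^{(J)})}{\partial \theta^2} \right|_{\theta^*} (\hat{\theta}_R - \theta^*).$$
Thus,
$$\sqrt{N}(\hat{\theta}_R - \theta^*) = -\left[ \frac{1}{N} \left. \frac{\partial^2 l(\theta|\vec{U}^{(1)}, \ldots, \vec{U}^{(J)})}{\partial \theta^2} \right|_{\theta^*} \right]^{-1} \frac{1}{\sqrt{N}} \left. \frac{\partial l(\theta|\vec{U}^{(1)}, \ldots, \vec{U}^{(J)})}{\partial \theta} \right|_{\theta^*}. \tag{50}$$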
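Then, we check the Lyapunov condition. Denote $S_N^2 \triangleq \sum_{j=1}^{J} N_j I(\theta^*, I^{(j)})$, $(S_N^\tau)^2 \triangleq \sum_{j=1}^{J} N_j \tau' I(\theta^*, I^{(j)}) \tau$ for an arbitrary $\tau$, and
$$M_j \triangleq E\left[ \left| \tau' \frac{\frac{\partial}{\partial \theta} f_{U^{(j)}}(U_{1n_j}, U_{2n_j}, \ldots, U_{Ln_j}|\theta)}{f_{U^{(j)}}(U_{1n_j}, U_{2n_j}, \ldots, U_{Ln_j}|\theta)} \right|^3 \right] \tag{51}$$
which exists, since condition (C3) is satisfied and $U^{(j)}$ has a categorical distribution. Moreover, by condition (C5) and (51),
\begin{align*}
& \lim_{N \to \infty} \frac{1}{(S_N^\tau)^3} \sum_{j=1}^{J} \sum_{n_j=1}^{N_j} E\left[ \left| \tau' \frac{\partial}{\partial \theta} \log f_{U^{(j)}}(U_{1n_j}, \ldots, U_{Ln_j}|\theta) - E\left[ \tau' \frac{\partial}{\partial \theta} \log f_{U^{(j)}}(U_{1n_j}, \ldots, U_{Ln_j}|\theta) \right] \right|^3 \right] \\
&= \lim_{N \to \infty} \frac{1}{(S_N^\tau)^3} \sum_{j=1}^{J} \sum_{n_j=1}^{N_j} E\left[ \left| \tau' \frac{\partial}{\partial \theta} \log f_{U^{(j)}}(U_{1n_j}, \ldots, U_{Ln_j}|\theta) - 0 \right|^3 \right] \\
&\leq \lim_{N \to \infty} \frac{1}{(S_N^\tau)^3} \sum_{j=1}^{J} \sum_{n_j=1}^{N_j} M_j \\
&\leq \lim_{N \to \infty} \frac{1}{(S_N^\tau)^3} N \max\{M_1, \ldots, M_J\} \\
&\leq \lim_{N \to \infty} \frac{N \max\{M_1, \ldots, M_J\}}{(N \min\{\tau' I(\theta^*, I^{(1)}) \tau, \ldots, \tau' I(\theta^*, I^{(J)}) \tau\})^{3/2}} \\
&= \lim_{N \to \infty} \frac{1}{\sqrt{N}} \frac{\max\{M_1, \ldots, M_J\}}{(\min\{\tau' I(\theta^*, I^{(1)}) \tau, \ldots, \tau' I(\theta^*, I^{(J)}) \tau\})^{3/2}} = 0.
\end{align*}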
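That is, the Lyapunov condition is satisfied. Thus, by the Lyapunov central limit theorem (see, e.g., [29]), for all $\tau$,
$$\frac{1}{\sqrt{N}} \tau' \left. \frac{\partial l(\theta|\vec{U}^{(1)}, \ldots, \vec{U}^{(J)})}{\partial \theta} \right|_{\theta^*} \to N\left(0, \frac{(S_N^\tau)^2}{N}\right).$$
Moreover, by the Cramér-Wold device (see, e.g., [29]), we have
$$\frac{1}{\sqrt{N}} \left. \frac{\partial l(\theta|\vec{U}^{(1)}, \ldots, \vec{U}^{(J)})}{\partial \theta} \right|_{\theta^*} \to N\left(0, \frac{S_N^2}{N}\right). \tag{52}$$
By application of the weak law of large numbers, we have
$$\frac{1}{N_j} \left. \frac{\partial^2 l(\theta|\vec{U}^{(j)})}{\partial \theta^2} \right|_{\theta^*} \to -I(\theta^*; I^{(j)}(\cdot)), \quad j = 1, \ldots, J, \tag{53}$$
where $l(\theta|\vec{U}^{(j)})$ is defined in (42).
By Slutsky's theorem and (53), we have
$$\frac{1}{N} \left. \frac{\partial^2 l(\theta|\vec{U}^{(1)}, \ldots, \vec{U}^{(J)})}{\partial \theta^2} \right|_{\theta^*} = \sum_{j=1}^{J} \frac{N_j}{N} \frac{1}{N_j} \left. \frac{\partial^2 l(\theta|\vec{U}^{(j)})}{\partial \theta^2} \right|_{\theta^*} \to -\sum_{j=1}^{J} \frac{N_j}{N} I(\theta^*; I^{(j)}(\cdot)) = -\frac{S_N^2}{N}. \tag{54}$$
Based on (50), (52), (54) and Slutsky's theorem, we have
$$\sqrt{N}(\hat{\theta}_R - \theta^*) \longrightarrow N(0, I^{-1}(\theta^*; I^{(1)}(\cdot), \ldots, I^{(J)}(\cdot))) \tag{55}$$
where $I^{-1}(\theta^*; I^{(1)}(\cdot), \ldots, I^{(J)}(\cdot))$ is defined by (45).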
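Furthermore, by (45), (47) and the condition (C7) in (23),
$$I^{-1}(\theta^*; I^{(1)}(\cdot), \ldots, I^{(J)}(\cdot)) = \left( \sum_{j=1}^{J} \frac{N_j}{N} I(\theta^*; I^{(j)}(\cdot)) \right)^{-1} \preceq \left( \sum_{j=1}^{J} \frac{N_j}{N} I(\theta^*; I^{(1)}(\cdot)) \right)^{-1} = I^{-1}(\theta^*; I^{(1)}(\cdot)).$$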
Therefore, we have the theorem.
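As a numerical illustration of (45) and (48), the sketch below evaluates the asymptotic variance of the pooled estimator in a reduced scalar case: one sensor, $Y \sim N(\theta, \sigma^2)$ with known $\sigma$, and four one-bit threshold quantizers. The model, the threshold values and the function names are assumptions chosen for illustration; they are not the bivariate example of Section 4.

```python
import math

def one_bit_fisher_info(theta, threshold, sigma):
    """Fisher information about theta in one bit I[Y >= threshold], Y ~ N(theta, sigma**2)."""
    z = (threshold - theta) / sigma
    p = 0.5 * math.erfc(z / math.sqrt(2.0))                    # P(Y >= threshold)
    density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return density ** 2 / (sigma ** 2 * p * (1.0 - p))

def robust_asymptotic_variance(infos, counts):
    """Inverse of the convex combination sum_j (N_j / N) * I_j, the scalar form of (45)."""
    total = sum(counts)
    mixed_info = sum(n / total * info for n, info in zip(counts, infos))
    return 1.0 / mixed_info

theta, sigma = 5.0, 6.0
thresholds = [0.0, 5.0, 10.0, 15.0]
infos = [one_bit_fisher_info(theta, c, sigma) for c in thresholds]
robust_var = robust_asymptotic_variance(infos, [250, 250, 250, 250])
worst_var = 1.0 / min(infos)      # single quantizer with the least information
best_var = 1.0 / max(infos)       # single quantizer with the most information
```

In this scalar case the pooled variance always lies between `best_var` and `worst_var`, which is the sense in which the scheme trades some optimality for protection against a poorly placed threshold.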
Remark 3.7. In the first step of the robust MLE scheme, how do we choose the J different quantizers?
1. If some prior information is available, for example from previous experience or from feedback, such as the fact that the thresholds should lie in a bounded region, the J groups of different quantizers can be chosen uniformly in that region. If there is no prior information available, then some real-time training samples are required. A basic heuristic criterion is not to let all samples fall in the 0 region or in the 1 region. For example, for the i-th sensor, we can choose J different quantizers by observing $N_{si}$ training samples. First, find the p-quantile¹ and the (1 − p)-quantile of the $N_{si}$ training samples, and then uniformly choose J different thresholds between the p-quantile and the (1 − p)-quantile (assume that p < 1 − p). If $N_{si}$ is large, one can choose p close to 1/2, and vice versa.
2. How should J be determined? There is a tradeoff between robustness, optimality and memory requirements. Theorem 3.6 shows that the asymptotic variance is the inverse of a convex linear combination of the Fisher information matrices based on the J different groups of quantizers. Thus, a larger J means more robustness compared with the worst case. However, it may also mean worse performance, since the best quantizer is averaged with worse ones. In addition, note that the above procedure requires the $N_{si}$ training samples to be stored at the i-th sensor. Thus, $N_{si}$ cannot be too large for sensors with limited memory, and the median of the samples may then be randomly biased with respect to the median of the population. If $N_{si}$ is large, one can choose a single quantizer (J = 1) with one threshold, the sample median, which may be the "best" quantizer. However, if $N_{si}$ is small, the bias may be large and may result in very poor performance, so it is necessary to choose a larger J. Based on experiments that we have run, J ranging between 2 and 10 is likely to yield good results.
¹Here, the p-quantile (0 ≤ p ≤ 1) is the point below which a fraction p of the samples lies. Thus, the 1/2-quantile is the median.
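The quantile-based heuristic of Remark 3.7 can be sketched as follows for a single scalar-valued sensor. The training data, the values of J and p, and the function name are hypothetical; the sketch only shows one way to space J thresholds evenly between the two sample quantiles.

```python
import numpy as np

def choose_thresholds(training_samples, num_quantizers, p):
    """Pick J thresholds evenly spaced between the p- and (1 - p)-quantiles."""
    if not 0.0 <= p <= 0.5:
        raise ValueError("p must lie in [0, 0.5] so that p <= 1 - p")
    lower, upper = np.quantile(training_samples, [p, 1.0 - p])
    return np.linspace(lower, upper, num_quantizers)

rng = np.random.default_rng(2)
training = rng.normal(loc=5.0, scale=6.0, size=50)    # N_si = 50 training samples (assumed)
thresholds = choose_thresholds(training, num_quantizers=4, p=0.25)
```

With few training samples a smaller p spreads the thresholds further apart, which hedges against a biased sample median; with many samples p can move toward 1/2 and the thresholds concentrate near the median.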
4 Numerical Examples
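Let us consider a two-sensor Gaussian example: $(Y_1, Y_2) \sim N(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$ with the joint pdf
$$p(y_1, y_2|\theta) = \frac{1}{2\pi \sqrt{1 - \rho^2} \, \sigma_1 \sigma_2} \exp\left( -\frac{1}{2(1 - \rho^2)} \left[ \left( \frac{y_1 - \mu_1}{\sigma_1} \right)^2 - 2\rho \left( \frac{y_1 - \mu_1}{\sigma_1} \right) \left( \frac{y_2 - \mu_2}{\sigma_2} \right) + \left( \frac{y_2 - \mu_2}{\sigma_2} \right)^2 \right] \right),$$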
where the parameter vector to be estimated is $\theta \triangleq [\theta_0, \theta_1, \theta_2] = [\rho, \mu_1, \mu_2]$. We will assume $\mu_1 = 5$, $\mu_2 = 7$, $\sigma_1 = 6$, $\sigma_2 = 8$ and obtain results for several different values of $\rho = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]$. We will compare the robust MLE with the MLE based on a single quantizer. Assume that the prior information is that the thresholds are in [0, 15]. We uniformly choose the following four groups of different quantizers:
$$\begin{aligned}
I^{(1)}(y) &= (I_1^{(1)}(y_1), I_2^{(1)}(y_2)) = (I[y_1 - 15], I[y_2 - 15]), \\
I^{(2)}(y) &= (I_1^{(2)}(y_1), I_2^{(2)}(y_2)) = (I[y_1 - 10], I[y_2 - 10]), \\
I^{(3)}(y) &= (I_1^{(3)}(y_1), I_2^{(3)}(y_2)) = (I[y_1 - 5], I[y_2 - 5]), \\
I^{(4)}(y) &= (I_1^{(4)}(y_1), I_2^{(4)}(y_2)) = (I[y_1 - 0], I[y_2 - 0]),
\end{aligned}$$
where
$$I[x - c] = \begin{cases} 1, & \text{if } x \geq c, \\ 0, & \text{if } x < c. \end{cases}$$
For the robust MLE, we let $N_1 = N_2 = N_3 = N_4 = N/4$, where N is the number of samples for the MLE with a fixed quantizer $I^{(j)}(y)$, $j = 1, 2, 3, 4$.
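The data-generation side of this experiment can be sketched as below. The parameter values and thresholds are those stated above, with $\rho = 0.5$ and $N = 1200$ picked as one of the listed settings; the random seed, the function names and the cell-labelling convention are assumptions. The sketch stops at the cell counts and does not implement the likelihood maximization itself.

```python
import numpy as np

def sample_bivariate_gaussian(rng, n, mu1, mu2, sigma1, sigma2, rho):
    """Draw n samples of (Y1, Y2) from the bivariate Gaussian model."""
    cov = [[sigma1 ** 2, rho * sigma1 * sigma2],
           [rho * sigma1 * sigma2, sigma2 ** 2]]
    return rng.multivariate_normal([mu1, mu2], cov, size=n)

def quantize(samples, threshold):
    """Apply I[y - c] to each sensor: 1 if y >= c, else 0."""
    return (samples >= threshold).astype(int)

rng = np.random.default_rng(3)
group_thresholds = [15.0, 10.0, 5.0, 0.0]     # quantizer groups j = 1, 2, 3, 4
n_total = 1200
n_per_group = n_total // len(group_thresholds)

cell_counts = {}
for j, c in enumerate(group_thresholds, start=1):
    y = sample_bivariate_gaussian(rng, n_per_group, 5.0, 7.0, 6.0, 8.0, 0.5)
    u = quantize(y, c)                         # shape (N_j, 2), entries in {0, 1}
    labels = 2 * u[:, 0] + u[:, 1]             # cell index for (u1, u2): 00, 01, 10, 11
    cell_counts[j] = np.bincount(labels, minlength=4)
```

Each entry of `cell_counts` holds the four counts that enter the j-th term of the pooled log likelihood (42); the corresponding cell probabilities are bivariate Gaussian orthant probabilities that depend on $(\rho, \mu_1, \mu_2)$.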
In Figs 2–4, theoretical CRLBs with different numbers of measurements are plotted for the parameters $\theta = [0.5, 5, 7]$, respectively. The numbers of samples N = [800, 900, 1000, 1100, 1200] are considered. The corresponding MSEs based on 2000 Monte Carlo runs are plotted in Figs 5–7, respectively. In Fig. 8, theoretical CRLBs with different values of $\rho$ and N = 1200 are plotted for the parameters $\theta_0 = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]$, $\theta_1 = 5$ and $\theta_2 = 7$. The corresponding MSEs based on 2000 Monte Carlo runs are plotted for $\theta_0$, $\theta_1$, $\theta_2$ in Fig. 9.
From Figs 2–9, we have the following observations:
1. Both theoretical CRLBs and MSEs based on 2000 Monte Carlo runs for robust MLE are much smaller than those of the MLE based on the single quantizer that is worst in the group. This phenomenon is consistent with the analytical result in Theorem 3.6. Robust MLE is a kind of conservative estimate,