Bounds for the approximation of Poisson binomial distribution by Poisson distribution

(1)

R E S E A R C H

Open Access

Bounds for the approximation of

Poisson-binomial distribution by Poisson

distribution

Tran Loc Hung

*

_{and Vu Thi Thao}

*_{Correspondence:}

[email protected]

Faculty of Basic Science, University of Finance & Marketing (UFM), 306 Nguyen Trong Tuyen St., Tan Binh Dist., Ho Chi Minh City, Vietnam

Abstract

Let (Xnk,k= 1, 2,. . .,n;n= 1, 2,. . .) be a row-wise triangular array of independent

Bernoulli random variables with success probabilities

P(Xnk= 1) = 1 –P(Xnk= 0) =pnk∈[0, 1],k= 1, 2,. . .,n;n= 1, 2,. . .. For everyn= 1, 2,. . .,

the random variablesSn=

n

k=1Xnkhave probability distributions with complicated

structure and therefore they are used to being approximated by Poisson distribution. Well-known Le Cam’s inequality is established for providing information on the quality of a Poisson approximation. The main aim of this paper is to re-establish the Le Cam-type inequalities via a linear operator. The operator method used in this paper is quite elementary and it also could be applied for the probability distributions of random sumsSNn=

Nn

k=1Xnkin the Poisson approximation, whereNn,n= 1, 2,. . ., are positive integer-valued random variables, independent of allXnk,k= 1, 2,. . .,n; n= 1, 2,. . ..

MSC: 60F05; 60G50; 41A36

Keywords: Poisson distribution; Poisson-binomial random variable; Le Cam’s inequality; linear operator; random sum; convergence in distribution; convergence in probability

1 Introduction

Throughout this paper, let (Xnk, k= , , . . . ,n; n= , , . . .) be a row-wise triangular ar-ray of independent Bernoulli random variables with success probabilitiesP(Xnk = ) =  –P(Xnk= ) =pnk∈[, ],k= , , . . . ,n;n= , , . . . . The random variablesSn=nk=Xnk, n= , , . . . , are often called the Poisson-binomial random variables. And it is easily seen that the mean, variance, and characteristic function ofSn,n= , , . . . , areE(Sn) = n

k=pnk,D(Sn) = n

k=pnk( –pnk), andfSn(t) =E(eitSn) =

n

k=( –pnk+pnkeit), respec-tively.

The probability distributions ofSn,n= , , . . . , have many applications in various areas of mathematics and statistics such as reliability, survival analysis, survey sampling, econo-metrics, and so on (the reader is referred to [, ] and [] for full development). However, since the probability distributions ofSn,n≥, have the complicated structure (see, for instance, []), they are used to being approximated by the distribution of Poisson random variablesZλnwith a positive parameterλn=E(Sn) =

n

k=pnk. More speciﬁcally, assume

(2)

that

lim

n→∞λn=λ ( <λ< +∞), ()

then

Sn d –

→Zλ, asn→ ∞, ()

where, and from now on, the notation→–d means the convergence in distribution (see, for instance, []). Moreover, remarkable Le Cam’s inequality for the Poisson-binomial distri-bution [] is widely considered in literature as follows:

∞

k=

P(Sn=k) –P(Zλn=k)≤

n

k=

p_nk ()

(we refer the reader to the results of Le Cam [], Barbour, Holst, and Janson [], Steele [], Chen [], Chen and Liu [], Neammanee [], and Ross [] for more details).

It should be noted that in [, ], and [] various powerful tools (such as the method of matrix analysis, the semi-group method, the coupling method, and the Chen-Stein method) for providing Le Cam’s inequality have been demonstrated. The main objective of this paper is to obtain the bounds for well-known Le Cam’s inequality in () using the op-erator method, introduced by Renyi []. In the third section, we use the opop-erator method from [] to establish the bounds for the approximation of Poisson-binomial distribution by Poisson distribution. The operator method in this paper is quite elementary and it also could be applied for random sumsSNn=

Nn

k=Xnk,S= , whereNn,n= , , . . . are

pos-itive integer-valued random variables, independent of allXnk,k= , , . . . ,n;n= , , . . . . This will be taken up in the last section. We refer the reader to the works of Trotter [], Renyi [], and Hung [] for a deeper discussion of this operator method. Based on the operator method, the received results of this paper are analogues of Le Cam’s inequality in classical literature (we refer the reader to Steele [], Le Cam [], Chen [], Neammanee [], and Wang [] for a complete treatment of the problem).

2 Preliminaries

In the sequel we will need the operator method, which has been used for a long time in various studies of classical limit theorems for sums of independent random variables (see Trotter [], Renyi [], and Hung [] for the complete bibliography).

We recall some deﬁnitions and notations. We denote by K the set of all real-valued bounded functionsf(x), deﬁned on the set of non-negative integersZ+={, , , . . .}. The

norm of a functionf ∈Kis deﬁned byf=sup_x_∈_Z +|f(x)|.

Deﬁnition . We deﬁne a linear operator associated with a positive discrete random variableX,AX:K→K, by setting

(AXf)(x) :=E

f(X+x)=

∞

k=

(3)

It is to be noticed that the linear operator deﬁned in () is actually a discrete form of Trotter’s operator (we refer the readers to Trotter [], Renyi [], and Hung [] for a more general and detailed discussion of this operator method).

We will need some properties of the operator in () in the sequel. LetAX,AYbe operators associated with two discrete random variablesXandYforf,g∈K. Suppose thatαand β are two real numbers, then we easily get the following linear property of the operator

in ():

AX(αf+βg) =αAX(f) +βAX(g).

We deﬁne the operator (AX+AY) by (AX+AY)f =AXf +AYf,∀f ∈K, and the product of two operatorsAXandAY is (AXAY)f=AX(AYf),∀f ∈K.

It is obvious that

.AXf ≤ ffor allf∈K.

.AXf +AYf ≤ AXf+AYffor allf∈K.

. Suppose thatAXandAYare operators associated with two independent random vari-ablesX,Yandf ∈K. ThenAXAYf=AYAXf =AX+Yf.

In fact, for allf∈Kandx∈Z+,

AXAYf(x) =AX

AYf(x)

=AX

_∞

k=

f(x+k)P(Y=k)

=

∞

r,k=

f(x+k+r)P(Y=k)P(X=r)

=

∞

l=

f(x+l)P(X+Y=l)

=AX+Yf(x)

by an argument analogous to that used for the proof ofAYAXf =AX+Yf.

. Suppose thatAX,AX, . . . ,AXnare the operators associated with the independent

ran-dom variablesX,X, . . . ,Xn. ThenASn=AXAX· · ·AXnis the operator associated with the

partial sumSn=X+X+· · ·+Xn.

. Suppose thatAX,AX, . . . ,AXnandAY,AY, . . . ,AYnare operators associated with

in-dependent random variablesX,X, . . . ,XnandY,Y, . . . ,Yn. Moreover, assume that allXi andYjare independent fori,j= , , . . . ,n. Then, for everyf∈K,

An

k=Xkf –Ank=Ykf ≤

n

k=

AXkf–AYkf. ()

Clearly,

AXAX· · ·AXn–AYAY· · ·AYn=

n

k=

(4)

It deduces that

An

k=Xkf –A

n

k=Ykf ≤

n

k=

_A_X

· · ·AXk–(AXk–AYk)AYk+· · ·AYnf

≤

n

k=

AYk+· · ·AYn(AXk–AYk)f

≤

n

k=

AXkf–AYkf.

. It is to be noticed thatAn

Xf –AnYf ≤nAXf –AYf.

. Suppose that X, . . . ,Xn andY, . . . ,Yn are independent random variables (in each group), and let{Nn,n= , , . . .}be a sequence of positive integer-valued random variables independent of allXkandYk,k= , , . . . . Then, for everyf∈K,

A_Nn

k=Xkf –A

_Nn

k=Ykf ≤

∞

n=

P(Nn=n) n

k=

AXkf –AYkf.

Lemma . The equation AXf(x) =AYf(x)for f ∈K,x∈Z+,provided that X and Y are

identically distributed random variables.

LetAX,AX, . . . ,AXn, . . . be a sequence of operators associated with the independent

dis-crete random variablesX,X, . . . ,Xn, . . . , andAXbe the operator associated with the dis-crete random variableX. The following lemma states one of the most important properties of the operatorAX.

Lemma . A suﬃcient condition for a sequence of random variables X,X, . . . ,Xn. . . con-verging in distribution to a random variable X is that

lim

n→∞AXnf–AXf=  for all f ∈K.

Proof Sincelimn→∞AXnf –AXf= , for allf ∈K, we get

lim n→∞

∞

k=

f(x+k)P(Xn=k) –P(X=k)

=  for allf ∈Kandx∈Z+.

If we choose

f(x) = ⎧ ⎨ ⎩

, if ≤x≤t, , ifx>t.

Then

lim n→∞

t

k=

P(Xn=k) –P(X=k)

= .

It follows thatP(Xn≤t) –P(X≤t)→ as n tends to +∞. In other words,Xn

d –

(5)

3 A bound of Poisson-binomial approximation

LetAXnk,k= , . . . ,n;n= , , . . . be the operators associated with the random variablesXnk,

k= , . . . ,n;n= , , . . . , and letAZ_pnk,k= , . . . ,n;n= , , . . . , be the operators associated with the Poisson random variables with parameterspnk,k= , . . . ,n;n= , , . . . . On the assumption thatZλnis a Poisson random variable with a positive parameterλn=

n

k=pnk,

we can perform thatZλn

d

=n_k_=Zpnk, whereZpn,Zpn, . . . ,Zpnn are independent Poisson

random variables with positive parameterspn,pn, . . . ,pnn, and the notation= denotesd

coincidence of distributions. We will now state an analogue of Le Cam’s inequality [] via the linear operator in () as follows.

Theorem . Let(Xnk, ≤k≤n;n= , , . . .)be a row-wise triangular array of indepen-dent,Bernoulli random variables with success probabilities P(Xnk = ) =  –P(Xnk = ) = pnk,pnk∈[, ],k= , , . . . ,n;n= , , . . . .Let us write Sn=

n

k=Xnkandλn=

n

k=pnk.We denote by Zλnthe Poisson random variable with the parameterλn.Then,for all real-valued

bounded functions f ∈K,we have

ASnf–AZλnf ≤f

n

k=

p_nk. ()

Proof Applying the inequality in (), we have

ASnf–AZλnf ≤

n

k=

AXnkf–AZpnkf.

Moreover, for allf ∈Kand for allx∈Z+, we conclude that

AXnkf(x) –AZ_pnkf(x) =

∞

r=

f(x+r)P(Xnk=r) –P(Zpnk=r)

=

∞

r=

f(x+r)

P(Xnk=r) – e–pnk_pr

nk r!

=f(x) –pnk–e–pnk

+f(x+ )pnk–pnke–pnk

–

∞

r=

f(x+r)e

–pnk_pr

nk

r! .

Since∞_r_=e –_pnk_pr

nk

r! =  –e–pnk–pnke–pnk, for allf∈Kandx∈Z+, it may be concluded that

AXnk(f) –AZpnk(f)=

f(x)

 –pnk–e–pnk

+f(x+ )pnk–pnke–pnk

–

∞

r=

f(x+r)e

–pnk_pr

nk r!

≤f(x) –pnk–e–pnk+f(x+ )

pnk–pnke–pnk

+ ∞ r=

f(x+r)e

–pnk_pr

nk r!

(6)

≤ sup x∈Z+

f(x)e–pnk_{–  +}_p

nk+pnk–pnke–pnk+  –e–pnk–pnke–pnk

≤fpnk

 –e–pnk≤__f_p

nk.

Therefore, applying (), we can assert that

ASn(f) –AZλn(f)≤f

n

k=

p_nk.

This completes the proof.

Remark . According to Theorem . and assumption (), using the deﬁnition of the norm of the operatorA, we get following inequality:

ASn–AZλ ≤

_n

k=

p_nk .

The following corollaries are immediate consequences from Theorem ..

Corollary . Under the stated assumptions of Theorem.,for all k= , , , . . . ,

n

j=

p_nj. ()

Proof Choose the particular functionf(x),x∈Z+, such that

f(x+m) = ⎧ ⎨ ⎩

 ifm=k,

 ifm=k.

Sety=x+m. Sincex,m∈Z+_{, it follows that}_y_∈_Z

+. Then we have

f=sup x

f(x)=sup y

f(y)= .

Thus, according to Theorem ., we conclude that

ASn(f) –AZλn(f)≤

n

j=

p_nj. ()

On the other hand, by choosing the functionf(x) as above, we have

ASnf–AZλnf=sup

x

ASnf(x) –AZλnf(x)

=sup x

∞

m=

f(x+m)P(Sn=m) –P(Zλn=m)

=sup x

f(x)P(Sn= ) –P(Zλn= )

(7)

+f(x+k)P(Sn=k) –P(Zλn=k) +· · ·

=P(Sn=k) –P(Zλn=k).

Applying () we can assert that

n

j=

p_nj.

The proof is complete.

Corollary . Let condition()hold.Under the hypotheses of Theorem.,if moreover

lim

n→∞max≤k≤npnk= , ()

then the distribution of Snconverges to the Poisson distribution with meanλ,i.e.,Sn d –

→Zλ

as n→ ∞.

Proof The proof is based on the following observation:

n

k=

p

nk≤_max_≤ k≤npnk×

n

k=

pnk.

According to the inequality in () for allf ∈Kand (), we conclude that lim

n→∞ASn(f) –AZλn(f)= .

As an argument analogous to the one used for the proof of Corollary ., on account of Lemma ., we get

lim n→∞

P(Sn=k) –P(Zλn=k)

= .

Then, on account of (), we have

lim

n→∞P(Sn=k) =nlim→∞

e–λn_(λ_n)k

k! =

e–λ_λk

k! , k= , ,  . . . .

Thus, the proof is straightforward.

4 A bound of random Poisson-binomial approximation

Throughout this section, we begin with assuming thatNn,n= , , . . . , are positive integer-valued random variables independent of allXnk,k= , , . . . ,n;n= , , . . . , which are sup-posed to obey the relation

Nn P –

→+∞ asn→+∞. ()

Here and subsequently,→–P denotes the convergence in probability. For everyn= , , . . . , we denote bySNnthe random sumsSNn=

Nn

(8)

random sumsSNn could be said to be the random Poisson-binomial random variables.

In this section, we establish Le Cam-type inequalities related to the Poisson approxima-tion for distribuapproxima-tions of random Poisson-binomial variables. It is to be noticed that many various results concerning the random summations have already been included in the textbooks of probability theory; see,e.g., [, , ]).

LetAXn,AXn, . . . ,AX_nNn be operators associated with the independent triangular array of random variablesXn,Xn, . . . ,XnNn, and letAZ_pn,AZpn, . . . ,AZpnNn be operators

asso-ciated with the independent Poisson distributed random variables with positive param-eterspn,pn, . . . ,pnNn. According to the properties of the linear operator in (), we have

AS_Nn =AXnAXn· · ·AXnNn andAZλ_Nn =AZ_pnAZpn· · ·AZpnNn are the respective operators

associated with the random sumsSNn=

Nn

k=XnkandZλ_Nn=

Nn

k=Zpnk.

Theorem . Let(Xnk,k= , , . . . ,n;n= , , . . .)be a row-wise triangular array of inde-pendent,non-identically distributed Bernoulli random variables with success probabilities P(Xnk= ) =  –P(Xnk= ) =pnk,pnk∈[, ],k= , , . . . ,n;n= , , . . . .Moreover,we sup-pose that Nn,n= , , . . .are independent positive integer-valued random variables, inde-pendent of all Xnk,k= , , . . . ,n;n= , , . . . .Then,for all real-valued bounded functions f ∈K and for all x∈Z+,we have

AS_Nn(f) –AZλ_Nn(f)≤fE

_N_n

k=

p_nk .

Proof According to the assumptions on the random variablesNn,Xnk,Zλn,k= , , . . . ,n;

n= , , . . . , we can write

AS_Nnf(x) =

∞

m=

P(Nn=m)

∞

k=

f(x+k)P(Sm=k),

and

AZλ_Nnf(x) =

∞

m=

P(Nn=m)

∞

k=

f(x+k)e

–λm_λk m k! .

Therefore, by an argument analogous to that used for the proof of Theorem ., for all real-valued functionf ∈K,x∈Z+, we have

_A_S

Nn(f) –AZλ_Nn(f)=

∞

m=

P(Nn=m)

AXn· · ·AXnm(f) –AZ_pn_· · ·AZpnm(f)

≤

∞

m=

P(Nn=m)AXn· · ·AXnm(f) –AZ_pn· · ·AZpnm(f)

≤

∞

m=

P(Nn=m) m

k=

fp_nk

≤fE

_N_n

k=

p

nk .

(9)

Note that the following remarks are immediate consequences from Theorem ..

Remark . According to Theorem . and assumption (), using the deﬁnition of the norm of the operatorA, we conclude that

AS_Nn–AZλ ≤E

_N_n

k=

p_nk .

Remark . By an argument analogous to that used for the proof of Corollary ., under the stated assumptions of Theorem ., for allk= , , , . . . , we have

P(SNn=k) –P(ZλNn =k)≤E

_N

n

k=

p_nk .

When the success probability is identical,pnk=pn∈[, ],k= , , . . . ,n; forn= , , . . . , we obtain the following remark.

Remark . Suppose that theNn,n= , , . . . are positive integer-valued random variables independent of all independent identically distributed random variablesXnk, and assume thatP(Xnk= ) =  –P(Xnk= ) =pn∈[, ],k= , , . . . ,Nn;n= , , . . . . Then, for allk= , , , . . . , we get the following inequality:

P(SNn=k) –P(Zλ_Nn =k)≤E(Nn)pn.

It is worth noticing that when the positive integer-valued random variables Nn, n= , , . . . take on the valuenwith probability one,i.e.,P(Nn=n) = , the results concern-ing the probability distributions of the random sumsSNnin the Poisson approximation in

this section return to the ones in Section .

We conclude this paper with the following comments. The linear operator in this paper introduced by Renyi [] essentially is a discrete form of Trotter’s operator [] which has been used in the theory of limit theorems. The proofs of theorems in this paper by the operator method are very elementary and elegant. The received results in this article allow us to think about a new approach method to the Poisson approximation problems for the distributions of the sums of the discrete independent random variables like Poisson-binomial, geometric, and negative binomial variables.

Competing interests

The authors declare that they have no completing interests.

Authors’ contributions

All authors read and approved the ﬁnal manuscript.

Acknowledgements

Dedicated to Professor Nguyen Duy Tien on the occasion of his 70th birthday.

The authors wish to express their gratitude to the referees for valuable remarks and comments improving the previous version of this paper. This work is supported by the Vietnam National Foundation For Science and Technology Development (NAFOSTED, Vietnam), grant 101.01-2010.02.

(10)

References

1. Chen, SX, Liu, JS: Statistical applications of the Poisson-binomial and conditional Bernoulli distributions. Stat. Sin.7, 875-892 (1997)

2. Tejada, A, Dekker, AJ: The role of Poisson’s binomial distribution in the analysis of TEM images. Ultramicroscopy111, 1553-1556 (2011)

3. Wang, YH: On the number of successes in independent trials. Stat. Sin.3, 295-312 (1993) 4. Renyi, A: Probability Theory. Akad. Kiadó, Budapest (1970)

5. Le Cam, L: An approximation theorem for the Poisson binomial distribution. Pac. J. Math.10(4), 1181-1197 (1960) 6. Barbour, AD, Holst, L, Janson, S: Poisson Approximation. Clarendon, Oxford (1992)

7. Steele, JM: Le Cam’s inequality and Poisson approximations. Am. Math. Mon.101(1), 48-54 (1994)

8. Chen, LHY: On the convergence of Poisson binomial to Poisson distribution. Ann. Probab.2(1), 178-180 (1974) 9. Neammanee, K: A nonuniform bound for the approximation of Poisson binomial by Poisson distribution. Int. J. Math.

Math. Sci.2003(48), 3041-3046 (2003)

10. Ross, SM: Introduction to Probability Models, 9th edn. Elsevier, Amsterdam (2007) 11. Trotter, HF: An elementary proof of the central limit theorem. Arch. Math.10, 226-234 (1959) 12. Hung, TL: On a probability metric based on Trotter operator. Vietnam J. Math.35(1), 22-33. (2007)

13. Kruglov, VM, Korolev, VY: Limit Theorems for Random Sums. Moscow University Press, Moscow (1990) (in Russian) 14. Gnedenko, BV, Korolev, VY: Random Summation: Limit Theorems and Applications. CRC Press, Boca Raton (1996)

doi:10.1186/1029-242X-2013-30