1. (a) The gamma($\alpha = 2$, $\theta$) population pdf is
$$f_Y(y) = \begin{cases} \dfrac{1}{\theta^2}\, y e^{-y/\theta}, & y > 0 \\ 0, & \text{otherwise.} \end{cases}$$
The joint pdf of the sample $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)$ is
$$f_{\mathbf{Y}}(\mathbf{y}) = f_Y(y_1) \times f_Y(y_2) \times \cdots \times f_Y(y_n) = \frac{1}{\theta^2}\, y_1 e^{-y_1/\theta} \times \frac{1}{\theta^2}\, y_2 e^{-y_2/\theta} \times \cdots \times \frac{1}{\theta^2}\, y_n e^{-y_n/\theta} = \left(\frac{1}{\theta^2}\right)^n \left(\prod_{i=1}^n y_i\right) e^{-\sum_{i=1}^n y_i/\theta}.$$
What is the (sampling) distribution of $T = \sum_{i=1}^n Y_i$? We know the mgf of $T$ and the mgf of $Y \sim \text{gamma}(2, \theta)$ are related via $m_T(t) = [m_Y(t)]^n$. Recall that
$$m_Y(t) = \left(\frac{1}{1 - \theta t}\right)^2, \quad \text{for } t < 1/\theta.$$
Therefore,
$$m_T(t) = [m_Y(t)]^n = \left[\left(\frac{1}{1 - \theta t}\right)^2\right]^n = \left(\frac{1}{1 - \theta t}\right)^{2n}.$$
We recognize $m_T(t)$ as the mgf of a gamma random variable with shape parameter $2n$ and scale parameter $\theta$. Therefore, $T \sim \text{gamma}(2n, \theta)$ and the pdf of $T$ is
$$f_T(t) = \begin{cases} \dfrac{1}{\Gamma(2n)\,\theta^{2n}}\, t^{2n-1} e^{-t/\theta}, & t > 0 \\ 0, & \text{otherwise,} \end{cases}$$
where $t = \sum_{i=1}^n y_i$. Therefore, the conditional pdf of the sample $\mathbf{Y}$, given $T = t$, is
$$f_{\mathbf{Y}|T}(\mathbf{y}|t) = \frac{f_{\mathbf{Y}}(\mathbf{y})}{f_T(t)} = \frac{\left(\dfrac{1}{\theta^2}\right)^n \left(\prod_{i=1}^n y_i\right) e^{-\sum_{i=1}^n y_i/\theta}}{\dfrac{1}{\Gamma(2n)\,\theta^{2n}}\, t^{2n-1} e^{-t/\theta}} = \frac{\Gamma(2n) \prod_{i=1}^n y_i}{t^{2n-1}},$$
which does not depend on $\theta$. Therefore, $T = \sum_{i=1}^n Y_i$ is a sufficient statistic for $\theta$.

(b) We could answer this question by calculating
$$V\left(\frac{S^2}{2}\right) \quad \text{and} \quad V\left(\frac{1}{2}\left(\frac{n}{2n+1}\right)\overline{Y}^2\right)$$
directly. However, both would be very difficult calculations, and comparing them analytically might be difficult as well. Fortunately, we do not have to calculate either one to answer this question. In part (a), we showed $T = \sum_{i=1}^n Y_i$ is a sufficient statistic, and the Rao-Blackwell Theorem assures us the MVUE for $\tau(\theta) = \theta^2$ is a function of $T$. The estimator $\hat{\tau}_1 = S^2/2$ is not a function of $T$ (i.e., if you know $T$, you cannot calculate $S^2/2$). However, the estimator
$$\hat{\tau}_2 = \frac{1}{2}\left(\frac{n}{2n+1}\right)\overline{Y}^2$$
is a function of $T$ because $\overline{Y} = T/n$. Therefore, because $\hat{\tau}_2$ is unbiased, we know that $\hat{\tau}_2$ is the MVUE of $\tau(\theta) = \theta^2$. This means that among all unbiased estimators of $\tau(\theta) = \theta^2$, the estimator $\hat{\tau}_2$ has the smallest possible variance. Therefore, its variance is necessarily smaller than that of $\hat{\tau}_1$. This means
$$\text{eff}(\hat{\tau}_1 \text{ to } \hat{\tau}_2) = \frac{V(\hat{\tau}_2)}{V(\hat{\tau}_1)} < 1.$$
We know $\text{eff}(\hat{\tau}_1 \text{ to } \hat{\tau}_2)$ is strictly less than 1 because the MVUE is unique.
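This comparison can be checked by simulation. The sketch below is illustrative only: the sample size $n = 10$, scale $\theta = 2$, replication count, and seed are all arbitrary choices, not part of the problem. It estimates both variances by Monte Carlo and confirms the efficiency ratio is below one.

```python
import numpy as np

# Monte Carlo sketch (illustrative): for gamma(alpha = 2, theta) data,
# compare the two unbiased estimators of tau(theta) = theta^2:
#   tau1 = S^2 / 2                         (not a function of T)
#   tau2 = (1/2) * (n/(2n+1)) * Ybar^2     (a function of T = sum(Y_i))
rng = np.random.default_rng(0)
n, theta, reps = 10, 2.0, 200_000

y = rng.gamma(shape=2.0, scale=theta, size=(reps, n))
tau1 = y.var(axis=1, ddof=1) / 2
tau2 = 0.5 * (n / (2 * n + 1)) * y.mean(axis=1) ** 2

print(tau1.mean(), tau2.mean())   # both near theta^2 = 4 (unbiased)
eff = tau2.var() / tau1.var()     # eff(tau1 to tau2) = V(tau2) / V(tau1)
print(eff)                        # strictly less than 1
```

The ratio comes out well below one, consistent with $\hat{\tau}_2$ being the MVUE.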
2. (a) The $N(0, \sigma^2)$ population pdf is
$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-y^2/2\sigma^2}, \quad -\infty < y < \infty.$$
The joint pdf of the sample $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)$ is
$$f_{\mathbf{Y}}(\mathbf{y}) = f_Y(y_1) \times f_Y(y_2) \times \cdots \times f_Y(y_n) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-y_1^2/2\sigma^2} \times \cdots \times \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-y_n^2/2\sigma^2} = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n e^{-\sum_{i=1}^n y_i^2/2\sigma^2}.$$
From Chapter 7, we know the sampling distribution of $\overline{Y}$ is $\overline{Y} \sim N(0, \sigma^2/n)$. Therefore, the pdf of $\overline{Y}$ is
$$f_{\overline{Y}}(t) = \frac{1}{\sqrt{2\pi}\,\sigma/\sqrt{n}}\, e^{-t^2/2(\sigma^2/n)}, \quad -\infty < t < \infty,$$
where $t = \overline{y}$. Therefore, the conditional pdf of the sample $\mathbf{Y}$, given $\overline{Y} = t$, is
$$f_{\mathbf{Y}|\overline{Y}}(\mathbf{y}|t) = \frac{f_{\mathbf{Y}}(\mathbf{y})}{f_{\overline{Y}}(t)} = \frac{\left(\dfrac{1}{\sqrt{2\pi}\,\sigma}\right)^n e^{-\sum_{i=1}^n y_i^2/2\sigma^2}}{\dfrac{1}{\sqrt{2\pi}\,\sigma/\sqrt{n}}\, e^{-t^2/2(\sigma^2/n)}} = \frac{1}{\sqrt{n}\,(\sqrt{2\pi}\,\sigma)^{n-1}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^n y_i^2 - nt^2\right)\right],$$
which is not free of $\sigma^2$; i.e., conditioning the sample $\mathbf{Y}$ on $\overline{Y} = t$ does not remove all information about $\sigma^2$. Therefore, the sample mean $\overline{Y}$ is not a sufficient statistic for $\sigma^2$.
(b) Because $\overline{Y} \sim N(0, \sigma^2/n)$, we know $E(\overline{Y}) = 0$ and $V(\overline{Y}) = \sigma^2/n$. Following the hint, we have
$$E(\overline{Y}^2) = V(\overline{Y}) + [E(\overline{Y})]^2 = \frac{\sigma^2}{n} + (0)^2 = \frac{\sigma^2}{n}.$$
Therefore,
$$E(n\overline{Y}^2) = nE(\overline{Y}^2) = n\left(\frac{\sigma^2}{n}\right) = \sigma^2.$$
This shows $n\overline{Y}^2$ is an unbiased estimator of $\sigma^2$.

(c) No, it is not. The Rao-Blackwell Theorem guarantees the MVUE of $\sigma^2$ must be a function of a sufficient statistic. Because $\overline{Y}$ is not sufficient, the estimator $n\overline{Y}^2$ is not the MVUE.
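A simulation makes the point in (c) concrete. In this illustrative sketch (arbitrary choices: $n = 10$, $\sigma = 3$, seed, and replication count), we compare $n\overline{Y}^2$ against another unbiased estimator, $\sum_{i=1}^n Y_i^2/n$; since $E(Y^2) = \sigma^2$ here, the latter is unbiased too, and its much smaller variance shows $n\overline{Y}^2$ cannot be the MVUE.

```python
import numpy as np

# Monte Carlo sketch (illustrative): n*Ybar^2 is unbiased for sigma^2,
# but the unbiased estimator sum(Y_i^2)/n has far smaller variance,
# so n*Ybar^2 cannot be the MVUE of sigma^2.
rng = np.random.default_rng(1)
n, sigma, reps = 10, 3.0, 200_000

y = rng.normal(loc=0.0, scale=sigma, size=(reps, n))
est1 = n * y.mean(axis=1) ** 2   # n * Ybar^2
est2 = (y ** 2).mean(axis=1)     # sum(Y_i^2) / n, also unbiased

print(est1.mean(), est2.mean())  # both near sigma^2 = 9
print(est1.var(), est2.var())    # est1's variance is much larger
```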
3. (a) The likelihood function is
$$L(p|\mathbf{y}) = \prod_{i=1}^n p_Y(y_i) = p^{y_1}(1-p)^{1-y_1} \times p^{y_2}(1-p)^{1-y_2} \times \cdots \times p^{y_n}(1-p)^{1-y_n} = p^{\sum_{i=1}^n y_i}(1-p)^{n - \sum_{i=1}^n y_i}.$$
The log-likelihood function is
$$\ln L(p|\mathbf{y}) = \sum_{i=1}^n y_i \ln p + \left(n - \sum_{i=1}^n y_i\right)\ln(1-p).$$
The derivative of the log-likelihood function is
$$\frac{\partial}{\partial p}\ln L(p|\mathbf{y}) = \frac{\sum_{i=1}^n y_i}{p} - \frac{n - \sum_{i=1}^n y_i}{1-p} \overset{\text{set}}{=} 0$$
$$\Longrightarrow (1-p)\sum_{i=1}^n y_i - p\left(n - \sum_{i=1}^n y_i\right) = 0 \Longrightarrow \sum_{i=1}^n y_i - np = 0 \Longrightarrow \hat{p} = \frac{\sum_{i=1}^n y_i}{n} = \overline{y}.$$
We now show this first-order critical point $\overline{y}$ maximizes $\ln L(p|\mathbf{y})$. The second derivative of the log-likelihood function is
$$\frac{\partial^2}{\partial p^2}\ln L(p|\mathbf{y}) = -\frac{\sum_{i=1}^n y_i}{p^2} - \frac{n - \sum_{i=1}^n y_i}{(1-p)^2} < 0$$
for $0 < p < 1$. Therefore, $\overline{y}$ maximizes $\ln L(p|\mathbf{y})$. The MLE of $p$ from the first strategy is
$$\overline{Y} = \frac{1}{n}\sum_{i=1}^n Y_i.$$
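The maximization can be verified numerically. This sketch (with an arbitrary simulated sample and grid resolution) evaluates the Bernoulli log-likelihood on a grid and confirms the maximizer agrees with $\overline{y}$.

```python
import numpy as np

# Numerical check (illustrative): for Bernoulli data, the log-likelihood
#   sum(y)*ln(p) + (n - sum(y))*ln(1 - p)
# is maximized at p = ybar, matching the calculus argument above.
rng = np.random.default_rng(2)
y = rng.binomial(1, 0.4, size=50)
n, s = y.size, y.sum()

grid = np.linspace(0.001, 0.999, 9999)
loglik = s * np.log(grid) + (n - s) * np.log(1 - grid)
p_hat = grid[np.argmax(loglik)]

print(p_hat, y.mean())  # the grid maximizer agrees with ybar
```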
(b) Define the events $A = \{\text{“yes” response}\}$ and $B = \{\text{student flips H}\}$. Note that $B$ and $\overline{B}$ partition the sample space of possible flips. From the Law of Total Probability, we know
$$\theta = P(X = 1) = P(A) = P(A|B)P(B) + P(A|\overline{B})P(\overline{B}) = p(0.5) + 0.3(0.5) = 0.5p + 0.15.$$
From part (a), we know
$$\overline{X} = \frac{1}{n}\sum_{i=1}^n X_i$$
is the MLE of $\theta$. Because
$$\theta = 0.5p + 0.15 \iff p = \frac{\theta - 0.15}{0.5} = \tau(\theta),$$
the MLE of $p$ under the second strategy is
$$\frac{\overline{X} - 0.15}{0.5},$$
by invariance.
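The invariance result can be sanity-checked by simulating the randomized-response mechanism itself. This sketch is illustrative: the true $p = 0.6$, the sample size, and the seed are arbitrary; the coin and the 0.3 "yes" probability on tails follow the setup above.

```python
import numpy as np

# Simulation sketch (illustrative): each student answers truthfully
# (prob. p) if the coin lands H, and says "yes" with prob. 0.3 otherwise,
# so theta = P(X = 1) = 0.5*p + 0.15.  The invariance MLE recovers p.
rng = np.random.default_rng(3)
p_true, n = 0.6, 100_000

heads = rng.random(n) < 0.5
x = np.where(heads, rng.random(n) < p_true, rng.random(n) < 0.3).astype(int)

theta_hat = x.mean()               # MLE of theta
p_hat = (theta_hat - 0.15) / 0.5   # MLE of p, by invariance
print(p_hat)                       # close to p_true = 0.6
```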
[Figure: relative efficiency $\text{eff}(\hat{p}_1 \text{ to } \hat{p}_2)$ plotted against $p$ for $0 \le p \le 1$, vertical axis 0 to 100.]
(c) We are asked to compare
$$\hat{p}_1 = \overline{Y} \quad \text{and} \quad \hat{p}_2 = \frac{\overline{X} - 0.15}{0.5}.$$
It is easy to see both estimators are unbiased. Therefore, we can compare the estimators on the basis of their variances. For the first strategy, we have
$$V(\hat{p}_1) = V(\overline{Y}) = \frac{p(1-p)}{n}.$$
For the second strategy, we have
$$V(\hat{p}_2) = V\left(\frac{\overline{X} - 0.15}{0.5}\right) = \frac{1}{(0.5)^2}\,V(\overline{X} - 0.15) = 4V(\overline{X}).$$
Now, $X_1, X_2, \ldots, X_n$ are iid Bernoulli($\theta$), where $\theta = 0.5p + 0.15$. Therefore,
$$V(\overline{X}) = \frac{\theta(1-\theta)}{n} = \frac{(0.5p+0.15)[1 - (0.5p+0.15)]}{n} = \frac{(0.5p+0.15)(0.85-0.5p)}{n}.$$
Therefore,
$$V(\hat{p}_2) = \frac{4(0.5p+0.15)(0.85-0.5p)}{n}.$$
Finally, the relative efficiency of $\hat{p}_1$ to $\hat{p}_2$ is given by
$$\text{eff}(\hat{p}_1 \text{ to } \hat{p}_2) = \frac{V(\hat{p}_2)}{V(\hat{p}_1)} = \frac{\dfrac{4(0.5p+0.15)(0.85-0.5p)}{n}}{\dfrac{p(1-p)}{n}} = \frac{4(0.5p+0.15)(0.85-0.5p)}{p(1-p)}.$$
A graph of $\text{eff}(\hat{p}_1 \text{ to } \hat{p}_2)$ versus $p$ is shown above, with a horizontal dotted line at $\text{eff}(\hat{p}_1 \text{ to } \hat{p}_2) = 1$. Note that $\text{eff}(\hat{p}_1 \text{ to } \hat{p}_2)$ is always above one; i.e., $\hat{p}_1$ is more efficient than $\hat{p}_2$. Of course, this is not surprising given that additional uncertainty is introduced in the second strategy. However, this also assumes that all answers are truthful.
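The claim that the efficiency stays above one can be checked directly by evaluating the formula over a grid of $p$ values (an illustrative numerical check, not a proof):

```python
import numpy as np

# Evaluate eff(p1hat to p2hat) = 4*(0.5p + 0.15)*(0.85 - 0.5p) / (p*(1-p))
# over a grid of interior p values (illustrative check of the graph).
p = np.linspace(0.01, 0.99, 99)
eff = 4 * (0.5 * p + 0.15) * (0.85 - 0.5 * p) / (p * (1 - p))

print(eff.min())  # stays above 1: p1hat is the more efficient estimator
```

The minimum over the grid is well above one (and the ratio blows up as $p$ approaches 0 or 1, where $p(1-p) \to 0$).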
4. (a) The likelihood function is given by
$$L(\theta|\mathbf{y}) = \prod_{i=1}^n f_Y(y_i|\theta) = \theta\left(\frac{1}{y_1+1}\right)^{\theta+1} \times \theta\left(\frac{1}{y_2+1}\right)^{\theta+1} \times \cdots \times \theta\left(\frac{1}{y_n+1}\right)^{\theta+1} = \theta^n\left[\prod_{i=1}^n \frac{1}{y_i+1}\right]^{\theta+1}.$$
Note that we can write the likelihood function as
$$L(\theta|\mathbf{y}) = \underbrace{\theta^n\left[\prod_{i=1}^n \frac{1}{y_i+1}\right]^{\theta}}_{g(t,\,\theta)} \times \underbrace{\prod_{i=1}^n \frac{1}{y_i+1}}_{h(y_1, y_2, \ldots, y_n)},$$
where $t = \prod_{i=1}^n [1/(y_i+1)]$. By the Factorization Theorem, it follows that
$$T = \prod_{i=1}^n \frac{1}{Y_i+1}$$
is a sufficient statistic for $\theta$.

(b) First, note the function
$$u = h(y) = -\ln\left(\frac{1}{y+1}\right)$$
is a strictly increasing function of $y$ over $R_Y = \{y : y > 0\}$, the support of $Y$. This is true because $\ln[1/(1+y)]$ is clearly strictly decreasing, and thus $-\ln[1/(1+y)]$ is strictly increasing. You could also see this by noting
$$h'(y) = \frac{1}{y+1} > 0.$$
Therefore, $u = h(y)$ is a 1:1 function over $R_Y$, and hence the transformation method can be applied. What is the support of $U$? Note that
$$y > 0 \Longrightarrow 0 < \frac{1}{y+1} < 1 \Longrightarrow \ln\left(\frac{1}{y+1}\right) < 0 \Longrightarrow -\ln\left(\frac{1}{y+1}\right) > 0.$$
Therefore, the support of $U$ is $R_U = \{u : u > 0\}$. Next, we find the inverse transformation. We have
$$u = h(y) = -\ln\left(\frac{1}{y+1}\right) \Longrightarrow -u = \ln\left(\frac{1}{y+1}\right) \Longrightarrow e^{-u} = \frac{1}{y+1} \Longrightarrow y + 1 = e^u \Longrightarrow y = e^u - 1 = h^{-1}(u).$$
The derivative of the inverse transformation is
$$\frac{d}{du}h^{-1}(u) = \frac{d}{du}(e^u - 1) = e^u.$$
Therefore, the pdf of $U$, for $u > 0$, is given by
$$f_U(u) = f_Y(h^{-1}(u))\left|\frac{d}{du}h^{-1}(u)\right| = \theta\left(\frac{1}{e^u - 1 + 1}\right)^{\theta+1} \times |e^u| = \theta e^{-\theta u}e^{-u} \times e^u = \theta e^{-\theta u}.$$
Summarizing,
$$f_U(u) = \begin{cases} \theta e^{-\theta u}, & u > 0 \\ 0, & \text{otherwise.} \end{cases}$$
We recognize $f_U(u)$ as the exponential pdf with mean $\beta = 1/\theta$. Now simply note that
$$V = -\ln T = -\ln\left[\prod_{i=1}^n \frac{1}{Y_i+1}\right] = \sum_{i=1}^n -\ln\left(\frac{1}{Y_i+1}\right) = \sum_{i=1}^n U_i;$$
that is, $V$ is the sum of $n$ iid exponential random variables with mean $\beta = 1/\theta$. Therefore, the mgf of $V$ is
$$m_V(t) = [m_U(t)]^n = \left(\frac{1}{1 - t/\theta}\right)^n,$$
for $t < \theta$. We recognize $m_V(t)$ as the mgf of a gamma random variable with shape parameter $\alpha = n$ and scale parameter $\beta = 1/\theta$. Therefore, $V \sim \text{gamma}(n, 1/\theta)$ because mgfs are unique.

(c) The hint is reminding us that $V = -\ln T$ and (by the Rao-Blackwell Theorem) the MVUE must be a function of
$$T = \prod_{i=1}^n \frac{1}{Y_i+1}.$$
Therefore, to find the MVUE of $\theta$, we need to find a function of $V$ that is an unbiased estimator of $\theta$. We know that $E(V) = n/\theta$, so let's calculate the expectation of $1/V$. We have
$$E\left(\frac{1}{V}\right) = \int_{\mathbb{R}} \frac{1}{v}\, f_V(v)\,dv = \int_0^\infty \frac{1}{v}\underbrace{\frac{\theta^n}{\Gamma(n)}\, v^{n-1}e^{-\theta v}}_{\text{gamma}(n,\,1/\theta)\text{ pdf}}\,dv = \frac{\theta^n}{\Gamma(n)}\int_0^\infty v^{(n-1)-1}e^{-\theta v}\,dv = \frac{\theta^n}{\Gamma(n)}\,\Gamma(n-1)\left(\frac{1}{\theta}\right)^{n-1} = \frac{\Gamma(n-1)\,\theta^n}{(n-1)\,\Gamma(n-1)\,\theta^{n-1}} = \frac{\theta}{n-1}.$$
Therefore,
$$E\left(\frac{n-1}{V}\right) = (n-1)\,E\left(\frac{1}{V}\right) = (n-1)\frac{\theta}{n-1} = \theta.$$
Therefore,
$$\frac{n-1}{V} = \frac{n-1}{-\ln T} = -\frac{n-1}{\ln\left[\prod_{i=1}^n \dfrac{1}{Y_i+1}\right]} = -\frac{n-1}{\sum_{i=1}^n \ln\left(\dfrac{1}{Y_i+1}\right)}$$
is the MVUE of $\theta$.
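The unbiasedness of $(n-1)/V$ can be verified by Monte Carlo. This sketch is illustrative (arbitrary $\theta = 2$, $n = 20$, seed, and replication count); it samples $Y$ by inverting the CDF $F(y) = 1 - (1+y)^{-\theta}$ and averages the estimator over many samples.

```python
import numpy as np

# Monte Carlo sketch (illustrative): draw Y from
#   f(y|theta) = theta * (1/(y+1))^(theta+1),  y > 0,
# via the inverse CDF y = u^(-1/theta) - 1 (u uniform), form
# V = -ln T = sum(ln(Y_i + 1)), and check E[(n-1)/V] = theta.
rng = np.random.default_rng(4)
theta, n, reps = 2.0, 20, 200_000

u = rng.random(size=(reps, n))
y = u ** (-1.0 / theta) - 1.0     # inverse-CDF draws from f(y|theta)
v = np.log(y + 1.0).sum(axis=1)   # V ~ gamma(n, 1/theta)
mvue = (n - 1) / v

print(mvue.mean())                # close to theta = 2
```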
5. (a) The likelihood function is
$$L(\theta|\mathbf{y}) = \prod_{i=1}^n f_Y(y_i|\theta) = \frac{1}{2}(1+\theta y_1) \times \frac{1}{2}(1+\theta y_2) \times \cdots \times \frac{1}{2}(1+\theta y_n) = \left(\frac{1}{2}\right)^n \prod_{i=1}^n (1+\theta y_i).$$
There is no way to factor the likelihood function as
$$L(\theta|\mathbf{y}) = L(\theta|y_1, y_2, \ldots, y_n) = g(t, \theta)\,h(y_1, y_2, \ldots, y_n),$$
where $g$ depends on the data only through a scalar sufficient statistic $T = t$. Therefore, we cannot reduce the entire sample to a scalar statistic $T$ without losing information about $\theta$.

(b) The method of moments (MOM) estimator of $\theta$ is found by setting up the equation $E(Y) \overset{\text{set}}{=} \overline{Y}$ and solving it for $\theta$. Let's find $E(Y)$. We have
$$E(Y) = \int_{\mathbb{R}} y f_Y(y)\,dy = \int_{-1}^1 y \times \frac{1}{2}(1+\theta y)\,dy = \frac{1}{2}\int_{-1}^1 (y + \theta y^2)\,dy = \frac{1}{2}\left[\frac{y^2}{2} + \frac{\theta y^3}{3}\right]_{-1}^1 = \frac{1}{2}\left[\left(\frac{1}{2} + \frac{\theta}{3}\right) - \left(\frac{1}{2} - \frac{\theta}{3}\right)\right] = \frac{\theta}{3}.$$
Therefore, the MOM estimator solves
$$\frac{\theta}{3} \overset{\text{set}}{=} \overline{Y} \Longrightarrow \hat{\theta} = 3\overline{Y}.$$
The MOM estimator is unbiased because $E(\hat{\theta}) = 3E(\overline{Y}) = 3(\theta/3) = \theta$.
(c) Because a scalar sufficient statistic $T$ does not exist, it is not surprising that the MLE does not exist in closed form. We calculated $L(\theta|\mathbf{y})$ above. The log-likelihood function is
$$\ln L(\theta|\mathbf{y}) = \ln\left[\left(\frac{1}{2}\right)^n \prod_{i=1}^n (1+\theta y_i)\right] = -n\ln 2 + \sum_{i=1}^n \ln(1+\theta y_i).$$
The derivative of the log-likelihood function is given by
$$\frac{\partial}{\partial\theta}\ln L(\theta|\mathbf{y}) = \sum_{i=1}^n \frac{\partial}{\partial\theta}\left[\ln(1+\theta y_i)\right] = \sum_{i=1}^n \frac{y_i}{1+\theta y_i}.$$
Therefore, the score equation is
$$\sum_{i=1}^n \frac{y_i}{1+\theta y_i} \overset{\text{set}}{=} 0.$$
This equation has no closed-form solution, so the MLE would have to be computed numerically after observing the sample $\mathbf{y}$.
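A numerical solution is straightforward because the score is strictly decreasing in $\theta$ (its derivative is $-\sum y_i^2/(1+\theta y_i)^2 < 0$), so simple bisection finds the unique root. The sketch below is illustrative: the true $\theta = 0.5$, the sample size, the seed, and the rejection sampler are all arbitrary choices used only to generate a sample.

```python
import numpy as np

# Numerical MLE sketch (illustrative): draw a sample from
# f(y) = (1 + theta*y)/2 on (-1, 1) by rejection sampling, then solve
# the score equation sum(y_i / (1 + theta*y_i)) = 0 by bisection.
rng = np.random.default_rng(5)
theta_true, n = 0.5, 2_000

# Rejection sampler: propose y ~ Uniform(-1, 1); accept with probability
# (1 + theta*y) / (1 + |theta|), which bounds the density ratio by 1.
samples = []
while len(samples) < n:
    y_prop = rng.uniform(-1.0, 1.0)
    if rng.random() < (1.0 + theta_true * y_prop) / (1.0 + abs(theta_true)):
        samples.append(y_prop)
y = np.array(samples)

def score(theta):
    return np.sum(y / (1.0 + theta * y))

# The score is strictly decreasing, so the root (MLE) lies in any
# bracket where the score changes sign; for this sample that holds on:
lo, hi = -0.999, 0.999
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if score(mid) > 0 else (lo, mid)
theta_mle = 0.5 * (lo + hi)

print(theta_mle, 3 * y.mean())  # numerical MLE vs. the MOM estimate 3*ybar
```

With a moderately large sample, the bisection root lands near the true $\theta$, and the MOM estimate $3\overline{y}$ from part (b) provides a convenient cross-check.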