4. ESTIMATION
4.9 Chapter 4 Problems
1. R Code for plotting a Binomial relative likelihood
Suppose for a Binomial experiment we observe y = 15 successes in n = 40 trials. The following R code will plot the relative likelihood function of $\theta$ and the line $R(\theta) = 0.1$, which can be used to determine a 10% likelihood interval.
> y<-15
> n<-40
> thetahat<-y/n
> theta<-seq(0.15,0.65,0.001) # points between 0.15 and 0.65 spaced 0.001 apart
> Rtheta<-exp(y*log(theta/thetahat)+(n-y)*log((1-theta)/(1-thetahat)))
> plot(theta,Rtheta,type="l") # plots R(theta)
> R10<-0.10+0*theta
> points(theta,R10,type="l") # draws a horizontal line at 0.10
> title(main="Binomial Likelihood for y=15 and n=40")
Modify this code for y = 75 successes in n = 200 trials and y = 150 successes in n = 400 trials and observe what happens to the width of the 10% likelihood interval.
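As a complement to reading the interval off the graph, the following sketch solves $R(\theta) = 0.1$ numerically with uniroot for the three cases. The function names binom_rel_lik and binom_lik_interval are illustrative helpers, not part of the original code.

```r
# Relative likelihood for a Binomial(n, theta) model with y observed successes.
binom_rel_lik <- function(theta, y, n) {
  thetahat <- y / n
  exp(y * log(theta / thetahat) + (n - y) * log((1 - theta) / (1 - thetahat)))
}

# Endpoints of the 100p% likelihood interval: solve R(theta) = p on each
# side of the maximum likelihood estimate with uniroot.
binom_lik_interval <- function(y, n, p = 0.10) {
  thetahat <- y / n
  g <- function(theta) binom_rel_lik(theta, y, n) - p
  lower <- uniroot(g, c(1e-6, thetahat), tol = 1e-9)$root
  upper <- uniroot(g, c(thetahat, 1 - 1e-6), tol = 1e-9)$root
  c(lower = lower, upper = upper)
}

# The three (y, n) pairs from the problem; thetahat = 0.375 in each case.
cases <- data.frame(y = c(15, 75, 150), n = c(40, 200, 400))
intervals <- t(mapply(binom_lik_interval, cases$y, cases$n))
widths <- intervals[, "upper"] - intervals[, "lower"]
print(cbind(cases, intervals, width = widths))
```

The printed widths can be compared directly with what the plots suggest as n grows.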
2. R Code for plotting a Poisson relative likelihood
Suppose we have a sample $y_1, y_2, \ldots, y_n$ from a Poisson distribution with n = 25 and $\bar{y} = 5$. The following R code will plot the relative likelihood function of $\theta$ and the line $R(\theta) = 0.1$, which can be used to determine a 10% likelihood interval.
> thetahat<-5
> n<-25
> theta<-seq(3.7,6.5,0.001)
> Rtheta<-exp(n*thetahat*log(theta/thetahat)+n*(thetahat-theta))
> plot(theta,Rtheta,type="l") # plots R(theta)
> R10<-0.10+0*theta
> points(theta,R10,type="l") # draws a horizontal line at 0.10
> title(main="Poisson Likelihood for ybar=5 and n=25")
Modify this code for larger sample sizes n = 100 and n = 400 and observe what happens to the width of the 10% likelihood interval.
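The same comparison can be sketched numerically. The code below assumes the sample mean stays at 5 for every sample size (the problem only changes n), and the helper names pois_rel_lik and pois_lik_interval are illustrative.

```r
# Poisson relative likelihood written in terms of the sample mean (thetahat)
# and the sample size n, matching the expression used in the plotting code.
pois_rel_lik <- function(theta, thetahat, n) {
  exp(n * thetahat * log(theta / thetahat) + n * (thetahat - theta))
}

# Solve R(theta) = p on each side of thetahat with uniroot.
pois_lik_interval <- function(thetahat, n, p = 0.10) {
  g <- function(theta) pois_rel_lik(theta, thetahat, n) - p
  lower <- uniroot(g, c(1e-6, thetahat), tol = 1e-9)$root
  upper <- uniroot(g, c(thetahat, 10 * thetahat), tol = 1e-9)$root
  c(lower = lower, upper = upper)
}

sample_sizes <- c(25, 100, 400)
res <- sapply(sample_sizes, function(n) pois_lik_interval(5, n))
widths <- res["upper", ] - res["lower", ]
print(rbind(n = sample_sizes, res, width = widths))
```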
3. The following excerpt is from a March 2, 2012 cbc.ca news article:
“Canadians lead in time spent online: Canadians are spending more time online than users in 10 other countries, a new report has found. The report, 2012 Canada Digital Future in Focus, by the internet marketing research company comScore, found Canadians spent an average of 45.3 hours on the internet in the fourth quarter of 2011. The report also states that smartphones now account for 45% of all mobile phone use by Canadians.”
Assume that these results are based on a random sample of 1000 Canadians.
(a) Suppose a 95% confidence interval for $\mu$, the mean time Canadians spent on the internet in this quarter, is reported to be [42.8, 47.8]. How should this interval be interpreted?
(b) Construct an approximate 95% confidence interval for the proportion of Canadians whose mobile phone is a smartphone.
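A minimal sketch of the calculation in (b), assuming the reported 45% can be treated as the sample proportion in a random sample of 1000 and using the usual Normal approximation to the Binomial. The variable names are illustrative.

```r
# Approximate 95% confidence interval for a proportion:
# phat +/- z * sqrt(phat * (1 - phat) / n)
n <- 1000                 # assumed sample size stated in the problem
phat <- 0.45              # reported smartphone share, treated as the sample proportion
z <- qnorm(0.975)         # about 1.96
se <- sqrt(phat * (1 - phat) / n)
ci <- c(lower = phat - z * se, upper = phat + z * se)
print(round(ci, 3))
```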
4. Suppose that a fraction p of a large population of persons over 18 years of age never drink alcohol. In order to estimate p, a random sample of n persons is to be selected and the number y who do not drink determined; the maximum likelihood estimate of p is then $\hat{p} = y/n$. We want our estimate $\hat{p}$ to have a high probability of being close to p, and want to know how large n should be to achieve this. Consider the random variable Y and estimator $\tilde{P} = Y/n$.
(a) Describe how you could work out the probability that $-0.03 \le \tilde{P} - p \le 0.03$, if you knew the values of n and p.
(b) Suppose that p = 0.40. Using an approximation, determine how large n should be in order to ensure
$P(-0.03 \le \tilde{P} - p \le 0.03) = 0.95$.
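One possible way to organize both parts is sketched below, assuming Y follows a Binomial(n, p) distribution. The function prob_close is a hypothetical helper that evaluates the probability exactly with pbinom, and the sample size uses the Normal approximation with 0.03 set equal to 1.96 standard deviations.

```r
# (a) Exact probability that Y/n is within d of p when Y ~ Binomial(n, p).
prob_close <- function(n, p, d = 0.03) {
  lo <- ceiling(n * (p - d) - 1e-9)   # smallest count inside the band
  hi <- floor(n * (p + d) + 1e-9)     # largest count inside the band
  pbinom(hi, n, p) - pbinom(lo - 1, n, p)
}

# (b) Normal approximation: d = z * sqrt(p * (1 - p) / n), solved for n.
p <- 0.40
d <- 0.03
z <- qnorm(0.975)
n_approx <- ceiling((z / d)^2 * p * (1 - p))

print(n_approx)
print(prob_close(n_approx, p))   # exact probability at the approximate n
```

Because the Binomial is discrete, the exact probability at the approximate n need not equal 0.95 exactly.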
5. In the U.S.A. the prevalence of HIV (Human Immunodeficiency Virus) infections in the population of child-bearing women has been estimated by doing blood tests (anonymously) on all women giving birth in a hospital. One study tested 29,000 women and found that 64 were HIV positive (had the virus). Give an approximate 99% confidence interval for $\theta$, the fraction of the population that is HIV positive. State any concerns you have about the accuracy of this estimate.
6. Let n and k be integers. Suppose that blood samples for $nk$ people are to be tested to obtain information about $\theta$, the fraction of the population infected with a certain virus. In order to save time and money, pooled testing is used: samples are mixed together k at a time to give a total of n pooled samples. A pooled sample will test negative if all k individuals in that sample are not infected.
(a) Give an expression for the probability that x out of n samples will be negative, if the $nk$ people are a random sample from the population. State any assumptions you make.
(b) Obtain a general expression for the maximum likelihood estimate $\hat{\theta}$ in terms of n, k and x.
(c) Suppose n = 100, k = 10 and x = 89. Give the maximum likelihood estimate $\hat{\theta}$, the relative likelihood function, and find a 10% likelihood interval for $\theta$.
(d) Discuss (or do it) how you would select an “optimal” value of k to use for pooled testing, if your objective was not to estimate $\theta$ but to identify persons who are infected, with the smallest number of tests. Assume that you know the value of $\theta$ and the procedure would be to test all k persons individually each time a pooled sample was positive. (Hint: Suppose a large number n of persons must be tested, and find the expected number of tests needed.)
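A numerical sketch for part (c) follows. It rests on an assumption the problem asks you to state yourself: individuals are infected independently, so a pool is negative with probability $(1-\theta)^k$ and the number of negative pools is Binomial. Under that assumption the estimate has the closed form used below; all function names are illustrative.

```r
n <- 100; k <- 10; x <- 89

# Assumed model: P(pool negative) = (1 - theta)^k, X ~ Binomial(n, (1 - theta)^k).
q_of <- function(theta) (1 - theta)^k

# Setting (1 - theta)^k equal to x/n gives the maximum likelihood estimate.
thetahat <- 1 - (x / n)^(1 / k)

log_lik <- function(theta) x * log(q_of(theta)) + (n - x) * log(1 - q_of(theta))
rel_lik <- function(theta) exp(log_lik(theta) - log_lik(thetahat))

# 10% likelihood interval: solve R(theta) = 0.10 on each side of thetahat.
g <- function(theta) rel_lik(theta) - 0.10
li <- c(lower = uniroot(g, c(1e-6, thetahat), tol = 1e-10)$root,
        upper = uniroot(g, c(thetahat, 0.2), tol = 1e-10)$root)

print(thetahat)
print(li)
```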
7. Recall Problem 5 of Chapter 2.
(a) Plot the relative likelihood function $R(\alpha)$ and determine a 10% likelihood interval. The likelihood interval can be found from the graph of $R(\alpha)$ or by using the function uniroot in R. Is $\alpha$ very accurately determined?
(b) Suppose that we can find out whether each pair of twins is identical or not, and that it is determined that of 50 pairs, 17 were identical. Obtain the likelihood function, the maximum likelihood estimate and a 10% likelihood interval for $\alpha$ in this case. Plot the relative likelihood function on the same graph as the one in (a), and compare the accuracy of estimation in the two cases.
8. The lifetime T (in days) of a particular type of light bulb is assumed to have a distribution with probability density function
$f(t; \theta) = \frac{1}{2} \theta^3 t^2 e^{-\theta t}$ for $t > 0$ and $\theta > 0$.
(a) Suppose $t_1, t_2, \ldots, t_n$ is a random sample from this distribution. Show that the likelihood function for $\theta$ is equal to
$c \, \theta^{3n} \exp\left( -\theta \sum_{i=1}^{n} t_i \right)$ for $\theta > 0$
where c is constant with respect to $\theta$.
(b) Find the maximum likelihood estimate $\hat{\theta}$ and the relative likelihood function $R(\theta)$.
(c) If n = 20 and $\sum_{i=1}^{20} t_i = 996$, graph $R(\theta)$ and determine the 15% likelihood interval for $\theta$. (The interval can be obtained from the graph of $R(\theta)$ or by using the function uniroot in R.)
(d) Suppose we wish to estimate the mean lifetime of a light bulb. Show $E(T) = 3/\theta$. (Recall that $\int_0^\infty x^{n-1} e^{-x} \, dx = \Gamma(n) = (n-1)!$ for $n = 1, 2, \ldots$). Find a 95% confidence interval for the mean.
(e) The probability p that a light bulb lasts less than 50 days is
$p = p(\theta) = P(T \le 50; \theta) = 1 - e^{-50\theta} [1250\theta^2 + 50\theta + 1]$.
(Can you show this?) Thus $\hat{p} = p(\hat{\theta}) = 0.580$. Find an approximate 95% confidence interval for p from the approximate 95% confidence interval for $\theta$. In the data referred to in part (c), the number of light bulbs which lasted less than 50 days was 11 (out of 20). Using a Binomial model, we can also obtain a 95% confidence interval for p. Find this interval. What are the pros and cons of the second interval over the first one?
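The numerical parts of this problem can be sketched as follows. The closed form for the estimate comes from maximizing the likelihood in (a), so treat it as something to verify rather than as given; the function names rel_lik and p_of are illustrative.

```r
n <- 20; sum_t <- 996

# Maximizing theta^(3n) * exp(-theta * sum_t) gives thetahat = 3n / sum_t.
thetahat <- 3 * n / sum_t

# Relative likelihood R(theta) = L(theta) / L(thetahat).
rel_lik <- function(theta) {
  (theta / thetahat)^(3 * n) * exp(-sum_t * (theta - thetahat))
}

# 15% likelihood interval via uniroot on each side of thetahat.
g <- function(theta) rel_lik(theta) - 0.15
li <- c(lower = uniroot(g, c(0.01, thetahat), tol = 1e-10)$root,
        upper = uniroot(g, c(thetahat, 0.2), tol = 1e-10)$root)

# Part (e): probability of lasting at most 50 days as a function of theta.
p_of <- function(theta) 1 - exp(-50 * theta) * (1250 * theta^2 + 50 * theta + 1)
p_hat <- p_of(thetahat)

print(thetahat)
print(li)
print(round(p_hat, 3))
```

Since p_of is increasing in theta, applying it to the endpoints of an interval for $\theta$ gives the corresponding interval for p.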
9. The $\chi^2$ (Chi-squared) distribution:
(a) Determine the following using $\chi^2$ tables:
(i) If $X \sim \chi^2(10)$ find $P(X \le 2.6)$ and $P(X > 16)$.
(ii) If $X \sim \chi^2(4)$ find $P(X > 15)$.
(iii) If $X \sim \chi^2(40)$ find $P(X \le 24.4)$ and $P(X \le 55.8)$. Compare these values with $P(X \le 24.4)$ and $P(X \le 55.8)$ if $X \sim N(40, 80)$.
(iv) If $X \sim \chi^2(25)$ find a and b such that $P(X \le a) = 0.025$ and $P(X > b) = 0.025$.
(v) If $X \sim \chi^2(12)$ find a and b such that $P(X \le a) = 0.05$ and $P(X > b) = 0.05$.
(b) Determine the following WITHOUT using $\chi^2$ tables:
(i) If $X \sim \chi^2(1)$ find $P(X \le 2)$ and $P(X > 1.4)$.
(ii) If $X \sim \chi^2(2)$ find $P(X \le 2)$ and $P(X > 3)$.
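Table look-ups can be checked in R with pchisq (cumulative probabilities) and qchisq (quantiles). The sketch below is only a cross-check, not a replacement for the table exercise; for part (b) it uses two standard facts: a $\chi^2(1)$ variate is the square of a N(0, 1) variate, and $\chi^2(2)$ is the Exponential distribution with mean 2.

```r
# (a) Values normally read from chi-squared tables.
a_i   <- c(pchisq(2.6, df = 10), pchisq(16, df = 10, lower.tail = FALSE))
a_ii  <- pchisq(15, df = 4, lower.tail = FALSE)
a_iii <- rbind(chisq  = pchisq(c(24.4, 55.8), df = 40),
               normal = pnorm(c(24.4, 55.8), mean = 40, sd = sqrt(80)))
a_iv  <- qchisq(c(0.025, 0.975), df = 25)   # a and b for chi-squared(25)
a_v   <- qchisq(c(0.05, 0.95), df = 12)     # a and b for chi-squared(12)

# (b) Closed forms that avoid tables of the chi-squared distribution.
b_i  <- c(2 * pnorm(sqrt(2)) - 1,           # P(X <= 2) for chi-squared(1)
          2 * (1 - pnorm(sqrt(1.4))))       # P(X > 1.4) for chi-squared(1)
b_ii <- c(1 - exp(-2 / 2),                  # P(X <= 2) for chi-squared(2)
          exp(-3 / 2))                      # P(X > 3) for chi-squared(2)

print(list(a_i = a_i, a_ii = a_ii, a_iii = a_iii, a_iv = a_iv, a_v = a_v,
           b_i = b_i, b_ii = b_ii))
```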
10. The $\chi^2$ (Chi-squared) distribution continued: Suppose $Y \sim \chi^2(k)$ with probability density function given by
$f(y; k) = \frac{1}{2^{k/2} \Gamma(k/2)} y^{(k/2)-1} e^{-y/2}$ for $y > 0$.
(a) Show that this probability density function integrates to one for any $k \in \{1, 2, \ldots\}$.
(b) Show that the moment generating function of Y is given by
$M(t) = E(e^{tY}) = (1 - 2t)^{-k/2}$ for $t < \frac{1}{2}$
and use this to show that $E(Y) = k$ and $Var(Y) = 2k$.
(c) Plot the probability density function for k = 5, k = 10 and k = 25 on the same graph. What do you notice?
11. In an early study concerning survival time for patients diagnosed with Acquired Immune Deficiency Syndrome (AIDS), the survival times (i.e. times between diagnosis of AIDS and death) of 30 male patients were such that $\sum_{i=1}^{30} y_i = 11{,}400$ days. It is known that survival times were approximately Exponentially distributed with mean $\theta$ days.
(a) Write down the likelihood function for $\theta$ and obtain the likelihood ratio statistic. Use this to obtain an approximate 90% confidence interval for $\theta$. (Note: You will need to determine this interval from a graph of the relative likelihood function or by using the function uniroot in R.)
(b) Show that $m = \theta \ln 2$ is the median survival time. Using the interval obtained in (a), give an approximate 90% confidence interval for m.
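The uniroot route mentioned in (a) might look like the sketch below. It assumes the Exponential model is parameterized by its mean, so the likelihood is proportional to $\theta^{-n} \exp(-\sum y_i/\theta)$, and it treats the likelihood ratio statistic as approximately $\chi^2(1)$. The function names are illustrative.

```r
n <- 30; sum_y <- 11400
thetahat <- sum_y / n        # maximum likelihood estimate of the mean

# log R(theta) = log L(theta) - log L(thetahat) for the Exponential model.
log_rel_lik <- function(theta) {
  n * log(thetahat / theta) + n * (1 - thetahat / theta)
}

# Likelihood ratio statistic and its approximate chi-squared(1) cutoff.
lr_stat <- function(theta) -2 * log_rel_lik(theta)
cutoff <- qchisq(0.90, df = 1)

g <- function(theta) lr_stat(theta) - cutoff
ci_theta <- c(lower = uniroot(g, c(100, thetahat), tol = 1e-9)$root,
              upper = uniroot(g, c(thetahat, 2000), tol = 1e-9)$root)

# The median is theta * log(2), an increasing function of theta,
# so the interval endpoints transform directly.
ci_median <- ci_theta * log(2)

print(ci_theta)
print(ci_median)
```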
12. Suppose $Y \sim$ Exponential($\theta$).
(a) Show that $W = 2Y/\theta$ has a $\chi^2(2)$ distribution. (Hint: compare the probability density function of W with (4.6).)
(b) If $Y_1, \ldots, Y_n$ is a random sample from the Exponential($\theta$) distribution above, prove that
$U = 2 \sum_{i=1}^{n} Y_i / \theta \sim \chi^2(2n)$.
(Use the results in Section 4.5.) U is therefore a pivotal quantity, and can be used to get confidence intervals for $\theta$.
(c) Refer to Problem 11. Using the fact that
$P(43.19 \le W \le 79.08) = 0.90$
where $W \sim \chi^2(60)$, obtain a 90% confidence interval for $\theta$ based on U. Compare this with the interval found in 11(a). Which interval is preferred here? Why?
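A short sketch of the pivotal-quantity interval in (c), using the survival data from the AIDS problem (30 patients, total 11,400 days). It rearranges $P(a \le 2\sum y_i/\theta \le b) = 0.90$ for $\theta$, with a and b taken from qchisq rather than from the table.

```r
n <- 30; sum_y <- 11400

# 5% and 95% points of chi-squared(2n); the problem quotes 43.19 and 79.08.
q <- qchisq(c(0.05, 0.95), df = 2 * n)

# a <= 2 * sum_y / theta <= b  is equivalent to  2 * sum_y / b <= theta <= 2 * sum_y / a
ci_pivot <- c(lower = 2 * sum_y / q[2], upper = 2 * sum_y / q[1])

print(round(q, 2))
print(round(ci_pivot, 1))
```

Unlike the likelihood ratio interval, this one does not rely on a large-sample approximation.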
13. Two hundred adults are chosen at random from a population and each is asked whether information about abortions should be included in high school public health sessions. Suppose that 70% say they should.
(a) Obtain an approximate 95% confidence interval for the proportion of the population who support abortion information being included using (4.5).
(b) Suppose you found out that the 200 persons interviewed consisted of 50 married couples and 100 other persons. The 50 couples were randomly selected, as were the other 100 persons. Discuss the validity (or non-validity) of the analysis in (a).
14. Consider the data on weights of adult males and females from Chapter 1. (The data are posted on the course webpage.)
(a) Determine whether it is reasonable to assume a Normal model for the female heights and a different Normal model for the male heights.
(b) Obtain a 95% confidence interval for the mean for the females and males separately. Does there appear to be a difference in the means for females and males? (We will see how to test this formally in Chapter 6.)
(c) Obtain a 95% confidence interval for the standard deviation for the females and males separately. Does there appear to be a difference in the standard deviations?
15. Company A leased photocopiers to the federal government, but at the end of their recent contract the government declined to renew the arrangement and decided to lease from a new vendor, Company B. One of the main reasons for this decision was a perception that the reliability of Company A’s machines was poor.
(a) Over the preceding year the monthly numbers of failures requiring a service call from Company A were
16 14 25 19 23 12
22 28 19 15 18 29
Assuming that the number of service calls needed in a one month period has a Poisson distribution with mean $\theta$, obtain and graph the relative likelihood function $R(\theta)$ based on the data above.
(b) In the first year using Company B’s photocopiers, the monthly numbers of service calls were
13 7 12 9 15 17
10 13 8 10 12 14
Under the same assumption as in part (a), obtain $R(\theta)$ for these data and graph it on the same graph as used in (a). Do you think the government’s decision was a good one, as far as the reliability of the machines is concerned?
(c) Use the likelihood ratio statistic $\Lambda(\theta)$ as an approximate pivotal quantity to obtain approximate 95% confidence intervals for $\theta$ for each company. (Note: the interval can be obtained from the graph of the relative likelihood function or by using the function uniroot in R.)
(d) What conditions would need to be satisfied to make the assumptions and analysis in (a) to (c) valid? What approximations are involved?
16. At the R.A.T. laboratory a large number of genetically engineered rats are raised for conducting research. Twelve rats are selected at random and fed a special diet. The weight gains (in grams) from birth to age 3 months of the rats fed this diet are:
55.3 54.8 65.9 60.7 59.4 62.0 62.1 58.7 64.5 62.3 67.6 61.1
Let $y_i$ = weight gain of the $i$th rat, $i = 1, \ldots, 12$. For these data
$\sum_{i=1}^{12} y_i = 734.4$ and $\sum_{i=1}^{12} (y_i - \bar{y})^2 = 162.12$.
To analyze these data the model
$Y_i \sim N(\mu, \sigma^2) = G(\mu, \sigma)$, $i = 1, \ldots, 12$
is assumed where $\mu$ and $\sigma$ are unknown parameters.
(a) Explain clearly what the parameters $\mu$ and $\sigma$ represent.
(b) Give the maximum likelihood estimates of $\mu$ and $\sigma$. (You do not need to derive them.)
(c) Let
$S^2 = \frac{1}{11} \sum_{i=1}^{12} (Y_i - \bar{Y})^2$.
State the distributions of the random variables
$T = \frac{\bar{Y} - \mu}{S/\sqrt{12}}$ and $W = \frac{1}{\sigma^2} \sum_{i=1}^{12} (Y_i - \bar{Y})^2$.
(d) Find $a$ such that
$P(-a \le T \le a) = 0.95$
and show clearly how this can be used to construct a 95% confidence interval for $\mu$. Construct a 95% confidence interval for $\mu$ for the given data.
(e) Find $b$ and $c$ such that
$P(W \le b) = 0.05 = P(W \ge c)$.
Show clearly how this can be used to construct a 90% confidence interval for $\sigma^2$. Construct a 90% confidence interval for $\sigma^2$ for the given data.
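The numerical steps in (d) and (e) can be sketched as below, assuming the standard results that T has a t(11) distribution and W has a $\chi^2(11)$ distribution under the Gaussian model. Variable names are illustrative.

```r
y <- c(55.3, 54.8, 65.9, 60.7, 59.4, 62.0, 62.1, 58.7, 64.5, 62.3, 67.6, 61.1)
n <- length(y)
ybar <- mean(y)
ss <- sum((y - ybar)^2)          # sum of squared deviations
s <- sqrt(ss / (n - 1))          # sample standard deviation

# (d) a from the t(n - 1) distribution; interval ybar +/- a * s / sqrt(n).
a <- qt(0.975, df = n - 1)
ci_mu <- ybar + c(-1, 1) * a * s / sqrt(n)

# (e) b and c cut off 5% in each tail of chi-squared(n - 1), so
#     b <= ss / sigma^2 <= c has probability 0.90.
b <- qchisq(0.05, df = n - 1)
c_upper <- qchisq(0.95, df = n - 1)
ci_sigma2 <- c(ss / c_upper, ss / b)

print(c(sum = sum(y), ss = ss))
print(ci_mu)
print(ci_sigma2)
```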
17. Sixteen packages are randomly selected from the production of a detergent packaging machine. Their weights (in grams) are as follows:
287 293 295 295 297 298 299 300
300 302 302 303 306 307 308 311
(a) Assuming that the weights are independent $G(\mu, \sigma)$ random variables, obtain 95% confidence intervals for $\mu$ and $\sigma$.
(b) Let Y represent the weight of a future, independent, randomly selected package.
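Since $Y - \bar{Y} \sim G\left(0, \sigma\sqrt{1 + \frac{1}{n}}\right)$ and $(n-1)S^2/\sigma^2 \sim \chi^2(n-1)$ independently, it follows that
$\frac{Y - \bar{Y}}{S\sqrt{1 + \frac{1}{n}}} \sim t(n-1)$.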
Use this pivotal and the given data to obtain a 95% prediction interval for Y .
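A minimal sketch of the prediction interval in (b), rearranging the t(n - 1) pivotal quantity for Y. The confidence interval for the mean is computed alongside only for comparison; variable names are illustrative.

```r
y <- c(287, 293, 295, 295, 297, 298, 299, 300,
       300, 302, 302, 303, 306, 307, 308, 311)
n <- length(y)
ybar <- mean(y)
s <- sd(y)
tq <- qt(0.975, df = n - 1)

# 95% prediction interval for a future weight Y:
# ybar +/- t * s * sqrt(1 + 1/n)
pred_int <- ybar + c(-1, 1) * tq * s * sqrt(1 + 1 / n)

# 95% confidence interval for the mean, for comparison:
# ybar +/- t * s / sqrt(n)
ci_mu <- ybar + c(-1, 1) * tq * s / sqrt(n)

print(pred_int)
print(ci_mu)
```

The prediction interval is wider because it accounts for the variability of the new observation as well as the uncertainty in the estimated mean.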
18. Radon is a colourless, odourless gas that is naturally released by rocks and soils and may concentrate in highly insulated houses. Because radon is slightly radioactive, there is some concern that it may be a health hazard. Radon detectors are sold to homeowners worried about this risk, but the detectors may be inaccurate. University researchers placed 12 detectors in a chamber where they were exposed to 105 picocuries per liter of radon over 3 days. The readings given by the detectors were:
91.9 97.8 111.4 122.3 105.4 95.0 103.8 99.6 96.6 119.3 104.8 101.7
Let $y_i$ = reading for the $i$th detector, $i = 1, \ldots, 12$. For these data
$\sum_{i=1}^{12} y_i = 1249.6$ and $\sum_{i=1}^{12} (y_i - \bar{y})^2 = 971.43$.
To analyze these data assume the model
$Y_i \sim N(\mu, \sigma^2) = G(\mu, \sigma)$, $i = 1, \ldots, 12$ independently
where $\mu$ and $\sigma$ are unknown parameters.
University researchers obtained a 13th radon detector. It is to be exposed to 105 picocuries per liter of radon over 3 days. Calculate a 95% prediction interval for the reading for this new radon detector.
19. A manufacturer wishes to determine the mean breaking strength (force) $\mu$ of a type of string to “within a pound”, which we interpret as requiring that the 95% confidence interval for $\mu$ should have length at most 2 pounds. If breaking strength Y of strings tested are $G(\mu, \sigma)$ and if 10 preliminary tests gave $\sum_{i=1}^{10} (y_i - \bar{y})^2 = 80$, how
20. A manufacturing process produces fibers of varying lengths. The length of a fiber Y is a continuous random variable with p.d.f.
$f(y; \theta) = \frac{y}{\theta^2} e^{-y/\theta}$, $y \ge 0$, $\theta > 0$
where $\theta$ is an unknown parameter.
(a) Let $y_1, y_2, \ldots, y_n$ be the lengths of n fibers selected at random. Find the maximum likelihood estimate of $\theta$ based on these data. Be sure to show all your work.
(b) Suppose $Y_1, Y_2, \ldots, Y_n$ are independent and identically distributed random variables with p.d.f. $f(y; \theta)$ given above. If $E(Y_i) = 2\theta$ and $Var(Y_i) = 2\theta^2$ then find $E(\bar{Y})$ and $Var(\bar{Y})$.
(c) Justify the statement
$P\left( -1.96 \le \frac{\bar{Y} - 2\theta}{\theta\sqrt{2/n}} \le 1.96 \right) \approx 0.95$.
(d) Explain how you would use the statement in (c) to construct an approximate 95% confidence interval for $\theta$.
(e) Suppose n = 18 fibers were selected at random and the lengths were:
6.19 7.92 1.23 8.13 4.29 1.04 3.67 9.87 10.34 1.41 10.76 3.69 1.34 6.80 4.21 3.44 2.51 2.08
For these data $\sum_{i=1}^{18} y_i = 88.92$. Find the maximum likelihood estimate of $\theta$ and an approximate 95% confidence interval for $\theta$.
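The computation in (e) can be sketched as follows. The closed form for the estimate (half the sample mean) is what part (a) should produce, and the interval comes from rearranging the inequality in (c) for $\theta$; both are stated here as working assumptions to check against your own derivation.

```r
y <- c(6.19, 7.92, 1.23, 8.13, 4.29, 1.04, 3.67, 9.87, 10.34,
       1.41, 10.76, 3.69, 1.34, 6.80, 4.21, 3.44, 2.51, 2.08)
n <- length(y)
ybar <- mean(y)

# Maximizing the log-likelihood -2n*log(theta) - sum(y)/theta gives ybar / 2.
thetahat <- ybar / 2

# From (c): -1.96 <= (ybar - 2*theta) / (theta * sqrt(2/n)) <= 1.96.
# Solving each inequality for theta gives
#   ybar / (2 + h) <= theta <= ybar / (2 - h),  with h = 1.96 * sqrt(2/n).
h <- 1.96 * sqrt(2 / n)
ci_theta <- c(lower = ybar / (2 + h), upper = ybar / (2 - h))

print(thetahat)
print(round(ci_theta, 3))
```

Note that the resulting interval is not symmetric about the estimate, because $\theta$ appears in both the numerator and the denominator of the pivotal quantity.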
21. When we measure a quantity we are in effect estimating the true value of the quantity; measurements of the same variate on different occasions are usually not equal. A chemist has two ways of measuring a particular quantity; one has more random error than the other. For method I, measurements $X_1, X_2, \ldots$ follow a Normal distribution with mean $\mu$ and variance $\sigma_1^2$, whereas for method II, measurements $Y_1, Y_2, \ldots$ have a Normal distribution with mean $\mu$ and variance $\sigma_2^2$.
(a) Suppose that the chemist has n measurements $X_1, \ldots, X_n$ of a quantity by method I and m measurements $Y_1, \ldots, Y_m$ by method II. Assuming that $\sigma_1^2$ and $\sigma_2^2$ are known, write down the combined likelihood function for $\mu$, and show that
$\tilde{\mu} = \frac{w_1 \bar{X} + w_2 \bar{Y}}{w_1 + w_2}$
(b) Suppose that $\sigma_1 = 1$, $\sigma_2 = 0.5$ and n = m = 10. How would you rationalize to a non-statistician why you were using the estimate $(\bar{x} + 4\bar{y})/5$ instead of $(\bar{x} + \bar{y})/2$?
(c) Determine the standard deviation of $\tilde{\mu}$ and of $(\bar{X} + \bar{Y})/2$ under the conditions of part (b). Why is $\tilde{\mu}$ a better estimator?
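For part (c), a small numerical sketch under an explicit assumption: the weights are taken to be $w_1 = n/\sigma_1^2$ and $w_2 = m/\sigma_2^2$ (inverse variances of the two sample means). This choice is not stated in the text above, but it reproduces the estimate $(\bar{x} + 4\bar{y})/5$ given in (b).

```r
sigma1 <- 1; sigma2 <- 0.5
n <- 10; m <- 10

# Assumed weights: inverse variances of the two sample means.
w1 <- n / sigma1^2
w2 <- m / sigma2^2

var_xbar <- sigma1^2 / n
var_ybar <- sigma2^2 / m

# The two sample means are independent, so the variances of the
# linear combinations add with squared coefficients.
sd_weighted <- sqrt((w1^2 * var_xbar + w2^2 * var_ybar) / (w1 + w2)^2)
sd_simple   <- sqrt((var_xbar + var_ybar) / 4)

print(c(weight_on_xbar = w1 / (w1 + w2), weight_on_ybar = w2 / (w1 + w2)))
print(c(sd_weighted = sd_weighted, sd_simple = sd_simple))
```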
22. Student’s t distribution: Suppose that Z and U are independent variates with
$Z \sim N(0, 1)$ and $U \sim \chi^2(k)$.
Consider the random variable
$X = \frac{Z}{\sqrt{U/k}}$.
Its distribution is called the t (Student’s) distribution with k degrees of freedom, and we write $X \sim t(k)$. It can be shown by change of variables that X has probability density function
$f(x; k) = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\sqrt{k\pi} \, \Gamma\left(\frac{k}{2}\right)} \left( 1 + \frac{x^2}{k} \right)^{-\frac{k+1}{2}}$ for $x \in \Re$ and $k = 1, 2, \ldots$
The probability density function is symmetric about the origin, and is similar in shape to the probability density function of a N(0, 1) random variable but has more probability in the tails.
(a) Plot the probability density function for k = 1 and k = 5.
(c) Show that $f(x; k)$ is unimodal for all x.
(d) Show that $\lim_{k \to \infty} f(x; k) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} x^2 \right)$, which is the probability density function of the G(0, 1) distribution.
23. Challenge Problem: For “two-sided” intervals based on the t distribution, we usually pick the interval which is symmetrical about $\bar{y}$. Show that this choice provides the shortest 100p% confidence interval.
24. Challenge Problem: A sequence of random variables $\{X_n\}$ is said to converge in probability to the constant c if for all $\epsilon > 0$,
$\lim_{n \to \infty} P\{ |X_n - c| \ge \epsilon \} = 0$.
We denote this by writing $X_n \xrightarrow{p} c$.
(a) If $\{X_n\}$ and $\{Y_n\}$ are two sequences of random variables with $X_n \xrightarrow{p} c_1$ and $Y_n \xrightarrow{p} c_2$, show that $X_n + Y_n \xrightarrow{p} c_1 + c_2$ and $X_n Y_n \xrightarrow{p} c_1 c_2$.
(b) Let $X_1, X_2, \ldots$ be independent and identically distributed random variables with probability density function $f(x; \theta)$. A point estimator $\tilde{\theta}_n$ based on a random sample $X_1, \ldots, X_n$ is said to be consistent for $\theta$ if $\tilde{\theta}_n \xrightarrow{p} \theta$ as $n \to \infty$.
(i) Let $X_1, \ldots, X_n$ be independent and identically distributed Uniform(0, $\theta$) random variables. Show that $\tilde{\theta}_n = \max(X_1, \ldots, X_n)$ is consistent for $\theta$.
(ii) Let $X \sim$ Binomial(n, $\theta$). Show that $\tilde{\theta}_n = X/n$ is consistent for $\theta$.
25. Challenge Problem: Refer to the definition of consistency in Problem 24(b). Difficulties can arise when the number of parameters increases with the amount of data. Suppose that two independent measurements of blood sugar are taken on each of n individuals and consider the model
$X_{i1}, X_{i2} \sim N(\mu_i, \sigma^2)$ for $i = 1, \ldots, n$
where $X_{i1}$ and $X_{i2}$ are the independent measurements. The variance $\sigma^2$ is to be estimated, but the $\mu_i$’s are also unknown.
(a) Find the maximum likelihood estimator $\tilde{\sigma}^2$ and show that it is not consistent.
(b) Suggest an alternative way to estimate $\sigma^2$ by considering the differences $W_i = X_{i1} - X_{i2}$.
(c) What does $\sigma$ represent physically if the measurements are taken very close together in time?
26. Challenge Problem: Proof of Central Limit Theorem (Special Case) Suppose $Y_1, Y_2, \ldots$ are independent random variables with $E(Y_i) = \mu$, $Var(Y_i) = \sigma^2$ and that they have the same distribution, whose moment generating function exists.
(a) Show that $(Y_i - \mu)/\sigma$ has moment generating function of the form $(1 + \frac{t^2}{2} + \text{terms in } t^3, t^4, \ldots)$ and thus that $(Y_i - \mu)/(\sigma\sqrt{n})$ has moment generating function of the form $\left[ 1 + \frac{t^2}{2n} + o(1/n) \right]$, where $o(1/n)$ signifies a remainder term $R_n$ with the property that $n R_n \to 0$ as $n \to \infty$.
(b) Let
$Z_n = \sum_{i=1}^{n} \frac{Y_i - \mu}{\sigma\sqrt{n}} = \frac{\sqrt{n}(\bar{Y} - \mu)}{\sigma}$
and note that its moment generating function is of the form $\left[ 1 + \frac{t^2}{2n} + o(1/n) \right]^n$. Show that as $n \to \infty$ this approaches the limit $e^{t^2/2}$, which is the moment generating function for G(0, 1). (Hint: For any real number a, $(1 + a/n)^n \to e^a$ as $n \to \infty$.)