Measuring Precision of Statistical Inference on Partially Identified Parameters

(1)

Working Paper No. 98

December 2008, (Revised, January 2012)

www.carloalberto.org

THE CARLO ALBERTO NOTEBOOKS

Measuring Precision of Statistical Inference

on Partially Identified Parameters

(2)

Measuring Precision of Statistical Inference on Partially

Identi…ed Parameters

Aleksey Tetenovy

January 31, 2012

Abstract

Planners of surveys and experiments that partially identify parameters of interest face trade o¤s between using limited resources to reduce sampling error and using them to reduce the extent of partial identi…cation. I evaluate these trade o¤s in a simple statistical problem with normally distributed sample data and interval partial identi…cation using di¤erent frequentist measures of inference precision (length of con…dence intervals, minimax mean sqaured error and mean absolute deviation, minimax regret for treatment choice) and analogous Bayes measures with a ‡at prior. The relative value of collecting data with better identi…cation properties (e.g., increasing response rates in surveys) depends crucially on the choice of the measure of precision. When the extent of partial identi…cation is signi…cant in comparison to sampling error, the length of con…dence intervals, which has been used most often, assigns the lowest value to improving identi…cation among the measures considered.

Keywords: statistical treatment choice; survey planning; nonresponse; mean squared error; mean absolute deviation; minimax regret

JEL classi…cation: C21; C44; C83

I am grateful to Chuck Manski, Elie Tamer, Keisuke Hirano and Luca Anderlini for their helpful comments. I have also bene…tted from the opportunity to present this work at Collegio Carlo Alberto, LAMES 2008, ICEEE 2009, UCL, Cambridge and Toulouse.

(3)

1 Introduction

Many types of statistical data only partially identify parameters of interest as simple as popula-tion means, meaning that they cannot be estimated with arbitrary precision simply by increasing the sample size. Statisticians designing surveys and experiments which generate such data could use limited resources either to reduce the extent of partial identi…cation or to reduce sampling error. The former can be accomplished, for example, by putting more e¤ort into pursuing sam-pled population members who did not respond to a survey. The latter by increasing sample size. To inform these choices, I attempt here to evaluate the relative e¤ects of both margins of planning on the precision of inference, which the planner could then compare to their relative costs.

The problem was …rst considered in the Cochran-Mosteller-Tukey report on the Kinsey study published in 1954. Concerned with selective non-response to the study’s questions, they advocated a conservative approach to inference that sets limits on population parameters by allowing for any values of the variable in the part of the population that was not sampled or refused to respond. A variety of applications of this approach, now known as partial identi…ca-tion, has been developed by Manski (1995, 2007a) and other researchers. CMT calculated for di¤erent sample sizes and refusal rates the relative e¤ects of reducing non-response or increas-ing the sample size on the precision of inference about the population means. They judged the precision of inference by the length of a 95% con…dence interval for the identi…ed interval. The same measure of precision has been used to illustrate the e¤ects of missing data on the precision of inference by Horowitz and Manski (1998).

Length of a con…dence interval for the identi…ed interval is not the only measure of preci-sion. In this paper I show that other reasonable measures yield qualitatively di¤erent conclusions about the relative merits of reducing sampling error and reducing the extent of partial identi…-cation. First, I consider the minimax mean squared error (MSE) and minimax mean absolute deviation (MAD) of point estimates around the true value of the parameter, which have been widely used to measure the precision of estimators for point identi…ed parameters. In addition to the minimax measures, I also consider the average risk for these loss functions with a ‡at prior.

Another measure considered in this paper is the minimax regret of statistical treatment rules for choosing between two alternative policies (or treatments) when the parameter of interest

(4)

is the di¤erence in average returns of the two treatments. Regret is the average welfare loss incurred from choosing an inferior treatment for the population based on the observed statistical data. In recent years, econometricians started studying statistical treatment rules that minimize maximum regret both when the average treatment e¤ect of interest is point identi…ed (Manski 2004, 2005; Hirano and Porter 2009; Stoye 2009; Schlag 2007; Manski and Tetenov 2007) and when it is partially identi…ed (Manski 2007a, 2007b, 2008, 2009; Stoye 2007, 2012). I also consider the average welfare loss with a ‡at prior.

I apply these measures of precision to the following partial identi…cation problem. Let the real-valued parameter of interest = O+ U be the sum of a point identi…ed component O and a partially identi…ed component U. For the point identi…ed component O, the statistician observes an unbiased normally distributed estimate with known standard error . The partially identi…ed component U is only known to lie in a given bounded interval of length 2P . When P= is relatively large (i.e., partial identi…cation is a signi…cant issue), all of the considered measures of precision put a higher value on reducing the extent of partial identi…cation than the con…dence interval measure.

The problem is deliberately simpli…ed to demonstrate in an analytically tractable setting the qualitative di¤erences between the conclusions about the relative bene…ts of reducing sam-pling error vs. narrowing the identi…ed interval based on alternative measures of precision. I do not develop here a formal asymptotic argument extending the solution to more realistic data generating processes. Song (2010) formally establishes that midpoint of an estimated identi-…ed interval is asymptotically minimax for absolute and square loss functions by considering sequences of problems with P ! 0, ! 0 and P= converging to a constant. An extension of Song’s analysis may also be applicable to the problem considered here.

The paper proceeds as follows. Section 2 describes the statistical problem and considers the length of con…dence intervals as a measure of precision. In section 3 I derive minimax estimators of under square loss, absolute loss and regret loss for treatment choice, and consider their minimax losses as measures of precision. Section 4 considers instead the average risk with an improper ‡at prior on ( ; P ). Section 5 o¤ers a numerical illustration applying the results to survey non-response. Section 6 concludes and the appendix collects proofs of the propositions.

(5)

2 Statistical Setting

I will consider the following partial identi…cation problem. The parameter of interest to the statistician is = O+ U. O 2 R is a point identi…ed (observable) component, for which the statistician could obtain an unbiased normally distributed estimate X _N O; 2 with standard error . U is a partially identi…ed (unobservable) component, which is only known to lie in a bounded interval U 2 [ P; P ] of length 2P (setting P to be the half-length simpli…es notation throughout the paper). The restriction that U lies in a symmetric interval around zero is without loss of generality.

Survey non-response is a leading example to keep in mind. Suppose we’re interested in the population mean of Y and survey N individuals. Let r be the proportion of respondents (denote them by D = 1), who may have a di¤erent distribution of Y than non-respondents, then = EY is identi…ed up an interval

[rE (Y jD = 1) + (1 r) YL; rE (Y jD = 1) + (1 r) YH] , (1)

where YL and YH are the bounds on feasible values of Y (Manski 1995, 2007a). The length of the identi…ed interval is 2P = (1 r) (YH YL). It could be reduced at a cost (for example, by driving up in person to a household that does not respond to phone calls or by o¤ering stronger incentives to respondents). The midpoint O = rE (Y jD = 1) + (1 r) (YH YL) =2 of the identi…ed interval could be estimated with standard error proportional to N 1=2_{, which could} be reduced by sampling more households. The statistical setup with normally distributed X is not an exact representation of this problem, but is analytically tractable and more informative than solving the "correct" problem computationally

In this setting the pair ( ; P ) describes the experimental design parameters. The main question of this chapter is how do these design parameters a¤ect the precision of inference on that the statistician could carry out based on the results of the experiment (observation of X). Formally, let the function M ( ; P ) 0 be a particular measure of maximum precision with which the statistician can carry out inference on based on the data from an experiment with design parameters ( ; P ). Lower values of M ( ; P ) will correspond to more precise inference and M ( ; P ) = 0 will correspond to perfect precision. Let a di¤erentiable function b ( ) 0; b0 < 0 denote the economic bene…t of inference with a given level of precision and let a di¤erentiable function c ( ; P ) ; c < 0; cP < 0 denote the costs of conducting an experiment with design

(6)

parameters and P . Then the statistical planning problem is to maximize the net bene…t of the experiment

max

;P [b (M ( ; P )) c ( ; P )] . (2) If M is di¤erentiable with partial derivatives M > 0 and MP > 0, a necessary condition for a pair ( ; P ) with > 0 and P > 0 to be a solution to the planning problem is that

M ( ; P ) MP( ; P )

= c ( ; P ) cP ( ; P )

. (3)

If these ratios are unequal, then it is possible to adjust and P in a way that improves precision without increasing costs. I will evaluate a few functions M ( ; P ) based on di¤erent criteria of precision and derive the M =MP ratios for them. Survey and experiment planners could compare these ratios to the marginal cost ratio c =cP and see whether a proposed allocation of resources maximizes the precision of inference for a given budget. These conclusions could be made without specifying the bene…t function b ( ). Knowledge of b ( ) is required, however, to determine the optimal size of the budget.

2.1 Length of Con…dence Intervals

First, let’s consider using the length of a 1 level con…dence interval for the identi…ed interval as the measure of precision. In this model, the identi…ed set for is

2 [ O P; O+ P ] . (4)

Given that the experimental outcome X is normally distributed with mean O and standard error , the con…dence interval

X P 1(1 =2) ; X + P + 1(1 =2) (5)

contains the identi…ed set (4) exactly with probability 1 . denotes the standard normal CDF. The precision of inference from an experiment with parameters ( ; P ), as measured by the length of a 1 con…dence interval then equals

(7)

The marginal e¤ects of changes in and P (partial derivatives of MCI( )) equal MCS( )= 2 1(1 =2) and M_PCS( ) = 2. The ratio of these marginal e¤ects equals

MCS( ) M_PCS( )

= 1(1 =2) . (7)

Thus, if the length of conventional 95% con…dence intervals is used as a measure of precision, then a reduction of the standard error by " always brings the same improvement as a re-duction of the half-length P of the identi…ed interval by 1:96". The evaluation of the relative e¤ects of reducing the sampling error and the extent of partial identi…cation depends on the chosen con…dence level 1 . Thus, using 99% con…dence level instead of 95% would imply a relatively higher value of reducing the standard error instead of reducing the extent of partial identi…cation.

Imbens and Manski (2004) proposed an alternative type of con…dence interval which covers each point in the identi…ed set with probability 1 , but may cover the whole identi…ed set with a smaller probability (see Stoye 2008 for more details). In the present problem, the shortest Imbens-Manski con…dence interval is X MCP ( )( ; P ) ; X + MCP ( )( ; P ) where MCP ( )( ; P ) > 0 is the solution to

P + MCP ( )( ; P )! P MCP ( )( ; P )!

= 1 . (8)

The ratio of its partial derivatives then equals

MCP ( ) M_PCP ( ) = d d P +M P M d dP P +M P M = P +M P +M ₊P M P M P +M P M . (9)

As Figure 1 illustrates, this ratio is close MCS(2 )=M_PCS(2 ) for small values of =P .

3 Minimax Measures of Precision

3.1 Absolute and Square Loss

Suppose that the statistician is asked to provide a point estimate of instead of an interval. Let the estimator ^ (X) be a function mapping observed experimental outcome X to a point estimate of . There is a long tradition in statistics of measuring the precision of point estimators

(8)

by their expected loss

EXL ^ (X) ; (10)

where the expectation is taken with respect to the distribution of X for …xed values of O and U. Expected loss di¤ers across values of O and U, its maximum value over the parameter space _{= f} O2 R; U 2 [ P; P ]g:

ML( ; P ) sup

O; U

EXL ^ (X) ( O+ U) (11)

could be used as a conservative measure of the precision of ^ (X). Since ^ (X) is optimal in the sense of minimizing (11), ML_{( ; P ) is also a measure of precision of the experimental data} itself.

Proposition 2 shows that a simple estimator ^ (X) = X minimizes maximum expected loss (11) for a broad class of symmetric convex loss functions. This class includes square loss and absolute loss, for which I derive more speci…c results later. Formally, suppose that the loss function L satis…es the following conditions:

Condition 1 (a) L is symmetric, L (t) = L ( t), (b) L is convex,

(c) L (0) = 0,

(d) L (t) > 0 for some t > 0,

(e) L (t) q exp (rt) for all t 0 and some constants q > 0; r > 0.

Then (a)-(d) imply that L is continuous, non-negative, and non-decreasing on [0; +1), while (e) ensures that expected loss is …nite with normally distributed X.

Proposition 2 If loss function L satis…es Condition 1, O 2 R, U 2 [ P; P ], and X N O; 2 , then the estimator ^ (X) = X minimizes maximum expected loss (11), which equals ML( ; P ) = 8 > < > : R₊₁ 1 L (t) 1 t P _{dt for} _{> 0,} L (P ) for = 0. (12)

For square and absolute loss functions it is possible to evaluate (12) and its partial derivatives analytically. In case of square loss L (t) = t2; the maximum mean squared error of ^ equals

MM SE( ; P ) = Z ₊₁ 1 t21 t P dt = Z ₊₁ 1 (s + P )2 (s) ds = 2+ P2: (13)

(9)

The marginal e¤ects of changes in and P on the minimax MSE equal MM SE = 2 and M_PM SE = 2P . The ratio of these marginal e¤ects equals

MM SE MM SE

P =

P. (14)

This ratio shows that using MM SE as a measure of precision yields qualitatively di¤erent conclusions about the optimal choices of and P than using MCS( ), which does not depend on =P . Whenever =P < 1₍₁ _{=2), M}M SE_=MM SE

P < M CI( )

=M_PCI( ) and the minimax MSE measure of precision implies that further reducing sampling error is not as important as the length of con…dence interval measure would suggest. In any experiment or survey with the standard error smaller than the length of the identi…ed interval ( < 1:96P ) a planner using the maximum MSE measure of precision would allocate more resources to reducing the extent of partial identi…cation than a planner measuring precision by the length of a 95% con…dence interval. The di¤erence between the "marginal rates of substitution" produced by the two methods could be particularly striking when considering large sample surveys in which the extent of partial identi…cation could greatly exceed sampling error in magnitude.

For the absolute loss function L (t) = jtj, the minimax mean absolute deviation (MAD) of ^ equals MM AD( ; P ) = Z ₊₁ 1 jtj 1 t P dt = Z ₊₁ 1 js + P j (s) ds = = Z ₊₁ P= (s + P ) (s) ds Z P= 1 (s + P ) (s) ds = = (P= ) + P (P= ) + (P= ) P ( P= ) = = 2 (P= ) + P [ (P= ) ( P= )] (15)

since R s (s) @s = (s).The marginal e¤ects of changes in and P on the minimax MAD are MM AD _{= 2 (P= ) and M}M AD

P = (P= ) ( P= ). The ratio of these marginal e¤ects equals MM AD( ; P ) MM AD P ( ; P ) = 2 (P= ) (P= ) ( P= ). (16)

This is a continuous decreasing function of P= , which goes to in…nity as P= ! 0 and to zero as P= ! 1.

(10)

mea-sure of precision implies greater importance of reducing the scope of partial identi…cation than does the con…dence interval measure. For conventional 95% con…dence intervals, calculations show that MM AD=M_PM AD < MCS(:05)=M_PCS(:05) whenever < 2:11P . MAD and MSE mea-sures yield similar conclusions about the relative bene…ts of reducing and P for small values of P= , since (16) _{=P when P= ! 0.}

3.2 Regret Loss in Treatment Choice Problems

The next measure of precision - minimax regret - is motivated by considering the economic loss resulting from incorrect inference about when it is the di¤erence in average returns of two alternative policy decisions and the aim of inference is to choose which policy to implement.

Let = r2 r1, where r1 is the average return from implementing the …rst policy and r2 the average return from implementing the second policy. Then the economic loss from choosing the second policy when, in fact, r1 > r2 ( < 0) equals r1 r2 = . The economic loss from choosing to implement the …rst policy when, in fact, r1 < r2 ( > 0) equals r2 r1 = . If policy choice is denoted by a = 1 for the second policy and a = 0 for the …rst (the choice could also be randomized with a 2 (0; 1)), then the regret loss function is

L (a; ) = 8 > < > : [1 a] if > 0, a if 0, = (I [ > 0] a) (17)

The method by which the decision maker chooses which policy to implement based on experimental data could be summarized by a statistical treatment rule (X), which is a function mapping feasible realizations of X 2 R into actions in the [0; 1] interval. The regret of statistical treatment rule is the average (over the distribution of X) regret loss incurred by the decision maker using rule . It is a function of O and U, and in this problem equals

R ( ; ( O; U)) 8 > < > : [1 E _O (X)] if > 0, E O (X) if 0, (18)

where E _O (X) denotes the average value of (X). For < 0, E _O (X) is the probability of mistakenly choosing the inferior second policy, while for > 0, the probability of error is 1 E _O (X).

(11)

Minimizing maximum regret is a criterion suggested by Savage (1951) as a clari…cation of Wald’s minimax principle (1950). Regret is a natural reparametrization for loss functions that are not minimized at zero by any action. A number of recent papers in Econometrics, starting with Manski (2004), applied the criterion to treatment choice problem. Similar loss function could also be motivated by the problem of eliciting valuation of public projects from a sample of individuals (an "economic jury") for the purpose of deciding whether they’re e¢ cient (McFadden 2012). Selective non-participation in such juries leads to partial identi…cation of the valuation of the project and the analysis in this paper applies to the trade o¤ between increasing participation and increasing sample size in such juries.

The following Proposition derives statistical treatment rules that minimize maximum regret for given experimental parameters ( ; P ) and their minimax regret.

Proposition 3 a) For > 2P (0), the unique minimax regret statistical treatment rule is (X) _{I jX > 0j. Its maximum regret equals}

MM M R( ; P ) = max h>0 h

P h

, (19)

which is greater than P=2 and is a strictly increasing function of for any given P . b) For 2P (0), statistical treatment rules

M ( ;P )(X) 8 > < > : I jX > 0j if = 2P (0) , X= q (2P (0))2 2 _if _{< 2P (0) ,} (20)

minimize maximum regret, which equals MM M R( ; P ) = P=2.

The results of Proposition 3 are qualitatively similar to those obtained by Stoye (2012), who studied minimax regret statistical treatment rules based on binary outcome data from an experiment with randomized treatment assignment in which the outcomes are missing with some probability.

First, when the extent of partial identi…cation (in Stoye’s problem, the maximum feasible proportion of missing outcomes) is below some threshold relative to the sampling error, the minimax regret statistical treatment rule is the same as it would be with point identi…cation. In Proposition 3 the same result holds, the minimax regret statistical treatment rule is identical for all values of P = (2 (0)), including the point identi…ed case P = 0.

(12)

The second qualitative similarity is that maximum regret of the minimax regret statistical treatment rule becomes constant with respect to the sampling error once the sampling error falls below some threshold relative to the extent of partial identi…cation. Thus, reducing the sampling error below that threshold (reducing in this chapter, increasing sample size in Stoye’s) could not further reduce minimax regret.

Statistical treatment rule in part (b) of Proposition 3 may not be the only one that minimizes maximum regret, but deriving one is su¢ cient to make conclusions about the minimax regret value, and thus about the precision of inference from the data for treatment choice.

Precision of inference generated by an experiment with parameters ( ; P ), as measured by minimax regret for treatment choice, is

MM M R( ; P ) = 8 > < > : max h>0 h P h _if _{> 2P (0) ,} P 2 if 2P (0) , (21)

This measure of precision could yield the most drastic conclusions about the relative bene…ts of reducing the extent of partial identi…cation since MM M R= 0 for =P 2 (0) 0:8, implying that reducing the extent of partial identi…cation is not only important, but is the only way to reduce minimax regret and improve the inferential precision of experimental or survey data for treatment choice.

4 Average Risk Measures

One of the concerns with using minimax measures of risk is that they overemphasize extreme parameter values. For all the measures considered above, the risk is maximized at U = P , which may overemphasize the relative bene…ts of reducing partial identi…cation. For comparison, I consider in this section measures of precision based on average risk (for the same loss functions) with a Uniform[ P; P ] prior on U and an independent improper ‡at prior on O (Lebesgue measure on R). Bayesian inference in partially identi…ed models is more sensitive to the choice of prior (since some of its features are not "diluted" by any amount of sample data) and the uniform prior is considered here for its conventionality (Hirano and Porter (2009) considered average risk with a ‡at prior for regret loss in treatment choice problems with point identi…ed ). The results would clearly be very di¤erent, for example, with a prior that places point mass on U = 0 (implying that there is no partial identi…cation problem) or on O = 0 (implying

(13)

that there is no bene…t to sampling). While the prior is improper, identical results could be obtained by considering a sequence of proper N 0; 2 priors on O with ! 1.

With the prior on O placing uniform measure on the real line and X N O; 2 , the posterior measure is a proper normal distribution OjX N X; 2 . The posterior measure on U remains Uniform[ P; P ]. The posterior MSE and MAD are minimized by ^B = X, which is both the mean and the median of the posterior distribution of . The posterior mean squared error equals Z P P Z R ( O+ U X)2 O X d O 1 2Pd U = Z P P Z R (tO+ U)2 tO dtO 1 2Pd U = = Z P P 2₊ 2 U 1 2Pd U = 2₊P2 3 . (22)

Since it is constant in X, the average MSE (with respect to the prior) also equals MAM SE( ; P ) = 2_{+ P}2_{=3, and the ratio of its partial derivatives is}

MAM SE MAM SE

P

= 3

P. (23)

For regret loss (17), it is optimal to adopt the second policy if the posterior mean is positive, which happens if X > 0, hence (X) _{I jX > 0j. The average regret with respect to the prior} then equals MAR = Z R Z P P ( O+ U) (I [ O+ U > 0] E O (X)) 1 2Pd Ud O = Z R Z P P ( O+ U) I [ O+ U > 0] O 1 2Pd Ud O. (24)

The partial derivatives of MAR equal

MAR= and M_PAR= P=3 (25)

(see Appendix), hence their ratio MAR=M_PAR = 3 =P is the same as for average square loss, while the two loss functions yield starkly di¤erent results when minimax measures are consid-ered. Predictably, the average regret loss always implies that reducing P and always has some positive value, while the minimax regret value didn’t vary with at all for small values of =P .

(14)

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 0 1 2 3 4 5 6 Frequentist σ /P Mσ /M P

95% C.I. for the interval 95% Imbens-Manski C.I. Maximum MSE Maximum MAD Minimax regret 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 0 1 2 3 4 5 6

Bayes (flat prior)

σ /P

95% HPD interval

Average MSE and average regret Average MAD

Figure 1: M =MP as a function of =P under di¤erent measures of precision.

Figure 1 summarizes the …ndings, displaying the ratios M =MP as a function of =P for di¤erent measures of precision. The left pane shows the ratios when the length of frequentist con…dence intervals, minimax MSE, MAD and regret loss are used as measures of precision. The right pane displays the same ratios for Bayesian precision measures with a ‡at prior. The ratios for measures based on the length of the 95% Highest Posterior Density interval and absolute loss are computed numerically. The graph shows the extent to which conclusions depend on the measure of precision used. The disagreement in conclusions for frequentist measures could be in…nite when partial identi…cation is relatively important ( =P is small), when the length of the con…dence interval responds to changes in more than any other measure, while minimax regret does not change in at all. While all the measures based on the ‡at prior converge for large =P is large, when =P is smaller they also display substantial disagreement (up to 3.8 times between 95% HPD interval length and average absolute loss). The following section illustrates what range of values =P may be relevant in practice for problems of survey non-response.

5 Illustration: Survey Non-Response

The statistical setup of the previous sections is purposefully simpli…ed to get clean analytical results. Here we will consider it as an informal approximation to the problem of survey non-response. Suppose that a survey samples N individuals, has response rate r and the variance

(15)

Response rate r 90% 90% 90% 90% 80% 80% 80% Total sample size N 100 500 2000 10000 500 2000 10000 =P 1.054 0.471 0.236 0.105 0.25 0.125 0.056 Measures of precision:

Length of 95% con…dence interval 10.7 22.6 44.3 97.8 21.4 41.8 92.3 Length of 95% Imbens-Manski C.I. 12.3 26.8 52.6 116.4 25.3 49.6 109.8

Minimax MSE 19 91 361 1801 161 641 3201

Minimax MAD 25.5 488.4 > 105 > 105 > 105 > 105 > 105 Minimax regret (treatment choice) 119.8 ₊₁ ₊₁ ₊₁ ₊₁ ₊₁ ₊₁ Measures with ‡at prior:

Length of 95% HPD interval 6.6 23.1 65.8 244.4 30.6 91.4 439.1 Average MSE or average regret 7 31 121 601 54.3 214.3 1067

Average MAD 7.3 37.8 171.1 890.4 76 316 1596

Table 1: Value of "converting" one non-respondent relative to an additional observation from the same subpopulation of respondents according to di¤erent measures of precision.

of the measured outcome among respondents is 2

0. Let O stand for the mean outcome among respondents, which could be estimated with variance 2 2

0= (rN ) : Let U be the mean outcome among non-respondents, which is partially identi…ed up to an interval of length 2P = (1 r) (YH YL), where YH and YL are the highest and lowest feasible outcome values. Then

P YH YL 2 0

p

rN (1 r) . (26)

If outcomes of interest are binary, then YH YL= 1 and if the mean among respondents equals O = 1=2, then 0 = 1=2 (it isn’t very di¤erent for a large range of values of O). Then (YH YL) = (2 0) = 1 and P=

p

rN (1 r).

To make the di¤erences in M =MP under various criteria more concrete, they could be translated into the ratio of values of two alternative marginal changes to the sample. One option is increase the response rate by 1=N , that is, to "convert" one non-respondent to a respondent, which would reduce by _dNd 0=

p

rN = 0= 2N p

rN and reduce P by P (YH YL) = (2N ). An alternative option is to increase the overall sample size by 1=r, thus adding one more respondent, which would reduce by but leave P unchanged. The value of one non-respondent observation is then

M + P MP M = 1 + P MP M = 1 + p rNYH YL 0 MP M (27)

times higher than the value of one from the existing population of respondents. For binary outcomes this ratio simpli…es to 1 + 2prN MP=M .

(16)

Table 1 shows the value of this ratio under di¤erent measures of precision for a range of common sample sizes and response rates. Response rates of 80%-90% are common in major surveys like the NLSY, the HRS and the CPS. While their national sample sizes are greater than those in Table 1, the sample sizes for subpopulations in which researchers may be interested are smaller. The relative value of non-respondents rises with sample size regardless of the measure of precision used, because P is proportional to N 1_{, while} _{is proportional to N} 3=2_{, hence} optimal response rates should be increasing with sample size. The value of non-respondents also rises with sample size and with the non-response rate because of the changes in MP=M under all measures of precision except for frequentist con…dence interval lengths.

6 Conclusion

I have compared what di¤erent measures of inferential precision for partially identi…ed parame-ters about optimal economic trade o¤ between reducing sampling error and reducing the extent of partial identi…cation in the data when the researcher could control both (for example, when choosing the size of a study and the level of e¤ort to reduce non-response). The length of con…-dence intervals for the identi…ed interval is the most apparent measure of precision, but it turns out to be an outlier. When the extent of partial identi…cation is relatively important, all other measures of precision considered here (mean sqaured error, mean absolute deviation, regret loss for treatment choice) would lead the researcher to reallocate the budget more strongly towards reducing the identi…cation problem at the expense of sampling error.

The statistical problem with a normal sampling distribution considered in the paper is simple in comparison to many practical problems. However, it is su¢ ciently rich to capture some of the main features of partial identi…cation problems and to concisely illustrate how choosing di¤erent criteria for measuring the precision of inference qualitatively impacts the conclusions about the relative value of reducing the extent of partial identi…cation and reducing sampling error. The results could serve both as a rough practical approximation for problems with similar structure and as a useful indicator of potential …ndings for future research that considers partial identi…cation problems in greater generality.

(17)

7 Appendix: Proofs

Proof of Proposition 2

The proof relies on a well-known result (e.g., Berger 1985, p. 350, Theorem 18) that deci-sion rule _{is minimax if there exists a sequence of proper priors f} kg such that R ( ; )

lim

k!1r ( k) < 1, where r ( k) is the Bayes risk of Bayes rule k with respect to prior k. I will consider a sequence of such priors with k( O) = N 0; k2 , k( U) = :5I [j Uj = P ], and

O? U.

Since X _N O; 2 , the posterior distributions of O and U conditional on X are inde-pendent and equal

k( OjX) = N ckX; ck 2 ; and k( UjX) = :5I [j Uj = P ] ,

where ck = k2= k2+ 2 ; lim

k!1ck = 1. Since the loss function L is convex and symmetric, the posterior Bayes estimator of is ^k(X) = ckX. Conditional on X, the variable y = ckX O has a N 0; ck 2 distribution. The posterior risk of ^k(X), then, equals

Z L (ckX ( O+ U)) d k( O; UjX) = = Z ₁ 2L (ckX O P ) + 1 2L (ckX O+ P ) d k( OjX) = = Z 1 2L (y P ) + 1 2L (y + P ) d k(yjX) = Z L (y + P ) d k(yjX) = = Z ₊₁ 1 L (y + P ) _p1 ck y p_c k dy = Z ₊₁ 1 L (t) _p1 ck t P p_c k dt

The third equality holds because L and k(yjX) are symmetric. Condition 1(e) ensures that this and other improper integrals in the proof are …nite. Since the posterior risk is constant in X, it is equal to the Bayes risk with prior k:

r ( k) = Z ₊₁ 1 L (t) _p1 ck t P p_c k dt. Functions L (t) pck 1 (t P ) = pck converge pointwise in t to L (t) 1 ((t P ) = )

(18)

as k ! 1. Due to Condition 1(e), Lebesgue dominated convergence theorem applies and lim k!1r ( k) = Z ₊₁ 1 L (t) 1 t P dt.

It remains to verify that the maximum risk of ^ over ( O; U) is equal to this limit. The risk equals R ^ ; ( O; U) = EXL (X ( O+ U)) = = Z ₊₁ 1 L (x O U) 1 x O dx = = Z ₊₁ 1 L (y U) 1 y dy = (28) = Z ₊₁ 0 [L (y U) + L (y + U)] 1 y dy;

where the last equality is due to symmetry of L and . The sum L (y U) + L (y + U) is non-decreasing in U for U > 0 due to convexity of L, hence the risk is maximized at j Uj = P for any value of O. Substituting U = P and t = y + P in (28) yields

R ^ ; ( O; U)

Z ₊₁

1

L (t) 1 t P dt = lim

k!1r ( k) ,

thus ^ (X) = X is a minimax estimator of under loss function L with maximum expected loss equal toR+1₁ L (t) 1 t P dt.

Proof of Proposition 3(a)

The proof of part (a) relies on a well known result (e.g., Berger 1985, p. 350, Theorem 17) that decision rule is minimax if it is Bayes with respect to some proper prior and for all ( O; U) 2

R ( ; ( O; U)) r ( ) = Z

R ( ; ( O; U)) @ ( O; U) . (29)

This result applies as well when R denotes regret, then is a minimax-regret rule.

Decision rule (X) = I [X > 0] is Bayes with respect to any symmetric two-point prior distribution with ( _O; _U) = 1=2 and ( _O; _U) = 1=2, if _O > 0 and _O+ _U > 0. We will …rst …nd values of ( O; U) that maximize R ( ; ( O; U)), then verify that is Bayes with respect to a two-point prior using these values, and then verify that equation (29) holds.

(19)

of O it is maximized at U = P , since E O (X) doesn’t depend on U. Since E O (X) =

1 ( O= ), maximum regret of over 0 then equals (with the substitution h = O+ P )

max O+ U 0 R ( ; ( O; U)) = max O P ( O+ P ) O = max h>0 h P h .

The maximum is attained at

O= arg max h>0 h

P h

P .

Similarly, maximum regret over < 0 also equals max

h>0[h ((P h) = )] and it is attained at O= O and U = P .

The function h P h is continuous in h with derivative

d dh h

P h

= P h h P h = P h 1 h ((P h) = )

((P h) = ) .

At h = 0, _dhd h P h = P > 0. The function (y)_(y) is positive and decreasing in y, hence h ((P h)= )

((P h)= ) is positive and increasing in h for h 0 with lim_h!1

h ((P h)= )

((P h)= ) = +1, therefore h P h has a unique maximum over h > 0 at some point h .

The derivative of h P h at h = P equals (0) P (0) = 1₂ P (0). By assumption > 2P (0), hence it is positive, h > P and _O= h P > 0.

Since _O > 0 and _O+ P > 0, is a Bayes rule with respect to prior with ( _O; P ) = 1=2, ( _O; P ) = 1=2 and r ( ) = 1 2R ( ; ( O; P )) + 1 2R ( ; ( O; P )) = maxO; U R ( ; ( O; U)) ,

minimizes maximum regret. Furthermore, since is a unique Bayes rule up to randomization at X = 0, which does not a¤ect R ( ; ( O; U)) for any values of ( O; U), it is admissible.

Maximum regret of exceeds P₂ because h P h is not maximized at h = P , hence

h P h > P P P = P 2. To verify that minimax regret max

h>0 h

(20)

observe that since h > P , max h>0 h P h = max h>P h P h .

For any h > P , h P h is strictly decreasing in , thus max h>P h

P h _{is also strictly} decreasing in .

Proof of Proposition 3(b)

Maximum regret of any decision rule is at least P=2 because

max O; U R ( ; ( O; U)) max (R ( ; (0; P )) ; R ( ; (0; P ))) = = max (P (1 E _O=0 (X)) ; P E O=0 (X)) P 2:

I will …rst show that any rule for which E _O (X) lies within the bounds

E _O (X) 1 _{2(P +}P O) for O P 2, and E _O (X) _2(PP O) for O P 2 (30)

has maximum regret of P=2 (hence minimizes maximum regret), then show that M ( ;P )(X) satis…es this condition.

For all ( O; U) such that 0, R ( ; ( O; U)) = ( O+ U) (1 E O (X)) is increasing in

U, so

R ( ; ( O; U)) R ( ; ( O; P )) = ( O+ P ) (1 E O (X)) .

If O P=2, the lower bound in (30) implies that

( O+ P ) (1 E O (X)) ( O+ P ) P 2 (P + O) = P 2. If O 2 [ P; P=2], then since (1 E O (X)) 1, ( O+ P ) (1 E O (X)) O+ P P 2.

(21)

Second, I will show that any decision rule with E _O (X) = q ( O), where

q ( O) O

2P (0)

satis…es inequalities (30). The proof veri…es this for O 0, it is symmetric for O< 0.

When O = 0, q (0) = (0) = 1=2, which coincides with both bounds. For O 2 [0; P=2], q satis…es the upper bound (30) because

O 2P (0) 1 2+ O 2P (0) (0) = P + O 2P P 2 (P O) .

The …rst inequality follows from using (0) as an upper bound for the derivative of on [0; O

2P (0)]. The second one follows from (P + O) (P O) = P2 2

O P2.

The proof that q satis…es the lower bound is split into two cases: O2 [0; P ] and O P . For O 2 [0; P ], I will prove that q has a higher derivative than the bound 1 _{2(P +}P _O₎, which implies the needed inequality since both are equal at O = 0. The derivative of q is

d d O q ( O) = 1 2P (0) O 2P (0) = 1 2P exp 4 O P 2!

Since the function exp (y) is convex with exp (0) = 1 and exp (1) < 3, exp (y) _{1 + 2y for y 2} [0; 1], therefore exp (y) _{1 2y}1 _{for y 2 [ 1; 0]. Since =4 < 1 and (} O=P )2 1, ₄( O=P )2 2 [ 1; 0], therefore 1 2P exp 4 O P 2! 1 2P 1 1 + ₂ ( O=P )2 = = P 2P2₊ 2 O P 2P2_{+ 4P} O+ 2 2O = P 2 (P + O)2 .

The second inequality uses the fact that O 4P and 2 2O 0. On the other hand,

d d O 1 P 2 (P + O) = P 2 (P + O)2 ,

hence q grows faster than the bound.

For O P , I will use two inequalities. First, for the normal distribution (t) > 1 (t)_t for t > 0 (cf. Feller 1968, Chapter VII, Lemma 2). Second, for y 0, exp (y) 1 + y, thus for

(22)

y 0, exp (y) _{1 y}1 . q ( O) = O 2P (0) > 1 2P (0) O O 2P (0) = = 1 P O exp 4 O P 2! > 1 P O 1 1 +₄ ( O=P )2 = = 1 P O+ 2 4 O( O=P ) 2 > 1 P 2 (P + O) .

The last inequality relies on observations that O> 2P , 2=4 > 2, and ( O=P )2 1.

It remains to show that the decision rule M ( ;P ) satis…es E O M ( ;P )(X) = q , hence has

maximum regret P=2.

For = 2P (0), M ( ;P )(X) = I jX > 0j, thus E O M ( ;P )(X) =

O

2P (0) = q ( O). For < 2P (0), it is simplest to derive M ( ;P )(X) using the following construction. De…ne an auxiliary random variable

Y _{N 0; (2P (0))}2 2

independent of the observed outcome X _N O; 2 . Then X + Y N O; (2P (0))2 . De…ne the randomized statistical treatment rule ~ (X; Y ) as a function of both X and Y

~ (X; Y ) _{I jX + Y > 0j ,}

then clearly

E _O _{M ( ;P )}(X) = O

2P (0) = q ( O) . Integrating ~ (X; Y ) with respect to the distribution of Y yields

M ( ;P )(X) E (1 jX + Y > 0j) = X= q

(2P (0))2 2 _,

which thus satis…es E _O M ( ;P )(X) = q ( O) by construction and minimizes maximum regret, which equals P=2.

(23)

Let’s denote the inner integral over U by J ( O; ; P ) and evaluate it J ( O; ; P ) 1 2P Z P P ( O+ U) I [ O+ U > 0] O d U = 1 2P O Z P P ( O+ U) d U+ Z P P ( O+ U) I [ U > O] d U = O O + 8 > > > > < > > > > : 0 for O< P; P 4 + O 2 + 2 O 4P for O2 [ P; P ] ; O for O> P .

Di¤erentiating it with respect to and P for a given O yields

d dPJ ( O; ; P ) = I [j Oj P ] 1 4 2 O 4P2 , d d J ( O; ; P ) = O 2 O .

Integrating the derivatives over O, we get

d dP Z R J ( O; ; P ) d O = Z P P 1 4 2 O 4P2 d O= P 3, d d Z R J ( O; ; P ) d O = Z R O 2 O d O= :

(24)

References

[1] Berger, J.O. (1985), Statistical Decision Theory and Bayesian Analysis (2nd ed.), New York: Springer Verlag.

[2] Cochran, W.G., Mosteller, F., and Tukey, J.W. (1954), Statistical Problems of the Kinsey Report on Sexual Behavior in the Human Male, Washington, DC: American Statistical Association.

[3] Feller, W. (1968), An Introduction to Probability and Its Applications Vol. 1 (3rd ed.), New York: Wiley.

[4] Hirano, K., and Porter, J.R. (2009), "Asymptotics for Statistical Treatment Rules," Econo-metrica, 77, 1683–1701.

[5] Horowitz, J.L., and Manski, C.F. (1998), "Censoring of Outcomes and Regressors Due to Survey Nonresponse: Identi…cation and Estimation Using Weights and Imputations," Journal of Econometrics, 84, 37–58.

[6] Imbens, G.W., and Manski, C.F. (2004), "Con…dence Intervals for Partially Identi…ed Parameters," Econometrica, 72, 1845-1857.

[7] Manski, C.F. (1995), Identi…cation Problems in the Social Sciences, Cambridge, MA: Har-vard University Press.

[8] Manski, C.F. (2004), "Statistical Treatment Rules for Heterogeneous Populations," Econo-metrica, 72, 1221–1246.

[9] Manski, C.F. (2005), Social Choice with Partial Knowledge of Treatment Response, Prince-ton: Princeton University Press.

[10] Manski, C.F. (2007a), Identi…cation for Prediction and Decision. Cambridge, MA: Harvard University Press.

[11] Manski, C.F. (2007b), "Minimax-Regret Treatment Choice with Missing Outcome Data," Journal of Econometrics, 139, 105–115.

[12] Manski, C.F. (2008), "Adaptive Partial Drug Approval: A Health Policy Proposal," The Economists’ Voice, Vol. 6: Iss. 4, Article 9.

(25)

[13] Manski, C.F. (2009), "Diversi…ed Treatment Under Ambiguity," International Economic Review, 50, 1013–1041.

[14] Manski, C.F., and Tetenov, A. (2007), "Admissible Treatment Rules for a Risk-Averse Planner with Experimental Data on an Innovation," Journal of Statistical Planning and Inference, 137, 1998–2010.

[15] McFadden, D. (2012), "Economic Juries and Public Project Provision," Journal of Econo-metrics, 166, 116–126.

[16] Savage, L.J. (1951), "The Theory of Statistical Decision," Journal of the American Statis-tical Association, 46, 55–67.

[17] Schlag, K.H. (2007), "Eleven – Designing Randomized Experiments Under Minimax Re-gret," working paper, European University Institute.

[18] Song, K. (2010), "Point Decisions for Interval-Identi…ed Parameters," working paper, Uni-versity of Pennsylvania.

[19] Stoye, J. (2007), "Minimax Regret Treatment Choice with Incomplete Data and Many Treatments," Econometric Theory, 23, 190–199.

[20] Stoye, J. (2008), "More on Con…dence Intervals for Partially Identi…ed Parameters," Econo-metrica, 77, 1299-1315.

[21] Stoye, J. (2009), "Minimax Regret Treatment Choice with Finite Samples," Journal of Econometrics, 151, 70–81.

[22] Stoye, J. (2012), "Minimax Regret Treatment Choice with Limited Validity of Experiments or with Covariates," Journal of Econometrics, 166, 138–156.