Supplements to Directionally Differentiable Econometric Models

(1)

Supplements to “Directionally Differentiable Econometric Models”

JIN SEO CHO School of Economics

Yonsei University

50 Yonsei-ro, Seodaemun-gu, Seoul, 03722, Korea Email: [email protected]

HALBERT WHITE Department of Economics University of California, San Diego 9500 Gilman Dr., La Jolla, CA, 92093-0508, U.S.A.

Email: [email protected] This version: October, 2016

Abstract

We illustrate analyzing directionally differentiable econometric models and provide technical details that are not contained in Cho and White (2016).

Key Words: directionally differentiable quasi-likelihood function, Gaussian stochastic process, quasi- likelihood ratio test, Wald test, and Lagrange multiplier test statistics, stochastic frontier production function, GMM estimation, Box-Cox transform.

JEL Classification: C12, C13, C22, C32.

Acknowledgements: The co-editor, Yoon-Jae Whang, and two anonymous referees provided very help- ful comments for which we are most grateful. The second author (Halbert White) passed away while the final version was written. He formed the outline of the paper and also showed an exemplary guide for writing a quality research paper. The authors are benefited from discussions with Yoichi Arai, Se- ung Chan Ahn, In Choi, Horag Choi, Robert Davies, Graham Elliott, Chirok Han, John Hillas, Jung Hur, Hide Ichimura, Isao Ishida, Yongho Jeon, Estate Khmaladze, Chang-Jin Kim, Chang Sik Kim, Tae- Hwan Kim, Naoto Kunitomo, Taesuk Lee, Bruce Lehmann, Mark Machina, Jaesun Noh, Kosuke Oya, Taeyoung Park, Peter Phillips, Erwann Sbai, Donggyu Sul, Hung-Jen Wang, Yoshihiro Yajima, Byung Sam Yoo and other seminar participants at Sogang University, the University of Auckland, the University of Tokyo, Osaka University, the Econometrics Study Group of the Korean Econometric Society, VUW, Yonsei University, and other conference participants at NZESG (Auckland, 2013).

(2)

1 Introduction

This note illustrates analysis of econometric models formed by directionally differentiable (D-D) quasi- likelihood functions and provides technical details that are not contained in Cho and White (2016). All theorems, assumptions, and corollaries are those in Cho and White (2016) unless otherwise stated.

2 Examples

In this section, we illustrate the analysis of directionally differentiable econometric models using the stochastic frontier production function in Aigner, Lovell, and Schmidt (1977) and Stevenson (1980), Box and Cox’s (1964) transformation, and the standard generalized methods of moments (GMM) estimation in Hansen (1982).

2.1 Example 1: Stochastic Frontier Production Function Models

A D-D quasi-likelihood function is found from the theory of stochastic frontier production function models.

Stochastic production function models are often specified for identically and independently distributed (IID) observations {Yt, X_t} as

Yt= X⁰_tβ_∗+ Ut,

where Yt ∈ R is the output produced by inputs Xt ∈ R^k such that β_∗ is an interior element of B ⊂ R^k, E[U_t²] < ∞, E[X_t,j² ] < ∞ for j = 1, 2, . . . , k, and E[X_tX⁰_t] is positive-definite. Here, U_t stands for an error that is independent of Xt. This model is first introduced by Aigner, Lovell, and Schmidt (1977).

One of the early uses of this specification is in identifying inefficiently produced outputs. Given output levels subject to the production function and inputs, if E[Ut] < 0, outputs are inefficiently produced. Aigner, Lovell, and Schmidt (1977) capture this inefficiency by decomposing Utinto Ut ≡ V_t− W_t, where Vt ∼ N (0, τ_∗²), Wt := max[0, Qt], Qt ∼ N (µ_∗, σ²_∗), and Vt is independent of Wt. Here, it is assumed that τ∗ > 0, σ∗ ≥ 0, and µ∗ ≥ 0, and W_tis employed to capture inefficiently produced outputs. If µ∗ = 0 and σ_∗² = 0, this model reduces to Zellner, Kmenta, and Drèze’s (1966) stochastic production function model, implying that outputs are efficiently produced. The key to identifying the inefficiency is, therefore, in testing whether µ∗= 0 and σ_∗²= 0.

The original model introduced by Aigner, Lovell, and Schmidt (1977) assumes µ∗ = 0, so that the mode of Wt is always achieved at zero. Stevenson (1980) suggests to extend the model scope by letting µ∗be different from zero, and the model with unknown µ∗has been popularly specified for empirical data

(3)

analysis since then (e.g., Dutta, Narasimhan, and Rajiv (1999), Habib and Ljungqvist (2005), and etc.).

Nevertheless, we do not find a methodology testing whether µ∗ = 0 and σ_∗² = 0 in prior literature to the best of our knowledge. This is mainly because the likelihood value is not identified under the null. Note that for each (β, σ, µ, τ ), the log-likelihood is given as

L_n(β, σ, µ, τ ) =

n

X

t=1

ln

φ Yt− X⁰_tβ + µ

√σ²+ τ²

−1

2ln(σ²+ τ²) − ln

Φ

µ

√ σ²

/Φ

µet

√ eσ²

,

where φ( · ) and Φ( · ) are the probability density function (PDF) and cumulative density function (CDF) of a standard normal random variable, respectively, and

µe_t:= τ²µ − σ²(Y_t− X⁰_tβ)

τ²+ σ² and eσ² := τ²σ² τ²+ σ².

Here, the log-likelihood is not identified if θ∗:= (β⁰_∗, µ∗, σ∗, τ∗)⁰= (β⁰_∗, 0, 0, τ∗) because µ∗/pσ²_∗ = 0/0, so that ln[Φ(µ∗/pσ_∗²)] is not properly identified. Furthermore, if we let

µe∗t:= τ_∗²µ∗− σ_∗²Ut

τ_∗²+ σ_∗² and σe²_∗ := τ_∗²σ²_∗ τ_∗²+ σ²_∗,

µe∗t/p

σe_∗² = 0/0, so that ln[Φ(µe∗t/p

σe²_∗)] is not also identified by the model. Even further, this model is not differentiable (D). This can be verified by examining the first-order directional derivative. Some tedious algebra shows that for given d := (d⁰_β, dµ, dσ, dτ)⁰,

limh↓0L_n(θ∗+ hd) = −n

2ln(τ_∗²) +

n

X

t=1

ln

"

φ Y_t− X⁰_tβ_∗ pτ_∗²

!#

,

which is the log-likelihood desired by the null condition. This limit is obtained by particularly using the fact that

limh↓0Φ hd_µ p(hdσ)²

!

= Φ d_µ pd²_σ

!

and lim

h↓0Φ µe∗t(h; d) p

eσ(h; d)²

!

= Φ d_µ pd²_σ

! ,

where

eσ∗(h; d)²:= (τ∗+ hd_τ)²(hd_σ)² (τ∗+ hdτ)²+ (hdσ)² and

eµ∗t(h; d) := (τ∗+ hdτ)²hdµ− (hd_σ)²(Yt− X⁰_t(β_∗+ hdβ)) (τ∗+ hd_τ)²+ (hd_σ)² .

(4)

Using this directional limit, the first- and second-order directional derivatives of L_n( · ) at (β_∗, 0, 0, τ∗) are

DLn(θ∗; d) =

n

X

t=1

1

τ_∗³ d_τ(U_t²− τ_∗²) +−d_µ+ X⁰_tdβ− ψ (d_µ, dσ) τ∗Ut , and

D²L_n(θ∗; d) =

n

X

t=1

1

τ_∗⁴ d²_σ(U_t²− τ_∗²) + d²_ττ_∗²− d_τU_t− (d_µ− X⁰_td_β)τ∗][3d_τU_t− (d_µ− X⁰_td_β)τ∗]

−

n

X

t=1

1

τ_∗⁴ψ(d_µ, d_σ)²U_t²+ ψ(d_µ, d_σ)[d_µU_t²− 4d_ττ∗U_t+ (d_µ− 2X⁰_td_β)τ_∗²] , respectively, where ψ(d_µ, d_σ) := |d_σ|φ(d_µ/|d_σ|)/Φ(d_µ/|d_σ|). Here, if θ∗ = (β⁰_∗, 0, 0, τ∗), U_t∼ N (0, τ_∗²).

These directional derivatives are neither linear nor quadratic with respect to d, respectively, so that Ln(·) is not twice D. Therefore, this model cannot be analyzed as for the standard D likelihood function. We examine this model by letting

∆(θ∗) :=

n

d ∈ R^d+3 : d⁰d = 1, dµ≥ 0, and dσ ≥ 0o

to accommodate the condition that µ∗ ≥ 0 and σ∗≥ 0.

It is not hard to identify the asymptotic behaviors of the first and second-order directional derivatives.

Note that DLn(θ∗; d) = Z1,n(d) + Z2,n(d), where for each d,

Z_1,n(d) :=dτ

τ_∗³

n

X

t=1

(U_t²− τ_∗²), Z_2,n(d) := 1 τ_∗²

n

X

t=1

X⁰_td_β+ m (d_µ, d_σ) U_t,

and m(dµ, dσ) := −[dµ+ ψ (dµ, dσ)]. Note that ψ (·, ·) is Lipschitz continuous, so that Assumption 5(iii) holds with respect to the first-order directional derivative. Furthermore, for each d, McLeish’s (1974, The- orem 2.3) central limit theorem (CLT) can be applied to Z1,n(d) and Z2,n(d): for each d,

n^−1/2





Zn,1(d) Z_n,2(d)



⇒



 Z₁(d) Z₂(d)



∼ N







 0 0



, 1 τ_∗²





2d²_τ 0

0 E[(X⁰_td_β+ m (d_µ, d_σ))²]







.

It also follows that for each d and ed,

E[Z1(d)Z1(ed)] = 2dτdeτ

τ_∗² , E[Z1(d)Z2(ed)] = 0, and

(5)

E[Z₂(d)Z₂(ed)] = 1 τ_∗²





m (dµ, dσ) d_β





0



1 E[X⁰_t] E[X_t] E[X_tX⁰_t]









m( edµ, edσ) de_β



.

Here, Zn,1(d) and Zn,2(d) are linear with respect dτ and [m (dµ, dσ) , d⁰_β]⁰, respectively. From this fact, their tightness trivially follows, so that n^−1/2DL_n(θ∗; · ) ⇒ Z(·), where Z(·) is a zero-mean Gaussian stochastic process such that for each d and ed, E[Z(d)Z(ed)] = B∗(d, ed) and

B∗(d, ed) := 1 τ_∗²







d_β m (dµ, dσ)

d_τ







0





E[X_tX⁰_t] E[X_t] 0 E[X⁰_t] 1 0

0 0 2











 de_β m( edµ, edσ)

de_τ





 .

It is also possible to define Z(·) as Z1(·) + Z2(·).

We provide another Gaussian stochastic process with the same covariance structure as that of Z(·). If we let eZ(d) := δ(d)⁰Ω^1/2∗ W such that for each d,

δ(d) :=







d_β m (dµ, dσ)

d_τ







, Ω∗ := 1 τ_∗²







E[X_tX⁰_t] E[X_t] 0 E[X⁰_t] 1 0

0 0 2





 ,

and W ∼ N (0_k+2, I_k+2), it follows that E[ eZ(d) eZ(ed)] = δ(d)⁰Ω∗δ(ed) that is identical to B∗(d, ed), so that eZ(·) = Z(·). Furthermore, e^d Z(·) is linear with respect to W. This feature makes it convenient to analyze the asymptotic distribution of the first-order directional derivative.

The probability limit of the second-order directional derivative can also be similarly obtained. Note that D²Ln(θ∗; ·) is Lipschitz continuous on ∆(θ∗), so that Assumption 5(iii) holds, and we can apply the law of large numbers (LLN):

1 n

n

X

t=1

U_t²= τ∗+ o_P(1), 1 n

n

X

t=1

U_tX_t= o_P(1), and

1 n

n

X

t=1

XtX⁰_t= E[XtX⁰_t] + o_P(1).

This implies that

n⁻¹D²L_n(θ∗; d)^a.s.→ −1

τ_∗² 2d²_τ+ E[(d_µ− X⁰_td_β)²] + ψ(d_µ, d_σ)²+ 2[d_µ− E[X_t]⁰d_β]ψ(d_µ, d_σ) ,

(6)

and this is identical to −B(d, d). Thus, 2{L_n(bθ_n) − L_n(θ∗)} ⇒ sup_d∈∆(θ_∗₎[0, Y(d)]²by Theorem 1(iii), where

Y(d) := δ(d)⁰Ω^1/2∗ W {δ(d)⁰Ω∗δ(d)}^1/2, and for each d and ed,

E[Y(d)Y(ed)] = δ(d)⁰Ω∗δ(ed)

{δ(d)⁰Ω∗δ(d)}^1/2{δ(ed)⁰Ω∗δ(ed)}^1/2.

This result shows that the directional limit of the likelihood is well defined under the null, and the null limit distribution can be obtained using this, although the log-likelihood is not properly identified under the null.

The efficient production hypothesis can be tested by the QLR, Wald, and LM test statistics. For this examination, we let υ = (µ, σ)⁰, λ = β, τ = τ , and π = (β⁰, υ⁰)⁰ = (β⁰, µ, σ)⁰. The hypotheses of interest here are

H0 : υ∗ = 0 versus H1: υ∗ 6= 0.

Then, for each d and ed,

B∗(d, ed) =





B^(π,π)∗ (d_π, ed_π) 0⁰

0 _τ²2

∗dτ0

deτ



, and

B^(π,π)∗ (d_π, ed_π) = 1 τ_∗²





d⁰_βE[X_tX⁰_t]ed_β d⁰_βE[X⁰_t]m(d_µ, d_σ) m( edµ, edσ)E[Xt]edβ m(dµ, dσ)m( edµ, edσ)



. By the information matrix equality, for each d, B∗(d) is identical to −A∗(d).

The null limit distributions of the test statistics are identified by the theorems in Cho and White (2016).

First, we apply the QLR test. Applying Theorem 2 shows that

LR⁽¹⁾_n := 2{L_n(bθ_n) − L_n(θ∗)} ⇒ sup

sπ∈∆(π∗)

max[0, Y^(π)(s_π)]²+ H₂,

where for each s_π∈ ∆(π_∗) := {(s⁰_β, s_µ, s_σ)⁰ ∈ R^k+2 : s⁰_βs_β+ s²_µ+ s²_σ = 1, s_µ> 0, and s_σ > 0},

Y^(π)(s_π) := {E[(s⁰_βX_t+ m(s_µ, s_σ))²]}^−1/2Z^(π)(s_π),

(7)

Z^(π)(s_π) := s⁰_βZ^(β)+ m(s_µ, s_σ)Z^(υ), and



 Z^(β) Z^(υ)



∼ N







 0 0



,





E[XtX⁰_t] E[Xt] E[X⁰_t] 1







.

Note that [Z^(β)⁰, Z^(υ)]⁰ is the weak limit of n^−1/2τ_∗⁻¹Pn

t=1[UtX⁰_t, Ut]⁰. Theorem 2(iv) also implies that LR⁽¹⁾_n := 2{L_n(bθ_n) − L_n(θ∗)} ⇒ sup_s_υ_∈∆(υ_∗₎max[0, eY^(υ)(s_υ)]²+ Z^(β)⁰E[X_tX⁰_t]⁻¹Z^(β)+ H₂, where for each sυ∗ ∈ ∆(υ_∗) := {(sµ, sσ)⁰∈ R²: s²_µ+ s²_σ = 1, sµ> 0, and sσ > 0},

Ye^(υ)(sυ) := ( eB^(υ,υ)∗ (sυ))^−1/2Ze^(υ)(sυ),

Be∗^(υ,υ)(s_υ) := m(s_µ, s_σ)²{1 − E[X_t]⁰E[X_tX⁰_t]⁻¹E[X_t]},

and also eZ^(υ)(s_υ) := m(s_µ, s_σ){Z^(υ)− E[X_t]⁰E[X_tX⁰_t]⁻¹Z^(β)}. Furthermore, Theorem 2 shows that

LR⁽²⁾_n := 2{L_n(¨θ_n) − L_n(θ∗)} ⇒ sup

sβ∈∆(β_∗)

max[0, Y^(β)(s_β)]²+ H₂,

where for each s_β ∈ ∆(β_∗) := {s_β ∈ R^k : s⁰_βs_β = 1}, Y^(β)(s_β) := {s⁰_βE[XtX⁰_t]s_β}^−1/2Z^(β)⁰s_β, and applying Theorem 2(iii) implies that LR⁽²⁾n := 2{L_n(¨θ_n) − L_n(θ∗)} ⇒ Z^(β)⁰E[X_tX⁰_t]⁻¹Z^(β)+ H₂. Therefore, Theorem 2(iv) now yields that

LR_n⇒ sup

sυ∈∆(υ₀)

max

0, m(sµ, sσ)

|m(s_µ, s_σ)|Z

2

under H0, where Z := {1 − E[Xt]⁰E[XtX⁰_t]⁻¹E[Xt]}^−1/2{Z^(υ)− E[X_t]⁰E[XtX⁰_t]⁻¹Z^(β)} ∼ N (0, 1).

If we let r(x) := φ(x)/[xΦ(x)],

m(s_µ, s_σ)

|m(s_µ, s_σ)| = − s_µ

|s_µ|

1 + r(s_µ/|s_σ|)

|1 + r(s_µ/|s_σ|)|

,

which is −1 uniformly on ∆(υ0). Thus, the null limit distribution reduces to max[0, −Z]², and this implies that LRn A

∼ max[0, −Z]²under H0.

We conduct simulations to verify this. We let (X⁰_t, U_t)⁰ ∼ IID N (0₂, I₂) and obtain the null limit distribution of the QLR test statistic by repeating the same independent experiments 2,000 times for n = 50, 100, and 200. Simulation results are summarized in Figure 1. Note that the null limit distributions of the QLR test statistics exactly overlap with that of max[0, −Z]².

(8)

The null limit distribution of the QLR test can be uncovered by several simulation methods. The Monte Carlo method proposed by Dufour (2006) can also be used as the model is correctly specified. Although the null likelihood is not identified, the directional limits obtained under the null can be used to form the QLR test statistic. Hansen’s (1996) weighted bootstrap can also be used to estimate the asymptotic p-values.

Second, we apply the Wald test. For this, if we let

Wc_n(s_µ, s_σ) := m(s_µ, s_σ)² (

1 − n⁻¹

n

X

t=1

X⁰_t(n⁻¹

n

X

t=1

X_tX⁰_t)⁻¹n⁻¹

n

X

t=1

X_t )

,

the LLN implies that sup_s_µ_,s_σ|cW_n(s_µ, s_σ)− eB∗^(υ,υ)(s_µ, s_σ)| → 0 a.s.−P. In particular, m( · , · )²is bounded by 1 and 2/π from above and below, respectively. Using cWn(sµ, sσ), we let the Wald test statistic be defined as

W_n:= sup

sµ,sσ

n{eh^(υ)_n (sµ, sσ)}{cWn(sµ, sσ)}{eh^(υ)_n (sµ, sσ)},

where eh^(υ)n (s_µ, s_σ) is such that for each (s_µ, s_σ),

L_n(eh^(υ)_n (s_µ, s_σ)s_µ,eh^(υ)_n (s_µ, s_σ)s_σ, eβ_n(s_µ, s_σ),eτ_n(s_µ, s_σ))

= sup

{h^(υ),β,τ }

L_n(h^(υ)(s_µ, s_σ)s_µ, h^(υ)(s_µ, s_σ)s_σ, β, τ ).

Theorem 3 now implies that W_n ⇒ sup_s

υ∈∆(υ₀)max[0, eY^(υ)(s_υ)]², and this is the weak limit identical to that of the QLR test statistic. Thus, Wn A

∼ max[0, −Z]²under H0.

Finally, we investigate the LM test statistic. We let the LM test statistic be defined as

LM_n:= sup

(sµ,sσ,sβ)∈∆(υ0)×∆( ¨β_n)

nfW_n(s_µ, s_σ, s_β) max

"

0, −DL_n(¨θ_n; s_µ, s_σ) De²Ln(¨θn; sµ, sσ, sβ)

#2

,

where ¨θ_n = ( ¨β_n, 0, 0, ¨τ_n) with ¨β_n = (Pn

t=1X_tX⁰_t)⁻¹Pn

t=1X_tY_t, ¨τ_n = (n⁻¹P2

t=1U¨_t²)^1/2, ¨U_t:= Y_t− X⁰_tβ¨_n, ∆( ¨β_n) := {x ∈ R^k: x⁰x = 1}, DLn(¨θn; sµ, sσ) = {m(sµ, sσ)/¨τ_n²}Pn

t=1U¨t, and

− eD²L_n(¨θ_n; s_µ, s_σ, s_β) =1

¨ τ_n⁴

n

X

t=1

{s²_σ(¨τ_n²− ¨U_t²) + ψ(s_µ, s_σ)²U¨_t²+ ψ(s_µ, s_σ)s_µ( ¨U_t²+ ¨τ_n²) + s²_µτ¨_n²}

−m(sµ, sσ)²

¨ τ_n²

n

X

t=1

s⁰_βXt s⁰_β

n

X

t=1

XtX⁰_tsβ

!−1 n

X

t=1

X⁰_tsβ.

(9)

In particular, applying the LLN implies that for each (s_µ, s_σ),

−1

nDe²L_n(¨θ_n; s_µ, s_σ, s_β) = m(s_µ, s_σ)²

τ_∗² {1 − s⁰_βE[X_t](s⁰_βE[X_tX⁰_t]s_β)⁻¹E[X⁰_t]s_β} + o_P(1).

This LLN also holds uniformly on ∆(υ0) × ∆( ¨β_n). Thus, for each (sµ, sσ, s_β), we may let

Wfn(sµ, sσ, sβ) := m(s_µ, s_σ)² τ_∗²







1 − n⁻¹

n

X

t=1

s⁰_βXt s⁰_βn⁻¹

n

X

t=1

XtX⁰_tsβ

!−1

n⁻¹

n

X

t=1

X⁰_tsβ





 .

Here, applying the proof of Corollary 1(vii) implies that

sup

sβ∈∆( ¨β_n)

nfW_n(s_µ, s_σ, s_β) max

"

0, −DL_n(¨θ_n; s_µ, s_σ) De²Ln(¨θn; sµ, sσ, sβ)

#2

= max

"

0, m(s_µ, s_σ)

|m(s_µ, sσ)|

n^−1/2Pn t=1U¨_t

{τ_∗²(1 − E[Xt]⁰E[XtX⁰_t]⁻¹E[Xt])}^1/2

#2

+ o_P(1)

by optimizing the objective function with respect to sβ, so that

LM_n= sup

(sµ,sσ)∈∆(υ0)

max

"

0, m(sµ, sσ)

|m(s_µ, s_σ)|

n^−1/2Pn t=1U¨t

{τ_∗²(1 − E[X_t]⁰E[X_tX⁰_t]⁻¹E[X_t])}^1/2

#2

+ o_P(1)

under H0. Therefore, LMn A

∼ max[0, −Z]²by noting that m( · , · )

|m( · , · )| = −1 on ∆(υ₀) and n^−1/2P_n

t=1U¨_t∼ N [0, τ_∗²(1 − E[X_t]⁰E[X_tX⁰_t]⁻¹E[X_t])]. This is exactly what Theorem 4 asserts.

Before moving to the next example, some remarks are in order. Here, we assume µ∗ ≥ 0 so that d_µis always greater than or equal to zero, and this is assumed to avoid the failure in numerical simulation. It is more general to suppose that µ∗ can be negative, so that for some positive c > 0, µ∗ ∈ [−c, c]. For such a case, for example, the null limit distribution of the QLR test is modified into

LR_n⇒ sup

sυ∈∆(υ0)⁰

max

0, m(s_µ, s_σ)

|m(s_µ, sσ)|Z

2

,

where ∆(υ0)⁰ := {(sµ, sσ) ∈ R² : s²_µ+ s²_σ = 1 and sσ > 0}. Furthermore, it analytically follows that m(s_µ, s_σ)/|m(s_µ, s_σ)| = −1 uniformly on ∆(υ₀)⁰, so that LRn⇒ max[0, −Z]², which is the same as for

(10)

the previous case. Nevertheless, our Monte Carlo experiments assuming the same condition showed that the empirical distribution of LRnexactly overlaps with that of Z²under the null.

This discrepancy is mainly because the value of m(s_µ, s_σ) sensitively responds to the value of (s_µ, s_σ), so that we obtain that m( · , · )/|m( · , · )| = ±1 numerically on ∆(υ0)⁰. This also implies that

LR_n⇒ sup

sυ∈∆(υ0)⁰

max

0, m(s_µ, s_σ)

|m(s_µ, sσ)|Z

2

= sup

sυ∈∆(υ0)⁰

max [0, −Z, Z]²= Z²

as can be easily verified by Monte Carlo experiments. More precisely, if s_µ < 0 and s_σ > 0, so that we can let sµ = −p1 − s²_σ, it analytically follows that limsσ↓0m(−p1 − s²_σ, sσ) = 0 and for any sσ > 0, m(−p1 − s²_σ, s_σ) < 0. Nevertheless, computing this value requires pretty high level of precision around sσ = 0, and standard statistical packages do not often provide this level of precision. Numerically, they compute m(−p1 − s²_σ, s_σ) oscillating around zero as s_σ converges to 0, so that m( · , · )/|m( · , · )| is obtained as ±1 on ∆(υ0)⁰. Our parameter space restriction is imposed to avoid this numerical failure by letting µ∗≥ 0.

2.2 Example 2: Box-Cox’s (1964) Transformation

Applying the directional derivatives makes model analysis more sensible for nonlinear models with irregular properties. Box and Cox’s (1964) transformation belongs to this case. We consider the following model:

Yt= Zt0

θ0+θ1

θ2

(X_t^θ²− 1) + U_t, (1)

where {(Yt, Xt, Zt0

) ∈ R^2+k : t = 1, 2, · · · } is assumed to be IID, Xtis strictly greater than zero almost surely, and Ut := Y_t− E[Y_t|Z_t, X_t]. Furthermore, θ := (θ⁰₀, θ₁, θ₂)⁰ ∈ Θ₀ × Θ₁₂, Θ0 is a convex and compact set in R^k, and

Θ12:= {(y, z) ∈ R² : cy ≤ z ≤ ¯cy < ∞, 0 < c < ¯c < ∞, and z²+ y²≤ ¯m < ∞}.

Our interests are in testing whether X_tinfluences E[Y_t|Z_t, X_t] or not by testing that the second term of (1) vanishes.

This model is introduced to avoid Davies’s (1977,1987) identification problem. If the Box-Cox transformation is specified in the conventional way as in Hansen (1996), so that

Yt= Zt0

θ0+ β1(X_t^γ− 1) + U_t