JS+2, JS+1.8, JS+1, JS 1 (the axis)
JS
0 . 5 ,JS 1.8, JS 2(sarr.e risk as
the maximum likelihood estimator)
C I MENSI ON-008 DECREES OF FREEOOM-0020 PLOT NUMBER-2102
F .<.gWte. 2 b Compa::oison wi th the Efron and Morris Posi tive part estimator
From top to bottom at the left
the curves are:
JS+2, JS+1. 8,JS+1, JS 1, JS+O.S,
JS as the maximum likelihood estimator
0 . 5 ,JS 1.8, JS 2(same risk
F igWte. 2 CompaJr....L6 on o n the. R-i.-6 k FunmoYL6 o n Stun -..Uke. E.6.t.i..matoM[ 2 . 2 ] 0. e -0.. B . - 1 I - 1 . 2 r
-·-·I
- 1 . 4 + - 1 . s i + - 1 .=�::I
- 1 . g F�g une 3 3 1D I M E N S I ON-00e DEGREES OF FREEOOM-0004
PLOT NUMBER-003
F�gune 3a Comparison wi th the
James-Stein Es timator
From top to bottom at the left the curves are :
JS+ 2 , JS+1 . 8 , JS+1 , JS 1 ( the axis ) JS 0 . 5 , JS 1 . 8 , JS 2( same risk as the maximum likelihood estimator ) .
D I MENS I ON-00e DEGREES OF FREEDOM-0004
PLOT NUMBER-004
F�gune 3b
Efro"'l and Es timator
Comparison wi th t he Morris Posi tive part
From top to bottom at the left the curves are :
JS+2 , JS+ 1 . 8 , JS+1 , JS 1 , JS+0 . 7 , JS 0 . 5 , JS 1 . 8 , JS 2 ( same risk as the maximum likelihood
[ 2 . 2 ] li!l. l -li!I. B -li!I. G - 1 . s l - 1 .
"I
FigWI.e. 4D :L MENS I ON-Ii!l l li!l
32
DEGREES O F FREEOOM-01i!12B PLOT NUMBER-Ii!lli!l�
l. 2
F igWte. 4a Comparison wi th the James-Stein Es timator
from top to bottom at the left curves are :
JS+2 , JS+1 . 8 , JS+l , JS 1 ( the axis ) JS+ O . 5�', JS 0 . s �" , JS 1 . 8 , JS 2( same
risk as the maximum likelihood estima-tor) .
*These curves nearly coincide .
D I MENS I ON-Ii!l l B DECREES OF FREEOOM-Ii!lli!I2B
PLOT NUMBER-Ii!l0e
FigWte. 4b Comparison wi th the
Efron and Morris positive part Estimator
from top to bottom at the left the curves are :
JS+2 , JS+1 . 8 , JS+l , JS 1 , JS+O . S* , JS 0 . 5* , JS 1 . 8 , JS 2
( same risk as the maximum likelihood estimator) .
[ 2 . 2 . 1] 33
0 ( X , S ) =
(
1-
)
X of which the James-Stein estimator is aspecial case . For the positive part version of this estimator we have written JS+a . The value c is the value recommended by James and Stein so for the James-Stein estimator we put a = 1 . Although these result are well known , the author has not seen the graphs plotted elsewhere .
Since the curves were smoothed using cubic splines , the graphs
tend to have too steep a gradient at � = 0 , the theoreti cal gradient
being zero . The fit elsewhere is good - that it is not perfect is
shown by the crossing of graphs for the positive part and corresponding non-positive part estimators : mid way between the points of intersection the difference is only about 0 . 01 .
2 . 2 . 1 Spec i a l Cases
If we put h = 0 and take H to have full column rank then the
...
null space of H and the hyperplane H�" = h reduce to a single point
( have dimension zero) . In this case the estimator ..,;
� is the James-
Stein estimator and the shrinkage is towards the origin . Stein suggested
choosing the origin at the best prior estimate for each coordinate . Alternatively , without essentially changing the estimator , we may choose h such that Gh is the best prior estimate of the mean . In particular ,
taking r = p
H = I and G p = I , h will be the prior estimate for p and the hypothesis is �1 = �2 = = �p = �0 .
� . Here If we do not wish to make p prior.estimates then we may follow the suggesti on of Lindley and let the data choose the origin towards which all coordinates are to be shrunk . We do this by putting h = 0
and H = 1 0 0 -1 1 0 0 -1 0 0 0 1 0 0 -1
in which case our prior hypothesis is �1 = �2 = • • • = �p = �O ' �O unknown , and the estimator shrinks each coordinate towards the common
·'· T
mean . In other words X is shrunk towads the line �" = [ 1 ,1 , . • .
,
1 ] a .In this case r = p - 1 and we have a reduction in risk if p � 4 . If
p = 3 then the estimator gives no reduction in risk over the maximum likelihood estimator ( in fact it is the maximum likelihood estimator ) whi le the ordinary James-Stein estimator does . However , when p � 4
[ 2 . 2 . 1 ] 34
more efficient , The James-Stein estimator being more efficient if �0 is close to the true mean and les s efficient if it is not .
Taking h * 0 gives a hypothesis of the form � . = a . + �0 , 1 l with �O unknown and the ai known , towards which we shrink the maximum likelihood estimator . Since H has full row rank every
generali sed inverse is a right inverse satisfying GH = ( GH )T whi ch
means that it is the unique Penrose inverse . The Penrose inverse is
G = -1 p-1 p-2 2 1 p -1 p-2 2 1 -1 -2 2 1 -1 -2 -( p-2 ) 1 -1 -2 . - ( p-2 ) - ( p-1 ) T
and taking h = [a1 -a2 , a2-a3 , • • • ,ap_1 - ap]
ensures that our estimator shrinks the maximum likelihood estimator
towards the required line .
Alternatively , we may obtain the same estimator with
H = 1 0 0 0 -1 G - -p 1 p-1 -1 -1 - 1
0 1 0 0 -1 -1 p-1 -1 -1
0 0 1 0 -1 -1 -1 p-1 -1
0 0 0 0 -1 -1 -1 -1 p-1
0 0 0 1 -1 -1 -1 -1 . -1
and h = [ a1-ap , a2-ap , . . . , a 1 - a ] p- p T
or with
H = 1 -1 0 0 0
1 1 -2 0 0
1 1 1 -3 . 0
1 1 1 1 1
Whichever form we take for
Gh = a - a = [a 1 ' . . . ' a ] p T 0 G 2 1 6 1 12 1 . 1 0 1 2 1 12 1 . p(pt1 ) 1 0 0 - 3 1 12 1 1 -p 0 0 0 p(pt 1 ) T
, a1ta2+ . . . +ap_ 1 - (p-1 )ap] .
H , GH = I - _!_ p 1 1 T and
1
I
a . 1 . p i=1 1[ 2 . 2 . 2 ]
If H splits into several mutually orthogonal submatrices
T T T T 1· * J'
H = [ H1 , H2 , . , Ht ] with HiHj = 0 for then the
shrinkage can be divided into components orthogonal to each of the
null spaces H .lJ�·: 1 = 0 . Figure 5 represents the hyperplane in which
35
Z lies , the axes representing the null spaces of H1 and H2 . ( In this diagram the hyperplane Hl-11: = h is represented as a single point ).
I \
\
!\ -v::t-:oOF..tguJte. 5 SlvUn.k.age o 6 :the. Maruum Uk.eliho o d u.thnatoJt :towaJtd6 a
Hype.Jtptane.
The points in the diagram labelled a , b and c are respectively the vector towards which Z shrinks and the components of this vector
in each of the hyperplanes H ll�·: = h 1 1
2 . 2 . 2 General i sed S h ru n ken E s t i ma to rs
and
The last special case considered suggests a possible generalisation . The shrinkage towards each of the hyperplanes H .ll = h . 1 �': 1 does not need
to involve the same shrinkage factor in each case . The amount of shrinkage in the case considered is proportional to the weight of evidence in the sample for the hypothesis Hll = h ( this weight of
evidence being measured by pF -n s if