First Order Convergence Analysis for Sparse Grid Method in Stochastic Two Stage Linear Optimization Problem


doi:10.4236/ajcm.2011.14036 Published Online December 2011 (http://www.SciRP.org/journal/ajcm)


Shengyuan Chen

Department of Mathematics and Statistics, York University, Toronto, Canada E-mail: [email protected]

Received August 27, 2011; revised September 20, 2011; accepted September 30, 2011

Abstract

Stochastic two-stage linear optimization is an important and widely used optimization model. Efficiency of numerical integration of the second stage value function is critical. However, the second stage value function is piecewise linear and convex, which poses challenges for applying the modern, efficient sparse grid method. In this paper, we prove the first order convergence rate of the sparse grid method for this important stochastic optimization model, utilizing convex analysis and measure theory. The result is twofold: it establishes a theoretical foundation for applying the sparse grid method in stochastic programming, and it extends the convergence theory of the sparse grid integration method to piecewise linear and convex functions.

Keywords: Convergence Analysis, Stochastic Optimization, Scenario Generation, Convex Analysis, Measure Theory

1. Introduction

Stochastic two-stage linear optimization, also called stochastic two-stage linear programming, models a sequential decision structure, where the first stage decisions are made now, before the random variable manifests itself, and the second stage decisions are made adaptively to the realized random variable and the first stage decisions. The adaptive decision model has been applied in many important application areas. For example, in the introductory farmer’s problem [1], a farmer needs to divide the land among different vegetables in spring. The farmer’s objective is to maximize profit in the harvest season. The profit is related to the market price at that time and the weather dependent yield. Neither the price nor the weather is known at the present time, hence the farmer’s decision in spring has to take into account multiple scenarios. It is not a simple forecasting problem though, since the farmer’s second stage decision in fall, which adapts to different scenarios, also jointly determines the profit. The second stage decision problem is also called the “recourse” problem. Wallace and Ziemba [2] collect more recent applications in engineering, manufacturing, finance, transportation, telecommunication, etc.

A stochastic two-stage linear problem with recourse has the following general representation:

$$
\begin{aligned}
&\min_{x \in X}\; c^T x + \int_{\Xi} \Phi(x,\xi)\, \mathrm{d}P(\xi), \quad \text{and} \\
&\Phi(x,\xi) = \min_{y}\; s^T y \quad \text{s.t. } Qy = \xi - Tx,\; y \ge 0,
\end{aligned}
\tag{1}
$$

where $\xi$ is a random vector properly defined on $(\Xi, \mathcal{F}, P)$, $X$ is a polytope feasible region for the first stage, $Q \in \mathbb{R}^{m \times n}$, $s$ and $T$ are a vector and a matrix of proper sizes, and $\Phi : X \times \Xi \to \mathbb{R}$ is a real valued function.

The high dimensional integration in (1) is difficult and is usually approximated by using a set of scenarios and weights $\{\xi^k, w^k\}$, $k = 1, \dots, K$, as:

$$
\begin{aligned}
&\min_{x \in X}\; c^T x + \sum_{k=1}^{K} w^k\, \Phi(x,\xi^k), \quad \text{and} \\
&\Phi(x,\xi^k) = \min_{y}\; s^T y \quad \text{s.t. } Qy = \xi^k - Tx,\; y \ge 0.
\end{aligned}
\tag{2}
$$

Under this scenario approximation, the optimal objective value $z_K^*$ of (2) provides an approximation of the optimal objective value $z^*$ of (1). An optimal solution $x_K^*$ of (2) provides an approximation of an optimal solution $x^*$ of (1).


In the Monte Carlo (MC) method, $\xi^k$ are random sampling points and $w^k = 1/K$. The convergence theory of the Monte Carlo method has been extensively studied [3-6]. The core result is the epi-convergence theorem: under mild assumptions, $z_K^*$ converges to $z^*$ w.p.1 as $K \to \infty$; and any cluster point of $\{x_K^*\}_{K=1}^{\infty}$, which is the sequence of optimal solutions to (2), is in the optimal solution set of the original problem. The Quasi-Monte Carlo (QMC) method has also been recently studied [7], and a similar convergence result has been achieved.
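As an illustration of the scenario model (2) with Monte Carlo weights, the sketch below uses a hypothetical toy instance that is not taken from the paper: a scalar first stage decision $x \in [0, 2]$, zero first stage cost, and the closed-form recourse value $|\xi_1 + \xi_2 - x|$ with $\xi$ uniform on $[0,1]^2$. For this toy instance the true minimizer is the median of $\xi_1 + \xi_2$, i.e. $x^* = 1$, with $z^* = 1/3$. The grid search stands in for an LP solver and all function names are assumptions made for the example.

```python
import random

def recourse(x, xi1, xi2):
    """Closed-form recourse value |xi1 + xi2 - x| of the toy problem."""
    return abs(xi1 + xi2 - x)

def scenario_objective(x, scenarios, weights, c=0.0):
    """z_K(x) = c*x + sum_k w^k * Phi(x, xi^k), mirroring model (2)."""
    return c * x + sum(w * recourse(x, a, b)
                       for (a, b), w in zip(scenarios, weights))

def monte_carlo_scenarios(K, seed=0):
    """K uniform samples on [0,1]^2 with equal weights w^k = 1/K."""
    rng = random.Random(seed)
    scenarios = [(rng.random(), rng.random()) for _ in range(K)]
    weights = [1.0 / K] * K
    return scenarios, weights

def minimize_on_grid(scenarios, weights, lo=0.0, hi=2.0, steps=100):
    """Crude first stage search over an equally spaced grid on [lo, hi]."""
    candidates = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda x: scenario_objective(x, scenarios, weights))

scenarios, weights = monte_carlo_scenarios(K=2000)
x_K = minimize_on_grid(scenarios, weights)
z_K = scenario_objective(x_K, scenarios, weights)
print("approximate solution:", x_K, "approximate value:", z_K)
```

Replacing `monte_carlo_scenarios` by another scenario generator (QMC or sparse grid) changes only the pairs $(\xi^k, w^k)$; the approximation model itself stays the same.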

The sparse grid (SG) method is an established high dimensional quadrature rule, which was originally proposed by Smolyak [8], and has been studied by many authors in the context of numerical integration [9] (and references therein). Its application in stochastic two-stage linear optimization has only been shown in a recent numerical study [10]. Though [10] shows the superior numerical performance of the sparse grid method, compared with both MC and QMC, the convergence analysis there is based on the assumption that the recourse function is in a Sobolev space, which only holds for a very narrow subset of the two-stage linear problems, i.e., separable problems. The contributions of this paper are 1) establishing the epi-convergence of the sparse grid method for this important decision model; and 2) proving the first order convergence rate of the method.

We first introduce the sparse grid approximation error for integrand functions in Sobolev spaces.

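Let $D_j$ denote the partial derivative operator

$$D_j f(x) = \frac{\partial f}{\partial x_j}(x).$$

Let $\alpha = (\alpha_1, \dots, \alpha_d) \in \mathbb{N}^d$ be a multi-index, and define

$$D^{\alpha} f = \prod_{j=1}^{d} D_j^{\alpha_j} f = \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}},$$

where $|\alpha| = \alpha_1 + \cdots + \alpha_d$. The Sobolev space with smoothness parameter $r \ge 1$ is defined as

$$\mathcal{W}_d^r = \left\{ f : [0,1]^d \to \mathbb{R} \;:\; \|D^{\alpha} f\|_2 < \infty \text{ for all } \alpha \le r \right\},$$

where $\alpha \le r$ means component-wise $\alpha_j \le r$, $j = 1, \dots, d$. Sobolev spaces could also be defined using $L_p$ norms, see Evans [11]. The derivatives in the definition of the Sobolev space are weak derivatives. Formally, $D^{\alpha} f$ is the $\alpha$-derivative of $f$ if for all $v \in C_0^{\infty}([0,1]^d)$, i.e., infinitely differentiable functions with compact support in $[0,1]^d$, $D^{\alpha} f$ satisfies the following equation:

$$\int_{[0,1]^d} D^{\alpha} f(x)\, v(x)\, \mathrm{d}x = (-1)^{|\alpha|} \int_{[0,1]^d} f(x)\, D^{\alpha} v(x)\, \mathrm{d}x.$$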

For example, $f(x) = |x|$ defined on $[-1, 1]$ has the first order weak derivative $Df = \operatorname{sign}(x)$, but the function is nondifferentiable at 0 in the usual strong sense. It has been shown that the weak derivative is essentially unique, and coincides with the classical strong derivative when the latter exists. Various properties of strong derivatives carry over to weak derivatives as well, for example, $D^{\alpha+\beta} f = D^{\alpha}(D^{\beta} f) = D^{\beta}(D^{\alpha} f)$ for all multi-indices $\alpha$, $\beta$ with $\alpha + \beta \le r$. For more calculus rules regarding weak derivatives, including the extended Leibniz theorem, see Evans ([11], Section 5.2.3).

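The norm of the defined Sobolev space is

$$\|f\|_{\mathcal{W}_d^r} = \left\| \left( \|D^{\alpha} f\|_2 \right)_{\alpha \le r} \right\|_2,$$

where the inner $\|\cdot\|_2$ is the $L_2$-norm of a function, $\alpha \le r$ component-wise, and the outer $\|\cdot\|_2$ is the Euclidean 2-norm of a finite vector:

$$\|f\|_2 = \left( \int_{[0,1]^d} f(x)^2\, \mathrm{d}x \right)^{1/2}, \qquad \|h\|_2 = \left( \sum_{j=1}^{k} h_j^2 \right)^{1/2}.$$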

For $f \in \mathcal{W}_d^r$, the sparse grid method achieves the following convergence rate [12]:

$$\left| \int_{[0,1]^d} f(\xi)\, \mathrm{d}\xi - \sum_{k=1}^{K} w^k f(\xi^k) \right| \le c_{r,d}\, \frac{(\log K)^{(d-1)(r+1)}}{K^{r}}\, \|f\|_{\mathcal{W}_d^r}, \tag{3}$$

where $K$ is the number of function evaluations and $c_{r,d}$ is a constant independent of $f$, increasing with both $d$ and $r$, see Brass [13]. Note that $f \in \mathcal{W}_d^r$ implies $f \in \mathcal{W}_d^i$ for $i \le r$. Since the norm $\|f\|_{\mathcal{W}_d^r}$ and $c_{r,d}$ are non-decreasing in $r$, and the term $K^{-r} (\ln K)^{(d-1)(r+1)}$ is non-increasing in $r$ for large $K$, it is nontrivial to tell which space will yield the tightest bound. The problem is called the fat $F$ problem in Wozniakowski [14]. In this paper, as we shall see, only $r = 1$ is relevant for our discussion.
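To show what the scenarios and weights $\{\xi^k, w^k\}$ in (3) look like, here is a minimal sketch of a Smolyak construction via the combination formula. It is an illustration under stated assumptions, not the rule analysed in [12] or used in [10]: the one dimensional building blocks are nested trapezoidal rules (a midpoint rule at level 1), and the names `rule_1d`, `smolyak` and `integrate` are hypothetical.

```python
import math
from itertools import product
from math import comb

def rule_1d(level):
    """Nested 1-D rule on [0,1]: midpoint at level 1, trapezoid with
    2**(level-1) + 1 equally spaced nodes at higher levels."""
    if level == 1:
        return [0.5], [1.0]
    n = 2 ** (level - 1) + 1
    h = 1.0 / (n - 1)
    nodes = [j * h for j in range(n)]
    weights = [h] * n
    weights[0] = weights[-1] = h / 2
    return nodes, weights

def smolyak(d, q):
    """Combine tensor rules with q-d+1 <= |levels| <= q (requires q >= d).
    Returns a dict mapping each node to its (possibly negative) weight."""
    grid = {}
    for levels in product(range(1, q - d + 2), repeat=d):
        total = sum(levels)
        if not (q - d + 1 <= total <= q):
            continue
        coeff = (-1) ** (q - total) * comb(d - 1, q - total)
        rules = [rule_1d(l) for l in levels]
        for idx in product(*[range(len(r[0])) for r in rules]):
            node = tuple(rules[j][0][i] for j, i in enumerate(idx))
            w = float(coeff)
            for j, i in enumerate(idx):
                w *= rules[j][1][i]
            grid[node] = grid.get(node, 0.0) + w
    return grid

def integrate(f, grid):
    """Weighted sum over the sparse grid nodes."""
    return sum(w * f(*node) for node, w in grid.items())

grid = smolyak(d=2, q=8)
smooth = integrate(lambda a, b: math.exp(a + b), grid)
kink = integrate(lambda a, b: abs(a + b - 1.0), grid)  # exact value is 1/3
print(len(grid), smooth, kink)
```

The nodes are dyadic fractions, so points shared by several tensor rules are merged exactly in the dictionary. The smooth integrand falls under the classical theory, while the last integrand is a non-separable piecewise linear function of the kind studied in this paper.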

The convergence result in (3) only holds for the two-stage stochastic linear programming (1) in the trivial case, i.e., when the integrand function is separable. For example, the recourse function $\Phi(x, \xi)$:

$$
\begin{aligned}
\Phi(x_1, x_2, \xi_1, \xi_2) = \min_{y}\;& y_1^+ + y_1^- + y_2^+ + y_2^- \\
\text{s.t. }& y_1^+ - y_1^- = \xi_1 - x_1, \\
& y_2^+ - y_2^- = \xi_2 - x_2, \\
& y_1^+, y_1^-, y_2^+, y_2^- \ge 0
\end{aligned}
$$

is equivalent to

$$\Phi(x_1, x_2, \xi_1, \xi_2) = |\xi_1 - x_1| + |\xi_2 - x_2|,$$

where $|\xi_1 - x_1| \in \mathcal{W}_1^1$, $|\xi_2 - x_2| \in \mathcal{W}_1^1$, and the convergence result in (3) can be applied directly.

However, in general, $\Phi(x, \xi)$ is a non-separable piecewise linear function, see Birge and Louveaux [1], and does not belong to $\mathcal{W}_d^r$ for any $r$. For example,

$$
\begin{aligned}
\Phi(x_1, x_2, \xi_1, \xi_2) = \min_{y}\;& y^+ + y^- \\
\text{s.t. }& y^+ - y^- = \xi_1 + \xi_2 - x_1 - x_2, \\
& y^+, y^- \ge 0
\end{aligned}
$$

is equivalent to

$$\Phi(x_1, x_2, \xi_1, \xi_2) = |\xi_1 + \xi_2 - x_1 - x_2|,$$

and does not have the (weak) derivative $D^{(1,1)} \Phi(x_1, x_2, \xi_1, \xi_2)$, since $D^{(1,0)} \Phi$ is discontinuous and non-differentiable even in the weak sense, while $D^{(1,1)} f = D^{(0,1)} (D^{(1,0)} f)$ if $D^{(1,1)} f$ exists. Hence the error bound in (3) cannot be applied to the two-stage linear problem directly. The major contribution of this paper is to prove the convergence of (2) to (1) at the rate specified in (3) with $r = 1$, i.e., the first order convergence rate, even though $\Phi(x, \cdot) \notin \mathcal{W}_d^1$. On the other hand, this analysis extends the convergence theory of the sparse grid method to convex multivariate piecewise linear functions, since such a function $f$, with $f(x) = d_i^T x$ for $x \in B_i$, where each $B_i$ is a polyhedron and $\{B_1, \dots, B_m\}$ partitions the domain, could be represented as

$$f(x) = \min_{y}\; y \quad \text{s.t. } y \ge d_i^T x,\; i = 1, \dots, m.$$
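The claim that the mixed weak derivative does not exist can be made concrete with a short computation. The following is a sketch; the shorthand $c = x_1 + x_2$ and the restriction $0 < c < 2$ (so that the kink crosses the unit square) are assumptions introduced here for illustration.

```latex
% Worked check that D^{(1,1)} of |xi_1 + xi_2 - c| is not an L_2 function.
% v is any test function in C_0^\infty((0,1)^2); c := x_1 + x_2 with 0 < c < 2.
\begin{align*}
\Phi(\xi) &= |\xi_1 + \xi_2 - c|, \qquad
D^{(1,0)}\Phi(\xi) = \operatorname{sign}(\xi_1 + \xi_2 - c), \\
\int_{[0,1]^2} D^{(1,0)}\Phi(\xi)\, D^{(0,1)} v(\xi)\, \mathrm{d}\xi
&= \int_0^1 \Big[ \int_0^1 \operatorname{sign}(\xi_1 + \xi_2 - c)\,
   \partial_{\xi_2} v(\xi_1, \xi_2)\, \mathrm{d}\xi_2 \Big] \mathrm{d}\xi_1 \\
&= -2 \int_{\{\xi_1 \,:\, 0 < c - \xi_1 < 1\}} v(\xi_1, c - \xi_1)\, \mathrm{d}\xi_1 .
\end{align*}
% A weak derivative g = D^{(0,1)} D^{(1,0)} \Phi would have to satisfy
% \int g v = 2 \int v(\xi_1, c - \xi_1) d\xi_1 for every test function v,
% i.e. g would act as a measure concentrated on the line \xi_1 + \xi_2 = c.
% That line has zero area, so no L_2 function g can do this.
```

The inner integral uses $v = 0$ on the boundary: for a kink location $a = c - \xi_1 \in (0,1)$ it equals $-v(\xi_1, a) - v(\xi_1, a) = -2 v(\xi_1, a)$, and it vanishes when the sign is constant in $\xi_2$.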

The paper is organized as follows. In Section 2, we introduce a logarithmic mollifier function and prove its various properties. The mollifier function is quite familiar to the optimization community as it is the barrier function used in the Interior Point Method for linear programming. In Section 3, we use the limiting properties of the mollifier function to prove the uniform convergence and the first order convergence rate for the objective function. We also show the converging behaviour of the optimal solutions $x_K^*$ along a subsequence. Finally, Section 4 presents our conclusions.

In the coming sections, we assume $\Xi = [0,1]^d$. For a more general continuous distribution with an inverse cumulative function $F^{-1} : [0,1]^d \to \Xi$, one can apply the transformation

$$\int_{\Xi} f(\xi)\, \mathrm{d}F(\xi) = \int_{[0,1]^d} f\big(F^{-1}(\omega)\big)\, \mathrm{d}\omega.$$

The transformation brings in more complexity in the analysis without changing our conclusion, hence we assume $\Xi = [0,1]^d$ in the following sections and extend the analysis through inverse transformation and truncation in the Appendix.

2. Mollifier Function

We make the following mild assumptions of the problem (1):

A1: $X$ is compact with nonempty relative interior;

A2: for all $x \in X$, $\xi \in \Xi$, $\Phi(x, \xi) < \infty$, i.e., relative completeness, and the feasible region of $\Phi(x, \xi)$ has nonempty relative interior;

A3: $\operatorname{rank}(Q) = m$;

A4: $\xi$ is a continuous random vector with an invertible cumulative distribution function.

Assumption A1 is necessary for our analysis using the Interior Point Method theory. Assumption A2 is for convenience, since otherwise we need to discuss the case $\Phi(x, \xi) = \infty$, which will drag our analysis to a different focus. Assumption A3 is implicitly assumed in many analyses of linear programming, since the rows of $Q$ could be preprocessed such that the reduced $Q$ has full row rank. Assumption A4 facilitates the conversion from a unit cube $[0,1]^d$ to $\Xi$ through the inverse c.d.f. transformation.

We define a mollifier function $\varphi_{\mu,x}(\xi) : \Xi \to \mathbb{R}$:

$$\varphi_{\mu,x}(\xi) = \min_{y}\; s^T y + \mu B(y) \quad \text{s.t. } Qy = \xi - Tx, \tag{4}$$

where $B(y) = -\sum_{i=1}^{n} \log y_i$. In the following, we call $\Phi(x, \xi)$ the recourse function, and $\varphi_{\mu,x}(\xi)$ the mollifier function. We let

$$g(y) = \nabla B(y) = -\left( \frac{1}{y_1}, \dots, \frac{1}{y_n} \right)^T, \qquad H(y) = \nabla^2 B(y) = \operatorname{diag}\left( \frac{1}{y_1^2}, \dots, \frac{1}{y_n^2} \right). \tag{5}$$

$B(y)$ is in fact a barrier function widely used in the Interior Point Method for linear programming, and its properties are well studied. As $\mu \to 0$, the convergence of $\varphi_{\mu,x}(\xi)$ and $(y^*_{\mu,x}, u^*_{\mu,x})$ is stated in the following theorem.

Theorem 2.1. For any $x \in X$, $\xi \in [0,1]^d$, let $(y^*_{\mu,x}, u^*_{\mu,x})$ be the optimal primal and dual solutions of the mollifier function (4), then

$$\lim_{\mu \to 0} \varphi_{\mu,x}(\xi) = \Phi(x, \xi), \qquad \lim_{\mu \to 0} \left( y^*_{\mu,x}, u^*_{\mu,x} \right) = \left( y^*_x, u^*_x \right),$$

where $(y^*_x, u^*_x)$ is an optimal primal and dual pair of the recourse function $\Phi(x, \xi)$.

Proof. Due to the barrier function $B(\cdot)$, the objective function of $\varphi_{\mu,x}(\xi)$ is strictly convex. Together with the relative completeness assumption, it is clear that $\varphi_{\mu,x}(\xi)$ has a unique optimal solution. Since the problem has non-empty relative interior by assumption, the central path $(y^*_{\mu,x}, u^*_{\mu,x})$ exists for each $x$, and converges to the analytic center of the optimal set, see Roos et al. [15], Theorem I.30, and its Definition I.20 for the analytic center. □
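Theorem 2.1 can be observed on the smallest possible example. The sketch below assumes the one dimensional recourse problem $\min\, y^+ + y^-$ s.t. $y^+ - y^- = t$, $y \ge 0$, with $t = \xi - x$, whose value is $|t|$; this instance and the function name `mollifier_abs` are illustrative choices, not taken from the paper. For this instance the barrier problem can be solved in closed form from its KKT conditions, which gives $y^{\pm} = \tfrac{1}{2}\big(\mu \pm t + \sqrt{t^2 + \mu^2}\big)$ and the dual $u = 1 - \mu / y^+$.

```python
import math

def mollifier_abs(t, mu):
    """Closed-form solution of
        min  y_plus + y_minus - mu * (log y_plus + log y_minus)
        s.t. y_plus - y_minus = t.
    Returns (value, y_plus, y_minus, u), where u is the multiplier of
    the equality constraint."""
    s = math.sqrt(t * t + mu * mu)
    y_plus = 0.5 * (mu + t + s)
    y_minus = 0.5 * (mu - t + s)
    value = y_plus + y_minus - mu * (math.log(y_plus) + math.log(y_minus))
    u = 1.0 - mu / y_plus
    return value, y_plus, y_minus, u

# Follow the central path for a fixed t as mu decreases.
for mu in (1e-1, 1e-3, 1e-6):
    value, y_plus, y_minus, u = mollifier_abs(-0.7, mu)
    print(mu, value, u)
```

As $\mu$ decreases, the printed value approaches $|t|$ and the dual approaches $\operatorname{sign}(t)$, the slope of the recourse function. At $t = 0$ the dual equals 0 for every $\mu$, which is the analytic center of the dual optimal interval $[-1, 1]$.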

The converging property of $(y^*_{\mu,x}, u^*_{\mu,x})$ (the central path) as $\mu \to 0$ has been an important topic in the Interior Point Method, and has been extensively studied by many authors. For interested readers, in addition to the reference given in the proof, we refer to the extensive research in Megiddo [16], an early work by Fiacco [17], and a survey of degeneracy and IPM by Güler et al. [18]. For readers interested in the interior point method in general, we refer to Nesterov and Nemirovskii [19], Renegar [20] and Wright [21].

The KKT condition of the optimization problem appearing in (4) is

$$F_{\mu,x}(y, u, \xi) = \begin{pmatrix} s + \mu\, g(y) - Q^T u \\ Qy - \xi + Tx \end{pmatrix} = 0,$$

where $F_{\mu,x} : \mathbb{R}^{n} \times \mathbb{R}^{m} \times \mathbb{R}^{m} \to \mathbb{R}^{n+m}$. Clearly $F_{\mu,x}(\cdot, \cdot, \cdot)$ is infinitely differentiable. Furthermore,

$$\nabla_{(y,u)} F_{\mu,x} = \begin{pmatrix} \mu H(y) & -Q^T \\ Q & 0 \end{pmatrix}, \qquad \nabla_{\xi} F_{\mu,x} = \begin{pmatrix} 0 \\ -I \end{pmatrix}, \tag{6}$$

where $I$ is an identity matrix. $\nabla_{(y,u)} F_{\mu,x}$ is invertible since $H(\cdot)$ is positive definite and $Q$ has full row rank (A3). Hence by the implicit function theorem, $y^*_{\mu,x}$, $u^*_{\mu,x}$ are infinitely differentiable functions of $\xi$. So $\varphi_{\mu,x}(\xi) = s^T y^*_{\mu,x} + \mu B(y^*_{\mu,x})$ is infinitely differentiable.

Proposition 2.1. For any $x \in X$, $\mu \in (0, 1)$, $\varphi_{\mu,x}(\cdot) \in C^{\infty}$.

In the following, we directly derive the (strong) partial derivatives $D^{\alpha} \varphi_{\mu,x}(\xi)$ for all $\alpha \le 1$. Note that there are $2^d - 1$ such $\alpha$'s satisfying $\alpha \le 1$. We also prove these partial derivatives are finite for all $\xi \in [0,1]^d$, $x \in X$. Finally, we show that their limits are finite when $\mu \to 0$. Hereinafter, a vector $v < \infty$ or a matrix $M < \infty$ means the inequality holds component-wise.

Proposition 2.2. For all $x \in X$, $\xi \in [0,1]^d$,

$$\nabla \varphi_{\mu,x}(\xi) = u^*_{\mu,x} < \infty.$$

Furthermore, $\lim_{\mu \to 0} \nabla \varphi_{\mu,x}(\xi) = u^*_x < \infty$.

Proof. The Lagrangian function of the optimization problem in (4) is

$$L_{\mu,x}(y, u, \xi) = s^T y + \mu B(y) - u^T Q y + u^T \xi - u^T T x.$$

Since $L_{\mu,x}(y^*_{\mu,x}, u^*_{\mu,x}, \xi) = \varphi_{\mu,x}(\xi)$,

$$
\begin{aligned}
\nabla \varphi_{\mu,x}(\xi)
&= \nabla_{\xi} L_{\mu,x}\big(y^*_{\mu,x}, u^*_{\mu,x}, \xi\big) \\
&= \big(\nabla_{\xi} y^*_{\mu,x}\big)^T \nabla_y L_{\mu,x} + \big(\nabla_{\xi} u^*_{\mu,x}\big)^T \nabla_u L_{\mu,x} + \frac{\partial L_{\mu,x}}{\partial \xi} \\
&= \big(\nabla_{\xi} y^*_{\mu,x}\big)^T \big( s + \mu\, g(y^*_{\mu,x}) - Q^T u^*_{\mu,x} \big) - \big(\nabla_{\xi} u^*_{\mu,x}\big)^T \big( Q y^*_{\mu,x} - \xi + Tx \big) + u^*_{\mu,x} \\
&= u^*_{\mu,x},
\end{aligned}
$$

where the last equality follows from the KKT condition. Clearly $u^*_{\mu,x} < \infty$. As $\mu \to 0$, $u^*_{\mu,x} \to u^*_x$ by Theorem 2.1. □

We let

$$Y_{\mu,x} = \operatorname{diag}\big(y^*_{\mu,x}\big), \qquad Y_x = \operatorname{diag}\big(y^*_x\big).$$

Recall that $y^*_x$ refers to the limiting point defined in Theorem 2.1, not an arbitrary optimal solution of $\Phi(x, \xi)$.

Lemma 2.1. For any $x \in X$, $\xi \in [0,1]^d$,

$$\nabla_{\xi} u^*_{\mu,x} = \mu \left( Q Y_{\mu,x}^2 Q^T \right)^{-1}, \qquad \nabla_{\xi} y^*_{\mu,x} = Y_{\mu,x}^2 Q^T \left( Q Y_{\mu,x}^2 Q^T \right)^{-1}.$$

Proof. $F_{\mu,x}(y^*_{\mu,x}, u^*_{\mu,x}, \xi) = 0$ defines the implicit functions $(y^*_{\mu,x}, u^*_{\mu,x})$ of $\xi$. With $\nabla_{(y,u)} F_{\mu,x}$, $\nabla_{\xi} F_{\mu,x}$ given in (6), $\big(\nabla_{(y,u)} F_{\mu,x}\big)^{-1}$ can be explicitly computed as

$$
\big(\nabla_{(y,u)} F_{\mu,x}\big)^{-1} =
\begin{pmatrix}
\frac{1}{\mu} H^{-1} \left[ I - Q^T \left( Q H^{-1} Q^T \right)^{-1} Q H^{-1} \right] & H^{-1} Q^T \left( Q H^{-1} Q^T \right)^{-1} \\
-\left( Q H^{-1} Q^T \right)^{-1} Q H^{-1} & \mu \left( Q H^{-1} Q^T \right)^{-1}
\end{pmatrix},
$$

where $H$ is a shorthand notation of $H(y^*_{\mu,x})$. Hence by the implicit function theorem

$$
\begin{pmatrix} \nabla_{\xi} y^*_{\mu,x} \\ \nabla_{\xi} u^*_{\mu,x} \end{pmatrix}
= -\big(\nabla_{(y,u)} F_{\mu,x}\big)^{-1} \nabla_{\xi} F_{\mu,x}. \tag{7}
$$

Hence the conclusion follows from (7) and (5), since $H^{-1} = Y_{\mu,x}^2$. □

Proposition 2.3. For all $x \in X$, $\xi \in [0,1]^d$,

$$\nabla^2 \varphi_{\mu,x}(\xi) = \mu \left( Q Y_{\mu,x}^2 Q^T \right)^{-1} < \infty.$$

Furthermore, $\lim_{\mu \to 0} \nabla^2 \varphi_{\mu,x}(\xi) = 0$ essentially.

Proof. By Proposition 2.2 and Lemma 2.1,

$$\nabla^2 \varphi_{\mu,x}(\xi) = \mu \left( Q Y_{\mu,x}^2 Q^T \right)^{-1} < \infty, \quad \forall \mu > 0,\; x \in X.$$

By Theorem 2.1,

$$\lim_{\mu \to 0} \nabla^2 \varphi_{\mu,x}(\xi) = 0 \cdot \left( Q Y_x^2 Q^T \right)^{-1}.$$

If the optimal set of $\Phi(x, \xi)$ is non-degenerate, $Q Y_x^2 Q^T$ is non-singular, hence the above limit is zero. If the optimal set is degenerate, then the limit $0 \cdot \left( Q Y_x^2 Q^T \right)^{-1}$ is not defined. However, in the following, we show that the degenerate case has zero Lebesgue measure, hence the limit is zero with probability one. Let us first consider a special case of degeneracy. Let the first $m$ columns of $Q$ be an optimal basis $B$ and let $y_1 = 0$, $y_2, \dots, y_m > 0$, $y_{m+1} = \cdots = y_n = 0$ be a degenerate optimal basic feasible solution. Define the set

$$E = \left\{ \xi \;\middle|\; y_B = B^{-1}(\xi - Tx),\; y_1 = 0,\; y_2, \dots, y_m > 0 \right\}.$$

Since the set $Y_0 = \{ y_B \mid y_1 = 0,\; y_2, \dots, y_m > 0 \}$ has zero Lebesgue measure in $\mathbb{R}^m$, and $B$ is both injective and surjective, $m(E) = m(B Y_0) = 0$. Clearly the same argument holds for an arbitrary degenerate optimal basis $B$ with an arbitrarily chosen degenerate basic column. Furthermore, there are only finitely many ways to choose the basis and the degenerate columns. Hence the total measure of the set of $\xi$ which leads to a degenerate $\Phi(x, \xi)$ is zero. Hence we conclude that the limit is zero essentially. □

We continue to calculate higher order partial derivatives. Note that $\nabla^2 \varphi_{\mu,x}(\xi)$ is an $m \times m$ matrix, and

$$\frac{\partial^3 \varphi_{\mu,x}}{\partial \xi_i\, \partial \xi_j\, \partial \xi_k} = \frac{\partial}{\partial \xi_k} \left( \nabla^2 \varphi_{\mu,x} \right)_{ij}.$$

In general, for any $k \le d$ and any set of $i_1, \dots, i_k$, $\dfrac{\partial^{k-2} \nabla^2 \varphi_{\mu,x}}{\partial \xi_{i_3} \cdots \partial \xi_{i_k}}$ is an $m \times m$ matrix, and

$$\frac{\partial^k \varphi_{\mu,x}}{\partial \xi_{i_1}\, \partial \xi_{i_2}\, \partial \xi_{i_3} \cdots \partial \xi_{i_k}} = \left( \frac{\partial^{k-2} \nabla^2 \varphi_{\mu,x}}{\partial \xi_{i_3} \cdots \partial \xi_{i_k}} \right)_{i_1 i_2}.$$
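Lemma 2.1 and Proposition 2.3 can be checked numerically on a one dimensional instance. The sketch below again assumes the illustrative problem $\min\, y^+ + y^-$ s.t. $y^+ - y^- = t$ (so $Q = [1, -1]$ and $Q Y^2 Q^T = (y^+)^2 + (y^-)^2$), with the closed-form central path $y^{\pm} = \tfrac{1}{2}(\mu \pm t + \sqrt{t^2+\mu^2})$; the instance and all function names are assumptions for illustration. It compares the Hessian formula with a finite difference of the dual $u = 1 - \mu / y^+$.

```python
import math

def central_path_point(t, mu):
    """Primal optimal solution (y_plus, y_minus) of the 1-D barrier problem."""
    s = math.sqrt(t * t + mu * mu)
    return 0.5 * (mu + t + s), 0.5 * (mu - t + s)

def dual(t, mu):
    """Dual multiplier u, which equals the gradient of the mollifier in t."""
    y_plus, _ = central_path_point(t, mu)
    return 1.0 - mu / y_plus

def hessian_formula(t, mu):
    """mu * (Q Y^2 Q^T)^{-1} with Q = [1, -1] and Y = diag(y_plus, y_minus)."""
    y_plus, y_minus = central_path_point(t, mu)
    return mu / (y_plus ** 2 + y_minus ** 2)

def hessian_finite_difference(t, mu, h=1e-5):
    """Central difference of the gradient u with respect to t."""
    return (dual(t + h, mu) - dual(t - h, mu)) / (2.0 * h)

for t in (-0.5, 0.1, 0.6):
    print(t, hessian_formula(t, 0.05), hessian_finite_difference(t, 0.05))

# Away from the kink the Hessian vanishes as mu -> 0, while at the
# degenerate point t = 0 it equals 1 / (2 * mu) and blows up instead.
print(hessian_formula(0.3, 1e-8), hessian_formula(0.0, 1e-4))
```

The single point $t = 0$ is exactly where the recourse LP has a degenerate optimal solution, so this example also illustrates why the limit in Proposition 2.3 is only claimed outside a set of measure zero.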

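Proposition 2.4. For any $x \in X$, $\xi \in [0,1]^d$, and any set of indices $i_1, \dots, i_k$,

$$\frac{\partial^k \varphi_{\mu,x}}{\partial \xi_{i_1} \cdots \partial \xi_{i_k}} < \infty, \qquad \lim_{\mu \to 0} \frac{\partial^k \varphi_{\mu,x}}{\partial \xi_{i_1} \cdots \partial \xi_{i_k}} = 0 \text{ essentially, for } k \ge 2.$$

Proof. We first prove the conclusion for the derivative of $\nabla^2 \varphi_{\mu,x}$ with respect to a single $\xi_k$, and extend by induction.

$$
\begin{aligned}
\frac{\partial}{\partial \xi_k} \nabla^2 \varphi_{\mu,x}(\xi)
&= \frac{\partial}{\partial \xi_k} \left[ \mu \left( Q Y_{\mu,x}^2 Q^T \right)^{-1} \right] \\
&= -\mu \left( Q Y_{\mu,x}^2 Q^T \right)^{-1} \frac{\partial \left( Q Y_{\mu,x}^2 Q^T \right)}{\partial \xi_k} \left( Q Y_{\mu,x}^2 Q^T \right)^{-1} \\
&= -2\mu \left( Q Y_{\mu,x}^2 Q^T \right)^{-1} Q \operatorname{diag}\left( y^*_{1,\mu,x} \frac{\partial y^*_{1,\mu,x}}{\partial \xi_k}, \dots, y^*_{n,\mu,x} \frac{\partial y^*_{n,\mu,x}}{\partial \xi_k} \right) Q^T \left( Q Y_{\mu,x}^2 Q^T \right)^{-1},
\end{aligned}
\tag{8}
$$

where

$$\frac{\partial y^*_{j,\mu,x}}{\partial \xi_k} = \left( \nabla_{\xi} y^*_{\mu,x} \right)_{jk} = \left( Y_{\mu,x}^2 Q^T \left( Q Y_{\mu,x}^2 Q^T \right)^{-1} \right)_{jk}$$

is shown in Lemma 2.1. Clearly (8) is finite for $\mu > 0$. Now taking the limit $\mu \to 0$, by Theorem 2.1 and Proposition 2.3:

$$\lim_{\mu \to 0} \frac{\partial}{\partial \xi_k} \nabla^2 \varphi_{\mu,x}(\xi) = 0 \text{ essentially.}$$

Furthermore, we prove by induction that $\dfrac{\partial^{k-2} \nabla^2 \varphi_{\mu,x}}{\partial \xi_{i_3} \cdots \partial \xi_{i_k}}$ is of the form $\mu\, S\big(Q, Y_{\mu,x}, (Q Y_{\mu,x}^2 Q^T)^{-1}\big)$, where $S(\cdot)$ is an algebraic expression involving multiplication and summation of $Y_{\mu,x}$, $\left( Q Y_{\mu,x}^2 Q^T \right)^{-1}$ and $Q$. The claim is true for $\dfrac{\partial}{\partial \xi_k} \nabla^2 \varphi_{\mu,x}$ in (8). Suppose it is true for $\dfrac{\partial^{k-3} \nabla^2 \varphi_{\mu,x}}{\partial \xi_{i_3} \cdots \partial \xi_{i_{k-1}}}$; then by the chain rule, the term $\left( Q Y_{\mu,x}^2 Q^T \right)^{-1}$ will be expanded into a multiplication of the same terms $Y_{\mu,x}$, $\left( Q Y_{\mu,x}^2 Q^T \right)^{-1}$, $Q$, hence the form $\mu\, S\big(Q, Y_{\mu,x}, (Q Y_{\mu,x}^2 Q^T)^{-1}\big)$ is established. It is clear that $\lim_{\mu \to 0} \mu\, S\big(Q, Y_{\mu,x}, (Q Y_{\mu,x}^2 Q^T)^{-1}\big) = 0$ essentially by Theorem 2.1 and Proposition 2.3. We also derive the higher order partial derivatives explicitly in Appendix A. □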

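Theorem 2.2. For any $x \in X$, $\mu \in (0, 1)$,

$$\left\| \varphi_{\mu,x}(\cdot) \right\|_{\mathcal{W}_d^1} < \infty;$$

$$\lim_{\mu \to 0} \left\| \varphi_{\mu,x}(\cdot) \right\|^2_{\mathcal{W}_d^1} = \left\| \Phi(x, \cdot) \right\|_2^2 + \left\| u^*_{1,x} \right\|_2^2 + \cdots + \left\| u^*_{m,x} \right\|_2^2 < \infty.$$

Furthermore,

$$\sup_{x \in X} \left( \left\| \Phi(x, \cdot) \right\|_2^2 + \left\| u^*_{1,x} \right\|_2^2 + \cdots + \left\| u^*_{m,x} \right\|_2^2 \right)^{1/2} < \infty.$$

Proof. The result follows from the definition of the norm $\|\cdot\|_{\mathcal{W}_d^1}$, Propositions 2.2, 2.3, 2.4, and Theorem 2.1. The finiteness of $\|u^*_{j,x}\|_2$, $j = 1, \dots, m$, follows from the relative completeness assumption of the two-stage linear problem and duality theory. Furthermore, $X$ is compact, hence the supremum is finite. □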

3. First Order Convergence Rate

We first discuss the convergence of the objective function specified in the approximation model (2) to the objective function of the true model (1), and the convergence rate.

Theorem 3.1. For all $x \in X$,

$$\left| \int_{[0,1]^d} \Phi(x, \xi)\, \mathrm{d}\xi - \sum_{k=1}^{K} w^k \Phi(x, \xi^k) \right| \le c_{1,d}\, \frac{(\log K)^{2(d-1)}}{K}.$$

Hence, $\sum_{k=1}^{K} w^k \Phi(x, \xi^k) \to \int_{[0,1]^d} \Phi(x, \xi)\, \mathrm{d}\xi$ uniformly as $K \to \infty$.

Proof. For notational convenience, define operators $E$ and $A_K$ as

$$E(f) = \int_{[0,1]^d} f(\xi)\, \mathrm{d}\xi, \qquad A_K(f) = \sum_{k=1}^{K} w^k f(\xi^k).$$

Then for any $x \in X$, $\mu \in (0, 1)$, $K$,

$$
\begin{aligned}
\left| E\big(\Phi(x, \cdot)\big) - A_K\big(\Phi(x, \cdot)\big) \right|
\le{}& \left| E\big(\Phi(x, \cdot)\big) - E\big(\varphi_{\mu,x}\big) \right|
+ \left| E\big(\varphi_{\mu,x}\big) - A_K\big(\varphi_{\mu,x}\big) \right| \\
&+ \left| A_K\big(\varphi_{\mu,x}\big) - A_K\big(\Phi(x, \cdot)\big) \right|.
\end{aligned}
$$

Taking the limit $\mu \to 0$ on both sides, the first and third terms on the right hand side go to zero by Theorem 2.1, and the second term is bounded by the classical convergence rate of the sparse grid method with $r = 1$, see (3), Proposition 2.3 and Theorem 2.2, where Theorem 2.2 bounds the norm factor uniformly in $x$. □

Let the objective functions of the true problem (1) and the approximated problem (2) be

$$z(x) = c^T x + \int_{[0,1]^d} \Phi(x, \xi)\, \mathrm{d}P(\xi), \qquad z_K(x) = c^T x + \sum_{k=1}^{K} w^k \Phi(x, \xi^k),$$

and let the optimal objective values and optimal solution sets of the true model and the approximated model be $z^*$, $X^*$, $z_K^*$, $X_K^*$ respectively. Theorem 3.2 and Theorem 3.3 state the results for the optimal objective value and the optimal sets separately.

Theorem 3.2. The optimal objective value converges, i.e., $\lim_{K \to \infty} z_K^* = z^*$, and the rate of convergence is

$$\left| z_K^* - z^* \right| \le c_{1,d}\, \frac{(\log K)^{2(d-1)}}{K}.$$

Proof. For the minimization problem we note that $z(x^*) \le z(x)$ for any $x^* \in X^*$, $x \in X$, and $z_K(x_K^*) \le z_K(x)$ for any $x_K^* \in X_K^*$, $x \in X$. Let $\epsilon_K = c_{1,d}\, \dfrac{(\log K)^{2(d-1)}}{K}$, then

$$z_K^* - z^* = z_K(x_K^*) - z(x^*) \le z_K(x^*) - z(x^*) \le \epsilon_K,$$

$$z^* - z_K^* = z(x^*) - z_K(x_K^*) \le z(x_K^*) - z_K(x_K^*) \le \epsilon_K,$$

where the last inequality in each line follows from Theorem 3.1. □

Theorem 3.3. For any $x_K^* \in X_K^*$, $x^* \in X^*$,

1) $x_K^*$ is feasible;

2) $\left| z(x_K^*) - z(x^*) \right| \le 2\, c_{1,d}\, \dfrac{(\log K)^{2(d-1)}}{K}$;

3) For any cluster point $\bar{x}^*$ of a subsequence $\{x_{K_t}^*\}_{t}$, $x_{K_t}^* \in X_{K_t}^*$, we have $\bar{x}^* \in X^*$; furthermore, if $X^* = \{x^*\}$ is a singleton, then $\bar{x}^* = x^*$.

Proof. $x_K^*$ satisfies the first stage constraints, and by the relative completeness assumption, $x_K^*$ is feasible. To show that $x_K^*$ is also very close to any optimal solution, we apply a technique similar to the one used in Theorem 3.1:

$$\left| z(x_K^*) - z(x^*) \right| \le \left| z(x_K^*) - z_K(x_K^*) \right| + \left| z_K(x_K^*) - z(x^*) \right| \le 2 \epsilon_K,$$

since $\left| z(x_K^*) - z_K(x_K^*) \right| \le \epsilon_K$ by Theorem 3.1, and $\left| z_K(x_K^*) - z(x^*) \right| \le \epsilon_K$ by the steps in the proof of Theorem 3.2. Since $\bar{x}^*$ is a cluster point,

$$\left| z(\bar{x}^*) - z(x^*) \right| = \lim_{t \to \infty} \left| z(x_{K_t}^*) - z(x^*) \right| = 0$$

by the inequality above. As a special case, if $X^* = \{x^*\}$, then $\bar{x}^* = x^*$. □

The results above are based on uniform convergence, see Römisch [22]. The result is stated for a subsequence since the optimal sets are not necessarily singletons, and one can only expect optimality of cluster points. The result can also be proved by epi-convergence, see Attouch [23,24]. Epi-convergence is implied by uniform convergence, see Kall [25].

4. Conclusions

The modern sparse grid method is very efficient in numerical integration for integrand functions in the Sobolev space $\mathcal{W}_d^r$. However, the integrand function in two-stage linear programming does not belong to $\mathcal{W}_d^r$. We prove that the sparse grid method for stochastic two-stage linear programming not only converges but also converges at the first order rate. Our constructive proof uses a logarithmic mollifier function from the interior point method.

5. References

[1] J. R. Birge and F. Louveaux, “Introduction to Stochastic Programming,” Springer, New York, 1997.

[2] S. W. Wallace and W. T. Ziemba, Eds., “Applications of Stochastic Programming,” Society for Industrial and Applied Mathematics, Philadelphia, 2005.

[3] A. J. King and R. J.-B Wets, “Epi-Convergency of Convex Stochastic Programs,” Stochastic and Stochastic Reports, Vol. 34, 1991, pp. 83-92.

[4] A. J. King and R. T. Rockafellar, “Asymptotic Theory for Solutions in Statistical Estimation and Stochastic Programming,” Mathematics for Operations Research, Vol. 18, No. 1, 1993, pp. 148-162. doi:10.1287/moor.18.1.148

[5] A. Shapiro, “Asymptotic Analysis of Stochastic Programs,” Annals of Operations Research, Vol. 30, No. 1, 1991, pp. 169-186. doi:10.1007/BF02204815

[6] J. Dupacova and R. Wets, “Asymptotic Behavior of Statistical Estimators and of Optimal Solutions of Stochastic Optimization Problems,” Annals of Statistics, Vol. 16, No. 4, 1988, pp. 1517-1549. doi:10.1214/aos/1176351052

[7] T. Pennanen and M. Koivu, “Epi-Convergent Discretization of Stochastic Programs via Integration Quadratures,” Numerische Mathematik, Vol. 100, No. 1, 2005, pp. 141-163. doi:10.1007/s00211-004-0571-4

[8] S. A. Smolyak, “Interpolation and Quadrature Formula for the Class $W_s^a$ and $E_s^a$,” Doklady Akademii Nauk SSSR, Vol. 131, 1960, pp. 1028-1031. (in Russian, English Translation: Soviet Mathematics Doklady, Vol. 4, 1963, pp. 240-243).

[9] T. Gerstner and M. Griebel, “Numerical Integration Using Sparse Grid,” Numerical Algorithms, Vol. 18, No. 3-4, 1998, pp. 209-232. doi:10.1023/A:1019129717644

[10] M. Chen and S. Mehrotra, “Epiconvergent Scenario Generation Method for Stochastic Problems via Sparse Grid,” Stochastic Programming E-Print Series, Vol. 2008, No. 7, 2008.

[11] L. C. Evans, “Partial Differential Equations,” American Mathematical Society, Vol. 37, No. 3, 1998, pp. 363-367.

[12] G. W. Wasilkowski and H. Wozniakowski, “Explicit Cost Bounds of Algorithms for Multivariate Tensor Product Problems,” Journal of Complexity, Vol. 11, No. 1, 1995, pp. 1-56. doi:10.1006/jcom.1995.1001

[13] H. Brass and G. Hämmerlin, Eds., “Bounds for Peano kernels,” Vol. 112, Birkhäuser, Basel, 1993, pp. 39-55.

[14] H. Wozniakowski, “Information-Based Complexity,” Annual Review of Computer Science, Vol. 1, No. 1, 1986, pp. 319-380. doi:10.1146/annurev.cs.01.060186.001535

[15] C. Roos, T. Terlaky and J.-P. Vial, “Interior Point Methods for Linear Optimization,” Springer, New York, 1997.

[16] N. Megiddo, “Progress in Mathematical Programming, Chapter Pathways to the Optimal Set in Linear Programming,” Springer-Verlag, New York, 1989, p. 132.

[17] A. V. Fiacco, “Introduction to Sensitivity and Stability Analysis in Nonlinear Programming,” Academic Press, New York, 1983.

[18] O. Güler, D. den Hertog, C. Roos and T. Terlaky, “Degeneracy in Interior Point Methods for Linear Programming: A Survey,” Annals of Operations Research, Vol. 46-47, No. 1, 1993, pp. 107-138. doi:10.1007/BF02096259

[19] Y. Nesterov and A. Nemirovskii, “Interior Point Polynomial Algorithms in Convex Programming,” Society for Industrial and Applied Mathematics, Philadelphia, 1994.

[20] J. Renegar, “A Mathematical View of Interior-Point Methods in Convex Optimization,” Society for Industrial and Applied Mathematics, Philadelphia, 2001.

[21] S. J. Wright, “Primal-Dual Interior-Point Methods,” Society for Industrial and Applied Mathematics, Philadelphia, 1997.

[22] W. Römisch, “An Approximation Method in Stochastic Optimal Control,” In: Optimization Techniques, Part 1, Lecture Notes in Control and Information Sciences, Springer-Verlag, New York, 1980, pp. 169-178.

[23] H. Attouch, “Variational Convergence for Functions and Operators,” Pitman (Advanced Publishing Programs), 1984.

[24] H. Attouch and R. J.-B. Wets, “Quantitative Stability of Variational Systems: I. The Epigraphical Distance,” Transactions of the American Mathematical Society, Vol. 328, No. 2, 1991, pp. 695-729. doi:10.2307/2001800

[25] P. Kall, “Approximation to Optimization Problems: An Elementary Review,” Mathematics of Operations Research, Vol. 11, No. 1, 1998, pp. 9-18. doi:10.1287/moor.11.1.9

Appendix A. Inverse Transformation and Truncation

For a random vector $\xi$ on $\Xi$ with an invertible cumulative distribution function, $F^{-1} : [0,1]^d \to \Xi$, the integration domain of a mollifier function can be transformed from $\Xi$ to $[0,1]^d$:

$$\int_{\Xi} \varphi_{\mu,x}(\xi)\, \mathrm{d}F(\xi) = \int_{[0,1]^d} \varphi_{\mu,x}\big(F^{-1}(\omega)\big)\, \mathrm{d}\omega,$$

then we apply the sparse grid method to generate scenarios and weights for $\omega$ on the cube $[0,1]^d$. We need to check the properties of the integrand function $\varphi_{\mu,x}\big(F^{-1}(\omega)\big)$. Its differentiability only depends on $F^{-1}(\cdot)$ since $\varphi_{\mu,x}(\cdot) \in C^{\infty}$. Most commonly used inverse cumulative distribution functions, for example the inverse of the normal distribution function, are also in $C^{\infty}$. The finiteness of the partial derivatives of $\varphi_{\mu,x}\big(F^{-1}(\omega)\big)$ also only depends on $F^{-1}(\omega)$, since the partial derivatives of $\varphi_{\mu,x}(\xi)$ are finite for any multi-index $\alpha \le 1$ component-wise.

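The higher order partial derivative of a composite function can be calculated explicitly. For $h : x \in X \subset \mathbb{R} \xrightarrow{\;g\;} y \in Y \subset \mathbb{R} \xrightarrow{\;f\;} z \in \mathbb{R}$, we can apply Faà di Bruno’s formula:

$$\frac{\mathrm{d}^n}{\mathrm{d}x^n} f\big(g(x)\big) = (f \circ g)^{(n)}(x) = \sum_{P \in \mathcal{P}_n} f^{(|P|)}\big(g(x)\big) \prod_{B \in P} g^{(|B|)}(x),$$

where $\mathcal{P}_n$ is the set of all partitions of the set $J_n$ of integers $1, \dots, n$. A partition of $J_n$ is a family of pairwise disjoint nonempty subsets of $J_n$ whose union is $J_n$. $|A|$ means the cardinality of the set $A$. For a vector composite function

$$h : x \in X \subset \mathbb{R}^d \xrightarrow{\;g\;} y \in Y \subset \mathbb{R}^m \xrightarrow{\;f\;} z \in \mathbb{R},$$

we apply Tsoy-Wo Ma’s higher chain formula [26]:

$$\frac{\partial^{\alpha} z}{\partial x^{\alpha}} = \alpha! \sum_{(s,p,m) \in \mathcal{D}} \frac{\partial^{m} z}{\partial y^{m}} \prod_{k=1}^{s} \frac{1}{m_k!} \left( \frac{1}{p_k!} \frac{\partial^{p_k} y}{\partial x^{p_k}} \right)^{m_k},$$

where $\mathcal{D}$ is the set of all decompositions of the multi-index $\alpha$ with multiplicities $m$. Calculation involving a multi-index $\alpha = (\alpha_1, \dots, \alpha_d)$ follows the rules:

$$|\alpha| = \sum_{j=1}^{d} \alpha_j, \qquad \alpha! = \prod_{j=1}^{d} \alpha_j!, \qquad x^{\alpha} = \prod_{j=1}^{d} x_j^{\alpha_j}, \qquad \frac{\partial^{\alpha} z}{\partial x^{\alpha}} = \frac{\partial^{|\alpha|} z}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}}.$$

A multi-index $\alpha \in \mathbb{N}^d$ decomposes into $s$ parts $p_1, \dots, p_s$ in $\mathbb{N}^d$ with multiplicities $m_1, \dots, m_s$ in $\mathbb{N}^m$ respectively if the decomposition equation

$$\alpha = |m_1|\, p_1 + |m_2|\, p_2 + \cdots + |m_s|\, p_s$$

holds and all parts are different. The total multiplicity is defined as

$$m = m_1 + m_2 + \cdots + m_s.$$

The list $(s, p, m)$ is called a decomposition of $\alpha$. To ensure all parts are different we may impose $0 \prec p_1 \prec p_2 \prec \cdots \prec p_s$, where $\alpha \prec \beta$ means $\alpha_1 = \beta_1, \dots, \alpha_{j-1} = \beta_{j-1}$, but $\alpha_j < \beta_j$, for some $j$.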

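For the problem under discussion, let $\alpha \le 1$ component-wise, i.e., $r = 1$; note that there are $2^d - 1$ such $\alpha$'s. Following the higher chain rule formula, we get

$$\frac{\partial^{\alpha} \varphi_{\mu,x}\big(F^{-1}(\omega)\big)}{\partial \omega^{\alpha}} = \alpha! \sum_{(s,p,m) \in \mathcal{D}} \frac{\partial^{m} \varphi_{\mu,x}}{\partial \xi^{m}} \prod_{k=1}^{s} \frac{1}{m_k!} \left( \frac{1}{p_k!} \frac{\partial^{p_k} F^{-1}}{\partial \omega^{p_k}} \right)^{m_k}.$$

Furthermore, since $\lim_{\mu \to 0} \nabla^2 \varphi_{\mu,x} = 0$ by Proposition 2.3, the computation of

$$\lim_{\mu \to 0} \frac{\partial^{\alpha} \varphi_{\mu,x}\big(F^{-1}(\omega)\big)}{\partial \omega^{\alpha}}$$

can be simplified significantly. In this case, only the decompositions $(1, \alpha, e_i)$, where $e_i$ is the $i$th unit basis vector of $\mathbb{R}^d$, correspond to non-zero terms in the above formula. Otherwise, for $s \ge 2$, $|m| = |m_1 + \cdots + m_s| \ge 2$, and $\lim_{\mu \to 0} \dfrac{\partial^{m} \varphi_{\mu,x}}{\partial \xi^{m}} = 0$. Hence

$$\lim_{\mu \to 0} \frac{\partial^{\alpha} \varphi_{\mu,x}\big(F^{-1}(\omega)\big)}{\partial \omega^{\alpha}} = \sum_{i=1}^{d} \lim_{\mu \to 0} \frac{\partial \varphi_{\mu,x}}{\partial \xi_i} \frac{\partial^{\alpha} F_i^{-1}}{\partial \omega^{\alpha}} = \sum_{i=1}^{d} u^*_{i,x} \frac{\partial^{\alpha} F_i^{-1}}{\partial \omega^{\alpha}}.$$

Hence,

$$\lim_{\mu \to 0} \left\| \varphi_{\mu,x}\big(F^{-1}(\cdot)\big) \right\|_{\mathcal{W}_d^1} < \infty$$

if and only if $\dfrac{\partial^{\alpha} F_i^{-1}}{\partial \omega^{\alpha}} < \infty$ almost surely for all $\alpha \le 1$, $i = 1, \dots, d$. For some distributions, the condition might not hold. For example, the inverse of the cumulative function of the normal distribution does not have this property near 0 or 1. To remove the singularities, truncation of the cube $[0,1]^d$ could be applied: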

$$\int_{[0,1]^d} h(x)\, \mathrm{d}x \approx \int_{[\epsilon, 1-\epsilon]^d} h(x)\, \mathrm{d}x,$$

where $0 < \epsilon \ll 1$ is a small positive number. To compute the right hand side using the standard sparse grid method, we need to change the variable $x$ to $y$, where $x = \epsilon + (1 - 2\epsilon)\, y : [0,1]^d \to [\epsilon, 1-\epsilon]^d$. Hence

$$\int_{[\epsilon, 1-\epsilon]^d} h(x)\, \mathrm{d}x = (1 - 2\epsilon)^d \int_{[0,1]^d} h\big(\epsilon + (1 - 2\epsilon)\, y\big)\, \mathrm{d}y.$$

Hence for a two-stage linear problem with an invertible but unbounded cumulative distribution function $F$, we shall first generate the standard grid points and weights $\{\omega^k, \tilde{w}^k\}_{k=1}^{K}$ using the sparse grid method, then scale and transform them to the original random variable $\xi$ by

$$\xi^k = F^{-1}\big(\epsilon + (1 - 2\epsilon)\, \omega^k\big), \qquad w^k = (1 - 2\epsilon)^d\, \tilde{w}^k,$$

and finally use the $\{\xi^k, w^k\}$ in the approximation model (2). The error of this approximation model is exactly the sum of the truncation error $e_t$ and the sparse grid approximation error $e_s$. The error $e_t$ goes down with $\epsilon$, and the error $e_s$ goes down with increasing $K$ at the first order rate of the sparse grid method.
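The scaling and transformation step can be sketched as follows. This is a minimal illustration under stated assumptions: independent standard normal components, so that $F^{-1}$ acts coordinate-wise through `statistics.NormalDist().inv_cdf`, and a plain midpoint grid standing in for the sparse grid nodes and weights $\{\omega^k, \tilde{w}^k\}$. The helper names are hypothetical.

```python
from itertools import product
from statistics import NormalDist

def midpoint_grid(d, n):
    """Stand-in quadrature on [0,1]^d: n**d midpoints with equal weights.
    A sparse grid would supply its own nodes and weights instead."""
    pts_1d = [(i + 0.5) / n for i in range(n)]
    nodes = list(product(pts_1d, repeat=d))
    weights = [1.0 / n ** d] * len(nodes)
    return nodes, weights

def truncate_and_transform(nodes, weights, inv_cdf, eps):
    """Map nodes omega^k in [0,1]^d to xi^k = F^{-1}(eps + (1-2*eps)*omega^k)
    and rescale the weights by (1 - 2*eps)**d."""
    d = len(nodes[0])
    scale = 1.0 - 2.0 * eps
    xi = [tuple(inv_cdf(eps + scale * w) for w in node) for node in nodes]
    w_scaled = [scale ** d * w for w in weights]
    return xi, w_scaled

inv_cdf = NormalDist().inv_cdf          # inverse c.d.f. of N(0, 1)
nodes, weights = midpoint_grid(d=2, n=8)
xi, w = truncate_and_transform(nodes, weights, inv_cdf, eps=0.01)
print(len(xi), sum(w), min(p[0] for p in xi), max(p[0] for p in xi))
```

Because every argument passed to `inv_cdf` lies in $[\epsilon, 1-\epsilon]$, the transformed scenarios stay finite even for rules whose nodes include the boundary points 0 and 1. The weights sum to $(1-2\epsilon)^d$ rather than 1, and the missing mass corresponds to the truncation error $e_t$.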