First-Order Cost Function Error for a Partially-Observed System with T-LQG Policy: Given that the process and observation noises are zero-mean i.i.d. Gaussian, the initial error is zero-mean Gaussian, and all the functions are in $C^1$, under a first-order approximation for the small noise paradigm, the stochastic cost function is dominated by the nominal part of the cost function, and the expected first-order error is $O(\delta)$. That is,

$$E[\tilde J_1] = O(\delta), \quad \text{and} \quad E[J] = J^p + O(\delta).$$

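Moreover, by choosing $\delta = \epsilon\sqrt{\log(1/\epsilon)}$, we have

$$E[\tilde J_1] = O(\epsilon^{1-\gamma}), \quad \text{and} \quad E[J] = J^p + O(\epsilon^{1-\gamma}),$$

for some $0 < \gamma \ll 1$, which shows that this error tends to zero with a near-first-order rate as $\epsilon \downarrow 0$.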
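The near-first-order rate can be checked numerically. The following Python sketch assumes the reading $\delta = \epsilon\sqrt{\log(1/\epsilon)}$ of the choice of $\delta$ in the statement; the helper names `delta` and `ratio` are hypothetical and introduced only for illustration. For a fixed $\gamma > 0$ and small enough $\epsilon$, the ratio $\delta/\epsilon^{1-\gamma}$ shrinks, which is what $O(\epsilon^{1-\gamma})$ requires, while $\delta/\epsilon$ keeps growing, so the rate is near first order but not exactly first order.

```python
import math

def delta(eps: float) -> float:
    """Assumed reading of the choice of delta: eps * sqrt(log(1/eps))."""
    return eps * math.sqrt(math.log(1.0 / eps))

def ratio(eps: float, gamma: float) -> float:
    """delta / eps**(1 - gamma); stays bounded as eps -> 0 if delta = O(eps**(1 - gamma))."""
    return delta(eps) / eps ** (1.0 - gamma)

if __name__ == "__main__":
    gamma = 0.1  # illustrative value; the statement only needs 0 < gamma << 1
    for k in range(2, 13, 2):
        eps = 10.0 ** (-k)
        print(f"eps=1e-{k:02d}  delta/eps={delta(eps) / eps:.3f}  "
              f"delta/eps^(1-gamma)={ratio(eps, gamma):.4f}")
```

A smaller `gamma` delays the point at which the second column starts to decay, which matches the intuition that the logarithmic factor is only beaten asymptotically.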

Proof 20 Let

$$\tilde J_1^l := \sum_{t=0}^{K-1} \left(C_t^x \tilde x_t^l + C_t^u \tilde u_t^l\right) + C_K^x \tilde x_K^l = \sum_{t=0}^{K-1} \left(C_t^x \tilde x_t^l - C_t^u L_t \tilde e_t^l\right) + C_K^x \tilde x_K^l.$$

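Also note that $E[\tilde x_0^l] = E[\tilde x_0] = E[x_0 - \hat x_0] = 0$, $\tilde e_0^l = 0$, and $E[w_t] = E[v_t] = 0$ for all $t$.

Then, we use Lemmas 6, 7, and 8. First, we calculate $E[\tilde e_t^l]$, $1 \le t \le K$:

$$E[\tilde e_t^l] = \tilde T_t^{x_0} E[\tilde x_0] + \sum_{s=0}^{t} \tilde T_{s,t}^{w} E[w_s] + \sum_{s=0}^{t} \tilde T_{s,t}^{v} E[v_{s+1}] = 0.$$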

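Then, we calculate $E[\tilde x_{t+1}^l]$, $0 \le t \le K-1$:

$$E[\tilde x_{t+1}^l] = \tilde A_t^{x_0} E[\tilde x_0] + \sum_{s=0}^{t} \tilde A_{s,t}^{w} E[w_s] + \sum_{s=0}^{t} \tilde A_{s,t}^{\tilde b} E[\tilde e_s] = 0.$$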

Therefore, we have:

$$E[\tilde J_1^l] = \sum_{t=0}^{K-1} \left(C_t^x E[\tilde x_t^l] - C_t^u L_t E[\tilde e_t^l]\right) + C_K^x E[\tilde x_K^l] = 0.$$
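The vanishing expectation follows from linearity alone: the first-order cost error is a linear function of zero-mean quantities. The Python sketch below illustrates this with a Monte Carlo estimate on a toy scalar system. Every coefficient, the recursion used for the estimation-error term, and the function names are assumptions made for illustration; they stand in for the matrices of Lemmas 6, 7, and 8 and are not taken from the text.

```python
import random

def sample_first_order_cost_error(rng, K=10, eps=0.1):
    """Draw one sample of a linear first-order cost error for a toy scalar system.

    All coefficients are illustrative assumptions, not values from the text.
    """
    a, b, L = 0.9, 0.5, 0.4          # toy dynamics, input gain, feedback gain
    cx, cu, cxK = 1.0, 0.2, 2.0      # stand-ins for C^x_t, C^u_t, C^x_K
    x = eps * rng.gauss(0.0, 1.0)    # zero-mean initial state error
    e = 0.0                          # linear estimation-error term starts at zero
    cost = 0.0
    for _ in range(K):
        cost += cx * x - cu * L * e            # stage term C^x x - C^u L e
        w = eps * rng.gauss(0.0, 1.0)          # zero-mean process noise
        v = eps * rng.gauss(0.0, 1.0)          # zero-mean observation noise
        # Both recursions are linear in (x, e, w, v), so their means stay at zero.
        x, e = a * x - b * L * e + w, 0.5 * e + 0.3 * w - 0.3 * v
    return cost + cxK * x                      # add the terminal term C^x_K x_K

def mean_cost_error(n_samples=20000, seed=0):
    """Monte Carlo estimate of the expected first-order cost error."""
    rng = random.Random(seed)
    total = sum(sample_first_order_cost_error(rng) for _ in range(n_samples))
    return total / n_samples

if __name__ == "__main__":
    print(f"sample mean of first-order cost error: {mean_cost_error():+.4f}")
```

The sample mean is close to zero up to Monte Carlo error, even though individual samples are not small; only the expectation vanishes.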

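Now, we take the expectation of both sides of (6.58b). Since $J \le M$ for $\omega \notin \Omega(\delta)$,

$$\begin{aligned}
E[J - J^p] &= P(\Omega(\delta))\left(E[\tilde J_1^l] + O(\delta)\right) + M\left(1 - P(\Omega(\delta))\right) \\
&= P(\Omega(\delta))\,O(\delta) + M\left(1 - P(\Omega(\delta))\right).
\end{aligned} \tag{6.59}$$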

Now, the last expression is the same as (4.19). Although $\Omega(\delta)$ is not the same set as in Theorem 2, $P(\Omega(\delta))$ is unchanged. In the proof of Theorem 2, as part of the probabilistic argument for choosing a proper $\delta$, we showed that choosing $\delta := \epsilon\sqrt{-\log(\epsilon)}$ gives $E[J - J^p] = O(\epsilon^{1-\gamma})$. The same argument applies here, which proves the theorem.

Hence, the expected stochastic cost is equal to the nominal cost with high probability as $\epsilon \downarrow 0$. Therefore, the open-loop nominal design can be decoupled from the closed-loop design, as summarized below:

Corollary 6 Decoupling Principle: Decoupling of the Open-Loop and Closed-Loop Designs Under Small Noise. Based on Theorem 8, for a partially-observed system where the functions are in $C^1$ under the small noise paradigm, as $\epsilon \downarrow 0$, the design of the feedback law can be decoupled from the design of the open-loop optimized trajectory. If the functions are in $C^1$, this result is $O(\epsilon^{1-\gamma})$-optimal for $0 < \gamma \ll 1$ as $\epsilon \downarrow 0$.

Proof 21 Using Theorem 8, for $\omega \in \Omega(\delta)$ we have $E[J] = J^p + O(\epsilon^{1-\gamma})$, which is the cost of applying the policy $\pi_t(z_{0:t}) = u_t^p - L_t(\hat x_t - x_t^p)$ to the stochastic system (note that $\hat x_t$ is a function of $z_{0:t}$). Now, suppose $\pi^*$ is the optimal stochastic policy. We showed in the proof of Corollary 5 that for this policy, we have $E[J^{\pi^*}] = J^{p*} + O(\epsilon^{1-\gamma})$. Now, by construction $J^p \le J^{p*}$, and

$$E[J^{\pi^*}] = J^{p*} + O(\epsilon^{1-\gamma}) \ge J^p + O(\epsilon^{1-\gamma}) = E[J^{\pi}] + O(\epsilon^{1-\gamma}).$$

As a result, policy $\pi$ is within $O(\epsilon^{1-\gamma})$ of the optimal stochastic policy.

6.3 Near-Second-Order Optimality of The Deterministic Law

In this section, we provide a second-order analysis of the deterministic feedback law and show that applying the optimal feedback law of the deterministic problem to the stochastic problem results in a near-second-order optimality as well. Therefore, we improve the results of Section 6.1.

Assumptions: In addition to the assumptions of Section 6.1, we assume for the analysis of this section that all the functions (including the dynamics and observation models, the feedback law, and the cost functions) are in $C^2$, i.e., they are twice continuously differentiable.

Second-order expansion of the control law: Here, we use the same policy $u_t = \pi_t^d(z_{0:t})$ defined in Section 4.4. However, as opposed to that section, for the analysis of this section we expand this law to the second order. Let us define $u_t^p := \pi_t^d(z_{0:t}^p)$,

Also note that the simplified form of the second-order terms comes from the fact that we can simplify the following expression:

Therefore, the second-order term is indeed the following:

Second-order expansion of the system equations: We obtain the second-order expansion of the process model around the nominal trajectory, for 0 ≤ t ≤ K − 1:

x˜t+1= f (xt, ut) − f (xpt, upt) + σftwt (6.61a)

as $(\|\tilde x\| + \|\tilde u\|) \downarrow 0$, where we have:

Feedback compensation: Next, we substitute the feedback law of (6.60c) into (6.61c).

Note that after the feedback compensation, the first-order terms of (6.61c), which are linear in $\tilde u_t$, result in both first-order and second-order expressions in $\tilde x_t$. That is because, according to (6.61f), the observations can be written in terms of $\tilde x_t$. On the other hand, substituting the terms of the feedback law into the second-order terms of the dynamics in (4.43c) results in second-, third-, and fourth-order expressions in $\tilde x_t$. However, since the error term in (6.61c) includes $o(\|\tilde x\|^2)$, the third- and fourth-order terms can be ignored. As a result, just like the fully-observed case of (4.44), we replace those terms with $o(\|\tilde x\|^2)$.

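Next, we simplify the second-order expansion of the control error, which involves the following scalar value:

$$\begin{aligned}
\tilde z_i^T H^{\pi_t^k}_{ij} \tilde z_j &= (H_i \tilde x_i + M_i v_i)^T H^{\pi_t^k}_{ij} (H_j \tilde x_j + M_j v_j) + o(\|\tilde x\|^2) \\
&= \tilde x_i^T H_i^T H^{\pi_t^k}_{ij} H_j \tilde x_j + \tilde x_i^T H_i^T H^{\pi_t^k}_{ij} M_j v_j + v_i^T M_i^T H^{\pi_t^k}_{ij} H_j \tilde x_j + v_i^T M_i^T H^{\pi_t^k}_{ij} M_j v_j + o(\|\tilde x\|^2) \\
&= \tilde x_i^T H_i^T H^{\pi_t^k}_{ij} H_j \tilde x_j + 2 \tilde x_i^T H_i^T H^{\pi_t^k}_{ij} M_j v_j + v_i^T M_i^T H^{\pi_t^k}_{ij} M_j v_j + o(\|\tilde x\|^2).
\end{aligned}$$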

Note that the error in the above expression is in fact $O(\|\tilde x\|^4)$. Similar terms in the following equations are treated the same way, as long as there is an $o(\|\tilde x\|^2)$ error in the overall expression.

Now, we can simplify the second-order expansion of the dynamics. For $1 \le k \le n_x$, we can evaluate the following scalar value and define the related matrices:

Now, in (6.63d), the linear recursion in $\tilde x$ can be solved by defining the $Q$ polynomial the same way as in (6.53) and using (6.54). In particular, there exist matrices $U_t^{x_0}$, $0 \le t \le K-1$, $V_{s,t}^{v}$, $0 \le s \le t$, $0 \le t \le K-1$, and $W_{s,t}^{w}$, $0 \le s \le t$, $0 \le t \le K-1$, where $U_t^{x_0} := \sum_{s=0}^{t} Q_s U_{0,t-s}$, $V_{s,t}^{v} := \sum_{r=0}^{t-s} Q_r V_{s,t-r}$, $0 \le s \le t$, and $W_{s,t}^{w} := Q_{s-t} G_s$, $0 \le s \le t$. Note that in the last equation, we used a summation exchange formula (which can be easily proven by expanding and collecting the terms) for some $x_r$ and $f_{s,r}$.

Finally, note that we simplified an expression by redefining $y = t - s$ and relabeling.
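The summation exchange formula itself is not reproduced in the text. One standard triangular identity that is consistent with the definition of $V_{s,t}^{v}$, offered here as an assumption rather than as the author's exact statement, is $\sum_{s=0}^{t} \sum_{r=0}^{t-s} f_{s,r}\, x_r = \sum_{r=0}^{t} \big(\sum_{s=0}^{t-r} f_{s,r}\big) x_r$: both sides run over the same index pairs with $s + r \le t$. The Python sketch below checks this numerically with scalar stand-ins for $f_{s,r}$ and $x_r$; the function names are hypothetical.

```python
import random

def lhs(f, x, t):
    """Sum over s = 0..t of the inner sum over r = 0..t-s of f[s][r] * x[r]."""
    return sum(f[s][r] * x[r] for s in range(t + 1) for r in range(t - s + 1))

def rhs(f, x, t):
    """Same terms regrouped by r: x[r] times the sum over s = 0..t-r of f[s][r]."""
    return sum(x[r] * sum(f[s][r] for s in range(t - r + 1)) for r in range(t + 1))

def check(t=6, seed=0):
    """Evaluate both sides on random scalar data and return them as a pair."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(t + 1)]
    f = [[rng.uniform(-1.0, 1.0) for _ in range(t + 1)] for _ in range(t + 1)]
    return lhs(f, x, t), rhs(f, x, t)

if __name__ == "__main__":
    left, right = check()
    print(f"lhs={left:.12f}  rhs={right:.12f}  |diff|={abs(left - right):.2e}")
```

The two sides agree up to floating-point rounding because the regrouping only changes the order in which the same terms are added.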

Validity region: Similar to the fully-observed situation, the definition is $\tilde x_t := x_t - x_t^p$. Therefore, the properties of $O(\|\tilde x_t\|)$ that we have proven in Section 6.1 for a deterministic feedback design still hold for the above Taylor expansion as well.

Particularly, we proved that for the $\pi^d$ design, $O(\|\tilde x_t\|) = O(\delta)$ in a set $\Omega(\delta)$, properly defined as before, with probability $1 - o(\epsilon)$. Hence, for $\omega \in \Omega(\delta)$, $O(\|\tilde x_t\|^2) = O(\delta^2)$.

Thus, for $\omega \in \Omega(\delta)$ (the same set and with the same probability), we have:

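$$\begin{aligned}
\cdots &+ \sum_{s=0}^{t} \sum_{k=1}^{n_x} \sum_{i=0}^{t-s} \sum_{j=0}^{t-s} \Big( \tilde x_i^T H^{(h,f,h)}_{t-s,kij} \tilde x_j + \tilde x_i^T H^{(h,f,v)}_{t-s,kij} v_j + v_i^T H^{(v,f,v)}_{t-s,kij} v_j \Big) Q_s e_k^{n_x} \\
&+ \sum_{s=0}^{t} \sum_{r=0}^{t-s} \sum_{j=1}^{n_z} \big( \tilde x_r^T H^{h}_{rj} \tilde x_r \big) Q_s B_{t-s} L_{r,t-s} e_j^{n_z} + O(\delta^2).
\end{aligned} \tag{6.65}$$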

Second-order expansion of the cost function: Similarly, we obtain the second-order Taylor series expansion of the cost function around the nominal trajectory:

$$J = J^p + \tilde J_1 + \tilde J_2 + o\Big( \sum_{t=1}^{K-1} \big(\|\tilde x_t\|^2 + \|\tilde u_t\|^2\big) + \|\tilde x_K\|^2 \Big) \tag{6.66a}$$

$$\phantom{J} = J^p + \tilde J_1 + \tilde J_2 + o(\|\tilde x\|^2 + \|\tilde u\|^2), \tag{6.66b}$$

as $(\|\tilde x\|^2 + \|\tilde u\|^2) \downarrow 0$. Moreover, we have:

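• $J^p := \sum_{t=0}^{K-1} c_t(x_t^p, u_t^p) + c_K(x_K^p)$ denotes the nominal cost;

• $\tilde J_1 := \sum_{t=0}^{K-1} (C_t^x \tilde x_t + C_t^u \tilde u_t) + C_K^x \tilde x_K$ is the first-order cost error;

• $\tilde J_2 := \sum_{t=0}^{K-1} \big( \tfrac{1}{2} \tilde x_t^T C_t^{xx} \tilde x_t + \tfrac{1}{2} \tilde u_t^T C_t^{uu} \tilde u_t + \tilde x_t^T C_t^{xu} \tilde u_t \big) + \tfrac{1}{2} \tilde x_K^T C_K^{xx} \tilde x_K$ is the second-order cost error;

• $J_2 := J^p + \tilde J_1 + \tilde J_2$ is the second-order approximation of the cost function;

• $C_t^{xx} = \nabla^2_{xx} c_t(x, u)|_{x_t^p, u_t^p}$, $C_t^{uu} = \nabla^2_{uu} c_t(x, u)|_{x_t^p, u_t^p}$, $C_t^{xu} = \nabla^2_{xu} c_t(x, u)|_{x_t^p, u_t^p}$, and $C_K^{xx} = \nabla^2_{xx} c_K(x)|_{x_K^p}$, where we have used the fact that $c_t \in C^2$.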
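The role of the $o(\cdot)$ remainder can be illustrated on a single stage term. The Python sketch below uses a hypothetical smooth stage cost $c(x, u) = e^{x}\cos(u)$, chosen only because its derivatives are easy to write by hand, and compares it with its second-order model built from the same pieces as $\tilde J_1$ and $\tilde J_2$. The nominal point and the function names are assumptions made for illustration.

```python
import math

def c(x, u):
    """Hypothetical smooth stage cost, chosen only for illustration."""
    return math.exp(x) * math.cos(u)

def second_order_residual(xp, up, dx, du):
    """c(xp+dx, up+du) minus its second-order Taylor model around (xp, up)."""
    ex, cos_u, sin_u = math.exp(xp), math.cos(up), math.sin(up)
    Cx, Cu = ex * cos_u, -ex * sin_u                      # gradients C^x, C^u
    Cxx, Cuu, Cxu = ex * cos_u, -ex * cos_u, -ex * sin_u  # Hessian blocks
    first = Cx * dx + Cu * du                             # first-order cost error
    second = 0.5 * Cxx * dx**2 + 0.5 * Cuu * du**2 + Cxu * dx * du
    return c(xp + dx, up + du) - (c(xp, up) + first + second)

if __name__ == "__main__":
    xp, up = 0.3, 0.5  # illustrative nominal point
    for h in (1e-1, 1e-2, 1e-3):
        r = second_order_residual(xp, up, h, h)
        # The residual divided by h**2 should shrink as h decreases.
        print(f"h={h:.0e}  residual={r:+.3e}  residual/h^2={r / h**2:+.3e}")
```

Dividing the residual by the squared perturbation size makes the little-o statement visible: the ratio decays roughly in proportion to the perturbation, since the leading neglected term is cubic.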

Next, we provide the main result regarding the expected second-order error of the cost function.

Theorem 9 Second-Order Cost Function Error for a Partially-Observed