Academic year: 2021


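Final Exercise

May be submitted in pairs.

General notes:

• All answers must be explained in words and with appropriate calculations. A final answer or code alone is not sufficient as an explanation.

• Do not use commands from Matlab's control toolboxes in your submission (you may of course use them for debugging), unless explicitly stated otherwise.

• It is possible, and recommended, to use Matlab's symbolic toolbox (or Maple and similar tools) to save tedious technical work.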

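[Figure: the two-link arm, showing the link lengths l_1 and l_2, the point masses m_1 and m_2, the Cartesian axes x_1 and x_2, and gravity g.]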

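In this exercise we study an arm that moves in a two-dimensional plane. The arm consists of two rigid, massless links of lengths $l_1$ and $l_2$, with point masses $m_1$ and $m_2$ at their ends. The state of the system can be described by the angles $\theta_1$ and $\theta_2$ and the angular velocities $\dot\theta_1$ and $\dot\theta_2$, which we collect in the state vector $x = [\theta_1, \theta_2, \dot\theta_1, \dot\theta_2]^T$. The system is controlled by applying torques at the joints; we denote $u = [\tau_1, \tau_2]^T$. The gravity $g$ is a parameter whose value is 10 when the system is standing upright and 0 when it is lying flat.

The equations of motion describing the balance of forces in the system are: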

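$$u = M(\theta_1, \theta_2) \begin{bmatrix} \ddot\theta_1 \\ \ddot\theta_2 \end{bmatrix} + v(\theta_1, \theta_2, \dot\theta_1, \dot\theta_2) + g(\theta_1, \theta_2)$$

Solving for the angular accelerations:

$$\begin{bmatrix} \ddot\theta_1 \\ \ddot\theta_2 \end{bmatrix} = M(\theta)^{-1} \left( u - v(\theta, \dot\theta) - g(\theta) \right)$$

$$M = \begin{bmatrix} l_2^2 m_2 + 2 l_1 l_2 m_2 \cos(\theta_2) + l_1^2 (m_1 + m_2) & l_2^2 m_2 + l_1 l_2 m_2 \cos(\theta_2) \\ l_2^2 m_2 + l_1 l_2 m_2 \cos(\theta_2) & l_2^2 m_2 \end{bmatrix}$$

$$v = \begin{bmatrix} -m_2 l_1 l_2 \sin(\theta_2)\, \dot\theta_2^2 - 2 m_2 l_1 l_2 \sin(\theta_2)\, \dot\theta_1 \dot\theta_2 \\ m_2 l_1 l_2 \sin(\theta_2)\, \dot\theta_1^2 \end{bmatrix}$$

$$g = \begin{bmatrix} m_2 l_2 g \cos(\theta_1 + \theta_2) + (m_1 + m_2) l_1 g \cos(\theta_1) \\ m_2 l_2 g \cos(\theta_1 + \theta_2) \end{bmatrix}$$

$M$ is called the mass matrix (and as such it is positive definite). $v$ is the vector of centrifugal and Coriolis forces. $g$ is the vector of forces due to gravity (this vector vanishes when $g = 0$).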
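As a minimal sketch of how the equations of motion can be evaluated numerically, the following computes $M$, $v$, and $g$ and solves for the angular accelerations. It is written in Python purely for illustration (the exercise itself is meant to be done in Matlab), and the function names and default parameter values are assumptions, not part of the supplied project files.

```python
import math

def mass_matrix(th2, l1, l2, m1, m2):
    """Mass matrix M; it depends only on the second joint angle."""
    c2 = math.cos(th2)
    m11 = l2**2 * m2 + 2 * l1 * l2 * m2 * c2 + l1**2 * (m1 + m2)
    m12 = l2**2 * m2 + l1 * l2 * m2 * c2
    m22 = l2**2 * m2
    return [[m11, m12], [m12, m22]]

def coriolis(th2, dth1, dth2, l1, l2, m2):
    """Vector v of centrifugal and Coriolis terms."""
    s2 = math.sin(th2)
    v1 = -m2 * l1 * l2 * s2 * dth2**2 - 2 * m2 * l1 * l2 * s2 * dth1 * dth2
    v2 = m2 * l1 * l2 * s2 * dth1**2
    return [v1, v2]

def gravity(th1, th2, l1, l2, m1, m2, grav):
    """Vector g of gravity terms; it is zero when grav == 0."""
    g2 = m2 * l2 * grav * math.cos(th1 + th2)
    g1 = g2 + (m1 + m2) * l1 * grav * math.cos(th1)
    return [g1, g2]

def angular_acceleration(x, u, l1=1.0, l2=1.0, m1=1.0, m2=1.0, grav=10.0):
    """Solve u = M*acc + v + g for acc, i.e. acc = M^{-1} (u - v - g)."""
    th1, th2, dth1, dth2 = x
    M = mass_matrix(th2, l1, l2, m1, m2)
    v = coriolis(th2, dth1, dth2, l1, l2, m2)
    g = gravity(th1, th2, l1, l2, m1, m2, grav)
    r1 = u[0] - v[0] - g[0]
    r2 = u[1] - v[1] - g[1]
    # Explicit inverse of the 2x2 mass matrix.
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    acc1 = (M[1][1] * r1 - M[0][1] * r2) / det
    acc2 = (-M[1][0] * r1 + M[0][0] * r2) / det
    return [acc1, acc2]

if __name__ == "__main__":
    # Arm at rest pointing straight up, with no torque applied.
    print(angular_acceleration([math.pi / 2, 0.0, 0.0, 0.0], [0.0, 0.0]))
```

With the arm at rest and pointing straight up ($\theta_1 = \pi/2$, $\theta_2 = 0$) and zero torque, the gravity vector vanishes, so the accelerations come out as zero up to rounding error; this is a quick sanity check on the signs.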

To examine the system in the "lab", the following Matlab functions are at your disposal (in the file control project files.zip):

• two_link_arm_control performs a continuous-time simulation of the system (numerical integration using a 4th-order Runge-Kutta method with a variable step size, the ode45 function in Matlab) from a given initial state, over a given time window, with a control function u(t, x).

• arm_noisy_discrete_control_step performs a simulation of a single step of the system in continuous time as above, but with a constant control signal u + η in place of a control function (η is control noise). The function returns a (noisy) observation of the state at the end of the simulation step (in addition, it returns the remaining data for tracking purposes).

• show_2_link_arm_simulation displays a simulation run visually.

• arm_usage_example1/2 are examples of how to use the functions above.

• get_kf_P_and_K is taken from homework exercise 5 (for use in Part 4).

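Part 1. Open-loop control

In this part we want to generate a motion of the arm such that the end of the arm moves along a straight line with minimal acceleration. To this end we compute the desired trajectory in Cartesian space and convert it to a desired control signal at the joints.

1. An initial position in Cartesian space, $x(0) = [x_1(0), x_2(0)]^T = x_0$, and a final position $x(t_f) = x_f$ (at a final time $t_f$ that is not known) are given. Find a trajectory $x(t)$ (as an analytic expression) that minimizes the following cost function:

$$J = \frac{1}{2} t_f^2 + \frac{1}{2} \int_0^{t_f} \|\ddot{x}(t)\|^2 \, dt$$

under the assumption that the velocity at the start and at the end of the trajectory is 0. You may assume that the resulting trajectory lies on a straight line.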

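2. Prove that the following equations give the inverse kinematics (the mapping from the position of the end of the arm in Cartesian space to the joint angles)¹:

$$c_2 = \cos(\theta_2) = \frac{x_1^2 + x_2^2 - l_1^2 - l_2^2}{2 l_1 l_2}$$

$$s_2 = \sin(\theta_2) = \pm\sqrt{1 - c_2^2}$$

$$\theta_2 = \mathrm{Atan2}(s_2, c_2)$$

$$k_1 = l_1 + l_2 \cos(\theta_2), \qquad k_2 = l_2 \sin(\theta_2)$$

$$\theta_1 = \mathrm{Atan2}(x_2, x_1) - \mathrm{Atan2}(k_2, k_1)$$

¹ The function $\mathrm{Atan2}(y, x) : \mathbb{R}^2 \to [-\pi, \pi]$ returns the angle between the $x$ axis and the hypotenuse of a right triangle whose legs are $x$ and $y$, taking the signs of $x$ and $y$ into account. This is in contrast to $\arctan(y/x)$, whose values lie in $[-\frac{\pi}{2}, \frac{\pi}{2}]$.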
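The inverse kinematics formulas can be checked numerically with a round trip through the forward kinematics. The sketch below is in Python for illustration only; the function names, the `elbow` argument that selects the sign of $s_2$, and the forward-kinematics convention (angles measured from the $x_1$ axis, $\theta_2$ relative to the first link) are assumptions made for this example.

```python
import math

def inverse_kinematics(x1, x2, l1=1.0, l2=1.0, elbow=1):
    """Joint angles that place the arm tip at (x1, x2).

    elbow = +1 or -1 selects the sign of sin(theta2), i.e. one of the
    two mirror-image arm configurations.
    """
    c2 = (x1**2 + x2**2 - l1**2 - l2**2) / (2 * l1 * l2)
    if abs(c2) > 1:
        raise ValueError("target is outside the reachable workspace")
    s2 = elbow * math.sqrt(1 - c2**2)
    th2 = math.atan2(s2, c2)
    k1 = l1 + l2 * c2
    k2 = l2 * s2
    th1 = math.atan2(x2, x1) - math.atan2(k2, k1)
    return th1, th2

def forward_kinematics(th1, th2, l1=1.0, l2=1.0):
    """Tip position for given joint angles (assumed convention, see lead-in)."""
    x1 = l1 * math.cos(th1) + l2 * math.cos(th1 + th2)
    x2 = l1 * math.sin(th1) + l2 * math.sin(th1 + th2)
    return x1, x2

if __name__ == "__main__":
    for sign in (1, -1):
        th1, th2 = inverse_kinematics(1.0, -1.0, elbow=sign)
        print(sign, (th1, th2), forward_kinematics(th1, th2))
```

Both choices of sign map back to the same tip position, which illustrates why the formula for $s_2$ carries a $\pm$.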

3. Create (in Matlab) a vector of values of $x(t)$ at the times $t = [0 : 0.01 : t_f]$ that satisfy Question 1, given $x_0 = [1, -1]^T$ and $x_f = [-0.5, 1]^T$. Using Question 2, find the corresponding vectors of angles. Differentiate the angle vectors numerically twice to obtain a vector of angular velocities and a vector of angular accelerations. Given $l_1 = l_2 = m_1 = m_2 = 1$ and $g = 0$, find the (open loop) control vector from the equations of motion of the system. Use arm_noisy_discrete_control_step (with zero noise matrices) to simulate running the arm with this control signal. Plot the desired trajectory and the one actually obtained. Is there a trajectory that satisfies the requirements of Question 1 but whose execution requires an arbitrarily large control signal?
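The numerical differentiation step can be sketched as follows. This is an illustrative Python version, not the Matlab code the exercise asks for; `differentiate` is a hypothetical helper that uses central differences in the interior and one-sided differences at the two ends, similar in spirit to Matlab's `gradient`.

```python
def differentiate(samples, dt):
    """Approximate the time derivative of uniformly sampled values."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    d = [0.0] * n
    # One-sided differences at the two ends of the grid.
    d[0] = (samples[1] - samples[0]) / dt
    d[-1] = (samples[-1] - samples[-2]) / dt
    # Central differences everywhere else.
    for i in range(1, n - 1):
        d[i] = (samples[i + 1] - samples[i - 1]) / (2 * dt)
    return d

if __name__ == "__main__":
    dt = 0.01
    t = [i * dt for i in range(101)]
    theta = [ti**2 for ti in t]          # stand-in angle trajectory
    velocity = differentiate(theta, dt)
    acceleration = differentiate(velocity, dt)
    print(velocity[50], acceleration[50])
```

Differentiating twice makes the estimates near the ends of the grid less accurate than those in the interior, which is worth keeping in mind when the resulting accelerations are substituted into the equations of motion.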

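Part 2. Discrete-time control from noisy observations of the state

[Block diagram: the (nonlinear) system with a controller designed for the linear approximation. Blocks: H, K, L. Signals: x_0, u, η, x, v, y.]

In this part we control the system with a discrete-time control signal (a staircase function). In addition, process noise is added (as a random additive term on the desired control signal), as well as observation noise (as an additive term on the true state). To this end we linearize the system into a discrete linear system and use the certainty equivalence principle to combine discrete-time LQR control with state estimation based on the steady state Kalman gain. To improve the state estimate, we propagate the estimated state by numerical integration rather than linearly.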

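In Questions 3-8 assume $g = 10$ and $l_1 = l_2 = m_1 = m_2 = 1$.

1. Write the equations of the system dynamics (in the form $\dot{x} = f(x, u)$).

2. Linearize the system around the point $x_0 = [\frac{\pi}{2}, 0, 0, 0]^T$, $u_0 = [0, 0]$ to obtain a linear system $\dot{x} = Ax + Bu$.

To verify that your results are correct, substitute the values $g = 10$ and $l_1 = l_2 = m_1 = m_2 = 1$ and the values of $x_0$ and $u_0$ above. The matrices should be:

$$A = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 10 & -10 & 0 & 0 \\ -10 & 30 & 0 & 0 \end{bmatrix} \qquad B = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 1 & -2 \\ -2 & 5 \end{bmatrix}$$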

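3. For the linear system found in the previous question, assume that the control signal is a staircase function with step length $\Delta$.

(a) Find an equivalent discrete-time system of the form:

$$x(t + \Delta) = F x(t) + G u(t)$$

(b) Substitute $\Delta = 0.1$ sec. Where are the poles?

(c) Is the system stable?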
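One common way to obtain $F$ and $G$ numerically for a control signal held constant over each step is to exponentiate an augmented matrix: the top rows of $\exp\left(\begin{bmatrix} A & B \\ 0 & 0 \end{bmatrix} \Delta\right)$ contain $F$ and $G$ side by side. The sketch below is a plain-Python illustration of that identity using a truncated Taylor series; the helper names are hypothetical, and it is meant as a cross-check for the analytic derivation the question asks for, not as a replacement for it.

```python
def mat_mul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_exp(m, terms=30):
    """Matrix exponential by a truncated Taylor series (fine for small norms)."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = mat_mul(term, m)
        term = [[v / k for v in row] for row in term]   # now m^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def discretize_zoh(A, B, delta):
    """Return (F, G) such that x(t + delta) = F x(t) + G u(t) for constant u."""
    n, m = len(A), len(B[0])
    aug = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(n):
            aug[i][j] = A[i][j] * delta
        for j in range(m):
            aug[i][n + j] = B[i][j] * delta
    e = mat_exp(aug)
    F = [row[:n] for row in e[:n]]
    G = [row[n:] for row in e[:n]]
    return F, G

if __name__ == "__main__":
    # Double integrator as a small example with a known answer.
    F, G = discretize_zoh([[0.0, 1.0], [0.0, 0.0]], [[0.0], [1.0]], 0.1)
    print(F, G)
```

For the double integrator the result can be checked by hand: $F = \begin{bmatrix} 1 & \Delta \\ 0 & 1 \end{bmatrix}$ and $G = \begin{bmatrix} \Delta^2/2 \\ \Delta \end{bmatrix}$.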

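4. For the discrete system of the previous question, assume now that we have lost the ability to apply torque at the elbow, that is, $u_2 = 0$ (equivalently, the right column of $G$ is zero).

(a) Find a gain vector $L$ (see the diagram) such that $u(t) = -L^T x(t)$ minimizes

$$\sum_{i=0}^{\infty} \left[ x^T(i\Delta)\, Q\, x(i\Delta) + u^2(i\Delta) \right], \qquad Q = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

To solve the corresponding Riccati equation, you may iterate the corresponding backward recursion until steady state is reached (as you were asked to do in a previous homework exercise). Do not use functions of the control toolbox (such as dlqr) except for debugging.

(b) What are the poles of the system with the feedback?

(c) Is the system stable?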
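The recursion mentioned in item (a) can be sketched as follows for a single scalar input. This is a plain-Python illustration with hypothetical names, using the standard discrete-time Riccati update $P \leftarrow Q + F^T P F - F^T P G\,(r + G^T P G)^{-1} G^T P F$ and the gain $L^T = (r + G^T P G)^{-1} G^T P F$; the stopping tolerance and iteration cap are arbitrary choices.

```python
def riccati_gain(F, G, Q, r=1.0, tol=1e-10, max_iter=100000):
    """Iterate the Riccati recursion for a single-input system.

    F and Q are n-by-n nested lists, G is a length-n list (the single
    input column), r is the scalar input weight.
    Returns (P, L) for the feedback u = -L^T x.
    """
    n = len(F)
    idx = range(n)

    def gain_terms(P):
        PF = [[sum(P[i][k] * F[k][j] for k in idx) for j in idx] for i in idx]
        gpf = [sum(G[i] * PF[i][j] for i in idx) for j in idx]        # G^T P F
        s = r + sum(G[i] * P[i][j] * G[j] for i in idx for j in idx)  # r + G^T P G
        return PF, gpf, s

    P = [row[:] for row in Q]
    for _ in range(max_iter):
        PF, gpf, s = gain_terms(P)
        P_next = [[Q[i][j]
                   + sum(F[k][i] * PF[k][j] for k in idx)   # (F^T P F)_ij
                   - gpf[i] * gpf[j] / s                    # rank-one correction
                   for j in idx] for i in idx]
        change = max(abs(P_next[i][j] - P[i][j]) for i in idx for j in idx)
        P = P_next
        if change < tol:
            break
    _, gpf, s = gain_terms(P)
    return P, [g / s for g in gpf]

if __name__ == "__main__":
    # Scalar example: x(t+1) = x(t) + u(t), cost sum of x^2 + u^2.
    P, L = riccati_gain([[1.0]], [1.0], [[1.0]])
    print(P, L)
```

In the scalar example the fixed point satisfies $P^2 - P - 1 = 0$, so $P$ converges to the golden ratio and the gain to its reciprocal, which gives a quick hand check of the recursion before applying it to the four-state arm model.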

5. Assume (in this item only) that we can observe only $\theta_1$, that is, $y(t) = \theta_1$. Is the (discrete linear) system observable?

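6. Assume that, as a result of noise in the joints, normal noise $\eta$ with a diagonal covariance matrix $\sigma I$ is added to every control signal $u(t)$, so that the applied control is

$$u(t) + \eta, \qquad \eta \sim N(0, \sigma I)$$

Assume that the additive noise on $u$ affects the nonlinear system in the same way that it affects the linear system found in Question 3, that is,

$$x(t + \Delta) = F x(t) + G(u(t) + \eta) = F x(t) + G u(t) + w, \qquad w \sim N(0, W)$$

What is $W$?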

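7. In addition to the process noise of the previous question, there is also normal observation noise with variance $\epsilon I$, so that

$$y(t) = x(t) + v, \qquad v \sim N(0, \epsilon I)$$

Find the steady state Kalman gain ($K$ in the diagram) of the noisy discrete linear system (with the process noise found in the previous question and the observation noise above). For this purpose you may use the function get_kf_P_and_K that you wrote in a previous homework exercise (the solution is attached to this exercise). To obtain the steady state matrix, run the program to obtain $K(500)$ (time step 500), with the following data:

• The process noise is as found in Question 6, with $\sigma = 0.1$.

• The observation noise is as given in this question, with $\epsilon = \frac{\pi}{360}$ (half a degree).

• The initial state is $x_0 = [\frac{\pi}{2}, 0, 0, 0]^T$ (without noise).

(a) Show a plot of $\|K(i) - K(i-1)\|_F^2$ for $i = \{1, \ldots, 500\}$, where $\|A\|_F^2 = \sum_{i,j} A_{i,j}^2$ is the Frobenius norm. Check that the process has converged.

(b) What is $K(500)$?

(c) What is the meaning of the $P(500)$ that you obtained? Is the state estimate expected to be good?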
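The convergence measure in item (a) is straightforward to compute. The sketch below is a plain-Python illustration with hypothetical names; it takes a list of gain matrices (stand-ins for $K(0), K(1), \ldots$) and returns the squared Frobenius norm of each successive difference.

```python
def frobenius_sq(a, b):
    """Squared Frobenius norm of a - b for two equally sized matrices."""
    return sum((x - y) ** 2
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))

def successive_changes(gains):
    """||K(i) - K(i-1)||_F^2 for i = 1 .. len(gains) - 1."""
    return [frobenius_sq(gains[i], gains[i - 1]) for i in range(1, len(gains))]

if __name__ == "__main__":
    # Made-up sequence that settles geometrically, for demonstration only.
    gains = [[[1.0 + 0.5 ** i, 0.0], [0.0, 2.0 - 0.5 ** i]] for i in range(10)]
    print(successive_changes(gains))
```

Plotting these values on a logarithmic vertical axis usually makes it easier to see whether the sequence has settled.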

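8. Run 100 simulations of the noisy system with the control signal of Question 4, each for 20 seconds or until failure (one of the angles becomes larger than ...). To obtain a more accurate estimate of the state, in the standard formula

$$\bar{x}_t = \bar{x}_{t|t-1} + K_{500} \left( y_t - C \bar{x}_{t|t-1} \right)$$

substitute

$$\bar{x}_{t|t-1} = \bar{x}_{t-1|t-1} + \int_{t-1}^{t} f(x_\tau, u_{t-1}) \, d\tau$$

instead of $\bar{x}_{t|t-1} = F \bar{x}_{t-1|t-1} + G u_{t-1}$. In other words, we use a simulation of the nonlinear system to propagate $x$ (without the noise, which is not known) instead of using the approximate linear system. To carry out the required numerical computation you may use the function arm_noisy_discrete_control_step without noise (zero covariance matrices), or the function two_link_arm_control with a constant control function.

Show a histogram of the simulation times obtained. Submit a plot describing an example run.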
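The predict-then-correct structure of this estimator can be sketched as follows. This is a plain-Python illustration, not the supplied Matlab code: `rk4_predict` stands in for the noise-free simulation step (here a fixed-step 4th-order Runge-Kutta integrator, whereas the supplied functions use ode45), `f` is any function returning $\dot{x} = f(x, u)$, and all names and the number of substeps are assumptions.

```python
def rk4_predict(f, x, u, dt, substeps=10):
    """Propagate the estimate over one control step with u held constant."""
    h = dt / substeps
    x = list(x)
    for _ in range(substeps):
        k1 = f(x, u)
        k2 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k1)], u)
        k3 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k2)], u)
        k4 = f([xi + h * ki for xi, ki in zip(x, k3)], u)
        x = [xi + h * (a + 2 * b + 2 * c + d) / 6
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x

def kalman_correct(x_pred, y, K, C):
    """x = x_pred + K (y - C x_pred) with a fixed (steady state) gain K."""
    n = len(x_pred)
    innovation = [y[i] - sum(C[i][j] * x_pred[j] for j in range(n))
                  for i in range(len(y))]
    return [x_pred[i] + sum(K[i][j] * innovation[j] for j in range(len(y)))
            for i in range(n)]

if __name__ == "__main__":
    # Toy dynamics (double integrator) standing in for the arm model.
    def f(x, u):
        return [x[1], u[0]]

    identity = [[1.0, 0.0], [0.0, 1.0]]
    gain = [[0.5, 0.0], [0.0, 0.5]]       # made-up gain for the demo
    x_pred = rk4_predict(f, [0.0, 0.0], [1.0], 1.0)
    print(x_pred, kalman_correct(x_pred, [1.5, 1.0], gain, identity))
```

Only the prediction step changes relative to the linear estimator; the correction still uses the fixed gain $K_{500}$ computed from the linearized model.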

Part 3. Control by Reinforcement Learning in continuous time

This part follows the paper:

K. Doya, Reinforcement Learning in Continuous Time and Space, Neural Computation 12, 2000.

The paper can be downloaded from the course website: http://www.cs.huji.ac.il/~control/handouts/Doya2000.pdf

This part deals with a pendulum system (a mass at the end of a massless rod) with a control input in the form of a torque applied at the pivot. The maximal control signal is too weak to lift the pendulum in a single swing from rest.

The relevant code is under the TDLambda directory in the attached code. The top-level function is TDpendulum. After you complete a missing section, it will be able to learn a nonlinear controller that, at the end of learning, can raise the mass with several swings and hold it upright. The code uses a struct named E to store the learned model and additional data.

The function simulate receives E and uses it (after you complete the missing code) to run the system with the controller.

It is strongly recommended to read the paper (and the lecture summary) before doing the exercise. While the paper also describes how to handle the case in which the system equations (f) and the reward function are not known to the controller, we practice the case in which these functions are known.

For these questions, all you need to submit is the code you complete.

1. The function policy.m is supposed to compute u = policy(E, x). Complete it so that it performs the computation in equation 24 of the paper. In this equation the function s is s(x) = tanh(x). The constant c is given in E.c. This function consists of 2 lines and calls the function approx.m.

After you complete policy, you will be able to run simulate.m with a given model that we have already trained (the file E.mat) and see the behavior of the controller that you will be asked to learn in the next question.

2. To be able to run the controller learning process (by running TDpendulum), all you need to complete is the function DoyaUpdate.m, which performs an update of the controller: E = DoyaUpdate(E, x). This function consists of 8 lines. The first 4 lines are calls to the functions policy, f, reward, and approx. The fifth line computes $\delta(t)$ according to its definition in equation 10 of the paper. Lines 6 and 7 update E.Etrace and E.W, respectively, by numerical integration of Euler type (that is, by a first-order linear approximation) of equations 17 of the paper (that is, $e_i(t + \Delta t) = e_i(t) + \dot{e}_i \Delta t$ and $w_i(t + \Delta t) = w_i(t) + \dot{w}_i \Delta t$). Line 8 is given (it needs to be uncommented).
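The Euler-type update used in lines 6 and 7 amounts to one multiply-and-add per component. The sketch below is a generic Python illustration with a hypothetical helper and made-up numbers; it does not reproduce the paper's expressions for $\dot{e}_i$ and $\dot{w}_i$, which have to be taken from equations 17.

```python
def euler_step(values, derivatives, dt):
    """First-order update: value(t + dt) = value(t) + derivative * dt."""
    return [v + d * dt for v, d in zip(values, derivatives)]

if __name__ == "__main__":
    traces = [1.0, 2.0]          # stand-in for current values
    trace_rates = [0.5, -1.0]    # stand-in for their time derivatives
    print(euler_step(traces, trace_rates, 0.1))
```

The same helper applies unchanged to both quantities, since each update has the form "current value plus rate times step".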
