Stochastic Approximation Algorithms With Applications To Particle Swarm Optimization, Adaptive Optimization, And Consensus

(1)

Wayne State University Dissertations

1-1-2015

Stochastic Approximation Algorithms With

Applications To Particle Swarm Optimization,

Adaptive Optimization, And Consensus

Quan Yuan

Wayne State University,

Follow this and additional works at:http://digitalcommons.wayne.edu/oa_dissertations

Part of theApplied Mathematics Commons

This Open Access Dissertation is brought to you for free and open access by DigitalCommons@WayneState. It has been accepted for inclusion in Wayne State University Dissertations by an authorized administrator of DigitalCommons@WayneState.

Recommended Citation

Yuan, Quan, "Stochastic Approximation Algorithms With Applications To Particle Swarm Optimization, Adaptive Optimization, And Consensus" (2015). Wayne State University Dissertations. Paper 1324.

(2)

AND CONSENSUS by

QUAN YUAN DISSERTATION

Submitted to the Graduate School of Wayne State University,

Detroit, Michigan

in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY 2015 MAJOR: MATHEMATICS Approved by: ———————————————————– Advisor Date ———————————————————– ———————————————————– ———————————————————– ———————————————————–

(3)

To my family and teachers

(4)

Over the past five years I have received support and encouragement from a great number of individuals. I would like to take this special opportunity to express my appreciation on all of them, especially to my advisor, Professor George Yin, for his endless support and care, spiritual guidance and academic training. Professor Yin’s guidance has made this five years a thoughtful and rewarding journey. It is my great honor to be his student, and he is my lifetime role model.

I would like to thank my dissertation committee of Professor Boris Mordukhovich, Pro-fessor Kazuhiko Shinki, ProPro-fessor Tze-Chien Sun, and ProPro-fessor Wen Chen for their support over the past three years as I moved from rough ideas to a completed study. In addition, Professor Le Yi Wang also provided valuable support of the guidance and help for my PhD study.

During my graduate study, Professor Po Hu, Professor Alex Korostelev, Professor Guozhen Lu, Professors Bertram Schreiber, Professor Ualbai Umirbaev, and Professor Zhimin Zhang have taught me courses. I appreciate their help. Mrs. Patricia Bonesteel, Mrs. Tiana Bosley, Dr. John Breckenridge, Mrs. Mary Klamo, Mrs. Barbara Malicke, Dr. Choon-Jai Rhee, and Mrs. Joyce Wynn have trained me and supported me on teaching and many other aspects. I appreciate you all. I am also grateful to my friends Dan Ao, Yuehai Xu, Qi He, Guangliang Zhao, Xiaolong Han, Yayuan Xiao, Hongwei Mei, Xiaoyue Cui, Wei Ouyang, Yuan Tian, Ren Zhao, and Hailong Guo. Because of you all, my stay in Detroit has been colorful and memorable.

Finally, I thank my parents, Shuncai Yuan and Yuyan Shi, for the inspiring education and endless love they provided me. My thanks also go to my parents-in-law, Hongwei Yang

(5)

many ways. My special thanks go to my wife Zhixin and my son Jacob, for sharing my hopes and dreams.

(6)

Dedication . . . ii

Acknowledgements . . . iii

Chapter 1: Introduction . . . 1

1.1 Background uand uMain uIssues . . . 1

1.2 Outline uof uthe uDissertation . . . 3

Chapter 2: Analyzing uConvergence uand uRates uof uConvergence uof uPSO . . . 6

2.1 Introuduction . . . 6 2.2 Forumulation . . . 10 2.3 Conuvergence . . . 16 2.4 Rate uof uConvergence . . . 24 2.5 Numerical uExamples . . . 34 2.6 FurtheruRemarks . . . 38

Chapter 3: Infinite uDimensional uRegime-Switching uSA Algorithms . . . 40

3.1 Introuduction . . . 40

3.2 Forumulation . . . 42

3.3 uAsymptoticuProperties . . . 44

3.4 Limit uof uModulatinguMarkov uChain . . . 52

3.5 Switching uDiffusion uLimit . . . 62

3.6 Truncation uand uTightness . . . 63

3.7 Representation uof uCovariance . . . 66

3.8 An uapplicationuon uadaptive udiscrete ustochastic optimization . . . 71

(7)

Chapter 4: Asynchronous uSA uAlgorithms ufor Networked uSystems . . . 75

4.1 Introuduction . . . 75

4.2 uConsensus uAlgorithmuBasics: uTraditional uSetting . . . 80

4.3 Forumulation . . . 84

4.4 AsymptoticuProperties: uε = O(µ) . . . 89

4.5 uInvariance uuTheorem . . . 99

4.6 SlowlyuuVarying uu(ε≪ µ) anduu uuRapidly uuVarying (µ≪ ε) uuMarkov Chains . . . . 110

4.7 uIllustrative uuExamples . . . 114

4.8 uFurtheruRemarks . . . 116

Chapter 5: uConcluding uRemarks . . . 117

References . . . 118

Abstract . . . 134

Autobiographical Statement . . . 136

(8)

CHAPTER 1 Introduction

1.1 Background

u

and

u

Main

u

Issues

This udissertation focusesuonustochastic uapproximationualgorithms uwithusome uapplications.uIn

u

manyureal-worlduproblems,utheudifficulties ulieinu utheuuncertainty uinuinformation. uForuexample,

u

inusystemuidentification utheuunknown usystemucoefficientsuare uestimateduonutheubasis uofu

input-output udata uof uthe ucontrol usystem; uin uadaptive ucontrol usystems uthe uadaptive ucontrol ugain u

shouldubeudefinedubaseduonuobservationudatauinusuchuauwayuthatutheugainuasymptoticallyutends u

to uthe uoptimal uone; uresearchers uat au upharmaceutical uform udesign ulaboratory uexperiments uto

u

extractuthe umaximumuinformationuabout utheuefficacy ofu uaunewudrug,uand umoreuexamplesumay

u

be uadded utouthis ulist.

Many uof uthese uproblems ucan ube utransformed uto ua uroot-seeking uproblem ufor uan uunknown u

function.uTousee uthis,ulet uusuconsideruauproblem uaboutuestimatinguunknown uparametersubased u

onuobservation udata ucontaining uinformation uabout uthe uparameters. uLetuyn udenote uthe u

obser-vation uatutime un ui.e., uthe uinformation uavailable uabout uthe uunknown uparameters uatutime un. uIt

u

can ube uassumed uthat uthe parameteru uunder uestimation udenoted uby ux

∗

u

is ua uroot uof usome u

un-knownufunctionuf (·)uwithuf (x

∗_{) = 0.}

u

Thisuisunotuarestriction,u ubecause,uforuexample,ukx−x

∗_k2

u

may userve uas usuch ua ufunction. uLet uxn ube uthe uestimate ufor ux

∗

u

at utime un. uThen uthe uavailable

information uyn+1 uat utime un + 1ucan uformally ube uwritten uas

yn+1u= f (xn)u+ εn+1,

where

(9)

Therefore, uby considering uyn+1 uas uanuobservation uon uf (·) uat uxn uwith uobservation uerror uεn+1,

u

the uproblem uhas ubeen ureduced utouseeking uthe uroot ux

∗

u

of uf (·) ubased uonu{yn}.

If uf (·) uand uits ugradient ucan ube uobserved uwithout uerror uat uany udesired uvalues, uthen u

nu-merical umethods usuch uas uNewton-Raphson umethod amongu uothers ucan ube uapplied uto usolving u

the uproblem. uHowever, uthisukind uofumethods ucannotubeuused uhere,ubecause uinuaddition utouthe u

obvious uproblem uconcerning uthe uexistence uand uavailability uof uthe ugradient, uthe uobservations

u

are ucorrupted uby uerrors uwhich umay ucontain notu uonlyuthe upurely urandomucomponent ubut ualso

u

the ustructural uerror ucaused uby inadequacyu uof uthe uselected uf (·).

Aiming uatusolvinguthe ustateduproblem, uRobbinsuand uMonro uproposeduthe followingu

recur-sive ualgorithm

xn+1u= xnu+ anuyn+1, an> 0,

touapproximateutheusought-forurootux

∗_,

u

whereuanuisutheustepsize.u uThisualgorithmuisunowucalled u

the uRobbins-Monro u(RM) ualgorithm. uFollowing uthis upioneer uwork uof ustochastic

approxima-tion,uthere haveubeenuaulargeuamountuofuapplicationsutoupracticaluproblemsuanduresearchuworks u

onutheoretical uissues.

At ubeginning, uthe uprobabilistic umethod uwas theu umain utool uin uconvergence uanalysis ufor

u

stochasticuapproximation algorithms,uanduratherurestrictiveuconditionsuwereuimposeduonuboth

u

f (·) uand u{εn}. Foru uexample, uit uis urequired thatu uthe ugrowth urate uof uf (x) uis unot ufaster uthan u

linear uas ukxk utends uto uinfinity uand u{εn} uis ua umartingale udifference usequence [1]. uThough uthe u

linear ugrowth urate ucondition uis urestrictive, uas shownu uby usimulation uit ucan uhardly ube usimply u

removed uwithout uviolating uconvergence ufor uRM ualgorithms. uTo uweaken uthe unoise uconditions

u

guaranteeinguconvergenceuofutheualgorithm,theu uODEu(ordinaryudifferentialuequation)umethod

u

(10)

u

by uthe uODE umethod mayu ube usatisfied uby ua ularge uclass uof u{εn} uincluding uboth urandom uand

u

structural uerrors, uthe uODE umethod uhas beenu uwidely uapplied ufor uconvergence uanalysis uin

u

different uareas. uIn usome uapplications upeople uprefer utouusing uconstant ustep usize

xn+1u=uxn+uηyn+1,

where ua uconstant uη > 0 ustands ufor uan. Theu utool uto udeal withu uthis usituation, uwhich uwas

u

developed uby uKushner [77, 90], ucalled uweak uconvergence umethod.

The udevelopmentuofustochasticuapproximationumethodshasu ubeenuclosely urelatedutouauwide

u

range uofuapplicationsuinustochastic uoptimization, identification, uadaptive control,uestimation,

detection, usignal processing, umanagement sciences, uand umany uother urelated ufields. uAs ucan u

be useen uthat umany ucontrol uand uoptimization utasks ucan ube urecast uinto ua uform uthat uresults u

in uthe uuse uof ustochastic uapproximation procedures. uIn uthis udissertation, uwe upresent uthree

u

applications uof ustochastic uapproximation umethods.

1.2 Outline

u

of

u

the

u

Dissertation

The uremainder uof uthe udissertation uis uarranged asu ufollows. uIn uChapter u2, uwe uuse ustochastic u

approximationutouanalyzeuParticleuSwarmOptimizationu u(PSO)ualgorithm.uWeuintroduceufour

u

coefficientsuandurewriteutheuPSOuprocedureasu uaustochasticuapproximationutypeuiterativeu

algo-rithm.uThen uwe uanalyzeuitsuconvergence uusing weaku uconvergence umethod. uItuis uproveduthat ua u

suitablyuscaled usequence uof uswarmsuconverge utoutheusolutionuof uanuordinaryudifferential u

equa-tion. uWe ualso uestablish ucertain ustability uresults.uMoreover, uconvergence urates uare uascertained u

by uusinguweak uconvergence umethod.uAucentered uanduscaled usequence uof uthe uestimationuerrors u

(11)

InuChapteru3,uweustudyuauclassuofustochasticuapproximationualgorithmsuwithuregimeu

switch-ing uwhich uis umodulated uby ua udiscrete Markovu uchain uhaving ucountable ustate uspaces uand u

two-time-scale structures. uInutheualgorithm,uthe uincrements uofuausequence uofuoccupation umeasures u

are uupdated uusing uconstant ustep usize. uIt uis udemonstrated uthat uleast usquares uestimates ufrom u

the utracking uerrors ucan ube udeveloped. uUnder uthe uassumption uthat uthe uadaptation urates uare u

of uthe usame uorder uof umagnitude uas thatu uof utimes-different parameter, uit uis uproven uthat uthe

u

continuous-time interpolationufromutheuiteratesuconverges weaklyu utousome usystemuofuordinary

u

differential uequationsu(ODEs)uwith uregime uswitching, andu uthat ua usuitably uscaledusequence uof

u

the utracking uerrors uconverges uto ua usystem ofu uswitching udiffusion. uThis uwork uis uan uextension u

of uthe uwork uin u[92].

In Chapter u4, uwe udeveloped uasynchronous ustochastic uapproximation u(SA) ualgorithms ufor u

networked usystems uwith umulti-agents andu uregime-switching topologies uto uachieve uconsensus

control. uThere uare useveral udistinct ufeatures ofu uthe ualgorithms ustudied uin uthe udissertation.

u

(1)uIn ucontrast utouthe umost existingu uconsensus ualgorithms,uthe uparticipating uagents ucompute

u

and ucommunicate uin uan uasynchronous ufashion uwithout uusing ua uglobal uclock. u(2) uThe uagents u

compute uand ucommunicate uat urandom utimes. uu(3)uThe uregime-switching uprocess uis umodeled u

as ua udiscrete-time uMarkov uchain uwith ua finiteu ustate uspace. u(4) uThe ufunctions uinvolved uare u

allowed uto uvary uwith urespect uto utime uhence unonstationarity ucan ube uhandled. u(5) uMulti-scale u

formulationuenriches uthe uapplicability uofuthe ualgorithms.uuInuthe usetup, uthe uswitching uprocess

u

contains ua urate uparameter uε > 0 uintheu utransition uprobability umatrix uthat ucharacterizes uhow

u

frequently uthe utopology uswitches. uThe ualgorithm usesu ua ustep-size µ uthat udefines uhow ufast u

the unetwork ustates uare uupdated.uDepending onu utheir urelative uvalues, uthree udistinct uscenarios u

emerge. uUnder usuitable uconditions, uit uis ushown uthat ua continuous-time uinterpolation uof uthe u

(12)

u

modulated uby ua ucontinuous-time uMarkov uchain, uoruto ua usystem uof udifferential uequations u(an

u

average uwith urespect uto ucertain umeasure). uIn uaddition, ua uscaled usequence uof utracking uerrors

u

converges uutouauswitching udiffusionuorau udiffusion.uSimulation results uare upresented utou

(13)

CHAPTER 2 Analyzing

u

Convergence and

u

Rates

u

of

u

Con-vergence

u

of

u

Particle

u

Swarm

Optimiza-tion

u

Algorithms

2.1 Intro

u

duction

Recently, uoptimization using uparticle uswarms uhave ureceived uconsiderable uattention uowing uto u

theuwideurangeuofuapplicationsufromunetworkedusystems,umulti-agentusystems,uanduautonomous u

systems. uParticle uswarming urefers uto ua computationalu umethod uthat uoptimizes ua uproblem uby

u

trying urecursively uto uimprove ua ucandidate solutionu uwith urespect uto ua ucertain uperformance

u

measure. uSwarm uintelligence ufrom ubio-cooperation withinu ugroups uof uindividuals ucan uoften u

provide uefficientusolutions uforucertainuoptimizationuproblems. uWhenubirds uareusearchingufood, u

they uexchange uand ushare uinformation. uEach memberu ubenefits ufrom ualluother umembers uowing u

to utheir udiscovery uand uexperience ubased onu uthe uinformation uacquired ulocally. uThen ueach

u

participating umember uadjusts uthe unext searchu udirection uin uaccordance uwith uthe individual’s

uubestupositionucurrentlyuandutheuinformationcommunicatedu utouthisuindividualubyuitsuneighbors. u

When ufood usources uscattered uunpredictably, advantages uof usuch ucollaboration uwas udecisive. u

Inspired uby uthis, uKennedy uand uEberhart uproposed ua uparticle uswarm uoptimization u(PSO) u

algorithmuinu1995 [26].uA uPSO uprocedure isu uaustochasticuoptimizationualgorithm uthatumimics u

the uforagingubehavioruofubirds. uTheusearch spaceu uof utheuoptimizationuproblem uisuanalogousuto

u

the uflightuspaceuofubirds.uUsing uanuabstractsetup,u ueachubirduis modeleduuasuauparticleu(aupoint

u

inutheuspaceuofuinterest).uFindingutheuoptimumisu utheucounterpartuofsearchingu uforufood.uAuPSO

u

can ube ucarried uout ueffectively uby usingu uan uiterative uscheme. The uPSO ualgorithm usimulates u

social ubehavior uamong uindividuals u(particles) u“flying” uthrough ua umultidimensional usearch u

(14)

u

Theuparticles uevaluateutheirupositionsuaccording tou ucertainufitnessufunctionsuatueachuiteration.

u

The uparticles ushare umemories uof utheir u“best” upositions ulocally, uand uuse uthe umemories uto

u

adjust utheiruownuvelocitiesuand upositions.uMotivatedbyu uthis uscenario,ua umodeluisuproposeduto u

represent uthe utraditional udynamicsuof uparticles.

Toupututhisuinau umathematicaluform,uletuF : R

D _{→ R}

u

beutheucostufunctionutoubeuminimized. u

Ifuwe ulet uM udenoteuthe usize uof uthe uswarm,uthe ucurrent uposition uof uparticle uiuis udenoted uby uX

i

(i = 1, 2, . . . , M),uand uitsucurrentvelocityu uisudenoted uby uv

i_.

u

Then,uthe uupdating principle ucan

u

be uexpressed uas

v_n+1i,d = vi,d_n + c1ri,d1,n[Pri,dn − Xni,d] + c2ri,d2,n[Pgi,dn − Xni,d],

X_n+1i,d = Xi,d n + v i,d n+1, (2.1) where d = 1,u. . . , D; r i,d 1 ∼ U(0, 1) and r i,d

2 ∼ U(0, 1) represent two random variables

uni-formly distributed in [0, 1]; c1 and c2 represent the acceleration coefficients; uPr

i

n represents

u

the ubest uposition ufound uby uparticle iu uup uto u“time” un, uand uPg

i

n urepresents uthe u“global” ubest

position ufound uby uparticle i’su uneighborhood uΠi, ui.e.,

uPri_nu= arg min

1≤uk≤unF (X i k),

Pgi_n = Prj∗

n, where j∗ =uarg min

j∈Πi

F (Pri_n). (2.2)

u

Inuartificial ulifeuandusocialupsychology,uv

i

n uinu(2.1) uisutheuvelocityofu uparticle uiuatutimeun, uwhich u

providesutheumomentumuforuparticlesutopassu uthroughutheusearchuspace.uThe c1r

i,d 1,n[Pr

i,d

n −Xni,d]

u

is uknown uas uthe u“cognitive” ucomponent, uwhich urepresents uthe upersonal uthinking uof each u

particle. uThe ucognitive ucomponent uof ua uparticle takesu uthe ubest uposition ufound uso ufar uby u

(15)

c2r2,ni,d[Pgi,dn − Xni,d] uis uknown uas uthe u“social” ucomponent, uwhich urepresents uthe ucollaborative

u

behavior uof uthe uparticles uto ufind uthe uglobal uoptimal usolution. uThe usocial ucomponent ualways

u

pulls uthe uparticles utowarduthe bestu uposition ufound uby uitsuneighbors.

In a nutshell, a PSO algorithm has the following advantages: (1) It has versatility and does not rely on the problem information; (2) it has a memory capacity to retain local and global optimal information; (3) it is easy to implement. Given the versatility and effectiveness of PSO, it is widely used to solve practical problems such as artificial neural networks [23, 43], chemical systems [17], power systems [5, 6], mechanical design [28], communications [71], robotics [32,63], economy [45,47], image processing [46], bio-informatics [53,64], medicine [58], and industrial engineering [40,60]. Note that swarms have also been used in many engineering applications, for example, in collective robotics where there are teams of robots working together by communicating over a communication network; see [38] for a stability analysis and many related references.

To uenable uand uto uenhance ufurther uapplications, umuch uwork uhas ualso ubeen udevoted uto

u

improvingutheuPSOualgorithms.uBecauseuthe uoriginalumodeluisusimilarutouaumobileumulti-agent u

system uandueachuparameterudescribesuauspecial ucharacteruofunaturaluswarmubehavior,uone ucan u

improve uthe uperformance uof uPSO uaccording tou uthe uphysical umeanings uof uthese uparameters

[39, 48, 52, 54, 72]. The ufirst usignificant uimprovement uwas proposedu uby uShi uand uEberhart uin u[59]. uThey usuggested uto uadd ua unew uparameter uw uas uan u“inertia uconstant”, uwhich uresults uin u

fast uconvergence. uThe umodified uequationuof u(2.1) uis

v_n+1i,d = wv_ni,d+ c1r1,ni,d[Pri,dn − Xni,d] + c2r2,ni,d[Pgi,dn − Xni,d],

X_n+1i,d = X_ni,d+ v_n+1i,d .

(16)

Another significant improvement was due to Clerc and Kennedy [16]. They introduced a constriction coefficient χ and then proposed to modify (2.1) as

v_n+1i,d = χ(v_ni,d+ c1r1,ni,d[Pri,dn − Xni,d] + c2r2,ni,d[Pgi,dn − Xni,d]),

X_n+1i,d = X_ni,d+ v_n+1i,d .

(2.4)

This uconstriction coefficient ucan ucontrol uthe u“explosion” ofu uthe uPSO uand uensure uthe u

conver-gence. uSome uresearchers ualso uconsidered uusing u“good” utopologies uof uparticle uconnection, uin u

particular uadaptive uones u(e.g., [14, 42, 44]).

Thereuhas ubeen umuchudevelopment uonmathematicalu uanalysis uforuthe uconvergenceuof uPSO

u

algorithms uas uwell. uAlthough umost researchersu uprefer uto uuse udiscrete usystem [13, 16, 62, 66],

there uare usome uworks uon ucontinuous-time umodels [18, 41]. uSome urecent uwork usuch uas [15,

19, 24, 37, 51, 65] provides uguidelines ufor uselecting PSOu uparameters uleading uto uconvergence, u

divergence, uoruoscillationuofutheuswarm’s particles.uThe aforementioneduworkualsougivesuriseuto u

several uPSO uvariants. uNowadays, uit uisuwidely urecognized uthat upurely udeterministic uapproach

u

is uinadequate uin ureflecting uthe uexploration uand uexploitation uaspects ubrought uby ustochastic

u

variables. uHowever, uas ucriticized uby uPedersen [50], uthe uanalysis uis uoften uoversimplified. uFor

u

example,utheuswarm uisuoftenuassumed utouhaveuonlyuone uparticle; stochasticuvariables (namely, u

r1,n, r2,n)uareunotuused; theupointsuofattraction,u ui.e.,utheuparticle’sbestu uknownupositionuPruand u

the uswarm’s ubest uknown uposition uPg, areu unormally uassumed utouremain uconstant uthroughout u

the uoptimization uprocess.

Inuthisuchapter,uweustudyuconvergenceofu uPSOubyuusingustochasticuapproximationumethods.

u

In uthe upast, usome uauthors uhave uconsidered uusing ustochastic uapproximation ucombined uwith u

(17)

u

knowledge,utheuonlyuchapteruusingustochasticuapproximationumethodsutouanalyzeutheudynamics

u

of uthe uPSO uso ufar uis uby uChen andu uLi u[15]. They udesigned au uspecial uPSO uprocedure uand

u

assumed uthat (i) uPr

i

n uand uPg

i

n uare ualways uwithin au ufinite udomain; (ii) uwith uP

∗

u

representing

u

the uglobal uoptimal upositions uin uthe usolution space, and kP

∗_{k < ∞. lim}

n→∞Prn → P∗ and

limn→∞Pgn → P∗. Using uassumption u(i), theyu uproved uthe uconvergence uof uthe ualgorithm u

in uthe usense uof uwith uprobability uone. Withu uadditional uassumption u(ii), uthey ushowed uthat uthe

u

swarmuwilluconvergeutouP

∗_.

u

Despiteutheuinterestingudevelopment,theiru uassumptionsu(i)uandu(ii)

u

appearutoubeuratherustrong.uMoreover,utheyaddedu usomeuspecificutermsuinutheuPSO uprocedure.

u

So utheir ualgorithm uis udifferent ufrom theu utraditional uPSOs (2.1)-(2.4). uIn uthis uchapter, uwe u

consider ua ugeneral uform ofu uPSO ualgorithms. uWe uintroduce ufour ucoefficients uε, uκ1, uκ2, uand uχ u

and urewriteuthe uPSOsuinua ustochasticuapproximation usetup.uThen uweuanalyze uitsuconvergence u

using uweak uconvergence umethod. uWe uprove thatu ua usuitably uinterpolated usequence uof uswarms

u

converge uto uthe usolution uof uan uordinary udifferential uequation. uMoreover, uconvergence urates

u

are uderived uby uusing ua usequence uof ucentered uand uscaled uestimation uerrors.

The urest uof utheuchapter uisuarranged uasufollows.uSection 2.2upresents uthe usetup uofuour u

algo-rithm.uSection u2.3 ustudiesuthe uconvergence uand uSection u2.4 uanalyzesuthe urate uof uconvergence. u

Sectionu2.5uproceedsuwithuseveralunumericalusimulationuexamplesutouillustrateutheuconvergence u

of uour ualgorithms. uFinally, uSection 2.6u uprovides ua ufew ufurther uremarks.

2.2 For

u

mulation

First,uweuwilluintroduceusomeunotationsuuseduinthisu uchapter.uWeuuse|·|u utoudenoteuauEuclidean u

norm. uAupoint uθ inu uauEuclidean uspace uisau ucolumn uvector;uthe uithucomponent uof uθ uis udenoted

u

by uθ

i_{; diag(θ)}

u

isuaudiagonal umatrix uwhose udiagonal uelementsuare uthe uelements uofuθ; uI udenotes

u

the uidentityumatrix uofuappropriate udimension;uz

′

u

(18)

u

O(y) udenotes uaufunction uof uy usatisfying sup

y|O(y)|/|y| < ∞, uand uo(y) udenotes ua ufunction uof u

y usatisfying |o(y)|/|y| → 0, as y → 0.uIn uparticular, uO(1) udenotesuthe uboundedness uand uo(1)

u

indicates uconvergence uto u0. uThroughout uthe uchapter, ufor uconvenience, uwe uuse uK uto udenote u

a ugeneric upositive uconstant uwith uthe uconvention uthat uthe uvalue uof uK umay ube udifferent ufor u

different uusage.

Inuthisuchapter,uwithoutulossuofugenerality,weu uassumeuthatueachuparticleuisuauone-dimensional

u

scalar. uNoteuthat ueachuparticle ucan beu ua umulti-dimensional vector, uwhich udoes unot uintroduce

u

essential udifficulties uin uthe uanalysis; uonly uthe notationu uis ua ubit umore ucomplex. uWe uintroduce

u

fouruparameters uε,uκ1, uκ2,uand uχ.Supposeu uthereuare ur uparticles, uthenuthe uPSO ualgorithm ucan u be uexpressed uas   vn+1 Xn+1   =   vn Xn   + ε     κ1I −χ(c1diag(r1,n) + c2diag(r2,n)) κ2I −χ(c1diag(r1,n) + c2diag(r2,n))     vn Xn   +χ   c1diag(r1,n) c2diag(r2,n) c1diag(r1,n) c2diag(r2,n)     Pr(θn, ηn) Pg(θn, ηn)     , (2.5)

where uc1 uand uc2 urepresent uthe uacceleration coefficients, uXn = [X

1 n, . . . , Xnr]′ ∈ Rr, vn = [v1 n, . . . , vrn]′ ∈ Rr, uθn = (Xn, vn) ′_, u

r1, ur2 uare r-dimensional urandom vectors uin uwhich ueach

u

component uis uuniformly udistributed uin (0, 1),u uand Pr(θ, η) uand Pg(θ, η) uare utwo unon-linear u

functionsudepending uonuθ = (X, v)

′

u

asuwelluasuonuau“noise” η,u uand uε > 0uisua usmalluparameter u

representing uthe ustepsize uof uthe uiterations.

Remark 2.1. Noteuthat uforuaulargeuvariety uof cases, utheustructures uand utheuformsuof uPr(θ, η)

and uPg(θ, η) are unot uknown. uThis uis usimilar uto uthe usituation uin ua ustochastic uoptimization

u

problem uinuwhich uthe uobjective ufunctionisu unotuknownuprecisely. uThus, ustochasticu

(19)

u

very uuseful uforutreating uoptimizationuproblems uinwhichu uthe uformuof uthe uobjective ufunctionuis

u

not uknown uprecisely, uor utoo ucomplex uto ucompute. uThe ubeauty uof usuch ustochastic uiteratively

u

defined uprocedures uisuthat uone uneed notu uknow uthe uprecise uformuof uthe functions.

If uthereuis uno unoiseuterm uηn,letu uε = 0.01, uχ = 72.9, κ1 =−27.1, uand κ2 = 72.9,uthen (2.5) uis

equivalent uto (2.3) uwhen w = 0.729 uor u(2.4) uwhen uχ = 0.729. uThus u(2.5) uis ua ugeneralization u

of (2.1)-(2.4). uSo ua ulot uof uapproaches ofu utuning uparameters u(e.g., [10, 49, 70]) ucould ualso ube

u

applied.

Remark 2.2. uIn uthe uproposed algorithm, uwe uuse ua uconstant ustepsize. uThe ustepsize uε > 0

u

is ua usmall uparameter. uAs uis wellu urecognized u(see [11, 90]), uconstant ustepsize ualgorithms uhave

u

the uabilityutoutrack uslight utime uvariationuand uisumore upreferable uinumany uapplications.uIn uthe u

convergence uand urateuof uconvergence uanalysis, uweulet εu →u0.uIn uthe uactualucomputation, uε uis u

just ua uconstant. uIt needu unot ugo uto u0.Thisu uis uthe usame uasuone ucarries uout uany ucomputational u

problem uin uwhich uthe uanalysis urequires uthe uiteration unumber ugoing uto uinfinity. uHowever, uin

u

the uactual ucomputing, uone uonlyuexecutes uthe uprocedure ufinitely umany usteps. u

In u(2.5), r1 uand ur2 uare uused uto ureflect uthe uexploration uof uparticles. uRearranging uterms uof u

(20)

u rewritten uas   vn+1 Xn+1   =   vn Xn   + ε      κ1I −0.5χ(c1+ c2)I κ2I −0.5χ(c1+ c2)I     vn Xn   +χ   0.5c1I 0.5c2I 0.5c1I 0.5c2I     Pr(θn, ηn) Pg(θn, ηn)   + χ   0 −(c1diag(r1,n) + c2diag(r2,n)− 0.5c1I− 0.5c2I) 0 _−(c1diag(r1,n) + c2diag(r2,n)− 0.5c1I− 0.5c2I)     vn Xn   + χ   c1diag(r1,n)− 0.5c1I c2diag(r2,n)− 0.5c2I c1diag(r1,n)− 0.5c1I c2diag(r2,n)− 0.5c2I     Pr(θn, ηn) Pg(θn, ηn)     . (2.6) Denote θn= [vn, Xn]′ ∈ R2r, M =   κ1I −0.5χ(c1+ c2)I κ2I −0.5χ(c1+ c2)I   , P (θn, ηn) = χ   0.5c1I 0.5c2I 0.5c1I 0.5c2I     Pr(θn, ηn) Pg(θn, ηn)  , (2.7)

anduW (θn, r1,n, r2,n, ηn) toubeuthe usumuoftheu ulastutwoutermsuintheu ucurly ubracesuofu(2.6).uThen u

(2.6) ucan ube uexpressed uasua ustochastic uapproximation algorithm

θn+1 = θn+ ε[Mθn+ P (θn, ηn) + W (θn, r1,n, r2,n, ηn)]. (2.8)

One uof uthe uchallenges uin uanalyzing uthe convergenceu uof uPSO uis uthat uthe uconcrete uforms

u

of Pr(θn, ηn) uand Pg(θn, ηn) uare uunknown. uHowever, uthis uwill unot uconcern uus. uAs umentioned u

(21)

u

situations. uWe ushall uuse uthe ufollowing uassumptions.

(A1) uTheuPr(·, η)uanduPg(·, η)uare ucontinuousuforueachuη. Forueachuboundeduθ, E|P (θ, ηn)|

2 _<

∞ uand E|W (θ, r1,n, r2,n, ηn)|

2 _<_∞.

u

There uexist ucontinuous ufunctions Pr(θ) uand Pg(θ) u such uthat 1 n n+m−1u X j=m EmPr(θ, ηj)→ Pr(θ) in probability, 1 n n+m−1u X j=m EmPg(θ, ηj)→ Pg(θ) in probability, (2.9)

where uEm udenotes uthe uconditional uexpectation onu uthe uσ-algebra Fm = {θ0, ri,j, i =

1, 2, ηj : j <um}. uMoreover, ufor eachu uθ uinua uboundeduset,

∞ X j=n |EnPr(θ, ηj)− Pr(θ)| < ∞, ∞ X j=n |EnPg(θ, ηj)− Pr(θ)| < ∞. (2.10) (A2) Define P (θ) = χ   0.5c1I −0.5c2I 0.5c1I −0.5c2I     Pr(θ) Pg(θ)  . The uordinary udifferentialuequation

dθ(t)

dt = Mθ(t) + P (θ(t)) (2.11)

has ua uunique usolutionufor ueach uinitialucondition θ(0) = (θ

1

0, . . . , θ02r)′.

(A3) uuFor ui = 1, 2, u{ri,n} uand u{ηn} areu umutually uindependent; u{ri,nu} uare ui.i.d. usequences uof u

(22)

Remark 2.3. Conditionu(A1)uisusatisfieduby ualargeu uclassuofufunctionsuand urandomuvariables.

u

Theucontinuity uisuassumedufor uconvenience.uInufact,uonlyuweakucontinuity uisuneededuso uweucan

u

in ufactudeal uwith uindicator utype uof ufunctionsuwhose uexpectations uare ucontinuous.

In ufact, u(2.9) umainly urequires uthat {Pr(θ, ηn)} isu ua usequence uthat satisfiesu ua ulaw uof ularge u

number utype uof ucondition, ualthough uit uis uweaker uthan uthe uusual uweak ulaw uof ularge unumbers. u

Conditionu(2.10)uisumodeleduby utheumixingtypeu ucondition.uForuinstance,uweumayuassumeuthat

u

for ueachubounded urandomuvector θu uand ueachuT < ∞,ueither

lim j→∞,∆→0E sup_{|Y |≤∆}|Pr(θ + Y, ηj)− Pr(θ, ηj)| = 0, or lim n→∞,∆→0 1 n m+n−1u X j=m E sup |Y |≤∆|Pr(θ + Y, η j)− Pr(θ, ηj)| = 0. u

Apparently, uthe usecond ualternative uis ueven uweaker. uWith ueither uof uthis uassumption, uall uof

u

the usubsequent udevelopment ufollows, ubut uthe argumentu uis umore ucomplex. uUnder uthe uabove u

condition, uone ucan utreat udiscontinuity uinvolving usign ufunction uor uindicator ufunction uamong u

others. uFor uthe ucorresponding stochasticu uapproximation algorithms, usee u[106, p. 100]; uthe

setupuin [90]uis evenumoreugeneral,uwhichuallowsuinuadditionutoutheudiscontinuity,utheufunctions

u

involved uto ube utime udependent. uInserting theu uconditional uexpectation uis umuch uweaker uthan

u

without. uFor uexample, uif u{ηn} uis au usequence uof ui.i.d. urandom uvariables uwith udistribution u

functionuFη,uthenuforueachuθ, Pr(θ) = EPr(θ, η1) =´ Pr(θ, ζ)Fη(dζ),usou(2.9)uisueasily uverified. u

Likewise, uif u{ηn} uis ua umartingale udifference usequence, uthe ucondition uis ualso usatisfied. uNext, uif u

{ηnu} uis ua umoving uaverage usequence udriven uby ua umartingale udifference usequence, u(2.9) uis ualso

u

satisfied.uInuaddition,uif u{ηn}uisau umixingusequence [97, p.166]uwithuthe umixingurateudecreasing

u

to u0, uthe ucondition uis ualso usatisfied. uNote thatu uin ua umixing sequence,u uthere ucan ube uinfinite

u

(23)

In uthe usimplest uadditive unoise case,u ui.e., Pr(θ, η) = Pr(θ) + η, uthen uthe ucondition uis

u

mainly uon uthe noise usequence {ηn}. uCondition (2.10) uis umodeled uafter uthe uso-called umixing

u

inequality; usee [106, p.82] uand ureferences utherein. uSuppose uthat {Pr(θ, ηn)} uis ua ustationary u

mixing usequence uwith umean Pr(θ) uand mixingu urate uφn usuch uthat u

P nuφ 1/2 n u<∞,uthen u(2.10) u is usatisfied.

With utheseuassumptions, uweuproceedutoanalyzeu utheuconvergenceuand uratesuofuconvergence

u

of uPSO ualgorithms uwith ugeneral uform u(2.8). uThe uscheme uis ua uconstant-step-size stochastic

u

approximation algorithm uwith ustep usize uε. uOur uinterest ulies uin uobtaining uconvergence uand

u

rates uof uconvergence uas uε → 0. uWe uemphasize thatu uin uthe uactual ucomputation, uit uis unot u

necessary uto umodify uit uas uthe ugeneralized PSOu uform u(2.8). uThis ugeneralized uPSO uform uis u

simply ua uconvenient uform uthat uallows uus tou uanalyze uthe ualgorithm uby uusing umethods uof u

stochastic approximation.

2.3 Con

u

vergence

This usection uis udevoted uto uobtaining asymptoticu uproperties uof ualgorithm u(2.8). uIn urelation

u

to uPSO uthe uword “convergence” utypically umeans uone ofu utwo uthings, ualthough uit uis uoften unot

u

clarifieduwhichudefinitionuisumeantuandusometimesutheyuareumistakenlyuthoughtutoubeuidentical.

u

(i)uConvergence umay ureferuto uthe uswarm’s ubest uknownuposition uuPg uapproaching u(converging u

to) uthe uoptimum uof uuthe uproblem, regardlessu uof uhow uthe uswarm ubehaves. (ii) uConvergence u

may ureferuto uaswarmu ucollapse uinuwhich ualluparticles uhave uconverged uto au upoint uinuthe usearch u

space,uuwhichumay uorumayunot ube theu uoptimum. uSince utheuconvergence umay rely uonustructure

u

ofuthe ucostufunctionifu uweuuseuthe ufirstdefinitionu uofuconvergence,uweuuse utheuseconduone uasuthe

u

(24)

u

as uε→u0 uthroughanu uappropriate continuous-time uinterpolation. uWe udefine

θε(t) = θn for t∈ [εn, εn + ε).

u

Then θε₍_{·) ∈}_u_{D([0, T ] : R}2r_{), which}

u

isuthe uspaceuofufunctionsthatu uareudefined uonu[0, T ]utaking u

values uinuR

2r_,

u

and uthat uareurightucontinuous uand uhaveuleft ulimitsuendowed withuthe uSkorohod u

topologyu[90, Chapter 7].

Theorem 2.4. Under u(A1)-(A3), θ

ε₍_·) u is utight uin uD([0, T ] : uR 2r_). u Moreover, uas uε → u0, u θε₍_·) u

converges uweakly uto uθ(·), uwhich uis ua usolution uof u(2.11).

Remark 2.5. AnuequivalentuwayuofustatingutheuODEulimitu(2.11)uisutouconsideruitsuassociated u

martingale uproblem u[106, pp. 15-16].uConsider uthe differentialu uoperatoruassociated uwith uθ(·)

Lf(θ) = (∇f(θ))′(Mθ + P (θ)), and define f Mf(t) = f (θ(t))− f(θ(0)) − ˆ t 0 Lf(θ(s))ds.

If fMf(·) is ua martingale ufor ueach uf (·) ∈ uC

1 0 u(C

1

u

function uwith ucompact usupport), uthen uθ(·)

u

isusaid uto usolve uaumartingale uproblem uwith operatoru uL. uThus, uanequivalentu uwayuto ustate uthe

u

theorem uisutouprove uthatuθ

ε₍_·)

u

converges uweaklyutouθ(·),uwhich isu uausolutionuofuthe umartingale u

problem uwith uoperator uL.

Proof of Theorem 2.4. To uprove uthe utightness uin uD([0, T ] : R

2r_),

u

we ufirst uneed uto ushow

lim

K→∞lim supε→0 P{supt≤T |θ

(25)

To uavoid uverifying u(2.12), weu udefine ua uprocess uθ ε,N₍_{·) satisfies} u θε,N_{(t) = θ}ε_{(t) up} u until uthe u

first uexit ufrom SN = {x ∈ R

2r _: _{|x| ≤ N} and} u satisfies u(2.12), uthe uθ ε,N₍_·) u is usaid uto ube uan u N-truncation uofuθ ε₍_·). u

Introduceuautruncationufunctionuq

N₍_·)

u

thatuisusmoothuanduthatusatisfies u

qN_{(θ) = 1}

u

for |θ| ≤ N, qN_{(θ) = 0 for} _{|θ| ≥ N + 1. Then}

u

the udiscrete usystem u(2.8) uisudefined u

as

θN

n+1 = θnN + ε[MθNn + P (θNn, ηn) + W (θnN, r1,n, r2,n, ηn)]qN(θNn), (2.13)

using uthe N-truncation. uMoreover, uthe uN-truncated uODE uand uthe uoperator uL

N

u

of uthe u

asso-ciated umartingaleuproblem ucan ube defined uas

dθN_(t) dt = [Mθ N_{(t) + P (θ}N_(t))]qN_(θ(t)), _(2.14) and LN_{f (θ) = (}_∇f(θ))′_{[Mθ + P (θ)]q}N_(θ), _(2.15) respectively.

Touprove utheutheorem,uweuproceed utouverifyutheufollowing uclaims:u(a)uforueachuN,u{θ

ε,N₍_·)}

u

is utight. uBy uvirtue uof uthe uProhorov utheorem [90, p.229],uwe ucan uextract ua uweakly uconvergent

u

subsequence. uFor unotational usimplicity, uwe stillu udenote uthe usubsequence uby u{θ

ε,N₍_·)} u with u limit udenoted uby uθ N₍_·). (b) θN₍_·) u

isua usolution uof uthe umartingale uproblem uwith uoperator uL

N_.

Using uthe uuniqueness uof theu ulimit,upassing uto uthe ulimit uasuN →u∞,uand uby uthe ucorollary

u

in [106, p.44], u{θ

ε₍_{·)} converges}

u

weakly utouθ(·).

Now uwe ustart uto proveu uclaims u(a)uand u(b).

(26)

u Noteuthat θε,N(t + s)− θε,N_{(t) = ε} (t+s)/ε−1_X k=t/ε (Mθ_kN + P (θ_kN, ηk) + W (θkN, r1,k, r2,k, ηk))qN(θkN).

In uthe uabove uand uhereafter, uwe uuse uthe uconventions uthat ut/ε uand u(t + s)/ε udenote uthe u

corre-spondinguinteger uparts ⌊t/ε⌋ and ⌊(t + s)/ε⌋, respectively. Forunotational usimplicity, uinuwhat

u

follows,uwe uwill unot uuse uthe flooru ufunction unotationuunless uituis unecessary.

Using uthe uCauchy-Schwarz inequality,

ε2E_tε (t+s)/ε_X−1 k=t/ε Mθ_kNqN(θN_k ) 2 ≤ εKs (t+s)/ε_X−1 k=t/ε E_tεθ_kNqN(θ_kN)2. (2.16) where uE ε

t udenotes uthe uexpectation uconditioneduon uthe uσ-algebra F

ε t. uLikewise, ε2E_tε (t+s)/ε_X−1 k=t/ε W (θN_k, r1,k, r2,k, ηk)qN(θNk) 2 ≤ Ks2, (2.17) and ε2E_tε (t+s)/ε−1_X k=t/ε P (θN_k, ηk)qN(θNk) 2 ≤ Ks2_. _(2.18) Souwe uhave E_tεθε,N(t + s)_{− θ}ε,N(t)2 _{≤ Kεs} (t+s)/ε_X−1 k=t/ε sup t/ε≤k≤(t+s)/ε−1 E_tε_|θ_kNqN(θ_kN)_|2+ Ks2. (2.19) As uauresult, uuthere uis uauς ε_{(δ) such} u that E_tε|θε,N_{(t + s)}_{− θ}ε,N_(t)_|2 _{≤ E}ε tςε(δ) for all 0≤ s ≤ δ,

(27)

and uthat ulimδ

→0lim supε→u0Eςε(δ) = 0. uThe utightness uof u{θ

ε,N₍_·)}

u

then ufollows ufrom [106,

p.47].

(b) uCharacterization uof uthe ulimit. uTo ucharacterize uthe ulimit uprocess, uwe uneed uto uwork u

with ua ucontinuously differentiable ufunction with ucompact usupport uf (·). uChoose umε uso uthat u

mε → ∞ uasuε→u0ubut uδε = εumε→u0. Using uthe urecursion u(2.13),

f (θε,N_{(t + s))}_{− f(θ}ε,N_{(t)) =} (t+s)/δ_Xε l=t/δε [f (θNlmε+mε)− f(θ N lmε)] = ε (t+s)/δε X l=t/δε (∇f(θN lmε)) ′ lmεX+mε−1 k=lmε [MθN_k + P (θ_kN)]qN(θN_k) +ε (t+s)/δε X l=t/δε (_∇f(θ_lmN_ε))′ lmεX+mε−1 k=lmε [P (θN_k, ηk)− P (θNk)]qN(θNk) +ε (t+s)/δ_Xε l=t/δε (_∇f(θ_lmN_ε))′ lmεX+mε−1 k=lmε W (θN_k, r1,k, r2,k, ηk)qN(θNk) +ε (t+s)/δ_Xε l=t/δε ( (_∇f(θ_lmN +_ε)_{− ∇f(θ}N_lm_ε))′ × lmεX+mε−1 k=lmε [MθN_k + P (θN_k, ηk) +W (θ_kN, r1,k, r2,k, ηk)]qN(θkN) ) , (2.20) where uθ N +

lmε is ua upointonu uthe uline usegment ujoining θ

N

lmε uand θ

N lmε+mε.

Our ufocus uhere uis uto ucharacterize uthe ulimit. uBy uthe uSkorohod urepresentation [90, p.230], u

withuauslightuabuseuofnotation,u uweumayuassumeuthatuθ

ε,N₍_{·) converges} u touθ N₍_·) u withu

probabil-ityuone uandutheuconvergence uisuuniformuonuanyuboundedutimeuinterval.uToushowuthatu{θ

ε,N₍_·)}

u

is ua usolution uof uthe umartingale uproblem uwith uoperator uL

N_,

u

(28)

u

f (_·)u_∈uC1

0,uthe uclassuofufunctionsuthat uareucontinuously differentiableuwith compactusupport,

f

M_fN(t) = f (θN(t))_{− f(θ}N(0))₋ ˆ t

0 L

N_{f (θ}N_(u))du

isua martingale. uTo uverifyuthe umartingale uproperty, weu uneeduonly ushowuthat uforuany ubounded u

and ucontinuous ufunction uh(·),uany upositiveuinteger uκ, anyu ut, s > u0, uand uti ≤ut uwith ui≤ κ,

Eh(θN_(t i) : i≤ κ)[fMfN(t + s)− fMfN(t)] = Eh(θN(ti) : i≤ κ) × [f(θN(t + s))− f(θN(t))− ˆ t+s t LN_{f (θ}N_(u))du] = 0. (2.21)

To uverify (2.21),uwe begin uwith uthe uprocess uindexed uby uε. For unotational usimplicity, udenote

eh = h(θN_(t

i) : i≤ κ), ehε = h(θε,N(ti) : i≤ κ). (2.22)

Thenuthe uweakuconvergenceuanduthe uSkorohodurepresentationutogetheruwithuthe uboundedness

u

and uthe ucontinuity uof uf (·)uand uh(·) uyield uthat uasuε→ 0,

Eehε[f (θε,N(t + s))_{− f(θ}ε,N(t))_{→ Eeh[f(θ}N(t + s))_{− f(θ}N(t))].

For uthe ulast uterm uof u(2.20), uasuε → 0,usince uf (·) ∈uC

1 0, Eehεε (t+s)/δ_Xε l=t/δε ( (_∇f(θN +_lm_ε)_{− ∇f(θ}_lmN_ε))′_× lmεX+mε−1 k=lmε [Mθ_kN + P (θN_k, ηk) +W (θN_k, r1,k, r2,k, ηk)]qN(θkN) ) = O(ε)_{→ 0.} (2.23)

(29)

Foruthe unext uto theu ulast uterm uof u(2.20), lim ε→0Eeh εh_ε (t+s)/δ_Xε l=t/δε (∇f(θN lmε)) ′_× lmεX+mε−1 k=lmε W (θ_kN, r1,k, r2,k, ηk)qN(θkN) i = lim ε→0Eeh εh (t+s)/δ_Xε l=t/δε (_∇f(θ_lmN_ε))′_× δε mε lmεX+mε−1 k=lmε ElmεW (θ N k , r1,k, r2,k, ηk)qN(θkN) i . (2.24)

Using u(A1) uand u(A3),

1 mε lmεX+mε−1 j=lmε ElmεW (θ N lmε, r1,j, r2,j, ηj)q N_(θN lmε)→ 0

in probability, we obtain that

Eehεhε (t+s)/δ_Xε l=t/δε (_∇f(θ_lmN_ε))′_× lmεX+mε−1 k=lmε W (θN_k, r1,k, r2,k, ηk)qN(θkN) i → 0. (2.25)

Using u(A1), uweuobtain

Eehεhε (t+s)/δ_Xε l=t/δε (_∇f(θN_lm_ε))′ _× lmεX+mε−1 k=lmε (P (θN_k, ηk)− P (θNk ))qN(θNk) i → 0. (2.26)

Next, uwe uconsider uthe ufirst uterm. uWe uhave

lim ε→0Eeh εh_ε (t+s)/δ_Xε l=t/δε (∇f(θN lmε)) ′ _× lmεX+mε−1 k=lmε (MθN_k + P (θN_k ))qN(θN_k)i = lim ε→0Eeh εh_ε (t+s)/δ_Xε l=t/δε (_∇f(θN_lm_ε))′_× lmεX+mε−1 k=lmε (Mθ_lmN_ε+ P (θN_lm_ε))qN(θ_lmN_ε)i. (2.27)

Thus,utougetutheudesired ulimit,uweuneeduonlyuexamineutheulastutwoulinesuabove.uLetuεulmε →uu

u

(30)

u result, lim ε→0Eeh εh_ε (t+s)/δε X l=t/δε (∇f(θN lmε)) ′_× lmεX+mε−1 k=lmε (Mθ_lmN_ε + P (θ_lmN_ε))qN(θN_lm_ε)i = Eehh ˆ t+s t

(_∇f(θN_(u)))′_(M(θN_{(u)) + P (θ(u)))q}N_(θ(u))dui_.

(2.28)

The udesired uresult uthenufollows.

To uproceed, uconsider u(2.11). uFor usimplicity, supposeu uthat uthere uis ua uunique ustationary

u point uθ ∗_. u Denote Pr(θ∗) = Pr∗ uand Pg(θ ∗_{) = Pg}∗_. u

By uthe uinversion uformula uof upartitioned u

matrix [61], usolving uMθ

∗ _{+ P (θ}∗_{) = 0 yields}

u

that uthe uequilibrium upoint uof uthe uODE usatisfies

θ∗ ₌   κ1I −0.5χ(c1+ c2)I κ2I −0.5χ(c1+ c2)I   −1 ×   −0.5χ(c1Pr∗+ c2Pg∗) −0.5χ(c1Pr∗+ c2Pg∗)   =   0 c1Pr∗+c2Pg∗ c1+c2   . (2.29)

Corollary 2.6. Suppose uthat uthe ustationary upoint uθ

∗

u

isuasymptotically ustable uin uthe usense uof u

Lyapunov uanduthat u{θn} uis utight. Thenu uforuany utε→ ∞ uas uε → 0, θ

ε₍_{· + t} ε) uconverges uweakly u to uθ ∗_. Proof. Define uθe ε₍_{·) = θ}ε₍_{· + t}

ε). Let uT > u0 uand uconsider uthe upair u{eθ

ε₍_{·), e}_θε₍_{· − T )}.}

u

Using

u

the usame uargument uas uin uthe uproof uof uTheorem 2.4, {eθ

ε₍_u_·),_u_θeε_u₍_·_u₋_u_T_u₎_}

u

is utight. uSelect ua u

convergent usubsequence uwith ulimit udenoted uby u(θ(·), θT(·)). uThen θ(0) = uθT(T ). uThe uvalue u

of uθT(0)uisnotu uknown, ubut uallusuch θuT(0), uoverualluT uand uconvergentusubsequences, ubelonguto

u

a utight uset. uThis utogether uwith uthe ustability andu uTheorem 2.4 uimplies uthat foru uany u∆u> u0,

u

there uis ua uT∆ usuch uthat for uTu>uT∆, uuP (θT(T )u∈uU∆u(θ

∗₎_u₎_u_> _u₁₋_u_∆, u where uU∆(θ ∗₎ u is ua

(31)

u

∆-neighborhood uof uθ

∗_.

u

The udesired uresult uthen ufollows.u

In uuCorollary 2.6, uwe uused uthe utightness uof uthe setu u{θn}, uwhich canu ube uproved uusing uthe

u

argument uof uLemma 2.9. uThe uresult uindicates uthat uastheu ustepsize uε →u0uand un→u∞ uwith u

nε → u∞, uθn uconverges uto uθ

∗

u

in uthe usense uin probability.u uNote uthat uif uθ

∗

u

turns uout uto uthe u

optimum uof uthe usearch uspace, uthen uθn uconverges uto uthe uoptimum.

Remark 2.7. Note uthat ufor unotational usimplicity, uwe haveu uassumed uthat uthere uis ua uunique

u

stationaryupoint uofu(2.11).uAs ufaruasuthe uconvergence uisuconcerned, uoneuneed unotuassume uthat

u

there uis uonly uone uθ

∗_.

u

See uhow umultimodal ucases ucan beu uhandled uin uthe urelated ustochastic

u

approximation uproblems uinu[90, Chpaters u5, u6,u8].Inu ufact, uforuthe umultimodalucases, uwe ucan

u

show uthat uθ

ε₍_·_u₊_u_t

ε) uconverges uin uan uappropriate usense tou uthe uset uof uthe ustationary upoints. u

Thus uCorollary 2.6 ucan ube umodified. uIn uthe urate uof uconvergence ustudy, u[25] usuggested uan u

approachuusing uconditional distribution,uwhich uisuaumodificationuof uausingleustationaryupoint. u

Ifumultipleustationary upointsuare uinvolved, uwe canu usimplyuuse uthe uapproachuof u[25]ucombined

u

withuour uweakuconvergenceuanalysis.uThe unotationuwill ubeuaubitumoreucomplex,ubut umainuidea

u

still urest uupon uthe basicu uanalysis umethod utoube upresented uin uthe unext section.u uIt useems utoube u

more uinstructive utoupresent uthe main uideas, uso uwe uchooseuthe ucurrent usetting.

2.4 Rate

u

of

u

Convergence

Once the convergence of a stochastic approximation algorithm is established, the next task is to ascertain the convergence rate. To study the convergence rate, we take a suitably scaled sequencezn = (θn− θ∗)/εα, for some α > 0. The idea is to choose α such that zn converges

(in distribution) to a nontrivial limit. The scaling factor α together with the asymptotic covariance of the scaled sequence gives us the rate of convergence. That is, the scaling tells

(32)

covariance is a mean of assessing “goodness” of the approximation. Here the factor α = 1/2 is used. To some extent, this is dictated by the well-known central limit theorem. For related work on convergence rate of various stochastic approximation algorithms, see [31, 67].

As umentioneduabove,uby uusing utheudefinition ofu uthe urateuof uconvergence,uweuare ueffectively u

dealinguwithuconvergenceuinutheudistributionalusense.uInulieuuofuexaminingutheudiscreteuiteration u

directly,uweuareuagaintakingu ucontinuous-time interpolations.uThree assumptionsuare uprovided

u

in uwhatufollows.

(A4) The following conditions hold:

(i) in a neighborhood of θ∗_{, Pr(}_{·, η) and Pg(·, η) are continuously differentiable for}

each η, and the second derivatives (w.r.t. θ) of W (_{·, r}1, r2, η) and P (·, η) exist and

are continuous.

(ii) denoting by Emthe conditional expectation on the σ-algebraFm ={θ0, r1j, r2j, ηj :

j < m_{}, and by ζ}θ the first partial derivative w.r.t. θ of ζ = W or P , resp., for

each positive integer m, as n_{→ ∞,}

1 n n+m_X−1 j=m EmPrθ(θ, ηj)→ Prθ(θ) in probability, 1 n n+m_X−1 j=m EmPgθ(θ, ηj)→ Pgθ(θ) in probability, ∞ X j=m |EmWθ(θ∗, r1,j, r2,j, ηj)| < ∞, ∞ X j=m |EmPθ(θ∗, ηj)− Pθ(θ∗)| < ∞. (2.30)

(iii) The matrix M + Pθ(θ∗) is stable in that all of its eigenvalues are on the left half

(33)

(iv) There is a twice continuously differentiable Lyapunov function V (_{·) : R}2r _{→ R}

such that

– V (θ)_{→ ∞ as |θ| → ∞, and V}θθ(·) is uniformly bounded.

– _|Vθ(θ)| ≤ K(1 + V1/2(θ)).

– |Mθ + P (θ)|2 _{≤ K(1 + V (θ)) for each θ.}

– V_θ′(θ)(Mθ + P (θ))≤ −λV (θ) for some λ > 0 and each θ 6= θ∗_.

(A5) ∞ X j=m |EfW′(θm, r1,m, r2,m, ηm)fW (θj, r1,j, r2,j, ηj)|< ∞, where fW (θ, r1, r2, η) = P (θ, η)− P (θ) + W (θ, r1, r2, η).

(A6) The sequence Bε_{(t) =}√_εPt/ε−1

j=0 fW (θ∗, r1,j, r2,j, ηj) converges weakly to B(·), a

(34)

Σ = EfW (θ∗, r1,0, r2,0, η0)fW′(θ∗, r1,0, r2,0, η0) + ∞ X k=1 EfW (θ∗, r1,0, r2,0, η0)fW′(θ∗, r1,k, r2,k, ηk) + ∞ X k=1 EfW (θ∗, r1,k, r2,k, ηk)fW′(θ∗, r1,0, r2,0, η0). (2.31)

Remark 2.8. Note uthat u(A4)(ii) uis uanother unoise ucondition. uThe umotivation uis usimilar uto

Remark 2.3. uThe umain udifference uof u(2.9) uand u(2.10) uand u(2.30) uis uthat u(2.30) uis uon uthe u

derivativeuofutheufunctionsevaluatedu uatutheupointuθ

∗_.

u

Inufact,uweuonlyuneedutheuderivativeuexists u

in ua uneighborhood uof uthis upoint uonly. uThis uis ubecause uthat uwe uare uanalyzing uthe uasymptotic

u

normality ulocally. uIn uview uof uthis conditionu uand ucondition uof u{ri,n}, u

1 n m+n_X−1 j=m EmWθ(θ∗, r1,j, r2,j, ηj)→ 0 in probability, 1 n m+n−1_X j=m EmPθ(θ∗, ηj)→ Pθ(θ∗) in probability. u

TheutraditionaluPSOualgorithmsudounotuallownon-additiveu unoise,uhereuweuare utreatinguaumore

u

generaluproblem. uNonadditive unoise ucan ube uallowed.

(A4)(iv) uassumes uthe uexistence uof uauLyapunov ufunction.uOnlyuthe uexistence uisuneeded; uits

u

precise uform uneed unot ube uknown. uFor simplicity,u uwe uhave uassumed uthe uconvergence uof uthe u

scaled usequence uto ua uBrownian umotion uin(A6);u usufficient uconditions uare uwell uknown; usee ufor u

example, u[90, Section 7.4]. uBefore uproceeding ufurther, uwe ufirst uobtain ua umoment ubound uof u

θn.u

Lemma 2.9. Assume uthat (A1)-(A6) uhold. Then uthere uis uan uNε usuch uthat ufor uall un > Nε,

(35)

Proof. To ubegin, uit ucan ube useen uthat uEnV (θn+1)− V (θn) ≤u− ελuV (θn) + εEnVθ′(θn)fW (θn, r1,n, r2,n, ηn) +O(ε2)(1 + V (θn)), (2.32) where uθ +

n uis uon uthe uline usegment ujoining uθn andu uθn+1. uThe bound inu u(2.32) ufollows ufrom uthe u

growth ucondition uinu(A4)(iv), uthe ulast uinequality followsu ufrom u(A1). uTo uproceed, uwe uuse uthe u

methods uof uperturbed uLyapunov ufunctions, whichu uentitles uto uintroduce usmall uperturbations

u

toua uLyapunov ufunction uin uordertou umake udesired ucancelation. uDefine ua uperturbation

V1ε(θ, n) = ε ∞ X j=n EnVθ′(θ)fW (θ, r1,j, r2,j, ηj). Noteuthat |V1ε(θ, n)| = Kuε(1 + V (θ)). (2.33) Moreover, uEnV1ε(θn+1, n + 1)− V1ε(θn, n) = O(ε2)(V (θn) + 1)− εuEnVθ′(θn)fW (θn, r1,n, r2,n, ηn). (2.34) Define Vε_{(θ, n) = V (θ) + V}ε

1(θ, n).Using u(2.32) and u(2.34), uwe uobtainu

EnVε(θn+1, n + 1)≤u(1− ελ)Vε(θn, n) + O(ε2)(1 + Vε(θn, n)). (2.35)

u

Choosing uNε uto ube au upositive uinteger usuch uthat (1 − (λε/2))

Nε _≤ _u_Kε.

u

Iterating uon uthe

u

(36)

u

can uthen uobtain

uEuVε(θn+1, n + 1) ≤u(1_{− ελ)EV}ε(θn, n) + O(ε2)(1 + Vε(θn, n)) ≤u(1₋ ελ 2 ) n_EVε_(θ 0, 0) + O(ε) = O(ε). (2.36)

whenun > Nε. uTheusecondlineu uof u(2.36)ufollowsufrom 1− λε + O(ε

2₎_≤_u₁₋λε

2 forusufficiently u

small uε. uNow uusing u(2.33) uagain, uwe ualso uhave EV (θn+1) = O(ε). uThus uthe udesired uestimate u

follows.

Define uzn= (θn− θ

∗_)/√_{ε. Then}

u

it uis readily uverified uthat

zn+1 = zn +ε(M + Pθ(θ∗))zn

+√ε(P (θ∗, ηn)− P (θ∗) + W (θ∗, r1,n, r2,n, ηn))

+ε(Pθ(θ∗, ηn)− Pθ(θ∗)

+Wθ(θ∗, r1,n, r2,n, ηn))zn+ o(|zn|2).

(2.37)

Corollary 2.10. Assumeuthat (A1)-(A6)uhold.uIf utheuLyapunovufunction uisulocallyuquadratic,

u

i.e.,

V (θ) = (θ− θ∗₎′_Q(θ_{− θ}∗_{) + o(}_{|θ − θ}∗_|2_).

Then uEV (zn) = O(1) ufor uall un > Nε.

Nowuweuareuinuaupositionutostudyu utheuasymptoticupropertiesuthroughuweakuconvergenceuof u

appropriately interpolatedusequence uofuzn.uDefineuz

ε_{(t) = z}

nufor t∈ [(n −Nε)ε, (n−Nε)ε + ε]. u

We can uintroduce ua utruncation usequence. Thatu uis, uin ulieu uof uz

ε₍_·),

u

we ulet uN ube ua ufixed ubut u

otherwise uarbitrary ularge upositive uinteger uand udefine uz

ε,N₍_·)

u

as uan uN-truncation uof uz

ε₍_·).

u

That uis, uituis uequal uto uz

ε₍_{·) up}

u

until uthe ufirst uexit uof uthe processu ufrom uthe usphere uSN ={|z| :

u|z| ≤ uN} uwith uradius uN. uAlso udefine ua utruncation ufunction uq

N_{(z) = 1}

u

if uzu∈ uSN, u= u0 uif u

(37)

u

operator uwith utruncation u(i.e., uthe ufunctions uused uin uthe uoperator uare uall umodified uby uuse uof

u

qN_(z)).

u

Thenuweuproceedutoestablishu utheuconvergence uofuz

ε,N₍_·)

u

asuausolutionuofuaumartingale

u

problem uwith uthe utruncated uoperator. uThen ufinally, lettingu uN →u∞, uwe uuse uthe uuniqueness u

of uthe umartingaleuproblem utouconclude uthe uproof. uThe uargumentuisusimilar utouthat uofuSection u

2.3. uForufurtherutechnicaludetails,uwe ureferutheureader utou[90, pp. 284-285].uSuch uautruncation uudevice uis ualso uwidely uused uin uthe analysis uof upartial udifferential uequations. uFor unotational u

simplicity, uwe uchoose utousimply uassumeuthe uboundedness urather uthan ugouwith uthe utruncation

u

route. uThus umerely ufor notationalu usimplicity, uwe usuppose uz

ε_(t)

u

is ubounded. uFor uthe urate uof

u

convergence, uour ufocus uis uon the uconvergence uof uthe usequence uz

ε₍_·).

u

We ushall ushow uthat uit u

converges utouaudiffusionuprocess uwhoseucovarianceumatrixutogether uwithutheuscalingufactor uwill u

provide uus uwith uthe udesired uconvergence urates. uAlthough umore ucomplex uthan uTheorem 2.4, u

weustill uuse uthe umartingaleuproblem usetup. Tou ukeep utheupresentation urelatively ubrief, uwe ushall

u

only uoutlineuthe umain usteps uneeded.

For uany ut, s > 0, zε_{(t + s)}_{− z}ε_{(t) = ε} (t+s)/ε−1_X j=t/ε (M + Pθ(θ∗))zj +√ε (t+s)/ε_X j=t/ε f W (θ∗, r1,j, r2,j, ηj) +ε (t+s)/ε_X j=t/ε f Wθ(θ∗, r1,j, r2,j, ηj)zj. (2.38)

Noteuthat ufor uany uδ > 0, ut, s > 0 uwith us < δ,

uE_tε √ ε (t+s)/ε_X−1 j=t/ε f W (θ∗, r1,j, r2,j, ηj) 2 ≤ Kε t + s ε − t εu = Ks_≤uKδ.

(38)

and usimilarly, uE_tε ε (t+s)/ε−1_X j=t/ε f Wθ(θ∗, r1,j, r2,j, ηj)zj 2 ≤uKs_≤uKδ.

Using uCorollary 2.10 uand usimilar uargument uas thatu uof uTheorem 2.4, uwe uhave uthe ufollowing u

result.

Lemma 2.11. Assume uconditions uof uCorollary 2.10, {z

ε₍_·)}

u

is utight uon uD([0, T ] : R

2r_).

Next uweucan uextractau uconvergent usubsequenceuof u{z

ε₍_·)}.

u

Withoutuloss uof ugenerality, ustill

u

denote uthe usubsequence uby uz

ε₍_·)

u

with ulimit uz(·). uFor anyu ut, s > 0, u(2.38) uholds. The uway

u

to uderive uthe ulimit uis usimilar uto uthat uof uTheorem 2.4 uusing umartingale uproblem uformulation

u

although uthe uanalysis uis umore uinvolved. uWe uproceed uto ushow uthat uthe ulimit uis uthe uunique u

solutionufor uthe umartingale uproblem uwithuoperator

Lf (z) = 1 2tr(Σfzz(z)) + (∇f(z)) ′_{(M + P (θ} ∗)), (2.39) u for uuf ∈uC 2 0, uC 2 u

functionsuwith ucompact usupport.

Using usimilarunotation uasthatu uof uSection u2.3, uredefine

eh = h(z(ti) : i≤ κ), ehε= h(zε(ti) : i≤ κ). (2.40) By u(A4)u(ii), uas uε→ 0 Eehεh_ε (t+s)/ε X j=t/ε f W (θ∗, r1,j, r2,j, ηj)zj i = Eehεhε (t+s)/δε X l=t/δε lmεX+mε−1 j=lmε f W (θ∗, r1,j, r2,j, ηj)zj i → 0.

(39)

Using uthe unotationuasuin uSection u2.3, u Eehεh (t+s)/δε X l=t/δε δε mε lmεX+mε−1 j=lmε f W (θ∗, r1,j, r2,j, ηj)[zj− zlmε] i → 0 as ε → 0. Moreover, uby u(A6) uwe uhave

u√ε (t+s)/ε X j=t/ε f W (θ∗, r1,j, r2,j, ηj)→u ˆ t+s t uduB(u)

as uε→u0. uForuthe firstu utermuof u(2.38), uwe uhave u

Eehεh_ε (t+s)/ε_X−1 j=t/ε (M + Pθ(θ∗))zj i → Eehh ˆ t+s t (M + Pθ(θ∗))z(u)du i

as uε→u0. uPuttinguthe uaforementioned argumentsu utogether, uwe uhave uthe ufollowing utheorem.

Theorem 2.12. uUnder uconditions u(A1)-(A7), u{z

ε₍_·)}

u

converges uto uz(·) such that uz(·) uis ua u

solution uof uthe ufollowing ustochasticudifferential uequation

dz = [M + Pθ(θ∗)]zdt + Σ1/2d bB(t), (2.41)

where uB(b ·) uisua ustandard uBrownian umotion.

Remark 2.13. To usee uwhat ukind uof ufunctions uand uthe uassociated uODE uand uSDE uwe uare u

working uwith, uwe ulook uatutwo usimpleexamples.u uIn uthe ufirst uexample uwe uuse uF (x) = x

2_,

u

take

u

(40)

u

with umean u[0, 0, 0, 0]

′

u

and uvariance uI. uThen u

M =          −0.271 0 −1.5 0 0 −0.271 0 −1.5 1 0 −1.5 0 0 1 0 −1.5          , (2.42)

and uthe ulimit uODE uis ugiven uby

˙θ(t) = Mθ(t). Thusuθ

∗ _{= [0, 0, 0, 0]}′

u

isutheuminimizeruofutheuswarm,uanduPθu(θu

∗_u_{) =}_u₀_∈_u_R4×u4

u

(au4×4umatrix u

with uall uentries ubeing u0). uIn uthe ustandard uoptimization ualgorithm, uone uprocessor uis urunning u

touapproximateuthe uoptimum.uHere, uwehaveu utwouparticles urunningusimultaneously. uNoteuthat u

θ uhas ufour ucomponents. uTwo uof uthem representu uthe uparticles’ upositions, uand uthe uother utwo

u

are uthe uparticles’ uspeeds. uAt uthe end, ubothuof uthe uparticles ureach uthe uminimum, urepresenting

u

something uthat umight ube ucalled u“overlapping.” Inu uaddition, ueventually uthe uspeeds uof uboth

u

particles ureach u0 u(or uat uresting upoint). uAs faru uas uthe urate ofu uconvergence uis uconcerned, uwe u

concludeuthatuθn−θ

∗

u

decadesuinutheuorderuofu

√

εu(inutheusenseuofuconvergenceuinudistribution). u

Not uonlyuis uthe umeanusquares uerror uof u(θn− θ

∗₎

uuofuthe uorderuε, ubut ualso uthe uinterpolationuof u

the uscaled usequence uzn uhas ualimitu urepresented uby uaustochastic udifferentialuequation

dz = Mzdt + d bB(t).

That uis, u(2.41) uis usatisfied uwith uPθ(θ

∗_{) = 0}

u

and uΣ = I. uAs uillustrated uin u[90], uthe uscaling u

factoru

√

εutogetheruwithustationaryucovarianceuofuthe SDEu ugivesuusutheurateuofuconvergence.uIn u

terms uofuthe uswarm, uloosely, uwe uhave uθnu− θ

∗_u_∼_u_N(0,_u_ε_u_Ξ

0) u[that uis,u(θn− θ

∗₎

u

(41)

asymptot-ically unormaluwith umean u0u∈ uR

4

u

and ucovariance umatrix uuεuΞ0], uwhere uΞ0 uis uthe uasymptotic

u

covariance umatrix uthat uis uthe usolutionuof uthe uLyapunov uequation uMuΞ0u+uΞ0uM

′_u₌_u_{− I.}

Likewise, uin uthe usecond uexample, uF (x) = sinux withu ux ∈ [0, 1]. uWe ustill utake u2 uparticles, u

same uparameters usetting, uand uassume u{ηk} uis uthe usame ui.i.d. usequence uasubefore. uThen M uis u as uinu(2.42), uand Pθ(θ∗) =          0.75 0 0.75 0 0 0.75 0 0.75 0.75 0 0.75 0 0 0.75 0 0.75                   1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0          .

It ufollows uthat u(2.41)uholds uwith

M + Pθ(θ∗) =          1.229 0 −1.5 0 0 1.229 0 _−1.5 2.5 0 _−1.5 0 0 2.5 0 _−1.5         

and uΣ = I. uSimilar utouthe uprevious uexample, uwe uhave uthat uθn− θ

∗

u

is uasymptotically unormal u

withmeanu u0uanducovarianceuεeΞ,uwhereΞueuisutheuasymptoticucovarianceusatisfyingutheuLyapunov

u

equation u(M + Pθ(θ

∗_))e_{Ξ + e}_{Ξ(M + P}

θ(θ∗))′ =−I. uu

2.5 Numerical

u

Examples

We uuse utwo usimulation uexamples uto illustrate uthe convergence uproperties. Using u(2.5), uwe

u

takeuε = 0.01,uχ = 1, κ1 =−0.271,uκ2 = 1, c1 = c2 = 1.5.uFor simplicity,uweutakeuthe uadditive

u

noise uPr(θn, ηn) = uPr(θn) + ηn uand uPg(θn, ηn) = uPg(θn) + ηn, where uηn uis ua usequence uof u

(42)

0 500 1000 1500 2000 −25 −20 −15 −10 −5 0 5 10 15 20 Trajectories Iterations Particles 0 500 1000 1500 2000 −20 0 20 40 60 80 100 120

Centered and Scaled Tracking Errors

Iterations

Errors

Figure 1: Particle uswarm uof uone-dimensional uX usingu uF1 udefined uin u(2.43).

uu 0 500 1000 1500 2000 −1 0 1 2 3 4 5 6 7 8 Graph of Pr Iterations Pr 0 500 1000 1500 2000 −6 −4 −2 0 2 4 6 Graph of Pg Iterations Pg

Figure 2: Graphsuof uPr uand uPg usingu uF1 udefined uin u(2.43).

u

number uof uswarms uto ubeu5.

Example 2.14. uuuuConsider uthe sphere ufunction:

F1(x) = D X i=1 x2i, (2.43) u

where uD uis uthe udimension uof uthe uvariable ux. uIts uglobal uoptimum uis u(0, 0, . . . , 0)

′_{. First,}

u

the

u

dimension uof uX uis setu uto ube u1. uFigures 1u uand u2 ushow uthe ustate utrajectories u(top) uand uthe u

centered uand uscaled uerrors ofu uthe ufirst ucomponent uθ

1

n u(bottom). uuThe ugraphs uof uPr u(top) uand u

Pg u(bottom) uare ualso uprovided.

Next, uwe uconsider uthe u2-dimension ucase uof uX. uFigures u3 uand u4 uillustrate uthe ustate u

trajec-tories u(top) uand uthe ucentered uand scaledu uerrors uof uthe ufirst ucomponent uθ

1

n u(bottom), uand uthe u

(43)

−30 −20 −10 0 10 −20 0 20 40 0 1000 2000 3000 4000 5000 6000 X(1) Trajectories X(2) Iterations 0 1000 2000 3000 4000 5000 −20 0 20 40 60 80 100

Iterations

Errors

Figure 3: Particle uswarm uof utwo-dimensional uX usingu uF1 udefined uinu(2.43).

uu 0 1000 2000 3000 4000 5000 −2 0 2 4 6 8 10 Graph of Pr Iterations Pr 0 1000 2000 3000 4000 5000 −10 −8 −6 −4 −2 0 2 4 6 8 10 Graph of Pg Iterations Pg

Example 2.15. uConsider uthe Rastrigin ufunction [57]

F2(x) = 10D + D X i=1 [x2_i − 10 cos(2πuxi)], (2.44) u

where uD uis uthe dimensionu uof uthe uvariable ux.

This ufunctionuhas umany ulocaluminima. Itsu uglobal uoptimumuisugivenuby u(0, 0, . . . , 0)

′_.

u

Same

u

asuExample 2.14, uweuset uthe udimension uof uX tou ubeu1 uand 2,u urespectively. uTheuparticle uswarm

u

trajectories, uthe ucentered uand uscaleduerrors uof uthe ufirst ucomponent,uand ugraphs uof uPruand uPg

u

are ugiven uin uFigures u5uto u8, respectively.

From uthese ufigures, uwe ucan uconclude uthat uall uthe uswarms uconverge uto ua upoint uin uthe u

searchinguspace. uThese uresults wereu uobtained uwithout uassuming uthatur1,ur2, uuPr, uand uuPg uare u

fixed. uOur unumerical uresults uconfirm uour utheoretical findings uinuSections u2.3 uand u2.4.

(44)

0 500 1000 1500 2000 −20 −15 −10 −5 0 5 10 15 20 Trajectories Iterations Particles 0 500 1000 1500 2000 −10 0 10 20 30 40 50 60 70

80 Centered and Scaled Tracking Errors

Iterations

Errors

Figure 5: Particle uswarm uof uone-dimensional uX usingu uF2 udefined uin u(2.44).

uu 0 500 1000 1500 2000 −14 −12 −10 −8 −6 −4 −2 0 2 Graph of Pr Iterations Pr 0 500 1000 1500 2000 −2 −1 0 1 2 3 4 Graph of Pg Iterations Pg

−20 −10 0 10 20 30 −40 −20 0 20 40 0 1000 2000 3000 4000 5000 6000 X(1) Trajectories X(2) Iterations 0 1000 2000 3000 4000 5000 −10 0 10 20 30 40 50 60 70

Iterations

Errors

Figure 7: Particle uswarm uof utwo-dimensional uX usingu uF2 udefined uinu(2.44).

uu 0 1000 2000 3000 4000 5000 −6 −5 −4 −3 −2 −1 0 1 Graph of Pr Iterations Pr 0 1000 2000 3000 4000 5000 −6 −4 −2 0 2 4 6 8 Graph of Pg Iterations Pg

(45)

u

all uparticles uhave uconverged uto ua upoint uin uthe usearch uspace. uSometimes uwe uobserve u(e.g., uin

u

the usecond uexample) uthat uthe uconvergence upoint isu unot uthe uglobal uor ueven ulocal uoptimum.

Thisuproblem,ureferredutouasuprematureinu uliteratures,uoccurs ucommonlyuinuevolutionaryu

algo-rithmsusuchuasuPSOs,ugeneticualgorithms,evolutionaryu ustrategies,uetc.uBaseduonuourunumerical u

experiments, uwe ufound uthat uif uthe ucost functionu uis uunimodal uand uwith ulow udimensions, uthe u

equilibrium ucoincides uwith uthe uproper parameter choice. Theu uproblem uof uunder uwhat u

con-ditions uthe uequilibrium ucoincides uwith uthe uoptimum udeserves uto ube ucarefully ustudied uin uthe

u

future.

2.6 Further

u

Remarks

In uthis uchapter, uwe uconsidered ua ugeneral formu uof uPSO ualgorithmsuusing uaustochastic u

approx-imation uscheme. uDifferent ufrom uthe uexisting uresults uin uthe uliterature, uwe uhave uused uweaker u

assumptions uand uobtained umore ugeneral uresults uwithout udepending uon uempirical uwork. uIn u

addition, uwe uobtained urates uof uconvergence ufor uthe uPSO ualgorithmsufor uthe ufirst utime.

Several uresearch directions umayube upursued uintheu ufuture. uWe ucan uuse ustochastic u

approx-imation umethods uto uanalyze uother uschemes ofu uPSO, ufor uexample, uthe uSPSO2011 uconsidered

u

in [69]. uWe ucan uset uup ua ustochastic approximationu usimilar uto u(2.8) uand uanalyze uits u

conver-gence uand uconvergence urate. uFinding uways uto usystematically uchoose uthe uparameter uvalues u

κ1, uκ2, uc1, uand uc2 uis ua upractically uchallenging uproblem. uOne uthought uis uto uconstruct ua ulevel u

two u(stochastic) optimization ualgorithm to uselect ubest parameteru uvalue uin ua usuitable usense. u

To uproceed uin uthis udirection urequires ucareful uthoughts uand uconsideration. uuIn uaddition, uwe

u

can uconsider uthat usome uparameters usuch uas uχ, κu 1, uetc. uare notu ufixed ubut uchange urandomly

u

during uiterations uor uchange uowing uto someu urandom uenvironment uchange u(for uexample, usee