Automatic Customization of Powered Knee Prostheses for Individual User using Adaptive Dynamic Programming.


ABSTRACT

WEN, YUE. Automatic Customization of Powered Knee Prostheses for Individual User using Adaptive Dynamic Programming. (Under the direction of Dr. He Huang.)

Advanced powered prostheses have demonstrated the potential to restore natural gait

and decrease the energy expenditure of amputees during different locomotion modes.

However, those benefits are dependent on the control parameters of the powered

prostheses, which are manually and experimentally fine-tuned for each individual user for each

locomotion mode. It is impractical to adjust a large number of parameters

simultaneously in clinics for each amputee patient. In addition, this manual configuration procedure

increases the financial cost of advanced prostheses, and the time cost for both clinicians

and amputee patients. This tuning procedure has greatly hindered the wide use of advanced

prostheses. In this thesis, we propose a new method using machine learning to

automatically configure the control parameters of powered knee prostheses to fit the behavior of

individual amputee users.

Because safety is paramount when human participants are involved, we first validated our novel

approach in simulation. We built an OpenSim model to simulate a human-prosthesis system, of which

one knee joint was regulated by a commonly used finite state machine impedance controller.

We implemented and compared two online, model-free adaptive dynamic programming

(ADP) controllers, direct heuristic dynamic programming (dHDP) and the neural fitted

Q with continuous action (NFQCA), to automatically tune the 12 impedance parameters

of the prosthetic knee joint. Both ADP controllers learned to tune the high-dimensional

impedance parameters and achieved balanced walking, but dHDP outperformed NFQCA

in this application over a test of 200 gait cycles.


for the human-prosthesis system, which is constantly subject to measurement noise,

environment change, and variations caused by the human body. We modified the dHDP algorithm

to automatically tune the 12 impedance parameters of an experimental knee prosthesis

while a subject walked with it on a treadmill. To achieve normative knee kinematics, the

ADP observed the knee kinematics, tuned the impedance parameters, and learned the

knowledge/policy through interaction with the human-prosthesis system using

reinforcement learning. We tested the ADP-tuner on two subjects (i.e., one amputee subject and one

able-bodied subject) with multiple testing sessions. The ADP-tuner learned to reach target

gait kinematics in an average of 300 gait cycles or 10 minutes of walking.

Beyond restoring normal knee kinematics, can we improve amputees’ gait

symmetry and efficiency by personalizing the prosthesis control and/or guided gait training?

Understanding the human-prosthesis physical interaction is vital to building an intelligent

prosthesis that truly benefits amputee users. With the ADP-tuner, we modulated the

control of a robotic knee prosthesis to investigate the relationship between the prosthetic

knee kinematics and human-prosthesis gait performance. Across 6 subjects (3 amputee

subjects and 3 able-bodied subjects), we found that the prosthetic impulses are related

to the prosthetic knee kinematics and that the inter-limb impulse is significantly and

consistently correlated with stance time symmetry. This provides the knowledge to

potentially improve gait symmetry and efficiency with both prosthesis control and guided gait

training for amputees by monitoring anterior-posterior impulse components.

Our long-term goal is to restore/enhance amputees’ daily lives with advanced robotic

prostheses. With this in mind, we will focus on 1) realizing adaptive robotic knee prostheses

that can adapt to users while they walk with them in the real world (i.e., with changing tasks),

2) identifying the optimization goals for the human-prosthesis system that matter to amputees,


Automatic Customization of Powered Knee Prostheses for Individual User using Adaptive Dynamic Programming

by Yue Wen

A dissertation submitted to the Graduate Faculty of North Carolina State University

in partial fulfillment of the requirements for the Degree of

Doctor of Philosophy

Biomedical Engineering

Raleigh, North Carolina

2019

APPROVED BY:

H. Troy Nagle Jason Franz

Jennie Si External Member

Min Chi

He Huang


DEDICATION

To my parents,


BIOGRAPHY

Yue Wen was born in a small village in Handan, Hebei in February 1986 and graduated from

high school in 2007. Yue received the Bachelor of Science in Automation from Wuhan

University of Technology, Wuhan, China, in 2011, and the Master of Science in Control

Engineering and Control Theory from Huazhong University of Science and Technology,

Wuhan, China, in 2014. During his two-year graduate study, he focused on research on hand

rehabilitation for people with stroke through functional electrical stimulation and an exo-hand.

Through this opportunity, he developed a strong interest in biomedical engineering and

rehabilitation.

After finishing his M.S. degree, Yue began pursuing his Ph.D. degree in the

Neuromuscular Rehabilitation Engineering Laboratory (NREL) under Dr. Helen Huang’s supervision

at the NCSU/UNC Joint Department of Biomedical Engineering at North Carolina State

University and the University of North Carolina at Chapel Hill. During his 5 years at NREL, he

focused on improving the personalization procedure of powered lower limb prostheses

through a reinforcement learning approach, funded by the National Science Foundation. His

research interests include adaptive control of robotic prostheses and assistive robotic

devices, machine learning, human motion analysis, and gait analysis. As of February 2019, he

has co-authored 4 journal articles and 5 peer-reviewed conference papers. In addition, Yue


ACKNOWLEDGEMENTS

This dissertation would never have been possible without the people who were willing to

support and help me during my graduate study.

I would like to express my deepest appreciation to my advisor, Dr. Helen Huang, for her

endless support over the past five years and for her rigorous guidance, enthusiastic

encouragement, and constructive critiques on my research projects and manuscript preparation,

and to Dr. Jennie Si for her time and effort in advising me in machine learning. Many

thanks also to my committee members, Dr. Min Chi, Dr. Jason Franz, and Dr. Troy Nagle, for their

valuable time and concrete suggestions on my research topics.

I would like to thank my labmates: Ming Liu, for his support on mechanical

design and control development, and insightful discussions about research and life; Stephanie

Huang, for her constructive suggestions on experimental design and data analysis; Andrea

Brandt, for valuable discussions on research questions and biomechanical

analysis, and for her help with participant recruitment, experiments, and manuscript editing; and all lab

members at NREL for valuable experience in group meetings, journal club, and consulting

support.

Special thanks to Derek Frankena, CPO, and Martha Soyars, PT, for their tremendous

help with subject training and experimental setup, and to all participants for their valuable time

and contribution to my research.

Last but not least, I would like to thank my cohort, Lizhi Pan and Minhan Li, for their

company and support during late-night study and holiday adventures. I would also like to thank

my friends (Mengying Wang, Zhongcan Xiao, etc.) for their encouragement, help, and


LIST OF TABLES

Table 2.1 Upper bound and lower bound of peak error and duration error

Table 2.2 Effects of ADP controller specifications on control performance

Table 2.3 Test results of dHDP after training

Table 2.4 Test results of NFQCA after training

Table 3.1 Post-tuning impedance parameters of three testing sessions for two subjects

Table 4.1 Subject information

Table 4.2 Prosthetic braking impulse estimation using angle features of prosthetic knee kinematics

Table 4.3 Prosthetic propulsive impulse estimation using angle features of prosthetic knee kinematics

Table 4.4 Correlation analysis between impulse measurements and stance time symmetry index

Table 4.5 Correlation analysis between impulse measurements and step length


LIST OF FIGURES

Figure 2.1 Human-prosthesis FS-IC control platform facilitated by OpenSim. One ADP controller is designed for each phase (m = 1, 2, 3, 4) of the FSM to provide respective impedance parameter values [refer to (2.1)] for the OpenSim during each gait cycle n.

Figure 2.2 Red line: target knee profile. Blue line: simulated knee trajectory. Vertical lines: transitions between gait phases, as determined by the FS-IC transition rules.

Figure 2.3 Left: initial knee angle profiles. Right: dHDP-tuned knee angle profiles (light blue) in run 2 of scenario 3.

Figure 3.1 Block diagram of ADP-tuner, an automatic robotic knee control parameter tuning scheme by dHDP with amputee in the loop. The learning control system operates at three different time scales: 1) real-time impedance controller provides outputs at 100 Hz to regulate the joint torque; 2) the finite-state machine runs at the gait frequency (denoted by time index g) with four phases per gait cycle; 3) the dHDP-generated control I_{m,n} is updated every few gaits (denoted by time index n) to update the impedance parameters. The respective variables in the figure are defined and discussed in Sections II and III. The ADP-tuner consists of four dHDP blocks (m = 1, 2, 3, 4) corresponding to four gait phases in the finite-state machine impedance controller.

Figure 3.2 Feature representation of near-normal knee kinematics during one gait cycle was used as learning control target, where D̄_m indicates the angle feature, and P̄_m indicates the duration feature. The phase index is indicated by m = 1, 2, 3, 4. The start at 0%, and the finish at 100% are the heel strike events, and 60% is approximate toe off time.

Figure 3.3 Comparison of knee kinematics by RMSE between pre-tuning and post-tuning across multiple testing sessions. The square markers represent the testing sessions from the TF subject, and circle markers represent the testing sessions from AB subject. Open marker represents the pre-tuning condition, and closed marker represents the post-tuning condition.

Figure 3.4 Peak error comparison between pre-tuning and post-tuning


Figure 3.5 Duration error comparison between pre-tuning and post-tuning conditions of the TF subject (a) and the AB subject (b) for each phase. Each bar represents the mean error of three testing sessions, and the error bars denote one standard deviation from the mean.

Figure 3.6 Peak error and duration error during the four phases for a representative tuning procedure. (a) stance flexion phase, (b) stance extension phase, (c) swing flexion phase, and (d) swing extension phase. The red dots were times when the −1 reinforcement signals were incurred, and the blue dots were times when the −0.8 reinforcement signals were incurred. The horizontal blue areas, which are centered at zero, indicate the tolerance ranges for each feature. The paired horizontal red lines indicate the allowed maximum and minimum exploration limits for each feature.

Figure 3.7 Impedance parameters of the four phases during a representative tuning procedure. (a) stance flexion phase, (b) stance extension phase, (c) swing flexion phase, and (d) swing extension phase. The meanings of the red and blue dots are the same as in Fig. 3.6.

Figure 3.8 Learned ADP auto-tuner on-line evaluation results. (a) Trends of angle error along tuning iterations. (b) Trends of duration error along tuning iterations. (c) Changing Ĵ values as learning proceeded. (d) RMSE along tuning iterations.

Figure 4.1 Feature representation of near-normal knee kinematics during one gait cycle was used as learning control target, where D̄_m indicates the angle feature, and P̄_m indicates the duration feature. The phase index is indicated by m = 1, 2, 3, 4. The start at 0%, and the finish at 100% are the heel strike events, and 60% is approximate toe off time.

Figure 4.2 Framework of the human-in-the-loop auto-tuning procedure. (A and


Figure 4.3 The features of the prosthetic knee kinematics covered a wide range during the exploration for each subject. The stance flexion angle and stance extension angle during stance phase varied from 0 to 20 degrees and -8 to 10 degrees respectively. The swing flexion angle and swing extension angle covered ranges of 45 to 60 degrees and -8 to 10 degrees respectively.

Figure 4.4 Stance time SI, step length SI, propulsion SI, and braking SI changed significantly during the control parameter exploration for all six subjects. The mean and standard deviation of the iterations with maximum and minimum values are compared. Paired t-tests reveal significant differences for all measurements (p < 10⁻³). a) and b) The stance time SI and step length SI covered a wide range relative to variances without control change (3% for stance time SI, 5% for step length SI). None of the subjects can generate perfect stance time symmetry, but some of them can generate perfect step length symmetry. c) and d) The impulse symmetry indexes also varied over a wide range compared to their variance without control change (17% for propulsion SI, 23% for braking SI). Most of the subjects can generate perfect braking symmetry (SI equal to 0), while only one of the subjects can generate perfect propulsion symmetry.

Figure 4.5 Correlation between inter-limb impulse and stance time symmetry index. (a)-(c) are from 3 amputee subjects, and (d)-(f) are from 3 able-bodied subjects. The stance time symmetry index is positive with longer stance time with intact limb, and vice versa. The inter-limb impulse was the net impulse of the transition impulse at transition from intact limb to prosthetic limb (summation of propulsive impulse of intact limb and braking impulse of prosthetic limb in one gait cycle). The impulse was normalized to body weight. More than 150 samples were recorded for each subject. Dots represent the averaged measurement pair from each iteration. Red line is the least-squares fit.

Figure 4.6 Block diagram of the prosthesis control parameters optimization for


CHAPTER 1:

Introduction

1.1 Background

1.1.1 Powered lower limb prostheses

Advanced powered lower limb prostheses have shown great potential to allow lower limb

amputees to perform different locomotion tasks (e.g., level-ground walking, stair ascent/descent,

ramp ascent/descent) more naturally [1–4] and efficiently [5–7]. Compared with passive

prostheses, these advanced powered prostheses share common features: 1) an active

actuator that provides net power at the prosthetic joint, and 2) a controller that emulates

the behavior of biological joints. The prosthesis control typically consists of a finite state

machine and a low-level controller to regulate the properties of the prosthetic joints, such

as impedance, within each gait phase. The prosthesis control is flexible enough to

accommodate different locomotion modes for individual users by modifying the control

parameters, which also means that the control parameters need to be customized for each

individual user for each locomotion mode.
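
As a minimal sketch of the finite-state impedance control described above, the snippet below assumes the widely used impedance law in which joint torque is a spring-damper function of knee angle and velocity. The three-values-per-phase layout (stiffness, damping, equilibrium angle), the four phase names, the units, and every numeric value are illustrative assumptions rather than settings from this work, and the state-transition rules are omitted.

```python
from dataclasses import dataclass


@dataclass
class Impedance:
    """Impedance parameters for one gait phase (assumed layout)."""
    stiffness: float    # N*m/rad (assumed unit)
    damping: float      # N*m*s/rad (assumed unit)
    equilibrium: float  # rad (assumed unit)


# Hypothetical parameter table: one impedance set per gait phase.
# The values are placeholders chosen only to make the example run.
PHASE_PARAMS = {
    "stance_flexion": Impedance(2.0, 0.10, 0.20),
    "stance_extension": Impedance(2.5, 0.08, 0.05),
    "swing_flexion": Impedance(0.5, 0.03, 1.00),
    "swing_extension": Impedance(0.3, 0.04, 0.05),
}


def knee_torque(phase: str, angle: float, velocity: float) -> float:
    """Spring-damper torque for the active phase (assumed impedance law)."""
    p = PHASE_PARAMS[phase]
    return p.stiffness * (p.equilibrium - angle) - p.damping * velocity


def parameter_count() -> int:
    """Number of tunable values across all phases (three per phase)."""
    return 3 * len(PHASE_PARAMS)
```

With four phases and three values per phase, this layout has 12 tunable parameters, which is why customization for each user and each locomotion mode quickly becomes a high-dimensional tuning problem.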

Over the last decades, many studies have sought to improve the mechanical design of

powered knee prostheses. Sup et al. designed a powered ankle-knee prosthesis with a

motor-driven ball screw assembly acting through a slider-crank linkage, and applied a finite-state-based

impedance controller to restore walking and standing function [8]. Martinez-Villalpando

et al. designed a powered knee prosthesis with two series-elastic actuators in an

agonist-antagonist arrangement with a finite-state controller to restore level-ground walking [1].


controlled by a finite-state controller with user intention detection [9]. Rouse et al. applied a

clutchable series-elastic actuator mechanical structure to improve the efficiency of a powered

knee prosthesis, which is also controlled by a finite-state impedance controller [10]. Lenzi et

al. designed a lightweight robotic knee prosthesis with a hybrid actuation system to provide

stair ambulation capability [11].

In addition, different control methods have been applied to extend the functions and

improve the performance of powered knee prostheses. Many studies have applied

finite state machine controllers to restore different locomotion tasks and validate the

performance of powered knee prostheses [1, 8–10, 12]. Typically, the finite state machine

controller includes 12 to 15 individual control parameters to mimic the biological function

of the knee joint [4, 8, 12–15]. More recently, many variants of the finite state controller have

been explored to further improve the performance and/or decrease the number of control

parameters. Hoover et al. applied direct myoelectric (EMG) control to a powered knee

prosthesis for the stair ascent task. Combining proportional myoelectric torque with a

state-determined knee impedance yielded robust and repeatable performance in

transfemoral amputee testing [16]. To improve the robustness of the controller, Gregg et al. introduced

virtual constraint control for powered knee prostheses, which used the effective shape

as a virtual constraint during stance phase and PD control during swing phase [17]. Pfeifer

et al. proposed defining joint stiffness and moment requirements as functions of joint

angle, and incorporated the idea into a finite-state controller to improve the prosthetic

knee moment performance [18]. Using intact-leg quasi-stiffness profiles of the knee and

ankle joints at different walking speeds, Lenzi et al. designed a speed-adaptive control of

powered ankle-knee prostheses with walking-speed estimation and a finite-state machine

controller [19].


potential to restore natural gait under different locomotion tasks and improve the gait

efficiency for amputees. Many studies have shown that powered knee prostheses allow

amputees to generate gait kinematics that are closer to healthy gait, compared

to walking with passive-elastic prostheses, for overground walking [1, 2], upslope walking

[3], and stair ascent and stair descent [4]. Ledoux et al. investigated the metabolic effort of

transfemoral amputees for stair ascent with a powered prosthesis and a passive prosthesis,

and the results demonstrated that the oxygen consumption and stair ascent time decreased

when walking with the powered prosthesis [5]. Wolf et al. demonstrated that knee power of

the intact limb was significantly lower during stair ascent when walking with a powered knee

prosthesis than when walking with a passive knee prosthesis [6]. Williams et al. studied the

gait performance of transfemoral amputees when walking with different knee prostheses

(i.e., an OttoBock C-Leg and a powered knee prosthesis), and demonstrated that, when

walking with the powered knee prosthesis, the hip range of motion symmetry between legs

improved, prosthetic knee power increased, and pelvis-torso twist coupling decreased [7].

To achieve the aforementioned benefits, the prosthesis control needs to be tuned for

each individual user so as to accommodate human variance in terms of weight, physical

ability, etc. [1, 2, 4, 9, 11, 16, 17, 20]. In clinical use, when users’ physical condition

changes or their behavior changes after adapting to the device, their gait performance

will deteriorate due to the mismatched control parameters. Therefore, they need to

revisit the clinic to re-tune the control parameters to maintain the performance of the

prostheses. Currently, the tuning process, also called the customization process, of the powered


1.1.2 Customization of powered knee prostheses

Traditionally, clinicians/experts manually and heuristically tune prosthesis control

parameters for an individual amputee in the clinic/laboratory by adjusting 1 or 2 parameters

simultaneously while observing the amputee’s gait. The clinician/expert repeats these steps

until the prosthetic knee reaches a desired, subjective performance level. To alleviate the

customization effort for clinicians/experts, many studies have sought to provide insight

into the stiffness of the knee joint or to decrease the number of control parameters that need to

be tuned.

Some researchers have tried to measure the biological knee stiffness so as to guide the

prosthesis control. Pfeifer et al. combined electromyography, kinetic, and kinematic

measurements to estimate muscle force and joint stiffness of the intact limb under isometric conditions

[21]. Shamaei et al. characterized the quasi-stiffness, which is the slope of the line that fits the

moment-angle graph, of the knee joint for the flexion and extension periods during stance

[22]. Further, Misgeld et al. developed an observer-based knee stiffness estimation approach

that can estimate the stiffness of the knee joint in real time during movement [23].

These approaches can potentially provide a good initial guess of the prosthetic knee joint

stiffness, but they have not been validated in actual prosthesis applications.
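
To make the quasi-stiffness definition above concrete, the sketch below fits a least-squares line to paired joint-angle and joint-moment samples and returns its slope. The function name, the units, and the sample data are hypothetical; this illustrates the definition only and is not the procedure used in the cited studies.

```python
def quasi_stiffness(angles_rad, moments_nm):
    """Slope of the least-squares line through (angle, moment) samples.

    Units are assumed to be radians and N*m, so the result is in N*m/rad.
    """
    if len(angles_rad) != len(moments_nm) or len(angles_rad) < 2:
        raise ValueError("need two equally long sequences with >= 2 samples")
    n = len(angles_rad)
    mean_a = sum(angles_rad) / n
    mean_m = sum(moments_nm) / n
    # Sum of squared angle deviations and of angle-moment co-deviations.
    sxx = sum((a - mean_a) ** 2 for a in angles_rad)
    sxy = sum((a - mean_a) * (m - mean_m)
              for a, m in zip(angles_rad, moments_nm))
    if sxx == 0.0:
        raise ValueError("angle samples must not all be identical")
    return sxy / sxx


# Hypothetical samples from one stance-flexion period.
k = quasi_stiffness([0.0, 0.1, 0.2, 0.3], [5.0, 15.0, 25.0, 35.0])
```

A value obtained this way could serve as an initial guess for a prosthetic stiffness parameter, subject to the validation caveat noted above.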

Alternatively, studies have sought to decrease the number of parameters requiring

tuning with different control methods. Simon et al. developed several strategies to decrease

the number of parameters requiring tuning, including associating the joint impedance

with the previous state, joint angle, and prosthesis load [24]. By introducing virtual constraint

control, Gregg et al. decreased the number of control parameters during stance phase from

6 impedance parameters to 4 PD gains [17].


accelerate the customization procedure of the powered knee prosthesis, but the remaining

control parameters still need to be manually tuned. Considering the different locomotion

modes and the high-dimensional control parameters for each locomotion mode, the manual

tuning/customization procedure is still time-consuming and labor-intensive,

which has greatly hindered the use of powered knee prostheses and limited the benefits

of those advanced devices for lower limb amputees. Hence, how to customize this large

number of parameters for an individual user quickly and economically is the critical barrier

for the field of powered prosthesis control. The customization procedure is essentially an

optimization process to find the control parameters that can fulfill certain performance

criteria, such as normative knee kinematics, gait symmetry, etc. Therefore, we propose to optimize the high-dimensional control parameters of powered knee prostheses automatically through a machine learning approach to alleviate or replace the manual tuning procedure.

1.2 Optimization of powered knee prostheses

1.2.1 Human-in-the-loop optimization

Many optimization approaches have been used to customize the control of wearable

robots (i.e., prostheses and exoskeletons) with a human in the loop. These closed-loop

approaches apply different control parameters to the wearable robotic system, monitor

physiological measurements from the robot-wearer system in real time, and further

change the control parameters based on those measurements. The real-time

measurements can accurately reflect the performance of the user with the applied control

parameters. Using different optimization algorithms, the control parameters could be


Currently, human-in-the-loop optimization studies focus on optimizing exoskeleton

control for able-bodied subjects to minimize metabolic energy cost. Koller et al.

developed a gradient descent method to optimize an onset time parameter, as a percentage of the

gait cycle, of an ankle exoskeleton to enhance able-bodied persons’ gait efficiency [25].

Zhang et al. developed an evolution strategy to optimize four control parameters of an ankle

exoskeleton, which represent the pattern of the ankle joint torque during stance phase, to

minimize the energy cost during walking for able-bodied people [26]. Ding et al. applied

Bayesian optimization to identify two control parameters of hip extension assistance, which

determined the shape of the force profile of a soft hip exosuit, to minimize the metabolic

rate [27]. However, it is hard to migrate these optimization methods to customize the

powered knee prosthesis. One of the reasons is that those methods are difficult to scale up to a

high-dimensional (≥ 5) parameter space. Minimizing the metabolic cost of able-bodied walking is feasible because it is widely accepted that a primary goal for able-bodied persons in

walking is energy minimization [28], and often the relationship between human metabolic

cost and walking condition (e.g., walking speed) follows a U-shaped function. Unfortunately,

it is unknown whether this theory is applicable to the human-prosthesis system, in which a human

and a robotic limb are connected in series.
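
The closed-loop idea of applying a parameter, measuring a cost, and updating the parameter can be sketched for a single parameter and a U-shaped cost as follows. This is only an illustrative finite-difference gradient descent under stated assumptions: the cost function, its minimum at 0.42, the step sizes, and the parameter bounds are all hypothetical, the cost is noise-free, and the code does not reproduce the method of any cited study. In a real experiment each call to `measure_cost` would correspond to a walking trial with noisy measurements.

```python
def finite_difference_descent(measure_cost, x0, step=0.01, delta=0.02,
                              iterations=50, bounds=(0.0, 1.0)):
    """Tune one control parameter by gradient descent on a measured cost.

    The gradient is estimated from two cost measurements taken at slightly
    perturbed parameter values (central difference).
    """
    x = x0
    for _ in range(iterations):
        grad = (measure_cost(x + delta) - measure_cost(x - delta)) / (2.0 * delta)
        x = x - step * grad
        x = min(max(x, bounds[0]), bounds[1])  # keep the parameter in range
    return x


def u_shaped_cost(onset):
    """Hypothetical noise-free cost with its minimum at onset = 0.42."""
    return 3.0 + 20.0 * (onset - 0.42) ** 2


best_onset = finite_difference_descent(u_shaped_cost, x0=0.2)
```

Each iteration needs two cost measurements per parameter, so the number of walking trials grows with the parameter dimension, which is one way to see why such schemes are hard to scale to the 12 or more parameters of a knee prosthesis.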

For human-prosthesis optimization, Huang et al. designed a cyber expert system

using fuzzy logic to encode a human expert’s (e.g., a prosthetist’s) tuning decisions [29], and

applied the cyber expert system to tune the high-dimensional control parameters of an

experimental knee prosthesis to approach normative knee kinematics. However, this requires

prior knowledge from clinicians/experts of how to tune the parameters, and that

knowledge is subjective and might be biased by their experience.

The optimization of powered knee prostheses is subject to many challenges: 1) it is


prosthesis system has continuous state spaces and relatively high-dimensional control

parameters (≥ 12), 3) the human-prosthesis system has great uncertainty (e.g., unexpected disturbances) and high measurement noise. Reinforcement learning (RL), also called

adaptive/approximate dynamic programming (ADP), lends itself as an alternative solution

considering its wide application to optimal control problems of nonlinear dynamic

systems [30–32]. Since we need to optimize the high-dimensional control parameters of powered

knee prostheses without any available human-prosthesis model, action-dependent

heuristic dynamic programming (ADHDP) stands out among the ADP designs due to its

model-free property (i.e., no need for a dynamic model) and promising scalability [31, 33].

Within the branch of ADHDP, neural fitted Q with continuous actions (NFQCA) [34] and

direct heuristic dynamic programming (direct HDP or dHDP) [35] have demonstrated

learning capability in complex and realistic control problems in an online manner without

a system dynamic model. Furthermore, dHDP and NFQCA (a batch variant of dHDP [36])

designs have demonstrated their success in many applications of RL control for

continuous state and control problems [36–43]. We propose to introduce the reinforcement learning/adaptive dynamic programming algorithms (dHDP and NFQCA) to automatically optimize the high-dimensional control parameters of powered knee prostheses while amputees walk with the devices.

1.2.2 Optimization goal of powered knee prostheses

Clinically, people with unilateral lower limb amputation have been reported to have

deteriorated temporal-spatial gait symmetry, loading symmetry, and gait efficiency [44–49]. Lower

limb amputees tend to rely more on the intact side than the prosthetic side, resulting in longer


intact limb caused reduced prosthetic ground reaction force and impulse [46, 47], and

increased the intact limb loading [48]. This asymmetric gait and overuse of the intact limb

could potentially cause many secondary issues, such as back pain, intact knee

osteoarthritis, etc. [50]. In addition, people with lower limb amputation present higher

energy expenditure for locomotion than able-bodied people [49].

The ‘super’ powered prostheses would be able to restore gait symmetry and reduce

the energetic cost to a normal level. Currently, however, the optimization goals of powered knee

prostheses remain an open question for the field. In the clinic/laboratory, experts manually and

iteratively tune the control parameters based on observation of the overall gait performance,

inspection of the prosthetic joint measurements, and verbal feedback from the subjects.

From a control point of view, high-level gait performance measures (e.g., gait symmetry, center of mass

movement, metabolic cost) might be too complex and noisy to directly guide the

high-dimensional control parameter tuning procedure, because they usually rely on both the

powered knee prosthesis and the amputee user’s intact joints. Currently, almost all

studies of powered knee prostheses have used normative knee kinematics as an indicator of

good performance.

For human-in-the-loop optimization, the optimization goal should be physiological

measurements that can be influenced by the control parameters. The prosthetic knee

kinematics are the most direct and straightforward measurements of performance of the

human-prosthesis system and are determined by the human-prosthesis interaction, i.e., the

behavior of the human user and the control parameters of the prosthesis [8]. In addition,

the knee kinematics are more general across subjects’ physical conditions compared to

prosthetic joint torque [51], which is quite different from that of the biological knee joint because

the mechanical properties of powered prostheses differ from those of the intact limb [52].


four distinct features: maximum stance flexion angle, minimum stance extension angle,

maximum swing flexion angle, and minimum swing extension angle. These features

have a direct connection with the function of the knee joint:

1. During stance phase, knee flexion at early stance is associated with energy

absorption, and people increase knee flexion and extensor moments to absorb more

energy when walking speed increases [53]. At late stance, the knee joint re-extends

to generate energy.

2. During swing phase, the knee joint flexes for foot clearance, and then extends to get

ready for the next heel strike.
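
The four features listed above can be illustrated with a small extraction routine that takes one gait cycle of knee flexion angles, sampled uniformly from heel strike to the next heel strike, and returns each feature's angle and its timing as a percentage of the gait cycle. The peak-ordering heuristic and the 30% early-stance search window are assumptions made for this sketch; they are not the feature definitions used later in this work.

```python
def knee_features(angles, early_stance_fraction=0.3):
    """Return {feature: (angle, percent_of_gait_cycle)} for one gait cycle.

    Assumed heuristic: the stance flexion peak is the largest angle in the
    early-stance window, the swing flexion peak is the largest angle after
    that window, and each extension minimum follows its flexion peak.
    """
    n = len(angles)
    if n < 4:
        raise ValueError("need at least four samples")

    def argmax(lo, hi):
        return max(range(lo, hi), key=lambda i: angles[i])

    def argmin(lo, hi):
        return min(range(lo, hi), key=lambda i: angles[i])

    early_end = min(max(1, int(early_stance_fraction * n)), n - 1)
    i_stf = argmax(0, early_end)      # maximum stance flexion
    i_swf = argmax(early_end, n)      # maximum swing flexion
    i_ste = argmin(i_stf, i_swf + 1)  # minimum between the two peaks
    i_swe = argmin(i_swf, n)          # minimum after the swing peak

    def feature(i):
        return (angles[i], 100.0 * i / n)

    return {
        "stance_flexion": feature(i_stf),
        "stance_extension": feature(i_ste),
        "swing_flexion": feature(i_swf),
        "swing_extension": feature(i_swe),
    }
```

Differences between these measured features and target values give the kind of peak and timing errors that a tuner can act on, one set per gait phase.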

As the first step toward optimizing the high-dimensional control parameters of the powered knee prosthesis, we defined the optimization goal as allowing amputees to generate normative knee kinematics while walking with the powered knee prosthesis. Further, we would like to investigate the influence of the prosthesis control on gait symmetry to build up the knowledge needed to optimize gait symmetry.

1.3 Significance

The proposed study is a significant step towards advancing the intelligence of powered

knee prostheses. This study will demonstrate that machine learning methods can learn

complex tuning knowledge through interaction with the human-prosthesis system without

1) prior knowledge of the human tuning procedure, 2) expert/clinician intervention, and 3)

a dynamic model of the powered knee prosthesis. Meanwhile, this study will demonstrate

that the ADP method can tune the high-dimensional control parameters of a powered

knee prosthesis for normative kinematics. With those foundations, the ADP method can


Furthermore, the ADP method could potentially lead to intelligent powered knee prostheses

that could adapt to changes in users’ physical environment. Moreover, the ADP method

could potentially be applied to solve the powered ankle prosthesis tuning problem.

The proposed study could change the clinical practice of prosthesis tuning. Currently,

for powered knee prostheses fitting, clinicians need special training from the prosthesis

company to learn how to change the control parameters. In the proposed study, the ADP

tuning algorithm will fill the knowledge gap between the control parameters and the

observable prosthetic knee kinematics, which are more familiar to clinicians. This would

permit clinicians to focus on improving amputee gait performance through modifying the

target knee kinematics without knowledge of the control mechanism and how to change

the control parameters. This could potentially lower both the financial cost and the time

cost of the current fitting procedures for powered knee prostheses.

The proposed framework could serve as a tool to systematically study human walking

objectives. Little is known about the amputees’ objectives during walking and how the

powered prosthesis can better benefit the amputee user besides (or in addition to) enforcing

normative knee kinematics. With the ADP auto tuner proposed here, researchers could

modulate prosthetic knee kinematics systematically to study the relationship between

prosthetic knee joint kinematics and the human gait performance. Ultimately, knowledge of

the relationship between powered knee kinematics and global gait performance could assist

researchers in helping amputees achieve improved gait performance (e.g. gait symmetry,


1.4 Objective and Outline

Chapter 2 presents the design and validation of an adaptive dynamic programming algorithm that automatically tunes the high-dimensional impedance control parameters of a powered knee prosthesis to achieve normative knee kinematics in an OpenSim model. I modified an OpenSim lower limb model to mimic human walking with a powered knee prosthesis, and designed and implemented a gait phase-based ADP reinforcement learning algorithm for the prosthesis controller. With the simulation platform, I tested the feasibility of the ADP auto-tuning algorithm for prosthesis tuning and studied the effects of different factors, such as the reinforcement signal, on the performance of the ADP algorithm.

In Chapter 3, I evaluated the performance of the adaptive dynamic programming algorithm in tuning the impedance control parameters of powered knee prostheses for individual users. I recruited one trans-femoral amputee subject and one able-bodied subject to validate the ADP algorithm. With each subject, I tested the ADP algorithm with 4 different initial impedance parameter conditions. Afterwards, I assessed the improvement in prosthetic knee kinematics using the root-mean-square error relative to the normative knee kinematics, and examined the knowledge learned by the ADP algorithm through the observations and decisions of the tuning procedure.

In Chapter 4, I investigated the relationship between the prosthetic knee kinematics

and human-prosthesis gait symmetry using a reinforcement learning-based auto-tuner.

The knee kinematics were expected to influence the human-prosthesis gait symmetry.

However, due to human behavior difference and human-prosthesis setup difference, the

influence of the knee kinematics on the human-prosthesis symmetry might be inconsistent.


prosthetic knee kinematics and gait symmetry. I recruited six subjects (3 unilateral

trans-femoral amputees and 3 able-bodied subjects) to walk with our experimental powered knee

prosthesis on a treadmill, during which the reinforcement learning-based auto-tuner

explored the control parameters so as to generate varied but safe prosthetic knee kinematics.

The ground reaction force and knee kinematics were recorded to obtain the gait symmetry index,

impulse, and features of prosthetic knee kinematics. Correlation between knee kinematics

and prosthetic impulses and correlation between the inter-limb impulse measurements


References

[1] E. C. Martinez-Villalpando and H. Herr, “Agonist-antagonist active knee prosthesis: a preliminary study in level-ground walking,” Journal of Rehabilitation Research and Development, vol. 46, pp. 361–373, 5 2009.

[2] F. Sup, H. A. Varol, J. Mitchell, T. J. Withrow, and M. Goldfarb, “Self-contained powered knee and ankle prosthesis: Initial evaluation on a transfemoral amputee,” in 2009 IEEE International Conference on Rehabilitation Robotics, ICORR 2009, pp. 638–644, 2009.

[3] F. Sup, H. A. Varol, and M. Goldfarb, “Upslope walking with a powered knee and ankle prosthesis: Initial results with an amputee subject,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 19, pp. 71–78, 2 2011.

[4] B. E. Lawson, H. A. Varol, A. Huff, E. Erdemir, and M. Goldfarb, “Control of stair ascent and descent with a powered transfemoral prosthesis,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 21, pp. 466–473, 5 2013.

[5] E. D. Ledoux and M. Goldfarb, “Control and Evaluation of a Powered Transfemoral Prosthesis for Stair Ascent,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 25, pp. 917–924, 7 2017.

[6] E. J. Wolf, V. Q. Everding, A. L. Linberg, B. L. Schnall, J. M. Czerniecki, and J. M. Gambel, “Assessment of transfemoral amputees using C-Leg and Power Knee for ascending and descending inclines and steps,” The Journal of Rehabilitation Research and Development, vol. 49, p. 831, 8 2012.

[7] M. R. Williams, S. D’Andrea, and H. M. Herr, “Impact on gait biomechanics of using an active variable impedance prosthetic knee,” Journal of NeuroEngineering and Rehabilitation, vol. 13, no. 1, pp. 1–11, 2016.


[9] L. Ambrozic, M. Gorsic, J. Geeroms, L. Flynn, R. Molino Lova, R. Kamnik, M. Munih, and N. Vitiello, “CYBERLEGs: A user-oriented robotic transfemoral prosthesis with whole-body awareness control,” IEEE Robotics and Automation Magazine, vol. 21, pp. 82–93, 12 2014.

[10] E. J. Rouse, L. M. Mooney, and H. Herr, “Clutchable series-elastic actuator: Implications for prosthetic knee design,” The International Journal of Robotics Research, vol. 33, pp. 1611–1625, 10 2014.

[11] T. Lenzi, M. Cempini, L. Hargrove, and T. Kuiken, “Design, development, and testing of a lightweight hybrid robotic knee prosthesis,” The International Journal of Robotics Research, p. 027836491878599, 2018.

[12] M. Liu, F. Zhang, P. Datseris, and H. Huang, “Improving Finite State Impedance Control of Active-Transfemoral Prosthesis Using Dempster-Shafer Based State Transition Rules,” Journal of Intelligent & Robotic Systems, vol. 76, pp. 461–474, 12 2014.

[13] E. J. Rouse, L. J. Hargrove, E. J. Perreault, and T. A. Kuiken, “Estimation of human ankle impedance during the stance phase of walking,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 22, pp. 870–878, 7 2014.

[14] A. Brandt, M. Liu, and H. Huang, “Does the Impedance of Above-knee Powered Prostheses Need to be Adjusted for Load-carrying Conditions?,” in 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, (Orlando, FL, USA), 2016.

[15] F. Sup, H. Varol, J. Mitchell, T. Withrow, and M. Goldfarb, “Design and control of an active electrical knee and ankle prosthesis,” Proceedings of the 2nd Biennial IEEE/RAS-EMBS International Conference on Biomedical Robotics and Biomechatronics, Scottsdale, AZ, USA, pp. 523–528, 2008.


[17] R. D. Gregg and J. W. Sensinger, “Towards biomimetic virtual constraint control of a powered prosthetic leg,” IEEE Transactions on Control Systems Technology, vol. 22, no. 1, pp. 246–254, 2014.

[18] S. Pfeifer, A. Pagel, R. Riener, and H. Vallery, “Actuator with angle-dependent elasticity for biomimetic transfemoral prostheses,” IEEE/ASME Transactions on Mechatronics, vol. 20, pp. 1384–1394, 6 2015.

[19] T. Lenzi, L. Hargrove, and J. Sensinger, “Speed-adaptation mechanism: Robotic prostheses can actively regulate joint torque,” IEEE Robotics and Automation Magazine, vol. 21, no. 4, pp. 94–107, 2014.

[20] L. J. Hargrove, A. J. Young, A. M. Simon, N. P. Fey, R. D. Lipschutz, S. B. Finucane, E. G. Halsne, K. A. Ingraham, and T. A. Kuiken, “Intuitive control of a powered prosthetic leg during ambulation: A randomized clinical trial,” JAMA - Journal of the American Medical Association, vol. 313, no. 22, pp. 2244–2252, 2015.

[21] S. Pfeifer, H. Vallery, M. Hardegger, R. Riener, and E. J. Perreault, “Model-based estimation of knee stiffness,” IEEE Transactions on Biomedical Engineering, vol. 59, pp. 2604–2612, 9 2012.

[22] K. Shamaei, G. S. Sawicki, and A. M. Dollar, “Estimation of Quasi-Stiffness of the Human Knee in the Stance Phase of Walking,” PLoS ONE, vol. 8, p. e59993, 3 2013.

[23] B. J. Misgeld, M. Lüken, R. Riener, and S. Leonhardt, “Observer-Based Human Knee Stiffness Estimation,” IEEE Transactions on Biomedical Engineering, vol. 64, no. 5, pp. 1033–1044, 2017.

[24] A. M. Simon, K. A. Ingraham, N. P. Fey, S. B. Finucane, R. D. Lipschutz, A. J. Young, and L. J. Hargrove, “Configuring a powered knee and ankle prosthesis for transfemoral amputees within five specific ambulation modes,” PLoS ONE, vol. 9, p. e99387, 6 2014.


[26] J. Zhang, P. Fiers, K. A. Witte, R. W. Jackson, K. L. Poggensee, C. G. Atkeson, and S. H. Collins, “Human-in-the-loop optimization of exoskeleton assistance during walking,” Science, vol. 356, pp. 1280–1284, 6 2017.

[27] Y. Ding, M. Kim, S. Kuindersma, and C. J. Walsh, “Human-in-the-loop optimization of hip assistance with a soft exosuit during walking,” Science Robotics, vol. 3, p. eaar5438, 2 2018.

[28] R. M. N. Alexander, “Energetics and optimization of human walking and running: The 2000 Raymond Pearl Memorial Lecture,” American Journal of Human Biology, vol. 14, no. 5, pp. 641–648, 2002.

[29] T.-w. P. Huang, K. A. Shorter, P. G. Adamczyk, and A. D. Kuo, “Mechanical and energetic consequences of reduced ankle plantar-flexion in human walking,” Journal of Experimental Biology, vol. 218, no. 22, pp. 3541–3550, 2015.

[30] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996.

[31] J. Si, A. G. Barto, W. B. Powell, and D. Wunsch, Handbook of learning and approximate dynamic programming. New Jersey: John Wiley & Sons, 2004.

[32] W. B. Powell, Approximate dynamic programming: solving the curses of dimensionality. New Jersey: John Wiley & Sons, 2nd ed., 2011.

[33] D. V. Prokhorov and D. C. Wunsch, “Adaptive critic designs,” IEEE Transactions on Neural Networks, vol. 8, pp. 997–1007, 9 1997.

[34] M. Riedmiller, “Neural fitted Q iteration - First experiences with a data efficient neural Reinforcement Learning method,” in 16th Eur. Conf. Mach. Learn., (Porto, Portugal), pp. 317–328, Springer, 2005.

[35] J. Si and Y.-t. Wang, “On-Line Learning Control by Association and Reinforcement,”


[36] R. Hafner and M. Riedmiller, “Reinforcement learning in feedback control: Challenges and benchmarks from technical process control,” Machine Learning, vol. 84, pp. 137–169, 7 2011.

[37] M. Riedmiller, M. Montemerlo, and H. Dahlkamp, “Learning to drive a real car in 20 minutes,” in the Frontiers in the Convergence of Bioscience and Information Technologies, (Jeju, Korea), pp. 645–650, Springer, 2007.

[38] M. Riedmiller, T. Gabel, R. Hafner, and S. Lange, “Reinforcement learning for robot soccer,” Autonomous Robots, vol. 27, pp. 55–73, 7 2009.

[39] R. Enns and J. Si, “Helicopter Trimming and Tracking Control Using Direct Neural Dynamic Programming,” IEEE Transactions on Neural Networks, vol. 14, pp. 929–939, 7 2003.

[40] R. Enns and J. Si, “Helicopter flight-control reconfiguration for main rotor actuator failures,” Journal of Guidance, Control, and Dynamics, vol. 26, pp. 572–584, 7 2003.

[41] C. Lu, J. Si, and X. Xie, “Direct Heuristic Dynamic Programming for Damping Oscillations in a Large Power System,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 38, pp. 1008–1013, 8 2008.

[42] L. Yang, J. Si, K. S. Tsakalis, and A. A. Rodriguez, “Direct heuristic dynamic programming for nonlinear tracking control with filtered tracking error,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 39, pp. 1617–1622, 12 2009.

[43] W. Guo, F. Liu, J. Si, D. He, R. Harley, and S. Mei, “Online Supplementary ADP Learning Controller Design and Application to Power System Frequency Control With Large-Scale Wind Energy Integration,” IEEE Transactions on Neural Networks and Learning Systems, vol. 27, pp. 1748–1761, 3 2016.


[45] E. Isakov, O. Keren, and N. Benjuya, “Trans-tibial amputee gait: Time-distance parameters and EMG activity,” Prosthetics and Orthotics International, vol. 24, no. 3, pp. 216–220, 2000.

[46] P. G. Adamczyk and A. D. Kuo, “Mechanisms of Gait Asymmetry Due to Push-Off Deficiency in Unilateral Amputees,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 23, pp. 776–785, 9 2015.

[47] M. Schaarschmidt, S. W. Lipfert, C. Meier-Gratz, H. C. Scholle, and A. Seyfarth, “Functional gait asymmetry of unilateral transfemoral amputees,” Human Movement Science, vol. 31, no. 4, pp. 907–917, 2012.

[48] L. Nolan and A. Lees, “The functional demands on the intact limb during walking for active trans-femoral and trans-tibial amputees,” Prosthetics and Orthotics International, vol. 24, no. 2, pp. 117–125, 2000.

[49] H. L. Jarvis, A. N. Bennett, M. Twiste, R. D. Phillip, J. Etherington, and R. Baker, “Temporal Spatial and Metabolic Measures of Walking in Highly Functional Individuals With Lower Limb Amputations,” Archives of Physical Medicine and Rehabilitation, vol. 98, no. 7, pp. 1389–1399, 2017.

[50] R. Gailey, “Review of secondary physical conditions associated with lower-limb amputation and long-term prosthesis use,” Journal of Rehabilitation Research and Development, vol. 45, no. 1, pp. 15–30, 2008.

[51] M. P. Kadaba, H. K. H. Ramakrishnan, and M. E. M. Wootten, “Measurement of lower extremity kinematics during level walking,” Journal of Orthopaedic Research, vol. 8, pp. 383–392, 5 1990.

[52] F. Sup, A. Bohara, and M. Goldfarb, “Design and Control of a Powered Transfemoral Prosthesis,” The International Journal of Robotics Research, vol. 27, pp. 263–273, 2 2008.


CHAPTER 2:

A New Powered Lower Limb Prosthesis Control Framework Based on Adaptive Dynamic Programming

Yue Wen, Jennie Si, Xiang Gao, Stephanie Huang, Helen Huang

As published in IEEE Transactions on Neural Networks and Learning Systems

2.1 Abstract

This study presents a novel application of adaptive dynamic programming (ADP) for

optimal adaptive control of powered lower limb prostheses, a type of wearable robot that assists

the motor function of limb amputees. Current control of these robotic devices typically

relies on finite state impedance control (FS-IC), which lacks adaptability to the user’s

physical condition. As a result, joint impedance settings are often customized manually and

heuristically in clinics, which greatly hinders the wide use of these advanced medical devices.

This simulation study aimed at demonstrating the feasibility of ADP for automatic tuning

of the twelve knee joint impedance parameters during a complete gait cycle to achieve

balanced walking. Given that accurate models of human walking dynamics are difficult

to obtain, the model-free ADP control algorithms were considered. First, direct heuristic

dynamic programming (dHDP) was applied to the control problem and its performance

was evaluated on OpenSim, an often-used dynamic walking simulator. For comparison

purposes, we selected another established ADP algorithm, the neural fitted Q with

continuous action (NFQCA). In both cases, ADP controllers learned to control the right knee

joint and achieved balanced walking, but dHDP outperformed NFQCA in this application


2.2 Introduction

Adaptive dynamic programming (ADP) is a class of learning- and approximation-based

methods for optimal adaptive control of nonlinear dynamic systems. It combines ideas

from diverse fields including reinforcement learning and machine learning, optimal

control, adaptive control, and function approximation. In recent years, it has emerged as a

powerful alternative to classic optimal control, especially for complex nonlinear systems,

due to its demonstrated scalability for large systems and flexibility and capability of

learning from data measurements [1, 2]. Additionally, an important class of ADP algorithms

can be implemented without a fully identified system dynamics model. This attribute is

especially useful in applications such as the human-prosthesis control problem studied in

this paper. Human body dynamics vary from person to person; even for a single person,

a full, accurate mathematical description of human walking dynamics is impossible to

acquire. This presents a great challenge and also an exciting opportunity for the ADP or

controls community at large.

Most ADP controls reported in literature were tested on standard benchmark problems

such as single or double inverted pendulums, significantly simplified linear flight models,

and simplified robot arm models. Real demonstrations of ADP controls applied to complex

systems were only available in a handful of cases[3–7]. Furthermore, only a subset of these

cases are capable of model-free learning, namely least square policy iteration (LSPI), neural

fitted Q iteration, neural fitted Q with continuous action (NFQCA), and direct heuristic

dynamic programming (dHDP) and its variants. Among those model-free ADP algorithms,

dHDP and NFQCA stand out because they have been applied to multiple large and complex

dynamic system control problems[3, 4, 6, 8–10]. Thus, they are natural candidates to be


the feasibility of dHDP using simulation studies, it is noteworthy that learning performance

guarantees such as boundedness, stability, and convergence results have recently provided

important theoretical insight on model-free ADP such as dHDP[3, 11–14].

ADP has been used in bipedal robot controls and simulations[15], but our powered

prosthetic leg control (i.e. wearable robot control) is fundamentally different. First, a bipedal

robot is mechanical in nature and its model can be obtained in a mathematical form. Its

control can be completely manipulated by the algorithm designer. On the other hand,

obtaining a model for human-prosthesis control is nearly impossible, and the designer

can only manipulate one or two joints in a prosthesis. All other joints are dominated by

the human wearer. Second, the control law for bipedal robots is not restricted by complex

biomechanics that inherently govern human walking. The human-prosthesis system has

to cope with co-adaptation influenced by human motor control principles.

Figure 2.1 Human-prosthesis FS-IC control platform facilitated by OpenSim. One ADP controller is designed for each phase ($m = 1, 2, 3, 4$) of the FSM to provide the respective impedance parameter values [refer to (2.1)] to OpenSim during each gait cycle $n$.


great promise to augment human movement in able-bodied persons or individuals with

physical disabilities[16], optimal adaptive control of these advanced devices remains a

challenge. Yet, it is much needed[17]. Variations across or within human subjects call for

adaptive control to personalize wearable devices. Since these assistive devices are to enable

or restore motor functions of the users, optimal performance is desired. Given its adaptive

and optimal control nature, ADP algorithms are among the most promising approaches to

this class of challenging problems. As such, this study is the first to address these challenges

and to demonstrate the potential of ADP in personalizing control parameters adaptively

and optimally. The results may transform wearable robot control to better support the

motor function in lower limb amputees.

In powered lower limb prosthetics, finite state impedance control (FS-IC) is the most

utilized approach to intrinsic prosthesis control, which is based on mechanical

measurements in the prosthesis [18]. Automatic control parameter tuning, however, is not available

in commercial devices, nor are existing ideas satisfactory. An untested concept is to use

direct estimation of joint impedance via biomechanical measurements and models of the

unimpaired leg in able-bodied subjects during walking[19, 20]. Another method is to define

prosthesis control by finding the correlation between joint control torque or impedance

and intrinsic measurements (e.g. center of pressure[21], prosthetic joint motion[22]) in a

specific gait phase. Similar to the previous approach, the correlation has only been defined

by data collected from able-bodied persons. Whether or not the correlation still holds in

lower limb amputees is unclear. A somewhat related idea to what we are proposing here is

the cyber expert system [17] that mimics human expert decisions to tune the impedance

control parameters and maximize the amputees’ performance. This approach, however,

depends heavily on the knowledge and experience of the human experts involved. Putting


users of powered lower limb prosthesis.

2.3 The human-prostheses control platform

Our prosthesis control platform (Fig. 2.1, left) consists of 1) a simulated human-prosthesis model implemented by the freeware simulation package OpenSim [23] and 2) an FS-IC realized in MATLAB. The ADP controller, or the action network output (Fig. 2.1, right), is used to update the impedance controller coefficients as in (2.6) to achieve adaptive control of the powered prosthetic knee while optimizing the gait performance of the human-prosthesis system.

A simulator, namely OpenSim, of human bipedal walking with a prosthesis was used in

this feasibility study in place of an amputee wearing a prosthetic knee to generate walking

patterns. This simulator has been validated in previous studies [24] and is well accepted in

the field of biomechanics. This simulation model was composed of a rigid level platform

for the ground and five rigid-body segments for the human body. The rigid-body segments

were linked using one degree-of-freedom pin joints. The pelvis segment was linked to

the ground platform using a free joint, which allows for free motion between the body

and the ground. Model settings such as segment length, mass, and inertial parameters

were provided in the lower-limb OpenSim model[23]. To simulate unilateral above-knee

amputee walking, we applied a fixed set of prescribed motions for the left and right hip

joints and the left knee joint to generate intact, human-controlled joint motion according

to a well-established, normative data set [25]. The right knee was assigned as the prosthetic

knee and was controlled by the FS-IC and the ADP controller.

The FS-IC determines the driving torque of the controlled prosthetic knee joint. Four


Figure 2.2 Red line: target knee profile. Blue line: simulated knee trajectory. Vertical lines: transitions between gait phases, as determined by the FS-IC transition rules.

left): stance flexion as Phase 1, stance extension as Phase 2, swing flexion as Phase 3, and

swing extension as Phase 4. The phase transitions were defined based on gait events (i.e., heel contact and toe-off) and knee motions [26, 27] and were detected according to the vertical ground reaction force (GRF) of both legs ($F_{z\_L}$, $F_{z\_R}$), the prosthetic knee angle ($\theta$), and the angular velocity ($\omega$). In each phase ($m = 1, 2, 3, 4$) of the FSM, the impedance controller was given a set of three impedance parameters during the $n$th gait cycle,

$$I_{m,n} = (K_{m,n},\, B_{m,n},\, \theta_{e_{m,n}}) \qquad (2.1)$$

including stiffness ($K$), damping ($B$), and equilibrium position ($\theta_e$). Consequently, the prosthetic knee torque ($T$) for OpenSim was generated according to


For each FSM phase $m$, there was an ADP controller (Fig. 2.1, right), which produced the output $U_{m,n}$ for each gait cycle $n$, used to update the impedance parameters $I_{m,n}$ in (2.1). Therefore, twelve impedance parameters ($I_n$: $4 \times 3$) were used to simulate a gait cycle in OpenSim, where the prosthetic knee (i.e., the right knee) was controlled by the FS-IC with impedance setting $I_{m,n}$.
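The parameter layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the torque law used here is the conventional impedance form $T = K(\theta - \theta_e) + B\omega$, which is assumed for the sketch and is not a restatement of the thesis equation; the class name `Impedance`, the function `knee_torque`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Impedance:
    """Impedance triple I_{m,n} = (K, B, theta_e) for one FSM phase, as in (2.1)."""
    stiffness: float    # K
    damping: float      # B
    equilibrium: float  # theta_e


def knee_torque(imp: Impedance, theta: float, omega: float) -> float:
    """Assumed conventional impedance law: T = K * (theta - theta_e) + B * omega."""
    return imp.stiffness * (theta - imp.equilibrium) + imp.damping * omega


# Hypothetical values: one triple per phase m = 1..4, i.e. 4 x 3 = 12 parameters.
phases = {m: Impedance(2.0, 0.05, 10.0 * m) for m in (1, 2, 3, 4)}
n_params = 3 * len(phases)

# Torque in Phase 1 for a hypothetical knee angle and angular velocity.
torque = knee_torque(phases[1], theta=12.0, omega=-20.0)
```

Keeping one triple per phase mirrors the structure in which each of the four ADP controllers adjusts only its own three parameters.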

To assess the feasibility of ADP for prosthetic control, during each gait cycle $n$ and for each phase $m$, we set the target prosthetic knee performance as $P^t_m$ and $D^t_m$, representing the target peak knee angle (in degrees) and time duration (in seconds), respectively (Fig. 2.2). Correspondingly, for the controlled knee performance obtained from OpenSim, we used $P_{m,n}$ and $D_{m,n}$ to denote the peak knee angle and time duration, respectively.

Let $\Delta P_{m,n}$ and $\Delta D_{m,n}$ denote the peak value error and duration time error, respectively:

$$\Delta P_{m,n} = P_{m,n} - P^t_m, \qquad \Delta D_{m,n} = D_{m,n} - D^t_m. \qquad (2.3)$$

Let $\Delta P'_{m,n}$ and $\Delta D'_{m,n}$ denote the change rates of $\Delta P_{m,n}$ and $\Delta D_{m,n}$, respectively:

$$\Delta P'_{m,n} = P_{m,n} - P_{m,n-1}, \qquad \Delta D'_{m,n} = D_{m,n} - D_{m,n-1}. \qquad (2.4)$$

We then define the four state variables $X_{m,n} \in \mathbb{R}^4$ for the $m$th ADP control block [corresponding to Phase $m$ in the FSM in Fig. 2.1 (left)] during the $n$th gait cycle:

$$X_{m,n} = (\Delta P_{m,n},\, \Delta D_{m,n},\, \Delta P'_{m,n},\, \Delta D'_{m,n}). \qquad (2.5)$$
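As a small worked illustration of (2.3) to (2.5), the sketch below assembles the four-dimensional state for one phase from two consecutive gait cycles. The function name `state_vector` and the numeric inputs are hypothetical; only the arithmetic follows the definitions above.

```python
import numpy as np


def state_vector(peak, duration, prev_peak, prev_duration,
                 target_peak, target_duration):
    """Build X_{m,n} for one phase from the current and previous gait cycle."""
    d_peak = peak - target_peak            # peak value error, (2.3)
    d_dur = duration - target_duration     # duration time error, (2.3)
    d_peak_rate = peak - prev_peak         # change rate of the peak error, (2.4)
    d_dur_rate = duration - prev_duration  # change rate of the duration error, (2.4)
    return np.array([d_peak, d_dur, d_peak_rate, d_dur_rate])  # (2.5)


# Hypothetical measurements: peak angles in degrees, durations in seconds.
x = state_vector(peak=15.0, duration=0.22,
                 prev_peak=17.0, prev_duration=0.25,
                 target_peak=12.0, target_duration=0.20)
```

Because the targets are fixed, the change rate of each error equals the cycle-to-cycle change of the measured feature itself, which is why (2.4) needs no target terms.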


2.4 ADP based Impedance Control

Each ADP control block corresponded to one of the four FSM phases ($m = 1, 2, 3, 4$). When dHDP [28] was employed, each ADP block had one action neural network (ANN) and one critic neural network (CNN). The state variables were $X_{m,n}$, and the output of an ANN was $U_{m,n}$, which was used to update the impedance parameters $I_{m,n}$ as follows for each FSM phase $m$ and gait cycle $n$:

$$I_{m,n+1} = I_{m,n} + \beta_{m,n} \ast U_{m,n}, \qquad (2.6)$$

where $\beta_{m,n}$ is a $3 \times 1$ controller gain, as discussed below, and the symbol $\ast$ denotes component-wise multiplication. Referring to Fig. 2.1, each of the four ADP controllers was designed using the same principle of solving for the optimal control $U_{m,n}$, an output of the action network, by minimizing the critic approximation error $e^c_{m,n}$:

$$e^c_{m,n} = \alpha J_{m,n} - [J_{m,n-1} - r_{m,n}], \qquad (2.7)$$

where $J_{m,n}$ is the total cost-to-go, $r_{m,n}$ is the instantaneous cost for FSM phase $m$ and gait cycle $n$, and $\alpha$ is the discount factor for the infinite-horizon problem ($0 < \alpha < 1$).

To compute $U_{m,n}$ in (2.6) from the ADP controller, we need to train the critic and action networks as follows. The CNN weights were updated to minimize the squared critic approximation error, i.e.,

$$E^c_{m,n} = \frac{1}{2}\,(e^c_{m,n})^2. \qquad (2.8)$$

The ANN weights were updated to minimize the error between the desired ultimate objective $u_c$ and the approximated total cost-to-go $J_{m,n}$, i.e.,

$$E^a_{m,n} = \frac{1}{2}\,(J_{m,n} - u_c)^2. \qquad (2.9)$$

In our implementation, as in [28], the discount factor $\alpha$ in (2.7) was set to 0.95 and $u_c$ was set to 0.

The CNN and ANN weight training was carried out through backpropagation to minimize the errors in (2.8) and (2.9) using the same procedures and learning rates as in [28].
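The scalar quantities that drive this training can be written out directly. The sketch below evaluates the critic approximation error, the two training objectives, and the impedance update of (2.6) for given numbers; it deliberately omits the neural networks and their backpropagation, and the function names and sample values are hypothetical.

```python
import numpy as np

ALPHA = 0.95  # discount factor, as stated in the text
U_C = 0.0     # desired ultimate objective, as stated in the text


def critic_error(j_now, j_prev, r_now, alpha=ALPHA):
    """Critic approximation error e^c = alpha * J_n - [J_{n-1} - r_n]."""
    return alpha * j_now - (j_prev - r_now)


def critic_objective(e_c):
    """Squared-error objective for the critic network: 0.5 * (e^c)^2."""
    return 0.5 * e_c ** 2


def action_objective(j_now, u_c=U_C):
    """Objective for the action network: 0.5 * (J_n - u_c)^2."""
    return 0.5 * (j_now - u_c) ** 2


def update_impedance(impedance, gain, action):
    """Component-wise update I_{n+1} = I_n + beta * U for one phase."""
    return np.asarray(impedance) + np.asarray(gain) * np.asarray(action)


# Hypothetical cost-to-go estimates and a failure reinforcement of -1.
e_c = critic_error(j_now=-0.2, j_prev=-0.1, r_now=-1.0)
next_impedance = update_impedance([2.0, 0.05, 10.0],   # (K, B, theta_e)
                                  [0.1, 0.01, 1.0],    # gain beta
                                  [0.5, -1.0, 0.2])    # action output U
```

With $u_c = 0$ and non-positive reinforcement, driving the action objective toward zero corresponds to steering the system away from states that accumulate penalties.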

Next, we describe the instantaneous cost $r_{m,n}$. Let $P^l_m$ and $P^u_m$ denote the lower and upper bounds of the peak value error, respectively, and let $D^l_m$ and $D^u_m$ denote the lower and upper bounds of the duration time error, respectively (Table 2.1).

Table 2.1 Upper bound and lower bound of peak error and duration error

Parameter                    Phase 1       Phase 2       Phase 3       Phase 4
P^l_m, P^u_m (degrees)       1, 10.5       1, 7.5        1, 9          1, 6
D^l_m, D^u_m (seconds)       0.01, 0.05    0.01, 0.05    0.01, 0.05    0.01, 0.05

1) The instantaneous cost $r_{m,n}$ associated with failure is

$$r_{m,n} = -1, \quad \text{if } |\Delta P_{m,n}| > P^u_m \text{ or } |\Delta D_{m,n}| > D^u_m. \qquad (2.10)$$

2) Else, if $P^l_m < |\Delta P_{m,n}| < P^u_m$ or $D^l_m < |\Delta D_{m,n}| < D^u_m$

penalty score $S^-_{m,n}$:

$$r_{m,n} = \begin{cases} r_s, & \text{if } S^-_{m,n} > 4 \\ 0, & \text{else} \end{cases} \qquad (2.11)$$

where the penalty score is

$$S^-_{m,n} = S^{P-}_{m,n} + S^{D-}_{m,n}, \qquad (2.12)$$

and $r_s$ is a design parameter. In our results evaluation, we used different $r_s$ values, $-0.4$, $-0.6$, and $-0.8$, with $r_s = -0.8$ providing noticeable improvement in learning.

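In (2.12), the peak value penalty score $S^{P-}_{m,n}$ and duration time penalty score $S^{D-}_{m,n}$ were computed according to

$$S^{P-}_{m,n} = S^{P-}_{m,n} + 2, \quad \text{if } \Delta P_{m,n} \ast \sum_{i=n-5}^{n} \Delta P'_{m,i} > 0,$$

$$S^{D-}_{m,n} = S^{D-}_{m,n} + 4, \quad \text{if } \Delta D_{m,n} \ast \sum_{i=n-5}^{n} \Delta D'_{m,i} > 0. \qquad (2.13)$$

In addition, $S^{P-}_{m,n}$ was set to 0 if $\Delta P_{m,n} \ast \sum_{i=n-5}^{n} \Delta P'_{m,i} \le 0$, and $S^{D-}_{m,n}$ was set to 0 if $\Delta D_{m,n} \ast \sum_{i=n-5}^{n} \Delta D'_{m,i} \le 0$.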

Note that the key to the above considerations described by (2.11)-(2.13) was that an intermediate reinforcement $r_s$ was assigned if learning resulted in increased absolute peak value error and/or absolute duration time error measured every 5 gait steps.
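The reinforcement rules in (2.10) to (2.13) can be sketched as two small functions. This is an illustrative reading only: the handling of the case in which both errors lie below their lower bounds (returning 0) is an assumption of this sketch, and the function names, argument layout, and sample numbers are hypothetical. The bounds used in the example are the Phase 1 values from Table 2.1.

```python
def update_penalty(score, error, recent_rates, increment):
    """One step of (2.13): grow the score while the error keeps moving away
    from zero over the recent window, otherwise reset it to 0."""
    trend = error * sum(recent_rates)  # error times the summed change rates
    return score + increment if trend > 0 else 0.0


def reinforcement(d_peak, d_dur, penalty, bounds, r_s=-0.8):
    """Instantaneous cost r for one phase and gait cycle."""
    p_lo, p_hi, d_lo, d_hi = bounds
    if abs(d_peak) > p_hi or abs(d_dur) > d_hi:
        return -1.0                          # failure, (2.10)
    in_band = p_lo < abs(d_peak) < p_hi or d_lo < abs(d_dur) < d_hi
    if in_band:
        return r_s if penalty > 4 else 0.0   # intermediate reinforcement, (2.11)
    return 0.0                               # assumption: no cost below the lower bounds


phase1_bounds = (1.0, 10.5, 0.01, 0.05)      # Table 2.1, Phase 1

# Hypothetical window of six change rates (i = n-5, ..., n) with growing errors.
s_peak = update_penalty(0.0, error=3.0, recent_rates=[0.5] * 6, increment=2)
s_dur = update_penalty(0.0, error=0.02, recent_rates=[0.001] * 6, increment=4)
r = reinforcement(3.0, 0.02, s_peak + s_dur, phase1_bounds)
```

In this example the combined penalty score is 6, which exceeds the threshold of 4, so the intermediate reinforcement $r_s$ is issued even though neither error has crossed its upper bound.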

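reward score $S^{P+}_{m,n}$ and the duration time reward score $S^{D+}_{m,n}$. Specifically,

$$S^{P+}_{m,n} = S^{P+}_{m,n} + 2, \quad \text{if } \Delta P_{m,n} \ast \sum_{i=n-5}^{n} \Delta P'_{m,i} < 0,$$

$$S^{D+}_{m,n} = S^{D+}_{m,n} + 4, \quad \text{if } \Delta D_{m,n} \ast \sum_{i=n-5}^{n} \Delta D'_{m,i} < 0. \qquad (2.14)$$

In addition, $S^{P+}_{m,n}$ was set to 0 if $\Delta P_{m,n} \ast \sum_{i=n-5}^{n} \Delta P'_{m,i} \ge 0$, and $S^{D+}_{m,n}$ was set to 0 if $\Delta D_{m,n} \ast \sum_{i=n-5}^{n} \Delta D'_{m,i} \ge 0$.

Then define the reward score as

$$S^+_{m,n} = S^{P+}_{m,n} + S^{D+}_{m,n}. \qquad (2.15)$$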

The control gain $\beta_{m,n}$ in (2.6) was then determined according to the following two conditions.

Condition 1: $S^+_{m,n} > 2$, and either the peak value error or the duration time error was within the respective lower and upper bounds.

Condition 2: Both the peak value error and the duration time error were below the respective lower bounds, and the gait cycle number $n$ was greater than 50.

If Condition 1 holds, then

$$\beta_{m,n+1} = 1.2\,\beta_{m,n}. \qquad (2.16)$$

If Condition 2 holds, then

βm,n+1=

1


and $\beta_{m,n}$ remained unchanged for all other conditions. Also, $\beta_{m,n}$ was no smaller than $0.5\,\beta_{m,0}$. Note that the key to the above considerations described by (2.14)-(2.17) was that if learning resulted in decreased absolute peak value error and/or absolute duration time error measured every 5 gait steps, we amplified the control gain as in (2.16) to make adaptation faster. But when both the peak error and the duration error were within the lower bounds, we reduced the control gain as in (2.17) as we approached a stabilized controller.
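A minimal sketch of this gain schedule follows. The amplification factor 1.2 and the floor at half the initial gain come from the text; the reduction factor applied under Condition 2 is passed in as a hypothetical parameter `shrink`, since its value is not assumed here. The function name and boolean arguments are also hypothetical.

```python
import numpy as np


def update_gain(beta, beta0, reward_score, peak_in_band, dur_in_band,
                both_below_lower, n, shrink):
    """Return the gain for gait cycle n + 1 given the two conditions."""
    beta = np.asarray(beta, dtype=float)
    if reward_score > 2 and (peak_in_band or dur_in_band):
        beta = 1.2 * beta             # Condition 1: amplify, (2.16)
    elif both_below_lower and n > 50:
        beta = shrink * beta          # Condition 2: reduce; factor is hypothetical
    # The gain is never allowed below half of its initial value.
    return np.maximum(beta, 0.5 * np.asarray(beta0, dtype=float))


beta0 = [0.1, 0.1, 0.1]
faster = update_gain(beta0, beta0, reward_score=4, peak_in_band=True,
                     dur_in_band=False, both_below_lower=False, n=10, shrink=0.8)
floored = update_gain([0.06] * 3, beta0, reward_score=0, peak_in_band=False,
                      dur_in_band=False, both_below_lower=True, n=60, shrink=0.8)
```

The two conditions cannot hold at once, because Condition 1 requires at least one error inside its band while Condition 2 requires both errors below their lower bounds, so the `if`/`elif` ordering does not affect the result.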

In Table 2.1, the upper and lower bounds that define the acceptable range of system

states were selected based on realistic conditions of balanced walking without stumbling

or falling. Specifically, we used the upper bounds of 1.5 standard deviations above the knee

kinematic peak values observed in each phase[29]. These choices were not the tipping point

of instability but rather were quite conservative. The lower bounds were selected based on

experimental observations associated with normative walking patterns. For the penalty score and reward score, since the duration time changes slowly, we assigned it an increment of 4 in (2.13) and (2.14), twice that of the peak error change, so that the sub-reinforcement in (2.11) takes effect.

2.5 Implementation

In our simulation studies, a gait cycle ($n$) was the basic time unit for ADP to update the impedance values in (2.6) or (2.18) for dHDP and NFQCA, respectively. The controller

was designed to reduce the peak value error and duration time error given in (2.3), as shown

in Fig. 2.2 for each gait cycle. In our simulations, we prescribed the target profile of the right

knee through tuning OpenSim to reach desired natural walking patterns. As such, we were