Dynamic programming - Problem in Hybrid Electric Vehicles


4.4 Dynamic programming

Dynamic programming (DP) [48], [49], [50] is a numerical method for solving problems in which a sequence of interrelated decisions has to be made [51]. The approach yields a functional equation whose solution can be found with a digital computer [39].

DP is the only optimal control method capable of providing the optimal solution to problems of any complexity level, within the accuracy limitations imposed by the discretization of the problem variables (e.g., controls and states) [1]. The resolution of the vectors of candidate controls $U_k$ and discretized states $X_k$ results from a compromise between the computational burden and the accuracy of the results [34].

When implementing DP, the optimal solution is found by proceeding backwards: starting from the final step, the sequence of controls that minimizes the sum of the costs from the current state to the end of the optimization horizon is determined at each step [39].

This implies that the backward solution of the entire problem must be found before the first control action can be selected [34]. Therefore, in the context of HEVs, the entire driving cycle must be known a priori, which makes DP a non-causal EMS.

Despite not being implementable in real time, DP yields the best available approximation of the optimal control policy for a given HEV, which makes it possible to determine the vehicle's maximum capabilities. Hence, the results obtained can be useful for:

• Optimizing the design of HEVs [98], [99].

• Designing rule-based EMSs [100], [57], [101].

• Generating benchmark solutions for real-time implementable EMSs [35], [102], [31], [52], [53].

As stated before, one of the main practical limitations of DP is the computational burden involved, which increases linearly with the final time (i.e., the number of time steps) and exponentially with the dimension of the state vector. This is referred to in the literature as the curse of dimensionality [39].
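The scaling can be made concrete by counting the (grid state, control candidate) pairs that a full backward sweep must evaluate. The following Python sketch assumes a full tensor-product state grid with the same number of points per state variable and a fixed number of control candidates per step. The function name `dp_evaluations` and all sizes are hypothetical and are not taken from the HEV model of chapter 2.

```python
def dp_evaluations(n_steps: int, points_per_state: int, n_states: int,
                   n_controls: int) -> int:
    """Count the (grid state, control) pairs evaluated by one backward sweep.

    Assumes a full tensor grid with the same number of points for every
    state variable; each pair costs one arc-cost and one state-update call.
    """
    grid_points = points_per_state ** n_states   # exponential in state dimension
    return n_steps * grid_points * n_controls    # linear in the number of steps

# Hypothetical sizes: 1000 time steps, 100 points per state, 50 controls
for d in (1, 2, 3):
    print(d, dp_evaluations(1000, 100, d, 50))
```

Doubling the number of time steps doubles the count, whereas adding one state variable multiplies it by the number of points per state.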

In chapter 5, DP will be used to find the globally optimal solution to the energy management problem of the HEV powertrain architecture described in chapter 2; therefore, a case study is not presented here. The reader is referred to [1] for an introductory example of the use of this technique for HEV control.

The theoretical aspects discussed in the next paragraphs are meant to set the basis for understanding the work presented in the following chapter.

4.4.1 The principle of optimality

The method of DP is based on Bellman’s principle of optimality [48]. According to this intuitively appealing concept:

An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

This implies that from any step on a discretized optimal trajectory, the remaining trajectory is optimal for the corresponding problem initiated at that step (and corresponding system states) [1], [103].

This principle is expressed in mathematical terms in the following lines.

Consider a discretized performance index for the optimal control problem starting at the initial state 𝑥₀:

$$\Psi_{0,N} = L_N(x_N) + \sum_{k=0}^{N-1} L_k(x_k, u_k) \tag{4-7}$$

where $k$ is the time-step index and $N$ is the final time step.

The corresponding optimal control trajectory is $u^* = \{u_0^*, u_1^*, \ldots, u_{N-1}^*\}$. Note that the instantaneous cost function $L_k(x_k, u_k)$, called the arc cost in the context of DP, is equivalent to the integrand of the continuous formulation of the performance index presented in Eq. (4-1), whereas $L_N(x_N)$ is the terminal cost, which depends only on the final state.

Let’s also consider the performance index for the tail subproblem starting at time step 𝑗, i.e., the last part of the overall problem:

$$\Psi_{j,N} = L_N(x_N) + \sum_{k=j}^{N-1} L_k(x_k, u_k) \tag{4-8}$$
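As an illustration, Eq. (4-8) can be evaluated for a given control sequence by simulating the dynamics forward from $x_j$ and accumulating the arc costs. The Python sketch below is hypothetical. It assumes a discretized state equation $x_{k+1} = f_d(x_k, u_k)$, and the names, the quadratic costs and the additive dynamics are illustrative choices only. Setting $j = 0$ gives the full performance index $\Psi_{0,N}$.

```python
from typing import Callable, Sequence

def tail_cost(j: int, x_j: float, controls: Sequence[float],
              arc_cost: Callable[[int, float, float], float],
              terminal_cost: Callable[[float], float],
              f_d: Callable[[float, float], float]) -> float:
    """Psi_{j,N} of Eq. (4-8) for controls u_j, ..., u_{N-1} (N = j + len(controls))."""
    x, total = x_j, 0.0
    for k, u in enumerate(controls, start=j):
        total += arc_cost(k, x, u)    # L_k(x_k, u_k)
        x = f_d(x, u)                 # x_{k+1} = f_d(x_k, u_k)
    return total + terminal_cost(x)   # + L_N(x_N)

# Hypothetical toy costs and dynamics
def quad_arc(k, x, u):
    return u ** 2 + 0.5 * x ** 2

def quad_terminal(x):
    return 10.0 * x ** 2

def additive_step(x, u):
    return x + u

print(tail_cost(0, 2.0, [-1.0, -1.0], quad_arc, quad_terminal, additive_step))  # 4.5
print(tail_cost(1, 1.0, [-1.0], quad_arc, quad_terminal, additive_step))        # 1.5
```

The second call is the tail of the first one, so the first result equals the step-0 arc cost (3.0) plus the second result.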

The implication of Bellman's principle of optimality is that the last part of the overall optimal control trajectory, $\{u_j^*, u_{j+1}^*, \ldots, u_{N-1}^*\}$, is the optimal solution of the tail subproblem. The analytical proof, which is based on the principle of induction, can be found in [51].
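The principle can be checked numerically on a small problem by exhaustive search. First find the optimal sequence from $x_0$. Then apply its first $j$ controls and confirm that the remaining controls achieve the optimal cost of the tail subproblem from the state reached. The costs, dynamics and control set below are hypothetical. Brute force is used only because the problem is tiny. Costs rather than sequences are compared because optimal sequences need not be unique.

```python
from itertools import product

# Hypothetical toy problem: integer state, three control candidates per step
U = (-1, 0, 1)                 # admissible controls, identical at every step
demand = (1, -1, 2, 0)         # made-up per-step targets used by the arc cost
N = len(demand)                # final time step

def arc_cost(k, x, u):
    """L_k(x_k, u_k)."""
    return (u - demand[k]) ** 2 + abs(x)

def terminal_cost(x):
    """L_N(x_N)."""
    return 5 * abs(x)

def f_d(x, u):
    """x_{k+1} = f_d(x_k, u_k)."""
    return x + u

def tail_cost(j, x, controls):
    """Psi_{j,N} of Eq. (4-8) for the controls u_j, ..., u_{N-1}."""
    total = 0
    for k, u in enumerate(controls, start=j):
        total += arc_cost(k, x, u)
        x = f_d(x, u)
    return total + terminal_cost(x)

def brute_force(j, x_j):
    """Exhaustively search every control sequence of the tail subproblem."""
    return min(product(U, repeat=N - j), key=lambda seq: tail_cost(j, x_j, seq))

x0 = 0
u_star = brute_force(0, x0)

for j in range(N + 1):
    x_j = x0
    for u in u_star[:j]:               # state reached under the optimal sequence
        x_j = f_d(x_j, u)
    # The tail of u_star is optimal for the subproblem starting at (j, x_j)
    assert tail_cost(j, x_j, u_star[j:]) == tail_cost(j, x_j, brute_force(j, x_j))
print("tail optimality holds at every step for u* =", u_star)
```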

4.4.2 A recurrence relation of DP

In order to show how the principle of optimality can be used to determine the solution to an optimal control problem, a recurrence relation is derived here; it also illustrates how DP is implemented in practice. The complete derivation can be found in [39].

The starting point is to consider a discretized version of the state dynamics described in Eq. (4-1):

$$x_{k+1} = x_k + f_k(x_k, u_k) = f_d(x_k, u_k) \tag{4-9}$$

Based on the discretized performance index defined in Eq. (4-7), the cost of reaching the final state is:

$$\Psi_{N,N}(x_N) = L_N(x_N) \tag{4-10}$$

Let's now consider a one-stage process starting at the initial state $x_{N-1}$:

$$\Psi_{N-1,N}(x_{N-1}, u_{N-1}) = L_{N-1}(x_{N-1}, u_{N-1}) + \Psi_{N,N}(x_N) \tag{4-11}$$

In the previous expression, the cost of driving the system from state $x_{N-1}$ to $x_N$ depends only on the state and control decision at the initial time step of this one-stage process, since the final state can be expressed as a function of those variables through the state equation:

$$\Psi_{N-1,N}(x_{N-1}, u_{N-1}) = L_{N-1}(x_{N-1}, u_{N-1}) + \Psi_{N,N}\big(f_d(x_{N-1}, u_{N-1})\big) \tag{4-12}$$

Now it is possible to define the optimal cost as:

$$\Psi^*_{N-1,N}(x_{N-1}) = \min_{u_{N-1} \in U_{N-1}} \Big[ L_{N-1}(x_{N-1}, u_{N-1}) + \Psi_{N,N}\big(f_d(x_{N-1}, u_{N-1})\big) \Big] \tag{4-13}$$

Note that only control candidates within the set of admissible controls are used when the minimization is performed [39].
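Below is a minimal Python sketch of Eq. (4-13) for a single state value. It assumes a finite list of control candidates and a user-supplied admissibility test, both hypothetical. It returns both the minimum cost, which is the optimal cost $\Psi^*_{N-1,N}(x_{N-1})$, and the control that attains it. The toy data show how excluding an inadmissible candidate changes the result.

```python
import math

def one_stage(x, candidates, is_admissible, arc_cost, terminal_cost, f_d):
    """Return (Psi*_{N-1,N}(x), u*_{N-1}) for Eq. (4-13): the minimum cost
    together with the control that attains it (the minimizer)."""
    best_cost, best_u = math.inf, None
    for u in candidates:
        if not is_admissible(x, u):
            continue                          # u is not in U_{N-1}
        cost = arc_cost(x, u) + terminal_cost(f_d(x, u))
        if cost < best_cost:
            best_cost, best_u = cost, u
    return best_cost, best_u

# Hypothetical toy data: reach x = 3 at the final step
def toy_arc(x, u):
    return u ** 2

def toy_terminal(x):
    return 10 * (x - 3) ** 2

def toy_step(x, u):
    return x + u

def limited(x, u):          # admissible set |u| <= 2
    return abs(u) <= 2

def unlimited(x, u):        # every candidate admissible
    return True

print(one_stage(0, range(-3, 4), limited, toy_arc, toy_terminal, toy_step))    # (14, 2)
print(one_stage(0, range(-3, 4), unlimited, toy_arc, toy_terminal, toy_step))  # (9, 3)
```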

Next, the optimal cost of a two-stage process will be derived. The cost of transitioning from state 𝑥_𝑁−2 to 𝑥_𝑁 can be described as:

$$\Psi_{N-2,N}(x_{N-2}, u_{N-2}, u_{N-1}) = L_{N-2}(x_{N-2}, u_{N-2}) + L_{N-1}(x_{N-1}, u_{N-1}) + \Psi_{N,N}\big(f_d(x_{N-1}, u_{N-1})\big) \tag{4-14}$$

The last two terms on the right-hand side of Eq. (4-14) correspond to $\Psi_{N-1,N}(x_{N-1}, u_{N-1})$ from Eq. (4-12); hence:

$$\Psi_{N-2,N}(x_{N-2}, u_{N-2}, u_{N-1}) = L_{N-2}(x_{N-2}, u_{N-2}) + \Psi_{N-1,N}(x_{N-1}, u_{N-1}) \tag{4-15}$$

Applying the principle of optimality discussed in section 4.4.1 to this two-stage process implies that, for any initial state and control ($x_{N-2}$ and $u_{N-2}$), the remaining decision must be optimal with respect to the system state resulting from the application of the control action $u_{N-2}$. Moreover, the state equation allows this resulting state, $x_{N-1}$, to be expressed in terms of $x_{N-2}$ and $u_{N-2}$. Based on these considerations, Eq. (4-15) can be rewritten as:

$$\Psi^*_{N-2,N}(x_{N-2}) = \min_{u_{N-2} \in U_{N-2}} \Big[ L_{N-2}(x_{N-2}, u_{N-2}) + \Psi^*_{N-1,N}\big(f_d(x_{N-2}, u_{N-2})\big) \Big] \tag{4-16}$$
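The equivalence behind Eq. (4-16) can be checked on a toy problem. Minimizing over $u_{N-2}$ with the optimal one-stage cost nested inside gives the same result as minimizing Eq. (4-14) jointly over $u_{N-2}$ and $u_{N-1}$. In the Python sketch below $N = 2$, so the steps $N-2$ and $N-1$ become 0 and 1. The costs, dynamics and control range are hypothetical.

```python
from itertools import product

U = range(-2, 3)               # hypothetical admissible controls (same each step)

def L(k, x, u):                # hypothetical arc cost L_k(x_k, u_k)
    return u * u + x * x

def L_N(x):                    # hypothetical terminal cost L_N(x_N)
    return 3 * x * x

def f_d(x, u):                 # discretized state equation, Eq. (4-9)
    return x + u

def psi_star_1(x):
    """Optimal one-stage cost Psi*_{1,2}(x), Eq. (4-13) with N = 2."""
    return min(L(1, x, u) + L_N(f_d(x, u)) for u in U)

def psi_star_2(x):
    """Optimal two-stage cost Psi*_{0,2}(x) via the recurrence, Eq. (4-16)."""
    return min(L(0, x, u) + psi_star_1(f_d(x, u)) for u in U)

def brute_force_2(x):
    """Eq. (4-14) minimized jointly over both controls."""
    return min(L(0, x, u0) + L(1, f_d(x, u0), u1) + L_N(f_d(f_d(x, u0), u1))
               for u0, u1 in product(U, repeat=2))

for x in range(-4, 5):
    assert psi_star_2(x) == brute_force_2(x)
print("recurrence matches brute force for x in [-4, 4]")
```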

Finally, for a $j$-stage process, the minimum total cost of driving the system from a given state $x_{N-j}$ to the final state $x_N$, also called the optimal cost-to-go in the context of DP, can be expressed as:

$$\Psi^*_{N-j,N}(x_{N-j}) = \min_{u_{N-j} \in U_{N-j}} \Big[ L_{N-j}(x_{N-j}, u_{N-j}) + \Psi^*_{N-j+1,N}\big(f_d(x_{N-j}, u_{N-j})\big) \Big] \tag{4-17}$$

Eq. (4-17) is the recurrence relation yielded by the DP approach. It shows how, by proceeding backwards from the final step, the optimal solution to the control problem at hand can be found using model-based techniques.
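The backward recurrence can be sketched in Python as follows. A forward pass then starts from the known initial state and reads off the stored minimizing controls. This is a hypothetical toy problem, not an HEV model. States and controls are integers, so every transition lands exactly on a grid point, and transitions that leave the grid are simply discarded. The per-step targets in `demand` stand in for the driving cycle that must be known a priori.

```python
import math

# Hypothetical toy problem (not an HEV model)
X = range(0, 11)                  # state grid X_k, identical at every step
U = range(-2, 3)                  # control candidates U_k
demand = [3, -1, 2, 2, -2, 0]     # made-up per-step targets, known a priori
N = len(demand)                   # final time step

def arc_cost(k, x, u):
    """L_k(x_k, u_k): penalize deviation of u from the step-k target."""
    return (u - demand[k]) ** 2

def terminal_cost(x):
    """L_N(x_N): penalize ending away from x = 5."""
    return 10 * (x - 5) ** 2

def f_d(x, u):
    """Discretized state equation x_{k+1} = f_d(x_k, u_k)."""
    return x + u

def backward_sweep():
    """Tabulate Psi*_{k,N}(x) and the minimizing control for every grid
    state, proceeding backwards from k = N (Eqs. 4-10 and 4-17)."""
    cost_to_go = [{} for _ in range(N + 1)]
    policy = [{} for _ in range(N)]
    cost_to_go[N] = {x: terminal_cost(x) for x in X}
    for k in range(N - 1, -1, -1):
        for x in X:
            best, best_u = math.inf, None
            for u in U:
                x_next = f_d(x, u)
                if x_next not in cost_to_go[k + 1]:   # leaves the state grid
                    continue
                c = arc_cost(k, x, u) + cost_to_go[k + 1][x_next]
                if c < best:
                    best, best_u = c, u
            cost_to_go[k][x] = best
            policy[k][x] = best_u
    return cost_to_go, policy

def forward_pass(x0, policy):
    """Apply the stored policy from x0 to recover the optimal controls."""
    x, controls = x0, []
    for k in range(N):
        u = policy[k][x]
        controls.append(u)
        x = f_d(x, u)
    return controls

cost_to_go, policy = backward_sweep()
u_opt = forward_pass(5, policy)
print(u_opt, cost_to_go[0][5])
```

The first control `u_opt[0]` is only available after the whole table for $k = N-1, \ldots, 0$ has been filled. This is the non-causality discussed at the start of this section.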

One final consideration is that applying a given control may drive the system into a state that does not exactly correspond to one of the discretized values of the state vector $X_k$. In that case, the cost-to-go at this off-grid state, which is needed to evaluate the recurrence relation of Eq. (4-17) at each state grid point, is obtained by interpolation [1].

Moreover, since only the range of admissible states is considered, the states resulting from the optimal control trajectory will never exceed the prescribed boundaries [34].
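Below is a minimal Python sketch of the interpolation step. It assumes piecewise-linear interpolation on a sorted one-dimensional state grid, and the scheme used in [1] may differ. States outside the grid bounds are given an infinite cost-to-go. This is one way of ensuring that the minimization in Eq. (4-17) never selects a control that drives the system outside the admissible range. The grid and the tabulated values are made up.

```python
import bisect
import math

def interp_cost_to_go(x, grid, values):
    """Linearly interpolate a tabulated cost-to-go at a (possibly off-grid) state x.

    grid:   sorted state grid values at step k+1
    values: optimal cost-to-go Psi*_{k+1,N} at each grid point
    States outside [grid[0], grid[-1]] are inadmissible and get infinite cost.
    """
    if x < grid[0] or x > grid[-1]:
        return math.inf
    i = bisect.bisect_right(grid, x) - 1
    if i == len(grid) - 1:            # x lies exactly on the upper bound
        return values[-1]
    w = (x - grid[i]) / (grid[i + 1] - grid[i])
    return (1 - w) * values[i] + w * values[i + 1]

# Hypothetical grid and tabulated cost-to-go values
grid = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [8.0, 3.0, 1.0, 2.0, 6.0]
print(interp_cost_to_go(1.5, grid, values))   # 2.0
print(interp_cost_to_go(4.5, grid, values))   # inf (outside the admissible range)
```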

In document Integration of dual-clutch transmissions in hybrid electric vehicle powertrains (Page 144-149)