Problem in Hybrid Electric Vehicles
4.4 Dynamic programming
Dynamic programming (DP) [48], [49], [50] is a numerical method for solving problems in which a sequence of interrelated decisions has to be made [51]. This approach yields a functional equation whose solution can be found using a digital computer [39].
DP is the only optimal control method capable of providing the optimal solution to problems of any complexity level, within the accuracy limitations imposed by the discretization of the problem variables [1], e.g., controls and states. The resolution of the vectors of possible solution candidates $U_k$ and states $X_k$ comes from a compromise between the computational burden of the calculations and the accuracy of the results [34].
When implementing DP, the optimal solution is found by proceeding backwards: starting from the final step, the sequence of controls that minimizes the sum of the costs from the current state to the end of the optimization horizon is found at each step [39].
Note that this implies that, in order to select the first control action, the backward solution of the entire problem must be found [34]. Therefore, in the context of HEVs, the entire driving cycle must be known a priori, which makes DP a non-causal EMS.
Despite not being implementable in real time, DP yields the best available approximation of the optimal control policy for a given HEV, which makes it possible to determine the maximum capabilities of the vehicle. Hence, the results obtained can be useful for:
• Optimizing the design of HEVs [98], [99].
• Designing rule-based EMSs [100], [57], [101].
• Generating benchmark solutions for real-time implementable EMSs [35], [102], [31], [52], [53].
As stated before, one of the main practical limitations to the implementation of DP is the computational burden involved, which increases linearly with the final time and exponentially with the dimension of the state vector. This is referred to in the literature as the curse of dimensionality [39].
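The scaling described above can be illustrated with a rough count of arc-cost evaluations for a grid-based DP. This is only a sketch: the function name `dp_evaluations` and all grid sizes are hypothetical values chosen for illustration, not figures taken from this work.

```python
def dp_evaluations(n_steps, n_state_points, n_states, n_control_points, n_controls=1):
    """Rough count of arc-cost evaluations for a grid-based DP (illustrative only).

    Every time step visits every node of the state grid and tries every
    node of the control grid.
    """
    state_nodes = n_state_points ** n_states        # exponential in the state dimension
    control_nodes = n_control_points ** n_controls  # exponential in the control dimension
    return n_steps * state_nodes * control_nodes    # linear in the number of time steps


# Hypothetical grid: 1000 time steps, 100 points per state, 20 points per control.
for n_states in (1, 2, 3):
    print(n_states, dp_evaluations(1000, 100, n_states, 20))
```

Doubling the number of time steps doubles the count, whereas each additional state variable multiplies it by the number of grid points per state.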
In chapter 5, DP will be used to find the globally optimal solution to the energy management problem of the HEV powertrain architecture described in chapter 2. Therefore, a case study will not be presented here. The reader is referred to [1] for an introductory example regarding the use of this technique for HEV control.
The theoretical aspects discussed in the next paragraphs are meant to set the basis for understanding the work presented in the following chapter.
4.4.1 The principle of optimality
The method of DP is based on Bellman's principle of optimality [48]. According to this intuitively appealing concept:
An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.
This implies that from any step on a discretized optimal trajectory, the remaining trajectory is optimal for the corresponding problem initiated at that step (and corresponding system states) [1], [103].
This principle is described in mathematical terms in the next few lines.
Consider a discretized performance index for the optimal control problem starting at the initial state $x_0$:
$$\Psi_{0,N} = L_N(x_N) + \sum_{k=1}^{N-1} L_k(x_k, u_k) \tag{4-7}$$
where $k$ indicates the time step and $N$ indicates the final time step.
The corresponding optimal control trajectory is $u^* = \{u_1^*, u_2^*, \dots, u_{N-1}^*\}$. Note that the instantaneous cost function $L_k(x_k, u_k)$, called arc cost in the context of DP, is equivalent to the integrand of the continuous formulation of the performance index presented in Eq. (4-1), while $L_N(x_N)$ is the terminal cost, which depends on the final state.
Let us also consider the performance index for the tail subproblem starting at time step $i$, i.e., the last part of the overall problem:
$$\Psi_{i,N} = L_N(x_N) + \sum_{k=i}^{N-1} L_k(x_k, u_k) \tag{4-8}$$
The implication of Bellman's principle of optimality is that the last part of the overall optimal control trajectory, $\{u_i^*, u_{i+1}^*, \dots, u_{N-1}^*\}$, is the optimal solution of the tail subproblem. The analytical proof, which is based on the principle of induction, can be found in [51].
4.4.2 A recurrence relation of DP
In order to show how the principle of optimality can be used to determine the solution to an optimal control problem, a recurrence relation is derived here. This makes it possible to illustrate the practical implementation of DP. The complete derivation can be found in [39].
The starting point is to consider a discretized version of the state dynamics described in Eq. (4-1):
$$x_{k+1} = x_k + f_k(x_k, u_k) = g_k(x_k, u_k) \tag{4-9}$$
Based on the discretized performance index defined in Eq. (4-7), the cost of reaching the final state is:
$$\Psi_{N,N}(x_N) = L_N(x_N) \tag{4-10}$$
Let us now consider a one-stage process starting at the initial state $x_{N-1}$:
$$\Psi_{N-1,N}(x_{N-1}, u_{N-1}) = L_{N-1}(x_{N-1}, u_{N-1}) + \Psi_{N,N}(x_N) \tag{4-11}$$
In the previous expression, the cost of driving the system from state $x_{N-1}$ to $x_N$ depends only on the state and control decision at the initial time step of this one-stage process, since the final state can be expressed as a function of those variables through the state equation:
$$\Psi_{N-1,N}(x_{N-1}, u_{N-1}) = L_{N-1}(x_{N-1}, u_{N-1}) + \Psi_{N,N}\left(g_k(x_{N-1}, u_{N-1})\right) \tag{4-12}$$
Now it is possible to define the optimal cost as:
$$\Psi^*_{N-1,N}(x_{N-1}) = \min_{u_{N-1} \in U_{N-1}} \left( L_{N-1}(x_{N-1}, u_{N-1}) + \Psi_{N,N}\left(g_k(x_{N-1}, u_{N-1})\right) \right) \tag{4-13}$$
Note that only control candidates within the set of admissible controls are used when the minimization is performed [39].
Next, the optimal cost of a two-stage process will be derived. The cost of transitioning from state $x_{N-2}$ to $x_N$ can be described as:
$$\Psi_{N-2,N}(x_{N-2}, u_{N-2}, u_{N-1}) = L_{N-2}(x_{N-2}, u_{N-2}) + L_{N-1}(x_{N-1}, u_{N-1}) + \Psi_{N,N}\left(g_k(x_{N-1}, u_{N-1})\right) \tag{4-14}$$
It can be appreciated that the last two terms on the right side of Eq. (4-14) correspond to $\Psi_{N-1,N}(x_{N-1}, u_{N-1})$ from Eq. (4-12), hence:
$$\Psi_{N-2,N}(x_{N-2}, u_{N-2}, u_{N-1}) = L_{N-2}(x_{N-2}, u_{N-2}) + \Psi_{N-1,N}(x_{N-1}, u_{N-1}) \tag{4-15}$$
Applying the principle of optimality discussed in section 4.4.1 to this two-stage process implies that, for any initial state and control ($x_{N-2}$ and $u_{N-2}$), the remaining decision must be optimal with respect to the system state resulting from the application of the control action $u_{N-2}$. Moreover, the state equation makes it possible to express this resulting state $x_{N-1}$ in terms of $x_{N-2}$ and $u_{N-2}$. Based on these considerations, it is possible to rewrite Eq. (4-15) as:
$$\Psi^*_{N-2,N}(x_{N-2}) = \min_{u_{N-2} \in U_{N-2}} \left( L_{N-2}(x_{N-2}, u_{N-2}) + \Psi^*_{N-1,N}\left(g_k(x_{N-2}, u_{N-2})\right) \right) \tag{4-16}$$
Finally, for a $k$-stage process, the total cost to drive the system from a certain state $x_{N-k}$ to the final state $x_N$, also called cost-to-go in the context of DP, can be expressed as:
$$\Psi^*_{N-k,N}(x_{N-k}) = \min_{u_{N-k} \in U_{N-k}} \left( L_{N-k}(x_{N-k}, u_{N-k}) + \Psi^*_{N-(k-1),N}\left(g_k(x_{N-k}, u_{N-k})\right) \right) \tag{4-17}$$
Eq. (4-17) is the recurrence relation yielded by the DP approach. This expression shows how, by proceeding backwards (starting at the final step), it is possible to find the optimal solution to the control problem at hand using model-based techniques.
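A minimal sketch of how the backward recursion of Eq. (4-17) could be coded is given below. It is not the implementation used in this work: the integer state grid, the control set, the cost functions and the names `backward_dp` and `rollout` are all assumptions made for illustration, and the toy state equation always lands on a grid point. Zero-based indexing is used for the time steps.

```python
import math

STATES = range(-5, 6)   # hypothetical admissible state grid
CONTROLS = (-1, 0, 1)   # hypothetical admissible control values
N_STEPS = 5             # number of control decisions


def g(x, u):
    """Hypothetical discrete state equation."""
    return x + u


def arc_cost(x, u):
    return x**2 + u**2


def terminal_cost(x):
    return 10 * x**2


def backward_dp():
    """Fill the cost-to-go and policy tables from the final step backwards."""
    cost_to_go = [dict() for _ in range(N_STEPS + 1)]
    policy = [dict() for _ in range(N_STEPS)]
    for x in STATES:
        cost_to_go[N_STEPS][x] = terminal_cost(x)      # counterpart of Eq. (4-10)
    for k in range(N_STEPS - 1, -1, -1):
        for x in STATES:
            best_cost, best_u = math.inf, None
            for u in CONTROLS:
                x_next = g(x, u)
                if x_next not in cost_to_go[k + 1]:
                    continue                           # leaves the admissible grid
                cost = arc_cost(x, u) + cost_to_go[k + 1][x_next]
                if cost < best_cost:
                    best_cost, best_u = cost, u
            cost_to_go[k][x] = best_cost
            policy[k][x] = best_u
    return cost_to_go, policy


def rollout(x0, policy):
    """Apply the stored policy forwards from x0 to recover the trajectory."""
    x, states, controls = x0, [x0], []
    for k in range(N_STEPS):
        u = policy[k][x]
        controls.append(u)
        x = g(x, u)
        states.append(x)
    return states, controls


cost_to_go, policy = backward_dp()
states, controls = rollout(3, policy)
print(states, controls, cost_to_go[0][3])
```

The backward pass stores one optimal control per time step and grid state. The optimal trajectory from a given initial state is recovered afterwards with a single forward pass.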
One final consideration to be made here is that the application of a given control may drive the system into a state which does not exactly correspond to one of the discretized values of the state vector $X_k$. If this happens, the cost-to-go at the resulting state, which is needed to evaluate the recurrence relation of Eq. (4-17) at each state grid value, is computed through interpolation [1].
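The interpolation step can be sketched as follows, using linear interpolation via `numpy.interp`. This is only an illustration under assumed values: the state grid, the control grid, the cost functions and the target state are hypothetical, and the interpolation scheme is one possible choice rather than the one prescribed by [1].

```python
import numpy as np

x_grid = np.linspace(0.0, 1.0, 11)      # hypothetical discretized state values
u_grid = np.array([-0.15, 0.0, 0.15])   # hypothetical controls (steps fall off the grid)
n_steps = 4
x_target = 0.5


def g(x, u):
    """Hypothetical discrete state equation."""
    return x + u


def arc_cost(x, u):
    return u**2 * np.ones_like(x)


def terminal_cost(x):
    return (x - x_target) ** 2


cost_to_go = terminal_cost(x_grid)      # cost-to-go at the final step
policy = []
for k in range(n_steps - 1, -1, -1):
    new_cost = np.full(x_grid.shape, np.inf)
    best_u = np.zeros(x_grid.shape)
    for u in u_grid:
        x_next = g(x_grid, u)
        # Only successor states inside the admissible range are accepted.
        admissible = (x_next >= x_grid[0]) & (x_next <= x_grid[-1])
        # Cost-to-go at the successor, linearly interpolated between grid points.
        future = np.interp(x_next, x_grid, cost_to_go)
        candidate = np.where(admissible, arc_cost(x_grid, u) + future, np.inf)
        better = candidate < new_cost
        new_cost = np.where(better, candidate, new_cost)
        best_u = np.where(better, u, best_u)
    cost_to_go = new_cost
    policy.insert(0, best_u)

print(np.round(cost_to_go, 4))
```

With a grid spacing of 0.1 and control steps of 0.15, most successor states fall between two grid points, so the interpolated cost-to-go is only an approximation whose quality depends on the grid resolution.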
Moreover, since only the range of admissible states is considered, the states resulting from the optimal control trajectory will never exceed the prescribed boundaries [34].