Stochastic Dynamic Programming (SDP)

Literature Review

2.3 EMS Techniques

2.3.3 Stochastic Dynamic Programming (SDP)

Although DDP produces the optimal solution to a specific problem known a priori, there are a number of limitations when it comes to using it on board a vehicle for real-time control.

In order to overcome these issues, Lin et al. [47] propose the use of Stochastic Dynamic Programming (SDP). SDP is a similar technique to DDP, but rather than examining a single drive-cycle in the time domain, SDP works by finding the optimal solution based on the state of the vehicle, and the probability of transitioning to another state. Using SDP allows multiple drive-cycles, or even real-world data, to be examined concurrently by combining the data using a Markov Chain. SDP produces a causal solution which is entirely time-invariant and therefore suitable for direct implementation on board a vehicle. It must be noted, however, that SDP is often even more computationally expensive than DDP, and for an individual drive-cycle, the solution will almost certainly be less effective than the (optimal) DDP solution.

Lin et al. [47] propose that the driver power demand can be modelled as a finite sequence of discrete-time values, see Equation 2.3.6 [47].

$$P_{dem} \in \{P_{dem}^{1}, P_{dem}^{2}, \ldots, P_{dem}^{K}\} \qquad (2.3.6)$$

It is then assumed that the driver power demand in the next time step, $P_{dem}^{k+1}$, can be predicted using a two-dimensional Markov Chain depending on both the wheel speed, $\omega_w^k$, and the driver power demand, $P_{dem}^k$, in the current time step, see Equation 2.3.7 [47]. This allows the driver power demand to be modelled using a number of transitional probabilities.

These are obtained by Lin et al. by analysis of standard drive cycles; however, other authors such as Moura et al. [64] and Zhang et al. [63] have used real-world logged data such as that obtained from the National Household Travel Survey [101].
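As a rough illustration of how the transitional probabilities of Equation 2.3.7 might be obtained from drive-cycle data, the sketch below counts observed transitions between quantised power-demand levels, conditioned on the quantised wheel speed. The function name `estimate_transitions`, the bin indices and the uniform fallback for unvisited states are assumptions made for this example and are not taken from [47].

```python
import numpy as np

def estimate_transitions(p_idx, w_idx, n_p, n_w):
    """Estimate probs[i, l, j] = Pr(P_dem^{k+1} = j | P_dem^k = i, w^k = l)
    by counting transitions in a quantised drive-cycle."""
    counts = np.zeros((n_p, n_w, n_p))
    for k in range(len(p_idx) - 1):
        counts[p_idx[k], w_idx[k], p_idx[k + 1]] += 1.0
    totals = counts.sum(axis=2, keepdims=True)
    # Avoid dividing by zero for (i, l) pairs that never occur in the data.
    safe_totals = np.where(totals > 0, totals, 1.0)
    # Assumption: unvisited (i, l) pairs fall back to a uniform distribution.
    return np.where(totals > 0, counts / safe_totals, 1.0 / n_p)

# Hypothetical quantised drive-cycle: bin indices for power demand and wheel speed.
p_idx = np.array([0, 1, 1, 2, 1, 0, 0, 1, 2, 2])
w_idx = np.array([0, 0, 1, 1, 1, 0, 0, 0, 1, 1])
probs = estimate_transitions(p_idx, w_idx, n_p=3, n_w=2)
```

Each slice `probs[i, l, :]` sums to 1, so it can be used directly as one row of the state transition matrix. Combining several drive-cycles amounts to accumulating their counts before normalising.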

$$p_{il,j} = \Pr\{P_{dem}^{k+1} = P_{dem}^{j} \mid P_{dem}^{k} = P_{dem}^{i},\ \omega_w^{k} = \omega_w^{l}\}, \quad i, j = 1, 2, \ldots, N_P, \quad l = 1, 2, \ldots, N_\omega \qquad (2.3.7)$$

This stochastic model of the drive-cycle allows the generation of a state transition matrix, which represents the probability of the vehicle transitioning to each future state based on its current state. The states that follow can then be calculated from the probabilities for that state, and so on. The cost of each of these transitions can also be calculated, and dynamic programming is then used to find the optimal control action to perform in each state in order to minimise the cost function, considering the likelihood of all future vehicle state transitions and their associated costs. In order to produce a meaningful result, the solution must converge. This can be achieved by:

1. Discounting each future state in an infinite horizon problem [30, 31, 47, 51, 96].

2. Considering only a finite number of states by estimating the typical journey length [63].

3. Including an absorbing terminal state with no on-going cost accumulation [26, 52, 53, 64].

2.3.3.1 Infinite Horizon Markov Decision Process (MDP)

The earliest, and by far the most common, way of employing SDP [30, 31, 47, 51, 96] is to define an infinite horizon problem. The objective is to find the optimal control policy, u = π^∗(S), so as to minimise the total expected cost, Jπ(S₀), over an infinite horizon, see

In this equation, Γ is the instantaneous cost incurred and λ is a discount factor between 0 and 1 that allows for the infinite horizon problem to converge as the time step, t, increases.
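For reference, a discounted infinite horizon cost of this kind is commonly written in the form below. This is a sketch in the standard notation using the symbols defined above; the arguments of Γ and the horizon symbol N are assumptions and may differ from the notation used in [47].

```latex
J_\pi(S_0) = \lim_{N \to \infty} E\left\{ \sum_{t=0}^{N-1} \lambda^{t}\, \Gamma\big(S_t, \pi(S_t)\big) \right\}
```

Because 0 < λ < 1, the weight λ^t applied to costs far in the future shrinks geometrically, which is what keeps the sum finite when Γ is bounded.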

It can be seen in the equation that the cost, Jπ(S₀), and hence the resultant control policy, π(S₀), depends only on the initial state of the vehicle, S₀, and is completely independent of any other variables, including time. This means the solution is causal and time-invariant, and therefore it is straightforward to implement the solution on board the vehicle.
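The way a discounted infinite horizon problem yields a single, time-invariant policy can be sketched with value iteration on a small tabular MDP. The two-state model, its costs and the function name `value_iteration` are hypothetical and chosen only so that the result is easy to check by hand; they do not represent the vehicle model in [47].

```python
import numpy as np

def value_iteration(P, cost, discount, tol=1e-9, max_iter=10_000):
    """Discounted infinite-horizon value iteration on a tabular MDP.

    P[a, s, t] is the probability of moving from state s to state t under
    action a; cost[s, a] is the instantaneous cost. Returns (J, policy).
    """
    J = np.zeros(cost.shape[0])
    for _ in range(max_iter):
        # Q[s, a] = instantaneous cost + discounted expected cost-to-go.
        Q = cost + discount * np.einsum("ast,t->sa", P, J)
        J_new = Q.min(axis=1)
        converged = np.max(np.abs(J_new - J)) < tol
        J = J_new
        if converged:
            break
    Q = cost + discount * np.einsum("ast,t->sa", P, J)
    return J, Q.argmin(axis=1)

# Hypothetical toy model: action 0 keeps the current state, action 1 switches it.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
])
cost = np.array([[1.0, 2.0],   # state 0: staying costs 1, switching costs 2
                 [0.0, 0.5]])  # state 1: staying is free, switching costs 0.5
J, policy = value_iteration(P, cost, discount=0.9)
```

Here the policy is a single array indexed by state (switch out of state 0, stay in state 1), with no dependence on the time step, which is the property that makes the result suitable for a lookup table in a controller.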

Lin et al. [47] test their SDP-derived EMS over a number of standard and random drive-cycles and find that this approach offers a more robust power management strategy that outperforms previous work [45, 46] using an optimised rule-based strategy based on the DDP solution.

This technique is also used by Schell et al. [30] who describe the design of the Daimler-Chrysler Town and Country “Natrium” FCHEV and the development of its control strategy.

A traditional rule-based strategy using battery SoC management is used as a baseline for the SDP algorithm. Simulation results show a possible 15 km (2-3%) increase in range using the SDP controller. In 2006, Lin et al. [31] describe the use of their SDP algorithm to optimise the fuel consumption of an FCHEV. Following on from the work in [30, 47], the SDP algorithm is shown to improve the fuel consumption of a medium-sized Sport Utility Vehicle (SUV) on a range of different drive-cycles. This SDP result is also shown to reduce fuel cell voltage fluctuation, which may increase the reliability of the fuel cell stack.

2.3.3.2 Finite Horizon MDP using Commuting Time Estimation

One of the downsides to the infinite horizon algorithm is the difficulty of choosing an effective discount factor which is representative of real-world driving. A small discount factor tends to optimise more effectively for shorter journeys than for longer ones and will tend to over-penalise SoC deviation mid-cycle if the solution is required to be charge-sustaining [52]. In order to overcome this issue, Zhang et al. [63] suggest including a “Commuting Time Distribution” in the calculation. Using historic data concerning the driver’s previous total journey times, the problem can be solved as a finite horizon MDP, negating the requirement for a discount factor. Zhang et al. [63] use simulation to show that this technique is able to produce an 11.6% improvement in fuel economy over the rule-based controller used on board the Toyota Prius.

This technique effectively sets the drive-cycle length to optimise over, but the solution can be subject to a similar downfall to an infinite horizon solution with too small a discount factor. This is because it may not consider what would be the optimal solution if the drive-cycle were to carry on for longer than expected. Using a finite horizon solver may also tend to promote Charge-Depleting (CD) behaviour if the horizon is not long enough, which is undesirable if a Charge-Sustaining (CS) strategy is required.
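The sensitivity to horizon length described above can be sketched with undiscounted backward induction on a small tabular MDP. The two-state model, its costs, the zero terminal cost and the function name `finite_horizon_policy` are assumptions chosen for illustration; this is not the formulation used in [63].

```python
import numpy as np

def finite_horizon_policy(P, cost, horizon):
    """Undiscounted backward induction over a fixed number of steps.

    Returns (J, policies), where J is the cost-to-go at the first step and
    policies[t] is the decision rule to apply at time step t.
    """
    J = np.zeros(cost.shape[0])  # assumption: zero cost after the final step
    policies = []
    for _ in range(horizon):
        Q = cost + np.einsum("ast,t->sa", P, J)
        policies.append(Q.argmin(axis=1))
        J = Q.min(axis=1)
    policies.reverse()  # built from the last step backwards, so reorder
    return J, policies

# Hypothetical toy model: action 0 keeps the current state, action 1 switches it.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
])
cost = np.array([[1.0, 2.0],   # state 0: staying costs 1, switching costs 2
                 [0.0, 0.5]])  # state 1: staying is free, switching costs 0.5

J_short, pol_short = finite_horizon_policy(P, cost, horizon=1)
J_long, pol_long = finite_horizon_policy(P, cost, horizon=5)
```

With a one-step horizon the solver prefers the cheap short-term action in state 0, whereas with a five-step horizon it pays the up-front cost of switching at the first step. The policy also now depends on the time step: the last entry of `pol_long` reverts to the short-sighted choice because no future cost remains to be saved.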

2.3.3.3 Terminal State MDP

A more advanced method to eliminate the discount factor is to include an absorbing “terminal state” with no on-going cost. Given an infinite horizon, the probability of being “absorbed” at some point by the terminal state becomes 1. Because the terminal state has no on-going cost, the solution will converge as this happens. This technique has been used by a number of authors, such as Tate et al. [52], Opila et al. [26, 53] and Moura et al. [64].

Moura et al. mention that this terminal state allows for more accurate representation of drive-cycle length, when compared to an infinite horizon, which is critically important for plug-in HEVs with a CD strategy.

The definition of the Markov chain is slightly complicated by the addition of the terminal state, S_e, see Equations 2.3.10 to 2.3.13 [64].

$$p_{il,j} = \Pr\{P_{dem}^{k+1} = P_{dem}^{j} \mid P_{dem}^{k} = P_{dem}^{i},\ \omega_w^{k} = \omega_w^{l}\}$$

It can be seen that, alongside the probabilities of transitions between states (Equation 2.3.10), there is also a chance of transitioning to the terminal state when the vehicle’s speed is 0 (Equation 2.3.11). The vehicle will then remain in this state indefinitely (Equation 2.3.12). The probability of a transition to the terminal state allows the cost function to converge as K goes to infinity. This means that a discount factor is no longer required when calculating the infinite horizon cost, Jπ(S₀), see Equation 2.3.14.
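The convergence argument can be sketched numerically by evaluating the undiscounted expected cost of a fixed policy on a small Markov chain with an absorbing terminal state. The three states, the transition probabilities and the costs below are hypothetical; in particular, the terminal state is only reachable from the stopped state, loosely mirroring the zero-speed condition described above.

```python
import numpy as np

# Hypothetical chain under a fixed policy: 0 = moving, 1 = stopped, 2 = terminal.
P = np.array([
    [0.8, 0.2, 0.0],  # moving: keep moving or come to a stop
    [0.5, 0.0, 0.5],  # stopped: pull away again or end the journey
    [0.0, 0.0, 1.0],  # terminal: absorbing, the chain stays here
])
cost = np.array([1.0, 0.2, 0.0])  # no on-going cost in the terminal state

# Undiscounted iteration J <- cost + P J, with no discount factor.
J = np.zeros(3)
for _ in range(10_000):
    J_new = cost + P @ J
    converged = np.max(np.abs(J_new - J)) < 1e-12
    J = J_new
    if converged:
        break

# Cross-check: solve (I - Q) J = c on the non-terminal states only.
Q = P[:2, :2]
J_exact = np.linalg.solve(np.eye(2) - Q, cost[:2])
```

The iteration settles at a finite expected cost (10.4 from the moving state and 5.4 from the stopped state) even though no discounting is applied, because every trajectory is eventually absorbed by the zero-cost terminal state.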

J_π(S₀) = lim
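An undiscounted cost of this kind is commonly written in the form below. This is a sketch in standard notation, reusing Γ for the instantaneous cost and K for the horizon as in the surrounding text; the arguments of Γ and the summation index k are assumptions and may differ from the notation of [64].

```latex
J_\pi(S_0) = \lim_{K \to \infty} E\left\{ \sum_{k=0}^{K-1} \Gamma\big(S_k, \pi(S_k)\big) \right\}
```

The sum remains finite because Γ is zero once the terminal state S_e has been reached, and S_e is reached with probability 1.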

As well as more accurately representing drive-cycle length, the terminal state SDP technique allows for costs based on the final state of the vehicle. For example, the cost function could be designed to be CS by penalising only the difference between the initial and final states. This means that the SoC may be allowed to fluctuate throughout the cycle in a comparable manner to an energy balance “rule-based” controller (see Equation 2.3.1).

2.3.3.4 SDP Summary

In conclusion, SDP overcomes the main disadvantages of DDP and produces a solution which can be directly used on board the vehicle. This is because the solution to the SDP problem is entirely causal and time-invariant. It is also guaranteed to be the optimal solution to the given problem and can account for a large quantity of training data without much increase in computational effort. However, SDP is very dependent on the quality of the training data and its accuracy with regard to the future use of the vehicle. It is highly computationally intensive, and therefore the optimisation must be performed offline, with only the solution, π^∗(S₀), stored in the real-time controller. This means that it will not take into account changes in the duty cycle of the vehicle. There is also limited research available as to the real-world implementation and effectiveness of SDP controllers outside of simulation. This is of concern because the simulation models of SDP-based solutions are often heavily simplified in order to reduce the computational burden.
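Storing only the offline solution in the real-time controller typically amounts to a table lookup. The sketch below assumes a policy tabulated over a battery SoC grid and a power-demand grid, with nearest-grid-point lookup; the grids, the table values and the function name `lookup_action` are placeholders invented for illustration and are not results from any of the cited works.

```python
import numpy as np

# Hypothetical grids on which the offline policy was computed.
soc_grid = np.linspace(0.4, 0.8, 5)            # battery state of charge
pdem_grid = np.array([0.0, 10.0, 20.0, 30.0])  # driver power demand, kW

# Placeholder policy table: fuel cell power request in kW for each
# (SoC, power demand) pair. These values are made up, not an SDP result.
policy_table = np.array([
    [10.0, 20.0, 30.0, 30.0],
    [5.0, 15.0, 25.0, 30.0],
    [0.0, 10.0, 20.0, 30.0],
    [0.0, 5.0, 15.0, 25.0],
    [0.0, 0.0, 10.0, 20.0],
])

def lookup_action(soc, pdem):
    """Return the stored action for the grid point nearest to (soc, pdem)."""
    i = int(np.abs(soc_grid - soc).argmin())
    j = int(np.abs(pdem_grid - pdem).argmin())
    return float(policy_table[i, j])

action = lookup_action(soc=0.52, pdem=17.0)
```

The online cost is two nearest-neighbour searches and one array read, which is why the computational burden of SDP sits almost entirely in the offline optimisation. Interpolating between grid points instead of taking the nearest one is a common refinement.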

In document Optimal energy management strategy for a fuel cell hybrid electric vehicle (Page 79-82)