2.9 Simulation Results of the Q-Learning Based Wide-Area Control Algorithm
2.9.4 Simulation with Nonlinear Detailed 30-Bus Model
In this subsection, similar simulations are performed on a detailed IEEE 30-bus model as presented in figure 2.19. The performance of the proposed wide-area controller is compared with the behavior of CPSSs (without using the proposed controller). In order to make the paper condensed, only a three-phase self-clean fault is conducted between bus #4 and bus #6. The total simulation time is 15s and the fault happens at the 2s during the steady state. After 3 cycles, the fault is removed by opening the two circuit breakers located at both ends of the transmission line. The frequency responses of using the proposed wide- area controller and the CPSSs are shown in the figure 2.17(a) and figure 2.17(b), respectively. As can be observed, the proposed Q-learning controller can achieve the wide- area oscillation damping within 4s, while CPSSs have to take more than 8s to damp out the oscillations. Actually, the tuning process of CPSSs is full of difficulties, and the low- frequency oscillation is a common phenomenon when system operating condition changes. One major reason is that the CPSSs only use the local measurements. Therefore, WAMS has created a great opportunity to better manage WAPSs.
44 (a)
(b)
Figure 2.27 Frequency responses (𝑓) under system disturbances, (a). The proposed Q- learning networked controller; (b). CPSS.
Above simulation results demonstrate the effectiveness of the proposed wide-area damping control algorithm. Usually, RL method requires significant time to obtain the desired knowledge, and the learning from scratch is difficult as shown in our previous paper [34]. In this work, once the initial knowledge is obtained, the incremental online learning afterward will be converged much faster. So the learning speed is not a problem since the utility can afford the cost of capable infrastructure. Besides, even though the design is linear-based, the obtained performance is not compromised. Practical control applications prefer such simple yet effective algorithms.
45
SHIPBOARD POWER SYSTEM CONTROL
3.1 Problem Description
A shipboard power system (SPS) consists of various loads for propulsion, service, warfighting and so on. Loads, such as electromagnetic guns, electromagnetic launch systems, and electron lasers, consume a large amount of power within a short period of time are called pulsed power loads (PPLs) [35]. Due to economic considerations, the capacity of an SPS is usually decided to mainly satisfy the needs of non-PPLs that are online most of the time, instead of meeting the possible peak demand of all loads [36]. Besides capacity limit, it is almost impossible to increase the generation of conventional synchronous generators (SGs) instantaneously during the deployment of PPLs. Large voltage sag in medium voltage DC (MVDC) SPS and large frequency droop in medium voltage AC (MVAC) SPS will occur when a large PPL is directly connected to an SPS. Consequently, not only sensitive loads might be tripped offline, but also the system-wide instability of SPS might occur [37].
In order to accommodate the high transient energy demand of PPLs, energy storage system (ESS) such as supercapacitor or flywheel is usually installed [38]. Once fully charged, ESS is disconnected from the SPS to supply PPLs alone so that the negative strikes during PPLs deployment are isolated. Thus, the problem of PPLs accommodation can be converted to charging control of ESS. Many issues such as large voltage and frequency disturbances, and even system instability possibly result from the improperly designed charging controller of ESS [39]. In this paper, the supercapacitor-type of ESS is adopted due to its advantages such as simple maintenance, high reliability, and high power density [40]. The concept can be extended to other types of ESS, but the controller might need to be redesigned if its properties are significantly different.
So far, there are not many solutions for PPL accommodation. In [41] and [42], authors present a limit-based control and a profile-based control algorithm, respectively. Both algorithms have been tested through hardware experiments with an MVDC testbed. These two algorithms generate charging current references based on the offline analysis. Since interactions between generations and charging controls as well as possible changes to
46
operating conditions are not considered, there is still room for performance improvement. To realize fast and smooth charging, it is desirable to design closed-loop control algorithms to better coordinate the generation and charging controls under dynamic operating conditions.
In [50], a proportional integral (PI) based algorithm and a feedback linearization (FL) based algorithm are presented. These algorithms are easy to implement and their effectiveness is demonstrated through both software simulations and power hardware-in-the-loop (PHIL) simulations with detailed SPS model. However, the performance of PI control is limited, whereas the FL is sensitive to parameter accuracy. To further improve the system performance, an adaptive critic design (ACD) based control algorithm is developed in [51]. Through interactive learnings of two neural networks (NNs) for cost-to-go function and optimal control, respectively, the near-optimal control can be realized. However, the ACD based algorithm requires a separately determined control references and the known system dynamics. Moreover, an initial admissible control policy is needed to maintain the system stability during NN learning. Finally, the system disturbance is simply considered as being bounded and the optimality of control is not rigorously proved.
To overcome the problems with existing solutions, both problem formulation and control design should be improved. The accurate model of SPS is very complicated, and the outermost control designs do not need to use very accurate model if the inner control loop is reasonably fast and accurate. But the balance between effectiveness and simplicity of SPS model has to be sufficiently considered, and the effectiveness of simplified model must be well tested. In addition, there are not many solutions to handle the disturbance in the existing control designs [50-51]. In general, using a static bound in control model to counteract the impact of disturbance results in a large ultimate bound [51]. Meanwhile, it is also very difficult to dynamically approximate the disturbance especially when system models are partially unknown. In this paper, the frequency dynamic is augmented with a damping term to better approximate the real-world conditions. The damping coefficient is treated as a disturbance that fluctuates throughout time. Such formulation can not only significantly improve the validness of the model, but can also maintain system simplicity. A zero-sum game (ZSG) online-learning optimal NN control design is presented in this paper for the uncertain nonlinear system under disturbances. Two players of the game are
47
controller and disturbance, respectively. The controller aims at approximating the bound of disturbance based on system output, and then learning the optimal control under the disturbance through reinforcement learning. Due to its universal approximation capability [18], NNs are used to estimate the unknown system dynamics, approximate the solution of Hamilton-Jacobi-Isaac (HJI) equation (i.e. the optimal cost function for two-player ZSG), and learn the optimal control. Since the control objective is to optimize a predefined performance index, there is no need to separately provide the control references. Because ZSG is utilized to dynamically approximate the upper bound of disturbances, the impact of disturbances can be well addressed. The optimality of an MVAC SPS is guaranteed by using a time-based neuro dynamic programming (NDP) scheme. The usage of NN identifier avoids the requirement for knowledge of system model. Lastly, the controller does not require initial control being admissible and can be introduced at any initial state.