Surviving the Improbable: Towards Resilient Aircraft Control
1.2 History of Flight Control Systems, Source: [40]
1.2.7 Short Case Study of Other Fault Tolerant Systems, Source: [24]
Many fault-tolerant control systems have been produced and used successfully for other aerospace applications. The following is a brief survey of a few of these systems, with a discussion of the requirements they satisfy and the design approach that was used. The systems described were selected based on the availability of information and the personal experience of the author of ref. [24]. These are believed to be representative of the many excellent systems in use. Table 1.1 summarizes the systems surveyed and captures their primary attributes.
F-16 Analog Fly-by-Wire Flight Control [1]
Fig. 1.10 Belgian Air Component F-16AM FA-126, © Dirk Voortmans, via Airliners.net
Early production F-16A/B aircraft used an analog electronic FBW flight control system. From Block 25 F-16C/D onward, a digital system has been used. The F-16 is an inherently unstable aircraft that requires continuous stability augmentation. A loss of the flight control system can therefore be catastrophic. The system was designed to tolerate two failures. The analog FBW used a quad-redundant N-fold Modular Redundancy (NMR) computer architecture with approximate-consensus Middle Value Selection (MVS) electronics to determine which computers’ signals are transmitted to the flight control actuators. The hydraulic actuators include voting to reject possibly faulty outputs from any computer MVS or its servo amplifier. Both the computer MVS electronics and the hydraulic actuators make use of fault-down logic to disengage a signal that is known to be faulty. The analog computers use MVS on the sensor inputs to provide the same inputs to the redundant computers. Analog control integrators, the only state data involved, are held in agreement between the redundant channels by means of cross-connecting signals. The design uses neither design diversity (the hardware is identical) nor software.
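The combination of mid-value selection and fault-down logic described above can be illustrated with a minimal sketch. It is not the F-16 circuitry, which is analog hardware: the channel names, the miscompare threshold and the choice to average the two middle values when four channels are active are all assumptions made for illustration.

```python
from statistics import median


def mid_value_select(signals, active, threshold):
    """Pick the mid-value of the active channels and flag outliers.

    signals:   dict mapping channel name -> commanded value
    active:    set of channels that are still engaged
    threshold: hypothetical miscompare tolerance
    """
    values = [signals[ch] for ch in active]
    # statistics.median averages the two middle values when the count is even.
    selected = median(values)
    faulty = {ch for ch in active if abs(signals[ch] - selected) > threshold}
    return selected, faulty


signals = {"A": 1.00, "B": 1.02, "C": 0.99, "D": 5.00}  # channel D has failed
active = {"A", "B", "C", "D"}

selected, faulty = mid_value_select(signals, active, threshold=0.1)
active -= faulty  # fault down from four channels to three

signals["A"] = -3.00  # a second, later failure
selected_after, faulty_after = mid_value_select(signals, active, threshold=0.1)
active -= faulty_after  # fault down from three channels to two
```

In this sketch the selected value stays close to the healthy channels after each of the two successive failures, which is the behaviour a two-failure design goal calls for. A practical implementation would also require a miscompare to persist for some time before disengaging a channel.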
Fig. 1.11 AFTI/F-16, source: NASA Multimedia Gallery
F-16 Digital Fly-by-Wire Flight Control [10]
Experience with a triplex digital system on the AFTI/F-16 gave General Dynamics the confidence to abandon the proven analog FBW system of the earlier Fighting Falcon and adopt the quadruplex digital FBW system for the Block 25 and beyond F-16C/D. This choice resulted in capability and integration advantages with other aircraft systems, e.g. displays via 1553 buses.
The quad-redundant analog NMR computers used in earlier production F-16A/Bs were replaced by quad-redundant digital computers. These digital computers also include simple analog backups in each computer to protect against generic hardware or software design error failures. Digital data exchange is used between computers for various reasons, namely to mechanize computer output voting, to ensure identical inputs, to keep the computers synchronized, and to maintain consistent state data.
Table 1.1 Survey of typical in-service fault-tolerant systems, source: ref. [24]
Space Shuttle: [...] and software, 5th channel backup using same hardware but dissimilar software, identical inputs by data bus monitoring, computer outputs compared for crew annunciation only, computer selection by external voters (hydraulic voting actuators, pyro fire electronic discrete voting), exchange and vote of some state data
B-777 AIMS: Two separate units, one for pilot and one for copilot displays, each unit uses 3 sets of self-checking dual processors, Arinc-659 Safebus to distribute identical inputs, select output from a healthy pair, exchange state data, identical hardware and software in all processing pairs
IUS: [...] pairs must send same critical actuation signals
X-33: TMR, 3 identical COTS hardware and software channels, RMS provides same inputs by exchange and MVS, voting of outputs and some state data, dual actuation, transient fault recovery
X-38: NMR, 4 identical hardware and software channels, identical inputs by exchange and voting, voting of outputs and state data, transient fault recovery, any 2 FCCs can control single fault tolerant actuation
Pratt and Whitney PW2037 Electronic Engine Control [29]
Fig. 1.12 Pratt & Whitney PW2037, source: Pratt & Whitney
The PW2037 was the first production commercial jet engine to use a Full-Authority Digital Electronic Control (FADEC) system with no mechanical backup control. It was introduced on the Boeing 757 civil airliner and remains representative of state-of-the-art commercial engine controls. Because all commercial transport aircraft have at least two engines, loss of thrust from one engine is not catastrophic. An engine control malfunction leading to a potentially catastrophic engine overspeed is mitigated by mechanical overspeed protection. Because of this, electronic engine controls are capable of meeting FAA safety requirements using a dual standby system. In the worst-case scenario, an engine control failure not detected by BIT (Built-In Test) will trip the overspeed protection, resulting in the shutdown and loss of thrust from one engine only. This set-up does not rely on hardware design diversity either. The risk of a common design error affecting both channels of one engine, or all engines on the aircraft, is addressed through exhaustive testing.
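The dual standby logic with an independent overspeed trip can be sketched as follows. This is an illustrative model only, not the PW2037 control law: the class and function names, the 105 % trip threshold and the outcome when both channels fail their BIT are assumptions that do not come from ref. [29].

```python
from dataclasses import dataclass


@dataclass
class Channel:
    name: str
    bit_ok: bool = True  # latest Built-In Test result


def controlling_channel(primary, standby):
    """Dual standby: use the primary unless its BIT has failed."""
    if primary.bit_ok:
        return primary
    if standby.bit_ok:
        return standby
    return None


def engine_status(primary, standby, rotor_speed_pct, overspeed_trip_pct=105.0):
    # The mechanical overspeed protection acts independently of both channels,
    # so it also covers failures that BIT did not detect.
    if rotor_speed_pct > overspeed_trip_pct:
        return "shut down by overspeed protection"
    channel = controlling_channel(primary, standby)
    if channel is None:
        return "no healthy control channel"  # assumed outcome, not in the text
    return f"controlled by channel {channel.name}"


print(engine_status(Channel("A", bit_ok=False), Channel("B"), 95.0))
```

The point of the sketch is the ordering of the checks: the overspeed trip does not depend on either electronic channel reporting itself as faulty, which is why a simple dual standby arrangement can be sufficient for one engine.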
Boeing 777 Airplane Information Management Systems (AIMS) [18]
The B-777 AIMS is used to command all cockpit displays and to interact with the crew via keyboards to provide flight management functions. Total loss of cockpit displays, a system loss of function, is potentially hazardous, particularly in adverse weather, but is not by itself a catastrophic event. A malfunction resulting in erroneous display information to the crew is possibly a greater hazard. It is mitigated somewhat by requiring that the pilot and copilot displays are driven by different sources, allowing the crew to detect faulty display data by proper cross-checking.
In addition to requiring fault tolerance for safety, airline operators of transport aircraft desire systems that can be operated safely with known failures until repairs can be made without interruption to revenue-generating aircraft service. For this purpose, the so-called Minimum Equipment List (MEL) has been defined, which is specific for every aircraft and type of operation, and approved by the appropriate authority. The AIMS is required to fail operationally only after two failures and must provide very robust protection against malfunctions that would produce erroneous crew displays. AIMS uses a triple, self-checking pair architecture. The complete system actually consists of two separate triple self-checking units in separate cabinets, separately driving the pilot’s and copilot’s displays. This allows the flight crew to manually compare displays. The AIMS uses the same hardware and software in both systems and in all self-checking pairs, so they do not provide dissimilarity for protection against a generic software error. A unique type of backplane bus, the Arinc-659 ‘Safebus’, is used to mechanize switchover between the redundant self-checking pairs and to provide a robust method for transferring state data between the processor pairs. Switchover to backup occurs when the backup processor pair detects that the primary processor pair has failed to transmit its data on the Safebus.
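The switchover principle of a triple self-checking pair arrangement can be illustrated with a short sketch. It assumes, for illustration only, that a pair stays silent whenever its two lanes disagree and that the pairs are tried in a fixed priority order. The function names are hypothetical and the bus protocol itself is not modelled.

```python
def pair_output(lane_a, lane_b):
    """A self-checking pair transmits only when both lanes agree.

    Returning None models a pair that stays silent on the bus.
    """
    return lane_a if lane_a == lane_b else None


def select_display_data(pairs):
    """Return (index, value) of the first pair that transmits.

    pairs: list of (lane_a, lane_b) results, ordered primary first.
    """
    for index, (lane_a, lane_b) in enumerate(pairs):
        value = pair_output(lane_a, lane_b)
        if value is not None:
            return index, value
    return None, None  # every pair is silent: loss of function


# The primary pair miscompares, so the first backup pair takes over.
print(select_display_data([(250, 251), (250, 250), (250, 250)]))  # (1, 250)
```

A pair that disagrees internally produces no output rather than a wrong one, so in this model a fault leads to a switchover or, at worst, a loss of function, but not to erroneous display data.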
US Space Shuttle FBW Flight Control [25]
Together with the McDonnell Douglas F/A-18 Hornet, the Space Shuttle had one of the first digital FBW flight control systems, and it remains a representative example of today’s systems. The Space Shuttle is a very demanding control problem throughout an extensive flight envelope, requiring a single system that provides uninterrupted control of a space launch vehicle, control of an orbiting spacecraft, and both space and atmospheric flight control during the return to Earth. The shuttle uses a four-channel NMR approach, with a fifth computer used as a backup system.
Fig. 1.13 Space Shuttle, source: NASA Multimedia Gallery
The fifth computer uses no hardware design diversity compared to the other four, but is programmed with dissimilar software. The fifth channel can be engaged manually by the crew in case the primary system fails, but this has never been necessary during the hundred or so Shuttle flights to date. The Shuttle operates the four primary computers as a redundant set, providing them with identical input data by monitoring the same data buses and holding the computers in close synchronization. The computers are programmed with the same software and should produce the same outputs. No attempt is made by the computers to select the correct output. Instead, these redundant outputs are transmitted to external voting devices. These external voters include voting hydraulic actuators for the control surfaces and thrust vector control, as well as electronic discrete command voters that control pyrotechnic ignition of the Shuttle’s engines and the separation of the solid rockets and the external tank. The redundant computers do exchange and compare outputs in order to alert the crew if a computer is producing a different output from the others. The crew may then choose to remove power from a faulty computer to configure the system to operate following additional failures. In effect, this is a manual fault-down.
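The division of labour described above, external voting of commands plus output comparison for crew annunciation only, can be sketched as follows. The strict-majority rule, the computer names C1 to C4 and the use of None for a powered-down computer are assumptions made for illustration. The text does not state the actual voting rule of the discrete command voters.

```python
from collections import Counter


def vote_discrete(commands):
    """Strict-majority vote over the commands of all powered computers.

    commands: dict computer name -> command, or None if powered down.
    Returns the majority command, or None when there is no strict majority.
    """
    powered = [cmd for cmd in commands.values() if cmd is not None]
    if not powered:
        return None
    value, count = Counter(powered).most_common(1)[0]
    return value if count > len(powered) / 2 else None


def dissenting_computers(commands):
    """Computers to annunciate to the crew: powered but outvoted."""
    majority = vote_discrete(commands)
    if majority is None:
        return []
    return sorted(name for name, cmd in commands.items()
                  if cmd is not None and cmd != majority)


commands = {"C1": True, "C2": True, "C3": False, "C4": True}
print(vote_discrete(commands), dissenting_computers(commands))  # True ['C3']

# Manual fault-down: the crew removes power from C3; a later fault in C2
# is then still outvoted by the two remaining healthy computers.
commands = {"C1": True, "C2": False, "C3": None, "C4": True}
print(vote_discrete(commands), dissenting_computers(commands))  # True ['C2']
```

The second call shows why the manual fault-down matters in this model: with the known-bad computer removed, the remaining three can still outvote one further failure.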
Boeing Inertial Upper Stage (IUS) Guidance and Control System [12]
Fig. 1.14 Boeing Inertial Upper Stage (IUS), source: Boeing Multimedia Gallery
The IUS is an example of a typical high-value unmanned space launch vehicle guidance and control system. The IUS has been used to launch the spacecraft Ulysses, Galileo and Magellan into the right orbit for their interplanetary missions after they had been brought to space in the cargo bay of the Space Shuttle. Space launch vehicles must provide a high level of reliability to be economical and must not malfunction in a manner that endangers human safety or property. In the event of a malfunction, ground crews can monitor the vehicle and command its destruction, thanks to the incorporation of a vehicle self-destruct system and range safety systems.
The control system for the IUS uses four processors configured as a dual self-checking pair. The switchover from the primary processor pair to the backup pair will occur if there is disagreement between the processor pairs. A form of electronic voting is used for critical pyrotechnic signals, requiring that both processor pairs produce the same command to these actuators.
X-33 Reusable Launch Vehicle Control System [11]
Fig. 1.15 X-33 Reusable Launch Vehicle, source: NASA Multimedia Gallery
The X-33 program was a technology demonstrator for the next generation of single-stage-to-orbit reusable launch vehicles. This prototype was unmanned. Thus, a control system failure would have primarily economic consequences. A TMR (Triple Modular Redundancy) fault-tolerant computer with dual standby actuation was selected to guarantee a high probability of successfully completing a series of sub-orbital test flights. The system used commercial-off-the-shelf (COTS) computers with custom Redundancy Management System (RMS) hardware and software to form the TMR fault-tolerant computer. It was planned to expand from TMR to quad NMR and to increase the level of actuation redundancy for the manned, operational system, for which even higher safety requirements would be imposed; however, budget cuts and technical troubles led to the cancellation of these plans. The TMR computers used MVS to vote outputs, maintain identical inputs, and maintain consistent state data. Voting was selectively applied to some, but not to all data, to minimize the data exchange and voting required. The TMR computers were designed to fault down to a self-checking pair after one persistent failure. The system was designed to recover the use of a computer that had experienced a transient fault. The COTS computers and the software that runs on them are identical: no dissimilarity was used to protect from generic design errors.
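The distinction between a transient fault, after which a channel is recovered, and a persistent failure, after which the system faults down to a self-checking pair, can be sketched with a simple miscompare counter. This is not the X-33 RMS: the class name, the threshold and the three-frame persistence limit are hypothetical, and the sketch assumes that at most one channel is faulty at a time.

```python
from statistics import median


class TmrVoter:
    """Mid-value voter for three channels with a persistence counter."""

    def __init__(self, channels=("A", "B", "C"), threshold=0.05, persistence=3):
        self.active = set(channels)
        self.miscompares = {ch: 0 for ch in channels}
        self.threshold = threshold      # assumed miscompare tolerance
        self.persistence = persistence  # assumed frames before fault-down

    def vote(self, outputs):
        """Vote one frame; returns None if a remaining pair disagrees."""
        values = [outputs[ch] for ch in sorted(self.active)]
        if len(values) == 2:
            # Self-checking pair: a fault is detected but cannot be isolated.
            a, b = values
            return (a + b) / 2 if abs(a - b) <= self.threshold else None
        voted = median(values)
        for ch in sorted(self.active):
            if abs(outputs[ch] - voted) > self.threshold:
                self.miscompares[ch] += 1
                if self.miscompares[ch] >= self.persistence:
                    self.active.discard(ch)  # persistent failure: fault down
            else:
                self.miscompares[ch] = 0  # agreement again: transient recovered
        return voted


voter = TmrVoter()
voter.vote({"A": 1.0, "B": 1.0, "C": 9.0})  # single upset on channel C
voter.vote({"A": 1.0, "B": 1.0, "C": 1.0})  # C agrees again and is kept
for _ in range(3):                          # C now fails persistently
    voter.vote({"A": 1.0, "B": 1.0, "C": 9.0})
print(sorted(voter.active))                 # ['A', 'B']
```

Once only two channels remain, the sketch can still detect a disagreement but can no longer tell which channel is wrong, which is the defining property of a self-checking pair.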
X-38 Prototype Crew Return Vehicle (CRV) Control System [2]
Fig. 1.16 X-38 Prototype Crew Return Vehicle, source: NASA Multimedia Gallery
The X-38 program was an unmanned technology demonstrator for a re-entry vehicle that would be used for emergency return from the International Space Station. However, budget cuts led to the cancellation of this development program after a few unmanned demonstrator test flights. The demonstration system was required to operate following any two Flight Control Computer (FCC) failures and following any one non-computer failure. A four-channel NMR FCC with dual standby actuation was selected to meet these requirements. Sensors and actuators were connected to the FCCs such that any two operating FCCs could control the vehicle. The FCCs were COTS computers and were interconnected by special network element hardware and fault-tolerant system services software to form a Fault Tolerant Parallel Processor (FTPP). The FTPP was designed to provide resilience to Byzantine failures. A Byzantine fault is an arbitrary fault that occurs during the execution of an algorithm by a distributed system. It encompasses those faults that are commonly referred to as ‘crash failures’ and ‘send and omission failures’. When a Byzantine failure has occurred, the system may respond in any unpredictable way, unless it is designed to have Byzantine fault tolerance. These arbitrary failures may be loosely divided into three categories, namely a failure to take another step in the algorithm (crash failure), a failure to correctly execute a step of the algorithm, and arbitrary execution of a step other than the one indicated by the algorithm. The FTPP was also designed to discriminate between transient and permanent faults, allowing recovery of an FCC that had a transient fault. The COTS computers and the software that ran on them were identical; no dissimilarity was used to protect from generic design errors.
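To make the notion of Byzantine fault tolerance concrete, the sketch below implements the classic oral-messages exchange of Lamport, Shostak and Pease for one source and three receivers, which tolerates a single arbitrarily faulty node among four. It is a textbook algorithm shown for illustration, not the FTPP implementation. The node names, the relay callback and the default value are hypothetical.

```python
from collections import Counter

DEFAULT = 0  # hypothetical fallback used when no majority exists


def majority(values):
    """Return the strict-majority value, or DEFAULT if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else DEFAULT


def exchange_and_vote(source_sends, relay):
    """One relay round among three receivers of a single source.

    source_sends: dict receiver -> value that receiver got from the source
    relay:        function (sender, receiver, value) -> value forwarded,
                  which lets a faulty sender forward anything it likes
    Returns a dict receiver -> value that receiver decides on.
    """
    receivers = sorted(source_sends)
    decided = {}
    for r in receivers:
        seen = [source_sends[r]]
        seen += [relay(s, r, source_sends[s]) for s in receivers if s != r]
        decided[r] = majority(seen)
    return decided


def honest(sender, receiver, value):
    return value


# Case 1: a faulty source tells each receiver something different.
# All receivers see the same three values and fall back to the same default.
print(exchange_and_vote({"B": 1, "C": 2, "D": 3}, honest))

# Case 2: the source is healthy, but receiver D relays garbage.
def d_lies(sender, receiver, value):
    return 99 if sender == "D" else value

print(exchange_and_vote({"B": 5, "C": 5, "D": 5}, d_lies))
```

In both cases the healthy receivers end up with the same value, which is the essential property: a single node that behaves arbitrarily cannot make the healthy nodes disagree with each other.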