6. ICS Security Controls
6.8 Contingency Planning
Contingency plans are designed to maintain or restore business operations, including computer operations, possibly at an alternate location, in the event of emergencies, system failures, or disaster. The security controls that fall within the NIST SP 800-53 Contingency Planning (CP) family provide policies and procedures to implement a contingency plan by specifying roles and responsibilities and by assigning the personnel and activities associated with restoring the information system after a disruption or failure. Along with planning, controls also exist for contingency training, testing, and plan updates, and for backup information processing and storage sites.
Supplemental guidance for the CP controls can be found in the following documents:
NIST SP 800-12 provides guidance on security policies and procedures [37].
NIST SP 800-34 provides guidance on contingency planning [50].
NIST SP 800-100 provides guidance on information security governance and planning [25].
ICS Specific Recommendations and Guidance
Contingency plans should cover the full range of failures or problems that could be caused by cyber incidents. Contingency plans should include procedures for restoring systems from known valid backups, separating systems from all non-essential interferences and connections that could permit cyber security intrusions, and alternatives to achieve necessary interfaces and coordination. Employees should be trained and familiar with the contents of the contingency plans. Contingency plans should be periodically reviewed with employees responsible for restoration of the ICS, and tested to ensure that they continue to meet their objectives. Organizations also have business continuity plans and disaster recovery plans that are closely related to contingency plans. Because business continuity and disaster recovery plans are particularly important for ICS, they are described in more detail in the sections to follow.
6.8.1 Business Continuity Planning
Business continuity planning addresses the overall issue of maintaining or reestablishing production in the case of an interruption. These interruptions may take the form of a natural disaster (e.g., hurricane, tornado, earthquake, flood), an unintentional man-made event (e.g., accidental equipment damage, fire or explosion, operator error), an intentional man-made event (e.g., attack by bomb, firearm or vandalism, attacker or virus), or an equipment failure. From a potential outage perspective, this may involve typical time spans of days, weeks, or months to recover from a natural disaster, or minutes or hours to recover from a malware infection or a mechanical/electrical failure. Because there is often a separate discipline that deals with reliability and electrical/mechanical maintenance, some organizations choose to define business continuity in a way that excludes these sources of failure. Because business continuity also deals primarily with the long-term implications of production outages, some organizations also choose to place a minimum interruption limit on the risks to be considered. For the purposes of ICS cyber security, it is recommended that neither of these constraints be made. Long-term outages (disaster recovery) and short-term outages (operational recovery) should both be considered. Because some of these potential interruptions involve man-made events, it is also important to work collaboratively with the physical security organization to understand the relative risks of these events and the physical security countermeasures that are in place to prevent them. It is also important for the physical security organization to understand which areas of a production site house data acquisition and control systems that might have higher-level risks.
Before creating a business continuity plan (BCP) to deal with potential outages, it is important to specify the recovery objectives for the various systems and subsystems involved based on typical business needs. There are two distinct types of objectives: system recovery and data recovery. System recovery involves the recovery of communication links and processing capabilities, and it is usually specified in terms of a Recovery Time Objective (RTO). This is defined as the time required to recover the required communication links and processing capabilities. Data recovery involves the recovery of data describing production or product conditions in the past and is usually specified in terms of a Recovery Point Objective (RPO). This is defined as the longest period of time for which an absence of data can be tolerated.
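The two objectives can be written down per system and compared against measured values. The following Python fragment is a minimal sketch of that comparison, not a prescribed method. The system name, the objective values, and the helper names (RecoveryObjectives, meets_rto, meets_rpo) are hypothetical and chosen only for illustration. It also assumes that the worst-case data gap equals one full backup interval.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    """Recovery objectives for one system (all values are illustrative)."""
    system: str
    rto: timedelta  # time allowed to recover communication links and processing
    rpo: timedelta  # longest period for which an absence of data can be tolerated


def meets_rto(objectives: RecoveryObjectives, measured_restore_time: timedelta) -> bool:
    """True if a restore drill finished within the Recovery Time Objective."""
    return measured_restore_time <= objectives.rto


def meets_rpo(objectives: RecoveryObjectives, backup_interval: timedelta) -> bool:
    """True if the assumed worst-case data gap (one backup interval) fits the RPO."""
    return backup_interval <= objectives.rpo


# Hypothetical system and objectives, for illustration only.
historian = RecoveryObjectives(
    system="plant-historian",
    rto=timedelta(hours=4),
    rpo=timedelta(minutes=15),
)

# A drill that took 3 hours is compared against the 4 hour RTO.
print(meets_rto(historian, timedelta(hours=3)))
# Hourly backups are compared against the 15 minute RPO.
print(meets_rpo(historian, timedelta(hours=1)))
```

In this sketch a failed RPO check would suggest backing up more often or relaxing the objective. Which of the two is appropriate is a business decision.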
Once the recovery objectives are defined, a list of potential interruptions should be created and the recovery procedure developed and described. For most of the smaller scale interruptions, repair and replace activities based on a critical spares inventory will prove adequate to meet the recovery objectives. When this is not true, contingency plans need to be developed. Due to the potential cost and importance of these contingency plans, they should be reviewed with the managers responsible for business continuity planning to verify that they are justified. Once the recovery procedures are documented, a schedule should be developed to test part or all of the recovery procedures. Particular attention must be paid to the verification of backups of system configuration data and product or production data. Not only should these be tested when they are produced, but the procedures followed for their storage should also be reviewed periodically to verify that the backups are kept in environmental conditions that will not render them unusable and that they are kept in a secure location, so they can be quickly obtained by authorized individuals when needed.
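One way to support periodic verification of stored backups is to record a cryptographic digest when each backup is produced and to recompute it during later reviews. The Python sketch below illustrates this using the standard hashlib module. The function names (sha256_of, record_backup, verify_backup) are hypothetical, and the sketch assumes that the recorded digest is stored separately from the backup itself.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_backup(path: Path) -> str:
    """Compute the digest at the time the backup is produced.

    Assumption: the caller stores this value apart from the backup media.
    """
    return sha256_of(path)


def verify_backup(path: Path, expected_digest: str) -> bool:
    """Return True only if the file exists and still matches its recorded digest."""
    return path.is_file() and sha256_of(path) == expected_digest
```

A matching digest shows only that the stored file has not changed since it was produced. It does not show that the backup can be restored, so a check of this kind would complement, not replace, scheduled restore tests.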
6.8.2 Disaster Recovery Planning
ICS Specific Recommendations and Guidance
A disaster recovery plan (DRP) is essential to continued availability of the ICS. The DRP should include the following items:
Required response to events or conditions of varying duration and severity that would activate the recovery plan
Procedures for operating the ICS in manual mode with all external electronic connections severed until secure conditions can be restored
Roles and responsibilities of responders
Processes and procedures for the backup and secure storage of information
Complete and up-to-date logical network diagram
Personnel list for authorized physical and cyber access to the ICS
Communication procedure and list of personnel to contact in the case of an emergency including ICS vendors, network administrators, ICS support personnel, etc.
Current configuration information for all components
emergency. If possible, replacements for hard-to-obtain critical components should be kept in inventory.
The security plan should define a comprehensive backup and restore policy. In formulating this policy, the following should be considered:
The speed at which data or the system must be restored. This requirement may justify the need for a redundant system, spare offline computer, or valid file system backups.
The frequency at which critical data and configurations are changing. This will dictate the frequency and completeness of backups.
The safe onsite and offsite storage of full and incremental backups
The safe storage of installation media, license keys, and configuration information
Identification of individuals responsible for performing, testing, storing, and restoring backups
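When a policy combines full and incremental backups, restoring to a given point in time requires the most recent full backup plus the incrementals that follow it. The Python sketch below shows one way to select that set. The Backup type and the restore_chain function are hypothetical, and the sketch assumes that each incremental backup holds the changes since the previous backup of either kind.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Backup:
    taken_at: datetime
    kind: str  # "full" or "incremental"


def restore_chain(backups: List[Backup], target: datetime) -> List[Backup]:
    """Return the backups to apply, in order, to restore state as of `target`.

    The chain is the latest full backup taken at or before `target`, followed
    by every incremental taken after that full backup and at or before `target`.
    Raises ValueError if no full backup is available.
    """
    candidates = sorted(
        (b for b in backups if b.taken_at <= target),
        key=lambda b: b.taken_at,
    )
    full_indexes = [i for i, b in enumerate(candidates) if b.kind == "full"]
    if not full_indexes:
        raise ValueError("no full backup at or before the target time")
    start = full_indexes[-1]
    incrementals = [b for b in candidates[start + 1:] if b.kind == "incremental"]
    return [candidates[start]] + incrementals
```

For example, with weekly full backups and daily incrementals, a restore to the second day after a full backup would apply that full backup and then the incrementals taken up to the target time. A missing link in the chain makes later incrementals unusable, which is one reason the storage of every backup in the set matters.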