Secure Systems Design

In the design of secure systems, several key design features must be incorporated to address typical system vulnerabilities: security protocol design, password management design, access control, handling of distributed system issues, concurrency control, fault tolerance, and failure recovery. Appendix E describes security functions that are typically incorporated in secure systems; it is not meant to be an exhaustive list but rather to provide illustrative examples. The following sections discuss two significant design issues that have security implications but are not directly related to security functionality.

4.3.1 Timing and Concurrency Issues in Distributed Systems

As noted by Anderson, in large distributed systems (i.e., systems of systems), security problems do not scale up linearly, because an increase in scale can bring a disproportionate increase in complexity. A systems engineer may not have total control over, or even awareness of, all the systems that make up a distributed system. This is particularly true of concurrency, fault tolerance, and recovery, where problems are magnified in large distributed systems.

Controlling the concurrency of processes (i.e., two or more processes executing simultaneously) is a security issue because an attacker can intentionally exploit the system's concurrency problems to interfere with, or lock up, processes that run on behalf of other principals, causing a denial of service. Concurrency design issues may exist at any level of the system, from hardware to application. Examples of specific concurrency problems, and best practices for dealing with them, include the following:

- Processes Using Old Data (e.g., out-of-date credentials or cookies): Propagating security state changes is one way to address this problem.

- Conflicting Resource Updates: Locking to prevent inconsistent updates (resulting from two programs simultaneously updating the same resource) is one way to address this problem.

- Order of Update in Transaction-Oriented Systems and Databases: The order in which transactions arrive and are applied must be considered in the design of transaction-oriented systems.

- System Deadlock, in which concurrent processes or systems wait for each other to act (often one process is waiting for another to release resources): This is a complex issue, especially when lock hierarchies span multiple systems. Note, however, that four necessary conditions, known as the Coffman conditions (first identified by E.G. Coffman in 1971) [72], must all be present for a deadlock to occur: mutual exclusion, hold and wait, no preemption, and circular wait.

Software Security Assurance State-of-the-Art Report (SOAR)

Section 4 Secure Systems Engineering

- Nonconvergence in Transaction-Oriented Systems: Transaction-based systems rely on the ACID (atomicity, consistency, isolation, and durability) properties of transactions (e.g., the accounting books must balance). Convergence is the property that, once the volume of transactions subsides, the system reaches a consistent state. When nonconvergence is observed in practice, the system design must provide for recovery from failures.

- Inconsistent or Inaccurate Time Across the System: Clock synchronization protocols, such as the Network Time Protocol, or logical clocks, such as Lamport's, can be used to address this issue.
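Lamport's logical clocks, mentioned in the last item above, fit in a few lines. The following is a minimal Python illustration, not code from this report; the class and method names (`LamportClock`, `tick`, `send`, `receive`) are hypothetical. Each process increments its counter on every local event and, on receiving a message, advances past the sender's timestamp, so a causally later event always carries a larger timestamp. Logical clocks order events; they do not synchronize wall-clock time, which is the job of a protocol such as NTP.

```python
class LamportClock:
    """Hypothetical logical clock for one process, following Lamport's rules."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is a local event; its timestamp travels with the message."""
        return self.tick()

    def receive(self, msg_time: int) -> int:
        """Jump past the sender's timestamp, then count the receive event."""
        self.time = max(self.time, msg_time) + 1
        return self.time


# Two processes exchanging one message.
a, b = LamportClock(), LamportClock()
a.tick()               # local event on A: A = 1
t = a.send()           # A sends at time 2
b.tick()               # local event on B: B = 1
b.receive(t)           # B = max(1, 2) + 1 = 3
print(a.time, b.time)  # 2 3
```

A's send at time 2 is ordered before B's receive at time 3, even though the two processes never consult a shared physical clock.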

The above list is merely illustrative. A number of other concurrency issues can arise in software-intensive systems.
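Because a deadlock requires all four Coffman conditions, a design can rule it out by breaking any one of them. A common approach, shown here as a minimal Python sketch under illustrative assumptions, breaks circular wait: every lock gets a fixed rank in one global order, and locks are always acquired in ascending rank. `OrderedLock`, `acquire_in_order`, `release_all`, and the two-account `transfer` scenario are hypothetical names used only for this sketch.

```python
import threading


class OrderedLock:
    """A lock tagged with a fixed rank in one global lock order."""

    def __init__(self, rank: int) -> None:
        self.rank = rank
        self._lock = threading.Lock()

    def acquire(self) -> None:
        self._lock.acquire()

    def release(self) -> None:
        self._lock.release()


def acquire_in_order(*locks: OrderedLock) -> list[OrderedLock]:
    """Acquire locks in ascending rank, so no cycle of waiting threads can form."""
    ordered = sorted(locks, key=lambda lk: lk.rank)
    for lk in ordered:
        lk.acquire()
    return ordered


def release_all(ordered: list[OrderedLock]) -> None:
    """Release in the reverse order of acquisition."""
    for lk in reversed(ordered):
        lk.release()


account_a, account_b = OrderedLock(1), OrderedLock(2)
balances = {"a": 100, "b": 100}


def transfer(src: str, dst: str, amount: int) -> None:
    locks = {"a": account_a, "b": account_b}
    held = acquire_in_order(locks[src], locks[dst])
    try:
        balances[src] -= amount
        balances[dst] += amount
    finally:
        release_all(held)


# Opposite-direction transfers: both directions lock account_a first.
threads = [threading.Thread(target=transfer, args=("a", "b", 1)) for _ in range(50)]
threads += [threading.Thread(target=transfer, args=("b", "a", 1)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balances)  # {'a': 100, 'b': 100}
```

Without the ordering, a thread transferring from a to b and another transferring from b to a could each hold one lock while waiting for the other, satisfying circular wait. The sketch assumes the two accounts in a transfer are distinct, since `threading.Lock` is not reentrant.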

4.3.2 Fault Tolerance and Failure Recovery

In spite of all efforts to secure a system, failures may occur, whether because of physical disasters or because of security failures. Achieving system resilience through failure recovery and fault tolerance is an important part of a systems engineer's job, especially as it relates to recovery from malicious attacks. Fault tolerance and failure recovery also make denial of service attacks more difficult and thus less attractive.

As noted by B. Selic [73], dealing with faults involves four activities: error detection, damage confinement, error recovery, and fault treatment. Error detection determines that something in the system has failed. Damage confinement isolates the failure so that it does not spread. Error recovery removes the effects of the error by restoring the system to a valid state. Fault treatment identifies and removes the root cause of the fault.
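The four activities can be mapped onto a single component, as in the following minimal Python sketch. It is an illustration under assumed details, not Selic's method: the toy `Ledger` class, its invariant (total debits must equal total credits), and the `post` and `clear_fault` methods are all hypothetical. An invariant check performs error detection, a quarantine flag confines damage, a rollback to the last valid checkpoint performs error recovery, and fault reports are collected as input to fault treatment, which remains a human engineering activity.

```python
class Ledger:
    """Hypothetical component; a valid state satisfies debits == credits."""

    def __init__(self) -> None:
        self.debits = 0
        self.credits = 0
        self.quarantined = False
        self.fault_reports: list[str] = []
        self._checkpoint = (0, 0)  # last known valid state

    def _is_valid(self) -> bool:
        # Error detection: test the invariant that defines a valid state.
        return self.debits == self.credits

    def post(self, debit: int, credit: int) -> bool:
        if self.quarantined:
            return False  # Damage confinement: an isolated component accepts no work.
        self.debits += debit
        self.credits += credit
        if self._is_valid():
            self._checkpoint = (self.debits, self.credits)
            return True
        self.quarantined = True                        # damage confinement
        self.debits, self.credits = self._checkpoint   # error recovery (rollback)
        # Record evidence for fault treatment (root-cause analysis).
        self.fault_reports.append(f"unbalanced post: debit={debit}, credit={credit}")
        return False

    def clear_fault(self) -> None:
        """Call only after fault treatment has removed the root cause."""
        self.quarantined = False


ledger = Ledger()
print(ledger.post(50, 50))  # True
print(ledger.post(30, 20))  # False: error detected, state rolled back
print(ledger.debits, ledger.credits, ledger.quarantined)  # 50 50 True
print(ledger.post(10, 10))  # False: component is still confined
ledger.clear_fault()
print(ledger.post(10, 10))  # True
```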

The systems engineer needs to develop failure models for the types of attacks that can be anticipated. Resilience can then be achieved through fail-stop processors and through redundancy, which protect the integrity of the system's data and constrain failure rates.

A fail-stop processor automatically halts in response to any internal failure and before the effects of that failure become visible. [74]
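A fail-stop processor is an idealization; in practice its behavior is approximated, for example by comparing the outputs of redundant copies. The minimal Python sketch below assumes that technique (duplicate-and-compare) rather than taking it from [74]: each computation runs twice, the result is released only if the copies agree, and on any disagreement the unit halts permanently so the faulty result never becomes visible. `FailStopUnit`, `FailStopHalted`, and the simulated `faulty_square` are hypothetical names.

```python
from typing import Callable


class FailStopHalted(Exception):
    """Raised on any call once the unit has halted."""


class FailStopUnit:
    """Approximates fail-stop behavior with duplicate-and-compare."""

    def __init__(self) -> None:
        self.halted = False

    def run(self, primary: Callable[[int], int], replica: Callable[[int], int], x: int) -> int:
        if self.halted:
            # A halted unit stays halted and never produces output again.
            raise FailStopHalted("unit has halted")
        r1, r2 = primary(x), replica(x)
        if r1 != r2:
            # Internal failure detected: halt before either result is released.
            self.halted = True
            raise FailStopHalted("copies disagree; halting")
        return r1


def square(x: int) -> int:
    return x * x


def faulty_square(x: int) -> int:
    # Hypothetical injected fault: wrong answer for input 3 only.
    return x * x + (1 if x == 3 else 0)


unit = FailStopUnit()
print(unit.run(square, faulty_square, 2))  # 4
try:
    unit.run(square, faulty_square, 3)
except FailStopHalted as exc:
    print("halted:", exc)  # halted: copies disagree; halting
print(unit.halted)  # True
```

Duplication detects only faults that make the two copies disagree; a common-mode fault that corrupts both copies identically goes undetected.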

The systems engineer typically applies a combination of the following techniques to achieve redundancy at multiple levels:

- Redundancy at the hardware level, through multiple processors, mirrored disks, multiple server farms, or redundant arrays of independent disks (RAID).

- At the next level up, process redundancy allows software to be run simultaneously at multiple geographically distributed locations, with voting on the results. This can prevent attacks in which an attacker gains physical control of a machine, inserts unauthorized software, or alters data, because the output of a single compromised site is outvoted by the others.


- At the next level is systems backup to unalterable media at regular intervals. For transaction-based systems, transaction journaling can also be performed.

- At the application level, the fallback system is typically a less capable system that can be used if the main system is compromised or unavailable.

Note that while redundancy can improve the speed of recovery from a security incident, none of the techniques described above, by itself, provides protection against attack or malicious code insertion.
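The voting step in process redundancy can be illustrated with a minimal majority-vote function in Python. This is a sketch under assumptions, not a prescribed design: `majority_vote`, `NoMajority`, and the replica outputs are hypothetical. With 2f + 1 replicas that fail independently, a strict majority vote masks up to f faulty or compromised replicas; an attacker who controls a majority of the replicas defeats it.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


class NoMajority(Exception):
    """Raised when no single result is reported by a strict majority."""


def majority_vote(results: Sequence[Hashable]) -> Hashable:
    """Return the result reported by more than half of the replicas."""
    if not results:
        raise NoMajority("no replica results")
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        # A tie or mere plurality is not enough; refuse rather than guess.
        raise NoMajority(f"top result {value!r} has only {count}/{len(results)} votes")
    return value


# Hypothetical outputs from three replicas; the third site has been tampered with.
replica_outputs = ["balance=1200", "balance=1200", "balance=9999"]
print(majority_vote(replica_outputs))  # balance=1200
```

Raising an exception when no strict majority exists lets the caller fall back to another recovery path instead of silently accepting a result that an attacker may have influenced.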

