Applying These Incident Management Concepts and Processes

The basic principles put forth in this report are that incident management processes are dis-tributed in nature and should

• be enterprise driven

• have defined roles and responsibilities to ensure accountability

• have defined interfaces and communication channels with supporting policies and proce-dures for coordination across processes and process actors

• be integrated into other business and security management processes

The incident management processes described in this report can be used as a framework to help an organization meet the principles listed above.

This framework requires a best practice process model for incident management. That is what we have begun to develop with our new work. This report documents the initial details of that model. To develop this best practice incident management process model, we identified the processes, as described before, outlined each process via a workflow diagram, and provided the details and requirements of each process in a corresponding workflow description table.

In the following sections (3 and 4), we describe our incident management processes model in detail. The majority of the work is presented through figures called workflow diagrams that map the flow of incident management actions. These workflow diagrams and their supporting workflow descriptions and handoffs are included in Section 4, “Incident Management Proc-ess Workflows and Descriptions.” An example of a workflow is show in Figure 6. A readable version of the figure can be found in Section 4.2.3.4, “D: Detect Events Workflow Diagram.”

Figure 6 Example of an Incident Management Workflow Diagram

28 CMU/SEI-2004-TR-015

Most of the workflow diagrams have supporting information and explanations included in a multipage document called a workflow description. This workflow description includes in-formation on the mission, triggers, completion criteria, inputs, outputs, subprocess require-ments, associated written procedures to follow when performing the subprocess tasks, key people who could perform the tasks, technology used to perform the tasks, and any other general requirements. Figure 7 provides an example of a workflow description. The readable version of the figure can be found in Section 4.2.3.5, “D: Detect Events Workflow Descrip-tion.”

Figure 7 Example of an Incident Management Workflow Description

Mapping incident management processes helps an organization understand all the activities that are occurring and how they relate to and depend on one another. It also allows missing activities or activities with inherent weaknesses to be identified and targeted for improve-ment.

Incident management process maps are valuable for depicting these key items [Sharp 01]:

• roles – the actors or performers who participate in the process

• responsibilities – the tasks for which each actor is responsible

• routes – the workflows and decisions connecting tasks together and defining the path an individual work item takes through the process

Using the processes as a guide,²¹ an organization can map its own processes to create a work-flow diagram that details its own current or “as-is” state. The fields in the workwork-flow descrip-tions can be interpreted as quesdescrip-tions for the organization to answer, such as “What is the

21 It would be difficult to use the processes as presented here for this guidance, but future work products will be developed to help organizations apply these methodologies.

CMU/SEI-2004-TR-015 29

sion of the process?” and “Who performs the process?” The workflow descriptions included in our report list a fairly inclusive set of potential answers to each question. Therefore there are no real roles and responsibilities assigned to any one group in the workflow descriptions included here. When members of an organization would do such mappings themselves, they would streamline these descriptions to only include their particular arrangement or structure (i.e., the answers to the questions for their organization). When they were done mapping their current state, they would have a list of the processes and who was performing them. Account-ability is very important to effective incident management.

Using this organization-specific information, the process workflow for an organization will look different from our generic workflow shown above in Figure 6. It will show the work-flow or routes of the work and who is responsible for performing the work. This type of dia-gram is called a “swimlane” diadia-gram. More information about swimlane diadia-grams can be found in Section 3, “Overview of Process Mapping.”

To see an example of this mapping process and the resulting swimlane diagram, do the fol-lowing:

1. Look at the Detect Events workflow diagram on page 98 and its corresponding work-flow description on page 100.

2. An organization’s documentation of its own as-is process for Detect might result in the workflow description shown in Table 2. Notice that instead of the multitude of key peo-ple or technologies that are listed in the generic Detect Events workflow diagram, this table only includes the actors performing each subprocess and the technologies currently being used to perform the process for this example organization.

3. When we actually map this process, knowing the responsible actors, we get the swim-lane diagram depicted in Figure 8, which shows the workflow and associated actor.²² Once the as-is state is documented, and depending on the goal or outcome you are looking for in your process mapping work, you can benchmark your workflows against our best prac-tice incident management process model to determine gaps and weaknesses. This is a tradi-tional gap analysis. During this process, you would look for characteristics of the processes such as

• missing or poorly defined handoffs

• missing or poorly defined aspects of each process activity (e.g., no procedures in place or inadequate staff)

• bottlenecks in the process

• poorly defined activity flows (e.g., too much parallelism, too linear, too many handoffs)

22 Note that we only show the workflow description for Detect, but the swimlane diagram also in-cludes the Triage and Respond processes. There would be a similar workflow description for Tri-age and Respond that would have been done, before the swimlane diagram was completed as shown here.

30 CMU/SEI-2004-TR-015

• single points of failure

You can also identify process improvements and create a new “to-be” or desired state work-flow and use this as a map for implementing this new structure or process. In this case, you build the to-be state by modifying the as-is state. This will include

• identifying new activities

• identifying improvements to poor characteristics such as missing procedures or poorly trained staff

• streamlining inefficient flows

• redesigning bottlenecks

If you are planning an incident management capability from scratch, in theory you can use our generic model to help determine what functions you want to include and what proce-dures, staff, and technologies will be needed. We are piloting this work to test our theories.

We are also looking for better tools and packaging of our workflows diagrams and descrip-tions, so they can be more easily used to perform this work.

CMU/SEI-2004-TR-015 31

32 CMU/SEI-2004-TR-015

Table 2: Detect Events Workflow Example Mission/Objectives Triggers Completion CriteriaPolicies and Rules General Requirements • To identify unusual activity that might compromise the mission of the CSIRT constituency and/or the CSIRT

• When suspicious or unusual activity is noticed

• When a decision about an event is made (i.e., forward to T: Triage Events, reassign to other processes, or close) • When outputs are ready to be passed to the next process

• CSIRT/IT policies • Security-related regulations and laws

• Note: no defined requirements for handling sensitive information • CSIRT staff and some IT staff receive appropriate training in procedures and technologies related to the tasks they are required to perform. • Yearly quality assurance checks are performed on automated tools. • CSIRT and IT staff consider appropriate security measures when configuring and maintaining automated tools. Inputs Outputs InputDescription FormDecision Output Description Form General indicatorsThis is data that is proactively gathered by IT staff, such as loginformation. Electronic, or physical Event requires further incident management action (i.e., event is sent to T: Triage Events) Event information This includes all information that is passed to T: Triage Events for a given event. It can include the reported information and general indicators.

Electronic, or physical Event re- portsThis includes reports of unusual or suspicious activity to the CSIRT identified by the organization’s employ- ees using the computing infrastructure.

Verbal, electronic, or physical Event report is closed Closed reportsThis includes all information related to an event that has been closed. It can include the reported information and general indicators. It can also include the rationale for closing the event.

Electronic, or physical SubprocessSubprocess Requirements Written Procedures Key People Technology • Computing infrastructure users notice suspicious or unusual activity and report it to the CSIRT. Inputs Outputs

D1. Notice event (reac- tive) • Event reports

• No written or formal procedures for organization’s staff to follow when reporting events.

• CSIRT constituency notice and report events to the Help Desk.

• CSIRT constituency uses the following technology when noticing and reporting events: − Communication channels (telephone, email, web) • Help desk personnel review report information, verify it, and decide whether to forward an event report to T: Triage or close it. Inputs Outputs

D2. Receive report • Event reports• Event information* • Closed reports*

• Help Desk staff follow procedures for collecting required event information and sending it to T: Triage. • Help desk staff follow appropriate procedures for closing events.

• Help desk personnel receive reported information from users of the organization’s computing infrastructure

• Help desk personnel use the following technology when receiving, reviewing, and deciding what to do about reported information: − Communication channels (telephone, email, web) • IT staff proactively monitor indicators for potential events (e.g., log information). • IT staff analyze event indicators and decide whether to forward an event report to T: Triage. Inputs Outputs

D3. Proactive Detect • General indicators* • Event reports

• IT staff follow operational procedures for monitoring and reviewing general indicators. • IT staff follow operational procedures when completing an event report and sending it to T: Triage.

• IT staff perform proactive detect activities.

• Designated personnel can use the following technology when monitoring for general indicators: − System administration and security tools (e.g., logs and vendor applications) − Communication channels (e.g., email, web)

CMU/SEI-2004-TR-01533

Figure 8: Example of Swim-Lane Chart Showing a Specific Instantiation of an Incident Handling Capability Derived from the Detect, Triage, and Respond Process Workflows and Descriptions

If management or legal response is needed External Experts and Organizations

Provide additional information Coordinate Management Response

Plan Technical Response

Event report

Receive Report Proactive DetectPlan Technical Response

Closed event

Closed report CSIRT

Help Desk

System Users IT Department Management

Event report Execute Technical Response

If no response is needed

If no response is needed Categorized, prioritized, assigned event

Possible event report

DetectTriageRespond Analyze Event Event report

If technical response is needed Execute Technical Response Closed event

General Indi- cators Provide advice and guidance

Notice event Triage

34 CMU/SEI-2004-TR-015

In document Defining Incident Management Processes for CSIRTs: A Work in Progress (Page 47-54)