
Stop Reacting; Start Anticipating Disasters BEFORE They Occur Using Predictive Analytics


Richard Cocchiara: IBM Distinguished Engineer; CTO – IBM Business Continuity & Resiliency Services (BCRS); Managing Partner – IBM Resiliency Consulting Services; Member of the IBM Academy of Technology Leadership Team

CIOs are struggling to address board-level requirements that are vastly different from what they were responsible for just 5 years ago.

How do you…?

• Increase regulatory compliance without increasing capital expenses

• Block potential incoming threats without inhibiting traffic flow, data availability and uptime

• Prepare for the unexpected outage or disaster

“I need to manage complexity of compliance across my organization and silos – and be audit-ready all the time.”

“Lack of resources, expertise and tools to cost effectively manage multi-vendor environments”

“I need to protect against threats – even the ones I’m not prepared for.”

“I need to provide access to and recoverability of data at any time.”


Companies face a growing number of risks to their IT that continuously stress their ability to deliver service to their customers.

[Figure: IT risks plotted by frequency against consequence. One axis shows frequency of occurrences per year, from 1,000 (frequent) down to 1/100,000 (infrequent). The other shows consequences (single occurrence loss) in dollars per occurrence, from US$1,000 (low) to US$100,000,000 (high). Risks are grouped as data driven, event driven and business driven.]

Risks shown: viruses, worms, disk failures, system availability failures, pandemics, natural disasters, application outages, data corruption, network problems, building fires, terrorism/civil unrest, regulatory compliance, workplace inaccessibility, failure to meet industry standards, regional power failures, governance, data growth, long term preservation, mergers and acquisitions, new products, marketing campaigns, audits.

Source: IBM

AC failure, Acid leak, Asbestos, Bomb threat, Bomb blast, Brown out, Burst pipe, Cable cut, Capacity failure, Chemical spill, CO fire, Coffee machine, Condensation, Construction, Coolant leak, Cooling tower leak, Corrupted data, Denial of Service Attack, Diesel generator failure, Earthquake, Electrical short, Epidemic, Evacuation, Explosion, Fire, Flood, Fraud, Frozen pipes, Hacker, Hail storm, Halon discharge, Human error, Humidity, Hurricane, HVAC failure, Hardware failure, Ice storm, Insects, Lightning, Logic bomb, Lost data, Low voltage, Microwave fade, Network failure, Pandemic, PCB contamination, Plane crash, Power grid outage, Power outage, Power spike, Power surge, Programmer error, Raw sewage, Relocation delay, Rodents, Roof cave-in, Sabotage, Sprinkler static, Shotgun blast, Shredded data, Sick building, Smoke damage, Smoke from restaurant, Regulatory Compliance, Snow storm, Software error, Electricity, Strike action, Swimming pool leak, S/W ransom, Terrorism, Theft, Tornado, Train derailment, Transformer fire, UPS failure, Vandalism, Vehicle crash, Virus, Water (various), Wind storm, Volcano or volcano ash

Source: Contingency Planning Research, Inc. and IBM
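One way to compare risks that sit at different points of a frequency-versus-consequence chart is to multiply the two axes into an expected annual loss. The sketch below is only an illustration of that arithmetic: the function name and the two sample risks are hypothetical and are not taken from the IBM chart.

```python
def expected_annual_loss(frequency_per_year: float, loss_per_occurrence: float) -> float:
    """Expected loss per year = occurrences per year x loss per occurrence."""
    return frequency_per_year * loss_per_occurrence


# Hypothetical positions on the chart, chosen only to illustrate the calculation.
frequent_low_impact = expected_annual_loss(100, 1_000)           # e.g. a recurring minor fault
infrequent_high_impact = expected_annual_loss(1 / 100, 10_000_000)  # e.g. a once-in-a-century event

print(frequent_low_impact, infrequent_high_impact)
```

Under these assumed numbers, a frequent, cheap risk and a rare, expensive one carry about the same expected annual loss, which is why both ends of the chart deserve attention.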

As budgets shrink and service level requirements increase, a company’s business becomes even more vulnerable to IT outages.

The impact of lost data or unplanned downtime can be catastrophic, leading to lost revenue, reputation, and competitive position.

Finances
• Lost deals
• Disruption of cash flow
• Lost discounts
• Missed payments
• Drop in stock price

Loss of reputation
• Company reputation
• Damaged relationships with customers, suppliers, partners, lenders and investors

Revenue
• Loss of direct revenue
• Loss of future revenues
• Losses due to invoices that cannot be completed
• Losses due to investments not made

Miscellaneous costs
• Temporary staff needed
• Travel expenses incurred
• Equipment rental costs incurred

Productivity
• Employees who cannot perform their jobs
• Missed deadlines

Regulatory
• Inability to meet compliance requirements


At the same time that the cost of downtime increases, companies are inundated with disjointed information.

The amount of information managed by enterprise data centers is expected to increase by at least 50 times over the next decade (2010 to 2020), to 40 zettabytes.* (1)

The average cost of one hour of system downtime is increasing as more business operations become automated: $110K in 2010, $182K in 2012. (2)

* One zettabyte equals 1 trillion gigabytes.

1. Source: IDC Digital Universe Study, June 2011

2. Source: Aberdeen Group: “Datacenter Downtime: How Much Does it Really Cost?,” March 2012
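To put the two figures above in comparable terms, the following sketch computes the percentage increase in hourly downtime cost between 2010 and 2012 and the compound annual growth rate implied by a 50-fold increase over ten years. The function names are illustrative assumptions; only the input numbers come from the slide.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100


def compound_annual_growth_rate(multiple: float, years: int) -> float:
    """Constant yearly growth rate that yields `multiple` after `years` years."""
    return multiple ** (1 / years) - 1


# Hourly downtime cost: $110K (2010) to $182K (2012).
downtime_cost_increase = percent_increase(110_000, 182_000)

# Data volume: at least 50x over the decade 2010-2020.
data_growth_rate = compound_annual_growth_rate(50, 10)

print(f"Downtime cost increase: {downtime_cost_increase:.1f}%")
print(f"Implied yearly data growth: {data_growth_rate:.1%}")
```

The downtime figure works out to an increase of roughly 65% in two years, and the 50x projection corresponds to data volumes growing by close to half every year.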


The changing risk landscape will require a shift to a new paradigm that anticipates and integrates daily operations, emergency management & business continuity

• Handle both emergency and non-emergency events, tests and alerts.

• Organize response teams, enabling fast and clear communications between team members.

• Define and provide standard operating procedures for varying situations, with proper assignments, based on legal requirements or historical experience.

• Track the progress and performance of procedures, including the results of the actions for rehearsals and events.

• Locate and manage resources with the required capabilities and skills to handle the events.

• Enable the continuous improvement of the organization’s services and responses.

[Figure labels: Recover, Manage, Prevent]

IBM has seen this shift coming for a while and understands that this is part of the evolution from disaster recovery, business continuity and business resilience into the era of Intelligent IT Risk.

[Figure: evolution timeline running from "IBM BCRS founded in 1989" to "IBM BCRS in the future"]

Disaster recovery
• Syndicated hardware
• Shared recovery model
• IT: reactive; Business: none
• Recovery time: days or weeks

Business continuity
• Dedicated hardware
• Dedicated recovery model
• IT: proactive; Business: reactive
• Recovery time: minutes or hours

Business resiliency
• Cloud computing
• Virtualized model
• IT: proactive; Business: proactive
• Recovery time: minutes or seconds

Intelligent IT Risk Management
• Predictive analytics
• IT: anticipative; Business: anticipative
• Recovery time: seconds or always up

The definition of IT risk is drawn from several synergistic points.

1. Business results are inextricably reliant on IT service; thus IT must support critical business processes and key initiatives by being:
– reliable,
– predictable,
– available and
– secure.

2. “IT” is much broader than “infrastructure” (boxes and network) and includes process, people, data and applications, facilities, and business and IT strategies.

3. “IT” is under intense pressure to execute and thus must be:
– flexible and appropriate,
– available and recoverable,
– scalable and ready to perform,
– secure and protected,
– accurate and timely.

• Thus, “IT Risk” is “The business risk associated with the use, ownership, operation, involvement, influence and adoption of IT within an enterprise.”

“The priority now is to connect the top-down and bottom-up views so that our risk management framework will be a truly holistic business resilience strategy.”

– Jean-Pierre Bourbonnais, CIO and Vice President of Information Technologies, Bombardier Aerospace


• Leveraging information to make better decisions

• Anticipating problems to resolve them proactively

• Coordinating resources and processes to operate effectively

Intelligent IT Risk will use predictive risk analytics tied to response capabilities designed to ensure continuous operations.

Predictive risk analysis integrated within the business’s daily operations helps filter business-critical information so that companies can anticipate problems and opportunities and make the right response faster.

• Large scale situational awareness
• Mitigate risk across wider risk spectrum
• Respond to risk and opportunities
• Monitor multiple, diverse inputs
• Manage key risk indicators
• Prepare earlier to cut response time
• Intelligent response
• Integrated with the fabric of the business
• Continual involvement versus one time training


IBM has created a framework for identifying the risks associated with the use of IT that takes a broad and integrated view starting with an understanding of the core business requirements.


IT risk management requires the analysis of a broadly linked IT Risk Spectrum that goes beyond the traditional view of business continuity.

IT Risk Spectrum™

Availability & Recoverability: keep systems running and, if necessary, recover from interruptions in line with business expectations.

Security & Data Protection: provide the appropriate access controls while protecting the business’ information and resources.

Agility & Appropriateness: respond in a timely manner with the correct new or modified IT Service in support of changes in business requirements.

Scalability & Performance: maintain acceptable performance based on business needs and appropriately accommodate changes in business service volume.

Accuracy & Timeliness: provide accurate data, to the right people, at the right time to make informed business decisions.

The IBM Risk Spectrum is applied against the company’s business resilience delivery framework and can be “decomposed” for both dependency and parallel analysis.

People: Human resources with assigned responsibilities within the company and the processes to maintain

Infrastructure: Components under company control that enable operations

“Exo-Structure”: Ecosystem components outside company control (power, water, food, roads, communications and governance)

Suppliers: Businesses and government agencies that provide the critical materials, services and information

Process: How the company conducts its core business through business process modeling and IT governance

Technology: Equipment and tools that support the company’s business processes


Our methodology helps a company understand its strategic business goals and risks to create a real-time IT risk management system.

1. ASCERTAIN and align strategic business goals with value of IT services: “Exploit IT Services to Support Organizational goals”

   1. Identify Strategic Initiatives against which to manage and exploit IT capabilities
   2. Map strategic initiatives to Organization and IT support processes and services with measurable indicators and estimated impact to initiatives
   3. Categorize IT performance metrics against the IT Risk Spectrum.

2. ASSESS IT risks and capabilities: “Improve response to IT Risk”

   1. Quantify IT risk to organization as the gap between required vs. actual business performance metrics
   2. Conduct an IT service “all capabilities” analysis to identify measurable IT risk and performance metrics
   3. Define and prioritize the appropriate IT service risk treatment and roadmap

3. ACT to create an ongoing IT risk management governance system: “Create a Risk Aware Organization”

   1. Define or integrate IT Risk management principles into an ongoing IT risk management program
   2. Recommend organization structure, roles & responsibilities, and policies to help you continuously monitor and respond to changes in IT risk
   3. Define communication and awareness programs
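The ASSESS step of quantifying IT risk as the gap between required and actual performance can be sketched in a few lines. This is a minimal illustration under stated assumptions, not IBM's method: the class name, the metric names, the sample values and the choice to rank by relative gap are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PerformanceMetric:
    name: str
    required: float
    actual: float
    higher_is_better: bool = True  # False for metrics such as recovery time

    def gap(self) -> float:
        """Shortfall against the requirement; 0.0 when the requirement is met."""
        if self.higher_is_better:
            shortfall = self.required - self.actual
        else:
            shortfall = self.actual - self.required
        return max(0.0, shortfall)

    def relative_gap(self) -> float:
        """Gap as a fraction of the requirement, so different units can be ranked."""
        return self.gap() / self.required


# Hypothetical metrics and values, for illustration only.
metrics = [
    PerformanceMetric("availability_percent", required=99.9, actual=99.5),
    PerformanceMetric("recovery_time_hours", required=4.0, actual=6.0, higher_is_better=False),
    PerformanceMetric("report_accuracy_percent", required=98.0, actual=99.0),
]

# Largest relative gap first: a simple way to prioritize risk treatment.
prioritized = sorted(metrics, key=lambda m: m.relative_gap(), reverse=True)
for metric in prioritized:
    print(f"{metric.name}: gap={metric.gap():.2f}, relative={metric.relative_gap():.1%}")
```

Ranking by relative gap is only one possible prioritization; a real assessment would likely also weight each metric by its estimated business impact.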


An effective IT Risk strategy includes defining and measuring Key Risk Indicators (KRI) customized to each company’s unique requirements.

[Figure: matrix with an IT KRI defined at each intersection of the IT Risk Spectrum and the business resilience framework]

IT Risk Spectrum: scalability and performance; agility and appropriateness; security and data protection; availability and recoverability; accuracy and timeliness

Framework: people, processes, technology, infrastructure, suppliers, exo-structure
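A matrix with a KRI at each intersection can be represented as a dictionary keyed by (risk spectrum dimension, framework layer). The sketch below shows one possible data structure; the variable names and the sample KRI text are assumptions added for illustration, while the dimension and layer names come from the slide.

```python
from itertools import product

RISK_SPECTRUM = [
    "Scalability and performance",
    "Agility and appropriateness",
    "Security and data protection",
    "Availability and recoverability",
    "Accuracy and timeliness",
]

FRAMEWORK_LAYERS = [
    "People",
    "Processes",
    "Technology",
    "Infrastructure",
    "Suppliers",
    "Exo-structure",
]

# One (initially empty) list of KRIs per intersection.
kri_matrix = {
    (dimension, layer): []
    for dimension, layer in product(RISK_SPECTRUM, FRAMEWORK_LAYERS)
}

# Hypothetical KRI, for illustration only.
kri_matrix[("Availability and recoverability", "Technology")].append(
    "Hours of unplanned downtime per month"
)

# Intersections that still lack a KRI are candidates for further analysis.
undefined = [cell for cell, kris in kri_matrix.items() if not kris]
print(f"{len(kri_matrix)} intersections, {len(undefined)} without a KRI")
```

Listing the empty cells gives a quick coverage check when customizing KRIs to a particular company.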

But KRIs are only useful if you can combine real-time monitoring and predictive analytics with robust response capabilities.

[Figure: architecture linking data integration feeds, analytics and response capabilities]

Data integration feeds (what, when, where, why and how): weather, geological, traffic, employees, health, financial

Gateways: security monitoring, IT operations monitoring, compliance monitoring, event monitoring

Interfaces: standards based interfaces, domain specific interfaces, service bus

Other components shown: Expert System, Predictive Systems, Modeling & Simulation, Semantic Models, Archives, Portal Access, Incident Management, Directives, KPIs, Alerts, Event Rules, Workflows, Reporting, Rapid Recovery Resources

Companies must have access to flexible and dynamic information readily available that can be used to assess the current situation and take appropriate corrective action.

[Figure: command center dashboard. Labeled elements: Active Workflows, Command Center, Communication Management, Integrated System Monitoring, Role based views, Data drill down, 3D Modeling, Video Analytics, Event correlation detection, Click to Action, Social interaction, Executive Dashboard, Progress Reporting]


The use of predictive analytics and a robust command center allows for improved efficiency, management, collaboration and response to events.

Operational Efficiency
GOAL: Ensure that the managed environments maintain operational efficiencies.

Incident / Event Management
GOAL: Effectively manage events and return to a steady state.

Collaboration
GOAL: Get the right information to the right people at the right time for rapid problem resolution.

Intelligent Response
GOAL: Anticipate to provide real-time response using best practice SOPs, workflows, and resources.

[Figure labels: Global Operations Work Area; Mega Centers; Plans; Workflows; Business Rules; Available Resources; Intelligent Operations; Daily Operations; Predictive analytics; Incident Identification / Warning; Emergency and Crisis Management; Business Continuity]


Rapidly respond to emergencies

Standard Operating Procedures (SOP)
• Extreme Weather Event Preparation
• Flash Flood Preparation
• Flash Flood
• Evacuation

Example scenario:

Heavy rains are predicted to cause large scale flooding in the city where the business’ main processing center is located. The center monitors sources that predict the magnitude of the storm and possible outcomes. This will allow the center to start the SOPs that are needed for extreme weather preparation.

As the weather incident continues to affect the city, additional SOPs can be activated to send people home, begin critical backup, move operations, or mobilize additional resources. As these predetermined SOPs execute, constant situational awareness events from the center can be used to ensure the most appropriate response is delivered.

1. Predicted extreme weather
2. Situational awareness engines monitor weather feeds in the center
3. Rules engines will start automated responses via standard procedures (SOPs)
4. The center will manage the most appropriate response based on the situational awareness information and the incident in hand

[Figure labels: News feeds; Traffic flow; Weather prediction – Deep Thunder]
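The flow in the scenario (a monitored weather feed, rules that fire, SOPs that start) can be sketched as a very small rules engine. Everything specific here is an assumption made for illustration: the forecast fields, the rainfall thresholds and the function names are hypothetical, and only the four SOP names come from the slide.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class WeatherForecast:
    rainfall_mm_24h: float  # predicted rainfall over the next 24 hours
    flood_warning: bool     # True once flooding is reported in the city


@dataclass
class Rule:
    sop: str
    condition: Callable[[WeatherForecast], bool]


# Hypothetical thresholds; a real deployment would derive these from its own SOPs.
RULES = [
    Rule("Extreme Weather Event Preparation", lambda f: f.rainfall_mm_24h >= 50),
    Rule("Flash Flood Preparation", lambda f: f.rainfall_mm_24h >= 100),
    Rule("Flash Flood", lambda f: f.flood_warning),
    Rule("Evacuation", lambda f: f.flood_warning and f.rainfall_mm_24h >= 150),
]


def sops_to_activate(forecast: WeatherForecast, already_active: frozenset = frozenset()) -> list:
    """Return SOPs whose rule fires for this forecast and that are not yet running."""
    return [
        rule.sop
        for rule in RULES
        if rule.sop not in already_active and rule.condition(forecast)
    ]


# Step 1-3: heavy rain is predicted, so preparation starts.
first = sops_to_activate(WeatherForecast(rainfall_mm_24h=60, flood_warning=False))

# Step 4: the incident escalates, so additional SOPs are activated.
later = sops_to_activate(
    WeatherForecast(rainfall_mm_24h=160, flood_warning=True),
    already_active=frozenset(first),
)

print(first)
print(later)
```

Passing the set of SOPs that are already running keeps the engine from restarting a procedure each time a new forecast arrives, which matches the idea of activating additional SOPs as the incident develops.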


Combining predictive analytics and business continuity capabilities into an intelligent command center provides near- and long-term cost efficiencies.

The right business resilience strategy can help you:

• Mitigate risk
– Avoid the costs of downtime, brand damage and market share lost to competitors, and reduce the financial impact from business disruptions

• Protect brand and revenue
– Properly assessing the dynamic threats to your IT infrastructure, their potential business impact and your tolerance for risk can help you plan a realistic strategy

• Protect capital
– Analyzing cost tradeoffs can help you avoid unnecessary investments

• Reduce costs
– Creating proactive SOPs with tested response capabilities can help protect you from costs associated with failed recovery and lost data

• Improve service
– You can better align a resilient infrastructure to the needs of your business to maintain service level agreements based on your tolerance for risk

Thank you

ibm.com/services/continuity

Richard Cocchiara

IBM Distinguished Engineer

[email protected]


Copyright information

© Copyright IBM Corporation 2014 IBM Global Services Route 100 Somers, NY 10589 U.S.A. Produced in the United States of America February, 2014

All Rights Reserved

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information”.

Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Other company, product and service names may be trademarks or service marks of others.
