ILM, classification and the
Information-Centric Enterprise
Per Sedihn, Vice Chair Nordics Committee SNIA Europe
CTO Proact IT Group
SNIA Legal Notice
The material contained in this tutorial is copyrighted by the SNIA.
Member companies and individuals may use this material in presentations
and literature under the following conditions:
Any slide or slides used must be reproduced without modification
The SNIA must be acknowledged as source of any material used in the body of any
document containing material from these presentations.
This presentation is a project of the SNIA Education Committee.
Neither the author nor the presenter is an attorney and nothing in this
presentation is intended to be nor should be construed as legal advice or
opinion. If you need legal advice or legal opinion please contact an attorney.
The information presented herein represents the author's personal opinion
and current understanding of the issues involved. The author, the
presenter, and the SNIA do not assume any responsibility or liability for
damages arising out of any reliance on or use of this information.
Important New Datacenter
Trends
Information Convergence
•
“Driven by cost, complexity, and business risk, the operations,
practices, and roles in the datacenter are converging around
information and its value to the organization, catalyzing the
transformation of the datacenter into an ‘Information-Centric
Enterprise’.”
“The Information-Centric Enterprise
is the future of the datacenter”
Why Create an
“Information-Centric Enterprise”?
Expanding Business
Requirements
•
Global business, global
datacenters, global information
sharing and access to
information
•
Real-time information
integration requirements
Growing Cost of Operations
Staffing - limited resources
and virtual locations
Complexity and information
overload
Information security threats
Compliance and legal
overhead
Risk management and
increasing risk
•
Litigation, discovery, privacy &
confidentiality, security, and
audit
Long-term retention &
preservation requirements
for extremely large amounts
of information
Creating the
Information-Centric Enterprise
At the beginning…
•
The message was right, but
the method is not just “better
information management”
or just ‘managing information
according to its value…’
Now we need a shift in
operating practices based
on:
•
Collaboration
•
Clear requirements
•
Service Management Practices
•
Instrumentation & Automation
of Services
… It is time for organizations to
begin architecting and
implementing practical information
management compliance solutions.
By managing information
according to its value …
Creating the
Information-Centric Enterprise
NEW THINKING
•
Use the “value of information
to the organization” to define
the requirements for
management and operating
practices
The solution requires a
process approach
incorporating technology
•
Information-Centric
Management practices
“If you want to successfully solve
the complexity and cost crisis in
operating the datacenter, you have
to change current practices and
begin working together as an
organization.”
Michael Peterson, “Collaborate or Die,” Nov. 2006
“We’re discovering how the old,
data-centered approach to I.T. really isn’t
working. The difference between managing
data and Information-Centric Management
is profound.”
– Large Nordic Company
Information-centric strategy
Corporate information is simply data to the data center
•
Data is what I.T. manages: files, volumes, bits and bytes
•
Information is data with context: decisions are based on
information
•
Use a collaborative process to identify information
service requirements
Use these requirements to define an SLA
Line of Business (LOB) information stakeholders:
•
Application performance, availability, …
•
IT response times, asset reporting, …
•
Cost (for chargeback)
Corporate information stakeholders:
•
Security officer: Secret, confidential, public, …
•
Records Manager: retention time, …
•
Compliance officer: authorization, retention, …
IT service delivery stakeholder:
•
Effective resource management based on requirements
•
Service delivery that meets business needs
[Diagram: App Owner, DBA, Business Process Analyst, Records Manager, Security Officer, Legal, Data Admin, IT Admins, IT Architect. Diagram labels: Information Classification, Requirements, Data, SLA]
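The collaborative process described on this slide can be sketched in code. The following is a minimal illustration in Python, not part of the SNIA material: the `Requirement` record, the `build_sla` helper, and all sample values are hypothetical. Each stakeholder contributes requirements, and the SLA is the merged set, with disagreements surfaced for the team to resolve rather than silently overwritten.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """One requirement contributed by a stakeholder (hypothetical structure)."""
    stakeholder: str   # e.g. "Records Manager"
    attribute: str     # e.g. "retention_years"
    value: object      # the requested value


def build_sla(requirements):
    """Merge stakeholder requirements into one SLA dictionary.

    Returns (sla, conflicts). The first value seen for an attribute is
    kept in `sla`; when another stakeholder asks for a different value,
    both requests are recorded in `conflicts` for joint resolution.
    """
    sla = {}
    conflicts = {}
    for req in requirements:
        if req.attribute in sla and sla[req.attribute][1] != req.value:
            conflicts.setdefault(req.attribute, [sla[req.attribute]]).append(
                (req.stakeholder, req.value)
            )
        else:
            sla.setdefault(req.attribute, (req.stakeholder, req.value))
    return sla, conflicts


# Sample values are invented for illustration only.
requirements = [
    Requirement("App Owner", "availability", "99.99%"),
    Requirement("Security Officer", "sensitivity", "confidential"),
    Requirement("Records Manager", "retention_years", 7),
    Requirement("Compliance Officer", "retention_years", 10),
]
sla, conflicts = build_sla(requirements)
```

Here the two retention requests disagree, so `conflicts` lists both of them. That is the point at which the stakeholders need to talk to each other.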
ILM: Information Lifecycle
Management
From the SNIA Dictionary:
The policies, processes, practices and tools used to align the
business value of information with the most appropriate and cost effective
IT infrastructure from the time information is created through its final disposition.
Information is aligned with business processes through management policies
and service levels associated with applications, metadata, information and data
Today, let’s just talk about ILM Policies:
“ILM” as the policy-based alignment of information requirements and the most
appropriate infrastructure using data classification and service level
Why Classification? Why
Now?
[Chart: Amount of digital information created and replicated each year, 2006 to 2011, in exabytes. 173 exabytes rising to 1,773 exabytes: 10-fold growth in five years. Sources shown: DVD, RFID, digital TV, MP3 players, digital cameras, camera phones, VoIP, medical imaging, laptops, data center applications, games, satellite images, GPS, ATMs, scanners, sensors, digital radio, DLP theaters, telematics, peer-to-peer, email, instant messaging, videoconferencing, CAD/CAM, toys, industrial machines, security systems, appliances]
Growth of Information hasn’t
slowed
•
Over a Zettabyte by 2010
Biggest impact is not the
storage – it’s the
management!
•
70% created by individuals
•
85% responsibility of organizations
to ensure privacy, security,
reliability & compliance
•
Driven by many federal, state,
local and
industry-specific regulations
•
Switches focus from “capex” to
“opex”
Source: IDC White Paper, "The Diverse and Exploding Digital Universe," Sponsored by EMC, March 2008
Why Classification? Why
Now?
Storage landscape is
changing
•
Is the DVD player more
important…
or the movies you watch?
Need to answer some basic
questions
•
“What is all this information?”
•
“Is anyone managing it?”
•
“How is authenticity being maintained?”
•
“Can we delete some of it?”
Source: IDC White Paper, "The Diverse and Exploding Digital Universe," Sponsored by EMC, March 2008
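The IDC growth figures can be checked with compound annual growth rate arithmetic. This is a small Python sketch, assuming the 173-exabyte figure belongs to 2006 and the 1,773-exabyte figure to 2011, as the chart axis suggests. The `cagr` helper is written here for illustration.

```python
def cagr(start, end, years):
    """Compound annual growth rate as a fraction (0.59 means 59% per year)."""
    return (end / start) ** (1 / years) - 1


start_eb = 173     # exabytes, assumed to be the 2006 value from the chart
end_eb = 1773      # exabytes, assumed to be the 2011 value from the chart
growth_factor = end_eb / start_eb
rate = cagr(start_eb, end_eb, years=5)
print(f"growth factor: {growth_factor:.1f}x, CAGR: {rate:.0%}")
```

Under those assumptions the growth factor is a little over ten, which works out to roughly 59% per year.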
What’s Driving The Need for
Classification TODAY
Corporations are saving everything, because
•
They are unsure about the value of their information
•
They are being litigated
•
They are complying with government regulations
Resulting in
Massive amounts of information growing at fantastic rates
Information security breaches
Lots of money being spent for governance and compliance
Corporations are balancing IT Infrastructure and Management Costs
against Information Risk Management
[Chart: Data Growth, Capacity over Time]
Three drivers for automated classification:
1. Risk Management
2. Reduce Storage TCO
3. Improve Productivity
Classification Driver #1
Risk management
•
Compliance:
- Payment Card Industry Data Security Standard (PCI)
- Health Insurance Portability and Accountability Act (HIPAA)
- New Federal Rules of Civil Procedure (FRCP)
- EU Directive on Privacy and Electronic Communications (2002/58/EC)
•
Information Security
- Protecting Personally Identifiable Information (PII)
- Data Leakage Prevention (DLP)
$80B spent on compliance by 2009
Compliant records growing 60%/yr at > 2PB in 2007
Fastest growing application segment of storage
* Source: Fred Moore, Horison, Storage Spectrum 2006
Top 10 Customer Data-Loss Incidents Since 2000*
Number of affected customers | Date of initial disclosure | Company / Organization
94,000,000 | 2007-01-17 | TJX Companies Inc.
40,000,000 | 2005-06-19 | Visa, et al
30,000,000 | 2004-06-24 | America Online
26,500,000 | 2006-05-22 | U.S. Department of Veterans Affairs
25,000,000 | 2007-11-20 | HM Customs and Revenue
8,637,405 | 2007-03-12 | Dai Nippon Printing Company
8,500,000 | 2007-07-03 | Fidelity National Information Services
6,300,000 | 2007-09-14 | TD Ameritrade
6,000,000 | 2008-05-11 | Chilean Ministry of Education
5,000,000 | 2003-03-06 | Data Processors International
Litigation Support and eDiscovery
eDiscovery and records management coming together
•
Driven by huge costs and risks
•
Changes to the Federal Rules of Civil Procedure
- Electronically Stored Information (ESI) is subject to production (the way it is managed from cradle to grave will affect costs and risks of eDiscovery)
- There will be an early “meet and confer”
- The word “preserving” appears in the rules for the first time
- There is a need to understand the “sources” of ESI
•
Average eDiscovery costs can run into the millions of
dollars per event
Classification Driver #2
Storage TCO
•
External disk storage purchase projected to grow at 52% annually (capex)
•
Capacity is #1 storage issue driven by email, unstructured data
•
Significant transition to disk-based archival storage
•
Digital archive capacity will increase nearly tenfold between 2005 and
2010
[Chart: Total Digital Archived Capacity, WW, 2005 and 2010, split into Database and Unstructured. Growth labels: 54% CAGR, 79% CAGR, 68% CAGR]
Storage Capex vs.
Opex
•
Capex may be as low as
25% of TCO*
•
Impact of data
management is rising
•
Greater savings possible
through more effective
processes
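The capex share quoted above implies a simple relationship between purchase price and total cost. This is a minimal Python sketch of it. The `opex_from_capex` helper and the purchase price are hypothetical; only the 25% share comes from the slide.

```python
def opex_from_capex(capex, capex_share_of_tco):
    """Return (tco, opex) given capex and the fraction of TCO it represents."""
    tco = capex / capex_share_of_tco
    return tco, tco - capex


# Hypothetical purchase price; the 25% share is the figure from the slide.
tco, opex = opex_from_capex(capex=1_000_000, capex_share_of_tco=0.25)
```

At a 25% share, every unit of purchase cost carries three units of operating cost, which is why more effective processes can yield greater savings than cheaper hardware.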
IDC report – by 2011:
•
1.773 zettabytes, ~60% CAGR
•
70% created by individuals
•
85% managed by organizations
Classification Driver #3
Improved productivity
The average knowledge worker spends six hours per week
searching for information
•
50% of all searches fail to locate desired information
•
15% of the average knowledge worker’s time is spent recreating
existing information
Need
•
Better organization of information
•
Accurate search
•
Consistent management of information
Who Needs to Classify
Traditionally –
Records Information Managers
•
Records retention for regulatory compliance
•
Coordinated records management across the enterprise
Now –
Electronically Stored Information (ESI) forces information-centric
collaboration
•
Information Technology: provides service delivery for ESI
•
Line of Business (LOB information stakeholders):
- Specify ESI & application behavior: performance, availability, recoverability…
- Specify application support: staff response time, asset reporting…
- Identify budget/cost
•
Corporate stakeholders
Classification must address multiple perspectives
Support both overlapping and non-overlapping requirements
Column groups: Risk Management, TCO
Perspectives: Security & Risk Mitigation, Litigation Support, Records Mgmt, Cost & Performance Mgmt
Chief Security Officer ✓
Chief Legal Officer ✓ ✓ ✓
Corporation Counsel ✓ ✓
Records Information Manager ✓ ✓
Chief Compliance Officer ✓ ✓ ✓
Chief Risk Officer ✓
Chief Financial Officer ✓ ✓
Line of business ✓ ✓ ✓ ✓
Chief Information Officer ✓ ✓ ✓ ✓
Different performance requirements
per classification
Data may be classified by:
- Application or business process
- Metadata (e.g., time last accessed)
- Content
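The three bases for classification listed above can be illustrated with one rule of each kind. This is a minimal Python sketch, not a production classifier. The item layout, the group labels, the one-year threshold, and the crude card-number pattern are all assumptions made for the example.

```python
import re
from datetime import datetime, timedelta

# Crude pattern for a card-like run of 13 to 16 digits (illustrative only).
CARD_NUMBER = re.compile(r"\b(?:\d[ -]?){13,16}\b")


def classify(item, now):
    """Return the set of group labels for one item (a plain dict)."""
    groups = set()
    # 1. By application or business process
    if item["application"] == "billing":
        groups.add("finance")
    # 2. By metadata (time last accessed)
    if now - item["last_accessed"] > timedelta(days=365):
        groups.add("archive-candidate")
    # 3. By content
    if CARD_NUMBER.search(item["text"]):
        groups.add("pci-sensitive")
    return groups


now = datetime(2008, 6, 1)
item = {
    "application": "billing",
    "last_accessed": datetime(2006, 1, 15),
    "text": "Card 4111 1111 1111 1111 on file",
}
labels = classify(item, now)
```

A single item can land in several groups at once, because each rule looks at a different aspect of it.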
Organizing your data into logical groups
• High Throughput
• Four 9’s Availability
• Fast Recoverability
• High Throughput
• Five 9’s Availability
• Fast Recoverability
• Medium Throughput
• Four 9’s Availability
• 24 hour recovery
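The difference between the availability levels named above is easier to judge as a yearly downtime budget. This is a short Python sketch of the arithmetic, assuming a 365-day year. The helper name is invented for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def allowed_downtime_minutes(nines):
    """Yearly downtime budget for an availability of `nines` nines."""
    unavailability = 10 ** -nines      # four nines -> 0.0001
    return MINUTES_PER_YEAR * unavailability


four = allowed_downtime_minutes(4)   # about 52.6 minutes per year
five = allowed_downtime_minutes(5)   # about 5.3 minutes per year
```

Each additional nine cuts the downtime budget by a factor of ten, so moving a group of data from four to five 9’s is a substantial change in service level.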
Classification perspectives:
- Application performance
- Records Retention
- Archive / Lifecycle Mgmt
- Security
- Privacy
- Information Rights Mgmt
- Legal Discovery
- and more
Organize the same data into
multiple logical groups
• Immutable
• Retain for 7 years
• DoD Shred
• Retain for 3 years
• Retain Forever
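Organizing the same data into multiple logical groups amounts to giving each item one label per classification perspective. This is a minimal Python sketch, with the retention perspective worked through to a disposal date. The label names, the `RETENTION_YEARS` table, and the sample item are hypothetical.

```python
from datetime import date

# Hypothetical retention policies, keyed by label; None means retain forever.
RETENTION_YEARS = {"retain-3y": 3, "retain-7y": 7, "retain-forever": None}


def disposal_date(created, retention_label):
    """Earliest date the item may be disposed of, or None if never."""
    years = RETENTION_YEARS[retention_label]
    if years is None:
        return None
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 1 March.
        return date(created.year + years, 3, 1)


# One item, classified independently under several perspectives.
invoice = {
    "performance": "high-throughput",
    "retention": "retain-7y",
    "security": "confidential",
}
due = disposal_date(date(2008, 2, 29), invoice["retention"])
```

The performance and security labels do not affect the retention calculation, which is what lets each perspective be managed by its own stakeholder.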
ILM Assessment Phases
Prepare
•
Obtain Sponsorship
•
Build a team
•
Set Goals and milestones
Analysis
•
Collect/Develop service level objectives
•
Analyze Gaps
•
Recommendations
•
Design
Implement
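The "Collect/Develop service level objectives" and "Analyze Gaps" steps of the analysis phase can be sketched as a comparison of objectives against measurements. This is an illustrative Python sketch only. The metric names and values are invented, and it assumes every metric is one where higher is better.

```python
def analyze_gaps(objectives, measured):
    """Compare service level objectives with measured values.

    Both arguments map a metric name to a number where higher is better
    (a simplifying assumption). Returns {metric: shortfall} for every
    metric that misses its objective; a metric with no measurement is
    reported with a shortfall of None (gap of unknown size).
    """
    gaps = {}
    for metric, target in objectives.items():
        actual = measured.get(metric)
        if actual is None:
            gaps[metric] = None
        elif actual < target:
            gaps[metric] = target - actual
    return gaps


# Hypothetical objectives and measurements.
objectives = {"availability_pct": 99.99, "throughput_mb_s": 200}
measured = {"availability_pct": 99.9, "throughput_mb_s": 250}
gaps = analyze_gaps(objectives, measured)
```

Only the availability objective is missed in this sample, so it is the one item that would feed into the recommendations.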
Defining “Classification”
Online Dictionary (Answers.com):
•
A way or condition of being arranged
•
A subdivision of a larger group
Records Management perspective (Indiana University*):
•
Classification is the systematic identification and arrangement of
records into categories according to logically structured conventions,
methods, and procedural rules represented in a classification scheme
SNIA Dictionary (data classification):
•
An organization of data into groups for management purposes. A
purpose of a classification scheme is to associate service level
objectives with groups of data based on their value to the business.
EIW Workshop use of “classification”:
•
An organization of data, information, or resources into groups for
management purposes. A purpose of a classification scheme is to
associate requirements or policies for the handling of that data or
information
ILM, Classification, and Resources
Key characteristics of ILM:
1. Information Classification gathers requirements to guide configurations
2. Standard configurations described using Service Level Metrics
3. Data classification can align data to resources by Service Levels
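The three characteristics above can be tied together in a small sketch. It assumes a hypothetical catalogue of standard configurations, each described by two service level metrics. The tier names, metric names, and values are illustrative and do not come from the SNIA material.

```python
# Hypothetical standard configurations described by service level metrics,
# ordered from most to least capable.
STANDARD_CONFIGS = [
    {"name": "tier-1", "availability_nines": 5, "recovery_hours": 1},
    {"name": "tier-2", "availability_nines": 4, "recovery_hours": 4},
    {"name": "tier-3", "availability_nines": 4, "recovery_hours": 24},
]


def select_config(required_nines, max_recovery_hours):
    """Pick the least capable configuration that still meets the requirement.

    Relies on STANDARD_CONFIGS being ordered from most to least capable,
    so the last match is the most modest fit under that assumed ordering.
    Returns None when no standard configuration is sufficient.
    """
    matches = [
        cfg for cfg in STANDARD_CONFIGS
        if cfg["availability_nines"] >= required_nines
        and cfg["recovery_hours"] <= max_recovery_hours
    ]
    return matches[-1]["name"] if matches else None


archive_tier = select_config(required_nines=4, max_recovery_hours=24)
critical_tier = select_config(required_nines=5, max_recovery_hours=2)
```

Choosing the most modest configuration that still satisfies the classified requirement is one way to express "most appropriate and cost effective" in code. A real selection would weigh more metrics than these two.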
Standard Configurations