©Copyright 2003–Present Atre Group, Inc.
www.atre.com
Auditing Big Data: What do we need to know? Milan
Big Data & Data Gate:
Auditing Big Data: What do we need to know?
* What is Big Data?
* Why is it important?
* 10 Golden Rules for it to be useful
* Security & Privacy
* How to audit a “Big Data Implementation” in an organization?
Innovations & Norms: Friends or Enemies?
Milan, Italy: Oct 02, 2013
Shaku Atre, Atre Group, Inc.
366 West 11th Street, Suite 7D, New York, NY 10014
521 38th Avenue, Santa Cruz, CA 95062
Is the overuse of the phrase “Big Data” tiring you?
Too much of anything could be tiring
Big Data seems to be the phrase that is tiring us now
When can we label the data as Big Data?
Let us attempt to set up a framework of rules for when a Data System can be called a Big Data System
Here is the framework of 10 Rules!
My Blog http://www.atre.com/big-data/
Shift focus from “Big” to “Data”! And then to “Big Data Business Analytics”! And then to “Value” gained!
Shaku Atre’s Ten Golden Rules of Big Data Systems
1. The Big Data System for Business Analytics Rule
2. The Big Data System Access Rule
3. The Big Data System Scalability Rule
4. The Big Data System Flexibility Rule
5. The Big Data System Security and Privacy Rule
6. The Big Data System Visualization & Data Mining Rule
7. The Big Data System Dispersion and Reassembly Rule
8. The Big Data System Dormant Data Access Rule
9. The Big Data System Skill Set Rule
10. The Big Data System “Big Human Judgment” Rule
We will try to handle one rule at a time. Here we go!
Ten Golden Rules of Big Data Systems
The Big Data System for Business Analytics Rule
The V⁵ Function
To qualify as a Big Data System, a system must be a function of V⁵:
Big Data = f(Variety of data, Various interactions due to correlations between data, Velocity of data arrival and departure, Volume of data, big Value provided to the business via Business Analytics)
http://www.atre.com/big-data/
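One way an auditor might turn the V⁵ function into a working checklist is sketched below. The class name `V5Profile`, its field names, the `missing_dimensions` helper, and the sample answers are all hypothetical and chosen for illustration only; the slide itself states just that a Big Data System must be a function of all five Vs.

```python
from dataclasses import dataclass, fields


@dataclass
class V5Profile:
    """Hypothetical record of what a client has documented for each of the five Vs."""
    variety: str = ""               # kinds of data handled
    various_interactions: str = ""  # known correlations between data sources
    velocity: str = ""              # arrival and departure speed of data
    volume: str = ""                # how much data, and how fast it grows
    value: str = ""                 # business value gained via Business Analytics


def missing_dimensions(profile: V5Profile) -> list[str]:
    """Return the names of the Vs the client has not described yet."""
    return [f.name for f in fields(profile) if not getattr(profile, f.name).strip()]


# Made-up answers: only two of the five Vs have been documented so far.
profile = V5Profile(variety="clickstream, invoices", volume="about 40 TB, growing")
print(missing_dimensions(profile))
```

Any V left empty is a prompt for a follow-up question rather than an automatic failure; the sketch only makes the gaps visible.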
Big Data Business Analytics Must:
• Provide business performance comparisons
• Use Big Data as actionable information to improve performance of the business
New Big Data Trends are:
• Meet V⁵ requirements
• Look at the data for opportunities
• Use the data to reap the most benefits from it!
• Determine your “Golden Path” business analytics
Top Ten Rules of Big Data Systems
What is Needed to Perform Business Analytics with Big Data?
First principle of analysis: based on comparison
• Better or worse? Than what?
Second principle of analysis: based on evidence
• Facts… because data doesn’t lie
• What we sold
• How we sold
• Why we sold or didn’t sell
• What we were able to sell compared to the goal set
Third principle of analysis: provide actionable information with multiple variables
• Why, how, what, where, when, how much
• What action should we be taking, or should we keep the status quo?
Ten Golden Rules of Big Data Systems
Big Data System Access Rule
http://www.atre.com/big-data/big-data-system-access-rule/
For a system to qualify as a Big Data System, it must address the needs of all types of users, their varied demands, and the various technologies they are using.
Let us see whether the demography of our varied user groups is changing as we speak:
• % of technology-savvy users/customers is in business environment
• % of business-savvy users/customers is in business environment
• % of young customers (< 25 years of age) is on the rise at a very fast rate
One of the big reasons is easy access to social media!
Top Ten Rules of Big Data Systems
The Big Data System Scalability Rule
RULE 3
http://www.atre.com/big-data/big-data-system-scalability-rule/
For a system to qualify as a Big Data System, it must be scalable with Big Data in motion and humongous data at rest.
Scalability of information ∝
• Growth of data
• Growth of number of users
• Growth of various devices and channels used for sending and receiving data
• Growth of various functionalities
• Growth of interactions between these variables
• Growth of expectation levels of performance
Top Ten Rules of Big Data Systems
The Big Data System Flexibility Rule
http://www.atre.com/big-data/big-data-system-flexibility-rule/
For a system to qualify as a Big Data System, it must have flexibility with its underlying architecture.
RULE 4
Flexibility of Delivery of Information ∝ Business Analytics Software’s and Hardware’s Underlying Architecture
Is the underlying architecture of the software flexible enough to receive and accept data:
• from a variety of devices
• from various types of users
• in a variety of formats
• at various timeframes
• at various speeds, and
• in various volumes?
Can the software store it, read it, divide it if necessary, work on it, and present it in various formats, on various devices, to satisfy the variety of needs of business analytics to improve business performance?
If the underlying architecture is not meant for the new features and you try to fit it – as a “Square Peg in Round Hole” the new doesn’t fit and the old gets
Top Ten Rules of Big Data Systems
The Big Data System Security and Privacy Rule
Decide how to implement security and privacy between the interacting sources of data with the complexity of correlations involved.
RULE 5
http://www.atre.com/big-data/rule-5-the-big-data-system-security-and-privacy-rule/
For a system to qualify as a Big Data System, it must implement security and privacy between the interacting sources of data with the complexity of correlations involved.
Implementation of Privacy ∝ Implementation of Security
As a car is driven through a toll booth, various interacting databases can be accessed and the driver’s private information will be at risk!
Top Ten Rules of Big Data Systems
The Big Data System Visualization & Data Mining Rule
Very large sets of data in complex relationships are difficult to grasp. Can the software you are considering prepare performance dashboards based on the Key Performance Indicators (KPIs) determined by the organization?
RULE 6
http://www.atre.com/big-data/
For a system to qualify as a Big Data System, it must be able to represent data in a visualized form as a performance dashboard and also find some “nuggets” of insight that we didn’t know before.
Business Analytics ∝ Effective Performance Dashboards integrating words, numbers, images, and possibly audio/video, by integrating data sources to provide possible inferences for action
Top Ten Rules of Big Data Systems
The Big Data System Dispersion and Reassembly Rule
http://www.atre.com/big-data/rule-7-the-big-data-system-dispersion-and-reassembly-rule/
For a system to qualify as a Big Data System, it must be able to disperse data in “chunks” to a number of processors, reassemble them, and not lose anything on the way.
RULE 7
Divide and conquer by using low-cost commodity hardware
[Figure: Disperse Big Data, Processors, Reassemble]
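The disperse, process, and reassemble idea of Rule 7 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular Big Data product works: the function names `disperse`, `process`, and `reassemble` are hypothetical, worker threads stand in for the "processors", and upper-casing bytes stands in for the real per-chunk work.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def disperse(data: bytes, chunk_size: int) -> list[tuple[int, bytes]]:
    """Split data into numbered chunks so the original order can be restored."""
    return [(i, data[start:start + chunk_size])
            for i, start in enumerate(range(0, len(data), chunk_size))]


def process(chunk: tuple[int, bytes]) -> tuple[int, bytes]:
    """Stand-in for the work done on one processor; here it upper-cases the bytes."""
    index, payload = chunk
    return index, payload.upper()


def reassemble(chunks: list[tuple[int, bytes]]) -> bytes:
    """Put processed chunks back together in their original order."""
    return b"".join(payload for _, payload in sorted(chunks))


data = b"big data in motion and humongous data at rest " * 1000
chunks = disperse(data, chunk_size=4096)
with ThreadPoolExecutor(max_workers=4) as pool:  # threads play the role of processors
    results = list(pool.map(process, chunks))
result = reassemble(results)

# "Not lose anything on the way": compare against processing the data in one piece.
assert hashlib.sha256(result).hexdigest() == hashlib.sha256(data.upper()).hexdigest()
```

Numbering each chunk before it is sent out is what makes a lossless reassembly checkable; an auditor could ask the client what plays the role of that index and that final comparison in their system.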
Top Ten Rules of Big Data Systems
The Big Data System Dormant Data Access Rule
The software should be able to reach into document archives to draw upon the wealth of hidden information
for improving performance of the business
RULE 8
http://www.atre.com/big-data/rule-8-the-big-data-system-dormant-data-access-rule/
For a system to qualify as a Big Data System, it must be able to exploit both conventional and “quirky” reservoirs of dormant data.
Big Data = f(Structured and unstructured dormant data generated by old, conventional systems as well as new, quirky systems)
Top Ten Rules of Big Data Systems
The Big Data System Skillset Rule
The greatest skill required is quickly determining what to throw away, what to keep, and when to walk away. That is the secret to survival!
RULE 9
http://www.atre.com/big-data/
For a system to qualify and succeed as a Big Data System, a workforce with skillsets in Business Knowledge and Data Science working in tandem is necessary.
Success with Big Data System = f(Team with skillsets in Business Knowledge + expertise in Data Science as a cumulative expertise from the disciplines of Computer Science, Mathematics, Statistical Analysis, Data Visualization and even Social Sciences!)
Big Data Business Analytics Skillset
[Figure labels: Big Data Business; Technology; Business Analytics; Presentation & Visualization; Vertical Industry; More Specialties; Analytics with Deductive Logic; Visualization, Audio, Video Presentations]
• Is there one Data Scientist who has expertise in all four of these sectors, along with Mathematics, Statistics, Economics, Business Administration…? Absolutely not!
• People with different expertise have to be teamed up!
Top Ten Rules of Big Data Systems
The Big Data System “Big Human Judgment” Rule
http://www.atre.com/big-data/
For a system to qualify as a Big Data System, don’t ignore Intuition + Common Sense… which is not very common!
RULE 10
Big Data is new, but not that new, so it presents a challenge…
convincing the Old Guard that they need to trust Big Data results the way they trust their own intuition…
aka Big Human Judgment
Keep in mind that “Garbage in, garbage out” is valid for any machine! But a human brain has an almost unlimited ability to determine what is garbage and what is not!
Determine how Business Analytics
can add the most
improvement value
by using the Big Data and insights hidden in it
Security Requirements
A typical distribution of security requirements
for data:
* 85% of data needs little or no security
* 13% of data needs some level of security
* 2% of data needs high level of security
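As a worked example of this distribution, the sketch below applies the three shares to a total volume. The shares are the ones on the slide; the 500 TB total, the tier labels used as dictionary keys, and the function name `split_by_security_tier` are made up for illustration.

```python
# Shares taken from the slide; the 500 TB total below is a made-up figure.
SHARES = {"little or no security": 0.85, "some security": 0.13, "high security": 0.02}


def split_by_security_tier(total_tb: float) -> dict[str, float]:
    """Apply the slide's typical distribution to a total data volume in terabytes."""
    return {tier: round(total_tb * share, 2) for tier, share in SHARES.items()}


for tier, tb in split_by_security_tier(500).items():
    print(f"{tier}: {tb} TB")
```

For 500 TB this gives 425 TB, 65 TB, and 10 TB, which shows why deciding which data falls into the smallest tier matters more than protecting everything equally.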
Security & Privacy : What is at stake with Big Data?
Security and Privacy
walk hand in hand. Implementation of
Privacy is directly proportional to implementation of Security.
That means “No Security Implementation” implies “No Privacy
Implementation”.
Let us consider an example:
A car is being driven through a toll booth. The toll booth’s E-Z Pass Toll Collection System is a Big Data System. The “electronic eye” reads the E-Z Pass located on the front windshield of each car that passes through the E-Z Pass lane (a driver without an E-Z Pass driving through the E-Z Pass lane creates a big honking competition!).
A banking or credit card database, which is also stored at one of the banks, is accessed to deduct the toll.
Privacy Issues of Time & Location Data
Time and location data is about more than just customers – it is a
way for businesses to know how and where their employees are
doing their work
For example: A delivery service is going to want to know where each delivery person is at any given time
Time and location data raises a lot of serious questions, not only about privacy but also moral and ethical ones, making it one of the most privacy-sensitive types of Big Data
For example: Should microchips be implanted in children in case they get lost? Or in an elderly person with dementia who has a tendency to wander away from home?
Public Cloud versus Private Cloud : Security & Privacy are the
major concerns
Figure 4.5 Public Clouds versus Private Clouds
©Taming the Big Data Tidal Wave, Bill Franks, John Wiley & Sons, Inc
[Figure labels: Public Cloud, Firewall, Users]
Global Privacy Principles
1. Notice (Transparency): Why information is being collected
2. Choice: Offer the opportunity to choose what information is used and/or disclosed
3. Consent: Information is only disclosed to the parties consistent with notice and choice
4. Security: Protect collected information from loss, disclosure, destruction, etc.
5. Data Integrity: Ensure the information is true, complete, and up to date
6. Access: Individuals should have access to their personal data
7. Accountability: Firms must be accountable for these principles
Security & Privacy : What is at stake with Big Data?
What are different ways of implementing security?
• Centralized vs decentralized solutions
• Physical security
• Logical security
• Authentication, Authorization, Access Control
Security & Privacy : What is at stake with Big Data?
Step 1: Determine Connectivity Paths
This figure maps the physical network to the logical data paths in the network.
[Figure: nodes Laptops, Mobile Devices, PCs, Mainframe, Communication Server, Database Server, LAN File Server; data paths labeled A through H]
Security & Privacy : What is at stake with Big Data?
Step 2: Linking Connectivity Paths to Security Packages
Use this chart to show the data paths on one axis and the security systems on the other.
Data paths: A, B, C, D, E, F, G, H, I
Security systems: Mainframe Security Package, LAN Security Package, PC Security Package, ????, Password Security, Encryption, Function Specific Security Package 1, Specific Security Package 2
Legend: Security exists / No security
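The chart described in Step 2 can be kept as a small data structure so that unprotected paths stand out. The sketch below is only an illustration: the path letters and package names come from the slide, but which package covers which path is invented here, and the function names `unprotected_paths` and `print_matrix` are hypothetical.

```python
# Hypothetical coverage data: which security packages protect which data path.
# The path letters and package names come from the slide; the assignments do not.
PATHS = list("ABCDEFGHI")
COVERAGE = {
    "Mainframe Security Package": {"A", "B"},
    "LAN Security Package": {"C", "D", "E"},
    "PC Security Package": {"F"},
    "Password Security": {"A", "C", "F", "G"},
    "Encryption": {"B"},
}


def unprotected_paths(paths, coverage):
    """Return the paths that no security package covers ("No security" in every row)."""
    covered = set().union(*coverage.values())
    return [p for p in paths if p not in covered]


def print_matrix(paths, coverage):
    """Print packages as rows and paths as columns, with X where security exists."""
    print(" " * 28 + " ".join(paths))
    for package, protected in coverage.items():
        cells = " ".join("X" if p in protected else "." for p in paths)
        print(f"{package:<28}{cells}")


print_matrix(PATHS, COVERAGE)
print("Paths with no security:", unprotected_paths(PATHS, COVERAGE))
```

A column with no X at all is the finding an auditor is looking for: a data path that exists in the network but is covered by none of the security systems.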
Big Data & Auditing : Security & Privacy Centered
Instead of trying to build out your own big data infrastructure, use big data capabilities in the cloud. Two reasons to consider this are that (1) most organizations don’t have the required big data security skills anyway, and (2) offloading this to somebody else frees up resources to deal with the information coming from the big data analytics
Very Important Questions:
Which big data is considered important enough to be
secured?
Does your organization have Big Data Security Skills?
Have you thought about
Many other domains are also involved, such as legal, privacy,
How do I audit Organizations with Big Data?
Shaku Atre’s Ten Golden Rules of Big Data Systems
1. The Big Data System for Business Analytics Rule
2. The Big Data System Access Rule
3. The Big Data System Scalability Rule
4. The Big Data System Flexibility Rule
5. The Big Data System Security and Privacy Rule
6. The Big Data System Visualization & Data Mining Rule
7. The Big Data System Dispersion and Reassembly Rule
8. The Big Data System Dormant Data Access Rule
9. The Big Data System Skill Set Rule
How do I audit Organizations with Big Data?
Shaku Atre’s Ten Golden Rules of Big Data Systems
Audit objectives represent the high-level goals and anticipated
accomplishments of the review and address controls and risks
associated with the client's activity.
In the Context of Rule #1:
The Big Data System for Business Analytics Rule
Questions for the client:
What are the different types of Big Data you are planning to use?
Describe: types of Big Data, volume, “noise” in data. Are you deleting serially correlated data? If yes, how? If not, why not?
V⁵:
• Variety: Have you classified it? How many different types of data are you managing? Can you specify what they are?
• Various Interactions: Do you know them?
• Velocity: At what speed is the data “hitting” you?
• Volume: What is the rate at which it is increasing?
• Value: What type and how much value are you getting out of it? Can
How do I audit Organizations with Big Data?
Shaku Atre’s Ten Golden Rules of Big Data Systems
In the Context of Rule #2:
The Big Data System Access Rule
Questions for the client:
• Do you have a cross-reference list of which data is being accessed by which users?
• Which customer data is accessed by which users in your organization?
• Do you have any categories of user groups, and corresponding needs for data?
• Do you have any standards for the use of mobile devices? Have you implemented Bring Your Own Device (BYOD)? Any standards for that?
• How user-friendly are your systems? (see slide for user-friendliness)
• How strong a presence do you have on the web? Do you conduct business (selling) from your website? Do you use “cookies”?
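The first three questions above ask for a cross-reference of data against users and user groups. If the client has no such list but does keep access logs, one could be derived roughly as follows. This is a sketch under assumptions: the log format, the user and data set names, and the function name `cross_reference` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical access-log entries: (user, user group, data set accessed).
ACCESS_LOG = [
    ("alice", "sales", "customer_contacts"),
    ("bob", "finance", "customer_payments"),
    ("alice", "sales", "order_history"),
    ("carol", "marketing", "customer_contacts"),
]


def cross_reference(log):
    """Build a mapping: data set -> set of (user, group) pairs that accessed it."""
    xref = defaultdict(set)
    for user, group, dataset in log:
        xref[dataset].add((user, group))
    return dict(xref)


for dataset, users in sorted(cross_reference(ACCESS_LOG).items()):
    print(dataset, "->", sorted(users))
```

Reading the result by data set answers "which users access this data?"; inverting the mapping would answer "which data does this user group need?".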
What is User-Friendliness and User-Intuitiveness?
Quantifiable criteria or numbers:
1 Average or median time to learn procedures
Measured in clock time
2 Speed of task accomplishment
Measured in clock time
3 Acceptable rate of user errors
Set the rate and change for repeat users with the moving calendar – errors in first-time and repeat use
4 What percentage of users ask for the user’s manual?
If it is more than 10%, the system is NOT user-intuitive!
5 User retention of commands and queries over a period of time
How long before users forget or start making same errors
6 Subjective satisfaction
Percentage of users who find system usable and come back for more
7 Help system: does it really resolve problems?
Percentage of times that users give up
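Several of these criteria reduce to simple arithmetic on usability-test observations. The sketch below computes three of them; only the 10% threshold for manual requests comes from the slide, while the function name `usability_summary`, the result keys, and the sample observations are hypothetical.

```python
from statistics import median

# Only this 10% threshold comes from the slide; everything else is illustrative.
MANUAL_REQUEST_THRESHOLD = 0.10


def usability_summary(learn_minutes, asked_for_manual, gave_up_on_help):
    """Summarise three of the quantifiable criteria for a group of test users."""
    manual_rate = sum(asked_for_manual) / len(asked_for_manual)
    return {
        "median_minutes_to_learn": median(learn_minutes),              # criterion 1
        "manual_request_rate": manual_rate,                            # criterion 4
        "help_give_up_rate": sum(gave_up_on_help) / len(gave_up_on_help),  # criterion 7
        "user_intuitive": manual_rate <= MANUAL_REQUEST_THRESHOLD,
    }


# Made-up observations for five test users.
summary = usability_summary(
    learn_minutes=[12, 15, 9, 30, 14],
    asked_for_manual=[False, False, True, False, False],
    gave_up_on_help=[False, True, False, False, False],
)
print(summary)
```

With one of five users asking for the manual, the request rate is 20%, so this hypothetical system would not count as user-intuitive under the slide's 10% rule.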
How do I audit Organizations with Big Data?
Shaku Atre’s Ten Golden Rules of Big Data Systems
I have prepared a list of questions, like those on the previous 3 slides, that you can ask your clients when auditing against the 10 Rules.
Please send me an email requesting those slides and I will email them to you.
Visit http://www.atre.com/forms/survey1.html and fill out a small form, and you will have access to many of the 500+ columns I have written.
Visit the blog
How do I audit Organizations with Big Data?
Shaku Atre’s Ten Golden Rules of Big Data Systems
Credit: The Executive Guide to Information Security: Threats, Challenges, and Solutions, by Tim Mather
Credit: Cloud Security and Privacy: An Enterprise Perspective on Risks and Compliance (Theory in Practice), by Tim Mather
Credit: http://www.infosecisland.com/blogview/19643-Data-at-Rest-Dormant-But-Dangerous.html, by Simon Heron
Credit: Requirements of an Enterprise Reporting Application Platform, a white paper by Actuate
Credit: Securing Big Data, Cloud Security Alliance Congress 2011, Tim Mather, KPMG
Credit: http://wiki.apache.org/incubator/AccumuloProposal
About Shaku Atre
• President of Atre Group, Inc.
• Author of 6 books and close to a thousand articles as a columnist for Information Management, Computerworld, DM Review, BIReview and other publications
• Co-author of “Business Intelligence Roadmap: The Complete Project Lifecycle for Decision Support Applications” (Addison-Wesley)
• Former partner with Price Waterhouse Coopers
• Former faculty member at IBM’s prestigious Systems Research Institute
• Keynote speaker and lecturer on business intelligence, data warehousing and databases throughout the world
Reach Shaku at [email protected]
How do I audit Organizations with Big Data?
Shaku Atre’s Ten Golden Rules of Big Data Systems
Big Data Questions:
The following questions were requested as a backup in case there is no opportunity for live questions during the presentation:
1) Success stories of Big Data seem to depend heavily on long business experience on the Internet. What would you advise new starters?
2) How much time and effort should a new starter allow to obtain results from Big Data collection?
3) Should security protection of Big Data be extended to prevent internal users from violating ethical rules while collecting information from outside?
Best Regards