Big Data and the Internet of Things
• Overview of NIST
• Internet of Things
• Big Data
NIST Bird’s eye view
Courtesy HDR Architecture, Inc./Steve Hall © Hedrich Blessing
The National Institute of Standards and Technology (NIST) is where Nobel Prize-winning
science meets real-world engineering.
With an extremely broad research portfolio, world-class facilities, national networks, and an international reach, NIST works to support industry
NIST: Basic Stats and Facts
• Major assets
– ~ 3,000 employees
– ~ 2,800 associates and facilities users
– ~ 1,300 field staff in partner organizations
– Two main locations: Gaithersburg, Md., and Boulder, Colo.
– Nobel Prize winners: 1997, 2001, 2005, 2007, 2013
© R. Rath
Working with NIST on
Cyber Physical Systems
If we had computers that knew
everything there was to know about
things—using data they gathered
without any help from us—we would
be able to track and count everything,
and greatly reduce waste, loss and
cost.
—Kevin Ashton, That 'Internet of Things' Thing, RFID Journal, July 22, 2009
Internet of Things
What are the defining characteristics of the “Internet of Things”?
• Scale
• Capability
Internet of Things - Scale
Devices connected to the Web:
• 1970 = 13
• 1980 = 188
• 1990 = 313,000
• 2000 = 93,000,000
• 2010 = 5,000,000,000
• 2020 = 31,000,000,000
Source: Intel
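The scale figures above are easier to compare as growth rates. The following is a minimal sketch that takes the device counts exactly as quoted on the slide and computes the overall growth factor and the compound annual growth rate for each decade. The function names are illustrative only and do not come from the source.

```python
# Device counts per decade, as quoted on the slide (attributed to Intel).
devices = {
    1970: 13,
    1980: 188,
    1990: 313_000,
    2000: 93_000_000,
    2010: 5_000_000_000,
    2020: 31_000_000_000,
}


def annual_growth(start_count, end_count, years):
    """Compound annual growth rate between two counts, as a fraction."""
    return (end_count / start_count) ** (1 / years) - 1


def decade_growth(counts):
    """Map (start_year, end_year) to (growth_factor, annual_rate)."""
    years = sorted(counts)
    result = {}
    for start, end in zip(years, years[1:]):
        factor = counts[end] / counts[start]
        rate = annual_growth(counts[start], counts[end], end - start)
        result[(start, end)] = (factor, rate)
    return result


if __name__ == "__main__":
    for (start, end), (factor, rate) in decade_growth(devices).items():
        print(f"{start}-{end}: x{factor:,.1f} overall, {rate:.1%} per year")
```

On these figures the steepest decade is 1980 to 1990 (a factor of roughly 1,665), while the projected 2010 to 2020 growth is a factor of 6.2, or about 20% per year.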
Internet of Things - Capability
Intel Edison:
"It's a full Pentium-class PC in the
form factor of an SD card,"
Internet of Things – Reach
[Figure labels: Physical, Internet, Internet of Things]
IBM Smarter Planet
“For five years, IBMers have been working with companies, cities, and communities around the world to build a Smarter Planet. We’ve seen enormous advances, as leaders have begun using the vast supply of Big Data to transform their enterprises and institutions …”
Source: http://www.ibm.com/smarterplanet/global/files/us__en_us__overview__win_in_the_era_of_smart_op_ad_03_2013.pdf
GE Industrial Internet
New GE technology merges big iron with big data to create brilliant machines. This convergence of machine and intelligent data is known as the Industrial Internet, and it's changing the way we work.
Source: http://www.ge.com/stories/industrial-internet
Big Data - Volume
• From 2013 to 2020, the digital universe will grow by a factor of 10, from 4.4 trillion gigabytes to 44 trillion. It more than doubles every two years.
• In 2014, the digital universe will equal 1.7 megabytes a minute for every person on Earth.
• Data from embedded systems will grow from 2% of the digital universe in 2013 to 10% in 2020.
• In 2013, the available storage capacity could hold just 33% of the digital universe. By 2020, it will be able to store less than 15%.
Source: IDC Corporation, http://idcdocserv.com/1678, sponsored by EMC
Big Data - Volume
[Chart: Information vs. Available Storage, petabytes worldwide, 2005 to 2010]
Source: John Gantz, IDC Corporation, The Expanding Digital Universe
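The IDC volume figures can be cross-checked with a little arithmetic. The sketch below assumes decimal units (4.4 trillion gigabytes taken as 4.4 zettabytes) and constant exponential growth between 2013 and 2020. Both are simplifying assumptions, not statements from the source, and the variable names are illustrative.

```python
import math

# Figures quoted in the IDC bullets; decimal units are an assumption.
UNIVERSE_2013_ZB = 4.4   # 4.4 trillion gigabytes, taken as 4.4 zettabytes
UNIVERSE_2020_ZB = 44.0
YEARS = 2020 - 2013


def doubling_time(start, end, years):
    """Years needed to double, assuming constant exponential growth."""
    return years * math.log(2) / math.log(end / start)


# Growth implied by the two headline totals.
doubling = doubling_time(UNIVERSE_2013_ZB, UNIVERSE_2020_ZB, YEARS)
two_year_factor = (UNIVERSE_2020_ZB / UNIVERSE_2013_ZB) ** (2 / YEARS)

# Embedded-system data: 2% of the 2013 total, 10% of the 2020 total.
embedded_2013 = 0.02 * UNIVERSE_2013_ZB
embedded_2020 = 0.10 * UNIVERSE_2020_ZB

# Storage capacity: 33% of the 2013 total, at most 15% of the 2020 total.
storage_2013 = 0.33 * UNIVERSE_2013_ZB
storage_2020_upper = 0.15 * UNIVERSE_2020_ZB

if __name__ == "__main__":
    print(f"Implied doubling time: {doubling:.2f} years")
    print(f"Implied two-year growth factor: {two_year_factor:.2f}")
    print(f"Embedded data: {embedded_2013:.3f} ZB -> {embedded_2020:.1f} ZB")
    print(f"Storage: {storage_2013:.2f} ZB -> under {storage_2020_upper:.1f} ZB")
```

Under these assumptions a tenfold increase over seven years implies a doubling time of about 2.1 years, slightly slower than the "more than doubles every two years" wording, a gap that may simply reflect rounding in the quoted totals. The embedded-system share implies roughly a fiftyfold increase in absolute volume, from about 0.09 to 4.4 zettabytes.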
Big Data - Velocity
• Sloan Digital Sky Survey
– 140 Terabytes, year 2000 to present
• LSST – Large Synoptic Survey Telescope
– Expect 140 Terabytes every 5 days
• Square Kilometer Array
– Expect 140 Terabytes every 3 sec
LSST:
“Suspended between its vast mirrors will be a three billion-pixel sensor array, which on a clear winter night will produce 30 terabytes of data. In less than a week this remarkable telescope will map the whole night sky …. And then the next week it will do the same again … building up a database of billions of objects and millions of billions of bytes.”
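To put the three survey figures on a common scale, the sketch below converts each "140 terabytes per period" into a sustained data rate. It assumes decimal terabytes and reads "2000 to present" as 14 years (2000 to 2014). Both are assumptions made for illustration, as are the names used in the code.

```python
TB = 10**12    # decimal terabyte (assumption; the slide does not specify)
DAY = 86_400   # seconds per day
SDSS_YEARS = 14  # assumption: "2000 to present" read as 2000 to 2014


def rate_bytes_per_sec(terabytes, seconds):
    """Average sustained data rate in bytes per second."""
    return terabytes * TB / seconds


rates = {
    "Sloan Digital Sky Survey": rate_bytes_per_sec(140, SDSS_YEARS * 365.25 * DAY),
    "LSST": rate_bytes_per_sec(140, 5 * DAY),
    "Square Kilometer Array": rate_bytes_per_sec(140, 3),
}

if __name__ == "__main__":
    for name, rate in rates.items():
        print(f"{name}: {rate / 1e6:,.2f} MB/s")
    ratio = rates["Square Kilometer Array"] / rates["LSST"]
    print(f"SKA vs. LSST: {ratio:,.0f}x")
```

With these assumptions the rates span roughly eight orders of magnitude: about 0.3 MB/s for the Sloan survey, about 324 MB/s for LSST, and about 47 TB/s for the Square Kilometer Array. The LSST figure of 140 terabytes every 5 days works out to 28 terabytes per day, in line with the 30 terabytes per night mentioned in the quote.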
Big Data - Variety
Big Data – Potential Value
“Using detailed survey data on the business
practices and information technology investments of 179 large publicly traded firms, we find that
firms that adopt DDD [data driven decision
making] have output and productivity that is 5-6% higher than what would be expected given their
other investments and information technology usage.”
Brynjolfsson, Erik and Hitt, Lorin M. and Kim, Heekyung Hellen,
Strength in Numbers: How Does Data-Driven Decision making Affect Firm Performance? (April 22, 2011).
Big Data - Limitations
Good Data Won't Guarantee Good Decisions
Shvetank Shah, Andrew Horne, and Jaime Capellá Harvard Business Review April 2012
“At this very moment, there’s an odds-on chance that someone in your organization is making a
poor decision on the basis of information that was enormously expensive to collect.”
• Analytical skills are concentrated in too few employees
• IT needs to spend more time on the “I” and less on the “T”
Big Data - Questions
• What are the attributes that define Big Data solutions?
• How is Big Data different from traditional data environments and related applications?
• What are the essential characteristics of Big Data environments?
• How do these environments integrate with currently deployed architectures?
• What are the central scientific, technological, and standardization challenges that need to be addressed to accelerate the deployment of robust Big Data solutions?
NIST Big Data Public
Working Group &
Standardization
Activities
Wo Chang, NIST
Robert Marcus, ET-Strategies
Chaitanya Baru, UC San Diego
Big Data PWG - Charter
The focus of the NIST Big Data Public Working Group (NBD-PWG) is to form a community of interest from industry, academia, and government, with the goal of developing consensus definitions, taxonomies, secure reference architectures, and a technology roadmap. The aim is to create vendor-neutral, technology-neutral, and infrastructure-agnostic deliverables to enable big data stakeholders to select the best analytics tools for their processing and visualization requirements while enabling value-add from big data service providers and data flow among stakeholders in a cohesive and secure manner.
Big Data PWG - Deliverables
1. Definitions
2. Taxonomies
3. Requirements & Use Cases
4. Security & Privacy Requirements
5. Architectures Survey
6. Reference Architecture
7. Security & Privacy Architecture
8. Technology Roadmap
SUBGROUPS
NBD-PWG
• Requirements and Use Cases
• Definitions & Taxonomies
• Security and Privacy
• Reference Architecture
• Technology Roadmap
Requirements & Use Cases
Geoffrey Fox, U. Indiana
Joe Paiva, VA
Tsegereda Beyene, Cisco
Scope (M0020)
The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus list of Big Data requirements across all stakeholders. This includes gathering and understanding various use cases from diversified application domains.
Tasks
• Gather input from all stakeholders regarding Big Data requirements.
• Analyze/prioritize a list of challenging general requirements that may delay or prevent adoption of Big Data deployment
• Develop a comprehensive list of Big Data requirements
Requirements
and Use Case Subgroup
1. Government Operations (4): National Archives & Records Administration, Census Bureau
2. Commercial (8): Finance in Cloud, Cloud Backup, Mendeley (Citations), Netflix, Web Search, Digital Materials, Cargo shipping (e.g. UPS)
3. Defense (3): Sensors, Image Surveillance, Situation Assessment
4. Healthcare & Life Sciences (10): Medical Records, Graph & Probabilistic Analysis, Pathology, Bio-imaging, Genomics, Epidemiology, People Activity Models, Biodiversity
5. Deep Learning & Social Media (6): Driving Car, Geolocate Images, Twitter, Crowd Sourcing, Network Science, NIST Benchmark Datasets
6. Astronomy & Physics (5): Sky Surveys, Large Hadron Collider at CERN, Belle Accelerator II (Japan)
7. Earth, Environmental & Polar Science (10): Ice Sheet Scattering, Earthquake, Ocean, Earth Radar Mapping, Climate Simulation, Atmospheric Turbulence, Subsurface Biogeochemistry, AmeriFlux & FLUXNET gas sensors
8. Energy (1): Smart Grid
Definitions & Taxonomies
Nancy Grady, SAIC
Natasha Balac, SDSC
Eugene Luster, R2AD
Scope (M0018)
It is important to develop a consensus-based common language and vocabulary of terms used in Big Data across stakeholders from industry, academia, and government. In addition, it is also critical to identify essential actors with roles and responsibilities…
Tasks
• For Definitions: Compile terms used from all stakeholders regarding the meaning of Big Data from various standard bodies, domain applications, and diversified operational environments.
• For Taxonomies: Identify key actors with their roles and responsibilities from all stakeholders, categorize them into components and subcomponents based on their similarities and differences
• Develop Big Data Definitions and taxonomies
Big Data consists of extensive datasets, primarily in the characteristics of volume, velocity and/or variety, that require a scalable architecture for efficient storage, manipulation, and analysis.
Data Scientist is a practitioner who has sufficient knowledge of the overlapping regimes of expertise in business needs, domain knowledge, analytical skills and programming expertise to manage the end-to-end scientific method process through each stage in the Big Data lifecycle.
Reference Architecture
Orit Levin, Microsoft
James Ketner, AT&T
Don Krapohl, Augmented Intelligence
Scope (M0021)
The goal is to enable Big Data stakeholders to pick-and-choose technology-agnostic analytics tools for processing and visualization in any computing platform and cluster while allowing added value from Big Data service providers and the flow of data between the stakeholders in a cohesive and secure manner.
Tasks
• Gather and study available Big Data architectures representing various stakeholders, different data types, and use cases, and document the architectures using the Big Data taxonomies model based upon the identified actors with their roles and responsibilities.
• Ensure that the developed Big Data reference architecture and the Security and Privacy Reference Architecture correspond to and complement each other.
Reference Architecture Subgroup
Key documents:
M0151 – White Paper
M0123 – Working Draft
M0039 | Data Processing Flow
M0017 | Data Transformation Flow
Security & Privacy
Arnab Roy, CSA/Fujitsu
Nancy Landreville, U. MD
Akhil Manchanda, GE
Scope (M0019)
The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus secure reference architecture to handle security and privacy issues across all stakeholders. This includes gaining an understanding of what standards are available or under development, as well as identifying which key organizations are working on these standards.
Tasks
• Gather input from all stakeholders regarding security and privacy concerns in Big Data processing, storage, and services.
• Analyze/prioritize a list of challenging security and privacy requirements that may delay or prevent adoption of Big Data deployment
• Develop a Security and Privacy Reference Architecture that supplements the general Big Data Reference Architecture
Security and Privacy Subgroup
Requirements Scope
• Infrastructure Security
• Data Privacy
• Data Management
• Integrity & Reactive Security
Requirements – Use Cases Studied
• Retail (consumer)
• Healthcare
• Media
• Government
• Marketing
Architecture & Taxonomies
• Privacy
• Provenance
• System Health
Technology Roadmap
Carl Buffington, USDA/Vistronix
Dan McClary, Oracle
David Boyd, Data Tactic
Scope (M0022)
The goal is to develop a consensus vision with recommendations on how Big Data should move forward by performing a good gap analysis through the materials gathered from all other NBD subgroups. This includes setting standardization and adoption priorities through an understanding of what standards are available or under development as part of the recommendations.
Tasks
• Gather input from NBD subgroups and study the taxonomies for the actors’ roles and responsibilities, use cases and requirements, and secure reference architecture.
• Gain understanding of what standards are available or under development for Big Data
• Perform a thorough gap analysis and document the findings
• Identify what possible barriers may delay or prevent adoption of Big Data
• Document vision and recommendations
Technology Roadmap Subgroup
Key document: M0087 – Working Draft
• Inputs from other subgroups
• Potential Standards Group with Big Data-related activities (M0035)
• Capabilities & Technology Readiness
• Decision Framework
• Mapping & Gap Analysis
• Big Data Strategies
[Diagram labels: Subgroups (Definitions & Taxonomies, Requirements & Use Cases, Security & Privacy, Reference Architecture); Working Draft Outline]
Contact: [email protected]
Website: http://bigdatawg.nist.gov
Join NBD-PWG: http://bigdatawg.nist.gov/newuser.php
Documents: http://bigdatawg.nist.gov/show_InputDoc.php
Working Drafts (under editing)
• Big Data Definitions & Taxonomies (M0142)
• Big Data Requirements (M0245)
• Big Data Security & Privacy Requirements (M0110)
• Big Data Architectures White Paper Survey (M0151)
• Big Data Reference Architectures (M0226)
• Big Data Security & Privacy Reference Architecture (M0110)
• Big Data Technology Roadmap (M0087)
NIST Big Data
Workshop Slides: http://bigdatawg.nist.gov/workshop.php
Build an integrated Cyber-Physical Systems Framework that allows interconnection of test beds and interoperation through shared data and associated data analytics for easy integration and accelerated adoption of CPS applications.
The “Arpanet” for CPS Innovation
The SmartAmerica Challenge
• Industry
– GE, IBM, Qualcomm, Intel, Schneider Electric, Philips,
AT&T, UTRC, Boeing…
• Research/Educational Institutions
– MIT, Harvard, UC Berkeley, Vanderbilt, U Penn, UCLA,
Internet2, US Ignite, Massachusetts General Hospital…
• Government
– NIST, NSF, DoT, DoD, DHS, Montgomery County…