During the development of version 2 of the NBDIF, the Use Cases and Requirements Subgroup and the
Security and Privacy Subgroup identified the need for additional use cases to strengthen the future work
of the NBD-PWG. These two subgroups collaboratively created the Use Case Template 2 with the aim of
collecting specific and standardized information for each use case. In addition to questions from the
original use case template, the Use Case Template 2 contains questions that will provide a comprehensive
view of security, privacy, and other topics for each use case.
The NBD-PWG invites the public to submit new use cases through the Use Case Template 2. To submit a
use case, please fill out the PDF form
(https://bigdatawg.nist.gov/_uploadfiles/M0621_v2_7345181325.pdf) and email it to Wo Chang
([email protected]). Use cases will be accepted until the end of Phase 3 work and will be evaluated as
they are submitted.
This publication is available free of charge from: https://doi.org/10.6028/NIST.SP.1500-3r1
Appendix A: Use Case Study Source Materials
Appendix A contains one blank use case template and the original completed use cases. The Use Case
Studies Template 1 included in this Appendix is no longer being used to collect use case information. To
submit a new use case, refer to Appendix E for the current Use Case Template 2.
These use cases were the source material for the use case summaries presented in Section 2 and the use
case requirements presented in Section 3 of this document. The completed use cases have not been edited
and contain the original text as submitted by the author(s). The use cases are as follows:
GOVERNMENT OPERATION> USE CASE 1: BIG DATA ARCHIVAL: CENSUS 2010 AND 2000
GOVERNMENT OPERATION> USE CASE 2: NARA ACCESSION, SEARCH, RETRIEVE, PRESERVATION
GOVERNMENT OPERATION> USE CASE 3: STATISTICAL SURVEY RESPONSE IMPROVEMENT
GOVERNMENT OPERATION> USE CASE 4: NON TRADITIONAL DATA IN STATISTICAL SURVEY
COMMERCIAL> USE CASE 5: CLOUD COMPUTING IN FINANCIAL INDUSTRIES
COMMERCIAL> USE CASE 6: MENDELEY – AN INTERNATIONAL NETWORK OF RESEARCH
COMMERCIAL> USE CASE 7: NETFLIX MOVIE SERVICE
COMMERCIAL> USE CASE 8: WEB SEARCH
COMMERCIAL> USE CASE 9: CLOUD-BASED CONTINUITY AND DISASTER RECOVERY
COMMERCIAL> USE CASE 10: CARGO SHIPPING
COMMERCIAL> USE CASE 11: MATERIALS DATA
COMMERCIAL> USE CASE 12: SIMULATION DRIVEN MATERIALS GENOMICS
DEFENSE> USE CASE 13: LARGE SCALE GEOSPATIAL ANALYSIS AND VISUALIZATION
DEFENSE> USE CASE 14: OBJECT IDENTIFICATION AND TRACKING – PERSISTENT SURVEILLANCE
DEFENSE> USE CASE 15: INTELLIGENCE DATA PROCESSING AND ANALYSIS
HEALTHCARE AND LIFE SCIENCES> USE CASE 16: ELECTRONIC MEDICAL RECORD DATA
HEALTHCARE AND LIFE SCIENCES> USE CASE 17: PATHOLOGY IMAGING/DIGITAL PATHOLOGY
HEALTHCARE AND LIFE SCIENCES> USE CASE 18: COMPUTATIONAL BIOIMAGING
HEALTHCARE AND LIFE SCIENCES> USE CASE 19: GENOMIC MEASUREMENTS
HEALTHCARE AND LIFE SCIENCES> USE CASE 20: COMPARATIVE ANALYSIS FOR (META)GENOMES
HEALTHCARE AND LIFE SCIENCES> USE CASE 21: INDIVIDUALIZED DIABETES MANAGEMENT
HEALTHCARE AND LIFE SCIENCES> USE CASE 22: STATISTICAL RELATIONAL AI FOR HEALTH CARE
HEALTHCARE AND LIFE SCIENCES> USE CASE 23: WORLD POPULATION SCALE EPIDEMIOLOGY
HEALTHCARE AND LIFE SCIENCES> USE CASE 24: SOCIAL CONTAGION MODELING
HEALTHCARE AND LIFE SCIENCES> USE CASE 25: LIFEWATCH BIODIVERSITY
DEEP LEARNING AND SOCIAL MEDIA> USE CASE 26: LARGE-SCALE DEEP LEARNING
DEEP LEARNING AND SOCIAL MEDIA> USE CASE 27: LARGE SCALE CONSUMER PHOTOS ORGANIZATION
DEEP LEARNING AND SOCIAL MEDIA> USE CASE 28: TRUTHY TWITTER DATA ANALYSIS
DEEP LEARNING AND SOCIAL MEDIA> USE CASE 29: CROWD SOURCING IN THE HUMANITIES
DEEP LEARNING AND SOCIAL MEDIA> USE CASE 30: CINET NETWORK SCIENCE CYBERINFRASTRUCTURE
DEEP LEARNING AND SOCIAL MEDIA> USE CASE 31: NIST ANALYTIC TECHNOLOGY MEASUREMENT AND EVALUATIONS
THE ECOSYSTEM FOR RESEARCH> USE CASE 32: DATANET FEDERATION CONSORTIUM (DFC)
THE ECOSYSTEM FOR RESEARCH> USE CASE 33: THE ‘DISCINNET PROCESS’
THE ECOSYSTEM FOR RESEARCH> USE CASE 34: GRAPH SEARCH ON SCIENTIFIC DATA
THE ECOSYSTEM FOR RESEARCH> USE CASE 35: LIGHT SOURCE BEAMLINES
ASTRONOMY AND PHYSICS> USE CASE 36: CATALINA DIGITAL SKY SURVEY FOR TRANSIENTS
ASTRONOMY AND PHYSICS> USE CASE 37: COSMOLOGICAL SKY SURVEY AND SIMULATIONS
ASTRONOMY AND PHYSICS> USE CASE 38: LARGE SURVEY DATA FOR COSMOLOGY
ASTRONOMY AND PHYSICS> USE CASE 39: ANALYSIS OF LHC (LARGE HADRON COLLIDER) DATA
ASTRONOMY AND PHYSICS> USE CASE 40: BELLE II EXPERIMENT
EARTH, ENVIRONMENTAL AND POLAR SCIENCE> USE CASE 41: EISCAT 3D INCOHERENT SCATTER RADAR SYSTEM
EARTH, ENVIRONMENTAL AND POLAR SCIENCE> USE CASE 42: COMMON ENVIRONMENTAL RESEARCH INFRASTRUCTURE
EARTH, ENVIRONMENTAL AND POLAR SCIENCE> USE CASE 43: RADAR DATA ANALYSIS FOR CRESIS
EARTH, ENVIRONMENTAL AND POLAR SCIENCE> USE CASE 44: UAVSAR DATA PROCESSING
EARTH, ENVIRONMENTAL AND POLAR SCIENCE> USE CASE 45: NASA LARC/GSFC IRODS FEDERATION TESTBED
EARTH, ENVIRONMENTAL AND POLAR SCIENCE> USE CASE 46: MERRA ANALYTIC SERVICES
EARTH, ENVIRONMENTAL AND POLAR SCIENCE> USE CASE 47: ATMOSPHERIC TURBULENCE – EVENT DISCOVERY
EARTH, ENVIRONMENTAL AND POLAR SCIENCE> USE CASE 48: CLIMATE STUDIES USING THE COMMUNITY EARTH SYSTEM MODEL
EARTH, ENVIRONMENTAL AND POLAR SCIENCE> USE CASE 49: SUBSURFACE BIOGEOCHEMISTRY
EARTH, ENVIRONMENTAL AND POLAR SCIENCE> USE CASE 50: AMERIFLUX AND FLUXNET
ENERGY> USE CASE 51: CONSUMPTION FORECASTING IN SMART GRIDS
NBD-PWG USE CASE STUDIES TEMPLATE 1
Use Case Title:
Vertical (area):
Author/Company/Email:
Actors/Stakeholders and their roles and responsibilities:
Goals:
Use Case Description:
Current Solutions:
    Compute (System):
    Storage:
    Networking:
    Software:
Big Data Characteristics:
    Data Source (distributed/centralized):
    Volume (size):
    Velocity (e.g. real time):
    Variety (multiple datasets, mashup):
    Variability (rate of change):
Big Data Science (collection, curation, analysis, action):
    Veracity (Robustness Issues, semantics):
    Visualization:
    Data Quality (syntax):
    Data Types:
    Data Analytics:
Big Data Specific Challenges (Gaps):
Big Data Specific Challenges in Mobility:
Security and Privacy Requirements:
Highlight issues for generalizing this use case (e.g. for ref. architecture):
More Information (URLs):
Note: <additional comments>
Notes: No proprietary or confidential information should be included.
ADD picture of operation or data architecture of application below table.
Comments on fields
The following descriptions of fields in the template are provided to help with the understanding of both
document intention and meaning of the 26 fields and also to indicate ways that they can be improved.
• Use Case Title: Title provided by the use case author.
• Vertical (area): Intended to categorize the use cases. However, an ontology was not created prior
  to the use case submissions so this field was not used in the use case compilation.
• Author/Company/Email: Name, company, and email (if provided) of the person(s) submitting
  the use case.
• Actors/Stakeholders and their roles and responsibilities: Describes the players and their roles
  in the use case.
• Goals: Objectives of the use case.
• Use Case Description: Brief description of the use case.
• Current Solutions: Describes current approach to processing Big Data at the hardware and
  software infrastructure level.
    o Compute (System): Computing component of the data analysis system.
    o Storage: Storage component of the data analysis system.
    o Networking: Networking component of the data analysis system.
    o Software: Software component of the data analysis system.
• Big Data Characteristics: Describes the properties of the (raw) data including the four major
  ‘V’s’ of Big Data described in NIST Big Data Interoperability Framework: Volume 1, Big Data
  Definition of this report series.
    o Data Source: The origin of data, which could be from instruments, Internet of Things, Web,
      Surveys, Commercial activity, or from simulations. The source(s) can be distributed,
      centralized, local, or remote.
    o Volume: The characteristic of data at rest that is most associated with Big Data. The size of
      data varied drastically between use cases from terabytes to petabytes for science research
      (100 petabytes was the largest science use case for LHC data analysis), or up to exabytes in a
      commercial use case.
    o Velocity: Refers to the rate of flow at which the data is created, stored, analyzed, and
      visualized. For example, big velocity means that a large quantity of data is being processed in
      a short amount of time.
    o Variety: Refers to data from multiple repositories, domains, or types.
    o Variability: Refers to changes in rate and nature of data gathered by use case.
• Big Data Science: Describes the high-level aspects of the data analysis process.
    o Veracity: Refers to the completeness and accuracy of the data with respect to semantic
      content. NIST Big Data Interoperability Framework: Volume 1, Big Data Definition
      discusses veracity in more detail.
    o Visualization: Refers to the way data is viewed by an analyst making decisions based on the
      data. Typically, visualization is the final stage of a technical data analysis pipeline and
      follows the data analytics stage.
    o Data Quality: This refers to syntactical quality of data. In retrospect, this template field
      could have been included in the Veracity field.
    o Data Types: Refers to the style of data such as structured, unstructured, images (e.g., pixels),
      text (e.g., characters), gene sequences, and numerical.
    o Data Analytics: Defined in NIST Big Data Interoperability Framework: Volume 1, Big Data
      Definition as “the synthesis of knowledge from information”. In the context of these use
      cases, analytics refers broadly to tools and algorithms used in processing the data at any stage
      including the data to information or knowledge to wisdom stages, as well as the information
• Big Data Specific Challenges (Gaps): Allows for explanation of special difficulties for
  processing Big Data in the use case and gaps where new approaches/technologies are used.
• Big Data Specific Challenges in Mobility: Refers to issues in accessing or generating Big Data
  from smartphones and tablets.
• Security and Privacy Requirements: Allows for explanation of security and privacy issues or
  needs related to this use case.
• Highlight issues for generalizing this use case: Allows for documentation of issues that could
  be common across multiple use cases and could lead to reference architecture constraints.
• More Information (URLs): Resources that provide more information on the use case.
• Note: <additional comments>: Includes pictures of the use case in action but was not otherwise
  used.
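The 26 fields described above map naturally onto a nested record. The following Python sketch shows one possible encoding; it is not part of the NBD-PWG template. The class names, attribute names, and the helpers `count_leaf_fields` and `blank_fields` are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass, field, fields, is_dataclass


@dataclass
class CurrentSolutions:
    compute: str = ""
    storage: str = ""
    networking: str = ""
    software: str = ""


@dataclass
class BigDataCharacteristics:
    data_source: str = ""
    volume: str = ""
    velocity: str = ""
    variety: str = ""
    variability: str = ""


@dataclass
class BigDataScience:
    veracity: str = ""
    visualization: str = ""
    data_quality: str = ""
    data_types: str = ""
    data_analytics: str = ""


@dataclass
class UseCaseTemplate1:
    # Attribute names are illustrative; the template only defines the labels.
    title: str = ""
    vertical: str = ""
    author: str = ""
    actors: str = ""
    goals: str = ""
    description: str = ""
    current_solutions: CurrentSolutions = field(default_factory=CurrentSolutions)
    characteristics: BigDataCharacteristics = field(default_factory=BigDataCharacteristics)
    science: BigDataScience = field(default_factory=BigDataScience)
    challenges_gaps: str = ""
    challenges_mobility: str = ""
    security_privacy: str = ""
    generalization_issues: str = ""
    more_information: str = ""
    note: str = ""


def count_leaf_fields(obj) -> int:
    """Count the free-text fields, descending into the three grouped sections."""
    total = 0
    for f in fields(obj):
        value = getattr(obj, f.name)
        total += count_leaf_fields(value) if is_dataclass(value) else 1
    return total


def blank_fields(obj, prefix: str = "") -> list:
    """Return dotted names of the free-text fields that are still empty."""
    missing = []
    for f in fields(obj):
        value = getattr(obj, f.name)
        name = prefix + f.name
        if is_dataclass(value):
            missing.extend(blank_fields(value, name + "."))
        elif not value:
            missing.append(name)
    return missing


if __name__ == "__main__":
    record = UseCaseTemplate1(vertical="Digital Archives")
    record.characteristics.volume = "380 Terabytes"
    print(count_leaf_fields(record))  # 26
    print(blank_fields(record))
```

Grouping the sub-fields into their own classes mirrors the three grouped sections of the template (Current Solutions, Big Data Characteristics, Big Data Science), and a helper like `blank_fields` makes it easy to see which entries a submitter left empty.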
SUBMITTED USE CASE STUDIES
Government Operation> Use Case 1: Big Data Archival: Census 2010 and 2000
Use Case Title: Big Data Archival: Census 2010 and 2000 – Title 13 Big Data
Vertical (area): Digital Archives
Author/Company/Email: Vivek Navale and Quyen Nguyen (NARA)
Actors/Stakeholders and their roles and responsibilities:
    NARA’s Archivists
    Public users (after 75 years)
Goals: Preserve data for a long term in order to provide access and perform analytics after 75 years. Title 13 of U.S. code authorizes the Census Bureau and guarantees that individual and industry specific data is protected.
Use Case Description: Maintain data “as-is”. No access and no data analytics for 75 years. Preserve the data at the bit-level. Perform curation, which includes format transformation if necessary. Provide access and analytics after nearly 75 years.
Current Solutions:
    Compute (System): Linux servers
    Storage: NetApps, Magnetic tapes.
    Networking:
    Software:
Big Data Characteristics:
    Data Source (distributed/centralized): Centralized storage.
    Volume (size): 380 Terabytes.
    Velocity (e.g. real time): Static.
    Variety (multiple datasets, mashup): Scanned documents
    Variability (rate of change): None
Big Data Science (collection, curation, analysis, action):
    Veracity (Robustness Issues): Cannot tolerate data loss.
    Visualization: TBD
    Data Quality: Unknown.
    Data Types: Scanned documents
    Data Analytics: Only after 75 years.
Big Data Specific Challenges (Gaps): Preserve data for a long time scale.
Big Data Specific Challenges in Mobility: TBD
Security and Privacy Requirements: Title 13 data.
Highlight issues for generalizing this use case (e.g. for ref. architecture):
More Information (URLs):
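The use case asks for bit-level preservation with no tolerance for data loss, but does not say how this is verified. A common practice in digital archives is a fixity check: record a cryptographic digest for each object at ingest and recompute it later. The Python sketch below illustrates that idea under stated assumptions; it is not NARA's actual procedure. The function names, the choice of SHA-256, and the in-memory `store` standing in for archival storage are all assumptions.

```python
import hashlib
from typing import Callable, Dict, Iterable, List


def sha256_hex(data: bytes) -> str:
    """Digest of an object's bytes; any single-bit change alters the result."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(names: Iterable[str], read: Callable[[str], bytes]) -> Dict[str, str]:
    """Record one digest per object at ingest time."""
    return {name: sha256_hex(read(name)) for name in names}


def find_changed(manifest: Dict[str, str], read: Callable[[str], bytes]) -> List[str]:
    """Return names whose current bytes no longer match the manifest, or are missing."""
    changed = []
    for name, expected in sorted(manifest.items()):
        try:
            actual = sha256_hex(read(name))
        except (KeyError, FileNotFoundError):
            actual = None  # object can no longer be read from the store
        if actual != expected:
            changed.append(name)
    return changed


if __name__ == "__main__":
    # Hypothetical in-memory store; a real archive would read from disk or tape.
    store = {"scan_0001": b"page one", "scan_0002": b"page two"}
    manifest = build_manifest(store, store.__getitem__)
    store["scan_0002"] = b"page twp"  # simulate silent corruption
    print(find_changed(manifest, store.__getitem__))
```

Passing the reader as a callable keeps the check independent of the storage medium, which matters when the same manifest must be re-verified after data are migrated between servers and tapes.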
Government Operation> Use Case 2: NARA Accession, Search, Retrieve, Preservation
Use Case Title: National Archives and Records Administration Accession NARA Accession, Search, Retrieve, Preservation
Vertical (area): Digital Archives
Author/Company/Email: Quyen Nguyen and Vivek Navale (NARA)
Actors/Stakeholders and their roles and responsibilities:
    Agencies’ Records Managers
    NARA’s Records Accessioners
    NARA’s Archivists
    Public users
Goals: Accession, Search, Retrieval, and Long-Term Preservation of Big Data.
Use Case Description:
    1) Get physical and legal custody of the data. In the future, if data reside in the cloud, physical custody should avoid transferring Big Data from Cloud to Cloud or from Cloud to Data Center.
    2) Pre-process data for virus scan, identifying file format identification, removing empty files
    3) Index
    4) Categorize records (sensitive, unsensitive, privacy data, etc.)
    5) Transform old file formats to modern formats (e.g. WordPerfect to PDF)
    6) E-discovery
    7) Search and retrieve to respond to special request
    8) Search and retrieve of public records by public users
Current Solutions:
    Compute (System): Linux servers
    Storage: NetApps, Hitachi, Magnetic tapes.
    Networking:
    Software: Custom software, commercial search products, commercial databases.
Big Data Characteristics:
    Data Source (distributed/centralized): Distributed data sources from federal agencies. Current solution requires transfer of those data to a centralized storage. In the future, those data sources may reside in different Cloud environments.
    Volume (size): Hundreds of Terabytes, and growing.
    Velocity (e.g. real time): Input rate is relatively low compared to other use cases, but the trend is bursty. That is the data can arrive in batches of size ranging from GB to hundreds of TB.
    Variety (multiple datasets, mashup): Variety data types, unstructured and structured data: textual documents, emails, photos, scanned documents, multimedia, social networks, web sites, databases, etc. Variety of application domains, since records come from different agencies. Data come from variety of repositories, some of which can be cloud-based in the future.
    Variability (rate of change): Rate can change especially if input sources are variable, some having audio, video more, some more text, and other images, etc.
Big Data Science (collection, curation, analysis, action):
    Veracity (Robustness Issues): Search results should have high relevancy and high recall. Categorization of records should be highly accurate.
    Visualization: TBD
    Data Quality: Unknown.
    Data Types: Variety data types: textual documents, emails, photos, scanned documents, multimedia, databases, etc.
    Data Analytics: Crawl/index; search; ranking; predictive search. Data categorization (sensitive, confidential, etc.) Personally Identifiable Information (PII) data detection and flagging.
Big Data Specific Challenges (Gaps): Perform preprocessing and manage for long-term of large and varied data. Search huge amount of data. Ensure high relevancy and recall. Data sources may be distributed in different clouds in future.
Big Data Specific Challenges in Mobility: Mobile search must have similar interfaces/results
Security and Privacy Requirements: Need to be sensitive to data access restrictions.
Highlight issues for generalizing this use case (e.g. for ref. architecture):
More Information (URLs):
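Step 2 of the use case description (removing empty files and identifying file formats) can be illustrated with a short Python sketch. This is an assumption-laden illustration, not the software NARA uses: the function names and the small signature table are hypothetical, the virus scan is omitted, and a production system would rely on a full format registry rather than a handful of leading-byte signatures.

```python
from typing import Dict

# Well-known leading byte signatures; deliberately incomplete.
MAGIC_PREFIXES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"PK\x03\x04", "zip"),  # also the container for many office formats
]


def identify_format(data: bytes) -> str:
    """Guess a format label from the first bytes of a file."""
    for prefix, label in MAGIC_PREFIXES:
        if data.startswith(prefix):
            return label
    return "unknown"


def preprocess(files: Dict[str, bytes]) -> Dict[str, str]:
    """Drop empty files and map each remaining file name to a format label."""
    return {
        name: identify_format(data)
        for name, data in files.items()
        if len(data) > 0
    }


if __name__ == "__main__":
    batch = {
        "memo.pdf": b"%PDF-1.4 sample",
        "empty.dat": b"",
        "notes.txt": b"plain text",
    }
    print(preprocess(batch))
```

Files labeled `unknown` would be candidates for manual review or for the format transformation described in step 5.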
Government Operation> Use Case 3: Statistical Survey Response Improvement
Use Case Title: Statistical Survey Response Improvement (Adaptive Design)
Vertical (area): Government Statistical Logistics
Author/Company/Email: Cavan Capps: U.S. Census Bureau/[email protected]
Actors/Stakeholders and their roles and responsibilities: U.S. statistical agencies are charged to be the leading authoritative sources about the nation’s people and economy, while honoring privacy and rigorously protecting confidentiality. This is done by working with states, local governments and other government agencies.
Goals: To use advanced methods, that are open and scientifically objective, the statistical agencies endeavor to improve the quality, the specificity and the timeliness of statistics provided while reducing operational costs and maintaining the confidentiality of those measured.
Use Case Description: Survey costs are increasing as survey response declines. The goal of this work is to use advanced “recommendation system techniques” using data mashed up from several sources and historical survey para-data to drive operational processes in an