During the development of version 2 of the NBDIF, the Use Cases and Requirements Subgroup and the Security and Privacy Subgroup identified the need for additional use cases to strengthen the future work of the NBD-PWG. These two subgroups collaboratively created the Use Case Template 2 with the aim of collecting specific and standardized information for each use case. In addition to questions from the original use case template, the Use Case Template 2 contains questions that will provide a comprehensive view of security, privacy, and other topics for each use case.

The NBD-PWG invites the public to submit new use cases through the Use Case Template 2. To submit a use case, please fill out the PDF form (https://bigdatawg.nist.gov/_uploadfiles/M0621_v2_7345181325.pdf) and email it to Wo Chang ([email protected]). Use cases will be accepted until the end of Phase 3 work and will be evaluated as they are submitted.

This publication is available free of charge from: https://doi.org/10.6028/NIST.SP.1500-3r1

Appendix A: Use Case Study Source Materials

Appendix A contains one blank use case template and the original completed use cases. The Use Case Studies Template 1 included in this Appendix is no longer being used to collect use case information. To submit a new use case, refer to Appendix E for the current Use Case Template 2.

These use cases were the source material for the use case summaries presented in Section 2 and the use case requirements presented in Section 3 of this document. The completed use cases have not been edited and contain the original text as submitted by the author(s). The use cases are as follows:

GOVERNMENT OPERATION> USE CASE 1: BIG DATA ARCHIVAL: CENSUS 2010 AND 2000
GOVERNMENT OPERATION> USE CASE 2: NARA ACCESSION, SEARCH, RETRIEVE, PRESERVATION
GOVERNMENT OPERATION> USE CASE 3: STATISTICAL SURVEY RESPONSE IMPROVEMENT
GOVERNMENT OPERATION> USE CASE 4: NON TRADITIONAL DATA IN STATISTICAL SURVEY
COMMERCIAL> USE CASE 5: CLOUD COMPUTING IN FINANCIAL INDUSTRIES
COMMERCIAL> USE CASE 6: MENDELEY - AN INTERNATIONAL NETWORK OF RESEARCH
COMMERCIAL> USE CASE 7: NETFLIX MOVIE SERVICE
COMMERCIAL> USE CASE 8: WEB SEARCH
COMMERCIAL> USE CASE 9: CLOUD-BASED CONTINUITY AND DISASTER RECOVERY
COMMERCIAL> USE CASE 10: CARGO SHIPPING
COMMERCIAL> USE CASE 11: MATERIALS DATA
COMMERCIAL> USE CASE 12: SIMULATION DRIVEN MATERIALS GENOMICS
DEFENSE> USE CASE 13: LARGE SCALE GEOSPATIAL ANALYSIS AND VISUALIZATION
DEFENSE> USE CASE 14: OBJECT IDENTIFICATION AND TRACKING - PERSISTENT SURVEILLANCE
DEFENSE> USE CASE 15: INTELLIGENCE DATA PROCESSING AND ANALYSIS
HEALTHCARE AND LIFE SCIENCES> USE CASE 16: ELECTRONIC MEDICAL RECORD DATA
HEALTHCARE AND LIFE SCIENCES> USE CASE 17: PATHOLOGY IMAGING/DIGITAL PATHOLOGY
HEALTHCARE AND LIFE SCIENCES> USE CASE 18: COMPUTATIONAL BIOIMAGING
HEALTHCARE AND LIFE SCIENCES> USE CASE 19: GENOMIC MEASUREMENTS
HEALTHCARE AND LIFE SCIENCES> USE CASE 20: COMPARATIVE ANALYSIS FOR (META)GENOMES
HEALTHCARE AND LIFE SCIENCES> USE CASE 21: INDIVIDUALIZED DIABETES MANAGEMENT
HEALTHCARE AND LIFE SCIENCES> USE CASE 22: STATISTICAL RELATIONAL AI FOR HEALTH CARE
HEALTHCARE AND LIFE SCIENCES> USE CASE 23: WORLD POPULATION SCALE EPIDEMIOLOGY
HEALTHCARE AND LIFE SCIENCES> USE CASE 24: SOCIAL CONTAGION MODELING
HEALTHCARE AND LIFE SCIENCES> USE CASE 25: LIFEWATCH BIODIVERSITY
DEEP LEARNING AND SOCIAL MEDIA> USE CASE 26: LARGE-SCALE DEEP LEARNING
DEEP LEARNING AND SOCIAL MEDIA> USE CASE 27: LARGE SCALE CONSUMER PHOTOS ORGANIZATION
DEEP LEARNING AND SOCIAL MEDIA> USE CASE 28: TRUTHY TWITTER DATA ANALYSIS
DEEP LEARNING AND SOCIAL MEDIA> USE CASE 29: CROWD SOURCING IN THE HUMANITIES
DEEP LEARNING AND SOCIAL MEDIA> USE CASE 30: CINET NETWORK SCIENCE CYBERINFRASTRUCTURE
DEEP LEARNING AND SOCIAL MEDIA> USE CASE 31: NIST ANALYTIC TECHNOLOGY MEASUREMENT AND EVALUATIONS
THE ECOSYSTEM FOR RESEARCH> USE CASE 32: DATANET FEDERATION CONSORTIUM (DFC)
THE ECOSYSTEM FOR RESEARCH> USE CASE 33: THE ‘DISCINNET PROCESS’
THE ECOSYSTEM FOR RESEARCH> USE CASE 34: GRAPH SEARCH ON SCIENTIFIC DATA
THE ECOSYSTEM FOR RESEARCH> USE CASE 35: LIGHT SOURCE BEAMLINES
ASTRONOMY AND PHYSICS> USE CASE 36: CATALINA DIGITAL SKY SURVEY FOR TRANSIENTS
ASTRONOMY AND PHYSICS> USE CASE 37: COSMOLOGICAL SKY SURVEY AND SIMULATIONS

ASTRONOMY AND PHYSICS> USE CASE 38: LARGE SURVEY DATA FOR COSMOLOGY
ASTRONOMY AND PHYSICS> USE CASE 39: ANALYSIS OF LHC (LARGE HADRON COLLIDER) DATA
ASTRONOMY AND PHYSICS> USE CASE 40: BELLE II EXPERIMENT
EARTH, ENVIRONMENTAL AND POLAR SCIENCE> USE CASE 41: EISCAT 3D INCOHERENT SCATTER RADAR SYSTEM
EARTH, ENVIRONMENTAL AND POLAR SCIENCE> USE CASE 42: COMMON ENVIRONMENTAL RESEARCH INFRASTRUCTURE
EARTH, ENVIRONMENTAL AND POLAR SCIENCE> USE CASE 43: RADAR DATA ANALYSIS FOR CRESIS
EARTH, ENVIRONMENTAL AND POLAR SCIENCE> USE CASE 44: UAVSAR DATA PROCESSING
EARTH, ENVIRONMENTAL AND POLAR SCIENCE> USE CASE 45: NASA LARC/GSFC IRODS FEDERATION TESTBED
EARTH, ENVIRONMENTAL AND POLAR SCIENCE> USE CASE 46: MERRA ANALYTIC SERVICES
EARTH, ENVIRONMENTAL AND POLAR SCIENCE> USE CASE 47: ATMOSPHERIC TURBULENCE - EVENT DISCOVERY
EARTH, ENVIRONMENTAL AND POLAR SCIENCE> USE CASE 48: CLIMATE STUDIES USING THE COMMUNITY EARTH SYSTEM MODEL
EARTH, ENVIRONMENTAL AND POLAR SCIENCE> USE CASE 49: SUBSURFACE BIOGEOCHEMISTRY
EARTH, ENVIRONMENTAL AND POLAR SCIENCE> USE CASE 50: AMERIFLUX AND FLUXNET
ENERGY> USE CASE 51: CONSUMPTION FORECASTING IN SMART GRIDS

NBD-PWG USE CASE STUDIES TEMPLATE 1

Use Case Title:
Vertical (area):
Author/Company/Email:
Actors/Stakeholders and their roles and responsibilities:
Goals:
Use Case Description:
Current Solutions:
   o Compute (System):
   o Storage:
   o Networking:
   o Software:
Big Data Characteristics:
   o Data Source (distributed/centralized):
   o Volume (size):
   o Velocity (e.g. real time):
   o Variety (multiple datasets, mashup):
   o Variability (rate of change):
Big Data Science (collection, curation, analysis, action):
   o Veracity (Robustness Issues, semantics):
   o Visualization:
   o Data Quality (syntax):
   o Data Types:
   o Data Analytics:
Big Data Specific Challenges (Gaps):
Big Data Specific Challenges in Mobility:
Security and Privacy Requirements:
Highlight issues for generalizing this use case (e.g. for ref. architecture):
More Information (URLs):
Note: <additional comments>

Notes: No proprietary or confidential information should be included.

ADD picture of operation or data architecture of application below table.

Comments on fields

The following descriptions of the template's 26 fields are provided to clarify the intent and meaning of each field and to indicate ways in which the fields could be improved.

Use Case Title: Title provided by the use case author.
Vertical (area): Intended to categorize the use cases. However, an ontology was not created prior to the use case submissions so this field was not used in the use case compilation.
Author/Company/Email: Name, company, and email (if provided) of the person(s) submitting the use case.
Actors/Stakeholders and their roles and responsibilities: Describes the players and their roles in the use case.
Goals: Objectives of the use case.
Use Case Description: Brief description of the use case.
Current Solutions: Describes current approach to processing Big Data at the hardware and software infrastructure level.
   o Compute (System): Computing component of the data analysis system.
   o Storage: Storage component of the data analysis system.
   o Networking: Networking component of the data analysis system.
   o Software: Software component of the data analysis system.
Big Data Characteristics: Describes the properties of the (raw) data including the four major ‘V’s’ of Big Data described in NIST Big Data Interoperability Framework: Volume 1, Big Data Definition of this report series.
   o Data Source: The origin of data, which could be from instruments, Internet of Things, Web, Surveys, Commercial activity, or from simulations. The source(s) can be distributed, centralized, local, or remote.
   o Volume: The characteristic of data at rest that is most associated with Big Data. The size of data varied drastically between use cases from terabytes to petabytes for science research (100 petabytes was the largest science use case for LHC data analysis), or up to exabytes in a commercial use case.
   o Velocity: Refers to the rate of flow at which the data is created, stored, analyzed, and visualized. For example, big velocity means that a large quantity of data is being processed in a short amount of time.
   o Variety: Refers to data from multiple repositories, domains, or types.
   o Variability: Refers to changes in rate and nature of data gathered by use case.
Big Data Science: Describes the high-level aspects of the data analysis process.
   o Veracity: Refers to the completeness and accuracy of the data with respect to semantic content. NIST Big Data Interoperability Framework: Volume 1, Big Data Definition discusses veracity in more detail.
   o Visualization: Refers to the way data is viewed by an analyst making decisions based on the data. Typically, visualization is the final stage of a technical data analysis pipeline and follows the data analytics stage.
   o Data Quality: This refers to syntactical quality of data. In retrospect, this template field could have been included in the Veracity field.
   o Data Types: Refers to the style of data such as structured, unstructured, images (e.g., pixels), text (e.g., characters), gene sequences, and numerical.

   o Data Analytics: Defined in NIST Big Data Interoperability Framework: Volume 1, Big Data Definition as “the synthesis of knowledge from information”. In the context of these use cases, analytics refers broadly to tools and algorithms used in processing the data at any stage including the data to information or knowledge to wisdom stages, as well as the information to knowledge stage.

Big Data Specific Challenges (Gaps): Allows for explanation of special difficulties for processing Big Data in the use case and gaps where new approaches/technologies are used.
Big Data Specific Challenges in Mobility: Refers to issues in accessing or generating Big Data from Smart Phones and tablets.
Security and Privacy Requirements: Allows for explanation of security and privacy issues or needs related to this use case.
Highlight issues for generalizing this use case: Allows for documentation of issues that could be common across multiple use-cases and could lead to reference architecture constraints.
More Information (URLs): Resources that provide more information on the use case.
Note: <additional comments>: Includes pictures of use-case in action but was not otherwise used.
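
For readers who want to work with completed templates programmatically, the 26 fields described above can be sketched as a nested record. This is only an illustration and not part of the NBD-PWG template: the class and attribute names are hypothetical, every field is assumed to be free text, and the sample values are taken from Use Case 1 in this appendix.

```python
from dataclasses import dataclass, field, fields


@dataclass
class CurrentSolutions:
    compute_system: str = ""
    storage: str = ""
    networking: str = ""
    software: str = ""


@dataclass
class BigDataCharacteristics:
    data_source: str = ""
    volume: str = ""
    velocity: str = ""
    variety: str = ""
    variability: str = ""


@dataclass
class BigDataScience:
    veracity: str = ""
    visualization: str = ""
    data_quality: str = ""
    data_types: str = ""
    data_analytics: str = ""


@dataclass
class UseCaseTemplate1:
    # Attribute names are hypothetical; they mirror the template field labels.
    title: str = ""
    vertical: str = ""
    author_company_email: str = ""
    actors_stakeholders: str = ""
    goals: str = ""
    description: str = ""
    current_solutions: CurrentSolutions = field(default_factory=CurrentSolutions)
    characteristics: BigDataCharacteristics = field(default_factory=BigDataCharacteristics)
    science: BigDataScience = field(default_factory=BigDataScience)
    challenges_gaps: str = ""
    challenges_mobility: str = ""
    security_privacy: str = ""
    generalization_issues: str = ""
    more_information: str = ""
    note: str = ""


def count_leaf_fields(obj) -> int:
    """Count free-text (leaf) fields, descending into the nested groups."""
    total = 0
    for f in fields(obj):
        value = getattr(obj, f.name)
        total += 1 if isinstance(value, str) else count_leaf_fields(value)
    return total


# Partially filled record using values from Use Case 1.
census = UseCaseTemplate1(
    title="Big Data Archival: Census 2010 and 2000",
    vertical="Digital Archives",
)
census.characteristics.volume = "380 Terabytes"
print(count_leaf_fields(census))
```

Grouping the sub-fields under Current Solutions, Big Data Characteristics, and Big Data Science keeps the three group headings out of the field count, so the leaf fields total 26.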

SUBMITTED USE CASE STUDIES

Government Operation> Use Case 1: Big Data Archival: Census 2010 and 2000

Use Case Title: Big Data Archival: Census 2010 and 2000 - Title 13 Big Data
Vertical (area): Digital Archives
Author/Company/Email: Vivek Navale and Quyen Nguyen (NARA)
Actors/Stakeholders and their roles and responsibilities: NARA’s Archivists; Public users (after 75 years)
Goals: Preserve data for a long term in order to provide access and perform analytics after 75 years. Title 13 of U.S. code authorizes the Census Bureau and guarantees that individual and industry specific data is protected.
Use Case Description: Maintain data “as-is”. No access and no data analytics for 75 years. Preserve the data at the bit-level. Perform curation, which includes format transformation if necessary. Provide access and analytics after nearly 75 years.
Current Solutions:
   o Compute (System): Linux servers
   o Storage: NetApps, Magnetic tapes.
   o Networking:
   o Software:
Big Data Characteristics:
   o Data Source (distributed/centralized): Centralized storage.
   o Volume (size): 380 Terabytes.
   o Velocity (e.g. real time): Static.
   o Variety (multiple datasets, mashup): Scanned documents
   o Variability (rate of change): None
Big Data Science (collection, curation, analysis, action):
   o Veracity (Robustness Issues): Cannot tolerate data loss.
   o Visualization: TBD
   o Data Quality: Unknown.
   o Data Types: Scanned documents
   o Data Analytics: Only after 75 years.
Big Data Specific Challenges (Gaps): Preserve data for a long time scale.
Big Data Specific Challenges in Mobility: TBD
Security and Privacy Requirements: Title 13 data.
Highlight issues for generalizing this use case (e.g. for ref. architecture):
More Information (URLs):
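
The use case above asks for preservation at the bit level with no tolerance for data loss. One common way to check this over time is a fixity check: record a checksum for every file at ingest and compare it again later. The sketch below illustrates that general technique only; the submission does not describe NARA's actual tooling, and the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest at ingest time."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return manifest entries that are missing or no longer match their digest."""
    changed = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file() or sha256_of(path) != expected:
            changed.append(name)
    return changed
```

A manifest built once at accession can be stored separately from the data and re-verified on a schedule; an empty result from `changed_files` indicates that every recorded file is still bit-identical.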

Government Operation> Use Case 2: NARA Accession, Search, Retrieve, Preservation

Use Case Title: National Archives and Records Administration Accession NARA Accession, Search, Retrieve, Preservation
Vertical (area): Digital Archives
Author/Company/Email: Quyen Nguyen and Vivek Navale (NARA)
Actors/Stakeholders and their roles and responsibilities: Agencies’ Records Managers; NARA’s Records Accessioners; NARA’s Archivists; Public users
Goals: Accession, Search, Retrieval, and Long-Term Preservation of Big Data.
Use Case Description:
   1) Get physical and legal custody of the data. In the future, if data reside in the cloud, physical custody should avoid transferring Big Data from Cloud to Cloud or from Cloud to Data Center.
   2) Pre-process data for virus scan, identifying file format identification, removing empty files
   3) Index
   4) Categorize records (sensitive, unsensitive, privacy data, etc.)
   5) Transform old file formats to modern formats (e.g. WordPerfect to PDF)
   6) E-discovery
   7) Search and retrieve to respond to special request
   8) Search and retrieve of public records by public users
Current Solutions:
   o Compute (System): Linux servers
   o Storage: NetApps, Hitachi, Magnetic tapes.
   o Networking:
   o Software: Custom software, commercial search products, commercial databases.
Big Data Characteristics:
   o Data Source (distributed/centralized): Distributed data sources from federal agencies. Current solution requires transfer of those data to a centralized storage. In the future, those data sources may reside in different Cloud environments.
   o Volume (size): Hundreds of Terabytes, and growing.
   o Velocity (e.g. real time): Input rate is relatively low compared to other use cases, but the trend is bursty. That is the data can arrive in batches of size ranging from GB to hundreds of TB.
   o Variety (multiple datasets, mashup): Variety data types, unstructured and structured data: textual documents, emails, photos, scanned documents, multimedia, social networks, web sites, databases, etc. Variety of application domains, since records come from different agencies. Data come from variety of repositories, some of which can be cloud-based in the future.
   o Variability (rate of change): Rate can change especially if input sources are variable, some having audio, video more, some more text, and other images, etc.

Big Data Science (collection, curation, analysis, action):
   o Veracity (Robustness Issues): Search results should have high relevancy and high recall. Categorization of records should be highly accurate.
   o Visualization: TBD
   o Data Quality: Unknown.
   o Data Types: Variety data types: textual documents, emails, photos, scanned documents, multimedia, databases, etc.
   o Data Analytics: Crawl/index; search; ranking; predictive search. Data categorization (sensitive, confidential, etc.) Personally Identifiable Information (PII) data detection and flagging.
Big Data Specific Challenges (Gaps): Perform preprocessing and manage for long-term of large and varied data. Search huge amount of data. Ensure high relevancy and recall. Data sources may be distributed in different clouds in future.
Big Data Specific Challenges in Mobility: Mobile search must have similar interfaces/results
Security and Privacy Requirements: Need to be sensitive to data access restrictions.
Highlight issues for generalizing this use case (e.g. for ref. architecture):
More Information (URLs):
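
Two of the processing steps listed in this use case, removing empty files during pre-processing and categorizing records as sensitive or not, can be illustrated with a small sketch. It is a toy example under stated assumptions: the `Record` type, the function names, and the single rule used to flag a record (a text pattern shaped like a U.S. Social Security number) are hypothetical and do not describe NARA's software or categorization policy.

```python
import re
from dataclasses import dataclass

# Hypothetical rule: text shaped like a U.S. Social Security number (123-45-6789).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class Record:
    name: str
    content: bytes


def drop_empty(records: list[Record]) -> list[Record]:
    """Pre-processing step: discard zero-length files."""
    return [r for r in records if len(r.content) > 0]


def categorize(record: Record) -> str:
    """Categorization step: flag records containing PII-like text."""
    text = record.content.decode("utf-8", errors="replace")
    return "sensitive" if SSN_PATTERN.search(text) else "unsensitive"


batch = [
    Record("memo.txt", b"Quarterly meeting notes"),
    Record("empty.dat", b""),
    Record("form.txt", b"Applicant SSN: 123-45-6789"),
]
kept = drop_empty(batch)
labels = {r.name: categorize(r) for r in kept}
print(labels)
```

A production pipeline would need many more rules, and the use case itself requires that categorization be highly accurate, so a single pattern like this one is only a starting point for discussion.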

Government Operation> Use Case 3: Statistical Survey Response Improvement

Use Case Title: Statistical Survey Response Improvement (Adaptive Design)
Vertical (area): Government Statistical Logistics
Author/Company/Email: Cavan Capps: U.S. Census Bureau/[email protected]
Actors/Stakeholders and their roles and responsibilities: U.S. statistical agencies are charged to be the leading authoritative sources about the nation’s people and economy, while honoring privacy and rigorously protecting confidentiality. This is done by working with states, local governments and other government agencies.
Goals: To use advanced methods, that are open and scientifically objective, the statistical agencies endeavor to improve the quality, the specificity and the timeliness of statistics provided while reducing operational costs and maintaining the confidentiality of those measured.
Use Case Description: Survey costs are increasing as survey response declines. The goal of this work is to use advanced “recommendation system techniques” using data mashed up from several sources and historical survey para-data to drive operational processes in an
