Challenges and Solutions for Big Data in the Public Sector:

(1)

Challenges and Solutions for Big Data in the Public Sector:

Digital Government Institute’s Annual Big Data Conference, October 9, Washington, DC Reagan Building

Dr. Brand Niemann

Director and Senior Data Scientist Semantic Community

http://semanticommunity.info/

http://www.meetup.com/Federal-Big-Data-Working-Group/

http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup

(2)

Overview

• Related Presentations:

– COM.BigData Conference (Keynote and Panel), August 4-6, Washington, DC, and

– IEEE 2014 Big Data Conference (Paper and NIST Big Data Workshop), October 27-30, Washington, DC.

• Moderator:

– Dr. Brand Niemann, Director and Senior Data Scientist, Semantic Community, and Co-organizer, Federal Big Data Working Group Meetup

• Panelists:

– Dr. Tom Rindflesch, Information Research Specialist at the Cognitive Science Branch, National Institutes of Health (NIH): Semantic Medline (Ontology, Cray Graph Appliance, and Relational Databases)

– Dr. Kirk Borne, Professor of Astrophysics and Computational Science, George Mason University: NSF Big Data Project of the Decade: LSST

(3)

Fourth Paradigm and Fourth Question

• The Fourth Paradigm of Science (1):

– First Paradigm. Observation, descriptions of natural phenomena, and experimentation.

– Second Paradigm. Theoretical science such as Newton’s laws of motion and Maxwell’s equations.

– Third Paradigm. Simulation and modelling, such as in astronomy.

– Fourth Paradigm. Data-intensive science that exploits the large volumes of data in new ways for scientific exploration, such as the International Virtual Observatory Alliance in astronomy.

• The Fourth Question of Big Data for Science (2):

– How was the data collected?

– Where is the data stored?

– What are the data results?

– Does the data story persuade?

(1) Bell, G., Hey, T., & Szalay, A. (2009). Beyond the data deluge. Science 323, 6 March 2009, pp. 1297-1298.

(2) de Waard, Anita (2014). About Stories, that Persuade With Data. Federal Big Data Working Group Meetup, 20 May, 41 slides.

President Obama

Discovers Big Data in 2009

(4)

Mission Statement

• Federal: Supports the Federal Big Data Initiative, but is not endorsed by the Federal Government or its Agencies;

• Big Data: Supports the Federal Digital Government Strategy, which is "treating all content as data", so big data = all your content;

• Working Group: Data Science Teams composed of Federal Government and Non-Federal Government experts producing big data products (How was the data collected, Where is it stored, What are the results, and Does the data story persuade?); and

• Meetup: The world's largest network of local groups to revitalize local community and help people around the world self-organize, like the MOOCs (Massive Open Online Courses) being considered by the White House to reduce the cost of higher education.

(5)

What Are We Doing?

• Leadership of the Semantic Data Science Team that produced Semantic Medline running on the YarcData Graph Appliance.

• Founding and co-organizing of the Federal Big Data Working Group Meetup.

• A graduate class prepared for GMU entitled “Practical Data Science for Data Scientists”.

• Using the Cross Industry Standard Process for Data Mining (CRISP-DM; Shearer, 2000) to build a Data Science Knowledge Base.

• Mining of the Data Science and Digital Earth scientific journals for the CODATA International Workshop on Big Data for International Scientific Programmes, June 8-9, in Beijing.

• Participation in the Data FAIRport (Findable, Accessible, Interoperable, and Reusable) with “Data Publication in Data Browsers”.

• Providing data stories that persuade and presentation materials for public education conferences like the COM.BigData Conference, August 4-6, in Washington, DC.

(6)

NIH Data Commons

Dr. Phil Bourne (7/30/2014): Rules, Credit/Not Money, & More Offline

http://semanticommunity.info/Data_Science/Data_Science_for_RDA#Slide_50_The_Power_of_the_Commons

(7)

How Are We Doing It?

• Federating Use Cases: Data Science (Brand Niemann); Environmental and Earth Science (Joan Aron); and Astronomy (Kirk Borne)

• Federating Data Publications: Structured Scientific Content (Papers, journals, books, reports, etc.); Data FAIRports (Findable, Accessible, Interoperable, and Reusable); and Data Stories That Persuade (Claims and Evidence)

• Federating Solutions & Technologies: Hand-Crafted by Individuals and Teams (Mary Galvin, STEM); Data Mining Standards and Products (Brand Niemann, Data Publications in Data Browsers); Machine Processing (Fredrik Salvesen, Semantic Data Publications on YarcData Graph Appliance); Reading and Reasoning (Katherine Goodier and Chuck Rehberg, Semantic Insights on Elsevier Content Text Mining); and Data Curation at Scale (Alan Wagner, Tamr on 1000s of Spreadsheets)

(8)

Data Science for JHU DIBBs Project:

Knowledge Bases

Data Science for JHU DIBBs Project SDSS.xlsx

Data Science Data Publication: Table of Contents is An Ontology!

Data Science Publication Index: Index is Linked Open Data!

(9)

Data Science for JHU DIBBs Project:

Analytics & Visualizations

Web Player

Spotfire Content, Network, and Data Analytics and Data Ecosystem: Spotfire is a Microscope and a Telescope!

(10)

Data Science for JHU DIBBs Project:

Conclusions

• Science is increasingly driven by data (big and small)

• New instruments: “microscopes” & “telescopes” for data

• A major challenge on the “long tail”

• A new, Fourth Paradigm of Science is emerging…

• SDSS has been at the cusp of this transition

• Now the SciServer is continuing the legacy

Gray's Laws of Data Engineering:

– Scientific computing is revolving around data

– Need scale‐out solution for analysis

– Take the analysis to the data!

– Start with “20 queries”

– Go from “working to working”
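The "take the analysis to the data" point can be illustrated with a minimal sketch. It assumes a hypothetical `observations` table in an in-memory SQLite database standing in for a large survey catalog; the table name, columns, values, and function names are illustrative only and are not taken from SDSS or SciServer. The first function pulls every row to the client before averaging, while the second asks the database to aggregate so that only the per-band results move.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory table (hypothetical schema and values)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observations (object_id INTEGER, band TEXT, magnitude REAL)"
    )
    rows = [
        (1, "g", 18.0), (1, "r", 17.0),
        (2, "g", 20.0), (2, "r", 19.0),
        (3, "g", 22.0),
    ]
    conn.executemany("INSERT INTO observations VALUES (?, ?, ?)", rows)
    return conn


def mean_magnitude_pull(conn: sqlite3.Connection) -> dict:
    """Move every row to the client, then aggregate there."""
    sums = {}
    cursor = conn.execute("SELECT object_id, band, magnitude FROM observations")
    for _, band, magnitude in cursor:
        total, count = sums.get(band, (0.0, 0))
        sums[band] = (total + magnitude, count + 1)
    return {band: total / count for band, (total, count) in sums.items()}


def mean_magnitude_push(conn: sqlite3.Connection) -> dict:
    """Let the database aggregate; only one row per band is returned."""
    query = "SELECT band, AVG(magnitude) FROM observations GROUP BY band"
    return dict(conn.execute(query).fetchall())


conn = build_demo_db()
pulled = mean_magnitude_pull(conn)
pushed = mean_magnitude_push(conn)
```

Both functions give the same averages, but the second transfers one row per band instead of the whole table, which is the difference that matters once the table no longer fits on the analyst's machine.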

