
Data Science Initiative


(1)

Terri L. Lomax Vice Chancellor

Research, Innovation + Economic Development

Data Science Initiative

Joint Research Committees Meeting October 27, 2014

(2)

Data Science is a Big Deal

Managing, processing and exploiting data to improve decision making will continue to grow in importance

(3)
(4)

Key Insights: after McKinsey (May 2011)

1.  Data have swept into every industry and business function and are now an important factor of production.

2.  Big Data creates value but ONLY if appropriate and advanced Data Science is used to deal with Big Data.

3.  Use of Big Data and Data Science will become a key basis of competition, growth and pro-active risk management for individual firms.

4.  Big Data underpins new waves of productivity growth and consumer surplus.

5.  Big Data will matter across sectors, but some sectors are poised for greater gains.

6.  There is already a shortage of talent necessary for organizations to take advantage of Big Data.

From McKinsey (May 2011):

Demand for deep analytical positions could exceed the supply by 140,000 to 190,000 positions in 2018.

The need for additional managers and analysts in the US who can ask the right questions and consume the results of data analysis is estimated at 1.5 million.

In the short-term, retraining existing talent will be required to meet demand.

(5)

Big data can generate significant financial value across sectors

(6)

Source: Data Science Research Center, Amsterdam (http://dsrc.nl/what-is-data-science/)

Data Science is multidisciplinary

Security, Privacy, Provenance

Understand and decide: Perception, Cognition, Business Analytics, Visual Analytics, Decision Theory

Store and Process; Analyze & Model: Large Scale Databases, Software Engineering, System/Network Engineering, Distributed Processing, Reasoning, Knowledge Representation, Multimedia Retrieval, Modeling & Simulation, Machine Learning, Information Retrieval

(7)

Research Triangle - Data Science Powerhouse

(8)

Data4Decisions’ unique concept is guided by the event’s Advisory Council, a powerful compendium of the region’s leading research universities, private companies and thought-leading Research Triangle-based associations:

(9)

NC State Data Science-Related Centers + Institutes

New: NSF I/UCRC on End-to-End Enablement of Data

(10)

Laboratory for Analytic Sciences (LAS)

NSA’s goal is to build an advanced data innovation hub in the Research Triangle with LAS as the anchor tenant

(11)

CHANCELLOR’S FACULTY EXCELLENCE CLUSTERS

(12)

•  Institute for Advanced Analytics – PSM, primarily SAS tools

•  COE/PCOM executive education based on open-source tools

•  CSC graduate track in Data Science

•  CSC/Stat/Math undergraduate concentration in Data Science

•  PCOM executive education based on IBM tools

•  UNC GA Research Opportunities Initiative – Data Science

Institutionalize NC State’s Data Science Initiative (together with UNC Charlotte and RENCI)

(13)

Data Science Infrastructure at NC State

•  Portions of VCL-HPC facilities (large-memory machines) – run Linux-based, user-provided analytics

•  CSC MRC VCL-BigData testbed (x86 and IBM Power7 and Power8 computers with lots of memory, tightly coupled storage, and advanced accelerators) – runs IBM Analytics

•  NCBP-VCL cluster – lots of memory and disk space – runs SAS analytics

•  IAA VCL facilities – primarily run SAS analytics

•  OSCAR lab – extensive Big Data and Data Science computational and data storage facilities, including a BlueGene/P supercomputer; also the LAS “low lab”

(14)

NC State Data Science Initiative

Goals

•  Raise visibility & increase reputation

•  Coordinate data science activities, including education

•  Increase research funding

•  Build industry partnerships

•  Establish interdisciplinary undergraduate curriculum

•  Provide services & infrastructure to faculty

Organizational Structure

•  Director / Assistant

•  Coordinating Council

•  Steering Committee

(15)

Data Science Initiative – Coordinating Council

(formative stage)

COE

Mladen Vouk – CSC – Director

Dan Stancil – ECE

Jerry Bernholc – CHIPS

Michael Young – DGRC

James Lester – CEI

Paul Turinsky – CASL

Jacob Jones – AIF

Dennis Kekas – ITng

Yousry Azmy – CNEC

Rada Chirkova – new I/UCRC (STEED Lab, CHMPR)

COS

Montse Fuentes – Stat

Marie Davidian – CQSB

Alyson Wilson – Cluster, LAS

John Blondin – Phys

Loek Helminck – Math

Tom Banks – CRSC

Fred Wright – Bioinformatics

CED

Glenn Kleiman – WIFIEI

CNR

Ross Mietenmeyer – GSA

PCOM

Mike Kowolenko – CIMS

CHASS

Carolyn Miller – Dig. Humanities

Provost

Michael Rappa – IAA

ORIED

(16)

Summary

Managing and extracting information from complex data sets continues to grow in importance in almost all sectors of the US economy

The Research Triangle has significant programs in data science that will be leveraged for future growth

Importance of data science recognized by all levels and types of industry, government and academia

Working together at NC State, we can:

capitalize on multidisciplinary opportunities,

build significant programs, and

educate the skilled workforce to maintain our

(17)

Terri L. Lomax

research.ncsu.edu

Data to

(18)

Gap Analysis for Data Science Cluster Proposal

Areas: Physical models; Social models; Symbolic models; Numerical solvers; HPC, HPD, OS; Software architectures; Storage/Index/Access; Privacy; Security; Statistical methods; Discrete mathematics; AI/Knowledge Mgmt; Database; Data integration; Natural Language

Coverage: Weak, Little, Strong

Figure markers: 1, 2, 3, 4

(19)

Current Gap Analysis for Data Science Cluster (Oct 2014)

Areas: Physical models; Social models; Symbolic models; Numerical solvers; HPC, HPD, OS; Software architectures; Storage/Index/Access; Privacy; Security; Statistical methods; Discrete mathematics; AI/Knowledge Mgmt; Database; Data integration; Natural Language

Coverage: Weak, Little, Strong

Figure markers: 1, 2, 3, 4

(20)

•  Ensemble and Comparative Visualization of Scientific Datasets (Sandia, Christopher Healey)

•  Computer-aided Human Centric Cyber Situation Awareness (Penn State, Peng Ning; Michael Young)

•  Runtime System for I/O Staging in Support of In-Situ Processing of Extreme Scale Data (DOE, Nagiza Samatova)

•  Scalable and Power Efficient Data Analytics for Hybrid Exascale Systems (DOE, Nagiza Samatova)

•  Damsel: A Data Model Storage Library for Exascale Science (DOE, Nagiza Samatova)

•  Scalable Data Management, Analysis, and Visualization (SDAV) Institute (DOE, Nagiza Samatova; Anatoli Melechko)

•  Scientific Data Management Center (DOE, Vouk)

•  Collaborative Research: Understanding Climate Change: A Data Driven Approach (NSF, Nagiza Samatova; Frederick Semazzi)

•  Policy-Based Governance for the OOI Cyberinfrastructure (NSF, Munindar Singh)

•  Interdisciplinary Cyber-Enabled Crime Reconstruction through Innovative Methodology and Engagement (IC-CRIME) (NSF, David Hinks; Michael Young, ASU, IU-B)

(21)
(22)

Capturing Value from Data

•  Create transparency

•  Enable experimentation to discover needs, expose variability, and improve performance

•  Segment populations to customize actions

•  Replace and/or support human decision making with automated algorithms

•  Innovate new business models, products, and services

Source: Big Data: The next frontier for innovation, competition, and productivity,

(23)

[Diagram: Data Science at the center, surrounded by topic areas, application domains, data types, and associated NC State units]

Topic areas: Analytics; Acquisition; Computation Sciences; Cyber Infrastructure; Cyber Security; Data Management; Education; Gaming; Informatics; Mobility/Wireless; Networking; Policy/Governance; Processing & Preservation; Modeling & Simulation; Visualization; Virtualization; Fault Tolerance/Recovery; Natural Language Processing; Pattern Recognition; Transportation

Applications: Biological Sciences; Business; Climate; Engineering Aps.; Energy, Health; Social, Humanities; Physics; Security; Policy; Etc. (IAA, CSC, CIMS, MEAS, +)

Data Types: Structured; Unstructured; Image; Signal; Streams

Units named in the diagram: ICSE, ORSC, VCL, CSC, Math, Physics, ITng, ECE, SOSI, IAA, CEI, Stat, DGRC, BRC, CHiPS, COE, NC B-Prepared, RENCI, COD, CQSB, NCICS, SDM

Also labeled: Big Data Research and Development Initiative; Consider: v-Centennial

(24)

Graphic Representation of Data Science

(25)

•  Two one-day Workshops held at NC State

–  Hosted by VCs Terri Lomax and Marc Hoit

–  Led by Tina Bennefield, HR Senior Consultant & Performance Leadership Program Manager

–  Organized by Bonnie Aldridge

•  Day 1

–  Individual faculty presentations on current research

–  Table discussions on trends, barriers and needs

•  Day 2

–  Developing a shared vision

–  Developing recommendations

(26)

Participants: McDonald*, Wolfram, Bird*, Devine, Whetten*, Baron*, Bolotnov*, Chakrabortty*, Chirkova, Chow*, Dai, Edwards*, Ferguson*, Franzon*, Healey*, Krim*, Misra*, Muth, Overton, Rotenberg, Vouk, Westmoreland*, Xie*, Breen, Kennedy-Stoskopf*, Kouri*, Kowolenko*, Krishnamurthy*, Blondin, Brown*, Daniels*, Ghosh, Ipsen, Mitasova*, Reading*, Sullivant*, Xie*, Yuter, Zhou, Pasquinelli

Colleges represented: CALS, CVM, PCOM, PAMS, COE, COT, CHASS, CNR

(27)

•  Research trends

–  Analysis of unstructured data sets

–  Enhanced visualization methods

–  Data interoperability and fusion techniques

–  Model-driven vs data-driven approaches

•  Barriers

–  Infrastructure (bandwidth, storage, power, etc.)

–  Human capital

–  Privacy, proprietary and standards

–  Departmental cultures

•  Needs & Vision

–  Understand industry funding

–  Collaborative data tools

–  Communication between producers and consumers

–  Overarching coordinating structure
