1
ESS event:
Big Data in Official Statistics
2
LEARNING AND DEVELOPMENT:
CAPACITY BUILDING AND TRAINING
FOR ESS HUMAN RESOURCES
FACILITATOR: JOSÉ CERVERA-FERRI
3
Session 2
Related Scheveningen challenges
[SCH5] Short-term Human Resources needs:
recruitment, professional training,
secondment/re-deployment
[SCH5] Long-term needs: academic curricula
for Data Scientists
[SCH6] Collaboration with academia for
4
Session 2: Topics for discussion
• Skills for Big Data
• Opportunities for building skills
• Proposal for a key input to the roadmap to
be established by the ESS Task Force
5
Session 2: Organization
Short-term
Long-term
Skills for Big Data
Session 2A
Opportunities for acquiring
skills
Session 2A
Session 2B
Proposal for a roadmap to
acquire skills for Big Data in the
ESS
6
SKILLS FOR BIG DATA
OPPORTUNITIES FOR ACQUIRING
SKILLS
7
Session 2A
Preliminary considerations (1):
Can NSIs rely on existing skills?
• “Non-traditional set of skills to develop”
• Trained statisticians and IT staff in statistics are already close to the
“data science” skills required for Big Data (data cleaning, cubes,
analytical software, data mining, etc.). Staff well-trained in methodology and statistical domains (UNECE Sprint paper, SWOT analysis – strength).
• The Official Statistics Community has less knowledge of Big Data than
many important players like Google. The Official Statistics Community has limited skills and limited IT resources when it comes to the new, non-traditional, technologies used to gather, process and analyse Big Data (UNECE Sprint paper, SWOT analysis – weakness).
8
Session 2A
Preliminary considerations (1):
Can NSIs rely on existing skills? (cont.)
• Young staff coming in from universities may be very innovative, may already have a personal relationship with Big Data (Facebook, Google, Twitter trends), and may be less constrained by traditional IT and analysis (UNECE Sprint paper, SWOT analysis – opportunity).
• Failure to permit innovative methods might render OSC organizations less attractive workplaces for top talent (UNECE Sprint paper, SWOT analysis – threat).
• Cultural change:
– “a culture that values high quality and accurate information and regards the best way to achieve this through use of methods where the design can be controlled. Big Data doesn't allow this luxury”
– Innovative thinking, risk-taking (is it the realm of Civil Servants?)
9
Session 2A
Preliminary considerations (2):
Learning methods
• Learning by doing in OS
• Training individuals, or teams?
• The business analyst and project manager
• The mathematician who builds algorithms
• The data architect
• The statistician (data collection, editing, processing)
• The communicator (visualization)
• Data analyst
• Data scientist
• Data engineer
• Data integrator
• System manager
10
Session 2A
Preliminary considerations (3):
Competition
• Competition with the Industry: better
salaries in the private sector for Data
Scientists?
11
Session 2A
Skills for Big Data
Data Scientist vs. Statistician
• Data Scientist as the “connective tissue” between processing technologies and data-driven decision making
Necessary skills: math/statistics, IT, visualization, subject matter specialization
• Math/stat: data mining techniques
• IT: Hadoop, MongoDB, NoSQL, …
12
Session 2A:
IT Skills for Big Data
• R-SAS-SPSS
• Business Intelligence, Visual Analytics, Excel
• MapReduce
• Pig, Java
• SQL
• ETL (Extract, transform, load)
• Linux…
13
Session 2A
Statistical Skills for Big Data
• Computational statistics
• Analytical methods: correlations & causality,
modelling, network analysis, information reduction
• Dissemination: data visualization
14
Session 2A
Opportunities in the ESS
• ESS Learning and Development Framework
• ESTP 2014 course
– Big Data: Effective Processing and Analysis of Very Large and Unstructured Data for Official
Statistics
• Contents: classification of various massive data sets, ETL (extract, transform, load), specific challenges, privacy and statistical disclosure issues, computing base, overview of statistical methods. Focus on concrete examples.
• Course requirements:
– Database fundamentals and data manipulation languages
– Data collection and integration tools
– Data mining techniques for large data sets
– Object-oriented design and programming
– Probability and random variables
• Is there anyone with such a complete background in Official Statistics?
• European Masters in Official Statistics (EMOS): ESS certification of
programmes offered by Universities
– EMOS workshop 2014 (Helsinki, June 2014)
15
OPPORTUNITIES FOR ACQUIRING
SKILLS (CONT.)
KEY INPUT TO THE ROADMAP TO BE
ESTABLISHED BY THE ESS TASK
FORCE
16
Session 2B
Opportunities outside the ESS
Grasping the opportunities outside:
• Diversity of academic programmes on Big Data, Business Analytics, Data Science… (certification?)
• Training offer from private companies (certification?)
17
Session 2B
[SCH6] Collaboration with Academia
• Academic collaborators: use of existing expertise in
statistical analysis of large sets of data: astronomy, remote
sensing, genetics, image processing…
• Source of training: need for mapping academic
programmes on Big Data
• How can academics be integrated with NSI staff?
• How can training be financed? National or ESS level?
18
Session 2B
Horizon 2020
• Marie Skłodowska-Curie actions: support for innovative training networks, mobility of researchers, inter-sectoral cooperation
• ICT 15-2014: Big data and Open Data Innovation and take-up:
– Objective: To contribute to capacity-building by designing and coordinating a network of European skills centres for big data analytics technologies and business development. The network is expected to identify knowledge/skills gaps in the European industrial landscape and produce effective learning curricula and documentation to train large numbers of European data analysts and business developers, capable of (co)operating across national borders on the basis of a common vision and methodology
– Expected impact: Availability of deployable educational material for data scientists and data workers and thousands of European data professionals trained in state-of-the-art data analytics technologies and capable of (co)operating in cross-border, cross-lingual and cross-sector European data supply chains.
• Call on “Training and educating Data Scientists”
• More detailed linkages in Horizon 2020?
19
Session 2B
Input to the Roadmap: The actions
• Ideas for actions (which term?):
– Identify existing skills in the ESS
– Recruit Data Scientists with the missing skills
– Establish a network of providers of Big Data skills within the ESS
– Map the offer of Data Science training programmes in the private sector and their applicability to OS
– Establish a repository of assessed training materials
– Establish agreements with private sector and academia as providers of training,…
• Who?
– NSIs, Eurostat, International organizations, private sector, Academia?
– Working Groups? Gexp (EMOS), HLG, ESTP, ???