An interdisciplinary model for
analytics education
Raffaella Settimi, PhD
School of Computing
NPSMA Workshop, May 28-29, 2013
Drew Conway’s Data Science Venn Diagram
Who is a data scientist?
…and let’s not forget…
a subject-domain expert
Curious and inquisitive
A computer scientist
A statistician
A data miner
Creative and with strong communication skills
Learning outcomes of an Analytics MS curriculum
Database Processing
Programming/Scripting
Algorithms and data modeling
Visualization and communication
Applications
Hands-on experience
Domain-specific competence
Core subjects
Database Processing
• SQL queries
• DB programming
• NoSQL DBs (e.g. Hadoop)
Data Mining
• Data cleaning, integration, and governance
• Association rules
• Basic statistics & data visualization
Statistical Analysis
• Multivariate statistics
• Time series analysis
Machine Learning
• Classification techniques
• Clustering methods
• Supervised and unsupervised learning
Tools and Platforms
Programming
• Python, Java
Data storage and integration
• Relational databases (MySQL, Oracle, SQL Server)
• Hadoop, NoSQL, MongoDB
Modeling and analysis
• R
• SPSS & SPSS Modeler
• SAS & SAS Enterprise Miner
• Matlab
• Weka
• pandas (Python Data Analysis Library)
Visualization
• Tableau
• MapPoint
• ArcView, etc.
Students should master a core set of tools and platforms for
- Data storage and integration
- Modeling and analysis
- Data visualization and reporting
Both open source and commercial software
DePaul’s MS degree in Predictive Analytics
Originally a specialization in Machine Learning of our MS in Computer Science.
Created in 2010 to address the increasing demand for graduates with deep technical and analytics skills to meet the challenge of mining Big Data.
[Chart: Enrollment by academic year, AY 2010/11 to AY 2012/13]
From a 2012 survey of our students
Students are most interested in courses around working with data (including “big data”) and data analysis,
as well as gaining additional experience in programming and marketing.
*Created on Wordle.net. Size is relative to the
overall number of mentions/responses; position does not matter. Top 50 words/mentions shown.
Current Positions held by our students
[Pie chart with slices of 50%, 20%, and 30%. Categories: Analytics Positions, Not Related Position, Not working full time]
Breakdown of industries among students with analytics positions
[Chart categories: Banking, Consulting, Education, Food/Beverage, Insurance, IT/Technology, Marketing/Advertising, Not for Profit, Health Care, N/A]
DePaul’s MS in Predictive Analytics curriculum
Common core
Computational Methods concentration (Fall 2011)
Marketing concentration (Fall 2010)
Hospitality concentration (Fall 2013)
Health Care concentration (Winter 2014)
Prerequisite knowledge: Intro to Statistics, Python, Calculus & Linear Algebra (can be taken before MS)
Practicum Course
Links to Curriculum
Course home page:
http://www.cdm.depaul.edu/academics/Pages/MS-in-Predictive-Analytics.aspx
Concentrations:
– Computational methods: view requirements
– Marketing: view requirements
– Hospitality: view requirements
– Health Care: available in winter 2014
Common Core
Teaches the fundamental tools and techniques for Data Science.
• Database processing (SQL queries, relational databases, NoSQL DBs, data management and integration)
• Statistical modeling (regression analysis, multivariate statistics)
• Data mining and machine learning (data cleaning, association rules, clustering, classification techniques, etc.)
• Application of analytics in social networks, web data mining, text mining
Applications
Social Networks
• Analysis of network structure, data retrieval from networks, text analysis
Web analytics
• User behavior modeling, e-metrics for business intelligence, web personalization, recommender systems, privacy and ethical issues
Text mining
• Information retrieval models, document clustering, taxonomies, sentiment analysis
Additional electives
• Image analysis: image representation, segmentation, pattern
recognition
• Monte Carlo techniques
• Visualization techniques and design principles
• Data stream analysis
• ETL, data warehousing and business intelligence tools (dashboards, reporting, etc.)
Computational Methods concentration
view requirements
Created in response to demand from students who wanted to develop the strong technical skills required for Big Data analytics.
Courses in
– Mining Big Data
– Programming analytics applications in Python
– Advanced data mining techniques (matrix factorization, probabilistic networks, etc.)
– Machine Learning algorithms
Students learn how to apply advanced data mining and database processing techniques for the analysis and management of extremely large datasets.
Marketing concentration jointly with
the Marketing Department
view requirements
Every day the amount of data available to businesses increases, and more information is available about markets, products, competitors and customers. Companies gain a competitive advantage by using analytics to uncover insights about their markets and make smarter decisions.
Courses in
– Customer Relationship Management
– Marketing analytics
– Internet marketing
– Customer service and analysis
Students learn how to
– Apply analytics to mine marketing data
– Extract information from data to support business and marketing decisions
Hospitality concentration jointly with
the School of Hospitality Leadership
view requirements
Organizations in the tourism industry (hotels, restaurants, travel) have access to an abundance of data, both internal and from third parties, available through social media channels such as TripAdvisor and Yelp.
Students learn how to
– Apply analytics to mine hospitality data, incorporating revenue management principles and optimization techniques
– Assess hospitality global distribution system analytics and predict impacts on service-firm financial performance
– Identify revenue management principles and optimization models unique to the various service sectors within the hospitality industry
Health Care concentration jointly with
the Marketing dept. and Health Sector Mgmt program
The recent changes in healthcare have led to a paradigm shift in the healthcare industry and an increasing need to use data to predict trends in illness, disease, injury, utilization, and costs.
Students learn how to
• Apply analytics to mine health care data such as
– Patient experience/satisfaction/outcomes
– Claim management and cost reduction
– Predictive modeling of care, costs and utilization
– Pharmacy data
• Develop evidence-based business models to improve health care strategies, such as patient experience, clinical processes, and resource allocation.
In 2010, we created an interdisciplinary academic center to bring together expertise of faculty from different schools and programs at DePaul University:
– Computing
– Marketing
– Hospitality leadership (added in 2012)
– Health sector management (added in 2013)
Aimed to be a “center without walls” facilitating:
– Faculty and students’ research across disciplines
– State-of-the-art curriculum for preparing a new generation of specialists in data mining and predictive analytics
– Faculty and students’ collaborations with industries
– Students/Alumni matching application with employers’ needs
– Networking events
Provide students with real world
experience: Think outside the classroom
• Data science cannot be learnt just by sitting in a
classroom and listening to lectures
• Students should
– Use real data in courses
– Work on large-scale projects
– Gain experience through internships or industry-sponsored projects
– Have access to a variety of platforms and tools
– Network with analytics professionals
Challenge: Access to real data
It can be hard to get real data from companies, because data often contain sensitive information about the company or its customers.
Internships are easier to set up, as data remain at the company site.
Industry-sponsored projects are a win-win opportunity for companies that can take advantage of a team of students and the expertise of a faculty member supervising the project, at no or relatively low cost.
DaMPA Industry Partnerships
[Diagram: partnership areas across Education and Research: Software, Data, Innovation Board, Advisory Board]
• Companies provide datasets to be used for class projects or student research projects.
• Companies recruit students, serve as industry advisors, and guide the Center on curriculum development and long-term planning.
• Companies provide software or training material to be used for teaching or research.
• Companies partner with the Center to translate new science into novel technologies and to address unmet industry needs.
Examples of projects
Medical Informatics
NSF REU program in medical informatics (joint with University of Chicago)
Computer-aided detection, diagnosis, and characterization for lung nodules (joint with University of Chicago)
Prediction of chronic fatigue syndrome (joint with DePaul Psychology Department)
Tracking illness from Tweets
Analysis of legionellosis occurrence (data from Chicago Public Health Office)
Web Data Mining, Web Personalization, and Recommender Systems
Ontology-based user modeling for web personalization and recommendation
Recommender Systems for the Social Web
Trustworthy and Secure Recommender Systems for the Web
Urban studies
Motor vehicle theft analysis (data from Chicago Police Dept.)
A data-driven typology of urban communities in Cook County (joint with Institute of Housing Studies)
Hospitality Projects
Food and Beverage Analytics and Optimization Modeling