Preliminary Syllabus for the course of Data Science for
Miguel Godinho de Matos (1, 2)
and Pedro Ferreira (2, 3)
(1) Católica-Lisbon School of Business and Economics
(2) Heinz College, Carnegie Mellon University
(3) Department of Engineering and Public Policy, Carnegie Mellon University
1 Course Overview
Firms create massive amounts of data as by-products of their activity. The volume and speed with which such data are created make it increasingly necessary for managers to leverage intelligent systems capable of processing large volumes of information in real time to improve decision making.
In this course we will study how business experimentation and data analysis technologies can be used to improve business knowledge and decision making. We will learn the fundamental principles and techniques of predictive modeling, data analysis, and causal inference, and we will examine real-world examples and cases in which these tools have been applied.
We will work hands-on with state-of-the-art data analysis software. After taking this course, students should be able to:
- Demonstrate hands-on experience with data analytics.
- Think systematically about how and when data can improve decision making in contexts such as management and marketing investments.
- Understand and discuss topics of data analysis for business intelligence; in particular, know the basic principles and algorithms of data mining well enough to interact with data analytics professionals.
- Design simple experiments to improve business knowledge and decision making.
2 Course Participation Rules
Lectures will cover examples of the fundamental principles and uses of data analytics and data mining. This is not a data mining algorithms course, but we will discuss the mechanics of how these methods work.
Class meetings will be a combination of lectures on fundamental material, case discussions, and student exercises.
Reading assignments will cover the core material, and students are expected to come prepared for class discussions.
Students should attend every class session. Failure to do so will directly affect the class grade.
I will check my email at least once a day during the week (Monday through Friday). Please include the special tag [ 2014 - Business Analytics ] in the subject line of your email. I use this tag to make sure I process class email first; if you do not include it, I may not read your email for a long time.
3 Course Readings
The mandatory textbook for the class is Data Science for Business: Fundamental principles of data mining and data analytic thinking, Provost and Fawcett (2013). We will complement the book with discussions of applications, cases, and demonstrations. Whenever relevant, we will hand out lecture notes.
You are expected to ask questions about any material in the notes that remains unclear after the corresponding class and after reading the book.
Depending on the direction our class discussion takes, we may not cover all the material initially planned for a particular session. If the notes and the book are not adequate to explain a topic that we skip, please ask about it by email. I will be happy to follow up and provide you with additional references.
4 Grading
The grade breakdown is as follows:
- Participation: 10%
- Homework: 40%
- Final Exam: 50%
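As a worked illustration of how the three weights combine, here is a minimal Python sketch (Python is used only for illustration; the course itself uses R). It assumes that all component scores are on a common scale (for example 0 to 100) and that homework assignments count equally toward the 40% homework component; neither assumption is stated in this syllabus, and the function name `final_grade` is hypothetical.

```python
# Hypothetical sketch of the weighted course grade.
# Assumptions (not stated in the syllabus): every score uses the same
# scale (e.g. 0-100), and homework scores are averaged with equal weight.
from statistics import mean

# Component weights in percent, from the grade breakdown.
WEIGHTS = {"participation": 10, "homework": 40, "final_exam": 50}


def final_grade(participation: float, homework_scores: list, final_exam: float) -> float:
    """Return the weighted course grade on the same scale as the inputs."""
    if not homework_scores:
        raise ValueError("at least one homework score is required")
    components = {
        "participation": participation,
        "homework": mean(homework_scores),  # assumed: equal weight per homework
        "final_exam": final_exam,
    }
    total_weight = sum(WEIGHTS.values())  # 100
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS) / total_weight


print(final_grade(80, [85, 95, 90, 90], 70))  # 0.1*80 + 0.4*90 + 0.5*70 = 79.0
```

Keeping the weights as integer percentages and dividing once at the end avoids floating-point noise in the intermediate products.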
4.1 Participation
You are expected to attend every class session, arrive on time, remain for the entire class, and follow basic classroom etiquette, which includes switching off all electronic devices for the duration of the class (unless otherwise noted).
You are also expected to participate in class discussions and to understand the material presented in previous lectures.
4.2 Homework
Each homework assignment will comprise questions to be answered and/or hands-on tasks. Unless explicitly noted otherwise, you are expected to complete your assignments on your own.
The hands-on tasks will be based on data that we will provide. You will mine these data to gain hands-on experience in formulating problems and applying the techniques discussed in class, and you will use them to build and evaluate predictive models.
For the hands-on assignments we will use the R statistical language (http://cran.r-project.org/). We also recommend that you use the open-source version of RStudio (http://www.rstudio.com/) as your development environment.
In order to use R, you must have access to a computer where you can install software. If you do not have such a computer, please see me immediately so we can make alternative arrangements.
You should bring your computer to class. We will help you install and configure the software in the first class.
4.3 Final Exam
The material covered and the exact date of the exam will be discussed in class.
5 Class Contents
1. Introduction to data mining and business analytics
   (a) Data Analytics Thinking
   (b) From Big Data 1.0 to Big Data 2.0
   (c) From Business Problems to Data Mining
   (d) Supervised vs. Unsupervised Data Analysis
   (e) The Process of Data Mining
2. Introduction to predictive modeling
   (a) Finding informative attributes
   (b) Tree induction
   (c) Probability estimation
3. Model fit and model overfit
   (a) Finding "optimal" model parameters based on data
   (b) Choosing the goal for data mining
   (c) Objective functions
   (d) Loss functions
   (f) Fitting and overfitting
   (g) Complexity control
4. Model quality and performance evaluation
   (a) Evaluating classifiers
   (b) Expected value as key evaluation framework
   (c) Visualizing model performance (ROC, Lift curve, Cumulative response, Profit curve)
5. Introduction to the paradigm of causal inference
   (a) Limits of data mining
   (b) Correlation versus causation
   (c) Treatment, control, outcomes and randomized experiments
   (d) Power and sample size
6. Randomized experiments in the wild
   (a) Several case discussions (Microsoft, Google, Bing, Facebook, our own work, etc.)
6 Class Schedule
Session  Instructor  Topics                                              Readings  Deliverables
1        MGM         Introduction to data mining and business analytics  Chp 1, 2  Info Sheet (in class)
2        MGM         Introduction to predictive modeling                 Chp 3     Homework 1 due
3        MGM         Model fit and model overfit                         Chp 4, 5  Homework 2 due
4        MGM         Model quality and performance evaluation            Chp 7, 8  Homework 3 due
5        PF          Introduction to the paradigm of causal inference    Notes     Homework 4 due
6        PF          Randomized experiments in the wild
7 Instructor Bios
Miguel Godinho de Matos (MGM) is a visiting assistant professor of Information Systems and Management at Católica Lisbon School of Business and Economics. He is also a visiting scholar at the Heinz College, Carnegie Mellon University. He received a Ph.D. in Telecommunications Policy and Management and an M.Sc. in Engineering and Public Policy from Carnegie Mellon University. Miguel's research focuses on the analysis of social networks and peer influence on consumer behavior, and on the impact of digitization on consumer search and choice. Miguel has published his work in top journals and top peer-reviewed research conferences such as Management Information Systems Quarterly, the International Conference on Information Systems, the IEEE Conference on Social Computing, and the Economics of Digitization Seminar Series of the National Bureau of Economic Research.
Pedro Ferreira (PF) is an assistant professor of Information Systems and Management at the Heinz College, Carnegie Mellon University. He received a Ph.D. in Telecommunications Policy from CMU and an M.Sc. in Electrical Engineering and Computer Science from MIT. Pedro's research interests lie in two major domains: identifying causal effects in dense network settings, with direct application to understanding the future of the digital media industry, and the evolving role of technology in the economics of education. He is currently working on a series of large-scale randomized experiments in network settings aimed at identifying the role of peer influence in the consumption of media. Pedro has published in top journals and top peer-reviewed research conferences such as Management Science, Management Information Systems Quarterly, and the IEEE Conference on Social Computing.
8 Office Hours
Miguel Godinho de Matos' office hours will be announced in the first lecture of the course. Pedro Ferreira will be on campus only for the last sessions of the course and will not hold office hours; he will be available to meet by appointment during his stay at Católica Lisbon School of Business and Economics. Details will be provided in class.