Teaching Business Statistics through Problem Solving
David M. Levine, Baruch College, CUNY with
David F. Stephan, Two Bridges Instructional Technology
CONTACT: [email protected]
Typical student perception of the introductory business statistics course
It’s a math course
I’ll never use anything from this course in my other courses and after I graduate
This is a required course that somehow, some way, I will have to get through and complete
Combating these misperceptions leads to the following course goals
Show relevance of statistics by providing examples drawn from the functional areas of business that students study
Emphasize interpretation of statistical results over mathematical computation
Give students plenty of practice in learning how to apply statistics to business
Illustrate for students how to use statistical software to assist business decision making
Link course content to current trends in business
Show relevance of statistics by providing examples drawn from the functional areas
Just as computers are used in courses beyond the “computer course,” statistics is used in courses beyond the statistics course
Each statistics topic needs to be presented in an applied context related to at least one functional area of business
Functional areas of business include accounting, finance, information systems, management, and marketing
When teaching a topic, the focus should be on its application in business
Emphasize interpretation of results
Emphasize interpretation of statistical results over mathematical computation
Introductory business statistics courses should recognize the growing need to interpret statistical results that computerized processes create
This makes the interpretation of results more important than knowing how to execute the tedious hand calculations required to produce them
Interpretation includes the evaluation of the assumptions and a discussion of what should be done if the assumptions are violated
Give students plenty of practice in learning how to apply statistics to business
Both classroom examples and homework exercises should involve actual or realistic data as much as possible
Students should work with both small and large data sets
Students should be encouraged to look beyond the statistical analysis of data to the interpretation of results in a managerial context
Clear and reusable instructions should be provided for using statistical software
Illustrate for students how to use statistical software to assist business decision making
Introductory business statistics courses should recognize that computers in business typically contain programs with statistical functions
Integrating statistical software into all aspects of an introductory statistics course allows the course to focus on interpretation of results instead of computations
Clear and reusable instructions should be provided for using statistical software
Instructions should explain clearly how to use a program such as Microsoft Excel in the study of statistics
Instructions should provide sufficient step-by-step detail, including program elements such as dialog boxes, to enable students to reuse the instructions for other problems and examples
Using templates, project files, and/or macros adds reusability and lessens the burden of learning the software
Special issues
What to do during the first day of class
Dealing with students’ negative affect
Take note of current trends that require knowledge of statistics
First Day of Class
First impressions are critically important in everything you do in life
The first day is the most important class of the semester
You need to set the tone to create a new impression that the course will be important to students’ business education
Deming’s Eighth Point: Drive Out Fear
“Statistics is not Sadistics”
Make the point that this course is not a math course
State that students will be learning analytical skills for making business decisions
Explain that the focus will be on how statistics can be used in the functional areas of business
Reading, Writing, and Statistics (not Arithmetic)
“I keep saying that the sexy job in the next ten years will be statistician.”
—Hal Varian, Chief Economist, Google,
as quoted in The New York Times, August 6, 2009
Current trend example: Analytics
Analytics can help answer these questions
What happened in the past and how and why it happened?
What is happening now and what is the best action to take?
What will happen and how can you obtain good predictions of what will happen?
“Analytics should be part of the competitive strategy of any organization.”
—Davenport and Harris (references 1 & 2)
How to proceed with the rest of the course
Provide a “roadmap” that helps guide students to use statistics for problem solving in business
State example problems that are stories about making decisions in a functional area of business
Fictional or real businesses?
Illustrate that statistics provides a problem-solving approach for business decision making
Part Two: Implementing course goals through the DCOVA problem-solving framework
DCOVA: five steps that serve as a blueprint for all statistical problem solving
Define the data that you want to study in order to solve a problem or meet an objective
Collect the data from appropriate sources
Organize the data by developing tables
Visualize the data by developing charts
Analyze the data to reach conclusions and present those results
Define Step
Present every problem from the perspective of the business objective for collecting data
(Compare to “Here is some data, let’s analyze it.”)
Use operational definitions to identify the variables that need to be analyzed
Determine the type (categorical or numerical) for each variable
Collect Step
Determine the source of the data
Primary source
Secondary source
Survey
Designed experiment
Prepare data
Data cleaning
Recoding
Organize Step
Determine the format for data entry
Choose software to be used for data analysis (potentially could involve several different types of software)
Organize can be done in conjunction with the Visualize and Analyze steps
Visualize Step
Construct charts and special displays
Explore the charts to discover patterns and relationships
Evaluate the charts to determine the validity of the methods used in the Analyze step
Analyze Step
Determine which method(s) should be used to analyze the data
A roadmap can help make this determination
Summarize the results
Present the results in a report
Example: Teaching Simple Linear Regression
Introduce topic with a story-based business problem
Execute the DCOVA framework
Reflect and state solution to business problem and propose further action
The Story: Knowing Customers at Sunflowers Apparel
Having survived recent economic slowdowns that have diminished their competitors, Sunflowers Apparel, a chain of upscale fashion stores for women, is in the midst of a companywide review that includes researching the factors that make their stores successful. Until recently, Sunflowers managers had no data analyses to support store location decisions, relying instead on subjective factors, such as the availability of an inexpensive lease or the perception that a particular location seemed ideal for one of their stores.
As the new director of planning, you have already consulted with marketing data firms that specialize in using business analytics to identify and classify groups of consumers. Based on such preliminary analyses, you have already tentatively discovered that the profile of Sunflowers shoppers may not only be the upper middle class long suspected of being the chain’s clientele, but may also include younger, aspirational families with young children, and, most surprising, urban hipsters who set trends and are mostly single.
You seek to develop a systematic approach that will lead to making better decisions during the site-selection process. As a starting point, you have asked one marketing data firm to collect and organize data for the number of people in the identified categories who live within a fixed radius of each Sunflowers store. You believe that greater numbers of profiled customers contribute to store sales, and you want to explore the possible use of this relationship in the decision-making process. How can you use statistics so that you can forecast the annual sales of a proposed store based on the number of profiled customers that reside within a fixed radius of a Sunflowers store?
Key Points from the Sunflowers Apparel Story
Until recently, Sunflowers managers relied on subjective factors to support store location decisions
You have already tentatively discovered that the profile of Sunflowers shoppers may not only be the upper middle class shoppers long suspected of being the chain’s clientele, but may also include younger, aspirational families and urban hipsters
You believe that greater numbers of profiled customers living near a store contribute to store sales, and you want to explore this relationship
How can you use statistics so that you can forecast the annual sales of a proposed store based on the number of profiled customers that reside within a fixed radius of a Sunflowers store?
Define and Collect steps
Operational definitions needed for
Profiled customers (in millions)
Annual store sales (in $ millions)
Collect data from a sample of 14 stores (sampling issues already discussed in course)
Organize Step (worksheet entry)
Store   Profiled Customers (millions)   Annual Sales ($ millions)
 1      3.7                              5.7
 2      3.6                              5.9
 3      2.8                              6.7
 4      5.6                              9.5
 5      3.3                              5.4
 6      2.2                              3.5
 7      3.3                              6.2
 8      3.1                              4.7
 9      3.2                              6.1
10      3.5                              4.9
11      5.2                             10.7
12      4.6                              7.6
13      5.8                             11.8
14      3.0                              4.1
Visualize Step
Analyze Step (worksheet results)
Analyze Step
Interpret the regression coefficients
Use the regression model for prediction
Interpret the standard error of the estimate
Interpret the coefficient of determination
Explain the regression sum of squares, error sum of squares, and total sum of squares
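The quantities listed above can be reproduced from the 14-store worksheet data. The following is a minimal sketch in plain Python rather than the Excel worksheet used in the course; the variable names and the `predict_sales` helper are illustrative assumptions, not part of the original materials.

```python
# Sunflowers Apparel data from the Organize step (14 stores).
customers = [3.7, 3.6, 2.8, 5.6, 3.3, 2.2, 3.3, 3.1, 3.2, 3.5, 5.2, 4.6, 5.8, 3.0]  # millions
sales = [5.7, 5.9, 6.7, 9.5, 5.4, 3.5, 6.2, 4.7, 6.1, 4.9, 10.7, 7.6, 11.8, 4.1]   # $ millions

n = len(customers)
x_bar = sum(customers) / n
y_bar = sum(sales) / n

# Sums of squares and cross-products about the means.
ssx = sum((x - x_bar) ** 2 for x in customers)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(customers, sales))
sst = sum((y - y_bar) ** 2 for y in sales)          # total sum of squares

# Least-squares regression coefficients.
b1 = sxy / ssx              # slope
b0 = y_bar - b1 * x_bar     # Y intercept

fitted = [b0 + b1 * x for x in customers]
ssr = sum((f - y_bar) ** 2 for f in fitted)               # regression sum of squares
sse = sum((y - f) ** 2 for y, f in zip(sales, fitted))    # error sum of squares

r_squared = ssr / sst                  # coefficient of determination
std_error = (sse / (n - 2)) ** 0.5     # standard error of the estimate


def predict_sales(profiled_customers_millions):
    """Predicted annual sales ($ millions); hypothetical helper for illustration."""
    return b0 + b1 * profiled_customers_millions


print(f"slope = {b1:.4f}, intercept = {b0:.4f}")
print(f"r squared = {r_squared:.3f}, standard error = {std_error:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}, SST = {sst:.4f}")
```

The slope and coefficient of determination printed here should agree with the values quoted in the reflection statement, and SSR + SSE should equal SST. Predictions are only meaningful for values inside the observed range of profiled customers (2.2 to 5.8 million).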
Analyze Step (residual analysis)
Explain the assumptions of regression
Show residual plots when each assumption has been violated
Show residual plots when each assumption has not been violated
Show the residual plot for these data
Note the integration of the Visualize and Analyze steps
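The residuals that feed the residual plot can be computed directly from the worksheet data. This is a sketch under the assumption that a residual is the observed sales minus the sales predicted by the fitted line; the text listing stands in for the chart, which would normally be produced by the statistical software.

```python
# Sunflowers Apparel data from the Organize step (14 stores).
customers = [3.7, 3.6, 2.8, 5.6, 3.3, 2.2, 3.3, 3.1, 3.2, 3.5, 5.2, 4.6, 5.8, 3.0]  # millions
sales = [5.7, 5.9, 6.7, 9.5, 5.4, 3.5, 6.2, 4.7, 6.1, 4.9, 10.7, 7.6, 11.8, 4.1]   # $ millions

n = len(customers)
x_bar = sum(customers) / n
y_bar = sum(sales) / n

# Least-squares slope and intercept.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(customers, sales)) / sum(
    (x - x_bar) ** 2 for x in customers
)
b0 = y_bar - b1 * x_bar

# Residual = observed sales minus predicted sales.
residuals = [y - (b0 + b1 * x) for x, y in zip(customers, sales)]

# List residuals in order of X, which is the order a residual plot displays them.
for x, e in sorted(zip(customers, residuals)):
    print(f"profiled customers = {x:.1f}   residual = {e:+.3f}")
```

Two properties hold for any least-squares fit and are useful classroom checks: the residuals sum to zero, and they are uncorrelated with X. What students should look for in the plot is a pattern (curvature or a fan shape), not these sums.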
Analyze Step (residual plot)
Analyze Step (inferences)
t test for the slope
Confidence interval for a mean value
Prediction interval for an individual value
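The three inferences above can be sketched with the same worksheet data. Several items here are assumptions made for illustration: the 95% confidence level, the critical value 2.1788 (taken from a t table for 12 degrees of freedom), and the site with 4.0 million profiled customers, which is hypothetical and not part of the original example.

```python
# Sunflowers Apparel data from the Organize step (14 stores).
customers = [3.7, 3.6, 2.8, 5.6, 3.3, 2.2, 3.3, 3.1, 3.2, 3.5, 5.2, 4.6, 5.8, 3.0]  # millions
sales = [5.7, 5.9, 6.7, 9.5, 5.4, 3.5, 6.2, 4.7, 6.1, 4.9, 10.7, 7.6, 11.8, 4.1]   # $ millions

n = len(customers)
x_bar = sum(customers) / n
y_bar = sum(sales) / n
ssx = sum((x - x_bar) ** 2 for x in customers)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(customers, sales)) / ssx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(customers, sales))
s_yx = (sse / (n - 2)) ** 0.5          # standard error of the estimate

# t test for the slope (H0: slope = 0).
s_b1 = s_yx / ssx ** 0.5               # standard error of the slope
t_stat = b1 / s_b1
T_CRIT = 2.1788                        # assumed: two-tail 0.05 value, 12 df, from a t table
reject_h0 = abs(t_stat) > T_CRIT

# Intervals at a hypothetical site with 4.0 million profiled customers.
x_new = 4.0
y_hat = b0 + b1 * x_new
h = 1 / n + (x_new - x_bar) ** 2 / ssx

ci_half = T_CRIT * s_yx * h ** 0.5          # confidence interval for the mean response
pi_half = T_CRIT * s_yx * (1 + h) ** 0.5    # prediction interval for an individual store
ci = (y_hat - ci_half, y_hat + ci_half)
pi = (y_hat - pi_half, y_hat + pi_half)

print(f"t statistic = {t_stat:.3f}, reject H0: {reject_h0}")
print(f"95% confidence interval for mean sales: {ci[0]:.3f} to {ci[1]:.3f}")
print(f"95% prediction interval for one store:  {pi[0]:.3f} to {pi[1]:.3f}")
```

The prediction interval is always wider than the confidence interval at the same X, because it covers the variation of an individual store around the mean as well as the uncertainty in the estimated mean.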
Reflection and solution statement
To make more objective decisions, you used the DCOVA approach to identify and classify groups of consumers and develop a regression model to analyze the relationship between the number of profiled customers that live within a fixed radius of a Sunflowers store and the annual sales of the store.
The model indicated that about 84.8% of the variation in sales was explained by the number of profiled customers that live within a fixed radius of a Sunflowers store.
Furthermore, for each increase of one million profiled customers, mean annual sales were estimated to increase by $2.0742 million. You can now use your model to help make better decisions when selecting new sites for stores as well as to forecast sales for existing stores.
Additional thoughts about the Introductory Business Statistics Course
Additional thoughts
Course structure issue
Course variations
Typical content
Introduction
Tables & Charts
Descriptive Statistics
Probability
Discrete Probability Distributions
Normal Distribution
Sampling Distributions
Confidence Intervals
Hypothesis Testing
p-Values
Regression
Quality Management
Use of templates
Course structure issue
One semester versus two semesters
Undergraduate versus graduate (MBA)
Course variations
A one-semester undergraduate course can cover only a limited number of topics
A two-semester undergraduate course can cover more tests, including some ANOVA and a good deal of multiple regression
An introductory MBA course can cover more regression than a one-semester undergraduate course
Specialized MBA courses can focus on multiple regression and time series
Typical content
Overview/orientation
Tables and Charts/Descriptive Statistics
Probability and Probability Distributions
Confidence Intervals and Hypothesis Testing
Regression
Introduction
Explain that by using software such as Excel or Minitab, the focus is on analyzing the results, not on doing the computations
Ask the class to tell you whether certain variables are categorical or numerical
Collect data from students that requires them to measure something, such as the time it takes them to get ready in the morning
Tables & Charts
Use the student-generated data for the classroom example
Focus on the differences between alternative graphs and the circumstances in which each is better
Mention misuse of graphs
Descriptive Statistics
Take a small sample of student-generated data and use it for the classroom example
Teach the mean, median, and mode without showing equations first
When you get to variation, build up to the variance and standard deviation slowly by explaining that you need a measure of variation that will be 0 when there is no variation, small when there is some variation, and large when there is a great deal of variation
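The build-up to the variance can be demonstrated with three tiny samples. This is a sketch only: the "minutes to get ready" values below are made up for illustration, and `sample_variance` is a hand-written helper so students can see each step before the software does it for them.

```python
from statistics import variance  # library version, used only as a cross-check


def sample_variance(values):
    """Sum of squared differences from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


# Hypothetical times to get ready in the morning (minutes), all with mean 30.
no_variation = [30, 30, 30, 30, 30]
some_variation = [28, 30, 30, 30, 32]
great_variation = [10, 20, 30, 40, 50]

for label, data in [
    ("no variation", no_variation),
    ("some variation", some_variation),
    ("great variation", great_variation),
]:
    var = sample_variance(data)
    print(f"{label:16s} variance = {var:7.2f}   standard deviation = {var ** 0.5:6.2f}")
```

All three samples have the same mean, so the mean alone cannot distinguish them; the variance is 0 for the first, small for the second, and large for the third.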
Probability
Don’t use Venn diagrams; they are confusing to students. Use contingency tables instead
Minimize coverage of probability, especially in a one-semester course. This is a statistics course, not a math course
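A contingency table lets students read simple, joint, and conditional probabilities directly from counts. The sketch below uses made-up counts for 100 students cross-classified by major and preferred class format; the categories, counts, and function names are all hypothetical.

```python
# Hypothetical contingency table: counts of students by (major, class format).
table = {
    ("Accounting", "Online"): 30,
    ("Accounting", "Classroom"): 20,
    ("Marketing", "Online"): 10,
    ("Marketing", "Classroom"): 40,
}
total = sum(table.values())


def p_major(major):
    """Simple (marginal) probability of a major: row total / grand total."""
    return sum(c for (m, _), c in table.items() if m == major) / total


def p_format(fmt):
    """Simple (marginal) probability of a class format: column total / grand total."""
    return sum(c for (_, f), c in table.items() if f == fmt) / total


def p_joint(major, fmt):
    """Joint probability: cell count / grand total."""
    return table[(major, fmt)] / total


def p_format_given_major(fmt, major):
    """Conditional probability: cell count / row total."""
    return p_joint(major, fmt) / p_major(major)


# General addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_accounting_or_online = (
    p_major("Accounting") + p_format("Online") - p_joint("Accounting", "Online")
)

print(p_major("Accounting"), p_format("Online"), p_joint("Accounting", "Online"))
print(p_accounting_or_online, p_format_given_major("Online", "Accounting"))
```

Every probability here is a ratio of counts that students can point to in the table, which is the advantage over a Venn diagram.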
Discrete Probability Distributions
Do you really need to explicitly cover the binomial, Poisson, and/or hypergeometric distributions, especially in a one-semester course?
Can you teach confidence intervals and hypothesis testing without covering these? Yes!
Normal Distribution
Don’t show the equation for the normal distribution. It will only intimidate some students and make students think that somehow they need to know it
Work through a classroom example in which you show all the possible variations of finding areas under the curve
Expect that the most difficult example is trying to find the unknown X given an area
Use a picture of the normal table to show that you are doing the inverse of what you did previously
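The area variations and the inverse lookup can be shown with software in place of the printed table. This sketch uses `NormalDist` from Python's standard `statistics` module; the mean of 50 and standard deviation of 10 are hypothetical values chosen only to keep the arithmetic easy.

```python
from statistics import NormalDist

# Hypothetical normally distributed variable X with mean 50 and standard deviation 10.
x = NormalDist(mu=50, sigma=10)

# Variations of finding an area under the curve.
p_below_60 = x.cdf(60)                     # P(X < 60)
p_above_60 = 1 - x.cdf(60)                 # P(X > 60)
p_between_40_and_60 = x.cdf(60) - x.cdf(40)  # P(40 < X < 60)

# The inverse problem: find the X value that has 90% of the area below it.
x_at_90_percent = x.inv_cdf(0.90)

print(f"P(X < 60)      = {p_below_60:.4f}")
print(f"P(X > 60)      = {p_above_60:.4f}")
print(f"P(40 < X < 60) = {p_between_40_and_60:.4f}")
print(f"X with 90% of the area below it = {x_at_90_percent:.2f}")
```

`cdf` goes from an X value to an area and `inv_cdf` goes from an area back to an X value, which mirrors reading the normal table in one direction and then the other.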
Sampling Distributions
Probably the most difficult concept for students to learn
Try using a small population and then select all the samples from that population so that they can see that the distribution of the sample mean is different from the distribution of the population
Then, present the central limit theorem and show what happens when the sample size is increased with different populations
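Enumerating every possible sample from a small population takes only a few lines. In this sketch the population of four values and the sample size of 2 (drawn without replacement) are hypothetical choices that keep the full list of samples short enough to show on one slide.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical population of N = 4 values.
population = [3, 2, 1, 4]
sample_size = 2

# Every possible sample of size 2 selected without replacement (6 samples).
all_samples = list(combinations(population, sample_size))
sample_means = [mean(s) for s in all_samples]

for s, m in zip(all_samples, sample_means):
    print(f"sample {s}  mean = {m}")

population_mean = mean(population)
mean_of_sample_means = mean(sample_means)

# Spread of the population versus spread of the sample means.
population_sd = pstdev(population)
sd_of_sample_means = pstdev(sample_means)

print(f"population mean = {population_mean}, mean of sample means = {mean_of_sample_means}")
print(f"population sd = {population_sd:.4f}, sd of sample means = {sd_of_sample_means:.4f}")
```

Students can see that the sample means center on the population mean but vary less than the individual population values, which sets up the central limit theorem.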
Confidence Intervals
The most important points to get across are that you can never be certain that your confidence interval is correct and that if you took a different sample you would get a different confidence interval
Review the difference between categorical and numerical variables and point out that there are different equations for different types of variables. This will set the stage for using roadmaps in hypothesis testing
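Both points about confidence intervals can be demonstrated by simulation. The sketch below is built on assumptions chosen for illustration: a normal population with mean 50 and known standard deviation 10, samples of size 25, a 95% interval based on Z = 1.96, and a fixed random seed so the run is repeatable.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the classroom demonstration is repeatable

MU, SIGMA = 50.0, 10.0   # hypothetical population mean and (known) standard deviation
N = 25                   # sample size
Z = 1.96                 # critical value for 95% confidence
REPETITIONS = 1000


def confidence_interval(sample):
    """95% confidence interval for the mean when sigma is known."""
    half_width = Z * SIGMA / len(sample) ** 0.5
    center = mean(sample)
    return center - half_width, center + half_width


# Draw many samples from the same population and build an interval from each.
intervals = [
    confidence_interval([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(REPETITIONS)
]

# Fraction of intervals that actually contain the true mean.
coverage = sum(low <= MU <= high for low, high in intervals) / REPETITIONS

for low, high in intervals[:5]:
    print(f"{low:.2f} to {high:.2f}   contains true mean: {low <= MU <= high}")
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

Each sample produces a different interval, and only about 95% of them contain the true mean, so any single interval may be one of the misses.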
Hypothesis Testing
Focus on the fact that the alternative hypothesis H1 never has an equals sign: it is always <, >, or ≠
Give a practical example to show the difference between Type I and Type II errors, such as deciding whether to market a product or whether to take a drug
Beware of trying to “cover” too many different hypothesis tests: students won’t see the forest for the trees
Use a roadmap that presents a series of questions leading to the correct test procedure
p-Values
Students have a more difficult time with this concept than we expect
Use a hypothesis test that involves the normal distribution (such as a Z test for a mean or a proportion) to demonstrate the p-value
Use the mantra “If the p-value is low, H0 must go” to help students remember that a low p-value, not a high p-value, is significant
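A Z test for a mean makes the p-value concrete. Every number in this sketch is hypothetical: a claimed mean of 368, a known standard deviation of 15, a sample of 25 with mean 372.5, and a 0.05 level of significance.

```python
from statistics import NormalDist

# Hypothetical two-tail Z test for a mean: H0: mu = 368 versus H1: mu != 368.
MU_0 = 368.0          # hypothesized mean
SIGMA = 15.0          # known population standard deviation
n = 25                # sample size
sample_mean = 372.5   # observed sample mean
ALPHA = 0.05          # level of significance

# Test statistic: how many standard errors the sample mean is from MU_0.
standard_error = SIGMA / n ** 0.5
z_stat = (sample_mean - MU_0) / standard_error

# p-value: probability of a Z at least this extreme, in either tail, if H0 is true.
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

# "If the p-value is low, H0 must go."
reject_h0 = p_value < ALPHA

print(f"Z = {z_stat:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Here the p-value is well above 0.05, so H0 stays; rerunning with a sample mean farther from 368 shows the p-value dropping until H0 must go.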
Regression
Begin with a business problem of trying to predict the value of a variable of interest. Then ask what other variables might be useful in helping to predict the value of the variable of interest
Do this before going through any computations
Review the meaning of the Y intercept and the slope
Don’t do the proof of the least-squares method
Focus on interpreting the results of software, not on doing computations
Make sure to mention the assumptions and what happens if the assumptions are violated
Discuss residual analysis if time permits
Quality Management
Integrate control charts with management philosophy
Do the Red Bead experiment if time permits, as this transmits the notion that most of the variation is due to the system, not the individual
Use of templates (stored in a “library” or generated by an add-in)
In this example, the complexity is hidden, yet fully accessible later, to the student focused on the interpretation of results to solve a problem.
Even simple linear regression can be a template!
Time does not permit discussion of other topics!
Thanks for your interest and attention!
David Levine, with David Stephan
References
1. Davenport, T. H., and J. G. Harris. Competing on Analytics: The New Science of Winning. Boston, MA: Harvard Business School Press, 2006.
2. Davenport, T. H., J. G. Harris, and R. Morison. Analytics at Work: Smarter Decisions, Better Results. Boston, MA: Harvard Business Press, 2010.
3. Davenport, T. H., and D. J. Patil. “Data Scientist: The Sexiest Job of the 21st Century.” Harvard Business Review, October 2012: 70–76.
4. Levine, D. M., and D. F. Stephan. “Teaching Introductory Business Statistics Using the DCOVA Framework.” Decision Sciences Journal of Innovative Education, Vol. 9, September 2011: 393–397.
5. Levine, D. M., D. F. Stephan, and K. A. Szabat. Statistics for Managers Using Microsoft Excel, 7th ed. Upper Saddle River, NJ: Pearson Education, 2013.