The Regal Beverage Company makes the soft drink Grapefruit Bizarre. The marketing department wants to refocus its energies and resources.
You have been asked to determine if there are regional differences in consumers' response to advertisements for Grapefruit Bizarre. Specifically, you must find out if the Midwest responds to Grapefruit Bizarre advertisements as well as the West Coast.
The marketing department is clamoring to start a second campaign. It claims that ads that are effective on the West Coast do not go over as well in the Midwest. Management demands statistical evidence at a significance level of 0.05.
In the context of a free movie screening, an ad for Grapefruit Bizarre is shown to 173 Midwesterners. The viewers had been randomly selected, and had not previously tasted the drink.
When asked later, 33% claimed that they were at least mildly interested in trying Grapefruit Bizarre. In a similar survey conducted on the West Coast, 42% of 152 test subjects claimed at least a mild interest in trying Grapefruit Bizarre.
You calculate the z-value for the difference in sample proportions.
Enter your z-value as a decimal number with 2 digits to the right of the decimal (e.g., enter "5" as "5.00"). Round if necessary.
a. 1.67
b. 1.68
c. 1.69
d. 1.7
e. -1.67
f. -1.68
g. -1.69
h. -1.7
z-table
Utility for Two Populations
The z-value is +1.68 or -1.68, depending on how you set up the difference, i.e., in what order you subtract the sample proportions.
A z-value of -1.68 corresponds to a left-tail probability of 0.0465. What do you report to the marketing department?
This is not the best answer. If the p-value is larger than the significance level, the null hypothesis should not be rejected.
b. The p-value is 0.0930 and the data are inconclusive: the difference between the sample proportions may be due to chance.
This is not the best answer. You may have conducted a two-sided test. Since marketing wants to know if Midwesterners respond less well to the ads than the West Coast, you should conduct a one-sided test.
c. The p-value is 0.0465 and the data are inconclusive: the difference between the sample proportions may be due to chance.
This is not the best answer. The p-value is 0.0465, which is less than the significance level 0.05.
d. The p-value is 0.0465 and the data indicate that Midwesterners respond less well to the ads.
This is the best answer. The p-value is 0.0465, which is less than the significance level 0.05.
You are conducting a one-sided hypothesis test. The alternative hypothesis states that the proportion of the Midwest sample is less than the proportion of the West Coast sample. Therefore, you are interested only in the left-tail probability. Your p-value is 0.0465.
0.0465 is less than the significance level 0.05. You should reject the null hypothesis. Midwesterners do not respond to the ads as well as people from the West Coast. Marketing's claims are valid.
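The calculation behind these numbers can be sketched in a few lines of Python. This is only an illustration: the function names are our own, and we assume the unpooled standard error for the difference in proportions, which reproduces the z-value of -1.68 quoted above.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """z-value for p1 - p2, using the unpooled standard error (an assumption)."""
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / standard_error


# Midwest minus West Coast, so a negative z supports the alternative hypothesis.
z = two_proportion_z(0.33, 173, 0.42, 152)
p_value = normal_cdf(z)  # left-tail probability for the one-sided test

print(f"z = {z:.2f}, left-tail p-value = {p_value:.4f}")
print("Reject the null hypothesis" if p_value < 0.05 else "Do not reject the null hypothesis")
```

Because the code uses the unrounded z-value, its p-value differs slightly from the table value of 0.0465 for z = -1.68, but it is still below 0.05.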
Challenge: LeMer Fashion Design
Upscale fashion designer Marjorie LeMer must decide from which supplier she should purchase bolts of cloth. Rumor has it that BlueTex's product is superior to Southern Halifax's.
Random 10-yard sections from 43 bolts of Halifax's cloth contain a mean of 1.8 flaws per yard. Similar sections from 42 bolts of BlueTex's product contain 1.6 flaws per yard. The standard deviations are 0.3 and 0.6, respectively.
Marjorie wants you to find out if the rumors that BlueTex makes a better product are statistically warranted.
You conduct a one-sided test. Which of the following is the best alternative hypothesis?
a. There is no difference in the quality of the cloth.
This is not the best answer. This is the null hypothesis of a two-sided test. Marjorie has asked you to test the hypothesis that BlueTex has a better product than Halifax.
b. There are fewer flaws per yard in Halifax's cloth than in BlueTex's.
This is not the best answer. This would be an appropriate alternative hypothesis if we wanted to see if Halifax produced the better cloth.
c. There are more flaws per yard in Halifax's cloth than in BlueTex's.
This is the best answer. The alternative hypothesis should state that Halifax's product is inferior to BlueTex's.
At what level are these data significant?
a. significance level 0.05
This is the correct answer. The p-value is 0.0262, which is greater than 0.01, but less than 0.05.
b. significance level 0.01
This is not the correct answer. The p-value is greater than 0.01.
c. Both at significance level 0.05 and significance level 0.01
This is not the correct answer. To be significant at both levels, a p-value must be less than both 0.01 and 0.05.
d. Neither at significance level 0.05 nor at significance level 0.01
You find the z-value for the difference between the two sample means using the appropriate formula. The z-value is -1.94.
The cumulative probability for z=-1.94 is 0.0262. This is the left-tail probability. Since you are running a one-sided test, 0.0262 is your p-value.
0.0262 is greater than 0.01. At this significance level, you would not reject the null hypothesis. 0.0262 is less than 0.05. At this significance level, you would reject the null hypothesis.
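As a rough check on these figures, the sketch below computes the z-value for the difference between the two sample means and compares the left-tail p-value with both significance levels. The function names are our own, and the standard error formula (the square root of the sum of each variance divided by its sample size) is our assumption about the "appropriate formula"; it does reproduce z = -1.94.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def two_mean_z(mean1: float, sd1: float, n1: int,
               mean2: float, sd2: float, n2: int) -> float:
    """z-value for mean1 - mean2 with independent samples."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# BlueTex minus Halifax: fewer flaws for BlueTex gives a negative z.
z = two_mean_z(1.6, 0.6, 42, 1.8, 0.3, 43)
p_value = normal_cdf(z)  # left-tail probability for the one-sided test

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
for alpha in (0.05, 0.01):
    decision = "reject" if p_value < alpha else "do not reject"
    print(f"significance level {alpha}: {decision} the null hypothesis")
```

The same p-value leads to different decisions at the two levels, which is why the significance level must be chosen before looking at the result.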
Southern Halifax's product is slightly cheaper than BlueTex's. All other factors being equal, Marjorie would like to buy the less expensive product. Unless she is 99% confident that there is a difference in quality, she will go with the cheaper cloth.
Based on this information and your calculations, Marjorie should:
a. Buy cloth from Southern Halifax Textile
This is the correct answer. The data are not significant at the 0.01 level, which corresponds to a confidence level of 99%.
b. Buy cloth from BlueTex
This is not the correct answer. Based on these data, Marjorie can't be 99% confident that BlueTex's product is superior.
The data are not significant at the 0.01 level, so you can't reject the null hypothesis that there is no difference in quality between the two suppliers' cloth.
A significance level of 0.01 corresponds to a confidence level of 99%. So at a 99% confidence level, you can't reject the null hypothesis. Marjorie can't be 99% confident that BlueTex's product is better than Halifax's.
Since Halifax's product is cheaper and you can't determine a difference in quality at the level of statistical significance Marjorie requires, you recommend Halifax to Marjorie.
"Good work!" says Alice. "You're ready for a new challenge: investigating relationships between variables."
Regression Basics Introduction
As you relax in your room during a brief afternoon downpour, your phone rings.
Leo's Bisque Debacle and the Staffing Problem
Leo just called. He wants us to come to his office immediately. He sounds a little angry. We'd better not keep him waiting.
I'm sorry if I was short on the phone. I'm very upset. We just had a little incident down in the restaurant. A server spilled a tureen of crab bisque on one of our most "favored" guests, Mr. Pitt.
The Kahana's occupancy this year has been higher than I expected, and I had to hire extra help from a staffing agency. Those staffing agencies charge a fortune, which is especially irritating considering that the employees they refer to us are often poorly suited to customer service in an up-scale hotel.
Really, this is my fault for not having a more effective staffing process. I just wish I could predict my needs better. Sometimes, when demand is lower than I expected, I'm overstaffed. Then I lose money paying idle bellhops. If I had a good sense of my staffing needs at least a month in advance, I could avoid hiring workers at the last minute and having idle staff.
I had been thinking that the number of advance reservations would give me a good idea of how high my occupancy would be a month down the road. But clearly advance reservations don't tell me the whole story. I've been making way too many false predictions.
Is there anything you can do to help me here? What predictions about occupancy can I make based on advance bookings? And how much can I trust them?
We'll take a look at the data on advance bookings and occupancy and let you know what we find out.
Introducing the Regression Line
Alice seems confident that the two of you can offer useful advice on Leo's staffing problem:
"This will be a great opportunity for you to learn regression. It's a powerful statistical tool used all the time in business: in finance, demand forecasting, market research to name just a few areas. I'm sure you'll use it in your MBA program. And it's a great chance to review what you've learned so far: sampling, confidence intervals, and hypothesis testing all play a part in regression."
As we have seen, it is often useful to examine the relationship between two variables. Using scatter diagrams, we can visualize such relationships.
We can learn more about the relationship by finding the correlation coefficient, which measures the strength of the linear relationship on a scale from -1 to 1.
Regression is a statistical tool that goes even further: it can help us understand and characterize the specific structure of the relationship between two variables.
Let's look at an example. Julius Tabin owns a small food processing company that produces the spreadable lunchmeat product EasyMeat. Julius is trying to understand the relationship between his firm's advertising and its sales.
Total sales in the spreadable meat industry have been fairly flat over the last decade, and Julius' competitors' actions have been quite stable. Julius believes that his advertising levels influence his firm's sales positively, but he doesn't have a clear understanding of what the relationship looks like.
Let's have a look at data on his firm's advertising and sales over the last 10 years. Click on the Excel link to create the scatter diagram yourself from an Excel spreadsheet.
EasyMeat Data
Plotting annual sales against annual advertising expenditures gives us a visual sense of the relationship between the two variables. Looking at the graph, we can see that as advertising has gone up, sales have generally increased. The relationship looks reasonably linear.
The correlation coefficient for the two variables is 0.93, indicating a strong linear relationship between advertising and sales.
What if we were to draw a line that characterizes this relationship? Which line would best fit the data? Our mind's eye already sees how the two variables are related, but how can we formalize our visual impression?
Before we start any calculations, let's look at several lines that could describe the relationship.
One of these lines most accurately describes the relationship between the two variables: the "best-fit" or regression line.
In our example, the best-fit line is Sales = -333,831 + 50*Advertising. For this line, the y-intercept is -333,831 and the slope is 50.
In general, a regression line can be described by a simple linear equation, y = a + bx, with y-intercept a and slope b.
In this equation, the y-variable, sales, is called the dependent variable, to suggest that we think Julius' sales depend to some degree on his advertising. The x-variable, advertising, is called the independent variable, or the explanatory variable.
When we observe that a change in the independent variable (here advertising) is typically accompanied by a proportional change in the dependent variable (here sales), regression analysis can identify and formalize that relationship.
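To make "best fit" concrete, here is a minimal sketch of how a line y = a + bx can be fitted. It assumes the usual least-squares criterion (choose a and b to minimize the sum of squared vertical distances from the points to the line), which this section has not spelled out. The function name and the four data points are invented for illustration; they are not the EasyMeat figures.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: forces the line through the point of means.
    a = y_mean - b * x_mean
    return a, b


# TOY DATA, invented for illustration only (not the EasyMeat data).
toy_x = [1, 2, 3, 4]
toy_y = [5, 9, 10, 14]

a, b = fit_line(toy_x, toy_y)
print(f"y = {a:.2f} + {b:.2f}x")
```

In practice a spreadsheet or statistics package performs this calculation; the sketch only shows what the intercept a and slope b summarize.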
Summary
Regression analysis helps us find the mathematical relationship between two variables. We can use regression to describe a linear relationship: one that can be represented by a straight line and characterized by an equation of the form y = a + bx.
The Uses of Regression
What kinds of questions can regression analysis help answer?
How does regression help us as managers? It can help in two ways: first, it helps us forecast. For example, we can make predictions about future values of sales based on possible future values of advertising.
Second, it helps us deepen our understanding of the structure of the relationship between two variables by expressing the relationship mathematically.
Using Regression for Forecasting
Let's talk first about how managers can use regression to forecast. In our example, regression can help Julius predict his company's sales for a specified level of advertising. For example, if he plans to spend $65,000 in advertising next year, what might we expect sales to be?
If we didn't know anything about the relationship, but only had the historical data, we might simply note that the last time Julius spent $65,000 on advertising, his sales were $3,200,200. But is this the best prediction we can make?
Not at all. Regression analysis brings the entire data set to bear on our prediction. In general, this will allow us to make more accurate predictions than if we infer the future value of sales from a single observation of advertising and sales. Having identified the relationship between the two variables from the full data set, we can apply our understanding of that relationship to our forecast.
Using regression analysis, we found the regression line to be Sales = -333,831 + 50*Advertising. If Julius plans to spend $65,000 in advertising, what would we predict sales to be?
a. Around $3,200,000.
This is not the best answer.
b. Around $2,900,000.
This is the best answer.
c. Around $2,500,000.
This is not the best answer.
The point on the line shows us what level of sales to expect. In this case, we would expect sales of $2,916,169.
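The arithmetic behind this forecast is a single substitution into the regression equation. As a small sketch (the function and constant names are our own):

```python
INTERCEPT = -333_831  # y-intercept of the EasyMeat regression line
SLOPE = 50            # dollars of sales per dollar of advertising


def forecast_sales(advertising: int) -> int:
    """Point forecast from Sales = -333,831 + 50 * Advertising."""
    return INTERCEPT + SLOPE * advertising


print(forecast_sales(65_000))  # 2916169
```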
With regression, we can forecast sales for any advertising level within the range of advertising levels we've seen historically. For example, even if Julius has never spent exactly $50,000 on advertising, we can still forecast a corresponding level of sales.
We must be extremely cautious about forecasting sales for values of advertising beyond the range of values we have already observed. The further we are from the historical values of advertising, the more we should question the reliability of our forecast.
For example, we might feel comfortable forecasting sales for advertising levels a bit above the observed range, perhaps as high as $100,000 or $105,000. But we shouldn't infer that if Julius spent $10 million on advertising, he would achieve $500 million in sales. The total market for spreadable meat is probably much less than $500 million annually!
Likewise, we might feel comfortable forecasting sales for advertising levels just below the observed range. But we certainly shouldn't report that if Julius spent $0 on advertising he would have negative sales!
If we try to use our regression equation to forecast sales for advertising levels outside of the historical range, we are implicitly assuming that the relationship between advertising and sales continues to be linear outside of the historical range.
In reality, although the relationship may be quite linear for the range of values we've observed, the curve may well level off for advertising values much lower or much higher than those we've observed. With no observations outside the historical data range, we simply don't have evidence about what the relationship looks like there.
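One way to keep this caveat in view is to flag any forecast made outside the observed range. The sketch below is only an illustration: the names are our own, and the observed range of $40,000 to $95,000 is a hypothetical placeholder, since the actual minimum and maximum come from the EasyMeat spreadsheet.

```python
from typing import NamedTuple

INTERCEPT = -333_831
SLOPE = 50


class Forecast(NamedTuple):
    sales: int
    extrapolated: bool  # True when advertising lies outside the observed range


def forecast_with_range_check(advertising: int,
                              observed_min: int,
                              observed_max: int) -> Forecast:
    """Apply the regression line and flag forecasts that extrapolate."""
    sales = INTERCEPT + SLOPE * advertising
    extrapolated = not (observed_min <= advertising <= observed_max)
    return Forecast(sales, extrapolated)


# HYPOTHETICAL observed range, for illustration only.
OBSERVED_MIN, OBSERVED_MAX = 40_000, 95_000

for advertising in (65_000, 10_000_000, 0):
    result = forecast_with_range_check(advertising, OBSERVED_MIN, OBSERVED_MAX)
    note = "extrapolation: treat with caution" if result.extrapolated else "within observed range"
    print(f"advertising ${advertising:,}: forecast ${result.sales:,} ({note})")
```

The equation happily returns roughly $500 million for $10 million of advertising and a negative figure for $0; the flag is a reminder that neither number is supported by the data.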
Another critical caveat to keep in mind is that whenever we use historical data to predict future values, we are assuming that the past is a reasonable predictor of the future. Thus, we should only use regression to predict the future if the general circumstances that held in the past, such as competition, industry dynamics, and economic environment, are expected to hold in the future.
The Structure of a Relationship
Regression can be used to deepen our understanding of the structural relationship between two variables. If we think about it, many business decisions are about increasing or decreasing one variable (investments or advertising, for example) to affect some other variable (productivity, brand recognition, or profits, for example). Regression can reveal the structure of relationships of this type.
Our regression analysis stipulates a linear relationship between sales and advertising. Understanding "the structure" of this relationship translates into finding and interpreting the coefficients of the regression equation.
As we've noted above, the constant term -333,831 may have no real managerial significance; it just "anchors" the regression line by telling us the y-intercept. We've never seen advertising levels close to $0, so we cannot infer that spending no money on advertising will lead to sales of -$333,831!
The more important term is the advertising coefficient, 50, which gives us the slope of the line. The advertising coefficient tells us how sales have changed on average as advertising has increased.
In the past, when advertising has increased by $10,000, what has been the average corresponding change in sales?
a. Sales have stayed flat.
This is not the correct answer. Double check your numbers and try again.
b. Sales have increased by $50.
This is not the correct answer. Double check your numbers and try again.
c. Sales have increased by $500,000.
This is the correct answer.
Assuming that the relationship between sales and advertising is linear, each $1 increase in advertising should be accompanied by the same average increase in sales. In our example, for every incremental $1 in advertising, sales increase on average by $50. Thus, for every incremental $10,000 in advertising, sales increase on average by $500,000.
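A short sketch shows why the starting level of advertising does not matter here: when we subtract two forecasts from the same line, the intercept cancels and only the slope times the change in advertising remains. The function name and the particular advertising levels are our own choices for illustration.

```python
INTERCEPT = -333_831
SLOPE = 50


def forecast_sales(advertising: int) -> int:
    """Point forecast from Sales = -333,831 + 50 * Advertising."""
    return INTERCEPT + SLOPE * advertising


# The same $10,000 increase from two different starting points.
print(forecast_sales(75_000) - forecast_sales(65_000))  # 500000
print(forecast_sales(60_000) - forecast_sales(50_000))  # 500000

# Equivalent shortcut: slope times the change in advertising.
print(SLOPE * 10_000)  # 500000
```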
The regression line gives us insight into how two variables are related. As one variable increases, by how much does the other variable typically change? How much growth in sales can we anticipate from an incremental increase in advertising expenditures? Regression analysis helps managers answer questions like these.
Summary
We use regression analysis for two primary purposes: forecasting and studying the structure of the relationship between two variables. We can use regression to predict the value of the dependent variable for a specified value of the independent variable. The regression equation also tells us how the dependent variable has typically changed with changes in the independent variable.