Solving the Staffing Problem (III) - Quantitative Methods Online Course

With one eye on the lookout for signs of lurking multicollinearity and two new types of variables in your toolbox, you set out to settle the staffing problem for good.

You might have noticed that the number of arrivals follows an annual seasonal pattern. Arrivals tend to drop off during the late summer, surge again in October, and drop very low for the rest of the year.


They pick up briefly in February, but the tourist business slows through the spring. During the early summer, vacationers start arriving in droves, with arrivals peaking in June or July.


Perhaps we can use the seasonality of arrivals in some way. We might come up with a lagged variable that functions as a proxy for the current month's arrivals. Then we can run a regression with the lagged variable. Since the values of a lagged variable would be based on historical data, Leo would know them ahead of time.
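As a minimal sketch of how such a lagged variable might be built: the helper `lag_series` and the monthly figures below are hypothetical illustrations, not the course's tool or the Kauai data.

```python
def lag_series(values, lag):
    """Return a list whose entry t holds values[t - lag], or None when no such month exists."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    return [values[t - lag] if t >= lag else None for t in range(len(values))]


# Hypothetical monthly arrivals for two consecutive years (made-up numbers).
arrivals = [70, 75, 68, 64, 66, 90, 92, 80, 60, 72, 55, 50,
            72, 78, 70, 65, 69, 94, 95, 83, 61, 75, 57, 52]

# Each month is paired with the arrivals from the same month one year earlier.
lagged_12 = lag_series(arrivals, 12)

# Only months that have a lagged value can enter the regression.
pairs = [(current, lagged) for current, lagged in zip(arrivals, lagged_12)
         if lagged is not None]
```

The first twelve months have no lagged value, so one year of observations is lost when the lagged variable is used in a regression.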

Given that arrivals follow this seasonal pattern, which of the following variables is likely to be a good proxy for this month's arrivals on Kauai?


You run the simple regression of Kahana occupancy versus 12-month lagged arrivals.

Kahana Occupancy Data

What is the lowest level at which the "lagged arrivals" variable is significant?

Kahana Occupancy Regressions

The regression output shows a p-value of 0.000 (that is, below 0.0005 after rounding) for the lagged arrivals coefficient, far below the 0.01 significance level.

At 55%, the R-squared for the simple regression on lagged arrivals is much lower than the 80% for the simple regression on current arrivals. The adjusted R-squared is only 53%. Still, lagged arrivals have substantial predictive power.
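The gap between R-squared and adjusted R-squared follows from the standard adjustment for sample size and number of independent variables. The sketch below applies that formula. The sample size of 24 is an assumption chosen for illustration, since this section does not state how many months enter the regression.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    degrees_of_freedom = n_obs - n_predictors - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n_obs - 1) / degrees_of_freedom


# R-squared of 0.55 comes from the text; n_obs=24 is a hypothetical sample size.
example = adjusted_r_squared(0.55, n_obs=24, n_predictors=1)
```

Under that assumed sample size the result is roughly 0.53. The adjustment always pulls R-squared down, and it pulls harder as more independent variables are added relative to the number of observations.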

Run a multiple regression of occupancy versus lagged arrivals and advance bookings.


What is the adjusted R-squared for this multiple regression? Enter the adjusted R-squared as a decimal number with two digits to the right of the decimal point (e.g., enter "50%" as "0.50").


The adjusted R-squared of 60% for the multiple regression is higher than the 53% for the simple regression. But is that the best you can do? For the moment, you report your analysis to Alice.

This model with the lagged variable is more useful to Leo, even if its predictive power is still pretty low. But let's look at these data on Leo's competition that I dug up. Maybe we can use them to give us an even better model.

Leo's competitor, Knut Steinkalt at the Hotel Excelsior, frequently launches promotion campaigns: in some months, Steinkalt slashes room prices dramatically.

These data show the months in which the Excelsior's promotions took place in the last three years. We can use a dummy variable to see how the competition's promotions have affected Leo's occupancy.
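A dummy variable of this kind takes the value 1 in months with a promotion and 0 otherwise. The sketch below shows one way to code it. The function name and the month labels are hypothetical and do not come from Alice's data.

```python
def promotion_dummy(months, promotion_months):
    """Return 1 for each month in which a promotion ran, 0 otherwise."""
    promoted = set(promotion_months)
    return [1 if month in promoted else 0 for month in months]


# Hypothetical six-month window with promotions in February and May.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
excelsior_promotions = ["Feb", "May"]

promo = promotion_dummy(months, excelsior_promotions)
```

The resulting column of zeros and ones can then be used as an independent variable like any other.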


The Excelsior has to advertise these promotional packages at least a month in advance to attract customers. As long as Leo keeps an eye on the Excelsior's advertising, he'll be alerted to these promotions in enough time to take them into account when he makes staffing decisions for the following month.


Using Alice's research, you run a simple regression of occupancy versus the promotions. Then you run the multiple regression of occupancy versus advance bookings, lagged arrivals, and Excelsior promotions.
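For readers who want to see what the regression tool does underneath, here is a rough pure-Python sketch of a multiple regression fitted by solving the normal equations. It is not the course's software. The data rows and the coefficients used to generate them (10, 0.5, 0.2, and -60) are made up so that the result can be checked, and a statistics package would normally be used instead because it also reports p-values and R-squared.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left unchanged.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] from the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical rows: (advance bookings, lagged arrivals, promotion dummy).
rows = [(100, 300, 0), (120, 280, 1), (90, 350, 0),
        (150, 400, 1), (110, 320, 0), (130, 310, 1)]

# Occupancy generated without noise from made-up coefficients, for checking only.
occupancy = [10 + 0.5 * b + 0.2 * l - 60 * p for b, l, p in rows]

coefficients = ols(rows, occupancy)
```

Because the made-up occupancy values contain no noise, the fitted coefficients recover the generating values up to rounding error. Real data would leave residuals, and the fit would need the usual significance checks.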


Suppose Leo has used the multiple regression equation to predict July's occupancy using a given level of advance bookings and last July's arrivals. Just before he makes his staffing decisions, he learns that, unexpectedly, his rival Knut has cut the Excelsior's room prices for July. Leo should revise his predicted occupancy by:


The gross effect of Excelsior promotions on Kahana occupancy, a reduction of 52 guests, is not relevant here, since we know that the advance bookings and lagged arrivals for July are the same as they were before the Excelsior launched its campaign. Leo wants to use the net effect of the promotion on his occupancy levels, assuming advance bookings and lagged arrivals remain fixed: an average reduction in occupancy of 60 guests.
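The revision itself is a single addition of the promotion coefficient from the multiple regression. In the sketch below, the two effects are the figures quoted in the text, while the starting forecast of 400 guests is a hypothetical number chosen only to make the arithmetic concrete.

```python
# Coefficients quoted in the text (guests per promotion month).
GROSS_PROMOTION_EFFECT = -52  # from the simple regression on promotions alone
NET_PROMOTION_EFFECT = -60    # from the multiple regression, other variables held fixed


def revise_for_promotion(forecast_without_promotion):
    """Adjust a forecast made with the promotion dummy set to 0."""
    return forecast_without_promotion + NET_PROMOTION_EFFECT


original_forecast = 400  # hypothetical July forecast, not from the course data
revised_forecast = revise_for_promotion(original_forecast)
```

The net effect is the right one to use because advance bookings and lagged arrivals are already in the forecast and do not change when the promotion is announced.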

Adding Excelsior promotions to the regression analysis:


Upon Leo's return, you eagerly report your results.

This is good work. Naturally, I'd like to have even greater predictive power than 84%, but I realize you aren't psychics. This model will really help me when I hire staff. Thanks!

I have some good news of my own. Mr. Pitt agreed not to file suit against me! I did have to promise him an extended stay in the Kahana's penthouse, free of charge.

He's coming next month — there's one occupant I don't need to use statistics to predict. This time, I'll serve the bisque myself.

Thanks again for all your help!

Exercise 1: The Kiwana Quandary

Linda Szewczyk, marketing director of Amalgamated Fruits Vegetables & Legumes (AFV&L), is researching the nation's fruit consumption habits. In particular, she would like greater insight into household consumption of the kiwana, a crossbreed of kiwis and bananas that AFV&L pioneered.

Naturally, one important determinant of household consumption is the size of the household — the number of members. Since AFV&L has positioned the kiwana as a "high end" fruit, Linda believes that household income may also influence its consumption.

Run a multiple regression of household kiwana consumption versus household size and income. Make note of important regression parameters such as R-squared, adjusted R-squared, the coefficients, and the coefficients' significance. The income variable has a coefficient of 0.0004. Can a variable with such a small coefficient be statistically significant?

Kiwana Consumption Data

The independent variable, income, is statistically significant since its p-value is less than 0.05, the most common level of significance. The coefficient is small only because income is measured in dollars: for every additional $10,000 of income, average kiwana consumption increases by 4 lbs. a year, holding household size constant.

To date, AFV&L has focused its marketing campaigns on high-income, highly educated consumers. Linda would like to deepen her understanding of how the educational level of the household members might affect their appetite for kiwanas.
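The arithmetic behind that statement is just the coefficient times the change in income. A small sketch, using the coefficient quoted above (the function name is illustrative):

```python
INCOME_COEFFICIENT = 0.0004  # lbs of kiwanas per year, per dollar of household income


def consumption_change(income_change_dollars):
    """Expected change in yearly kiwana consumption for a given change in income."""
    return INCOME_COEFFICIENT * income_change_dollars


# Rounded to two decimals to hide floating-point noise.
change_per_10k = round(consumption_change(10_000), 2)
```

The size of a coefficient depends on the units of its variable, so it says nothing by itself about statistical significance. Measuring income in tens of thousands of dollars would give a coefficient of 4 with the same p-value.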


To incorporate education into her kiwana consumption analysis, Linda separated the households in her data set into three categories based on the highest level of education attained by any member of the household — no college degree, college degree but no postgraduate degree, and postgraduate degree. She represents these categories using two dummy variables — "college only" and "postgraduate."
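Three categories need only two dummy variables because the third category is implied when both dummies are 0. A sketch of that coding, with hypothetical category labels as dictionary keys:

```python
# Each category maps to (college_only, postgraduate).
# "no college" is the excluded (baseline) category: both dummies are 0.
EDUCATION_DUMMIES = {
    "no college": (0, 0),
    "college only": (1, 0),
    "postgraduate": (0, 1),
}


def encode_education(level):
    """Return the (college_only, postgraduate) dummy pair for a household."""
    return EDUCATION_DUMMIES[level]


# Hypothetical households, for illustration only.
households = ["college only", "no college", "postgraduate"]
encoded = [encode_education(level) for level in households]
```

No household has both dummies set to 1, since each household falls into exactly one category.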

Run a regression on all four independent variables.


Controlling for household size, income, and postgraduate degree, how many more pounds of kiwanas are consumed in a household in which the highest educational level is a college degree, compared to a household in which no one holds a college degree?

Kiwana Consumption Regressions

The coefficient for the dummy variable "college only," 51.6, tells us the expected difference in kiwana consumption for "college degrees only" households compared to the excluded educational category: households in which no one holds a college degree. The coefficient describes the net relationship between "college degrees only" and household kiwana consumption, controlling for household size, income, and postgraduate degree.

Controlling for household size and income, how many more pounds of kiwanas are consumed in a household in which the highest educational level is a postgraduate degree, compared to a household in which the highest educational level is a college degree? Enter the difference in consumption between the two households as a decimal number with two digits to the right of the decimal point (e.g., enter "5" as "5.00"). Round if necessary.


When you control for household size and income, college degree households consume 51.63 lbs more than non-college households. Postgraduate degree households consume 51.95 lbs more than non-college households. In other words, postgraduate households consume 0.32 lbs more than college households.
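Since both dummy coefficients are measured against the same excluded category, the comparison between the two included categories is the difference of the coefficients. A short sketch with the values quoted above:

```python
# Dummy coefficients from the text, each relative to non-college households (lbs).
COLLEGE_ONLY_COEFFICIENT = 51.63
POSTGRADUATE_COEFFICIENT = 51.95

# Rounded to two decimals, as the exercise requests.
postgraduate_vs_college = round(POSTGRADUATE_COEFFICIENT - COLLEGE_ONLY_COEFFICIENT, 2)
```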

The analysis of household kiwana consumption indicates:


Exercise 2: The Return of the Empire

For this exercise, refer to the regression analyses you ran in exercise 1 of the previous section.

Empire Learning Data Empire Learning Regressions

Bill Hartborne, the CEO of Empire Learning, is using regression analysis to predict the number of labor-hours it will take his team to create a new e-learning course. He is using data on previous courses Empire created, with the number of pages and the total animation runtime as independent variables.

Bill believes that the number of illustrations used in the course may also have a significant impact on the number of labor-hours it takes to complete an e-learning course. He wants to add the number of illustrations to the model as another independent variable.

Run the simple regression of labor-hours versus number of illustrations.


At which level is the number of illustrations a statistically significant independent variable?

Run the multiple regression of labor-hours versus number of pages, illustrations, and animation runtime.

Is there evidence of multicollinearity in the data?

A common symptom of multicollinearity is a high adjusted R-squared (94% in this case) accompanied by one or more independent variables with low significance. Here, the coefficient for the number of illustrations is not significant at the 0.05 level, and the p-value for the number of pages has risen to 0.0291, up from 0.0015 in the regression without illustrations.

Which of the following is the likely culprit of multicollinearity?

Multicollinearity occurs when the respective effects of two or more independent variables on the dependent variable are not distinguishable in the data. This can be the result of correlated independent variables. The fact that the p-value for the number of pages rises when we add the illustrations raises our suspicions that the number of illustrations and the number of pages might be correlated.

We can compute the correlation between the number of pages and the number of illustrations: the correlation coefficient, 0.67, is fairly high.

We could also attempt to diagnose the cause of the multicollinearity by running a regression of labor-hours versus number of pages and number of illustrations, omitting animations. Here, illustrations are far from significant, with a p-value of 0.85.
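As a sketch of how that correlation is computed, the function below implements the usual Pearson formula in plain Python. The page and illustration counts are made-up values for illustration, not the Empire Learning data, so no particular result is claimed for them.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical courses: page counts and illustration counts (made-up values).
pages = [40, 55, 60, 80, 95, 120]
illustrations = [10, 18, 12, 25, 20, 33]

r = pearson_correlation(pages, illustrations)
```

A value near 1 or -1 between two independent variables is a warning sign that a regression may struggle to separate their effects.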

In the regression of labor-hours versus number of illustrations and runtime of animations, omitting pages, the respective effects of the independent variables on the dependent variable can be distinguished. Here, the p-values for both variables are much lower than 0.05. All of the evidence points to a linear relationship between number of pages and number of illustrations as the culprit for the multicollinearity.

Bill wants to use the regression analysis to predict the number of labor-hours it will take to complete a new e-learning course. Comparing the two regression models, one with all three independent variables and one without illustrations, which should Bill use?


In document Quantitative Methods Online Course (Page 93-97)