THE BASEBALL CASE
Singha Field is home to the BK Lions professional baseball team. The team's new marketing director, Noelle Amsley, has been trying to develop a better understanding of the key drivers of attendance at the ballpark to increase ticket revenues, optimize concession inventories and staffing, and time promotional giveaways.
The stadium is capable of holding almost 41,000 fans. The exact number is hard to pin down due to the sale of standing-room-only tickets and VIP ticket comping. The data for this case are included in the file baseball case.xls.
PART A: REGRESSION ANALYSIS
Noelle's first model uses three concepts to predict attendance: time of day, temperature, and day of the week. Specifically, she has a dummy variable for night games, the day's high temperature, and three dummies indicating if the game takes place on a Friday, Saturday, or Sunday, respectively.
- Use Regression 1 to estimate attendance for a Sunday afternoon game where the temperature is 82 degrees.
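Each forecast in this part is the fitted equation evaluated at one game's characteristics. The helper below is a hypothetical sketch of that arithmetic, not a KStat feature. The key names (`Intercept`, `Night`, `Temp`, `Friday`, `Saturday`, `Sunday`) are assumed labels for the variables described above, and the coefficients must be copied from the Regression 1 output; none are given here.

```python
def point_forecast(coefs, scenario):
    """Plug one game's characteristics into a fitted linear regression.

    coefs:    {"Intercept": ..., "Night": ..., ...} copied by hand from the
              regression output (these key names are hypothetical labels).
    scenario: value of every regressor for the game being forecast,
              with 0 for dummies that are switched off.
    """
    expected = set(coefs) - {"Intercept"}
    if set(scenario) != expected:
        raise ValueError(f"scenario must list exactly these regressors: {sorted(expected)}")
    # Intercept plus each coefficient times its regressor value.
    return coefs["Intercept"] + sum(coefs[name] * scenario[name] for name in scenario)


# Sunday afternoon game at 82 degrees, using Regression 1's five regressors.
sunday_82 = {"Night": 0, "Temp": 82, "Friday": 0, "Saturday": 0, "Sunday": 1}
```

Requiring the scenario to list every regressor guards against silently dropping a dummy, which would quietly change the forecast.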
A quick look at the model analysis page on KStat shows six outliers among the 92 data points. Two of them are day games on very cold weekdays, for which the model predicts minimal turnout. However, these particular games nearly sold out. Noelle kicks herself: both were opening day of the season, a special game for baseball fans.
Adding a new dummy variable called opening day that equals one on the first home game of the season and zero otherwise produces Regression 2.
- Use Regression 2 to estimate the attendance for a Sunday afternoon game where the temperature is 82 degrees and it is not opening day.
- Compare your results from questions 1 and 2. Explain why your estimate changes between the two models.
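Writing both models out term by term makes the comparison concrete. The coefficient symbols below are placeholders for the estimates in each regression output, and the variable names are assumed labels for the dummies and temperature described above:

\[
\begin{aligned}
\text{Reg 1:}\quad \widehat{\text{Attendance}} &= b_0 + b_1\,\text{Night} + b_2\,\text{Temp} + b_3\,\text{Fri} + b_4\,\text{Sat} + b_5\,\text{Sun} \\
\text{Reg 2:}\quad \widehat{\text{Attendance}} &= c_0 + c_1\,\text{Night} + c_2\,\text{Temp} + c_3\,\text{Fri} + c_4\,\text{Sat} + c_5\,\text{Sun} + c_6\,\text{OpeningDay}
\end{aligned}
\]

For a Sunday afternoon game at 82 degrees that is not opening day, both reduce to an intercept, a temperature term, and a Sunday term:

\[
\text{Reg 1: } b_0 + 82\,b_2 + b_5, \qquad \text{Reg 2: } c_0 + 82\,c_2 + c_5
\]

The opening-day term drops out of this forecast, so any difference between the two estimates comes from re-estimating the remaining coefficients once the two opening-day games are no longer treated as ordinary cold weekday games.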
The team's management recently adopted a more sophisticated pricing structure to improve its revenues. Instead of charging the same set of prices for every game, it uses two pricing schemes: full-price tickets and cheap tickets. For games where management anticipates a lower level of interest, it charges the cheap ticket prices to stimulate demand. Regression 3 shows that cheap tickets have a significant effect on attendance, but the sign of the coefficient confuses Noelle: she had expected it to be positive. Shouldn't lower prices increase attendance?
- Do these results violate the law of demand that says all else being equal, a lower price should increase the quantity demanded?
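The pricing rule described above can be mimicked with synthetic data. The sketch below assumes a hypothetical world in which cheap tickets truly raise attendance by 3,000 fans but are offered only for games with low underlying interest; every number is invented for illustration and none comes from baseball case.xls. Regressing attendance on the cheap-ticket dummy alone picks up the low interest along with the price effect, while controlling for that (normally unobservable) interest recovers the positive effect.

```python
import numpy as np


def simulate_cheap_ticket_bias(n_games=2000, true_effect=3000.0, seed=0):
    """Return (naive, controlled) OLS estimates of the cheap-ticket effect.

    Synthetic illustration only, not the case data: management offers cheap
    tickets for games it expects to draw poorly, so the cheap-ticket dummy is
    correlated with an unobserved level of interest in the game.
    """
    rng = np.random.default_rng(seed)
    # Unobserved interest in each game (opponent quality, weather, etc.).
    interest = rng.normal(0.0, 6000.0, n_games)
    # Hypothetical pricing rule: discount the games expected to be weak.
    cheap = (interest < -2000.0).astype(float)
    # True demand: cheap tickets RAISE attendance by true_effect.
    noise = rng.normal(0.0, 2000.0, n_games)
    attendance = 30000.0 + interest + true_effect * cheap + noise

    ones = np.ones(n_games)
    # Naive regression: interest is omitted, so its effect loads onto "cheap".
    naive = np.linalg.lstsq(np.column_stack([ones, cheap]), attendance, rcond=None)[0][1]
    # Controlled regression: holding interest fixed isolates the price effect.
    controlled = np.linalg.lstsq(
        np.column_stack([ones, cheap, interest]), attendance, rcond=None
    )[0][1]
    return naive, controlled


naive, controlled = simulate_cheap_ticket_bias()
print(f"naive cheap-ticket coefficient:      {naive:,.0f}")       # negative here
print(f"controlled cheap-ticket coefficient: {controlled:,.0f}")  # near +3,000
```

The point of the simulation is the direction of the bias, not its size: whenever price is set in response to anticipated demand, the raw coefficient mixes the two.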
Noelle's colleague, Andrew Groden, wants to learn how two other factors drive attendance: promotional giveaways, such as free hat day, and popular opponents, such as the team's historic rivals, the ML Tigers, and its cross-town rivals, the Pachyderms. To test these factors' significance, Noelle adds three dummy variables, promo, Tigers, and Pachyderms, to her earlier regression to produce Regression 4. She quickly tells Andrew that the first two are significant but that the Pachyderms do not seem to be a big draw to the ballpark.
Andrew disagrees: "It's just because those games were all scheduled on days that were already popular. Five of their six games here were on Fridays or weekends, and all of them were in the summer, when the weather is usually perfect! The rivalry increased interest in those games, but there just weren't enough seats in the ballpark to see the effect."
- Does Andrew's theory sound reasonable? Why would a team schedule games against a popular rival, knowing that it did not need to encourage attendance on those dates?
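Andrew's argument is about censoring: once a game sells out, extra demand no longer shows up in recorded attendance. The synthetic sketch below assumes a hypothetical schedule in which rival games fall only on already-popular dates and raise underlying demand by 5,000 fans; all numbers are invented, not taken from the case data. Capping observed attendance at the case's approximate 41,000-seat capacity shrinks the estimated rival coefficient well below the true effect on demand.

```python
import numpy as np

CAPACITY = 41000  # approximate stadium capacity from the case


def simulate_capacity_censoring(n_games=3000, true_rival_effect=5000.0, seed=1):
    """Return (rival effect on demand, rival effect on capped attendance).

    Synthetic illustration only: popular_date is a hypothetical stand-in for
    Fridays/weekends in summer, and rival games are placed only on those dates.
    """
    rng = np.random.default_rng(seed)
    popular_date = rng.random(n_games) < 0.4
    # Hypothetical scheduling rule: rival games only on already-popular dates.
    rival = popular_date & (rng.random(n_games) < 0.25)
    base = np.where(popular_date, 39000.0, 28000.0) + rng.normal(0.0, 2500.0, n_games)
    demand = base + true_rival_effect * rival
    # The turnstiles stop counting at capacity.
    attendance = np.minimum(demand, CAPACITY)

    X = np.column_stack([np.ones(n_games), popular_date, rival])
    b_demand = np.linalg.lstsq(X, demand, rcond=None)[0][2]
    b_attend = np.linalg.lstsq(X, attendance, rcond=None)[0][2]
    return b_demand, b_attend


b_demand, b_attend = simulate_capacity_censoring()
print(f"rival effect on demand:     {b_demand:,.0f}")
print(f"rival effect on attendance: {b_attend:,.0f}")
```

In this setup most rival games hit the cap, so the regression on attendance can only "see" the part of the extra demand that fit into the remaining empty seats.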
Regression 5 adds two more variables to Noelle's model. One is school, which equals one whenever the local public school system is in session (keeping thousands of potential fans away from many games) and zero otherwise. The other is Cheap Tickets, as used in Regression 3.
- Is the variable Cheap Tickets significant in this regression? Interpret its coefficient and significance in the context of this new regression.
- Use Regression 5 to forecast attendance for a Saturday night game against the Tigers that is not on opening day. The temperature is 89 degrees, tickets are full price, there is a promotional giveaway, and school is out of session. Provide a 95% prediction interval for your answer. Do you have any concerns about your forecast?
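A 95% prediction interval for a single game is the point forecast plus or minus a t critical value times the standard error of the forecast, which combines the regression's residual standard error with the uncertainty in the fitted mean. The function below is a sketch under those textbook assumptions; its name and arguments are illustrative. Regression outputs differ in which standard errors they report, so leaving `se_fit` at zero gives the common shortcut that ignores coefficient uncertainty and yields a slightly narrow interval.

```python
from scipy import stats


def prediction_interval(point, se_regression, df_resid, se_fit=0.0, level=0.95):
    """Return (lower, upper) for a new observation from a linear regression.

    point:         forecast from plugging the scenario into the fitted equation
    se_regression: standard error of the regression (root mean squared error)
    df_resid:      residual degrees of freedom, n - k - 1
    se_fit:        standard error of the estimated mean at this scenario;
                   0 ignores coefficient uncertainty (an approximation)
    """
    se_forecast = (se_regression**2 + se_fit**2) ** 0.5
    # Two-sided critical value, e.g. the 97.5th percentile for a 95% interval.
    t_crit = stats.t.ppf(0.5 + level / 2, df_resid)
    return point - t_crit * se_forecast, point + t_crit * se_forecast
```

If all 92 games are used, the residual degrees of freedom equal 92 minus the number of regressors minus one. When the upper limit lands above the stadium's roughly 41,000 seats, the interval is describing outcomes the ballpark cannot physically produce.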
PART B: NON-LINEARITIES
Noelle has been studying Regression 5. She is concerned about the Breusch-Pagan test, which indicates heteroskedasticity in the model. She becomes more concerned after estimating a semi-log model, Regression 6, which fails to fix the problem. Noelle suspects that a linear model may not be the most appropriate fit to the data; in particular, she is worried about the large number of games that push the stadium's capacity limit.
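The idea behind the Breusch-Pagan test is to regress the squared residuals on the regressors and ask whether they explain any of the variation. The sketch below implements Koenker's studentized form, \(LM = nR^2\) from that auxiliary regression compared against a chi-squared distribution with one degree of freedom per regressor. KStat's implementation may differ in details, and the function name and arguments are illustrative.

```python
import numpy as np
from scipy import stats


def breusch_pagan(X, y):
    """Koenker (studentized) Breusch-Pagan test.

    X: (n, k) or (n,) array of regressors WITHOUT a constant column.
    y: (n,) outcome. Returns (LM statistic, p-value).
    """
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    # Step 1: fit the original regression and square its residuals.
    beta = np.linalg.lstsq(Z, y, rcond=None)[0]
    resid_sq = (y - Z @ beta) ** 2
    # Step 2: regress the squared residuals on the same regressors.
    gamma = np.linalg.lstsq(Z, resid_sq, rcond=None)[0]
    fitted = Z @ gamma
    ss_res = np.sum((resid_sq - fitted) ** 2)
    ss_tot = np.sum((resid_sq - resid_sq.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    # Step 3: LM = n * R^2 is approximately chi-squared with k degrees of freedom.
    lm = n * r2
    p_value = stats.chi2.sf(lm, df=Z.shape[1] - 1)
    return lm, p_value
```

A small p-value indicates that the spread of the residuals changes with the regressors, which is the pattern the test flagged for Regression 5.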
Both linear and logarithmic models are unbounded, meaning they have no upper limit. Regression 1, for instance, predicts more than 42,000 fans for a Saturday afternoon game with a temperature of around 88 degrees (not unreasonable for a summer day), even though that exceeds the stadium's capacity by more than a thousand people. A regression of \(\ln(\text{Attendance})\) using the same independent variables predicts more than 43,000 fans.
As Noelle sees it, the problem is that none of the models she has learned about seems right for the pattern she observes in the dataset: attendance getting closer and closer to a maximum value as "conditions" improve. Taking temperature as the independent variable, Noelle plots Attendance versus Temperature with two different fits, one linear and one curving up toward the capacity. These plots appear in Figures 1 and 2.
Looking at Figure 2 gives Noelle an idea. Although a semi-log model, \(Y = a \cdot e^{bx}\), has no maximum when the constant \(a\) is positive, it does have a lower bound: \(Y\) never falls below zero.
Flipping Figure 2 upside down by plotting Empty Seats versus Temperature gives Noelle the graph in Figure 3, which looks just like the kind of graph a semi-log model fits perfectly! Taking the log of empty seats and plotting it versus Temperature gives her Figure 4. Empty seats were computed using 41,000 as the capacity. Regression 7 uses the same dependent variable, the log of empty seats, but adds the entire collection of independent variables Noelle used previously.
- How does the semi-log model of empty seats used in Regression 7 compare to the models used in Regressions 5 and 6? Briefly discuss the pros and cons of using this last model.
- Use Regression 7 to predict attendance for a Saturday night game against the Tigers that is not opening day. The temperature is 89 degrees, tickets are full price, there is a promotional giveaway, and school is out of session. In addition to a single attendance number, provide a 95% prediction interval for your answer.
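Writing the empty-seats model out shows where its upper bound comes from. Below, \(d_0, \dots, d_k\) and \(x_1, \dots, x_k\) are placeholders for the Regression 7 coefficients and regressors, and the last line drops the error term (under normally distributed errors it gives the median forecast rather than the mean):

\[
\begin{aligned}
\text{EmptySeats} &= 41{,}000 - \text{Attendance} \\
\ln(\text{EmptySeats}) &= d_0 + d_1 x_1 + \dots + d_k x_k + \varepsilon \\
\widehat{\text{Attendance}} &= 41{,}000 - e^{\,d_0 + d_1 x_1 + \dots + d_k x_k}
\end{aligned}
\]

Because \(e^{(\cdot)} > 0\), every forecast stays below 41,000 and approaches it as the exponent becomes more negative. One cost of this form is that a game with recorded attendance at or above 41,000 (possible with standing-room and comped tickets) would have no valid log of empty seats.

A forecast made on the log scale then has to be converted back. The helper below is a hypothetical sketch, not a KStat function: it takes the point forecast and 95% prediction limits for ln(Empty Seats) and maps them to attendance. Since attendance falls as ln(Empty Seats) rises, the two limits swap ends.

```python
import math

CAPACITY = 41_000  # capacity the case uses to compute empty seats


def attendance_from_log_empty(log_point, log_lower, log_upper, capacity=CAPACITY):
    """Map a forecast of ln(Empty Seats) and its prediction limits to attendance.

    attendance = capacity - exp(ln empty seats) is decreasing in ln empty seats,
    so the UPPER log limit becomes the LOWER attendance limit and vice versa.
    Returns (point, lower, upper) in fans; all three stay below capacity.
    """
    point = capacity - math.exp(log_point)
    lower = capacity - math.exp(log_upper)
    upper = capacity - math.exp(log_lower)
    return point, lower, upper
```

For example, log-scale limits of ln(1,000) and ln(4,000) around a point forecast of ln(2,000) become 37,000 to 40,000 fans around 39,000: an asymmetric interval that cannot exceed capacity.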
Deliverable: Word Document
