[1](20) The owner of a single-family home in a suburban county in the northeastern United States would like to develop a model to predict electricity consumption in his all-electric house (lights, fans, heat, appliances, and so on) based on outdoor atmospheric temperature (in degrees Fahrenheit). Monthly billing data and temperature information were available for a period of 24 consecutive months. The data are in the file (data_prob_1.xls).
| Month | Kilowatt usage (KW) | Average atmospheric temperature (°F) | Month | Kilowatt usage (KW) | Average atmospheric temperature (°F) |
| --- | --- | --- | --- | --- | --- |
| 1 | 126 | 30 | 13 | 123 | 27 |
| 2 | 132 | 25 | 14 | 121 | 33 |
| 3 | 114 | 29 | 15 | 138 | 28 |
| 4 | 87 | 42 | 16 | 99 | 39 |
| 5 | 67 | 48 | 17 | 64 | 47 |
| 6 | 50 | 61 | 18 | 52 | 63 |
| 7 | 39 | 69 | 19 | 49 | 69 |
| 8 | 45 | 78 | 20 | 41 | 73 |
| 9 | 39 | 72 | 21 | 44 | 70 |
| 10 | 43 | 62 | 22 | 53 | 64 |
| 11 | 61 | 45 | 23 | 59 | 53 |
| 12 | 92 | 36 | 24 | 118 | 27 |
- (a) With atmospheric temperature on the X-axis and kilowatt usage on the Y-axis, set up a scatter diagram. Attach your work.
- (b) Does there appear to be a relationship between atmospheric temperature and kilowatt usage? If so, is the relationship positive or negative? Find the regression equation.
- (c) Interpret the meaning of the slope, \(b_1\), in this problem.
- (d) Predict the average kilowatt usage when the average atmospheric temperature is 50 degrees Fahrenheit.
- (e) Compute the coefficient of determination, \(r^2\), and interpret its meaning.
- (f) Compute the standard error of the estimate.
- (g) Plot the residuals versus the average atmospheric temperature.
- (h) Plot the residuals versus the time period.
- (i) Compute the Durbin-Watson statistic. At the 0.05 level of significance, is there evidence of positive autocorrelation among the residuals?
- (j) Based on the results of (g)-(i), is there reason to question the validity of the model?
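As a rough illustration of the regression quantities asked for above (slope, prediction at 50 °F, coefficient of determination, standard error of the estimate, and the Durbin-Watson statistic), the following Python sketch fits the least-squares line to the tabulated data. The names `temp`, `usage`, and `fit_line` are illustrative choices, not anything prescribed by the problem, and the residuals are assumed to be ordered by month 1 to 24 for the Durbin-Watson sum. It is a sketch under those assumptions, not a model answer.

```python
# Data transcribed from the table above, months 1 through 24 in order.
temp = [30, 25, 29, 42, 48, 61, 69, 78, 72, 62, 45, 36,
        27, 33, 28, 39, 47, 63, 69, 73, 70, 64, 53, 27]
usage = [126, 132, 114, 87, 67, 50, 39, 45, 39, 43, 61, 92,
         123, 121, 138, 99, 64, 52, 49, 41, 44, 53, 59, 118]


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


n = len(temp)
b0, b1 = fit_line(temp, usage)

# Residuals: observed usage minus fitted usage, in month order.
residuals = [y - (b0 + b1 * t) for t, y in zip(temp, usage)]

# Sums of squares for r^2 and the standard error of the estimate.
y_bar = sum(usage) / n
sse = sum(e ** 2 for e in residuals)
sst = sum((y - y_bar) ** 2 for y in usage)
r2 = 1 - sse / sst
s_yx = (sse / (n - 2)) ** 0.5

# Durbin-Watson: squared successive residual differences over SSE.
dw = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n)) / sse

# Point prediction at an average temperature of 50 degrees Fahrenheit.
pred_50 = b0 + b1 * 50

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"prediction at 50 F = {pred_50:.2f}")
print(f"r^2 = {r2:.4f}, S_yx = {s_yx:.3f}, Durbin-Watson = {dw:.3f}")
```

The Durbin-Watson value printed here would still need to be compared with the tabulated lower and upper critical values for n = 24 and one independent variable at the 0.05 level, and the residual plots would be drawn separately from `residuals`.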
[2] A certain college administrator is interested in seeing whether math or verbal SAT scores are the better predictor of performance in a required freshman history course. The data are in the file (data_prob_2.xls).
- (a) Test whether there is a statistically significant relationship (at the 5% level) between math SAT and history course scores. Is it a positive or negative relationship?
- (b) When establishing this relationship, does the value of R-squared bother you? Explain why or why not.
- (c) Test whether there is a statistically significant relationship (at the 5% level) between verbal SAT and history course scores. Is it a positive or negative relationship?
- (d) When establishing this relationship, does the value of R-squared bother you? Explain why or why not.
- (e) Is the math or verbal SAT score a better predictor of success in this history course? Explain.
[3](20) A certain college administrator is interested in seeing how well math (\(X_1\)) and verbal (\(X_2\)) SAT scores together predict performance in a required freshman history course. The data are in the file (data_prob_2.xls).
- (a) State the multiple regression equation.
- (b) Interpret the meaning of the slopes \(b_1\) and \(b_2\) in the model.
- (c) Interpret the meaning of the regression coefficient \(b_0\).
- (d) Test \(H_0: \beta_2 = 0\) against \(H_1: \beta_2 > 0\). Interpret your finding.
- (e) Use a 95% confidence interval to estimate \(\beta_2\). Interpret the p-value corresponding to the estimate of \(\beta_2\). Does the confidence interval support your interpretation in (d)?
- (f) Determine the coefficient of multiple determination \(r^2_{Y.12}\) and interpret its meaning.
- (g) Perform a residual analysis on your results and determine the adequacy of the fit of the model.
- (h) Plot the residuals against the prices. Is there evidence of a pattern in the residuals? Explain.
- (i) When you compare the results in [2] with (a), what do you think is the effect of math scores on history scores? For example, if you and I both have a 570 verbal SAT, but you have a 740 math SAT and I have only a 520 math SAT, what do you think would happen under the simple regression analysis and under the multiple regression analysis?
[4](10) Suppose that you fit the model with 5 independent variables \[Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+{{\beta }_{3}}{{x}_{3}}+{{\beta }_{4}}{{x}_{4}}+{{\beta }_{5}}{{x}_{5}}+\varepsilon \] to n = 30 data points, and you obtain SSE = 0.46 and R-Square = 0.87.
Is the model of any use in predicting y? Using \(\alpha = 0.05\), test the null hypothesis
\[{{H}_{0}}:{{\beta }_{1}}={{\beta }_{2}}={{\beta }_{3}}={{\beta }_{4}}={{\beta }_{5}}=0\]
against the alternative hypothesis
\[{{H}_{1}}:\text{At}\,\,\text{least}\,\,\text{one}\,\,\text{of}\,\,\text{the}\,\,\text{parameters}\,\,{{\beta }_{1}},\,\,{{\beta }_{2}},\,\,{{\beta }_{3}},\,\,{{\beta }_{4}},\,\,{{\beta }_{5}}\,\,\,\text{is}\,\,\text{not}\,\,\text{zero}\]
[5](30) A collector of antique grandfather clocks believes that the price (in dollars) received for the clocks at an antique auction increases with the age of the clocks and with the number of bidders. Thus the hypothesized model is
\[Y={{\beta }_{0}}+{{\beta }_{1}}{{x}_{1}}+{{\beta }_{2}}{{x}_{2}}+\varepsilon \] where y = auction price, \(x_1\) = age of clock (years), and \(x_2\) = number of bidders.
A sample of 32 auction prices of grandfather clocks, along with their ages and the number of bidders, is given below.
| Age (\(x_1\)) | Bidders (\(x_2\)) | Price (y) | Age (\(x_1\)) | Bidders (\(x_2\)) | Price (y) |
| --- | --- | --- | --- | --- | --- |
| 127 | 13 | 1235 | 170 | 14 | 2131 |
| 115 | 12 | 1080 | 182 | 8 | 1550 |
| 127 | 7 | 845 | 162 | 11 | 1884 |
| 150 | 9 | 1522 | 184 | 10 | 2041 |
| 156 | 6 | 1047 | 143 | 6 | 854 |
| 182 | 11 | 1979 | 159 | 9 | 1483 |
| 156 | 12 | 1822 | 108 | 14 | 1055 |
| 132 | 10 | 1253 | 175 | 8 | 1545 |
| 137 | 9 | 1297 | 108 | 6 | 729 |
| 113 | 9 | 946 | 179 | 9 | 1792 |
| 137 | 15 | 1713 | 111 | 15 | 1175 |
| 117 | 11 | 1024 | 187 | 8 | 1593 |
| 137 | 8 | 1147 | 111 | 7 | 785 |
| 153 | 6 | 1092 | 115 | 7 | 744 |
| 117 | 13 | 1152 | 194 | 5 | 1356 |
| 126 | 10 | 1336 | 168 | 7 | 1262 |
- (a) State the multiple regression equation.
- (b) Interpret the meaning of the slopes \(b_1\) and \(b_2\) in the model.
- (c) Interpret the meaning of the regression coefficient \(b_0\).
- (d) Test \(H_0: \beta_2 = 0\) against \(H_1: \beta_2 > 0\). Interpret your finding.
- (e) Use a 95% confidence interval to estimate \(\beta_2\). Interpret the p-value corresponding to the estimate of \(\beta_2\). Does the confidence interval support your interpretation in (d)?
- (f) Determine the coefficient of multiple determination \(r^2_{Y.12}\) and interpret its meaning.
- (g) Perform a residual analysis on your results and determine the adequacy of the fit of the model.
- (h) Plot the residuals against the prices. Is there evidence of a pattern in the residuals? Explain.
- (i) At \(\alpha = 0.05\), is there evidence of positive autocorrelation in the residuals?
- (j) Suppose the collector, having observed many auctions, believes that the rate of increase of the auction price with age will be driven upward by a large number of bidders. In other words, the collector believes that the age of the clock and the number of bidders should interact. Is there evidence to support his claim that the rate of change in the mean price of the clocks with age increases as the number of bidders increases? Should the interaction term (\(x_1 x_2\)) be included in the model? If so, what is the multiple regression equation? You need to read Section 14.6 for this.
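To make the additive and interaction models concrete, here is a minimal Python sketch that fits both to the clock data by solving the normal equations. The helper names (`solve`, `ols`, `r_squared`) are hypothetical, the data are transcribed from the table above reading the left block first and then the right block (an assumed ordering), and the approach is plain least squares rather than any particular package's routine. The sketch only compares coefficients and fit; judging whether the interaction term belongs in the model still requires a t test on its coefficient, which is not shown.

```python
# (age x1, bidders x2, price y): left block of the table, then right block.
clocks = [
    (127, 13, 1235), (115, 12, 1080), (127, 7, 845), (150, 9, 1522),
    (156, 6, 1047), (182, 11, 1979), (156, 12, 1822), (132, 10, 1253),
    (137, 9, 1297), (113, 9, 946), (137, 15, 1713), (117, 11, 1024),
    (137, 8, 1147), (153, 6, 1092), (117, 13, 1152), (126, 10, 1336),
    (170, 14, 2131), (182, 8, 1550), (162, 11, 1884), (184, 10, 2041),
    (143, 6, 854), (159, 9, 1483), (108, 14, 1055), (175, 8, 1545),
    (108, 6, 729), (179, 9, 1792), (111, 15, 1175), (187, 8, 1593),
    (111, 7, 785), (115, 7, 744), (194, 5, 1356), (168, 7, 1262),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(rows, y, coef):
    """Coefficient of multiple determination, 1 - SSE / SST."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, yi in zip(rows, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


price = [float(p) for _, _, p in clocks]

# Additive model: columns are intercept, x1, x2.
x_add = [[1.0, float(a), float(b)] for a, b, _ in clocks]
# Interaction model: same columns plus the product x1 * x2.
x_int = [[1.0, float(a), float(b), float(a * b)] for a, b, _ in clocks]

coef_add = ols(x_add, price)
coef_int = ols(x_int, price)
r2_add = r_squared(x_add, price, coef_add)
r2_int = r_squared(x_int, price, coef_int)

print("additive    b0, b1, b2     :", [round(c, 3) for c in coef_add])
print("interaction b0, b1, b2, b3 :", [round(c, 3) for c in coef_int])
print("r^2 additive =", round(r2_add, 4), " r^2 interaction =", round(r2_int, 4))
```

Because the interaction model contains the additive model as a special case, its r-squared can never be lower; the gain in fit alone does not show that the interaction is statistically significant.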
Deliverable: Word Document
