Question 1 [20 marks]
The file GRP.TXT contains the data on the following variables for a particular region in New Zealand:
| Name | Description |
| NoCons | Number of building consents issued for new dwellings |
| ValCons | Value of building consents issued for new dwellings (million $) |
| Unemp | Number of registered unemployed |
| House | Number of dwellings sold |
| Car | Number of New Car Registrations (contains missing data) |
| Exp | Value of exports (million $) |
| Imp | Value of imports (million $) |
| GRP | National Bank Index of Gross Regional Product (GRP) |
(a) Obtain the correlation matrix for the variables and the matrix plot of the variables. Discuss the uses of these two outputs in the context. [4 marks]
(b) Regress GRP on all the explanatory variables and obtain the full regression output. Discuss the statistical significance of the regression coefficients considering the P values. What conclusion would you draw from the Analysis of Variance part of the regression output? Explain your answers in the context. Perform suitable residual diagnostics and discuss the implications. [6 marks]
(c) Carry out a complete forward stepwise regression of GRP on the explanatory variables. Also perform a complete backward predictor elimination procedure. Compare the outputs. Which step (model) would you recommend for predicting the GRP? Explain your answer. [5 marks]
(d) Explore the appropriateness of polynomial regression in the context. [5 marks]
(b) The following regression results are obtained:
The multiple regression model is:
GRP = 101 - 0.117 NoCons + 1.37 ValCons - 0.000144 Unemp + 0.0319 House
+ 0.0048 Car - 0.0118 Exp + 0.0150 Imp
This model is significant overall, F(7, 15) = 11.17, p < 0.001. The model explains approximately 76.4% of the variation in GRP, which indicates a reasonably good fit. Only NoCons (p = 0.012), ValCons (p = 0.010) and House (p = 0.020) are individually significant at the 5% level; none of the other predictors is individually significant.
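As a consistency check on the figures quoted above, the overall F statistic and its degrees of freedom determine R² and adjusted R² through standard identities. The sketch below assumes the quoted F(7, 15) = 11.17 is exact to the digits shown; the function names are illustrative and not part of the original output. Under that assumption, the quoted 76.4% appears to match the adjusted R² rather than the raw R², which comes out higher.

```python
def r_squared_from_f(f_stat, df_model, df_error):
    """Recover R^2 from the overall F statistic.

    Uses the identity F = (R^2 / df_model) / ((1 - R^2) / df_error).
    """
    ratio = f_stat * df_model / df_error  # equals R^2 / (1 - R^2)
    return ratio / (1.0 + ratio)


def adjusted_r_squared(r2, df_model, df_error):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / df_error, where n - 1 = df_model + df_error."""
    df_total = df_model + df_error
    return 1.0 - (1.0 - r2) * df_total / df_error


# Values quoted in the regression summary (assumed exact to the digits shown).
f_stat, df_model, df_error = 11.17, 7, 15

r2 = r_squared_from_f(f_stat, df_model, df_error)
adj_r2 = adjusted_r_squared(r2, df_model, df_error)

print(f"observations used: {df_model + df_error + 1}")
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```

The degrees of freedom also imply that 23 observations entered the fit, presumably because rows with missing Car values were dropped.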
The normal probability plot and the histogram of the residuals show no clear departure from normality. Likewise, the plot of residuals versus predicted values shows no pattern that would indicate a serious heteroskedasticity problem.
(c) Forward stepwise regression
Using forward stepwise selection, we find that the best model only includes House as a predictor and the model is
GRP = 92.02 + 0.0553*House
This model explains 70.81% of the variation in GRP.
Backward stepwise regression
Using backward elimination, we find that the best model includes NoCons, ValCons and House as predictors, and the model is
GRP = 96.56 - 0.102*NoCons + 1.37*ValCons + 0.0358*House
This model explains 77.85% of the variation in GRP.
Based on the standard error and the amount of explained variation, the "best" model is the three-predictor model:
GRP = 96.56 - 0.102*NoCons + 1.37*ValCons + 0.0358*House
(d) Based on the matrix plot, a polynomial regression approach would not be justified, because none of the predictors shows a clear non-linear (quadratic, cubic, etc.) pattern when plotted against GRP.
Question 2 [30 marks]
An experiment was conducted to relate Yield in a chemical plant to temperature and pressure.
The following table gives the experimental data, which was originally published in the text "Introduction to Linear Models and the Design and Analysis of Experiments" by Mendenhall, W., Duxbury Press.
You need to enter the data manually to perform the analysis.
| Pressure | Temperature | Yield |
| 50 | 100 | 21 |
| 50 | 200 | 23 |
| 50 | 300 | 26 |
| 80 | 100 | 22 |
| 80 | 200 | 23 |
| 80 | 300 | 28 |
| 50 | 100 | 22 |
| 50 | 200 | 23 |
| 50 | 300 | 27 |
| 80 | 100 | 21 |
| 80 | 200 | 23 |
| 80 | 300 | 27 |
(a) Discuss the basic principles of experimentation in the context. You need to discuss how the experimenter would have applied the basic principles, for example, how the principle of randomisation must have been applied. [6 marks]
(b) Perform one-way ANOVA tests to see whether there is any temperature or pressure effect.
Discuss your answer stating the limitations of this test. [4 marks]
(c) Perform two-way ANOVA tests (with and without interactions) to see whether there is any temperature and/or pressure effects. Explore the residuals of the fitted models and suggest whether or not you obtain any clues for improving the model. [8 marks]
(d) Perform a multiple linear regression of Yield on temperature and pressure. Interpret the t and F-tests done in the context. [4 marks]
(e) Compare the regression and ANOVA analyses and comment. Which approach is reliable, and why? Explain your answer in the context. [4 marks]
(f) Build a more appropriate model relating Yield with temperature and pressure. Note that this question is open ended, and you need to provide necessary justifications in your answer. [4 marks]
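For parts (b) and (c) of Question 2, the F ratios can be obtained directly from sums of squares, because the design is balanced (two runs in each pressure-temperature cell). The following is a minimal sketch in plain Python, not the output of any particular statistics package; the helper names `between_ss` and `f_ratio` are illustrative. P-values are omitted because they require the F distribution, which a statistics package would normally supply.

```python
# Data from the table in Question 2: (pressure, temperature, yield).
DATA = [
    (50, 100, 21), (50, 200, 23), (50, 300, 26),
    (80, 100, 22), (80, 200, 23), (80, 300, 28),
    (50, 100, 22), (50, 200, 23), (50, 300, 27),
    (80, 100, 21), (80, 200, 23), (80, 300, 27),
]


def mean(values):
    return sum(values) / len(values)


def between_ss(keys, ys):
    """Return (between-group sum of squares, number of groups) for groups defined by keys."""
    grand = mean(ys)
    groups = {}
    for key, y in zip(keys, ys):
        groups.setdefault(key, []).append(y)
    ss = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    return ss, len(groups)


def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F = mean square of the effect divided by mean square error."""
    return (ss_effect / df_effect) / (ss_error / df_error)


ys = [y for _, _, y in DATA]
n = len(ys)
ss_total = sum((y - mean(ys)) ** 2 for y in ys)

ss_temp, k_temp = between_ss([t for _, t, _ in DATA], ys)
ss_pres, k_pres = between_ss([p for p, _, _ in DATA], ys)
ss_cell, k_cell = between_ss([(p, t) for p, t, _ in DATA], ys)

# One-way ANOVA: everything not explained by the single factor counts as error.
f_temp_oneway = f_ratio(ss_temp, k_temp - 1, ss_total - ss_temp, n - k_temp)
f_pres_oneway = f_ratio(ss_pres, k_pres - 1, ss_total - ss_pres, n - k_pres)

# Two-way ANOVA with interaction: error is the variation within cells.
ss_inter = ss_cell - ss_temp - ss_pres  # valid because the design is balanced
df_inter = (k_temp - 1) * (k_pres - 1)
ss_error, df_error = ss_total - ss_cell, n - k_cell
f_temp_twoway = f_ratio(ss_temp, k_temp - 1, ss_error, df_error)
f_pres_twoway = f_ratio(ss_pres, k_pres - 1, ss_error, df_error)
f_inter = f_ratio(ss_inter, df_inter, ss_error, df_error)

# Two-way ANOVA without interaction: the interaction SS is pooled into error.
ss_err_add, df_err_add = ss_error + ss_inter, df_error + df_inter
f_temp_additive = f_ratio(ss_temp, k_temp - 1, ss_err_add, df_err_add)
f_pres_additive = f_ratio(ss_pres, k_pres - 1, ss_err_add, df_err_add)

print(f"one-way:  F(temp) = {f_temp_oneway:.2f}, F(pressure) = {f_pres_oneway:.4f}")
print(f"two-way:  F(temp) = {f_temp_twoway:.2f}, F(pressure) = {f_pres_twoway:.2f}, "
      f"F(interaction) = {f_inter:.2f}")
print(f"additive: F(temp) = {f_temp_additive:.2f}, F(pressure) = {f_pres_additive:.2f}")
```

The one-way test for pressure leaves the large temperature effect in its error term, which illustrates the limitation that part (b) asks about: a one-way analysis ignores the other factor and inflates the error variance.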
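For parts (d) to (f) of Question 2, the sketch below fits the first-order regression of Yield on temperature and pressure, uses the replicated runs to form a lack-of-fit F ratio, and then tries one possible refinement: a quadratic term in temperature. It is a plain-Python illustration that solves the normal equations; the helpers `solve` and `fit_ols` are hypothetical names, the coding of temperature as (T - 200)/100 is a convenience chosen here, and the quadratic model is only one candidate answer to the open-ended part (f).

```python
# Data from the table in Question 2: (pressure, temperature, yield).
DATA = [
    (50, 100, 21), (50, 200, 23), (50, 300, 26),
    (80, 100, 22), (80, 200, 23), (80, 300, 28),
    (50, 100, 22), (50, 200, 23), (50, 300, 27),
    (80, 100, 21), (80, 200, 23), (80, 300, 27),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * size
    for i in reversed(range(size)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, size))
        x[i] = (m[i][size] - tail) / m[i][i]
    return x


def fit_ols(rows, ys):
    """Return (coefficients, residual sum of squares) from the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    beta = solve(xtx, xty)
    sse = sum((y - sum(b * x for b, x in zip(beta, r))) ** 2 for r, y in zip(rows, ys))
    return beta, sse


ys = [y for _, _, y in DATA]
n = len(ys)
ss_total = sum((y - sum(ys) / n) ** 2 for y in ys)

# First-order model: Yield = b0 + b1 * Temperature + b2 * Pressure.
beta_lin, sse_lin = fit_ols([[1.0, t, p] for p, t, _ in DATA], ys)
df_err_lin = n - 3
f_overall = ((ss_total - sse_lin) / 2) / (sse_lin / df_err_lin)

# Pure error: variation among replicate runs within each (pressure, temperature) cell.
cells = {}
for p, t, y in DATA:
    cells.setdefault((p, t), []).append(y)
ss_pure = sum(sum((y - sum(c) / len(c)) ** 2 for y in c) for c in cells.values())
df_pure = n - len(cells)

# Lack-of-fit F ratio for the first-order model.
f_lack = ((sse_lin - ss_pure) / (df_err_lin - df_pure)) / (ss_pure / df_pure)

# One possible refinement: add a quadratic term in coded temperature (T - 200) / 100.
quad_rows = [[1.0, (t - 200) / 100, ((t - 200) / 100) ** 2, p] for p, t, _ in DATA]
beta_quad, sse_quad = fit_ols(quad_rows, ys)

print("first-order coefficients (intercept, temp, pressure):", beta_lin)
print(f"first-order SSE = {sse_lin:.4f}, overall F = {f_overall:.2f}")
print(f"lack-of-fit F = {f_lack:.3f} on ({df_err_lin - df_pure}, {df_pure}) df")
print(f"quadratic-in-temperature SSE = {sse_quad:.4f}, "
      f"R^2 = {1 - sse_quad / ss_total:.4f}")
```

Comparing the residual sum of squares of the two fits, together with the lack-of-fit ratio, gives one way to judge whether the straight-line temperature term is adequate. Any conclusion should still be checked against residual plots and the appropriate F critical values.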