1. A study of the manufacturing value added and GDP for 28 countries has been conducted. The data are on the class web page:
prob5data.xls
- Estimate the following model and explain the output (check the residuals and the residual plots).
\(MANUVA_{i}=\beta_{0}+\beta_{1} GDP_{i}+\varepsilon_{i}\)
- Find any violations of the classical assumptions from the residual plots and state the problems.
- Perform the Park and White tests, and compare the test results.
- Assuming the population of each country is a good proportionality factor for the heteroskedasticity, estimate the model by WLS (weighted least squares), discuss your findings, and compare the results with the OLS results.
- Consider using per capita variables instead of levels, so that the model becomes:
\(\frac{MANUVA_{i}}{POP_{i}}=\beta_{0}+\beta_{1} \frac{GDP_{i}}{POP_{i}}+\varepsilon_{i}\)
Estimate the above model and check the residual plots. Does this resolve the problem of heteroskedasticity?
2. Let's model Life Expectancy as a function of Income and Access to health care, using data collected on a sample of 80 countries. The data is available at index files/prob 5 data.xls.
- Estimate the following model, explain the economic meaning of the output, and check the residual plots. Explain any possible violations of the OLS assumptions.
\(\text{Life Expectancy}_{i}=\beta_{0}+\beta_{1} Income_{i}+\beta_{2}Access_{i}+\varepsilon_{i}\)
- Perform the Park and the White tests, and explain the test results.
- Explain the limitations of the Park and the White tests.
- Explain the consequences of heteroskedasticity.
Solution:
Problem 1:
(1) We use SPSS to run a regression analysis of the specified model. We get the following results:
The scatter plot above shows a positive linear trend, so we’ll perform a regression analysis.
As we see in the table above, the correlation is significantly positive with
\[R=0.943\]
The coefficient of determination is
\[{{R}^{2}}=0.890\]
which indicates that approximately 89% of the variance in MANUVA is explained by the regression on GDP.
The ANOVA table above shows that the model is significant overall (p < 0.001).
The table with the coefficients is:
The table shows that GDP is a significant predictor of MANUVA (p < 0.001). The fitted model is
\[MANUVA=603.875+0.194\text{ }GDP\]
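The intercept, slope, R and R² reported above all follow from the usual closed-form expressions for a one-regressor OLS fit. The following is a minimal Python sketch of those expressions, assuming the two variables are available as plain lists; the function name `simple_ols` is illustrative, and no values from prob5data.xls are reproduced here.

```python
import math

def simple_ols(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, r, r_squared)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # beta_1 hat
    intercept = ybar - slope * xbar        # beta_0 hat
    r = sxy / math.sqrt(sxx * syy)         # sample correlation between x and y
    return intercept, slope, r, r * r
```

With the course data, `x` would hold the GDP column and `y` the MANUVA column; in a one-regressor model the returned `r_squared` is the share of the variance in `y` explained by the fit.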
(2) OLS Assumptions:
Here we show some plots involving the residuals:
The histogram is reasonably bell-shaped, so we don’t have a strong indication of a lack of normality.
The plot above corresponds to the scatter plot of standardized predicted values versus the standardized residuals. There is a pattern in the plot, which suggests that we are in the presence of heteroskedasticity.
(3) Let’s perform the White test:
We need to run a regression for the model:
\[{{\hat{u}}^{2}}={{\delta }_{0}}+{{\delta }_{1}}GDP+{{\delta }_{2}}GD{{P}^{2}}+\varepsilon \] (*)
Using SPSS we get the following results:
The ANOVA table for the model (*) shows that the White test is not significant at the 0.05 level, since p = 0.051. Nevertheless, the p-value is only marginally above 0.05, so the result still points to possible heteroskedasticity in the data.
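The White test is also commonly computed in its LM form: n times the R² of the auxiliary regression of the squared residuals on the regressor and its square, compared with a chi-square distribution with 2 degrees of freedom. The sketch below illustrates that variant in plain Python. It is an assumption-laden illustration, not the SPSS procedure used in this solution (which reads the F test from the ANOVA table), and the helper names `solve_linear`, `ols` and `white_lm_one_regressor` are hypothetical.

```python
import math

def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def ols(X, y):
    """Least squares of y on the columns of X (X must include the constant).
    Returns (coefficients, residuals, R^2)."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve_linear(XtX, Xty)
    resid = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    ssr = sum(e * e for e in resid)
    return beta, resid, 1.0 - ssr / sst

def white_lm_one_regressor(x, y):
    """LM form of the White test for y = b0 + b1*x + e; returns (LM, p_value)."""
    _, u, _ = ols([[1.0, xi] for xi in x], y)
    u2 = [e * e for e in u]                                    # squared OLS residuals
    _, _, r2_aux = ols([[1.0, xi, xi * xi] for xi in x], u2)   # auxiliary regression
    lm = len(x) * r2_aux
    p_value = math.exp(-lm / 2.0)   # chi-square upper tail, exact only for 2 df
    return lm, p_value
```

Because the sketch solves the normal equations directly, regressors on the scale of GDP should be rescaled (for example divided by their mean) before use to keep the computation well conditioned. The closed-form p-value is valid only for the 2 degrees of freedom of this one-regressor case.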
Let’s now perform the Park Test.
We need to run a regression for the model:
\[\log \left( {{{\hat{u}}}^{2}} \right)={{\delta }_{0}}+{{\delta }_{1}}\log \left( GDP \right)+\varepsilon \] (**)
Using SPSS we get the following results:
The table shows that the slope coefficient \({{\hat{\delta }}_{1}}\) is significantly different from zero ( p = 0.001), which indicates that we have heteroskedasticity.
For this example, the White test is the more conservative of the two: it does not reject homoskedasticity at the 5% level (p = 0.051), whereas the Park test does (p = 0.001).
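The Park regression is a one-regressor OLS of the log squared residuals on the log of the proportionality factor, followed by a t test on the slope. A minimal Python sketch under those assumptions is given here; the names `simple_ols_with_se` and `park_test` are illustrative, and the residuals are assumed to come from the original OLS fit.

```python
import math

def simple_ols_with_se(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, slope_se)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    slope_se = math.sqrt(ssr / (n - 2) / sxx)   # conventional standard error
    return intercept, slope, slope_se

def park_test(residuals, z):
    """Regress ln(u^2) on ln(z); returns (slope, slope_se, t_stat, df).

    residuals: OLS residuals u from the original model (none may be exactly 0).
    z: candidate proportionality factor, e.g. GDP (all values must be > 0).
    """
    log_u2 = [math.log(u * u) for u in residuals]
    log_z = [math.log(zi) for zi in z]
    _, slope, se = simple_ols_with_se(log_z, log_u2)
    return slope, se, slope / se, len(z) - 2
```

The returned t statistic is compared with a t distribution on n − 2 degrees of freedom; a slope that differs significantly from zero is read as evidence of heteroskedasticity related to `z`.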
(4) The WLS regression is performed with SPSS, and we get the following results:
The WLS model is
\[MANUVA=3657.769+0.189\text{ }GDP\]
The slope is reasonably similar to the OLS estimate (0.189 versus 0.194), although the intercept differs (3657.769 versus 603.875).
The plot above of standardized residuals against predicted values still shows signs of heteroskedasticity, which suggests that Population may not be a good proportionality factor.
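For a single regressor, WLS has a closed form that mirrors OLS with weighted means. The sketch below assumes the error variance is proportional to population, Var(ε_i) = σ²·POP_i, so that each observation gets weight 1/POP_i. That variance form is an assumption made for illustration (the SPSS weighting used in this solution is not shown in the text), and `wls_simple` is a hypothetical helper name.

```python
def wls_simple(x, y, w):
    """Weighted least squares of y on a constant and x.

    w: positive weights, ideally proportional to 1 / Var(error_i).
    Returns (intercept, slope).
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope
```

With lists `gdp`, `manuva` and `pop`, a call would look like `wls_simple(gdp, manuva, [1.0 / p for p in pop])`. If the variance were instead assumed proportional to POP², the weights would be `1.0 / p ** 2`. Equal weights reproduce the OLS estimates.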
(5) We get the following results for the model
\[\frac{MANUV{{A}_{i}}}{PO{{P}_{i}}}={{\beta }_{0}}+{{\beta }_{1}}\frac{GD{{P}_{i}}}{PO{{P}_{i}}}+{{\varepsilon }_{i}}\]
The variable \(\frac{GDP}{POP}\) is a significant predictor (p < 0.001). The fitted regression model is
\[\frac{MANUV{{A}_{i}}}{PO{{P}_{i}}}=203.404+0.177\frac{GD{{P}_{i}}}{PO{{P}_{i}}}\]
The scatter plot of the adjusted predicted values against the residuals shows no noticeable trend. This suggests that heteroskedasticity is no longer present.
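One way to see why dividing by population can help is the following sketch, which assumes (for illustration only) that the error variance in the level model is proportional to the square of population, \(Var({{\varepsilon }_{i}})={{\sigma }^{2}}POP_{i}^{2}\). Dividing the level model through by \(PO{{P}_{i}}\) gives
\[\frac{MANUV{{A}_{i}}}{PO{{P}_{i}}}={{\beta }_{0}}\frac{1}{PO{{P}_{i}}}+{{\beta }_{1}}\frac{GD{{P}_{i}}}{PO{{P}_{i}}}+\frac{{{\varepsilon }_{i}}}{PO{{P}_{i}}},\qquad Var\left( \frac{{{\varepsilon }_{i}}}{PO{{P}_{i}}} \right)=\frac{{{\sigma }^{2}}POP_{i}^{2}}{POP_{i}^{2}}={{\sigma }^{2}}\]
so the transformed error has constant variance. The per capita model estimated here uses a constant intercept in place of the \({{\beta }_{0}}/PO{{P}_{i}}\) term, so it is a closely related but not identical specification.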
Problem 2:
1) The regression analysis is shown below:
As seen in the table above, the correlation is significantly positive with
\[R=0.881\]
The coefficient of determination is
\[{{R}^{2}}=0.775\]
which indicates that approximately 77.5% of the variance in Life Expectancy is explained by the regression on the predictors Income and Access. The Durbin-Watson statistic is close to 2, which suggests a lack of serial correlation.
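The Durbin-Watson statistic is the sum of squared differences of successive residuals divided by the sum of squared residuals, and it is approximately 2(1 − ρ̂), where ρ̂ is the first-order autocorrelation of the residuals. A minimal Python sketch follows, with `durbin_watson` as an illustrative name; note that the value depends on the order in which the observations are listed.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum (e_t - e_{t-1})^2 / sum e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den
```

Values near 2 indicate little first-order serial correlation, values near 0 indicate positive correlation, and values near 4 indicate negative correlation.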
The table above shows that both predictors are significant (p < 0.001 in both cases). The fitted model is
\[\text{Life Expectancy}=39.168+0.001\,Income+0.284\,Access\]
ASSUMPTIONS:
The histogram of residuals shows a clear bell-shaped tendency, which indicates that the normality assumption is most likely satisfied.
The plot of standardized residuals shows a slight trend which is a mild sign of heteroskedasticity.
(2) White Test:
We need to estimate the model:
\[{{\hat{u}}^{2}}={{\beta }_{0}}+{{\beta }_{1}}Income+{{\beta }_{2}}Access+{{\beta }_{3}}Incom{{e}^{2}}+{{\beta }_{4}}Acces{{s}^{2}}+{{\beta }_{5}}Access*Income+\varepsilon \]
The results are shown below:
The White test is significant (p = 0.012), which indicates heteroskedasticity.
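As a cross-check, the White test can also be stated in its LM form, which is an alternative to the F test read from the SPSS output and is given here only as a sketch. With five auxiliary regressors and n = 80 observations,
\[LM=n\,R_{aux}^{2}\sim \chi _{5}^{2}\text{ under homoskedasticity},\qquad \text{reject at the 5\% level if }80\,R_{aux}^{2}>11.07,\text{ i.e. }R_{aux}^{2}>0.138\]
where \(R_{aux}^{2}\) denotes the coefficient of determination of the auxiliary regression of \({{\hat{u}}^{2}}\) on Income, Access, their squares and their product.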
Park Test:
We need to find a proportionality factor. Based on the following scatter plots:
we conclude that both variables affect the variance, so we arbitrarily choose Income as the proportionality factor. We need to estimate the model:
\[\ln \left( {{{\hat{u}}}^{2}} \right)={{\beta }_{0}}+{{\beta }_{1}}\ln \left( Income \right)+\varepsilon \]
The results are shown below:
Since the slope coefficient \({{\hat{\beta }}_{1}}\) is significantly different from zero (p < 0.001), we conclude that there is heteroskedasticity.
(3) The limitations of the tests are specified below:
- Park Test: It needs a proportionality factor, which is not necessarily easy to find.
- White Test: It can be very computationally intensive, because it includes all the quadratic and interaction terms.
(4) Consequences of Heteroskedasticity:
- The OLS estimators remain unbiased, but they are no longer BLUE, because they no longer have the minimum variance among linear unbiased estimators.
- The OLS estimators therefore remain consistent but are not efficient.
- The usual OLS formulas misestimate the true variance of the coefficients (they often underestimate it and so overstate the t-statistics), which makes the p-values unreliable.
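These consequences can be illustrated with a small Monte Carlo experiment. The sketch below simulates a one-regressor model whose error standard deviation grows with the square of x, then compares the average OLS slope with the true slope, and the average conventional standard error with the actual spread of the slope estimates. All settings (sample size, coefficients, the variance function, the seed) are illustrative assumptions and are unrelated to the course data.

```python
import math
import random

def ols_slope_and_se(x, y):
    """OLS slope and its conventional (homoskedasticity-based) standard error."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, math.sqrt(ssr / (n - 2) / sxx)

def simulate(reps=2000, n=50, beta0=1.0, beta1=2.0, seed=0):
    """Returns (mean slope, empirical sd of slopes, mean conventional SE)."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, n + 1)]
    slopes, ses = [], []
    for _ in range(reps):
        # Heteroskedastic errors: standard deviation 0.02 * x^2 (assumed form).
        y = [beta0 + beta1 * xi + rng.gauss(0.0, 0.02 * xi * xi) for xi in x]
        b, se = ols_slope_and_se(x, y)
        slopes.append(b)
        ses.append(se)
    mean_slope = sum(slopes) / reps
    sd_slope = math.sqrt(sum((b - mean_slope) ** 2 for b in slopes) / (reps - 1))
    return mean_slope, sd_slope, sum(ses) / reps
```

In this setup the mean slope stays close to the true value of 2, while the average conventional standard error falls short of the empirical standard deviation of the slope estimates, which is the pattern that makes the usual t-statistics too large.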
