Question 1 (40 points)
You have been asked to develop a hedonic model to explain (or predict) differences in the prices of homes on the basis of their characteristics. Download the file with data on "Homes". The data that you have selected for this project include the following variables:
$PRICE_i$: The average price of the i-th home sold last year, expressed in dollars;
$SQF_i$: The area of the i-th home in square feet;
$BED_i$: The number of bedrooms of the i-th home;
$BAT_i$: The number of bathrooms of the i-th home;
$GAR_i$: The number of garages of the i-th home.
You think that the following theoretical model would be appropriate for the prediction of the prices of homes:
Model A: $PRICE_i = \beta_{A0} + \beta_{A1} BED_i + \beta_{A2} BAT_i + \beta_{A3} GAR_i + \varepsilon_{Ai}$
- Give a brief interpretation of the expected signs in this model. (3 points)
- Transform the dependent variable into thousands of dollars, estimate the model indicated above, and present your estimates in full reporting mode (including standard errors, t-ratios, adjusted R², F-ratio, number of observations, and sum of squared residuals). Do not cut and paste. (4 points)
- Without actually running significance tests, comment on the strength of the estimated coefficients and the validity of the model. (5 points)
- You want to make sure that the model is properly specified; in particular, you want to ensure that no relevant variable is missing. Run the Ramsey RESET equation and present your estimates in full reporting mode. (5 points)
- Run an F-test to check whether there are missing variables in Model A and state your conclusions. (4 points)
- Assuming that your model is missing relevant variables, you decide to add the area of the home in square feet to the explanatory variables. What sign would you expect for this variable? (2 points)
- Estimate a Model B that includes the variables of Model A as well as the square footage, and present your findings in full reporting mode. (4 points)
- Run a regression of the variable that has been excluded from Model A (square footage) on the variables that are included in Model A and report your results. Using the results of this regression, perform an expected bias analysis on the slope coefficients of Model A. (5 points)
- Using the four specification criteria, evaluate the inclusion of the fourth variable (square footage) in Model B. (4 points)
- Without running any additional regressions, could you say if you have solved the problem of misspecification (the problem of missing important variables)? You will have to fully justify your answer. (4 points)
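For the estimation steps in Question 1 (and the analogous regressions in Questions 2 and 3), the "full reporting mode" statistics can be computed directly. The sketch below is a minimal OLS routine under stated assumptions: the helper name `ols_report` is hypothetical, the data are assumed to be already loaded into NumPy arrays, and an intercept is always added. It does not reproduce the output layout of any particular statistics package.

```python
import numpy as np

# Hypothetical helper: plain OLS with an intercept, returning the statistics
# requested in "full reporting mode". Assumes y and X are numeric arrays.
def ols_report(y, X, names):
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    n, k = X.shape                      # k includes the intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ssr = float(resid @ resid)          # sum of squared residuals
    tss = float(((y - y.mean()) ** 2).sum())
    s2 = ssr / (n - k)                  # estimated error variance
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    r2 = 1.0 - ssr / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    f_stat = (r2 / (k - 1)) / ((1.0 - r2) / (n - k))   # H0: all slopes are zero
    return {
        "names": ["const"] + list(names),
        "coef": beta,
        "se": se,
        "t": beta / se,
        "r2": r2,
        "adj_r2": adj_r2,
        "F": f_stat,
        "n": n,
        "ssr": ssr,
        "sum_resid": float(resid.sum()),  # about 0 whenever an intercept is included
    }
```

Assuming arrays named `price`, `bed`, `bat`, and `gar` (hypothetical names) hold the Homes data, Model A with the price in thousands of dollars would be `ols_report(price / 1000, np.column_stack([bed, bat, gar]), ["BED", "BAT", "GAR"])`. Note that the plain sum of residuals is essentially zero in any OLS regression with a constant, which is why the sum of squared residuals is the figure used in later F-tests.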
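The RESET step in Question 1 augments Model A with powers of its own fitted values. A minimal sketch follows, assuming the textbook variant that adds the squared, cubed, and fourth-power fitted values; some texts use only the square and cube, so the `max_power` default is an assumption to match to your course. The helper name `reset_regressors` is hypothetical.

```python
import numpy as np

# Hypothetical helper: build the regressor matrix for a RESET equation.
# Assumes X holds the original regressors (no constant) and y the dependent variable.
def reset_regressors(X, y, max_power=4):
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    # Fit the original model (with a constant) to obtain the fitted values.
    Xc = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xc, np.asarray(y, dtype=float), rcond=None)
    yhat = Xc @ beta
    # Append yhat^2, ..., yhat^max_power to the original regressors.
    powers = [yhat ** p for p in range(2, max_power + 1)]
    return np.column_stack([X] + powers)
```

With a constant added, the returned matrix is the regressor set of the RESET equation; its sum of squared residuals is what the F-test compares with that of Model A.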
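The F-test in Question 1 compares the restricted model (Model A) with the unrestricted RESET equation using the standard formula F = [(SSR_R - SSR_U) / q] / [SSR_U / (n - k - 1)], where q is the number of added terms and k is the number of slope coefficients in the unrestricted equation. The function name `f_test` is hypothetical, and the numbers in the usage line are illustrative only, not results from the Homes data.

```python
# Hypothetical helper: F-statistic for q joint restrictions.
def f_test(ssr_restricted, ssr_unrestricted, q, n, k_unrestricted):
    """ssr_restricted: SSR of Model A; ssr_unrestricted: SSR of the RESET equation;
    q: number of added terms; n: number of observations;
    k_unrestricted: slope coefficients in the unrestricted equation (constant excluded)."""
    df_denominator = n - k_unrestricted - 1
    return ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df_denominator)


# Illustrative numbers only: 3 added terms, 50 observations, 3 + 3 = 6 slopes.
print(round(f_test(120.0, 100.0, q=3, n=50, k_unrestricted=6), 4))  # 2.8667
```

Reject the null hypothesis of no omitted variables when F exceeds the critical value with (q, n - k - 1) degrees of freedom; if SciPy is available, `scipy.stats.f.ppf(0.95, q, n - k - 1)` returns the 5% critical value.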
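For the expected bias analysis in Question 1, the usual rule is that omitting a variable biases each included slope by the omitted variable's coefficient times that regressor's slope in the auxiliary regression, so the sign of the bias is the product of the two signs. The sketch below applies that rule; the function name `expected_bias` and the sample inputs are hypothetical.

```python
# Hypothetical helper: expected bias on each included slope from omitting one variable.
def expected_bias(beta_omitted, alphas):
    """beta_omitted: coefficient of the omitted variable (e.g., SQF in Model B),
    or +1/-1 if only its expected sign is known.
    alphas: {regressor name: slope from regressing the omitted variable
             on Model A's regressors}."""
    result = {}
    for name, alpha in alphas.items():
        bias = beta_omitted * alpha
        if bias > 0:
            direction = "positive (slope tends to be overestimated)"
        elif bias < 0:
            direction = "negative (slope tends to be underestimated)"
        else:
            direction = "none"
        result[name] = (bias, direction)
    return result
```

OLS satisfies this decomposition exactly in-sample (when both regressions use the same observations and include a constant): each Model A slope equals the corresponding Model B slope plus the SQF coefficient times alpha, which gives a useful check on your numbers.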
Question 2 (30 points)
You have been asked to study pay equity in the marketplace, in particular, whether there is any evidence of discrimination in the salaries paid to persons of similar qualifications in the United States. Download the file with data on "Salaries". The file contains a sample of the data that have been collected by the Bureau of Labor Statistics for a specific year. The following variables are included:
$WAGE_i$: Average hourly earnings of the i-th person (in dollars);
$EDUC_i$: Level of education of the i-th person, expressed in years of schooling;
$EXPR_i$: Level of experience of the i-th person, expressed in years;
$FEML_i$: A dummy variable taking the value 1 if the i-th person is a woman and 0 otherwise.
You think that the following theoretical model would be appropriate for your study on discrimination:
$WAGE_i = \beta_0 + \beta_1 EDUC_i + \beta_2 EXPR_i + \beta_3 EXPR_i^2 + \beta_4 FEML_i + \beta_5 (FEML_i \times EXPR_i) + \varepsilon_i$
- Justify the choice of the variables used here (give a brief interpretation of what we are trying to test with these variables) and state your expectations about the signs of the coefficients. (4 points)
- Create the necessary variables and estimate the model specified above and report your results including all important statistics. (4 points)
- Without actually running significance tests, comment on the actual signs and the strength of the estimated coefficients and the validity of the model. (4 points)
- Using the model you estimated above, write down separately the estimated salary functions for men and women with 12 years of schooling. (4 points)
- Use the two salary equations for 12 years of schooling to calculate the maximum average hourly earnings (WAGE) for men and women at that education level, and the years of experience each group needs to reach its maximum WAGE. (5 points)
- Make a rough graph of the two salary equations for 12 years of schooling to show the relationship between WAGE and EXPR for men and women. Mark on the graph the years of experience it takes each group to reach its salary plateau (the turning point of its WAGE curve). (4 points)
- On the basis of your findings, briefly state your conclusions about the existence of gender discrimination in the marketplace. (5 points)
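Creating the variables for Question 2 amounts to adding two columns, the square of experience and the female-experience interaction, before running OLS. A minimal sketch with NumPy follows, assuming the Salaries data are already loaded into numeric arrays; the helper names `wage_design` and `fit_wage_model` are hypothetical.

```python
import numpy as np

# Hypothetical helpers for the wage model in Question 2.
def wage_design(educ, expr, feml):
    """Regressor columns in model order: EDUC, EXPR, EXPR^2, FEML, FEML*EXPR."""
    educ = np.asarray(educ, dtype=float)
    expr = np.asarray(expr, dtype=float)
    feml = np.asarray(feml, dtype=float)        # 1 = woman, 0 = man
    return np.column_stack([educ, expr, expr ** 2, feml, feml * expr])


def fit_wage_model(wage, educ, expr, feml):
    """OLS estimates b0..b5 (intercept first), in the same order as the model."""
    X = np.column_stack([np.ones(len(educ)), wage_design(educ, expr, feml)])
    beta, *_ = np.linalg.lstsq(X, np.asarray(wage, dtype=float), rcond=None)
    return beta
```

Standard errors, t-ratios, and the other reporting statistics still need to be computed from the same design matrix; this sketch only returns the coefficient estimates.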
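Setting EDUC = 12 in the estimated model and switching FEML between 0 and 1 gives the two salary functions asked for in Question 2. The derivation below is a sketch in terms of the estimated coefficients; substitute your own estimates.

```latex
\begin{aligned}
\text{Men } (FEML = 0):\quad \widehat{WAGE} &= (\hat\beta_0 + 12\hat\beta_1) + \hat\beta_2\,EXPR + \hat\beta_3\,EXPR^2 \\
\text{Women } (FEML = 1):\quad \widehat{WAGE} &= (\hat\beta_0 + 12\hat\beta_1 + \hat\beta_4) + (\hat\beta_2 + \hat\beta_5)\,EXPR + \hat\beta_3\,EXPR^2
\end{aligned}
```

The intercept shift for women is the estimate of beta 4 and the slope shift is the estimate of beta 5; the curvature term (beta 3) is common to both groups.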
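For the maximum-wage calculation in Question 2, a quadratic a + b·EXPR + c·EXPR² with c < 0 peaks at EXPR* = -b / (2c). The sketch below applies this to either group at a given education level; the function name `wage_peak` and the coefficient tuple in the usage line are hypothetical placeholders, not estimates from the Salaries data.

```python
# Hypothetical helper: turning point of the estimated WAGE-EXPR curve.
def wage_peak(b, female, educ=12):
    """b: estimated coefficients (b0, ..., b5) in the model's order.
    Returns (years of experience at the peak, maximum WAGE)."""
    b0, b1, b2, b3, b4, b5 = b
    if b3 >= 0:
        raise ValueError("b3 must be negative for the curve to have a maximum")
    intercept = b0 + b1 * educ + (b4 if female else 0.0)
    slope = b2 + (b5 if female else 0.0)
    expr_star = -slope / (2 * b3)            # vertex of the parabola
    max_wage = intercept + slope * expr_star + b3 * expr_star ** 2
    return expr_star, max_wage


# Placeholder coefficients (b0..b5), not real estimates.
b = (1.0, 0.5, 0.6, -0.01, -1.0, -0.2)
print(tuple(round(v, 2) for v in wage_peak(b, female=False)))  # (30.0, 16.0)
```

Evaluating the same function with `female=True` gives the women's turning point, and the two points can then be marked on the rough graph.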
Question 3 (30 points)
You want to estimate the relationship between body fat and various physical characteristics of men. Download the file with data on "Body fat". The file contains a sample of the data that have been collected by a major American hospital for a specific year. The following variables are included:
$PCBF_i$: The percentage of body fat in the weight of the i-th person;
$WEIT_i$: The weight of the i-th person in pounds;
$HEIT_i$: The height of the i-th person in inches;
$WAIS_i$: The waist circumference of the i-th person in inches.
You think that the following theoretical model would be appropriate for your study on body fat:
$PCBF_i = \beta_0 + \beta_1 WEIT_i + \beta_2 HEIT_i + \beta_3 WAIS_i + \varepsilon_i$
- Briefly hypothesize the signs of the coefficients. (2 points)
- Run the regression shown above and report your results in full reporting mode. (3 points)
- Without actually running the specific tests, evaluate the significance of the coefficients individually (signs and sizes) and collectively (F-ratio). (3 points)
- Evaluate the coefficient of determination. (2 points)
- Do you think that there is a problem with this model and why? (3 points)
- Suppose that you suspect that the model suffers from multicollinearity. Compute the correlation coefficient between each pair of regressors. What do you conclude? (2 points)
- Calculate the VIF coefficients for all three regressors and comment on them. (4 points)
- What can you do to cure multicollinearity? Would you drop one of the two collinear variables? (3 points)
- Assume that, in spite of the multicollinearity, you think all variables are needed. You therefore create another variable: the ratio of waist circumference to height (inches of waist per inch of height). Regress the percent of body fat on weight and the ratio variable and report your results. (3 points)
- Compute the correlation coefficient between the regressors of the new model. What do you conclude? (2 points)
- Which regression do you prefer: the original model with WEIT, HEIT, and WAIS, or the model with WEIT and the waist-to-height ratio? (3 points)
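For the correlation checks in Question 3, NumPy's `corrcoef` gives the full correlation matrix of the regressors. The sketch below returns each pairwise coefficient by name; the helper name `pairwise_correlations` and the arrays `weit`, `heit`, and `wais` in the usage note are hypothetical, and the data are assumed to be loaded already.

```python
import numpy as np

# Hypothetical helper: pairwise correlation coefficients between regressors.
def pairwise_correlations(columns):
    """columns: {name: 1-D array}. Returns {(name_i, name_j): r_ij} for i < j."""
    names = list(columns)
    data = np.array([np.asarray(columns[n], dtype=float) for n in names])
    r = np.corrcoef(data)                   # rows are treated as variables
    return {(names[i], names[j]): float(r[i, j])
            for i in range(len(names)) for j in range(i + 1, len(names))}
```

Usage, assuming the data are loaded: `pairwise_correlations({"WEIT": weit, "HEIT": heit, "WAIS": wais})`, and later `pairwise_correlations({"WEIT": weit, "RATIO": wais / heit})` for the new model. A common rule of thumb treats an absolute correlation above about 0.8 as a warning sign of severe multicollinearity.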
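The VIF requested in Question 3 is VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing regressor j on all the other regressors (with a constant). A minimal sketch follows; the helper name `vif` is hypothetical and the data are assumed to be loaded into numeric arrays.

```python
import numpy as np

# Hypothetical helper: variance inflation factor for each regressor.
def vif(columns):
    """columns: {name: 1-D array}. Returns {name: VIF}."""
    names = list(columns)
    X = np.column_stack([np.asarray(columns[n], dtype=float) for n in names])
    out = {}
    for j, name in enumerate(names):
        y = X[:, j]
        # Auxiliary regression of regressor j on a constant and the others.
        others = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        dev = y - y.mean()
        r2 = 1.0 - float(resid @ resid) / float(dev @ dev)
        out[name] = 1.0 / (1.0 - r2)
    return out
```

A common rule of thumb flags a VIF above 5 as severe multicollinearity (some texts use 10). With only two regressors, both VIFs equal 1 / (1 - r²), where r is their simple correlation.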
Deliverable: Word Document
