Assignment 5. Multiple Regression

1. Using SPSS and data from the 1992 Health and Retirement Study (available on the course web page), create dummy variables for sex and race. There are too few people in the Native American, Asian, and Other categories to analyze separately, so your final race dummies should include White, Black, Hispanic, and Other (where Other combines Native American, Asian, and Other). I want you to create a dummy variable for every category of race, even though one of these four will have to be omitted from a regression model.
I want to see frequency distributions for the original variables, your syntax for creating the dummy variables, and frequency distributions for the new dummy variables.
Note: The HRS is a nationally representative biennial longitudinal survey (i.e., a panel study) of persons born between 1931 and 1941 (i.e., a cohort). We are using data from the 1992 wave only - thus, our data are cross-sectional. The HRS sample was selected using probability sampling.
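The assignment itself calls for SPSS syntax; the short Python sketch below only illustrates the logic of the dummy coding described above. The category labels, the `COLLAPSE` mapping, and the function name `race_dummies` are hypothetical, since the actual HRS variable names and codes are not given here.

```python
# Hypothetical category labels; the actual HRS codes and variable names may differ.
COLLAPSE = {
    "White": "White",
    "Black": "Black",
    "Hispanic": "Hispanic",
    "Native American": "Other",
    "Asian": "Other",
    "Other": "Other",
}
CATEGORIES = ["White", "Black", "Hispanic", "Other"]


def race_dummies(race):
    """Return one 0/1 indicator per collapsed category, or None if race is missing."""
    if race is None:
        return None  # keep missing values missing rather than coding them as 0
    collapsed = COLLAPSE[race]
    return {cat: int(collapsed == cat) for cat in CATEGORIES}


# Made-up cases, including one with a missing race value.
sample = ["White", "Asian", "Black", None, "Hispanic", "Native American"]
dummies = [race_dummies(r) for r in sample]
```

Whatever tool is used, a frequency distribution of each new dummy should reproduce the counts of the corresponding original categories, with missing cases left missing.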
2. Using SPSS and data from the 1992 HRS, grand-mean center education and experience AFTER SELECTING CASES WITHOUT MISSING DATA. I want to see frequency distributions and statistics (the mean, minimum, maximum, and standard deviation) for the original and grand-mean-centered variables. I also want to see your syntax for creating the centered variables.
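As an illustration of why the order of operations matters, the following Python sketch (not SPSS syntax) drops incomplete cases first and only then computes the grand means used for centering. The records and the names `educ`, `exper`, `educ_c`, and `exper_c` are made up for the example, and which variables define a complete case in your own analysis is an assumption to check against the assignment.

```python
from statistics import mean

# Hypothetical records; None marks a missing value. Variable names are illustrative.
cases = [
    {"educ": 12, "exper": 30},
    {"educ": 16, "exper": None},  # dropped: experience is missing
    {"educ": 10, "exper": 35},
    {"educ": None, "exper": 20},  # dropped: education is missing
    {"educ": 14, "exper": 25},
]

# Select complete cases first, so the grand mean comes from the analysis sample.
complete = [c for c in cases if None not in c.values()]

educ_mean = mean(c["educ"] for c in complete)    # (12 + 10 + 14) / 3
exper_mean = mean(c["exper"] for c in complete)  # (30 + 35 + 25) / 3

# Centered value = original value minus the grand mean of the selected cases.
for c in complete:
    c["educ_c"] = c["educ"] - educ_mean
    c["exper_c"] = c["exper"] - exper_mean
```

A quick check on the output: each centered variable should have a mean of zero and the same standard deviation as its original, while its minimum and maximum shift by the grand mean.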
3. Using SPSS and data from the 1992 Health and Retirement Study, select respondents for your analysis who are currently working for pay. Hint: Click on Data $\rightarrow$ Select Cases $\rightarrow$ If condition is satisfied $\rightarrow$ If $\rightarrow$ and enter (by typing or clicking) "Work4pay = 2" in the empty box (top right) $\rightarrow$ Click Continue and verify that "unselected cases are filtered" (and not deleted) $\rightarrow$ Click on Paste and run the syntax from your syntax file, or click on OK.
4. Using SPSS, the newly created variables, and the filter (from step 3), examine the relationships between income, sex, race, education, and experience. Specifically, run two multiple regression models in SPSS. In the first, regress income (the dependent variable) on sex and race (the independent variables). In the second, regress income (the dependent variable) on sex, race, education, and experience (the independent variables). Be careful when specifying the models in SPSS: one of the four race dummy variables must be omitted or the models will not run. I would suggest omitting the White dummy variable.

Note - you can run the models in one or two steps. You should already know how to do it in two steps. To run the models in one step, enter the sex and race dummies as usual. Then click on the "Next" button and add education and experience. This will create two 'blocks' of variables and will allow you to perform a special type of $F$ test (see part h below). If you choose this one-step option, be sure to select "R squared change" in the Statistics box (this will generate the results for the special $F$ test).
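The reason one race dummy has to be left out can be seen in a few lines of Python (again only an illustration, with made-up cases): when all four dummies are present, they sum to 1 for every case, which duplicates the constant term, so the coefficients cannot be estimated uniquely.

```python
# Hypothetical collapsed race values for five cases.
races = ["White", "Black", "Hispanic", "Other", "White"]
CATEGORIES = ["White", "Black", "Hispanic", "Other"]

# One row of four 0/1 dummies per case, in the order given by CATEGORIES.
rows = [[int(r == cat) for cat in CATEGORIES] for r in races]

# With all four dummies, every row sums to 1, duplicating the intercept column.
row_sums_all = [sum(row) for row in rows]

# Dropping the White column (the reference group) removes the redundancy:
# White cases are now identified by zeros on the three remaining dummies.
rows_no_white = [row[1:] for row in rows]
row_sums_reduced = [sum(row) for row in rows_no_white]
```

With White omitted, each remaining race slope is read as a difference from White respondents, holding the other variables in the model constant.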

Begin by interpreting the following statistics from the second model (including sex, race, education, and experience):

a. Bivariate correlations between all relevant variables (i.e., strength and direction, characterize the relationship, statistical significance). I would suggest creating a bivariate correlation matrix to aid your discussion (see the handout from Topic 6 for an example).
b. R-squared/adjusted R-squared (whichever is most appropriate).
c. The global $F$ test (#3 in the handout from Topic 9) in the ANOVA table (i.e., identify the null and research hypotheses, make and explain your decision, etc.).
d. The y-intercept (you do not need to test for significance from 0).
e. The slopes for all variables (i.e., direction, significance, characterize the relationship, statistical significance).
f. Use the standardized coefficients to identify the variable with the strongest effect.
After addressing a-f above, examine the output from the first model (including only sex and race). In particular, focus on the slope coefficients for the sex and race dummies. Address the following:
g. Interpret the slope coefficients for the sex and race dummy variables from the first model (i.e., direction, significance, characterize the relationship, statistical significance).
h. Conduct an $F$ test (#2 in the handout from Topic 9) in order to compare the two models (i.e., identify the null and research hypotheses, make and explain your decision, etc.).
i. Respond to the following questions: How might your conclusions regarding the relationships between sex and income and between race and income differ if you had ignored education and experience? Would you say that the relationships between sex and income and between race and income are spurious?
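For the model comparison, the arithmetic behind the "R squared change" test can be sketched in Python as follows. This uses the standard nested-model $F$ statistic; the Topic 9 handout may present it in a different but equivalent form. The function name `f_change` and the R-squared values and sample size are made up for illustration and are not results from the HRS data.

```python
def f_change(r2_reduced, r2_full, k_reduced, k_full, n):
    """Nested-model F statistic for the increase in R-squared.

    k_reduced and k_full count the predictors in each model (excluding the
    intercept); n is the number of cases used in both models.
    """
    df1 = k_full - k_reduced  # number of predictors added
    df2 = n - k_full - 1      # residual degrees of freedom of the full model
    f = ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)
    return f, df1, df2


# Made-up inputs: model 1 has sex plus three race dummies (4 predictors);
# model 2 adds education and experience (6 predictors).
f, df1, df2 = f_change(r2_reduced=0.10, r2_full=0.25, k_reduced=4, k_full=6, n=1007)
```

The null hypothesis for this test is that the slopes for all added variables (here education and experience) equal zero in the population; the resulting $F$ is compared with the critical value for `df1` and `df2` degrees of freedom.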