Using 2010 GSS data, regress education (EDUC, Y) on gender (FEMALE, X1), age (AGE, X2), and number of siblings (SIBS, X3)


Problem: Using 2010 GSS data, regress education (EDUC, Y) upon gender (FEMALE, X1), age (AGE, X2), and # of siblings (SIBS, X3). Recode SIBS (to SIBSNEW) so that everyone with 8 or more siblings has a value of 8 for the variable.
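The SIBSNEW recode amounts to top-coding sibling counts at 8. The sketch below is a minimal version that assumes GSS missing-data codes have already been converted to None. Those codes are not given here, so they are deliberately not hard-coded. The helper name recode_sibs is hypothetical, and the example counts are made up rather than taken from GSS records.

```python
from typing import List, Optional, Sequence


def recode_sibs(sibs: Sequence[Optional[int]], cap: int = 8) -> List[Optional[int]]:
    """Return SIBSNEW: every sibling count >= cap becomes cap.

    Assumes missing-data codes were already turned into None;
    None passes through unchanged so it stays missing.
    """
    return [None if s is None else min(s, cap) for s in sibs]


# Hypothetical values, not actual GSS responses
print(recode_sibs([0, 3, 8, 12, None, 25]))  # [0, 3, 8, 8, None, 8]
```

Capping with min() leaves every valid count below 8 untouched. Running frequencies on the result should therefore show the old 8+ categories pooled into a single value of 8.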

  a. Run frequencies of all variables you will use for the regression. Show that there are no "funny" values (such as -9, or 9 on an item whose valid responses are 1-5).
  b. Prepare histograms of all variables involved, and show where they are in the printout. Are there any outliers? Are there any gaps in the data? Are any variables skewed? Is the dependent variable normally distributed? Very briefly comment.
  c. Obtain the correlation matrix and show where it is in the printout. Very briefly comment on it.
  d. Fit the regression model to the data. State the estimated regression function. Briefly interpret b1 through b3.
  e. Plot the residuals against Ŷ, X1, X2, and X3. Also prepare a normal probability plot of the residuals. Interpret the plots and summarize your findings.
  f. Test whether there is a regression relation at the .05 level. State the alternatives, decision rule, p-value, and conclusion.
  g. Calculate or identify adjusted R². How does it compare to R²? From the formula, when is there a large difference between them? What is the implication for researchers?
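The overall F test and adjusted R² both come from a few sums-of-squares formulas. The sketch below assumes the convention that p counts all regression parameters, including the intercept, so p = 4 for this model. The SSR, SSE, and n values are hypothetical placeholders for the numbers in your ANOVA table. You still need to read the critical value F(1 - α; p - 1, n - p) from a table or from software.

```python
def overall_f(ssr: float, sse: float, n: int, p: int) -> float:
    """F* = MSR / MSE for H0: beta_1 = ... = beta_(p-1) = 0.

    p counts every parameter, including the intercept.
    """
    msr = ssr / (p - 1)  # regression mean square
    mse = sse / (n - p)  # error mean square
    return msr / mse


def r2_and_adjusted(ssr: float, sse: float, n: int, p: int) -> tuple:
    """Return (R^2, adjusted R^2), with SSTO = SSR + SSE."""
    ssto = ssr + sse
    r2 = ssr / ssto
    r2_adj = 1 - (n - 1) / (n - p) * (sse / ssto)
    return r2, r2_adj


# Hypothetical sums of squares, not GSS output
print(overall_f(300.0, 700.0, n=104, p=4))        # F* ≈ 14.29
print(r2_and_adjusted(300.0, 700.0, n=104, p=4))  # adjusted R^2 ≈ 0.279
print(r2_and_adjusted(300.0, 700.0, n=14, p=4))   # adjusted R^2 ≈ 0.09
```

Algebraically, R² minus adjusted R² equals (1 - R²)(p - 1)/(n - p). The gap therefore grows when R² is low and the number of predictors is large relative to n. In the example, R² is held at 0.3, and shrinking n from 104 to 14 widens the gap from about 0.021 to 0.21.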

7.101 Refer to 6.101 above.

  a. Obtain the ANOVA table that decomposes the regression sum of squares into extra sums of squares associated with X1, with X3 given X1, and with X2 given X1 and X3.
  b. Using α = .01, test whether X1 can be dropped from the regression model given that X2 and X3 are retained. Use the F* test statistic. State the alternatives, decision rule, p-value, and conclusion.
  c. Obtain t* for the same test along with its p-value. Show how t* and F* are related to each other.
  d. Test whether X1 and X2 can be dropped from the model given that X3 is retained. Use α = .01. State the alternatives, decision rule, p-value, and conclusion.
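Both tests in 7.101 are instances of the general linear test:

F* = [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F]

Here the SSE difference in the numerator is an extra sum of squares, such as SSR(X1 | X2, X3) or SSR(X1, X2 | X3). When exactly one coefficient is dropped, F* = (t*)², where t* = b_k / s{b_k}.

The sketch below checks that identity numerically. To stay dependency-free, it uses a one-predictor model on made-up data instead of the three-predictor GSS model. The full model is intercept + X and the reduced model is intercept only. The function names are hypothetical helpers.

```python
import math
from statistics import fmean


def general_linear_f(sse_r: float, df_r: int, sse_f: float, df_f: int) -> float:
    """F* = [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F]."""
    return ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)


def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, SSE, Sxx)."""
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse, sxx


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 7.7, 9.2]
n = len(x)

b0, b1, sse_full, sxx = simple_ols(x, y)
ybar = fmean(y)
sse_reduced = sum((yi - ybar) ** 2 for yi in y)  # intercept-only model: SSE(R) = SSTO

f_star = general_linear_f(sse_reduced, n - 1, sse_full, n - 2)
mse = sse_full / (n - 2)
t_star = b1 / math.sqrt(mse / sxx)  # s{b1} = sqrt(MSE / Sxx)
print(f_star, t_star ** 2)  # the two numbers agree up to rounding
```

To drop X1 and X2 given X3, use the same general_linear_f with df_R - df_F = 2. There is no single t* counterpart to that test.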

7.103

  a. Identify and interpret the standardized regression coefficients b1*, b2*, and b3*.
  b. Calculate b1* from b1 and verify that this value is the same as the b1* obtained in a. above.
  c. Identify which variable has the strongest relationship with Y. Explain your choice in a single sentence.
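You can obtain b_k* from b_k by rescaling with the sample standard deviations: b_k* = (s_k / s_Y) b_k. The sketch below implements that rescaling and checks it against a known case: in a one-predictor regression, the standardized slope equals the Pearson correlation r. In multiple regression b_k* is generally not equal to r, so the check applies only to the single-predictor case. The data and the helper names are hypothetical.

```python
import math
from statistics import fmean, stdev


def standardized_coef(b_k: float, x_k, y) -> float:
    """b_k* = b_k * s_k / s_Y, using sample standard deviations of X_k and Y."""
    return b_k * stdev(x_k) / stdev(y)


def pearson_r(x, y) -> float:
    """Sample Pearson correlation between x and y."""
    xbar, ybar = fmean(x), fmean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1 = 0.6  # least-squares slope of y on x for these data (Sxy / Sxx = 6 / 10)
print(standardized_coef(b1, x, y), pearson_r(x, y))  # both ≈ 0.7746
```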

7.104

  a. Regress education (Y) on age (X2) only. State the fitted simple regression function.
    Hint: Since you want to compare this result to the multiple regression result, include FEMALE (X1) and SIBSNEW (X3) in the variable list of the regression command (e.g., regression var=educ age female sibsnew/statistics=....) even though they are not entered in the equation. Otherwise, under listwise deletion, cases with missing values on FEMALE or SIBSNEW are excluded from the multiple regression but kept in the simple one, so the two analyses use different numbers of cases.
  b. Compare this b2 to the b2 obtained in 6.101.d. How would you explain the difference?
  c. Does SSR(X2) equal SSR(X2|X1,X3) here? If not, is the difference substantial?
  d. Refer to the correlation matrix in 6.101.c. Briefly comment: did the inclusion of FEMALE or of SIBSNEW bring about the change in b2?
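An exact least-squares identity shows which omitted variable drives the change in b2. The simple-regression slope of Y on X2 equals b2 + b1·d1 + b3·d3. Here b1, b2, and b3 come from the full model, and d1 and d3 are the slopes from simply regressing X1 and X3 on X2. A term contributes little when either its coefficient or its association with AGE is small, and the correlation matrix lets you judge the latter.

The sketch below verifies the identity on made-up data, using a small normal-equations solver. It is illustrative only and does not replace your statistics package's output. The ols helper and all data values are hypothetical.

```python
def ols(columns, y):
    """Least-squares coefficients [b0, b1, ...] for y on an intercept plus columns.

    Solves the normal equations (X'X) b = X'y by Gaussian elimination with
    partial pivoting; adequate for a handful of predictors.
    """
    n = len(y)
    rows = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Forward elimination
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(c + 1, k):
            factor = a[r][c] / a[c][c]
            for j in range(c, k + 1):
                a[r][j] -= factor * a[c][j]
    # Back substitution
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b


# Hypothetical data, not GSS records
female = [0, 1, 0, 1, 0, 1, 0, 1]
age = [25, 34, 41, 52, 60, 29, 47, 70]
sibs = [1, 2, 4, 3, 6, 0, 5, 8]
educ = [16, 14, 12, 13, 10, 16, 12, 9]

b0, b1, b2, b3 = ols([female, age, sibs], educ)  # full model
b2_simple = ols([age], educ)[1]                  # EDUC on AGE only
d1 = ols([age], female)[1]                       # FEMALE on AGE
d3 = ols([age], sibs)[1]                         # SIBSNEW on AGE
print(b2_simple, b2 + b1 * d1 + b3 * d3)         # identical up to rounding
```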

8.14

In a regression study of factors affecting learning time for a certain task (measured in minutes), gender of learner was included as a predictor variable (X2) that was coded X2 = 1 if male and 0 if female. It was found that b2 = 22.3 and s{b2} = 3.8. An observer questioned whether the coding scheme for gender is fair because it results in a positive coefficient, leading to longer learning times for males than females. Comment.
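Reversing the coding only reparameterizes the model. Write X2' = 1 - X2, so that X2' = 1 for females, and let "other terms" stand for whatever other predictors the study used:

```latex
\begin{aligned}
E\{Y\} &= \beta_0 + \beta_2 X_2 + (\text{other terms}) \\
       &= \beta_0 + \beta_2 (1 - X_2') + (\text{other terms}) \\
       &= (\beta_0 + \beta_2) - \beta_2 X_2' + (\text{other terms})
\end{aligned}
```

Under the reversed coding, the estimated coefficient would be -22.3, with the same s{b2} = 3.8 and the same |t*| = 22.3/3.8 ≈ 5.87. Only the intercept absorbs the shift. Under either coding, the fitted mean learning time for males exceeds that for females by 22.3 minutes, holding the other predictors constant.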

8.101 Go back to the estimated equation in 6.101.d.

  a. For each gender (male and female), estimate the equation explaining education (Y) from age (X2) and the number of siblings (X3). Are the effects of age and the number of siblings on education assumed to be the same or different for males and females? Briefly comment.
  b. Now someone claims that the effect of gender may depend on age. Briefly support this conjecture. No statistical language is necessary.
  c. Fit the model including this interaction (along with FEMALE, AGE, and SIBSNEW). Show the estimated equation. Test whether the interaction term can be dropped from the model at the .05 level. State the alternatives, decision rule, p-value, and conclusion. Is this interaction "meaningful"? Briefly explain why or why not.
  d. Suppose that we retained this interaction term. State the estimated regression equations for each gender. Explain in plain English the nature of this interaction. Are the effects of age and the number of siblings on education assumed to be the same or different for males and females? Briefly comment.
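With FEMALE coded 1/0, you get the per-gender equations by substituting X1 = 0 and X1 = 1. The sketch below assumes the interaction model Ŷ = b0 + b1X1 + b2X2 + b3X3 + b4X1X2, where the interaction is FEMALE × AGE. Setting b4 = 0 recovers the main-effects model from 6.101.d, in which only the intercept differs by gender. The function name and coefficient values are hypothetical.

```python
def gender_equations(b0, b1, b2, b3, b4=0.0):
    """Split Y-hat = b0 + b1*X1 + b2*X2 + b3*X3 + b4*X1*X2 by gender.

    X1 = FEMALE (1 = female, 0 = male), X2 = AGE, X3 = SIBSNEW.
    Returns (intercept, AGE slope, SIBSNEW slope) for each gender.
    """
    male = (b0, b2, b3)              # X1 = 0: b1 and b4 terms drop out
    female = (b0 + b1, b2 + b4, b3)  # X1 = 1: intercept and AGE slope shift
    return {"male": male, "female": female}


# Hypothetical coefficients, not estimates from GSS data
print(gender_equations(13.0, 0.5, -0.02, -0.25))        # no interaction
print(gender_equations(13.0, 0.5, -0.02, -0.25, 0.01))  # with FEMALE x AGE
```

The SIBSNEW slope is identical for both genders in both models, because SIBSNEW does not interact with FEMALE.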

8.27

An analyst wishes to include the number of older siblings in the family as a predictor variable in a regression analysis of factors affecting maturation in eighth graders. The number of older siblings in the sample observations ranges from 0 to 4. Discuss whether this variable should be placed in the model as an ordinary quantitative variable or by means of four 0, 1 indicator variables.
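The two codings differ in how many parameters they use. As an ordinary quantitative variable, the count needs one coefficient and assumes each additional older sibling shifts the mean response by the same amount. With four indicators, each count gets its own mean shift relative to a baseline category, here 0 older siblings, at the cost of three extra parameters. A sketch of the indicator coding follows; the function name is hypothetical.

```python
from typing import List


def older_sib_indicators(k: int) -> List[int]:
    """Four 0/1 indicators for 1, 2, 3, 4 older siblings; 0 is the baseline."""
    if k not in range(5):
        raise ValueError("expected 0 to 4 older siblings")
    return [1 if k == level else 0 for level in (1, 2, 3, 4)]


for k in range(5):
    print(k, older_sib_indicators(k))
```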

8.102

  a. Now someone claims that the effect of age on education is nonlinear. Briefly support this conjecture. No statistical language is necessary.
  b. Fit the model including the squared term of age. Do not include the interaction term you examined in 8.101 above. Show the estimated equation. Test whether the squared term can be dropped from the model at the .05 level. State the alternatives, decision rule, p-value, and conclusion. Is the effect of the squared term "meaningful"? Briefly explain why or why not.
  c. Suppose that we retained this squared term. For a male who is an only child, show the estimated equation in graphic form (second-order polynomial response).
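For a male only child, FEMALE = 0 and SIBSNEW = 0, so the quadratic model reduces to a parabola in age. The sketch below tabulates fitted values for plotting and locates the turning point at -b_AGE / (2 b_AGESQ). The coefficient values and helper names are hypothetical. If you centered age before squaring it (a common way to reduce collinearity between AGE and AGE²), pass centered ages instead.

```python
def predicted_educ(age: float, b0: float, b_age: float, b_agesq: float) -> float:
    """Y-hat for a male only child: the FEMALE and SIBSNEW terms are zero."""
    return b0 + b_age * age + b_agesq * age ** 2


def turning_point(b_age: float, b_agesq: float) -> float:
    """Age where the fitted parabola peaks (b_agesq < 0) or bottoms out (b_agesq > 0)."""
    return -b_age / (2 * b_agesq)


# Hypothetical coefficients, not estimates from GSS data
b0, b_age, b_agesq = 11.0, 0.12, -0.0015
for age in range(20, 90, 10):
    print(age, round(predicted_educ(age, b0, b_age, b_agesq), 2))
print("turning point:", turning_point(b_age, b_agesq))  # ≈ 40
```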