Assignment
Note: All answers must be completed in SAS. All SAS input and output must be pasted into the solutions.
Data Set C.10 Disease Outbreak
This data set provides information from a study based on 196 persons selected in a probability sample within two sectors in a city. Each line of the data set has an identification number and provides information on 5 other variables for a single person. The 6 variables are:
2. In this question you will illustrate some of the ideas related to the extra sums of squares.
a. Create a variable called SUM, equal to the sum of any two predictors, and run the following two regression models, leaving out the two variables you used to create SUM:
- predict the response using all the remaining explanatory variables;
- predict the response using all the remaining explanatory variables plus SUM.
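The two fits described above could be set up along the following lines. This is only a sketch: the data set name `outbreak`, the response `y`, and the predictors `x1` to `x4` are hypothetical placeholders, since the actual variable names are not listed here, and the choice of `x1` and `x2` as the components of SUM is arbitrary.

```sas
/* Hypothetical names: data set OUTBREAK, response Y, predictors X1-X4. */
data outbreak2;
  set outbreak;
  sum = x1 + x2;   /* SUM built from two predictors (assumed: X1 and X2) */
run;

/* Model 1: remaining predictors only (X1 and X2 are left out) */
proc reg data=outbreak2;
  model y = x3 x4;
run;
quit;

/* Model 2: remaining predictors plus SUM */
proc reg data=outbreak2;
  model y = x3 x4 sum;
run;
quit;
```

The error sums of squares and error degrees of freedom needed for the comparison appear in the Analysis of Variance table of each fit.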
Calculate the extra sum of squares for the comparison of these two analyses. Use it to construct the F-statistic - in other words, the general linear test statistic - for testing the null hypothesis that the coefficient of the SUM variable is zero in the model with all predictors. What are the degrees of freedom for this test statistic?
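For reference, the general linear test statistic can be written as follows. The notation is an assumption, not taken from the assignment: \(SSE(R)\) and \(SSE(F)\) denote the error sums of squares of the reduced model (without SUM) and the full model (with SUM), and \(df_R\), \(df_F\) are their error degrees of freedom.

\[
F^{*} = \frac{\bigl[SSE(R) - SSE(F)\bigr] / (df_R - df_F)}{SSE(F) / df_F}
\]

The numerator difference \(SSE(R) - SSE(F)\) is the extra sum of squares. Because the two models differ by a single coefficient, \(df_R - df_F = 1\), so \(F^{*}\) is the extra sum of squares divided by the mean squared error of the full model, and it is compared against an \(F\) distribution with \(1\) and \(df_F\) degrees of freedom.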
b. Use the test statement in proc reg to obtain the same test statistic. Give the statistic, degrees of freedom, p-value and conclusion.
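A minimal sketch of the `test` statement in `proc reg` is shown below. The data set `outbreak`, the response `y`, the predictors `x1` to `x4`, and the label `sumtest` are hypothetical placeholders, and SUM is assumed to have been built from `x1` and `x2`.

```sas
/* Hypothetical names: data set OUTBREAK, response Y, predictors X1-X4. */
data outbreak2;
  set outbreak;
  sum = x1 + x2;   /* assumed choice of the two summed predictors */
run;

proc reg data=outbreak2;
  model y = x3 x4 sum;
  /* Tests H0: coefficient of SUM = 0; the label SUMTEST is arbitrary */
  sumtest: test sum = 0;
run;
quit;
```

The output for the labelled test reports the numerator and denominator mean squares and degrees of freedom, the F value, and its p-value.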
c. Compare the test statistic and p-value from the test statement with the individual t-test for the coefficient of the SUM variable in the full model. Explain the relationship.
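As a reminder of the standard result that applies here, when a test involves a single coefficient, the squared \(t\) statistic matches the \(F\) statistic. In the notation below, which is an assumption rather than the assignment's own, \(b\) is the estimated coefficient of SUM, \(s\{b\}\) is its standard error, and \(df_F\) is the error degrees of freedom of the full model.

\[
t^{*} = \frac{b}{s\{b\}}, \qquad (t^{*})^{2} = F^{*}, \qquad \bigl[t(df_F)\bigr]^{2} \sim F(1,\, df_F)
\]

The two-sided \(t\)-test and the \(F\)-test therefore give the same p-value.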
3. Use the \(C_{p}\) criterion to select the best subset of variables for your data (i.e. use the options " / selection = cp b; "). Use the original and transformed variables, not SUM. Summarize the results and explain your choice of the best model.
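The selection options quoted above go on the `model` statement, as in the sketch below. The data set `outbreak`, the response `y`, and the predictors `x1` to `x4` are hypothetical placeholders; the actual list should contain the original and transformed variables from your own analysis.

```sas
/* Hypothetical names: data set OUTBREAK, response Y, predictors X1-X4. */
proc reg data=outbreak;
  /* selection=cp ranks candidate subsets by Mallows' Cp;      */
  /* b prints the estimated coefficients for each listed model */
  model y = x1 x2 x3 x4 / selection=cp b;
run;
quit;
```

A common guideline is to favor subsets whose \(C_{p}\) value is small and close to \(p\), the number of parameters in that subset model including the intercept.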
Deliverable: Word Document
