Assignment
Note:
- Unless otherwise stated, use \(\alpha = 0.05\).
- Manual calculation is acceptable.
1. Using the matrices from least squares estimation, what are the entries of the vector \(X^{\prime} \epsilon\)?
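A quick numerical check, assuming that \(\epsilon\) here stands for the residual vector \(e = Y - Xb\) from the least squares fit (for the true, unobserved errors, \(X^{\prime}\epsilon\) is not zero in general). The normal equations \(X^{\prime}Xb = X^{\prime}Y\) rearrange to \(X^{\prime}(Y - Xb) = 0\). The data below are made up for illustration.

```python
import numpy as np

# Hypothetical data: intercept column plus two predictors
X = np.array([
    [1.0, 2.0, 1.0],
    [1.0, 3.0, 0.0],
    [1.0, 5.0, 2.0],
    [1.0, 7.0, 1.0],
    [1.0, 8.0, 3.0],
])
Y = np.array([4.0, 5.0, 9.0, 12.0, 15.0])

# Least squares estimates b solve the normal equations X'X b = X'Y
b = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ b  # residual vector

# Every entry of X'e is zero up to floating-point rounding
print(X.T @ e)
```

Because the first column of \(X\) is all ones, the first entry of \(X^{\prime}e\) being zero also means the residuals sum to zero.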
2. For a model with 4 predictors and 50 observations, if \(R^2\) is 0.35, what is the test statistic for the ANOVA F test?
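The overall F statistic can be written in terms of \(R^2\) as \(F = \dfrac{R^2/p}{(1 - R^2)/(n - p - 1)}\) on \((p,\, n - p - 1)\) degrees of freedom. The sketch below assumes the model includes an intercept in addition to the \(p = 4\) predictors; the function name is illustrative.

```python
def f_from_r2(r2: float, p: int, n: int) -> float:
    """Overall ANOVA F statistic from R^2, p predictors (intercept excluded), n observations."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


f_stat = f_from_r2(0.35, p=4, n=50)
print(round(f_stat, 3))  # 6.058, on (4, 45) degrees of freedom
```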
3. From an observed data set, it is found that the correlation coefficient between X and Y is 0. Does this mean there is no relationship between the two variables at all? Explain why or why not.
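The Pearson correlation measures only linear association. A minimal counterexample with made-up values: \(Y = X^2\) on points symmetric about zero gives a correlation of 0 even though \(Y\) is completely determined by \(X\).

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2  # Y is an exact (nonlinear) function of X

r = np.corrcoef(x, y)[0, 1]
print(r)  # 0 up to rounding: no *linear* association
```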
4. You are using the \(C_p\) criterion to select a subset of explanatory variables in a multiple regression. The best subset of size 3 gives \(C_p = 8\). Interpret this result.
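For reference, Mallows' \(C_p\) for a subset model with \(p\) parameters (intercept included) and its approximate expectation are given below, where \(\mathrm{MSE}_{\text{full}}\) comes from the model containing all candidate variables and \(\mu_i\) is the true mean response. This is the standard textbook definition; whether "size 3" means \(p = 3\) or three explanatory variables plus an intercept (\(p = 4\)) depends on the course's convention.

```latex
C_p = \frac{\mathrm{SSE}_p}{\mathrm{MSE}_{\text{full}}} - (n - 2p),
\qquad
E[C_p] \approx p + \frac{1}{\sigma^2} \sum_{i=1}^{n} \left( E[\hat{Y}_i] - \mu_i \right)^2
```

So a \(C_p\) close to \(p\) suggests little bias, while \(C_p = 8\) lies well above either \(p = 3\) or \(p = 4\); the excess \(C_p - p\) is a rough estimate of the total squared bias in units of \(\sigma^2\).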
5. In a one-way ANOVA with three groups, you are interested in the difference between twice the mean of the first group and the sum of the means of the other two groups. If the MSE is 50 and the sample sizes are 15, 6, and 15, find the standard error for this difference. (Specify the contrast you use.)
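One natural contrast is \(L = 2\mu_1 - \mu_2 - \mu_3\), with coefficients \(c = (2, -1, -1)\) summing to zero. Its estimated standard error is \(\sqrt{\mathrm{MSE} \sum_i c_i^2 / n_i}\). A sketch of the arithmetic, using exact fractions to avoid rounding:

```python
from fractions import Fraction
import math

mse = 50
n = [15, 6, 15]
c = [2, -1, -1]  # contrast L = 2*mu1 - mu2 - mu3 (coefficients sum to 0)

# Var(L_hat) = MSE * sum(c_i^2 / n_i)
var_factor = sum(Fraction(ci * ci, ni) for ci, ni in zip(c, n))
se = math.sqrt(mse * var_factor)
print(var_factor, se)  # 1/2 5.0
```

Here \(4/15 + 1/6 + 1/15 = 1/2\), so the standard error is \(\sqrt{50 \times 1/2} = 5\).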
6. (6 pts) Suppose you run a regression with response vector \(Y\) and design matrix \(\mathrm{X}\). Suppose further that \(\hat{Y}\) is the vector of predicted values based on the least squares estimates. If you then regress \(\hat{Y}\) on \(\mathrm{X}\), what would be the new predicted values \(\hat{Y}_{\text{new}}\)? (Hint: \(\hat{Y}=\mathrm{HY}\).)
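The hint rests on the hat matrix \(H = X(X^{\prime}X)^{-1}X^{\prime}\) being idempotent (\(HH = H\)), so regressing \(\hat{Y}\) on the same \(X\) gives \(H\hat{Y} = HHY = HY = \hat{Y}\). A numerical check on made-up data:

```python
import numpy as np

X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.5], [1.0, 4.0]])
Y = np.array([1.0, 2.2, 2.9, 5.1, 5.8])

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.solve(X.T @ X, X.T)
Y_hat = H @ Y
Y_hat_new = H @ Y_hat  # regress the fitted values on the same X

print(np.allclose(H @ H, H))          # True: H is idempotent
print(np.allclose(Y_hat_new, Y_hat))  # True: the fitted values reproduce themselves
```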
7. In multiple testing, the concept of coherence means that if any single parameter hypothesis (e.g., \(H_0: \beta_1 = 0\)) is rejected, then all joint null hypotheses containing that null hypothesis (e.g., \(H_0: \beta_1 = \beta_2 = 0\)) are rejected. On the other hand, the concept of consonance means that if a joint null hypothesis is rejected, then at least one component hypothesis is rejected. (That is, if \(H_0: \beta_1 = \beta_2 = 0\) is rejected, then \(H_0: \beta_1 = 0\) is rejected, \(H_0: \beta_2 = 0\) is rejected, or both are rejected.) Which of these two concepts is violated in the presence of serious multicollinearity?
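A small simulation sketch (numpy only, simulated data, arbitrary seed) of what serious multicollinearity does to these tests. Here \(x_2\) is nearly a copy of \(x_1\), so each coefficient's variance is inflated by the variance inflation factor \(1/(1 - r_{12}^2)\). The joint F statistic for \(H_0: \beta_1 = \beta_2 = 0\) is large, while the individual t statistics usually are not. Compare the printed values with the approximate critical values \(F_{0.95;\,2,97} \approx 3.09\) and \(t_{0.975;\,97} \approx 1.98\); the exact t values depend on the random draw.

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)  # x2 is nearly a copy of x1
y = x1 + x2 + rng.normal(size=n)     # both true slopes equal 1

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
k = X.shape[1]                       # number of parameters (3)
sse = resid @ resid
mse = sse / (n - k)

# Individual t statistics: b_j / SE(b_j), SE from diag of MSE * (X'X)^{-1}
se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))
t_stats = b[1:] / se[1:]

# Joint F statistic for H0: beta1 = beta2 = 0 (full model vs intercept only)
sst = np.sum((y - y.mean()) ** 2)
f_stat = ((sst - sse) / (k - 1)) / mse

# With two predictors, each VIF equals 1 / (1 - r^2), r = corr(x1, x2)
r = np.corrcoef(x1, x2)[0, 1]
vif = 1.0 / (1.0 - r ** 2)

print("t statistics:", t_stats)
print("joint F:", f_stat)
print("VIF:", vif)
```

A joint rejection with no individual rejection is the pattern that conflicts with consonance as defined above.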
8. Two different laboratory procedures were used to find the amount of calcium in a collection of orange juice samples. The correlation between the two measures was 0.99. Does this imply that the value given by the first procedure was approximately the same as the value given by the second procedure for each of the samples analyzed? Explain why or why not.
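Correlation is unchanged when one variable is shifted by a constant or rescaled by a positive factor, so two procedures can be (even perfectly) correlated while disagreeing systematically. In the hypothetical readings below, procedure B reads twice procedure A plus 5 units.

```python
import numpy as np

a = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # procedure A (hypothetical readings)
b = 2.0 * a + 5.0                             # procedure B: systematic scale and offset

r = np.corrcoef(a, b)[0, 1]
print(r)              # 1 up to rounding
print(np.abs(b - a))  # yet the readings differ by 15 to 55 units
```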
9. The 95% confidence interval for a regression coefficient is (4.2, 9.6). What can you say, if anything, about the p-value of the hypothesis test that this coefficient is zero?
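Because the 95% interval excludes 0, the two-sided test of \(H_0: \beta = 0\) at \(\alpha = 0.05\) rejects, so the p-value is below 0.05. The sketch below backs out an approximate p-value, assuming the interval has the form estimate \(\pm 1.96 \times\) SE (a large-sample normal approximation; with few error degrees of freedom the t critical value is larger and the numbers shift somewhat).

```python
import math

lower, upper = 4.2, 9.6
estimate = (lower + upper) / 2       # 6.9, the interval's midpoint
se = (upper - lower) / 2 / 1.96      # half-width divided by z_{0.975}
z = estimate / se

# Two-sided p-value under the normal approximation: P(|Z| > z) = erfc(z / sqrt(2))
p_value = math.erfc(z / math.sqrt(2))
print(round(z, 2), p_value)          # z is about 5.01; p-value well below 0.05
```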
10. (6 pts) When running a cubic regression, you "center" the explanatory variable by subtracting its mean \(\bar{X} = 1\). The estimated coefficients for the model in this form are \(b_0 = 1.2\), \(b_1 = 3\), \(b_2 = 4.5\), and \(b_3 = -1\). Write the fitted model in terms of \(X\), \(X^2\), and \(X^3\); that is, give \(a_0\), \(a_1\), \(a_2\), and \(a_3\) in \(\hat{Y} = a_0 + a_1 X + a_2 X^2 + a_3 X^3\). (Hint: use the expansions of \((x-c)^2\) and \((x-c)^3\).)
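The centered model is \(\hat{Y} = b_0 + b_1(X - c) + b_2(X - c)^2 + b_3(X - c)^3\) with \(c = \bar{X} = 1\). Expanding each power with the binomial theorem and collecting terms gives \(a_k = \sum_{j \ge k} b_j \binom{j}{k} (-c)^{j-k}\). A short sketch of that bookkeeping (the function name `uncenter` is just an illustrative choice):

```python
from math import comb


def uncenter(b, c):
    """Convert coefficients of sum_j b_j (x - c)^j into coefficients of sum_k a_k x^k."""
    deg = len(b) - 1
    a = [0.0] * (deg + 1)
    for j, bj in enumerate(b):
        # (x - c)^j = sum_k C(j, k) x^k (-c)^(j - k)
        for k in range(j + 1):
            a[k] += bj * comb(j, k) * (-c) ** (j - k)
    return a


a = uncenter([1.2, 3, 4.5, -1], c=1)
print(a)  # approximately [3.7, -9.0, 7.5, -1.0]
```

As a sanity check, at \(X = \bar{X} = 1\) the centered model gives \(\hat{Y} = b_0 = 1.2\), and \(3.7 - 9 + 7.5 - 1 = 1.2\) as well.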
11. Data were collected on a sample of 25 women and 25 men. Separate linear regressions were run for each group. For women, the slope was 30 with a standard error of 6; for men, the slope was 20 with a standard error of 10.
a. Calculate the value of the t-statistic that you would use to compare the two slopes, and give its degrees of freedom.
b. Explain how you would model the two groups together in a single model. Define the additional variable you would need to distinguish the two subgroups.
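For part a, with independent samples the standard error of the difference in slopes is \(\sqrt{SE_W^2 + SE_M^2}\), and each separate simple regression contributes \(25 - 2 = 23\) error degrees of freedom, for 46 in total. A sketch of the arithmetic, assuming the two standard errors are combined directly rather than from a pooled residual variance:

```python
import math

slope_w, se_w, n_w = 30.0, 6.0, 25   # women
slope_m, se_m, n_m = 20.0, 10.0, 25  # men

se_diff = math.sqrt(se_w ** 2 + se_m ** 2)  # sqrt(36 + 100) = sqrt(136)
t_stat = (slope_w - slope_m) / se_diff
df = (n_w - 2) + (n_m - 2)                  # 2 parameters estimated per separate regression

print(round(t_stat, 3), df)                 # 0.857 46
```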
12. If you see that the squares of the group means and the group variances are linearly related, which transformation should you use to stabilize the error variance?
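Solution: I would use a log transformation, which stabilizes the variance: if the variance is proportional to the square of the mean, the standard deviation is proportional to the mean, and taking logs makes the spread roughly constant across groups.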
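A small illustration with made-up groups that are scaled copies of one another, so each group's standard deviation is proportional to its mean (variance proportional to the mean squared). After taking logs, the groups differ only by an additive constant, so their spreads become identical. More generally, the delta method gives \(\mathrm{Var}(\log Y) \approx \mathrm{Var}(Y)/\mu^2\), which is constant under this mean-variance relation.

```python
import numpy as np

base = np.array([0.8, 0.9, 1.0, 1.1, 1.2])
groups = [10 * base, 100 * base, 1000 * base]  # SD grows in proportion to the mean

raw_sds = [g.std(ddof=1) for g in groups]
log_sds = [np.log(g).std(ddof=1) for g in groups]

print(raw_sds)  # spreads differ by factors of 10
print(log_sds)  # equal up to rounding: log(c * x) = log(c) + log(x)
```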
13. After transforming the response variable, which diagnostic values for checking influence will not change?
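Leverages (the diagonal elements \(h_{ii}\) of \(H = X(X^{\prime}X)^{-1}X^{\prime}\)) are computed from the design matrix alone, whereas residual-based measures such as Cook's distance also depend on \(Y\). The sketch below, on made-up data with a hypothetical helper function, compares both before and after a log transformation of the response.

```python
import numpy as np


def leverage_and_cooks(X, y):
    """Return hat values h_ii and Cook's distances for an OLS fit of y on X."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)
    e = y - H @ y
    n, p = X.shape
    mse = e @ e / (n - p)
    cooks = e ** 2 / (p * mse) * h / (1 - h) ** 2
    return h, cooks


x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([2.0, 3.0, 5.0, 4.0, 30.0])

h_raw, d_raw = leverage_and_cooks(X, y)
h_log, d_log = leverage_and_cooks(X, np.log(y))

print(np.allclose(h_raw, h_log))  # True: leverages depend only on X
print(np.allclose(d_raw, d_log))  # False: Cook's distances change with y
```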
14. (8 pts) In the cell means model parameterization of one-way ANOVA, the regression design matrix \(\mathrm{X}\) is such that \(X^{\prime} X = nI\), where \(I\) is the identity matrix. If one applies ridge regression shrinkage to the regression, thereby producing the parameter estimates via the formula
\[\hat{\beta}_{\text{ridge}} = \left(X^{\prime} X + \lambda I\right)^{-1} X^{\prime} y\]
a. What will be the resulting cell means estimates \(\hat{\mu}_{i}\) (\(i = 1, \ldots, p\)) based on the regression parameter estimates \(\hat{\beta}_{\text{ridge}}\)?
b. Explain how ridge regression helps address the problem of multicollinearity.
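Because \(X^{\prime}X = nI\) here, \(\left(X^{\prime}X + \lambda I\right)^{-1} = I/(n + \lambda)\), and in the cell means model \(X^{\prime}y\) is the vector of group totals \(n\bar{y}_i\). Each ridge estimate is therefore the group mean shrunk toward zero by the factor \(n/(n + \lambda)\). A numerical check with hypothetical balanced data (\(n = 4\) per group, \(p = 3\) groups, \(\lambda = 2\)):

```python
import numpy as np

n, p, lam = 4, 3, 2.0
# Cell means design: column i is the indicator of group i, so X'X = n * I
X = np.kron(np.eye(p), np.ones((n, 1)))
y = np.array([5.0, 6.0, 7.0, 6.0,   # group 1
              9.0, 8.0, 10.0, 9.0,  # group 2
              2.0, 3.0, 1.0, 2.0])  # group 3

beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
group_means = y.reshape(p, n).mean(axis=1)

print(beta_ridge)                   # approximately 4, 6, 1.333
print(n / (n + lam) * group_means)  # same values
```

Adding \(\lambda I\) also keeps \(X^{\prime}X + \lambda I\) invertible and better conditioned when columns of \(X\) are nearly collinear, which is why the same formula is used beyond this orthogonal case.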
15. (6 pts) You run a one-way ANOVA with three groups and calculate an F-statistic, say \(F_{3-1}\), for comparing the means of groups 1 and 3. An examination of the residuals reveals a very large outlier in group 2. It turns out that this outlier was due to data mishandling, so a sensible choice would be to drop it from the analysis. If you rerun the analysis without the outlier, what change in \(F_{3-1}\) and its p-value do you expect? Explain why you expect this change.
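The comparison of groups 1 and 3 uses the pooled MSE from all three groups, \(F_{3-1} = (\bar{y}_3 - \bar{y}_1)^2 / \left[\mathrm{MSE}\,(1/n_1 + 1/n_3)\right]\), so an outlier in group 2 inflates the denominator even though it does not affect \(\bar{y}_1\) or \(\bar{y}_3\). A sketch on made-up data, with an illustrative helper function:

```python
import numpy as np


def f_1_vs_3(groups):
    """F statistic for H0: mu1 = mu3 using the pooled MSE from all groups."""
    n_total = sum(len(g) for g in groups)
    sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
    mse = sse / (n_total - len(groups))
    g1, g3 = groups[0], groups[2]
    diff = g3.mean() - g1.mean()
    return diff ** 2 / (mse * (1 / len(g1) + 1 / len(g3)))


g1 = np.array([10.0, 12.0, 11.0, 13.0])
g2_with = np.array([20.0, 21.0, 19.0, 60.0])  # 60 is the mishandled value
g2_without = np.array([20.0, 21.0, 19.0])
g3 = np.array([15.0, 16.0, 14.0, 17.0])

f_with = f_1_vs_3([g1, g2_with, g3])
f_without = f_1_vs_3([g1, g2_without, g3])
print(round(f_with, 3), round(f_without, 3))  # 0.238 21.333
```

The error degrees of freedom also drop from 9 to 8, but the large fall in MSE dominates, so \(F_{3-1}\) rises and its p-value falls.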
Deliverable: Word Document
