Problem: A substance used in biological and medical research is shipped by airfreight to users in cartons of 1000 ampules. The data below, involving 6 shipments, were collected on the number of times the carton was transferred from one aircraft to another over the shipment route (X) and the number of ampules found to be broken upon arrival (Y). Assume that a first order regression model (i.e. no quadratic or higher terms included) is appropriate.
| i | 1 | 2 | 3 | 4 | 5 | 6 |
| X_i | 1 | 0 | 2 | 1 | 0 | 2 |
| Y_i | 3 | 4 | 5 | 3 | 4 | 5 |
- Using matrix algebra obtain the estimated regression line. (10 points)
- Conduct a t-test and an F-test with α = 0.05 to decide whether there is a linear association between Y and X. State the hypotheses and the distribution of the test statistic under H0. Compare the results of the two tests. (5 points)
- How much of the variance of Y can be explained by our model? (5 points)
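The matrix computation asked for above can be sketched in plain Python. This is only an illustrative sketch, not an official solution: the variable names (`xtx`, `xty`, `xtx_inv`, and so on) are chosen here for illustration, and the 2x2 inverse is written out by hand rather than taken from a linear algebra library.

```python
# Data from the problem statement: transfers (X) and broken ampules (Y).
X = [1, 0, 2, 1, 0, 2]
Y = [3, 4, 5, 3, 4, 5]
n = len(X)

# X'X and X'Y for the design matrix whose rows are [1, X_i].
sum_x = sum(X)
sum_xx = sum(x * x for x in X)
xtx = [[n, sum_x], [sum_x, sum_xx]]
xty = [sum(Y), sum(x * y for x, y in zip(X, Y))]

# Closed-form inverse of the 2x2 matrix X'X.
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
xtx_inv = [[xtx[1][1] / det, -xtx[0][1] / det],
           [-xtx[1][0] / det, xtx[0][0] / det]]

# Least squares estimates: b = (X'X)^{-1} X'Y.
b0 = xtx_inv[0][0] * xty[0] + xtx_inv[0][1] * xty[1]
b1 = xtx_inv[1][0] * xty[0] + xtx_inv[1][1] * xty[1]

# Sums of squares for the ANOVA decomposition.
y_bar = sum(Y) / n
fitted = [b0 + b1 * x for x in X]
sse = sum((y - f) ** 2 for y, f in zip(Y, fitted))
ssto = sum((y - y_bar) ** 2 for y in Y)
ssr = ssto - sse
mse = sse / (n - 2)

# Test statistics for H0: beta1 = 0 and the coefficient of determination.
s_b1 = (mse * xtx_inv[1][1]) ** 0.5   # s{b1} from the diagonal of MSE * (X'X)^{-1}
t_stat = b1 / s_b1
f_stat = ssr / mse                    # MSR = SSR / 1
r_squared = ssr / ssto

print(f"Y-hat = {b0:.2f} + {b1:.2f} X")
print(f"t* = {t_stat:.4f}, F* = {f_stat:.4f}, R^2 = {r_squared:.2f}")
```

In simple linear regression the two statistics are tied together by t*² = F*, so the t-test and the F-test lead to the same decision. The statistics would be compared with the usual table values t(0.975; 4) = 2.776 and F(0.95; 1, 4) = 7.71.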
Problem: Another researcher replicated the study using different data, a different n, and two additional variables that measure the strength of the carton material (X2) and a resistance factor of the glass of the ampules (X3).
Below are some outputs of different models the researcher fitted to the data.
ANOVA
| Model | Source | Sum of Squares | df | Mean Square | F | Sig. |
| 1 | Regression | 160.000 | 1 | 160.000 | 72.727 | .000 |
| | Residual | 17.600 | 8 | 2.200 | | |
| | Total | 177.600 | 9 | | | |
a Predictors: (Constant), X1
b Dependent Variable: Y
ANOVA
| Model | Source | Sum of Squares | df | Mean Square | F | Sig. |
| 1 | Regression | 160.417 | 2 | 80.208 | 32.675 | .000 |
| | Residual | 17.183 | 7 | 2.455 | | |
| | Total | 177.600 | 9 | | | |
a Predictors: (Constant), X2, X1
b Dependent Variable: Y
Coefficients
| Model | Term | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
| 1 | (Constant) | 11.283 | 2.721 | | 4.146 | .004 |
| | X1 | 3.958 | .506 | .939 | 7.828 | .000 |
| | X2 | -.417 | 1.011 | -.049 | -.412 | .693 |
a Dependent Variable: Y
ANOVA
| Model | Source | Sum of Squares | df | Mean Square | F | Sig. |
| 1 | Regression | 160.993 | 3 | 53.664 | 19.388 | .002 |
| | Residual | 16.607 | 6 | 2.768 | | |
| | Total | 177.600 | 9 | | | |
a Predictors: (Constant), X3, X1, X2
b Dependent Variable: Y
a. Find SSR(X2 | X1) and SSR(X2, X3 | X1). (5 points)
b. The researcher fitted a full model including all predictors. Use extra sums of squares to decide whether both X2 and X3 can be dropped simultaneously, α = .05. State the hypotheses and the distribution of the test statistic under H0. (5 points)
c. Find \[R^2_{Y2|1}\], \[r_{Y2|1}\], and \[R^2_{Y3|12}\]. (5 points)
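The extra sums of squares in parts a to c follow from differences between the ANOVA outputs above. The sketch below is an illustration under the assumption that the three outputs are nested fits on the same 10 observations (the Total rows agree on SSTO = 177.600 with 9 df). The variable names are made up for illustration.

```python
# Regression and residual sums of squares read from the three ANOVA outputs.
ssr_1, sse_1 = 160.000, 17.600        # predictors: X1
ssr_12, sse_12 = 160.417, 17.183      # predictors: X1, X2
ssr_123, sse_123 = 160.993, 16.607    # predictors: X1, X2, X3
df_full = 6                           # residual df of the full model

# a. Extra sums of squares.
ssr_2_given_1 = ssr_12 - ssr_1        # SSR(X2 | X1)
ssr_23_given_1 = ssr_123 - ssr_1      # SSR(X2, X3 | X1)

# b. Partial F statistic for H0: beta2 = beta3 = 0.
# Under H0 it follows an F distribution with (2, 6) degrees of freedom.
f_partial = (ssr_23_given_1 / 2) / (sse_123 / df_full)

# c. Coefficients of partial determination and partial correlation.
r2_y2_given_1 = ssr_2_given_1 / sse_1
b2 = -0.417                           # sign of r_{Y2|1} matches the X2 coefficient
r_y2_given_1 = (r2_y2_given_1 ** 0.5) * (1 if b2 > 0 else -1)
r2_y3_given_12 = (ssr_123 - ssr_12) / sse_12

print(f"SSR(X2|X1) = {ssr_2_given_1:.3f}, SSR(X2,X3|X1) = {ssr_23_given_1:.3f}")
print(f"F* = {f_partial:.4f}")
print(f"R2_Y2|1 = {r2_y2_given_1:.4f}, r_Y2|1 = {r_y2_given_1:.4f}, "
      f"R2_Y3|12 = {r2_y3_given_12:.4f}")
```

The partial F statistic would be compared with the table value F(0.95; 2, 6) = 5.14.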
Problem: Judge whether the following statements are correct. If a statement is incorrect, correct it. If it is correct, indicate this. Justify your answer in three sentences or fewer.
A matrix which is idempotent is its own transpose.
Problem: Judge whether the following statements are correct. If a statement is incorrect, correct it. If it is correct, indicate this. Justify your answer in three sentences or fewer.
A series of t-tests for the βk's is equivalent to performing an F test for multiple βk's if we use Bonferroni correction.
Problem: Judge whether the following statements are correct. If a statement is incorrect, correct it. If it is correct, indicate this. Justify your answer in three sentences or fewer.
Confidence limits for an average response are more sensitive to departures from normality than prediction limits for new observations.
Problem: Judge whether the following statements are correct. If a statement is incorrect, correct it. If it is correct, indicate this. Justify your answer in three sentences or fewer.
A paired-sample t-test will have greater power than a two independent samples t-test.
Problem: Judge whether the following statements are correct. If a statement is incorrect, correct it. If it is correct, indicate this. Justify your answer in three sentences or fewer.
For multiple regression, staying within the ranges of each of the predictor variables will avoid extrapolation.
Question #3: Diagnostics (8 points total)
a) Can you diagnose any problem regarding the fitted model from the residual plot a?
What kind of remedies do you suggest? (4 points)
b) Can you diagnose any problem regarding the fitted model from the residual plot b?
What kind of remedies do you suggest? (4 points)
Question #4: Finger Dexterity (20 points)
A researcher is interested in the relationship between the measure of finger dexterity (X) and another measure representing general muscular coordination (Y). A random sample of 10 subjects was collected.
| X | Y |
| 75 | 84 |
| 77 | 94 |
| 75 | 90 |
| 76 | 90 |
| 73 | 87 |
| 72 | 80 |
| 77 | 100 |
| 76 | 91 |
| 74 | 85 |
| 75 | 99 |
From the table above, the standard error of the intercept is 69.29540.
\[b_0 = ?\] \[s\{b_0\} = 69.30\] \[b_1 = 2.92\] \[s\{b_1\} = 0.92\]
a) Given that the predicted Y at X = 76 is 93.17, what is \[b_0\]? (3 points)
b) Complete the ANOVA table below. (6 points)
ANOVA
| Source | Degrees of freedom | Sum of squares | Mean square | F-statistic |
| Regression | 1 | 204.17 | 204.17 | 9.969 |
| Error | 8 | 163.83 | 20.48 | |
| Total | 9 | 368 | | |
c) Construct 95% confidence interval for the slope. (5 points)
d) Perform an F test for lack of fit. State the null and alternative hypotheses, perform the test, and state your conclusion. Use \[\alpha = 0.05\]. (6 points)
Given: \[SSE=163.83\] \[SSPE=132.50\]
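The quantities used in parts a to d can be recomputed from the ten (X, Y) pairs in the table. The following is a rough sketch, not an official answer key. The critical values `T_CRIT` and `F_CRIT` are hard-coded from standard t and F tables, and all names are illustrative. Because it works from unrounded estimates, some results may differ in the last digits from the rounded values quoted in the question.

```python
from collections import defaultdict

# Finger dexterity (X) and muscular coordination (Y) from the table.
X = [75, 77, 75, 76, 73, 72, 77, 76, 74, 75]
Y = [84, 94, 90, 90, 87, 80, 100, 91, 85, 99]
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n
sxx = sum((x - x_bar) ** 2 for x in X)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))

# Least squares estimates.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Error sum of squares and the standard error of the slope.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))
mse = sse / (n - 2)
s_b1 = (mse / sxx) ** 0.5

# c) 95% confidence interval for the slope.
T_CRIT = 2.306  # t(0.975; 8), table value
ci_slope = (b1 - T_CRIT * s_b1, b1 + T_CRIT * s_b1)

# d) Lack of fit: pure error comes from replicates at the same X level.
groups = defaultdict(list)
for x, y in zip(X, Y):
    groups[x].append(y)
sspe = sum(
    sum((y - sum(ys) / len(ys)) ** 2 for y in ys) for ys in groups.values()
)
c = len(groups)              # number of distinct X levels
sslf = sse - sspe            # lack-of-fit sum of squares
f_lof = (sslf / (c - 2)) / (sspe / (n - c))
F_CRIT = 6.39                # F(0.95; 4, 4), table value

print(f"b0 = {b0:.2f}, b1 = {b1:.4f}, s(b1) = {s_b1:.4f}")
print(f"95% CI for slope: ({ci_slope[0]:.3f}, {ci_slope[1]:.3f})")
print(f"SSPE = {sspe:.2f}, SSLF = {sslf:.2f}, F* = {f_lof:.4f}")
print("Reject H0 (linear fit adequate)?", f_lof > F_CRIT)
```

With 6 distinct X levels and n = 10, the lack-of-fit statistic has c - 2 = 4 numerator and n - c = 4 denominator degrees of freedom.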
Question #5: Model Design (7 points total)
(a) You have conducted a survey in an introduction to statistics class. You obtained information from 60 participants, such as gender and year of study. You also collected data on the number of hours per week spent studying, the number of homework assignments turned in, and the midterm exam scores. You are interested in constructing a regression model to predict students’ final exam scores.
State a regression model (for simplicity, assume there are no interactions). (4 points)
(b) Indicate the type of every variable in the model. (3 points)
Question #6: Multiple Choice (2 points each, 20 points total)
Choose the best answer for each of the following.
1. When planning a study with two independent samples in which you intend to use the pooled-variance t-test, you should be confident that:
A) the sample variances, s², are the same.
B) the population variances are the same.
C) the degrees of freedom are as low as possible.
D) Both A and B.
2. You have found a statistically significant linear relationship between variables X and Y in a simple linear regression. In fact, you have a very small p-value.
A) You can state you have a strong relationship between X and Y.
B) You can state that you have strong support for a linear relationship between X and Y.
C) The slope of the regression line is closer to 1 than to 0.
D) All of the above.
3. Place the following approaches to calculating degrees of freedom for a two independent sample t-test into order of increasing power:
A) pooled two sample, conservative estimate, Welch-Satterthwaite solution.
B) Welch-Satterthwaite solution, conservative estimate, pooled two sample.
C) Welch-Satterthwaite solution, pooled two sample, conservative estimate (smaller).
D) conservative estimate, Welch-Satterthwaite solution, pooled two sample.
4. A simple linear regression model assumes that:
A) errors are independent and mutually exclusive.
B) both the predictor and response variables are random variables.
C) errors have identical variance.
D) all of the above.
5. In the phrase "general linear regression model," linear means:
A) predictor variables are not raised to a power.
B) the response surface is linear, or flat.
C) the beta parameters are linear.
D) the response variable is linear.
6. The assumption of normality in the errors of a general linear model allows us to:
A) state that our estimators for our betas are of minimum variance.
B) state that our estimators for our betas are unbiased.
C) use least squares to find our beta estimates.
D) state that our errors are uncorrelated.
7. Multicollinearity results in:
A) bias in the estimates of Y-hats.
B) explosion in the size of the determinant of \[(X'X)^{-1}\].
C) bouncing betas.
D) all of the above.
8. If you wish to make a prediction for a new observation, you should:
A) arrange your study so the new observation is in a dense cluster of instances of predictors.
B) arrange your study so the new observation is as close as possible to the y-intercept.
C) arrange your study so the new observation is in the center of the instances of predictors.
D) minimize SSTO by clustering instances of predictors.
9. For multiple regression, if you add predictor variables, \[R^2\] adjusted will trend
A) down, while SSE goes down.
B) up, while SSE goes up.
C) neither up nor down, while SSE goes down.
D) neither up nor down, and SSE remains the same.
10. The normal equations
A) provide the basis for the assumption of normality in linear regression.
B) provide a mechanism to find beta estimates.
C) are valid only under the assumption of non-zero betas.
D) all of the above.
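As a numerical illustration of the quantities in question 9, the three ANOVA outputs from the earlier problem can be reused to track SSE and adjusted R² as predictors are added. This sketch assumes those outputs are nested fits on the same 10 observations. It shows what happens in that particular output only, not a general rule, and the names are illustrative.

```python
n = 10
ssto = 177.600

# (label, number of parameters including the intercept, SSE) per fitted model.
models = [
    ("X1", 2, 17.600),
    ("X1, X2", 3, 17.183),
    ("X1, X2, X3", 4, 16.607),
]

adjusted = []
for label, p, sse in models:
    r2 = 1 - sse / ssto
    # Adjusted R^2 penalizes each added parameter through the residual df.
    r2_adj = 1 - (sse / (n - p)) / (ssto / (n - 1))
    adjusted.append(r2_adj)
    print(f"{label:<12} SSE = {sse:.3f}  R^2 = {r2:.4f}  adj R^2 = {r2_adj:.4f}")
```

In this output SSE shrinks with every added predictor, while adjusted R² moves the other way because the small reductions in SSE do not offset the lost degrees of freedom.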
