1) The data below are also contained in the emailed file gasbaby.txt. For each of 14 consecutive months, the variables shown are
gaslead = total gasoline lead in metric tons sold in the state of Massachusetts
bloodlead = mean lead concentration (micrograms/liter) in babies' blood (umbilical cords) the following month at a major Boston hospital.
Month gaslead bloodlead
1 141 6.4
2 166 6.1
3 161 5.7
4 170 6.9
5 148 7.0
6 136 7.2
7 169 6.6
8 109 5.7
9 117 5.7
10 87 5.3
11 105 4.9
12 73 5.4
13 82 4.5
14 75 6.0
For answering the questions below, you may assume that the assumptions of the Normal-errors Simple Linear Regression (SLR) model (using bloodlead as the response variable, gaslead as the regressor) have been checked and deemed reasonable.
- Construct and interpret a 95% confidence interval for the slope of the true regression line, $\beta_1$. Your interpretation should be phrased in such a way as to be understandable to as wide an audience as possible.
- Is there strong evidence here that $\beta_1 > 0$? Provide hypotheses, a test statistic, P-value, and interpretation understandable to as wide an audience as possible.
- If in fact the error standard deviation for this SLR is $\sigma = 0.6$, and the true slope of the regression line is $\beta_1 = 0.01$, what is the power of the test of the hypotheses of part (b) if using $\alpha = 0.05$?
- If in the first month following this study, 150 metric tons of lead in gasoline were sold in Massachusetts, obtain an interval which should contain (with 90% confidence) the mean lead concentration of babies’ umbilical cord blood the following month at this Boston hospital. Interpret your interval for as broad an audience as possible.
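As a starting point for the slope questions above, the least-squares quantities can be computed directly from the 14 data rows. The sketch below is only one way to organize the arithmetic, using plain Python. The variable names are illustrative, and the critical value `T_975_DF12` is hard-coded from a standard t table rather than computed, which is an assumption of this sketch.

```python
import math

# Data transcribed from the table above (months 1-14).
gaslead = [141, 166, 161, 170, 148, 136, 169, 109, 117, 87, 105, 73, 82, 75]
bloodlead = [6.4, 6.1, 5.7, 6.9, 7.0, 7.2, 6.6, 5.7, 5.7, 5.3, 4.9, 5.4, 4.5, 6.0]

n = len(gaslead)
x_bar = sum(gaslead) / n
y_bar = sum(bloodlead) / n

# Corrected sum of squares for x and corrected cross-product sum.
s_xx = sum((x - x_bar) ** 2 for x in gaslead)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(gaslead, bloodlead))

# Least-squares estimates of the slope and intercept.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residual mean square on n - 2 degrees of freedom, then the slope's standard error.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(gaslead, bloodlead))
mse = sse / (n - 2)
se_b1 = math.sqrt(mse / s_xx)

# Assumption: 0.975 quantile of Student's t with 12 df, taken from a standard table.
T_975_DF12 = 2.179

ci_low = b1 - T_975_DF12 * se_b1
ci_high = b1 + T_975_DF12 * se_b1
t_stat = b1 / se_b1  # test statistic for H0: beta_1 = 0

print(f"b1 = {b1:.5f}, se(b1) = {se_b1:.5f}, t = {t_stat:.2f}")
print(f"95% CI for beta_1: ({ci_low:.5f}, {ci_high:.5f})")
```

The P-value, the power calculation, and the interval for the mean response at 150 metric tons need t-distribution probabilities, which this sketch leaves to statistical software or tables.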
2) Independent random samples of (1) chemical diabetics and (2) "normal" patients were selected, and blood sugar levels measured (units unknown). The normal population assumption was deemed reasonable for both samples. Summary statistics for the samples are shown below.
| Population | Sample size | Mean | Standard deviation |
| 1. Chemical diabetic | 36 | 99.306 | 9.489 |
| 2. Normal | 76 | 91.184 | 8.228 |
- Construct and interpret, for a general audience, a 95% confidence interval for 1 .
- Construct and interpret, for a general audience, a 95% confidence interval for 1 .
- Test the hypothesis of homoscedasticity between these two populations. Provide a test statistic, P-value, and an interpretation for a fellow statistician.
- Is there strong evidence here that $\mu_1 > \mu_2$? Test, giving a test statistic, P-value, and interpretation for a general audience; please assume homoscedasticity holds.
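The test statistics for the last two parts above depend only on the summary statistics, so they can be checked with a few lines of arithmetic. The sketch below assumes the first table row (n = 36) belongs to the chemical diabetics and takes the variance ratio with sample 1 in the numerator. Both choices and all variable names are illustrative.

```python
import math

# Summary statistics from the table above.
n1, mean1, sd1 = 36, 99.306, 9.489  # sample 1: chemical diabetics (assumed row order)
n2, mean2, sd2 = 76, 91.184, 8.228  # sample 2: "normal" patients

# Variance ratio for comparing the two population variances,
# referred to an F distribution with (n1 - 1, n2 - 1) degrees of freedom.
f_ratio = sd1 ** 2 / sd2 ** 2
df_f = (n1 - 1, n2 - 1)

# Pooled variance under the equal-variance (homoscedastic) assumption.
df_pooled = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df_pooled

# Pooled two-sample t statistic for H0: mu_1 = mu_2 against mu_1 > mu_2.
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (mean1 - mean2) / se_diff

print(f"F = {f_ratio:.3f} on {df_f} df")
print(f"pooled t = {t_stat:.3f} on {df_pooled} df")
```

P-values still have to come from F and t tables or statistical software.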
3) Below is a cumulative frequency table of ages for a class of 50 students in Statistics. That is, for each age, the "Cum. Freq." value gives the number of students in the class who are that age or younger.
| Age | Cum. Freq. |
| 17 | 1 |
| 18 | 17 |
| 19 | 30 |
| 20 | 40 |
| 21 | 46 |
| 22 | 50 |
- What is the median age of the students in this class?
- What is the mean age of the students in this class?
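One way to approach both questions is to difference the cumulative counts to recover the per-age frequencies, then summarize the resulting list of 50 ages. A minimal sketch, with illustrative variable names:

```python
from statistics import mean, median

# Cumulative frequencies from the table above: age -> number of students that age or younger.
cum_freq = {17: 1, 18: 17, 19: 30, 20: 40, 21: 46, 22: 50}

# Difference consecutive cumulative counts to get the frequency of each age.
freq = {}
previous = 0
for age in sorted(cum_freq):
    freq[age] = cum_freq[age] - previous
    previous = cum_freq[age]

# Expand into one entry per student, then summarize.
ages = [age for age, count in freq.items() for _ in range(count)]
median_age = median(ages)
mean_age = mean(ages)

print(freq)
print(f"median = {median_age}, mean = {mean_age}")
```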
4) Consider a basic simple linear regression experiment with ten observations on equally spaced X-values, $X_1 = 11, X_2 = 12, X_3 = 13, \ldots, X_{10} = 20$. Suppose that the Normal-errors SLR assumptions hold for the response variable Y, and the error standard deviation is $\sigma = 5$. How many times should this 10-run experiment be repeated in order to estimate the SLR slope parameter $\beta_1$ to within approximately 0.5 with a 95% confidence interval? (Of course, once the experiment size has been decided, all runs will be done in completely random order.)
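A rough planning calculation for this kind of question uses the fact that replicating the design r times multiplies the corrected sum of squares of X by r, so the standard error of the slope is sigma divided by the square root of r times that sum. The sketch below makes two assumptions: it reads "accurate to within 0.5" as a confidence-interval half-width of at most 0.5, and it uses the Normal quantile in place of the t quantile. The names are illustrative.

```python
import math
from statistics import NormalDist

x_values = list(range(11, 21))  # X = 11, 12, ..., 20
sigma = 5.0                     # error standard deviation given in the problem
target_half_width = 0.5         # assumed reading of "accurate to within 0.5"

# Corrected sum of squares of X for a single 10-run experiment.
x_bar = sum(x_values) / len(x_values)
s_xx_single = sum((x - x_bar) ** 2 for x in x_values)

# Assumption: Normal approximation to the t quantile for a 95% interval.
z = NormalDist().inv_cdf(0.975)


def half_width(replicates):
    """Approximate 95% CI half-width for the slope with the design run `replicates` times."""
    return z * sigma / math.sqrt(replicates * s_xx_single)


# Smallest total number of runs of the 10-point design that meets the target.
replicates = 1
while half_width(replicates) > target_half_width:
    replicates += 1

print(f"Sxx per experiment = {s_xx_single}, total experiments needed = {replicates}")
```

Here `replicates` counts the total number of times the 10-run design is carried out, including the first. Replacing `z` with the t quantile on 10r - 2 degrees of freedom would give a slightly wider interval for each r.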
5) An old-fashioned way to examine SLR assumptions is to plot residuals $e_i$ versus responses $Y_i$, $i = 1, 2, \ldots, n$. Suppose that you are serving as a consultant to a scientist who has carefully and apparently correctly performed a regression experiment and obtained the plot below. He is concerned that the plot suggests non-random behavior in the regression's errors. What would you tell him?
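A small simulation can help when thinking about this kind of plot. In a least-squares fit, the residuals are uncorrelated with the fitted values but not with the responses: the sample correlation between $e_i$ and $Y_i$ equals $\sqrt{1 - R^2}$, even when the model is exactly right. The sketch below checks this on simulated data. The data-generating values are hypothetical and are not taken from the scientist's experiment.

```python
import math
import random

random.seed(0)  # fixed seed so the illustration is reproducible

# Hypothetical data from a correctly specified SLR with independent Normal errors.
x = [float(i) for i in range(1, 31)]
y = [2.0 + 0.5 * xi + random.gauss(0.0, 3.0) for xi in x]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

sse = sum(e ** 2 for e in residuals)
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - sse / sst


def correlation(u, v):
    """Pearson sample correlation between two equal-length sequences."""
    u_bar = sum(u) / len(u)
    v_bar = sum(v) / len(v)
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


corr_e_y = correlation(residuals, y)
corr_e_fitted = correlation(residuals, fitted)

print(f"corr(e, Y)     = {corr_e_y:.4f}")
print(f"sqrt(1 - R^2)  = {math.sqrt(1 - r_squared):.4f}")
print(f"corr(e, Y-hat) = {corr_e_fitted:.2e}")
```

An upward trend in a plot of residuals against responses is therefore expected by construction, which is one reason residuals are more often plotted against fitted values.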
6) Carefully justify the following statement in complete generality: if t is any random variable following a (central) Student's t distribution with m degrees of freedom, where m is an integer, then $F = t^2$ has an F distribution with 1 and m degrees of freedom.
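One possible outline of the argument is sketched below. It assumes the usual constructive definitions of the t and F distributions, which may differ from those used in the course. The symbols $Z$ and $V$ are introduced here for illustration only.

```latex
\begin{align*}
t &\overset{d}{=} \frac{Z}{\sqrt{V/m}},
  \qquad Z \sim N(0,1),\; V \sim \chi^2_m,\; Z \text{ independent of } V, \\
t^2 &\overset{d}{=} \frac{Z^2}{V/m} = \frac{Z^2/1}{V/m},
  \qquad Z^2 \sim \chi^2_1,\; Z^2 \text{ independent of } V, \\
\Rightarrow\quad F = t^2 &\sim F_{1,\,m}.
\end{align*}
```

The last line uses the definition of an $F_{1,m}$ variable as the ratio of two independent chi-square variables, each divided by its degrees of freedom. A careful justification would also state why $Z^2 \sim \chi^2_1$ and why $Z^2$ remains independent of $V$.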
Deliverable: Word Document
