STATS
For Questions 1 and 2, use the following R commands to download and read the data into R, or follow the links provided in the questions below.
Question 1 data:
Q1data = read.csv("Devore/TH/q1.txt")
Question 2 data:
Q2data = read.csv("TH/q2.txt")
Question 1: To study the relationship between soil pH (x) and A1 (y), use the data found here.
- a) Find the best polynomial model for these data.
- b) Using the exponential family of transformations of x and/or y, fit the appropriate model.
- c) Which of the models in parts a) and b) is better, and how do you know?
- d) Find a 94% confidence interval for each of the \(\beta_j\)'s in your model from part a), and interpret them.
- e) Find a 93% prediction interval for y when x = 4.25 using the model in part b). Does the point estimate of y at x = 4.25 from model a) fall in this interval?
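The Question 1 workflow can be sketched in R as follows. This is a minimal sketch under stated assumptions, not a model answer: it assumes the data frame has columns named `x` (soil pH) and `y` (the response), both strictly positive; `q1_sketch` is a hypothetical helper name; and the quadratic default and the log-log transformation are illustrative choices, so other degrees or transformations may fit better.

```r
# Hypothetical helper for Question 1 (a sketch, not a model answer).
# ASSUMPTION: `dat` has numeric columns x (pH) and y (response), all > 0.
q1_sketch <- function(dat, degree = 2) {
  # a) polynomial model in raw powers of x; the degree is an assumption,
  #    so compare several degrees (e.g. with anova() or adjusted R^2)
  poly_fit <- lm(y ~ poly(x, degree, raw = TRUE), data = dat)

  # b) one candidate transformation: log(y) regressed on log(x)
  trans_fit <- lm(log(y) ~ log(x), data = dat)

  # d) 94% confidence intervals for the polynomial coefficients
  ci94 <- confint(poly_fit, level = 0.94)

  # e) 93% prediction interval at x = 4.25 from the transformed model,
  #    back-transformed to the original y scale with exp()
  new_x <- data.frame(x = 4.25)
  pi93 <- exp(predict(trans_fit, newdata = new_x,
                      interval = "prediction", level = 0.93))

  # point estimate of y at x = 4.25 from the polynomial model
  y_hat_poly <- unname(predict(poly_fit, newdata = new_x))

  list(poly_fit = poly_fit, trans_fit = trans_fit, ci94 = ci94,
       pi93 = pi93,
       poly_in_pi = unname(y_hat_poly >= pi93[, "lwr"] &
                           y_hat_poly <= pi93[, "upr"]))
}
```

Usage would look like `res <- q1_sketch(Q1data)` after renaming the columns if needed; `res$poly_in_pi` is `TRUE` when the polynomial point estimate lies inside the back-transformed 93% prediction interval. Back-transforming the interval endpoints with `exp()` is valid because `exp()` is monotone.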
Question 2: The pull strength of a wire bond is an important characteristic. The data found here give information on pull strength (y), die height (\(x_1\)), post height (\(x_2\)), loop height (\(x_3\)), wire length (\(x_4\)), bond width on the die (\(x_5\)), and bond width on the post (\(x_6\)).
- a) Analyze these data to find the linear model that best fits them (give details on how you decided which model is best).
- b) Report the amount of variation in y explained by the model you chose in part a).
- c) Report your estimate of
- d) Find a 97% confidence interval for each of the \(\beta_j\)'s in your model, and interpret them.
- e) Holding all else fixed, how does a one-unit change in \(x_4\) change the average value of y?
- f) For a specimen with \(x_1 = 5.5\), \(x_2 = 19.3\), \(x_3 = 30.2\), \(x_4 = 90\), \(x_5 = 2\), and \(x_6 = 1.85\), find the predicted value of y (hint: you may not need all of these values, since your model may not include all of these variables).
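A comparable R sketch for Question 2 is below. It assumes the data frame's columns are named `y` and `x1` through `x6` (hypothetical names), and `q2_sketch` is a hypothetical helper. It uses backward elimination by AIC through `step()` as one possible selection rule; adjusted \(R^2\), Mallows' \(C_p\), or partial t/F tests are common alternatives, and the assignment may expect a different criterion.

```r
# Hypothetical helper for Question 2 (a sketch, not a model answer).
# ASSUMPTION: `dat` has numeric columns y, x1, x2, x3, x4, x5, x6.
q2_sketch <- function(dat) {
  full_fit <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = dat)

  # a) one possible selection rule: backward elimination by AIC
  best_fit <- step(full_fit, direction = "backward", trace = 0)

  # b) proportion of the variation in y explained by the chosen model
  r2 <- summary(best_fit)$r.squared

  # residual standard error s, the usual estimate of the error SD
  sigma_hat <- sigma(best_fit)

  # d) 97% confidence intervals for the retained coefficients
  ci97 <- confint(best_fit, level = 0.97)

  # e) estimated change in mean y per one-unit increase in x4, others fixed
  #    (0 if x4 was dropped from the chosen model)
  b <- coef(best_fit)
  x4_effect <- if ("x4" %in% names(b)) b[["x4"]] else 0

  # f) predicted y for the specimen; predict() uses only the columns it needs
  specimen <- data.frame(x1 = 5.5, x2 = 19.3, x3 = 30.2,
                         x4 = 90, x5 = 2, x6 = 1.85)
  y_pred <- unname(predict(best_fit, newdata = specimen))

  list(best_fit = best_fit, r2 = r2, sigma_hat = sigma_hat,
       ci97 = ci97, x4_effect = x4_effect, y_pred = y_pred)
}
```

Passing every specimen value to `predict()` is harmless: variables dropped from the chosen model are simply ignored.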
Questions 3–12 are multiple-choice questions. Write your answer in the blank provided in front of each question number.
___C______3. In a single-factor ANOVA problem involving five populations or treatments, which of the following statements is true about the alternative hypothesis?
- A. All five population means are equal.
- B. All five population means are different.
- C. At least two of the population means are different.
- D. At least three of the population means are different.
- E. At most two of the population means are equal.
___D______4. The distribution of the test statistic in single-factor ANOVA is the
- A. binomial distribution
- B. normal distribution
- C. t distribution
- D. F distribution
- E. None of the above answers are correct.
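Questions 3 and 4 rest on the single-factor ANOVA F test: the alternative is that at least two treatment means differ, and under \(H_0\) (all \(I\) means equal) the ratio MSTr/MSE follows an F distribution with \(I - 1\) and \(n - I\) degrees of freedom. The R sketch below computes that statistic by hand; `anova_f_by_hand` is a hypothetical helper name, and its output can be checked against `anova(lm(y ~ factor(group)))`.

```r
# Single-factor ANOVA F statistic computed from its definition.
anova_f_by_hand <- function(y, group) {
  group <- factor(group)
  I <- nlevels(group)                 # number of treatments
  n <- length(y)                      # total number of observations
  group_means <- tapply(y, group, mean)
  n_i <- tapply(y, group, length)

  sstr <- sum(n_i * (group_means - mean(y))^2)          # treatment SS
  sse  <- sum((y - group_means[as.integer(group)])^2)   # error SS

  f <- (sstr / (I - 1)) / (sse / (n - I))               # MSTr / MSE
  p <- pf(f, df1 = I - 1, df2 = n - I, lower.tail = FALSE)
  c(F = f, p = p)
}
```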
____E_____5. In the simple linear regression model \(Y = \beta_0 + \beta_1 x + \varepsilon\), which of the following statements is not a required assumption about the random error term \(\varepsilon\)?
- A. The expected value of \(\varepsilon\) is zero.
- B. The variance of \(\varepsilon\) is the same for all values of the independent variable x.
- C. The error term is normally distributed.
- D. The values of the error term are independent of one another.
- E. All of the above are required assumptions about \(\varepsilon\).
____D_____6. Which of the following statements is not true?
- A. The predicted value \(\hat{y}_i\) is the value of y that we would predict or expect when using the estimated regression line with \(x = x_i\).
- B. The predicted value \(\hat{y}_i\) is the height of the estimated regression line above the value \(x_i\) for which the \(i\)th observation was made.
- C. The residual \(y_i - \hat{y}_i\) is the difference between the observed \(y_i\) and the predicted \(\hat{y}_i\).
- D. If the residuals are all large in magnitude, then much of the variability in observed y values appears to be due to the linear relationship between x and y, whereas many small residuals suggest quite a bit of inherent variability in y relative to the amount due to the linear relation.
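The definitions in Question 6 can be checked numerically. The R snippet below uses simulated data (illustrative only, not the assignment data) to compute each \(\hat{y}_i\) as the height of the fitted line above \(x_i\) and each residual as \(y_i - \hat{y}_i\); these match R's `fitted()` and `residuals()`.

```r
# Simulated data for illustration only (not the assignment data).
set.seed(4)
x <- 1:20
y <- 3 + 2 * x + rnorm(20)
fit <- lm(y ~ x)

b0 <- unname(coef(fit)[1])    # estimated intercept
b1 <- unname(coef(fit)[2])    # estimated slope
y_hat <- b0 + b1 * x          # height of the fitted line above each x_i
res_i <- y - y_hat            # residuals: observed minus predicted
```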
____C_____7. A data set consists of 15 pairs of observations \((x_1, y_1), (x_2, y_2), \ldots, (x_{15}, y_{15})\). If each \(x_i\) is replaced by \(3x_i\) and each \(y_i\) is replaced by \(4y_i\), then the sample correlation coefficient r
- A. increases by 3/15
- B. increases by 4/15
- C. remains unchanged
- D. decreases by 3/15
- E. decreases by 4/15
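Multiplying x and y by positive constants scales the covariance and the two standard deviations by the same factors, which cancel in r. The R check below uses 15 simulated pairs (illustrative only) and also shows that a negative constant flips the sign of r.

```r
# Simulated pairs for illustration only (15 pairs, as in the question).
set.seed(5)
x <- rnorm(15)
y <- 2 * x + rnorm(15)

r_original <- cor(x, y)
r_rescaled <- cor(3 * x, 4 * y)    # both multiplied by positive constants
r_flipped  <- cor(-3 * x, 4 * y)   # a negative constant flips the sign
```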
____D_____8. A multiple regression model has
- A. One independent variable.
- B. Two dependent variables.
- C. Two or more dependent variables.
- D. Two or more independent variables.
- E. One independent variable and one dependent variable.
___C______9. The coefficient of multiple determination \(R^2\) is
- A. SSE / SST
- B. SST / SSE
- C. 1 - SSE / SST
- D. 1 - SST / SSE
- E. (SSE + SST) / 2
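The identity \(R^2 = 1 - \text{SSE}/\text{SST}\) can be confirmed on any fitted model. The R sketch below fits a two-predictor model to simulated data (illustrative only) and compares the hand computation with the value reported by `summary()`.

```r
# Simulated data for illustration only.
set.seed(6)
n <- 25
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 2 * x1 - x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

sse <- sum(residuals(fit)^2)    # error sum of squares
sst <- sum((y - mean(y))^2)     # total sum of squares
r2_manual <- 1 - sse / sst
```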
_________10. For a multiple regression model, if \(\sum (y_i - \bar{y})^2 = 250\) and \(\sum (y_i - \hat{y}_i)^2 = 60\), then the proportion of the total variation in the observed \(y_i\)'s that is not explained by the model is
- A. .76
- B. .24
- C. 310
- D. 190
- E. .52
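The two sums in Question 10 are SST and SSE, so the unexplained share is SSE/SST and the explained share is \(1 - \text{SSE}/\text{SST}\). The arithmetic in R:

```r
sst <- 250   # total sum of squares: sum of (y_i - ybar)^2
sse <- 60    # error sum of squares: sum of (y_i - yhat_i)^2

unexplained <- sse / sst        # proportion NOT explained by the model
explained   <- 1 - sse / sst    # proportion explained, i.e. R^2
```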
_________11. In multiple regression analysis with n observations and k predictors (equivalently, k + 1 parameters), inferences concerning a single parameter \(\beta_i\) are based on the standardized variable \(T = (\hat{\beta}_i - \beta_i)/S_{\hat{\beta}_i}\), which has a t distribution with degrees of freedom equal to
- A. n - k + 1
- B. n - k
- C. n - k - 1
- D. n + k - 1
- E. n + k + 1
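With k predictors plus an intercept, the residual degrees of freedom are \(n - (k + 1)\). The R sketch below uses simulated data with \(n = 30\) and \(k = 3\) (values chosen only for illustration), checks this against `df.residual()`, and rebuilds the two-sided p-value for \(H_0: \beta_1 = 0\) from `pt()`.

```r
# Simulated data for illustration only: n = 30 observations, k = 3 predictors.
set.seed(8)
n <- 30
k <- 3
X <- matrix(rnorm(n * k), nrow = n, ncol = k)
y <- drop(X %*% c(1, -1, 0.5)) + rnorm(n)
fit <- lm(y ~ X)

df_resid <- df.residual(fit)    # n - (k + 1)

# t statistic for the first slope under H0: beta_1 = 0
est <- coef(summary(fit))
t_stat <- est[2, "Estimate"] / est[2, "Std. Error"]
p_manual <- 2 * pt(abs(t_stat), df = n - k - 1, lower.tail = FALSE)
```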
_________12. Consider the estimated multiple regression model \(\hat{Y} = 5 + 3x_1 + 2x_2\). If \(x_1\) decreases by 2 units while \(x_2\) is held fixed, then y is expected to
- A. increase by 8
- B. increase by 6
- C. increase by 3
- D. decrease by 3
- E. decrease by 6
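Holding \(x_2\) fixed, the change in \(\hat{Y}\) equals the \(x_1\) coefficient times the change in \(x_1\). The small R check below evaluates the fitted equation at two values of \(x_1\) that are 2 units apart; the starting value \(x_1 = 4\) and the fixed \(x_2 = 10\) are arbitrary, and the difference does not depend on them.

```r
b <- c(b0 = 5, b1 = 3, b2 = 2)    # coefficients of y-hat = 5 + 3 x1 + 2 x2
y_hat <- function(x1, x2) unname(b["b0"] + b["b1"] * x1 + b["b2"] * x2)

x2_fixed <- 10    # arbitrary; any fixed x2 gives the same change
change <- y_hat(x1 = 4 - 2, x2 = x2_fixed) - y_hat(x1 = 4, x2 = x2_fixed)
```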
Deliverable: Word Document
