STATS

For Questions 1 & 2, use the following R commands to download and read the data into R, or follow the links provided in the questions below.

Question 1 data:

Q1data = read.csv("Devore/TH/q1.txt")

Question 2 data:

Q2data = read.csv("TH/q2.txt")

Question 1: To study the relationship between soil pH (x) and A1 (y), use the data found here.

a) Find the best polynomial model for this data.
b) Using the exponential family of transformations of X and/or Y, fit the appropriate model.
c) Which of the models in parts a) and b) is better, and how do you know?
d) Find a 94% confidence interval for each of the $\beta_j$'s in your model from part a) and interpret.
e) Find a 93% prediction interval for y when x = 4.25 using the model in part b). Does the point estimate for y when x = 4.25 using the model in part a) fall in this interval?
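The R calls that the parts of Question 1 lean on can be sketched as follows. This is a minimal illustration under stated assumptions, not the intended solution: `Q1demo` is simulated placeholder data (not the assignment data), the column names `x` and `y` are assumed, and the quadratic degree and the log transformation of y are just one candidate of each kind.

```r
# Placeholder data so the sketch runs on its own; NOT the assignment data.
# The column names x and y are assumptions about how Q1data is laid out.
set.seed(1)
Q1demo <- data.frame(x = seq(4, 6, length.out = 20))
Q1demo$y <- exp(3 - 0.5 * Q1demo$x + rnorm(20, sd = 0.1))

# One candidate polynomial model (degree 2 is an arbitrary choice here;
# compare degrees with summary() or anova() before settling on one)
fit_poly <- lm(y ~ x + I(x^2), data = Q1demo)

# One candidate transformed model: log(y) regressed on x
fit_log <- lm(log(y) ~ x, data = Q1demo)

# 94% confidence intervals for the coefficients of the polynomial model
ci_94 <- confint(fit_poly, level = 0.94)

# 93% prediction interval at x = 4.25 from the transformed model,
# back-transformed from the log scale to the original y scale
new_x <- data.frame(x = 4.25)
pi_93 <- exp(predict(fit_log, newdata = new_x,
                     interval = "prediction", level = 0.93))

# Point estimate at x = 4.25 from the polynomial model, and whether it
# falls inside the prediction interval
yhat_poly <- predict(fit_poly, newdata = new_x)
inside_interval <- yhat_poly >= pi_93[1, "lwr"] && yhat_poly <= pi_93[1, "upr"]

print(ci_94)
print(pi_93)
print(inside_interval)
```

With the real data, replace `Q1demo` by `Q1data` and adjust the column names to match the file.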

Question 2: The pull strength of a wire bond is an important characteristic. The data found here gives information on pull strength (y), die height (x₁), post height (x₂), loop height (x₃), wire length (x₄), bond width on the die (x₅), and bond width on the post (x₆).

a) Analyze this data to find which linear model is the best fit for this data (give details on how you decided which model is best).
b) Report the amount of variation explained by the model you chose in part a).
c) Report your estimate $\hat{\sigma}$ of $\sigma$.
d) Find a 97% confidence interval for each of the $\beta_j$'s in your model, and interpret.
e) Holding all else fixed, how does a unit change in x₄ change the average value of y?
f) For a specimen with x₁ = 5.5, x₂ = 19.3, x₃ = 30.2, x₄ = 90, x₅ = 2, and x₆ = 1.85, find the predicted value of y (hint: you may not need all of these values, since you may not include all of these variables in your model).
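A possible workflow for Question 2 in R is sketched below. It rests on assumptions: `Q2demo` is simulated placeholder data (not the assignment data), the column names `y` and `x1` to `x6` are assumed, and backward elimination by AIC via `step()` is only one of several reasonable ways to choose a model.

```r
# Placeholder data so the sketch runs on its own; NOT the assignment data.
# The column names y, x1, ..., x6 are assumptions about Q2data.
set.seed(2)
n <- 30
Q2demo <- data.frame(
  x1 = runif(n, 5, 6),   x2 = runif(n, 18, 21), x3 = runif(n, 28, 32),
  x4 = runif(n, 80, 95), x5 = runif(n, 1.8, 2.2), x6 = runif(n, 1.7, 2.0)
)
Q2demo$y <- 2 + 0.3 * Q2demo$x4 + 1.5 * Q2demo$x5 + rnorm(n, sd = 0.5)

# Start from the full model and drop terms by AIC (one selection method
# among several; best-subsets or t-tests on coefficients are alternatives)
full   <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = Q2demo)
chosen <- step(full, direction = "backward", trace = 0)

# Proportion of variation explained, and the residual standard error
r2        <- summary(chosen)$r.squared
sigma_hat <- summary(chosen)$sigma

# 97% confidence intervals for the coefficients of the chosen model
ci_97 <- confint(chosen, level = 0.97)

# Prediction for the specimen in the question; columns for variables that
# are not in the chosen model are simply ignored by predict()
new_obs <- data.frame(x1 = 5.5, x2 = 19.3, x3 = 30.2,
                      x4 = 90, x5 = 2, x6 = 1.85)
y_pred <- predict(chosen, newdata = new_obs)

print(coef(chosen))
print(c(r2 = r2, sigma_hat = sigma_hat))
print(ci_97)
print(y_pred)
```

Each slope in `coef(chosen)` is the estimated change in the mean of y per unit change in that predictor with the other predictors in the model held fixed.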

Questions 3 – 12 are multiple choice questions. Please write your answer in the blank space provided in front of the question number.

___C______3. In a single-factor ANOVA problem involving five populations or treatments, which of the following statements is true about the alternative hypothesis?

A. All five population means are equal.
B. All five population means are different.
C. At least two of the population means are different.
D. At least three of the population means are different.
E. At most, two of the population means are equal.

___D______4. The distribution of the test statistic in single-factor ANOVA is the

A. binomial distribution
B. normal distribution
C. t distribution
D. F distribution
E. None of the above answers are correct.

____E_____5. In the simple linear regression model $Y = \beta_0 + \beta_1 x + \varepsilon$, which of the following statements is not a required assumption about the random error term $\varepsilon$?

A. The expected value of $\varepsilon$ is zero.
B. The variance of $\varepsilon$ is the same for all values of the independent variable x.
C. The error term is normally distributed.
D. The values of the error term are independent of one another.
E. All of the above are required assumptions about $\varepsilon$.

____D_____6. Which of the following statements is not true?

A. The predicted value $\hat{y}_i$ is the value of y that we would predict or expect when using the estimated regression line with $x = x_i$.
B. The predicted value $\hat{y}_i$ is the height of the estimated regression line above the value $x_i$ for which the $i^{th}$ observation was made.
C. The residual $y_i - \hat{y}_i$ is the difference between the observed $y_i$ and the predicted $\hat{y}_i$.
D. If the residuals are all large in magnitude, then much of the variability in observed y values appears to be due to the linear relationship between x and y, whereas many small residuals suggest quite a bit of inherent variability in y relative to the amount due to the linear relation.

____C_____7. A data set consists of 15 pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_{15}, y_{15})$. If each $x_i$ is replaced by $3x_i$ and each $y_i$ is replaced by $4y_i$, then the sample correlation coefficient r

A. increases by 3/15
B. increases by 4/15
C. remains unchanged
D. decreases by 3/15
E. decreases by 4/15
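The effect of rescaling on r can be checked numerically. The sketch below uses simulated values (the vectors `x` and `y` are placeholders, not data from this assignment) and compares the correlation before and after multiplying by the positive constants 3 and 4.

```r
# Simulated placeholder data: 15 (x, y) pairs
set.seed(3)
x <- rnorm(15)
y <- 2 * x + rnorm(15)

# Sample correlation before and after rescaling x by 3 and y by 4
r_original <- cor(x, y)
r_rescaled <- cor(3 * x, 4 * y)

# TRUE when the two agree up to floating-point tolerance
print(all.equal(r_original, r_rescaled))
```

The comparison uses `all.equal()` rather than `==` because the two values are computed through different floating-point operations.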

____D_____8. A multiple regression model has

A. One independent variable.
B. Two dependent variables.
C. Two or more dependent variables.
D. Two or more independent variables.
E. One independent variable and one dependent variable.

___C______9. The coefficient of multiple determination $R^2$ is

A. SSE / SST
B. SST / SSE
C. 1 - SSE / SST
D. 1 - SST / SSE
E. (SSE + SST) / 2
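The relationship between SSE, SST, and the coefficient of multiple determination can be verified against R's own output. The sketch below uses the built-in `mtcars` data set purely as an illustration; the choice of response `mpg` and predictors `wt` and `hp` is an assumption made for the example and has nothing to do with the assignment data.

```r
# Fit a multiple regression on a built-in data set (illustrative choice)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# SSE: sum of squared residuals; SST: total sum of squares about the mean
sse <- sum(residuals(fit)^2)
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# Coefficient of multiple determination computed from its definition
r2_manual <- 1 - sse / sst

# Compare with the value reported by summary()
print(all.equal(r2_manual, summary(fit)$r.squared))
```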

_________10. For a multiple regression model with $\sum (y_i - \bar{y})^2 = 250$ and $\sum (y_i - \hat{y}_i)^2 = 60$, the proportion of the total variation in the observed $y_i$'s that is not explained by the model is

_________11. In multiple regression analysis with n observations and k predictors (or equivalently k + 1 parameters), inferences concerning a single parameter $\beta_i$ are based on the standardized variable $T = (\hat{\beta}_i - \beta_i)/S_{\hat{\beta}_i}$, which has a t-distribution with degrees of freedom equal to

A. n - k + 1
B. n - k
C. n - k - 1
D. n + k - 1
E. n + k + 1

_________12. Consider the multiple regression model $\hat{Y} = 5 + 3x_1 + 2x_2$. As $x_1$ decreases by 2 units while $x_2$ is held fixed, y is expected to