Assignment
Note: All answers must be completed in SAS. All SAS output must be pasted into the solutions. Also, the SAS input file should be attached.
Question 1
A regression analysis relating scores on an aptitude test after training (Y) to average daily training hours (X) produced the following fitted equation:
ŷ = 25 + 1.4x.
- What is the fitted value of the response variable corresponding to x = 5?
- What is the residual corresponding to the data point with x = 8 and y = 36.2? Is the point above or below the line?
- If x increases by 2 units, how does ŷ change?
- An additional test score is to be obtained for a new observation at x = 7. Would the test score for the new observation necessarily be 25? Explain.
- The error sum of squares (SSE) for this model was found to be 9. If there were n = 20 observations, provide the best estimate for σ².
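The arithmetic behind these parts can be checked in a short DATA step. This is only a minimal sketch: the data set name `q1` and the variable names are illustrative, and it assumes the usual simple linear regression estimate of σ², namely SSE divided by n − 2.

```sas
/* Sketch only: evaluates the fitted line from Question 1.            */
/* The data set name q1 and all variable names are illustrative.      */
data q1;
  b0 = 25;                                /* intercept of the fitted line */
  b1 = 1.4;                               /* slope of the fitted line     */

  yhat_5  = b0 + b1*5;                    /* fitted value at x = 5        */

  /* residual = observed y minus fitted value; rounded to suppress     */
  /* floating-point noise. A positive residual means the point lies    */
  /* above the line, a negative one means it lies below.               */
  resid_8 = round(36.2 - (b0 + b1*8), 1e-8);

  delta_2 = b1*2;                         /* change in y-hat when x rises by 2 */

  sse = 9;
  n   = 20;
  mse = sse / (n - 2);                    /* assumed estimator: SSE / (n - 2)  */
run;

proc print data=q1 noobs;
run;
```

The printed values are point calculations only; the explanations asked for in each part still have to be written out in words.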
Question 2
Explain the difference between the following two equations. Also, describe the distributions of 𝑌𝑖 and 𝜖𝑖, and explain the meaning of 𝛽0 and 𝛽1.
Yi = b0 + b1Xi
𝑌𝑖 = 𝛽0 + 𝛽1𝑋𝑖 + 𝜖𝑖
Question 3
For this problem, use the "grade point average" data described in KNNL Problem #1.19. The data are on the disk that accompanies the text and can also be found in the file CH01PR19.DAT (see attachment). Make sure you understand which column is X and which is Y and read in the data accordingly. See Topic 1 or knnl054.sas for an example of how to read in a data file (see attachment).
- Plot the data using proc gplot. Include a smoothed function on the plot by preceding the plot statement with "SYMBOL1 v = square i = smNN" where NN is a number between 1 and 99. Note that larger numbers cause greater smoothing. Make sure to indicate the smoothing number in the title of the plot. Is the relationship approximately linear?
- Run a linear regression to predict GPA based on the entrance exam. Give the complete ANOVA table for this regression.
- Give a point estimate and a 90% confidence interval for the slope and for the intercept, and interpret each of these in words. (Point estimate is another word for parameter estimate.)
- Would it be reasonable to consider inference on the intercept for this problem? Please provide justification for your answer.
Problem 1.19 from the book
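One possible layout for the Question 3 program is sketched below. It is not a definitive solution: the column order (GPA first, entrance test score second), the file path, the data set name `grades`, the variable names, and the smoothing value 70 are all assumptions to be checked or changed. The `CLB` and `ALPHA=` options on the `MODEL` statement request confidence limits for the parameter estimates.

```sas
/* ASSUMPTION: each record holds Y (GPA) first and X (entrance test   */
/* score) second. Verify this against the file before running.        */
/* The path below is a placeholder; point it at your copy of the file. */
data grades;
  infile 'CH01PR19.DAT';
  input gpa score;
run;

/* Scatter plot with a smoothed curve; 70 is an arbitrary choice of NN. */
symbol1 v=square i=sm70;
title 'GPA vs. entrance test score (smoothing = 70)';
proc gplot data=grades;
  plot gpa*score;
run;
quit;

/* Simple linear regression; ALPHA=0.10 gives 90% limits with CLB.     */
title 'Regression of GPA on entrance test score';
proc reg data=grades;
  model gpa = score / clb alpha=0.10;
run;
quit;
```

The PROC REG output includes the ANOVA table and the parameter estimates with their confidence limits; trying a few values of NN in the SYMBOL1 statement shows how the smoothing changes the curve.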
Question 4
For this problem, use the "plastic hardness" data described in the text with Problem 1.22 (CH01PR22.DAT). Make sure you understand which column is X and which is Y and read in the data accordingly.
- Plot the data using PROC GPLOT. Include a regression line on the plot (v= square i = rl). Is the relationship approximately linear?
- Run the linear regression to predict hardness from time. Give:
  - the linear model used in this problem, and
  - the estimated regression equation.
- Describe the results of the significance test for the slope. Give the hypotheses being tested, the test statistic with degrees of freedom, the p-value, and your conclusion in sentence form.
- Explain why or why not inference on the intercept is reasonable (i.e. of interest) in this problem.
Problem 1.22 from the book
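A program for Question 4 might be organized along the following lines. Treat it as a sketch under stated assumptions: the column order (hardness first, elapsed time second), the file path, the data set name `plastic`, and the variable names are not given in the assignment and should be verified.

```sas
/* ASSUMPTION: each record holds Y (hardness) first and X (elapsed    */
/* time) second. Verify this against the file before running.         */
/* The path below is a placeholder.                                   */
data plastic;
  infile 'CH01PR22.DAT';
  input hardness elapsed;
run;

/* Scatter plot with the least-squares line overlaid (i=rl).          */
symbol1 v=square i=rl;
title 'Plastic hardness vs. elapsed time with regression line';
proc gplot data=plastic;
  plot hardness*elapsed;
run;
quit;

/* The parameter estimates table reports the t statistic and p-value  */
/* for the slope; the error degrees of freedom are in the ANOVA table. */
title 'Regression of hardness on elapsed time';
proc reg data=plastic;
  model hardness = elapsed;
run;
quit;
```

The slope test in the output is two-sided; the hypotheses and the conclusion still need to be stated in sentence form.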
Question 5
An investigative study collected 150 observations from the Wabash River, at random locations near Lafayette. Each observation consisted of a measure of water pH (X) and fish count (Y). The researchers are interested in how the acidity of the water affects the number of fish. Complete the following ANOVA table for the regression analysis. State the null and alternative hypotheses for the F-test, as well as your conclusion in sentence form. You may use the critical F (critical t) approach or the p-value approach.
| Source | Degrees of freedom | SS | MS | F-Value |
| --- | --- | --- | --- | --- |
| Model | 46.7 | | | |
| Error | 83.7 | | | |
| Total | | | | |
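The remaining cells of an ANOVA table for simple linear regression follow from a few identities, which can be sketched in a DATA step. Two assumptions are made here and are not confirmed by the assignment: the two values given (46.7 and 83.7) are treated as the model and error sums of squares, since they are not whole numbers and so cannot be degrees of freedom, and the test is carried out at a significance level of 0.05. The data set and variable names are illustrative.

```sas
/* Sketch only. ASSUMPTIONS: 46.7 and 83.7 are taken to be the model  */
/* and error sums of squares, and alpha = 0.05. Adjust if the table   */
/* in your copy of the assignment says otherwise.                     */
data anova;
  n        = 150;
  ss_model = 46.7;                        /* assumed: model SS          */
  ss_error = 83.7;                        /* assumed: error SS          */
  alpha    = 0.05;                        /* assumed significance level */

  df_model = 1;                           /* one predictor              */
  df_error = n - 2;
  df_total = n - 1;

  ss_total = ss_model + ss_error;
  ms_model = ss_model / df_model;
  ms_error = ss_error / df_error;
  f_value  = ms_model / ms_error;

  p_value  = 1 - probf(f_value, df_model, df_error);  /* upper-tail p-value */
  f_crit   = finv(1 - alpha, df_model, df_error);     /* critical F value   */
run;

proc print data=anova noobs;
run;
```

Either comparison works for the conclusion: `f_value` against `f_crit`, or `p_value` against `alpha`.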
Deliverable: Word Document
