Final Project
Your assignment is to create a report analyzing the determinants of student GPA. The dataset is cross-sectional, with 50 observations corresponding to 50 randomly selected students at State University in the year 2012. (Note: this is a fictional college. Do not draw any conclusions about the real determinants of GPA at real colleges from this dataset!)
Download the file project.csv from within Blackboard and complete the analysis in R. Please note, each of you will be working from a similar, yet unique, dataset. I will grade your project using the data generated uniquely for you - this means you cannot "share" data.
The definitions for the variables are as follows:
- GPA = student's grade point average at end of freshman year, on a 4.0 scale.
- SAT = student's score on the Scholastic Aptitude Test (SAT), combined verbal and math score. This is a measure of academic ability prior to enrollment in college, with a maximum score of 1600.
- STUDY = number of hours per week the student reported studying.
- WORK = number of hours per week the student reported working.
- MAJOR = student's declared major.
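As a starting point for the data description, here is a minimal R sketch that summarizes the numeric variables and tabulates MAJOR. Since each student's project.csv is unique, the sketch builds a simulated stand-in data frame so that it runs on its own. The simulated values, the object name `dat`, and the major labels other than physics, biology, and chemistry are assumptions made for illustration only; they are not the assignment data.

```r
# Simulated stand-in data so the sketch is self-contained.
# These values are NOT the assignment data. With the real file, use:
#   dat <- read.csv("project.csv")
set.seed(1)
n <- 50
dat <- data.frame(
  SAT   = pmin(1600, round(rnorm(n, mean = 1100, sd = 150))),
  STUDY = round(runif(n, min = 0, max = 30)),
  WORK  = round(runif(n, min = 0, max = 25)),
  # "economics" and "history" are hypothetical labels; check the real file.
  MAJOR = sample(c("physics", "biology", "chemistry", "economics", "history"),
                 n, replace = TRUE)
)
# Simulated GPA, clamped to the 0-4.0 scale.
dat$GPA <- pmin(4, pmax(0, 1.5 + 0.001 * dat$SAT + 0.05 * dat$STUDY -
                           0.02 * dat$WORK + rnorm(n, sd = 0.4)))

# Summary statistics for the numeric variables.
print(summary(dat[, c("GPA", "SAT", "STUDY", "WORK")]))

# Number of students in each major.
print(table(dat$MAJOR))
```

With the real data, `unique(dat$MAJOR)` is worth checking first, because the spelling and capitalization of the major names may differ from the labels assumed here.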
Generally, your assignment is to create a document that includes:
- Description of data used.
- Model results in a single, well-formatted table. Be sure to include estimated parameters, standard errors, regression summary statistics, and notes regarding significance.
- Discussion of model results - estimated coefficients, model significance, coefficient significance, etc. Specifically, you must include the following within the narrative of your report:
- Describe each of the variables (please use summary statistics, tables, and/or plots as appropriate).
- Discuss how you would expect each explanatory variable to affect grades.
- Perform a multiple regression (saving the results to "model1") using the following model:
\(GPA = \beta_{0} + \beta_{1}\,STUDY + \beta_{2}\,WORK + \epsilon\)
- Test the hypothesis that science majors (physics, biology, and chemistry) have a different GPA than non-science majors (use the 5% significance level). You will need to create an indicator variable based on the variable "MAJOR." The new variable should equal 1 for any student in a natural science major (physics, biology, or chemistry) and 0 otherwise. Call this variable "SCIENCE."
- Perform a regression as above, including SCIENCE as an additional variable (saving as model2).
- Suppose you believe there might be a nonlinear effect of studying on grades. Accordingly, estimate an additional model (model3) that includes "study time squared" as an additional variable (name the new variable STUDYSQ).
- Using the results from model3, explain the meaning of STUDY and STUDYSQ. What effect will studying one more hour per week have for a student who currently only studies 1 hour per week? What effect will it have for a student who already studies 6 hours per week?
- Interpret all regression results. Specifically, you should discuss (a) the meaning of each slope coefficient, including an explanation of the effect of the variable on GPA and whether the coefficient has the sign you would expect, (b) the statistical significance of each coefficient, and (c) the explanatory power of the regression as a whole, using both the R-squared and the F-statistic. (Be sure to put your explanations of slope coefficients in terms of the original units of measure, as given in the original data file.)
- Think carefully about the possible determinants of a student's GPA. What other variables do you think might be relevant? How would you go about including them in the regression?
- Can you think of any other problems or difficulties with the approach we've used?
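The modeling steps above can be sketched in R roughly as follows. This is an illustration under stated assumptions, not a prescribed solution. The data frame is simulated so the code runs on its own and is not the assignment data. The major labels other than physics, biology, and chemistry are hypothetical. The sketch also assumes that model3 keeps SCIENCE from model2 and adds STUDYSQ, which the instructions do not spell out. The helper `extra_hour` is a made-up name for this example.

```r
# Simulated stand-in data (NOT the assignment data). With the real file, use:
#   dat <- read.csv("project.csv")
set.seed(1)
n <- 50
dat <- data.frame(
  STUDY = round(runif(n, min = 0, max = 30)),
  WORK  = round(runif(n, min = 0, max = 25)),
  # "economics" and "history" are hypothetical labels; check the real file.
  MAJOR = sample(c("physics", "biology", "chemistry", "economics", "history"),
                 n, replace = TRUE)
)
dat$GPA <- 2 + 0.08 * dat$STUDY - 0.0015 * dat$STUDY^2 -
  0.02 * dat$WORK + rnorm(n, sd = 0.3)

# Indicator: 1 for natural science majors, 0 otherwise.
dat$SCIENCE <- as.integer(dat$MAJOR %in% c("physics", "biology", "chemistry"))

# Squared study time for the nonlinear specification.
dat$STUDYSQ <- dat$STUDY^2

model1 <- lm(GPA ~ STUDY + WORK, data = dat)
model2 <- lm(GPA ~ STUDY + WORK + SCIENCE, data = dat)
# Assumption: model3 extends model2 rather than model1.
model3 <- lm(GPA ~ STUDY + WORK + SCIENCE + STUDYSQ, data = dat)

# Two-sided t-test on SCIENCE at the 5% level.
p_science <- summary(model2)$coefficients["SCIENCE", "Pr(>|t|)"]
print(p_science < 0.05)

# Predicted change in GPA from one more hour of study per week,
# starting from h hours: b_STUDY + b_STUDYSQ * ((h + 1)^2 - h^2).
b <- coef(model3)
extra_hour <- function(h) {
  unname(b["STUDY"] + b["STUDYSQ"] * ((h + 1)^2 - h^2))
}
print(extra_hour(1))  # student currently studying 1 hour per week
print(extra_hour(6))  # student currently studying 6 hours per week

# R-squared and F-statistic are reported by summary().
print(summary(model3))
```

In model3 the discrete effect of one more hour at h hours is \(\beta_{STUDY} + \beta_{STUDYSQ}(2h + 1)\), while the instantaneous marginal effect is \(\beta_{STUDY} + 2\beta_{STUDYSQ}\,h\); the two differ only by \(\beta_{STUDYSQ}\). For the single results table, one option among several is the stargazer package, for example `stargazer(model1, model2, model3, type = "text")`.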
Deliverable: Word Document
