Project_A
Use of the best-subsets approach to model building
Consider the file UNIV&COL.xls, which contains data on universities and colleges: the type of term, location, type of school, average total SAT score, TOEFL score (less than 550 or at least 550) for applicants who are non-native English speakers, room and board expenses, annual total cost, and average indebtedness at graduation. The objective of this project is to determine, using regression analysis techniques, whether any relationships exist among these variables. You are to write a report on your findings after analyzing the data set. The following is a minimum guideline for what you should analyze.
- State your statistical objective for this data set.
- Perform exploratory data analysis, such as computing numerical summary measures and/or constructing box-and-whisker plots for the variables in this data set.
- Construct scatter diagrams for pairs of variables and describe any relationships you see. Do the pairs appear to be associated, and is the association linear or nonlinear?
- Does a linear model appear to hold for any pair? You may want to run hypothesis tests (for example, a t test for the slope or for the correlation coefficient) to substantiate why or why not.
- Apply the best-subsets approach to model building to determine whether any variables should be left out of the model.
- Suppose you observe that some universities on the East Coast require higher SAT or TOEFL scores for admission. If you add a location variable to the data set (east or west, divided along the Mississippi River) and code this qualitative variable as a dummy variable, does the model produce a meaningfully better result? Alternatively, is there another new variable that you think could improve your analysis?
- Once you determine which variables to use, perform a multiple regression analysis on this subset of variables, including an assessment of collinearity.
- Summarize and comment on your results.
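As an illustration of the exploratory-analysis step above, here is a minimal sketch in Python 3 using only the standard library. It computes the five-number summary behind a box-and-whisker plot and flags points outside the usual 1.5 x IQR fences. The function names are hypothetical, and quartile conventions differ between tools, so the quartiles and fences computed here may differ slightly from those reported by Excel or other software.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) of a list of numbers."""
    data = sorted(values)
    # "inclusive" interpolates between order statistics treating the data as
    # the whole population; other tools may use a different quartile rule.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

def outlier_fences(values):
    """Return (lower, upper) fences of the 1.5 * IQR rule used by box plots."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def flag_outliers(values):
    """Return the values lying outside the 1.5 * IQR fences."""
    lower, upper = outlier_fences(values)
    return [v for v in values if v < lower or v > upper]
```

You would apply these to each numeric column (for example, room and board expenses or annual total cost) and report the summaries alongside the plots.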
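For the scatter-diagram and linearity steps above, one common check is the t test for H0: slope = 0 in simple linear regression, which is equivalent to testing H0: rho = 0 and uses t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The sketch below (Python 3, standard library only; function names are hypothetical) computes r, the least-squares line, the residuals, and that t statistic. You would compare |t| with the critical value t(alpha/2, n - 2) from a t table, or obtain a two-tailed p-value with Excel's T.DIST.2T.

```python
import math

def pearson_r(x, y):
    """Sample (Pearson) correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simple_regression(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def residuals(x, y):
    """Observed minus fitted values; plot these against x to judge linearity."""
    b0, b1 = simple_regression(x, y)
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]

def slope_t_test(x, y):
    """t statistic and degrees of freedom for H0: slope = 0 (equivalently rho = 0).

    Undefined when |r| = 1, i.e. when the points lie exactly on a line."""
    n = len(x)
    r = pearson_r(x, y)
    return r * math.sqrt((n - 2) / (1 - r ** 2)), n - 2
```

A residual plot that shows a curved pattern, rather than random scatter around zero, suggests that a linear model does not hold for that pair even if the t test is significant.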
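The best-subsets step above amounts to fitting every combination of candidate predictors and comparing the fits. A minimal sketch, assuming Python 3 with only the standard library, is shown below: it fits each subset by ordinary least squares and reports R^2, adjusted R^2, and Mallows' Cp = SSE_k / MSE_T - (n - 2(k + 1)), where k is the number of predictors in the subset and MSE_T comes from the full model with all candidate predictors. The helper names are hypothetical, and solving the normal equations directly is only adequate for small, well-conditioned data; a statistics package would normally be used on the real file.

```python
import itertools

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy; inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]  # ZeroDivisionError if predictors are collinear
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols_sse(columns, y):
    """Fit y on an intercept plus the given predictor columns; return the error sum of squares."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))

def best_subsets(predictors, y):
    """predictors: dict of name -> values. Returns (names, k, R2, adjusted R2, Cp) per subset."""
    n = len(y)
    names = list(predictors)
    ybar = sum(y) / n
    sst = sum((v - ybar) ** 2 for v in y)
    t_params = len(names) + 1  # T: parameters in the full model, including the intercept
    # Assumes the full model does not fit perfectly (MSE_T > 0).
    mse_full = ols_sse([predictors[c] for c in names], y) / (n - t_params)
    results = []
    for k in range(1, len(names) + 1):
        for subset in itertools.combinations(names, k):
            sse = ols_sse([predictors[c] for c in subset], y)
            r2 = 1 - sse / sst
            adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
            cp = sse / mse_full - (n - 2 * (k + 1))
            results.append((subset, k, r2, adj_r2, cp))
    return results
```

A commonly used guideline is to shortlist models whose Cp is close to or below k + 1 and whose adjusted R^2 is high; a variable that never appears in the shortlisted models is a candidate for removal. Note that the full model always has Cp = T by construction.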
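For the location step above, a two-level qualitative variable is entered as a single 0/1 dummy variable, and whether it "improves" the model can be judged by the change in adjusted R^2 and by a partial F test comparing the models with and without the dummy. The sketch below (Python 3, standard library only) assumes you already have the error sums of squares (SSE) of both fitted models; the labels "East" and "West" and the function names are hypothetical, and assigning each school to a side of the Mississippi River may require information not in the file.

```python
def dummy_code(labels, level="East"):
    """Code a two-level qualitative variable as 0/1: `level` -> 1, the other level is the baseline."""
    return [1 if label == level else 0 for label in labels]

def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def partial_f(sse_reduced, sse_full, n, k_full, q=1):
    """Partial F statistic for adding q predictors to a model.

    k_full counts the predictors in the larger model (including the dummy).
    Compare with an F distribution with q and n - k_full - 1 degrees of freedom."""
    return ((sse_reduced - sse_full) / q) / (sse_full / (n - k_full - 1))
```

With a single added dummy (q = 1), the partial F statistic equals the square of the t statistic for the dummy's coefficient, so either test can be reported; a right-tail p-value can be obtained with Excel's F.DIST.RT.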
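For the collinearity assessment in the multiple regression step above, a standard measure is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. An equivalent computation is that VIF_j is the j-th diagonal element of the inverse of the predictors' correlation matrix, which is what this minimal sketch uses (Python 3, standard library only; function names are hypothetical).

```python
import math

def _corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def _solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def vifs(predictors):
    """predictors: dict of name -> values. Returns dict of name -> VIF."""
    names = list(predictors)
    corr = [[_corr(predictors[i], predictors[j]) for j in names] for i in names]
    out = {}
    for j, name in enumerate(names):
        unit = [1.0 if i == j else 0.0 for i in range(len(names))]
        out[name] = _solve(corr, unit)[j]  # j-th column of the inverse, diagonal entry
    return out
```

Common rules of thumb treat VIF values above 5 or above 10 as a sign of problematic collinearity; with only two predictors, both VIFs equal 1 / (1 - r^2).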
Deliverable: Word Document
