Instructions
Your project will consist of finding a data set of your choice. The data set should contain at least three $X$ variables, which can be continuous, categorical, or a mix of both.
For a regression problem, you should also identify one additional continuous variable (the dependent variable) whose value you will predict from any observation vector of your $X$ variables. You should then use either additional data or a partitioned test set to estimate the mean squared error (MSE), the correlation coefficient, and the coefficient of determination between the observed and predicted values. You should also report the MSE, correlation coefficient, and coefficient of determination for your trained model (that is, on the data used to fit it).
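As an illustration of the three measures, the function below computes them from paired observed and predicted values. It is a sketch in Python, which the assignment does not require, and `regression_metrics` is a hypothetical name. It assumes the MSE divides the sum of squared errors (SSE) by the number of cases $n$ (some texts divide the training SSE by $n - p - 1$ instead). It also assumes the coefficient of determination is $1 - \text{SSE}/\text{SST}$. For a least-squares fit with an intercept scored on its own training data, this equals the squared correlation. On a test set the two can differ.

```python
import math

def regression_metrics(observed, predicted):
    """Return (MSE, Pearson r, R^2) for paired observed and predicted values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length, with at least 2 values")
    mean_y = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sum of squared residuals (SSE) and total sum of squares of y (SST).
    sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    sst = sum((y - mean_y) ** 2 for y in observed)
    ssp = sum((p - mean_p) ** 2 for p in predicted)
    if sst == 0 or ssp == 0:
        raise ValueError("observed and predicted values must both vary")
    # Mean squared error: average squared residual (divides by n).
    mse = sse / n
    # Pearson correlation coefficient between observed and predicted values.
    cov = sum((y - mean_y) * (p - mean_p) for y, p in zip(observed, predicted))
    r = cov / math.sqrt(sst * ssp)
    # Coefficient of determination: share of the variation in y explained.
    r2 = 1 - sse / sst
    return mse, r, r2
```

Call it once with the training-set predictions and once with the test-set predictions to get both sets of figures the memo asks for.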
The Data Set
To be appropriate for this project, the data set must contain at least three $X$ variables (independent variables) and one additional variable (the dependent variable). It should also have at least 50 cases. Once you have chosen your data set, write a brief description (one paragraph) of it, including a description of all the basic variables and the number of observations. Show this description to me so that I can confirm the data set is appropriate.
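A quick way to confirm the size requirements before writing the description is sketched below. It assumes the rows are dictionaries keyed by column name, as produced by Python's `csv.DictReader`. The function name `check_data_set` and its parameters are hypothetical. It checks only the counts stated above and does not test whether the dependent variable is continuous.

```python
def check_data_set(rows, x_columns, y_column, min_x=3, min_cases=50):
    """Return a list of unmet requirements (empty when the data set qualifies)."""
    problems = []
    # At least three X (independent) variables.
    if len(x_columns) < min_x:
        problems.append(f"only {len(x_columns)} X variables; need at least {min_x}")
    # The dependent variable must be separate from the X variables.
    if y_column in x_columns:
        problems.append(f"{y_column!r} is listed as both an X and the dependent variable")
    # At least 50 cases (observations).
    if len(rows) < min_cases:
        problems.append(f"only {len(rows)} cases; need at least {min_cases}")
    # Every named column should actually appear in the data.
    header = set(rows[0]) if rows else set()
    absent = [c for c in list(x_columns) + [y_column] if c not in header]
    if absent:
        problems.append(f"columns not found: {absent}")
    return problems
```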
Building the Model
Regression
We have discussed correlation analysis briefly and regression modeling in more detail. You should use this material to build and evaluate your regression model. Build your linear regression model (using an 80/20 training/test partition if you choose), checking at least roughly for normality and transforming variables as needed. Then evaluate the model results (the MSE, correlation coefficient, and coefficient of determination), and use the partitioned test set to estimate the same three measures.
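The modeling steps can be sketched in plain Python. The assignment does not prescribe a language, and in practice a statistics package (for example R's `lm` or Python's statsmodels) would do the fitting. The helpers `partition_rows`, `skewness`, `fit_ols`, and `predict` are hypothetical names.

The sketch assumes:

- Each row is a tuple of $X$ values.
- The 80/20 split is drawn at random with a fixed seed.
- The model is ordinary least squares solved through the normal equations.

Sample skewness serves here as one rough numeric screen for non-normality. A histogram or normal Q-Q plot is the more usual check.

```python
import random

def partition_rows(rows, test_fraction=0.2, seed=1):
    """Shuffle rows reproducibly and split them into (training, test) lists."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(rows) * test_fraction)
    test = [rows[i] for i in idx[:n_test]]
    train = [rows[i] for i in idx[n_test:]]
    return train, test

def skewness(values):
    """Sample skewness m3 / m2**1.5; values far from 0 suggest a skewed variable."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] from the normal equations."""
    # Prepend a column of 1s so that b0 is the intercept.
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Augmented matrix [A'A | A'y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * yi for a, yi in zip(A, y))] for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("X'X is singular; check for collinear X variables")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return b

def predict(b, X):
    """Fitted values b0 + b1*x1 + ... + bk*xk for each row of X."""
    return [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
```

For rows of the form `(x1, x2, x3, y)`:

1. Fit on `[r[:3] for r in train]` and `[r[3] for r in train]`.
2. Compare `predict(b, ...)` with the observed $y$ values for both the training and the test partition.

If a positive-valued variable looks strongly right-skewed, a log transform is a common remedy to try before refitting.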
The Memo
Each person will write and turn in a project report/memo. It should contain the following parts:
- A two- to three-page printed summary of the problem your model attempts to solve, the data set used, the final model, and your main findings. The focus here is a coherent narrative that presents the high points of what you found in your project.
- If necessary, include appendices that give the most important printouts, or reproduce parts of those printouts, from any computer programs or spreadsheets used in the analysis.
- In addition, the project report, data and computer code should be put in a zip file and emailed to me.
Deliverable: Word Document
