Multiple linear regression: Select an outcome (dependent) variable and at least 3 predictor (independent)
1. Multiple linear regression:
Select an outcome (dependent) variable and at least 3 predictor (independent) variables appropriate for a multiple linear regression. Ensure that at least one of your predictor/independent variables is a variable that you would include in the model as dummy (indicator) variable(s).
- Run descriptive statistics to explore the variables, and recode your dummy variable(s). Interpret your results. (3 points)
- Run a multiple linear regression including appropriate diagnostic tests to examine for assumptions. Interpret your results. (2 points analysis; 5 points discussion of results and diagnostics)
2. Nonparametric tests:
Pick 2 of the following tests, run on appropriate data, and interpret your results (2 points each):
- Spearman's rho
- Mann-Whitney U
- Kruskal-Wallis analysis of variance
- Pearson's chi-square
Hand in your output with annotations (copied to a Word document preferred) and include the syntax. Generally, more description of your logic/work and results is better than less.
Solution:
- For this model, we’ll use Educ, Size of Place, and Marital Status to predict the variable Respondent’s Income. We’ll use Marital Status as the dummy variable by recoding it in the following way:
maritcat = 1 if the respondent is married, 0 if not.
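The recoding rule above can be sketched in a few lines. This is only an illustration in Python, not the SPSS syntax used for the assignment; the category labels and the helper name `recode_married` are assumptions, since the actual marital status codes are not shown.

```python
# Hypothetical marital-status labels; the real category codes are not shown in the text.
def recode_married(status: str) -> int:
    """Return 1 if the respondent is married, 0 otherwise."""
    return 1 if status.strip().lower() == "married" else 0

marital = ["Married", "Divorced", "Never married", "married", "Widowed"]
maritcat = [recode_married(s) for s in marital]
print(maritcat)  # [1, 0, 0, 1, 0]
```

Every category other than "married" collapses into the reference group coded 0, which is what a single indicator variable implies.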
Hence, we are going to estimate the following regression equation:
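The equation itself did not survive in this copy. Based on the variables named above, it presumably has the following form, where the coefficient symbols and the error term are notation assumed here rather than taken from the original:

```latex
\[
\text{Income} = \beta_0 + \beta_1\,\text{Educ} + \beta_2\,\text{Size} + \beta_3\,\text{maritcat} + \varepsilon
\]
```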
Using SPSS we get the following results:
The descriptive statistics above summarize the main characteristics of the data. Since the three variables we are analyzing are measured at the interval or ratio level, we use the mean as a measure of central tendency and the standard deviation as a measure of dispersion.
For the first variable, Educ , the mean is 13.78 years, and the standard deviation is 2.889.
For the second variable, Income , the mean is 13.10, and the standard deviation is 5.754.
Finally, for the third variable, Size of Place , the mean is 416,050, and the standard deviation is 1,309,339.
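As a rough illustration of how such a mean and standard deviation are computed, here is a minimal sketch using Python's standard library. The values are made up for illustration and are not the survey data; `statistics.stdev` uses the n - 1 (sample) denominator.

```python
import statistics

# Made-up years of education for illustration; not the survey data.
educ = [12, 16, 14, 12, 18, 10, 14]

mean_educ = statistics.mean(educ)
sd_educ = statistics.stdev(educ)  # sample standard deviation (n - 1 denominator)
print(round(mean_educ, 2), round(sd_educ, 3))
```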
Now we show below the distribution of the categorical variable used.
Now, we perform a regression analysis:
The amount of variance explained is approximately 13.1%. This is fairly low, but the regression is significant overall, with F = 25.998 and p < 0.001.
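The overall F statistic and the proportion of variance explained are linked by F = (R²/k) / ((1 − R²)/(n − k − 1)), where k is the number of predictors and n the number of observations. A minimal sketch of that relationship follows. The sample size is not stated in the text, so the value of `n` used here is purely hypothetical and the printed F will not match the reported one.

```python
def f_from_r2(r2: float, n: int, k: int) -> float:
    """Overall F statistic for a model with k predictors and n observations."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

# R^2 = 0.131 and k = 3 come from the text; n = 500 is a hypothetical sample size.
print(round(f_from_r2(0.131, n=500, k=3), 3))
```

This shows why a low R² can still give a highly significant F: with several hundred observations, the denominator degrees of freedom are large.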
We have the following table with regression coefficients:
The model is therefore:
Notice that all the predictors are significant except Size, which is not (p = 0.457).
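For readers without SPSS, the coefficient estimates of a model of this shape can be reproduced by ordinary least squares. The sketch below solves the normal equations in plain Python. It is not the routine SPSS uses, it does not compute standard errors or p-values, and the data are made up so that the true coefficients are known (the function name `ols` and all values are assumptions).

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y.

    X is a list of predictor rows; an intercept column is added here.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda i: abs(A[i][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for i in range(p):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Made-up rows of (educ, size, maritcat); income = 1 + 0.5*educ + 2*maritcat exactly.
X = [(12, 5, 1), (16, 2, 0), (14, 8, 1), (10, 1, 0), (18, 3, 1), (12, 9, 0)]
y = [9, 9, 10, 6, 12, 7]
coefs = ols(X, y)  # order: intercept, educ, size, maritcat
print([round(b, 3) for b in coefs])
```

Because the made-up outcome does not depend on size, its estimated coefficient comes out at essentially zero, which is the pattern a non-significant predictor approximates in real data.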
We have the following histogram of residuals:
Now we test for normality:
The p-value for the Shapiro-Wilk test is p < 0.001, which means that we have enough evidence to reject the null hypothesis that the residuals are normally distributed.
We have the plot of residuals against predicted values:
There is a visible pattern, which indicates that the homoscedasticity (constant variance) assumption may be violated.
- We are interested in testing whether or not Father’s Highest Degree is independent of Race. We obtain the following crosstabulation:
The chi-square statistic is 21.797, and the corresponding p-value is p = 0.005, so we reject the null hypothesis of independence: there is enough evidence to claim that the two variables are related.
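The statistic reported above compares observed cell counts with the counts expected under independence. A minimal sketch of that calculation follows. The table is made up, the function name `chi_square` is an assumption, and the p-value is left to a statistical package.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df

# Made-up 2 x 2 table of counts; not the crosstabulation from the assignment.
stat, df = chi_square([[10, 20], [20, 10]])
print(round(stat, 3), df)  # 6.667 1
```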
- We are going to test whether or not the median income of male respondents differs from the median income of female respondents. For that purpose we’ll use a Mann-Whitney U test. Using SPSS we get:
The p-value for the test is p < 0.001, which means that we reject the null hypothesis of no difference in income between male and female respondents.
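To make the test concrete, here is a minimal sketch of the Mann-Whitney U calculation with a two-sided normal-approximation p-value. It is an illustration under simplifying assumptions (no tie correction in the variance, no continuity correction), the data are made up, and it will not necessarily reproduce SPSS output exactly.

```python
import math

def mann_whitney_u(x, y):
    """U statistic for sample x and a two-sided normal-approximation p-value.

    Tied values receive average ranks; the variance is not tie-corrected.
    """
    pooled = sorted((v, g) for g, s in enumerate((x, y)) for v in s)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u, p

# Made-up income codes for two groups; not the survey data.
group_a = [1, 2, 3]
group_b = [4, 5, 6]
u, p = mann_whitney_u(group_a, group_b)
print(u)  # 0.0
```

A U of 0 means every value in the first group is below every value in the second, the most extreme separation possible.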