(Step-by-Step) Just as with point estimates in a univariate context, we can create sampling distributions for our estimates in a bivariate (and also multivariate)


Question:

Just as with point estimates in a univariate context, we can create sampling distributions for our estimates in a bivariate (and also multivariate) context. This problem will walk you through creating a sampling distribution for OLS estimates in particular. Load the trusty subprime data from the course website. Recall that these are data collected by the U.S. government on all home lending transactions in Cape Coral and Fort Myers. They contain information on each loan applicant and give information on whether that applicant received a subprime loan (high.rate) as well as on the amount of the loan (loan.amount). They also contain basic demographic information such as race, gender, and income. Assume the data represent the "truth" (i.e., an entire population). We are going to look at the (fairly boring) regression in which we use income (income) to predict loan amounts (loan.amount).

A) Our first step is to find the "true" intercept and slope of the true population. Do so by regressing loan.amount on income. (You are welcome to use the lm function in R here.) Report the coefficients and interpret what they mean.

B) Set the seed to (12345) and draw a sample of size 250 (without replacement). Regress loan.amount on income for the sample. Report the coefficients, standard errors, R^2, and sample size in a nicely formatted table. Include an informative caption.

  1. Interpret the coefficients.
  2. Interpret the statistical significance of each coefficient. As you answer, consider the following questions: What are the null hypotheses? What are the test statistics? Can we reject the null hypotheses at the 95% level? With what level of confidence can we reject the null hypotheses?
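
The population regression and the single-sample regression described above can be sketched in R as follows. This is only an illustration under stated assumptions: the real subprime file is not reproduced here, so the `subprime` data frame below is a simulated stand-in with made-up values, and the object names (`pop.fit`, `samp.fit`, `samp.sum`) are hypothetical.

```r
# Hypothetical stand-in for the subprime data (simulated, NOT the real file).
# Only the column names income and loan.amount come from the problem.
set.seed(1)
N <- 5000
income <- rlnorm(N, meanlog = 4, sdlog = 0.6)
subprime <- data.frame(
  income = income,
  loan.amount = 60 + 1.2 * income + rnorm(N, sd = 40)
)

# "True" intercept and slope: regression on the full population.
pop.fit <- lm(loan.amount ~ income, data = subprime)
coef(pop.fit)

# One sample of 250 rows, drawn without replacement after setting the seed.
set.seed(12345)
rows <- sample(nrow(subprime), size = 250, replace = FALSE)
samp.fit <- lm(loan.amount ~ income, data = subprime[rows, ])

# Pieces needed for the table: estimates, standard errors, R^2, and n.
samp.sum <- summary(samp.fit)
samp.sum$coefficients[, c("Estimate", "Std. Error")]
samp.sum$r.squared
nobs(samp.fit)
```

With the real data, only the block that builds `subprime` would change; the t statistics and p-values for the significance questions sit in the remaining columns of `samp.sum$coefficients`.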

C) Regression requires a number of assumptions. List the four overarching assumptions and explain briefly what they mean. Do you think each of these assumptions is justified in this case? Why or why not? Do we need all four assumptions given the sample size?

D) Note that part D has a Gov 2000/E-2000 and Gov 2000e/1000 version.

Gov 2000/E-2000: After setting the seed to (12345), conduct 1000 simulations such that on every iteration you:

Draw a sample of size 250 without replacement (Note: remember that each observation has two values, one for income and one for loan.amount.)

Regress loan.amount on income for each sample and store the intercept and slope.

Plot the sampling distribution for the intercept and the sampling distribution for the slope. Describe these distributions and offer a guess at why we might be finding these distributions. (Really, if you don't know, just make an educated guess!)
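
One way the simulation loop might look in R is sketched below. As an assumption, the `subprime` data frame is again a simulated stand-in rather than the real data, and the names `n.sims` and `estimates` are hypothetical. Each iteration samples whole rows, so every income value stays paired with its loan.amount.

```r
# Hypothetical stand-in for the subprime data (simulated, NOT the real file).
set.seed(1)
N <- 5000
income <- rlnorm(N, meanlog = 4, sdlog = 0.6)
subprime <- data.frame(
  income = income,
  loan.amount = 60 + 1.2 * income + rnorm(N, sd = 40)
)
pop.fit <- lm(loan.amount ~ income, data = subprime)

# 1000 simulations: sample 250 rows, fit OLS, store intercept and slope.
set.seed(12345)
n.sims <- 1000
estimates <- matrix(NA_real_, nrow = n.sims, ncol = 2,
                    dimnames = list(NULL, c("intercept", "slope")))
for (i in seq_len(n.sims)) {
  rows <- sample(nrow(subprime), size = 250, replace = FALSE)
  estimates[i, ] <- coef(lm(loan.amount ~ income, data = subprime[rows, ]))
}

# Density plots of the two sampling distributions, with the population
# values marked by dashed vertical lines.
par(mfrow = c(1, 2))
plot(density(estimates[, "intercept"]), main = "Sampling distribution: intercept")
abline(v = coef(pop.fit)[1], lty = 2)
plot(density(estimates[, "slope"]), main = "Sampling distribution: slope")
abline(v = coef(pop.fit)[2], lty = 2)
```

Setting `n.sims <- 20` gives the smaller version of the exercise without any other change to the loop.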

Gov 2000e/1000:

Get the intuition of creating sampling distributions for regression coefficients by drawing 20 different samples (of size 250 each, without replacement). For each sample, regress loan.amount on income and record / store the intercept and slope. Use these values to create density plots of the approximate sampling distributions of the intercept and slope on your 20 samples. Describe these distributions and offer a guess at why we might be finding these distributions. (Really, if you don't know, just make an educated guess!)

E) Gov 2000/E-2000 ONLY - EXTRA CREDIT For the first 100 samples you drew above, plot the OLS regression lines on the same plot. What does this plot represent? What information does it provide?
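
A rough sketch of such a plot in R follows, assuming a simulated stand-in for the subprime data (not the real file) and a hypothetical storage matrix `lines100`. Because fitting with lm does not draw random numbers, resetting the seed to 12345 and sampling in the same way should reproduce the first 100 samples of a longer run that started from the same seed.

```r
# Hypothetical stand-in for the subprime data (simulated, NOT the real file).
set.seed(1)
N <- 5000
income <- rlnorm(N, meanlog = 4, sdlog = 0.6)
subprime <- data.frame(
  income = income,
  loan.amount = 60 + 1.2 * income + rnorm(N, sd = 40)
)

# Intercept and slope for 100 samples of 250 rows each.
set.seed(12345)
lines100 <- matrix(NA_real_, nrow = 100, ncol = 2,
                   dimnames = list(NULL, c("intercept", "slope")))
for (i in 1:100) {
  rows <- sample(nrow(subprime), size = 250, replace = FALSE)
  lines100[i, ] <- coef(lm(loan.amount ~ income, data = subprime[rows, ]))
}

# Empty plotting frame over the range of the data, then one grey line per sample.
plot(subprime$income, subprime$loan.amount, type = "n",
     xlab = "income", ylab = "loan.amount",
     main = "OLS lines from 100 samples of size 250")
for (i in 1:100) {
  abline(a = lines100[i, "intercept"], b = lines100[i, "slope"], col = "grey60")
}

# Population regression line drawn on top for comparison.
abline(lm(loan.amount ~ income, data = subprime), lwd = 2)
```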

