

9.2. The dean of a graduate school wishes to predict the grade point average in graduate work for recent applicants. List a dozen variables that might be useful explanatory variables here.

9.7. Be very brief with your answer. An engineer has stated: "Reduction of the number of explanatory variables should always be done using the objective forward stepwise regression procedure." Discuss.

10.3. Be brief with your answer. A student suggested: "If extremely influential outlying cases are detected in a data set, simply discard these cases from the data set." Comment.

10.4. Be brief with your answer. Do NOT give any definitions of relevant concepts here. Describe several informal and formal methods that can be helpful in identifying multicollinearity among the X variables in a multiple regression model.

10.9. Refer to Brand preference Problem 6.5.

Problem 6.5:

In a small-scale experimental study of the relation between degree of brand liking \((Y)\) and moisture content \((X_1)\) and sweetness \((X_2)\) of the product, the following results were obtained from the experiment based on a completely randomized design (data are coded; cases 4 through 13 are not shown):

i:            1    2    3    …    14    15    16
\(X_{i1}\):   4    4    4    …    10    10    10
\(X_{i2}\):   2    4    2    …     4     2     4
\(Y_i\):     64   73   61    …    95    94   100
  a. Obtain the studentized deleted residuals and identify any outlying Y observations at the .01 level.
    State the decision rule and conclusion.
  b. Obtain the diagonal elements of the hat matrix, and provide an explanation for the pattern in these elements.
    Hint: Since SPSS lists "centered leverage values," you need to add 1/n to all values obtained in SPSS to get "leverage values." Compare the numbers on the last page of HO#37 (.151, .009, etc.) to those in Table 10.3: you need to add 1/20 (.050) to the former to get the latter.
    [HO#37 and Table 10.3 are not reproduced here.]

  c. Are any of the observations outlying with regard to their X values according to the rule of thumb stated in the chapter?

  e. Obtain the DFFITS, DFBETAS, and Cook’s D for all cases. Identify possibly influential cases and assess their influence. What do you conclude?
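The quantities in a. through e. can all be computed from one least-squares fit using the standard closed-form expressions (no refitting needed). Below is a minimal pure-Python sketch under these assumptions: the function name `influence_diagnostics` is hypothetical, the model includes an intercept, and the eight data points are made up for illustration (only six of the sixteen brand-preference cases are listed above), so the output does not reproduce the textbook or SPSS. The Bonferroni critical value \(t(1-\alpha/2n;\, n-p-1)\) for part a. still has to come from a t table or SPSS.

```python
import math

def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        pv = m[c][c]
        m[c] = [v / pv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def influence_diagnostics(x_rows, y):
    """Leverage, studentized deleted residuals, DFFITS, DFBETAS, Cook's D (OLS with intercept)."""
    X = [[1.0] + [float(v) for v in row] for row in x_rows]   # prepend intercept column
    n, p = len(X), len(X[0])
    xtx = [[sum(r[j] * r[k] for r in X) for k in range(p)] for j in range(p)]
    c = inverse(xtx)                                          # (X'X)^{-1}
    xty = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(p)]
    b = [sum(c[j][k] * xty[k] for k in range(p)) for j in range(p)]
    e = [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(X, y)]
    # h_ii = x_i' (X'X)^{-1} x_i, the diagonal of the hat matrix
    h = [sum(r[j] * c[j][k] * r[k] for j in range(p) for k in range(p)) for r in X]
    sse = sum(ei ** 2 for ei in e)
    mse = sse / (n - p)
    t = [ei * math.sqrt((n - p - 1) / (sse * (1 - hi) - ei ** 2)) for ei, hi in zip(e, h)]
    dffits = [ti * math.sqrt(hi / (1 - hi)) for ti, hi in zip(t, h)]
    cooks = [ei ** 2 / (p * mse) * hi / (1 - hi) ** 2 for ei, hi in zip(e, h)]
    dfbetas = []
    for r, ei, hi in zip(X, e, h):
        mse_i = (sse - ei ** 2 / (1 - hi)) / (n - p - 1)     # MSE with case i deleted
        # b - b(i) = (X'X)^{-1} x_i e_i / (1 - h_ii)
        shift = [sum(c[k][j] * r[j] for j in range(p)) * ei / (1 - hi) for k in range(p)]
        dfbetas.append([shift[k] / math.sqrt(mse_i * c[k][k]) for k in range(p)])
    return {"b": b, "e": e, "h": h, "t": t, "dffits": dffits, "dfbetas": dfbetas,
            "cooks": cooks, "high_leverage": [i for i, hi in enumerate(h) if hi > 2 * p / n]}

# Made-up data for illustration only: two predictors, eight cases.
x = [[1, 2], [1, 5], [2, 3], [3, 1], [3, 6], [4, 4], [5, 2], [6, 5]]
y = [10.0, 12.5, 13.1, 16.0, 15.2, 19.8, 18.9, 21.0]
d = influence_diagnostics(x, y)
for i in range(len(y)):
    print(i + 1, round(d["h"][i], 3), round(d["t"][i], 3),
          round(d["dffits"][i], 3), round(d["cooks"][i], 3))
print("cases with h_ii > 2p/n:", [i + 1 for i in d["high_leverage"]])
```

The leverage flag uses the common \(2p/n\) rule of thumb; since \(h_{ii}\) depends only on the X values, the same function also shows why a balanced design produces a regular leverage pattern.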

10.101.

Refer to Q.6.18.

Q. 6.18: Commercial properties. A commercial real estate company evaluates vacancy rates, square footage, rental rates, and operating expenses for commercial properties in a large metropolitan area in order to provide clients with quantitative information upon which to make rental decisions.

The data below are taken from 81 suburban commercial properties that are the newest, best located, most attractive, and most expensive for five specific geographic areas. Shown here are the age \((X_1)\), operating expenses and taxes \((X_2)\), vacancy rates \((X_3)\), total square footage \((X_4)\), and rental rates \((Y)\).

i:             1         2         3        …    79         80         81
\(X_{i1}\):    1         14        16       …    15         11         14
\(X_{i2}\):    5.02      8.19      3.00     …    11.97      11.27      12.68
\(X_{i3}\):    0.14      0.27      0        …    0.14       0.03       0.03
\(X_{i4}\):    123,000   104,079   39,998   …    254,700    434,746    201,930
\(Y_i\):       13.50     12.00     10.50    …    15.00      15.25      14.50

Suppose this real estate company came up with a "rental quality index" and added this variable to the regression equation. The data are listed below (and were also sent by e-mail); the columns are Y, \(X_1\), \(X_2\), \(X_3\), \(X_4\), and the rental quality index (last column). Please note that the total square footage \((X_4)\) has been divided by 100,000.

13.5 1 5.02 0.14 1.23 25.48

12 14 8.19 0.27 1.04 11.15

10.5 16 3 0 0.4 6.25

15 4 10.7 0.05 0.57 18.26

14 11 8.97 0.07 0.6 13.17

10.5 15 9.45 0.24 1.01 10.87

14 2 8 0.19 0.31 18.90

16.5 1 6.62 0.6 2.48 31.73

17.5 1 6.2 0 2.15 30.09

16.5 8 11.78 0.03 2.51 23.98

17 12 14.62 0.08 2.91 22.55

16.5 2 11.55 0.03 2.08 29.99

16 2 9.63 0 0.82 23.02

16.5 13 12.99 0.04 3.6 24.64

17.23 2 12.01 0.03 2.66 31.65

17 1 12.01 0 2.99 34.16

16 1 7.99 0.14 1.89 30.30

14.63 12 10.33 0.12 3.66 26.25

14.5 16 10.67 0 3.5 19.92

14.5 3 9.45 0.03 0.85 19.78

16.5 6 12.65 0.13 2.36 26.35

16.5 3 12.08 0 1.3 23.73

15 3 10.52 0.05 0.41 20.11

15 3 9.47 0 0.41 18.67

13 14 11.62 0 0.46 6.86

12.5 1 5 0.33 1.2 25.35

14 15 9.89 0.05 0.81 9.86

13.75 16 11.13 0.06 1.54 11.33

14 2 7.96 0.22 0.97 23.55

15 16 10.73 0.09 2.76 17.34

13.75 2 7.95 0 0.9 22.29

15.63 3 9.1 0 1.84 25.36

15.63 3 12.05 0.03 1.85 27.98

13 16 8.43 0.04 0.96 11.16

14 16 10.6 0.04 1.06 9.00

15.25 13 10.55 0.1 1.36 13.49

16.25 1 5.5 0.21 1.8 27.69

13 14 8.53 0.03 3.15 21.19

14.5 3 9.04 0.04 0.43 20.73

11.5 15 8.2 0 0.3 5.87

14.25 1 6.13 0 0.6 21.63

15.5 15 8.32 0 0.74 8.61

12 1 4 0 0.5 23.39

14.25 15 10.1 0 0.51 6.66

14 3 5.25 0.16 0.32 17.68

16.5 3 11.62 0 1.68 27.53

14.5 4 5.31 0 0.7 18.44

15.5 1 5.75 0 0.27 19.13

16.75 4 12.46 0.03 1.3 21.35

16.75 4 12.75 0 1.3 22.94

16.75 2 12.75 0 1.3 24.77

16.75 2 11.38 0 2.09 28.01

17 1 5.99 0.57 2.2 32.28

16 2 11.37 0.27 0.6 22.46

14.5 3 10.38 0 1.1 22.99

15 15 10.77 0.05 1.01 10.09

15 17 11.3 0 2.89 16.30

16 1 7.06 0.14 1.05 23.22

15.5 14 12.1 0.05 2.76 20.45

15.25 2 10.04 0.06 0.33 19.72

16.5 1 4.99 0.73 2.1 27.96

19.25 0 7.33 0.22 2.4 32.29

17.75 18 12.11 0 2.82 16.54

18.75 16 12.86 0 4.21 25.25

19.25 13 12.7 0.04 4.84 31.96

14 20 11.58 0 2.34 11.85

14 18 11.58 0.03 2.31 12.61

18 16 12.97 0.08 2.97 16.86

13.75 1 4.82 0 0.32 19.92

15 2 9.75 0.03 0.39 19.39

15.5 16 10.36 0.02 1.1 9.99

15.9 1 8.13 0.23 2.36 32.02

15.25 15 13.23 0.05 2.43 17.86

15.5 4 10.57 0.04 1.22 21.48

14.75 20 11.22 0 1.28 5.84

15 3 10.34 0 0.72 21.00

14.5 3 10.67 0 0.43 19.36

13.5 18 8.6 0.08 0.59 5.58

15 15 11.97 0.14 2.55 16.90

15.25 11 11.27 0.03 4.35 31.49

14.5 14 12.68 0.03 2.02 16.5

  a. Why was the total square footage divided by 100,000? What would have happened if I hadn't done this?
    Answer in a single sentence.
  b. Fit the regression equation including all five predictors. Make sure to include the "VIF/Tolerance" option. State the estimated regression function.
  c. Present the correlation matrix. Is there any unusual feature in the matrix? Can you detect a potential problem here?
  d. Briefly comment on unusual features of these estimates. Pay attention to the overall F*, the individual t*, the standardized coefficients, and the VIFs.
  e. Run a regression analysis without the "rental quality index" and present the estimated equation.
    Compare this to the equation in b. in terms of R² values, coefficient estimates, standard errors, etc. Is there anything noteworthy in the comparison of these two equations?
  f. Regress the rental quality index on the four other predictors and present the estimated equation.
    Check R² for this equation. State how this value is related to any of the VIFs you obtained in b.
  g. What is "wrong" with this equation? Identify the problem and guess how the "rental quality index" was created.
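The link asked about in c. and f. is that each VIF comes from an auxiliary regression: \(\mathrm{VIF}_k = 1/(1-R_k^2)\), where \(R_k^2\) is the R² from regressing \(X_k\) on the other predictors. A minimal pure-Python sketch of the correlation matrix and that computation follows. Assumptions: the helper names are hypothetical, and for brevity it uses only the first 12 of the 81 properties listed above, so its numbers will differ from the full-data SPSS run.

```python
import math

def corr_matrix(cols):
    """Pearson correlation matrix of a list of equal-length columns."""
    n = len(cols[0])
    means = [sum(c) / n for c in cols]
    norms = [math.sqrt(sum((v - m) ** 2 for v in c)) for c, m in zip(cols, means)]
    return [[sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (sa * sb)
             for cb, mb, sb in zip(cols, means, norms)]
            for ca, ma, sa in zip(cols, means, norms)]

def solve(a, rhs):
    """Solve a small linear system a z = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(r)] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z

def r_squared(y, predictors):
    """R^2 from an OLS fit of y on an intercept plus the given predictor columns."""
    X = [[1.0] + [c[i] for c in predictors] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(r[j] * r[k] for r in X) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(p)]
    b = solve(xtx, xty)
    ybar = sum(y) / len(y)
    sse = sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, yi in zip(X, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst

def vifs(cols):
    """VIF_k = 1 / (1 - R_k^2), with R_k^2 from regressing column k on all the others."""
    return [1.0 / (1.0 - r_squared(col, cols[:k] + cols[k + 1:])) for k, col in enumerate(cols)]

# First 12 properties from the data listed above (square footage already divided by 100,000).
age = [1, 14, 16, 4, 11, 15, 2, 1, 1, 8, 12, 2]
expenses = [5.02, 8.19, 3, 10.7, 8.97, 9.45, 8, 6.62, 6.2, 11.78, 14.62, 11.55]
vacancy = [0.14, 0.27, 0, 0.05, 0.07, 0.24, 0.19, 0.6, 0, 0.03, 0.08, 0.03]
sqft = [1.23, 1.04, 0.4, 0.57, 0.6, 1.01, 0.31, 2.48, 2.15, 2.51, 2.91, 2.08]
quality = [25.48, 11.15, 6.25, 18.26, 13.17, 10.87, 18.90, 31.73, 30.09, 23.98, 22.55, 29.99]

names = ["age", "expenses", "vacancy", "sqft", "quality"]
cols = [age, expenses, vacancy, sqft, quality]
for name, row in zip(names, corr_matrix(cols)):
    print(f"{name:9s}", " ".join(f"{v:6.3f}" for v in row))
for name, v in zip(names, vifs(cols)):
    print(f"{name:9s} VIF = {v:8.2f}   R^2_k = {1 - 1 / v:.4f}")
```

Reading the output the other way round, \(R_k^2 = 1 - 1/\mathrm{VIF}_k\), which is the relation part f. asks you to state.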

10.102

  1. FILL IN THE BLANK: In \(\hat{Y} = b_1X_1 + b_2X_2\), where \(X_1\) and \(X_2\) are uncorrelated, the VIF for \(X_1\) is _____________ and
    the VIF for \(X_2\) is ________________.
  2. In \(\hat{Y} = b_1X_1 + b_2X_2\), the VIF for \(X_1\) is always the same as the VIF for \(X_2\). Very briefly explain why.
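For reference, the standard definition of the variance inflation factor, specialized to two predictors. Here \(R_k^2\) is the coefficient of determination from regressing \(X_k\) on the other predictor (with an intercept), and \(r_{12}\) is the sample correlation between \(X_1\) and \(X_2\); with a single regressor, that \(R^2\) equals the squared correlation.

```latex
\mathrm{VIF}_k = \frac{1}{1 - R_k^2},
\qquad
R_1^2 = R_2^2 = r_{12}^2
\;\Longrightarrow\;
\mathrm{VIF}_1 = \mathrm{VIF}_2 = \frac{1}{1 - r_{12}^2}.
```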

Factor Analysis:

101.1

The 2006 GSS asked the following seven questions on attitudes toward gun control.

  1. GUNSALES: should background check be required for private gun sale
  2. GUNSDRUG: should penalty for illegal gun sale be tougher than drug
  3. SEMIGUNS: should semi-auto gun sale be limited to military
  4. GUNS911: should gun control law be stricter after 911
  5. RIFLES50: should high power rifle sale be limited to military
  6. OTHGUNS: number of adults owning a gun in household
  7. GUNSDRNK: should carrying a firearm while drinking alcohol be illegal

You are trying to come up with a scale called "attitude toward gun control."

  1. GUNSDRUG is coded in the wrong order. Verify this, and recode it with: recode gunsdrug (2=3)(3=2).
    Hint: In "Preferences" (or "Options"), find "Pivot Table Labeling" and change it to "Values and Labels";
    you will then see 1, 2, and 3 (values) in addition to "tougher," "less tough," and "about as tough."
  2. Run frequencies for all seven variables to make sure that all variables are coded correctly.
  3. Guess which of the seven items would qualify for this single scale. Simply list item numbers (something like 1, 2, 3, 5, 7). There is no single correct answer. Explain your choice in a single sentence.
  4. Run a factor analysis for the 2006 respondents (select if year=2006), using ML estimation and OBLIMIN rotation. Which questions are retained for "attitude toward gun control"?
  5. Does the empirical factor structure match your guess? Briefly explain.
  6. You should have three questions to make up the scale. Check the mean and standard deviation of each of these three.
  7. Do you need to "reverse code"? Check the direction of the three scale items and discuss whether you need to reverse-code any of them. If you do, do so.
  8. Now create a scale that ranges from 1 to 5. Remember that these three questions are on different scales (1-5, 1-2, etc.). Change "123" to "135" and "12" to "15" (or "24") to make the three standard deviations approximately equal before making the scale. Show the process and the SPSS commands.
    Write the program so that a case is retained when it answered at least two of the three questions.
  9. Name the scale (implying a direction), and obtain its mean and standard deviation.
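The assignment asks for SPSS syntax; the Python below only sketches the arithmetic behind items 1, 7, and 8 so the recoding logic is explicit. Assumptions, all hypothetical: the three retained items are called `item_a` (coded 1-3), `item_b` (coded 1-2), and `item_c` (coded 1-5), `item_c` is assumed to need reverse coding, and the four respondents are invented. The "at least two answers" rule mirrors what SPSS's `MEAN.2(...)` function does in a `COMPUTE` statement.

```python
def swap_2_3(v):
    """Python mirror of the SPSS line: recode gunsdrug (2=3)(3=2). Other codes are unchanged."""
    return {2: 3, 3: 2}.get(v, v)

def to_1_5(v, k):
    """Stretch a 1..k item onto 1..5: k=3 maps 1,2,3 -> 1,3,5 and k=2 maps 1,2 -> 1,5."""
    return None if v is None else 1 + (v - 1) * 4 / (k - 1)

def reverse_1_5(v):
    """Reverse-code an item that is already on the 1..5 range."""
    return None if v is None else 6 - v

def scale_score(items, min_valid=2):
    """Mean of the non-missing items, or None when fewer than min_valid were answered."""
    valid = [v for v in items if v is not None]
    return sum(valid) / len(valid) if len(valid) >= min_valid else None

# Hypothetical respondents: (item_a on 1-3, item_b on 1-2, item_c on 1-5); None = missing.
respondents = [(1, 1, 5), (3, 2, None), (2, None, None), (None, 2, 1)]
for a, b, c in respondents:
    items = [to_1_5(a, 3), to_1_5(b, 2), reverse_1_5(c)]  # item_c assumed to run the other way
    print(items, "->", scale_score(items))
```

The third respondent answered only one item, so the score is missing, which is the behavior item 8 asks for.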

14.4

  1. Plot the logistic mean response function \(E\{Y_i\} = \pi_i = F_L(\beta_0 + \beta_1 X_i) = \frac{\exp(\beta_0 + \beta_1 X_i)}{1 + \exp(\beta_0 + \beta_1 X_i)}\) when \(\beta_0 = -27\) and \(\beta_1 = 0.2\).
    Hint: Find the X value that makes E(Y) = π = .50 to start the plot. Take at least 10 different X values to obtain a decent-looking line.
  2. For what value of X is the mean response equal to 0.5?
  3. Find the odds when X = 150 and when X = 151, and the ratio of the odds when X = 151 to the odds when X = 150.

Is this odds ratio equal to \(\exp(\beta_1)\), as it should be?

Is the change in odds close to \(\beta_1\)? Should it be? Briefly explain.
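The computations in this problem are direct arithmetic on the response function. A minimal Python sketch with the stated \(\beta_0 = -27\) and \(\beta_1 = 0.2\): \(\pi = .5\) where \(\beta_0 + \beta_1 X = 0\), that is \(X = -\beta_0/\beta_1\), and the odds are \(\pi/(1-\pi)\). The grid of eleven X values (110 to 160 in steps of 5) is an arbitrary choice for the plot table.

```python
import math

B0, B1 = -27.0, 0.2

def mean_response(x, b0=B0, b1=B1):
    """pi(x) = exp(b0 + b1 x) / (1 + exp(b0 + b1 x))"""
    eta = b0 + b1 * x
    return math.exp(eta) / (1.0 + math.exp(eta))

def odds(x, b0=B0, b1=B1):
    p = mean_response(x, b0, b1)
    return p / (1.0 - p)

x_half = -B0 / B1                      # pi = .5 where the linear predictor is 0
center = round(x_half)
for x in range(center - 25, center + 26, 5):
    print(x, round(mean_response(x), 4))

odds_150, odds_151 = odds(150), odds(151)
print("odds at 150:", odds_150, " odds at 151:", odds_151)
print("odds ratio:", odds_151 / odds_150, " exp(b1):", math.exp(B1))
print("change in odds:", odds_151 - odds_150, " b1:", B1)
```

Comparing the last two printed lines shows the difference between the odds ratio, which is constant in X, and the raw change in odds, which depends on where X starts.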

14.13.

Car purchase. A marketing research firm was engaged by an automobile manufacturer to conduct a pilot study to examine the feasibility of using logistic regression for ascertaining the likelihood that a family will purchase a new car during the next year. A random sample of 33 suburban families was selected. Data on annual family income (\(X_1\), in thousand dollars) and the current age of the oldest family automobile (\(X_2\), in years) were obtained. A follow-up interview conducted 12 months later was used to determine whether the family actually purchased a new car (Y = 1) or did not purchase a new car (Y = 0) during the year.

i:            1    2    3    …    31    32    33
\(X_{i1}\):  30   45   60    …    21    32    17
\(X_{i2}\):   2    2    2    …     3     5     1
\(Y_i\):      1    0    1    …     0     1     0

Multiple logistic regression model \(E\{Y_i\} = \pi_i = \frac{\exp(\mathbf{X}_i'\boldsymbol{\beta})}{1 + \exp(\mathbf{X}_i'\boldsymbol{\beta})}\), with two predictor variables in first-order terms, is assumed to be appropriate.

  1. Find the maximum likelihood estimates of \(\beta_0\), \(\beta_1\), and \(\beta_2\). State the fitted response function.
  2. Obtain \(\exp(b_1)\) and \(\exp(b_2)\) and interpret these numbers.
  3. What is the estimated probability that a family with annual income of $50 thousand and an oldest car of 3 years will purchase a new car next year?
  4. Which independent variable is most strongly related to Y? How do you know?
    14.19. Refer to problem 14.13 above. Assume that the fitted model is appropriate and that large-sample inferences are applicable.
    b. Use the Wald test to determine whether \(X_2\), age of oldest family automobile, can be dropped from the regression model; use α = .05. State the alternatives, decision rule, and conclusion. What is the approximate P-value of the test?
    c. Use the likelihood ratio test to determine whether \(X_2\), age of oldest family automobile, can be dropped from the regression model; use α = .05. State the full and reduced models, decision rule, and conclusion. What is the approximate P-value of the test? How does the result here compare to that obtained for the Wald test in (b)?
  5. Show \(-2 \log LR\) for this model and its degrees of freedom.
    Hint: This is an indicator of the difference in fit between this model and the saturated model.
  6. How well does this model explain Y? Show four measures of model fit.
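SPSS produces all of these numbers, but a small sketch makes the machinery visible: Newton-Raphson maximum likelihood for the coefficients, \(\exp(b_k)\), a predicted probability at income 50 and car age 3, the Wald test \(z^* = b_2/s\{b_2\}\), and the likelihood ratio statistic \(G^2 = -2(\log L_R - \log L_F)\). Assumptions: the function names are illustrative, and because only six of the 33 families are listed above, the twelve data points are hypothetical, so nothing here matches the textbook answers. Both \(z^{*2}\) and \(G^2\) are referred to a chi-square distribution with 1 df, whose upper-tail probability equals \(\mathrm{erfc}(\sqrt{x/2})\). For ungrouped 0/1 data the saturated model has log-likelihood 0, so the printed \(-2 \log L\) is also the deviance.

```python
import math

def probs(X, b):
    """Fitted probabilities pi_i = 1 / (1 + exp(-x_i'b))."""
    return [1.0 / (1.0 + math.exp(-sum(bj * xj for bj, xj in zip(b, row)))) for row in X]

def solve(a, rhs):
    """Gaussian elimination with partial pivoting for a small system a z = rhs."""
    n = len(a)
    m = [[float(v) for v in row] + [float(r)] for row, r in zip(a, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z

def information(X, pi):
    """Fisher information X'WX with W = diag(pi (1 - pi))."""
    p = len(X[0])
    return [[sum(r[j] * r[k] * q * (1 - q) for r, q in zip(X, pi)) for k in range(p)]
            for j in range(p)]

def fit_logistic(X, y, max_iter=100, tol=1e-10):
    """ML estimates by Newton-Raphson; returns (b, standard errors, log-likelihood)."""
    b = [0.0] * len(X[0])
    for _ in range(max_iter):
        pi = probs(X, b)
        score = [sum(r[j] * (yi - q) for r, yi, q in zip(X, y, pi)) for j in range(len(b))]
        step = solve(information(X, pi), score)
        b = [bj + s for bj, s in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    pi = probs(X, b)
    info, n = information(X, pi), len(b)
    # diagonal of the inverse information matrix = estimated variances of the b's
    var = [solve(info, [1.0 if i == j else 0.0 for i in range(n)])[j] for j in range(n)]
    loglik = sum(yi * math.log(q) + (1 - yi) * math.log(1 - q) for yi, q in zip(y, pi))
    return b, [math.sqrt(v) for v in var], loglik

# Hypothetical data: income (thousand dollars), age of oldest car (years), purchase (0/1).
income = [20, 35, 50, 30, 25, 40, 45, 60, 55, 22, 38, 48]
age = [2, 2, 2, 1, 1, 4, 4, 5, 3, 3, 5, 1]
y = [0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0]

X = [[1.0, x1, x2] for x1, x2 in zip(income, age)]
b, se, ll_full = fit_logistic(X, y)
_, _, ll_reduced = fit_logistic([[1.0, x1] for x1 in income], y)   # X2 dropped

print("b:", b, " exp(b1), exp(b2):", math.exp(b[1]), math.exp(b[2]))
print("pi_hat(income=50, age=3):", probs([[1.0, 50.0, 3.0]], b)[0])
z = b[2] / se[2]
p_wald = math.erfc(abs(z) / math.sqrt(2))          # two-sided normal P-value
g2 = max(0.0, -2.0 * (ll_reduced - ll_full))       # guard against tiny negative rounding
p_lr = math.erfc(math.sqrt(g2 / 2))                # chi-square(1) upper tail
print("Wald z*^2:", z ** 2, " P:", p_wald)
print("LR G^2:", g2, " P:", p_lr)
print("-2 log L (full model):", -2 * ll_full)
```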

14.101.

Someone found a logistic regression coefficient of -.08. She said, "This means that a one-unit increase in X decreases the probability of Y being 1 by exactly 8%." Briefly discuss whether this is correct. If it is not, what is the correct interpretation?
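One way to check the claim is to compute what a coefficient of -.08 actually does. Under the logistic model, a one-unit increase in X multiplies the odds by \(\exp(b_1)\); the resulting change in probability depends on the starting probability. A minimal Python sketch, where the baseline probabilities are arbitrary values chosen for illustration:

```python
import math

b1 = -0.08
odds_ratio = math.exp(b1)          # multiplicative change in the odds per one-unit increase in X
print("exp(b1) =", round(odds_ratio, 4), "-> odds change by", round(100 * (odds_ratio - 1), 2), "%")

changes = {}
for p0 in (0.05, 0.25, 0.50, 0.75, 0.95):
    odds0 = p0 / (1 - p0)
    odds1 = odds0 * odds_ratio
    p1 = odds1 / (1 + odds1)
    changes[p0] = p1 - p0
    print(f"baseline P(Y=1) = {p0:.2f} -> {p1:.4f} (change {p1 - p0:+.4f})")
```

The probability change is largest near P = .5 and shrinks toward the extremes, and none of the changes equals 8 percentage points.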
