Problem I

Please use the sample data from students (links to two versions of the data are listed below) to perform the following tasks. Before the analysis, delete the extremely unreliable values as you did in the previous assignment (remove just the extreme value, not the whole case).

Link to Data: http://people.ysu.edu/~gchang/stat/Classdata_13f.xls

Link to Data: http://people.ysu.edu/~gchang/stat/Classdata_13f.csv (Mac users)

Note: The Classdata_13f.csv data file uses a comma as the separator.

  1. Test whether there is a significant correlation between the "eating fried food" and "exercise regularly" variables at the 5% level of significance. You must state the null and alternative hypotheses, check the large-sample assumption, report the p-value and the test statistic value, and draw a proper conclusion.
    Null hypothesis:
    Alternative hypothesis:
    Report the value of the test statistic =
    Report p-value from the Chi-square test =
    Report p-value from the Fisher Exact test =
    Conclusion:
  2. Describe your analysis of Cochran’s large-sample conditions for the chi-square test above, state your conclusion, and comment on whether the chi-square test is appropriate for the analysis above.
  3. Explain the difference between the chi-square and Fisher exact tests. (Check the Internet if you do not know what the Fisher exact test is.)
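
As a rough illustration of items 2 and 3, the sketch below computes expected counts, checks one common statement of Cochran's conditions (no expected count below 1, and at most 20% of expected counts below 5), and computes a two-sided Fisher exact p-value for a 2x2 table. The function names and the demo counts are assumptions made for illustration; the demo table is NOT the class data, and the real variables may have more than two categories.

```python
from math import comb

def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def cochran_ok(table):
    """Rule of thumb: every expected count >= 1 and at most 20% of them < 5."""
    cells = [e for row in expected_counts(table) for e in row]
    share_small = sum(1 for e in cells if e < 5) / len(cells)
    return min(cells) >= 1 and share_small <= 0.20

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table.

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed table.
    """
    (a, b), (c, d) = table
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical counts for illustration only; NOT taken from the class data.
demo = [[3, 1], [1, 3]]
print(cochran_ok(demo))
print(fisher_exact_two_sided(demo))
```

With counts this small the expected counts are all below 5, so the chi-square approximation is doubtful and the exact test is the safer choice; with large expected counts the two tests usually agree closely.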

Problem II. Linear Regression

In a research study of the effect of smoking on the increase in health score for subjects in a diet program, the following four variables were recorded.

  • iihs: Increase in Health Score
  • time: Length of time in a diet program (in weeks)
  • smoke: Subject Smoked or Not
    (1=smoked; 0=did not smoke)
  • age: Subject’s Age

The data are stored at the following addresses:

http://people.ysu.edu/~gchang/stat/reg-increaseinhealthscore-13.sav [SPSS]

http://people.ysu.edu/~gchang/stat/reg-increaseinhealthscore-13.xls [EXCEL]

http://people.ysu.edu/~gchang/stat/reg-increaseinhealthscore-13.csv [Comma separated value]

Use the linear regression modeling technique to answer the following questions:

  1. Make a scatter plot of "Increase in Health Score" versus "Length of time in a diet program" and a scatter plot of "Increase in Health Score" versus "Age", and describe the relation between each pair of variables.
    [Paste the graphs here!]
  2. Report the Pearson correlation coefficient between "Increase in Health Score" and "Length of time in a diet program", and use the p-value to conclude whether the correlation is significantly different from zero.
  3. Make a scatter plot of "Increase in Health Score" versus "Length of time in a diet program" with smoking status as the categorical marker variable. Does the smoking variable appear to be a significant factor for "Increase in Health Score"?
    [Paste the graph here!]
  4. Run the regression analysis and check the multicollinearity condition using VIF, with a cutoff of 4. Are length of time in a diet program, age, and the subject’s smoking status significant factors in predicting a subject’s "Increase in Health Score"? Is there a multicollinearity problem? (Show the coefficient table from the software and interpret it.)
  5. Find the linear regression equation for predicting the average Increase in Health Score using only the significant predictor variables.
    [Write the linear regression equation here!]
  6. Estimate, with a 95% confidence interval, the average Increase in Health Score for individuals aged 25 who smoked and were in the diet program for 38 weeks. [Use the model with only the significant predictor variables mentioned in 4.]
  7. Estimate, with a 95% interval, the Increase in Health Score for a single individual aged 25 who smoked and was in the diet program for 38 weeks. [Use the model with only the significant predictor variables mentioned in 4.]
  8. Perform a two-independent-samples t-test to see whether there is a significant difference between the average Increase in Health Score for subjects who smoked and those who did not. Does the result contradict the result in 3? If yes, why?
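
Items 6 and 7 differ only in what is being estimated: item 6 asks about the mean response, item 7 about one new subject. The standard textbook formulas are sketched below. The notation is an assumption introduced here, not taken from the assignment: x_0 is the predictor vector for the new subject (with a leading 1), X is the design matrix, s is the residual standard error, n is the sample size, and p is the number of predictors kept in the model.

```latex
% Point estimate at x_0 (same for both items)
\hat{y}_0 = x_0^{\top}\hat{\beta}

% Item 6: 95% confidence interval for the mean response at x_0
\hat{y}_0 \;\pm\; t_{0.975,\,n-p-1}\; s\,\sqrt{x_0^{\top}(X^{\top}X)^{-1}x_0}

% Item 7: 95% prediction interval for one new subject at x_0
\hat{y}_0 \;\pm\; t_{0.975,\,n-p-1}\; s\,\sqrt{1 + x_0^{\top}(X^{\top}X)^{-1}x_0}
```

The extra 1 under the square root accounts for the variability of an individual subject around the mean, so the interval in item 7 is always wider than the one in item 6.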

Problem III. Logistic Regression on Illicit Drug Use Study

In an illicit drug use study of high school students in a certain region of the US, the following variables were observed:

Risk Behavior Variable:

  • Used illicit drug in high school (0=did not use illicit drug, 1=used illicit drug)

Risk Factor Variables:

  • Sex (0=Female, 1=Male)
  • Race (1=White, 2=Black, 3=Other)
  • School Type (0=Private, 1=Public)

The data are stored at the following addresses:

http://people.ysu.edu/~gchang/stat/log-druguse-13.sav [SPSS]

http://people.ysu.edu/~gchang/stat/log-druguse-13.xls [EXCEL]

http://people.ysu.edu/~gchang/stat/log-druguse-13.csv [Comma separated value]

Use the logistic regression technique to answer the following questions: (Do not use stepwise regression.)

  1. Report the frequency and percentage distribution for each of the four variables observed in this study. (Create frequency distribution tables using Word to report the results.)
  2. Perform a logistic regression with all three risk factors in the model to model drug use. Identify the significant factor(s) affecting whether a student used illicit drugs, and report the p-values.
  3. Use the odds ratios from the logistic regression model that includes all three risk factors to explain how each of the significant factors affects illicit drug use. You must report the odds ratios and explain their meaning.
  4. Fit the logistic regression model with only the significant factors in it, and use it to estimate the probability that a randomly selected white female student from a public school will use illicit drugs.
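
Items 3 and 4 both rest on two small formulas: the odds ratio for a predictor is exp(coefficient), and the predicted probability is 1 / (1 + exp(-(b0 + b1*x1 + ...))). The sketch below shows these with made-up coefficients; the names and numbers are hypothetical and are NOT fitted to the log-druguse-13 data. Race has three levels, so a fitted model would normally carry two dummy variables for it, which this sketch leaves out.

```python
from math import exp

def odds_ratio(beta):
    """Odds ratio for a one-unit increase in a predictor with coefficient beta."""
    return exp(beta)

def predicted_probability(intercept, coefs, x):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    logit = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + exp(-logit))

# Hypothetical coefficients for illustration only; NOT estimates from the data.
b0 = -1.0
b_sex, b_public = 0.5, 0.8

# Female (Sex = 0) attending a public school (School Type = 1).
p = predicted_probability(b0, [b_sex, b_public], [0, 1])
print(round(odds_ratio(b_public), 3), round(p, 3))
```

An odds ratio above 1 means the odds of drug use are higher in the group coded 1 than in the group coded 0, holding the other factors fixed; below 1 means lower odds.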

(If you use statistical software to solve a problem, please attach the software output right after your answer to support it. If you do not want to use software then you need to show your work.)

Problem IV

In a random sample of 600 subjects from a population, 54 smoked cigarettes. Find the 95% confidence interval for estimating the percentage of smokers in this population. (Use the method covered in our biostatistics course.)
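
The assignment does not say which interval method the course covered. The sketch below assumes the usual large-sample (Wald) interval, p ± z * sqrt(p(1 - p)/n); the function name is made up for illustration. The large-sample condition looks reasonable here because both 54 and 600 - 54 = 546 are well above 5.

```python
from math import sqrt
from statistics import NormalDist

def wald_ci(successes, n, confidence=0.95):
    """Large-sample (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = wald_ci(54, 600)
print(f"{100 * low:.2f}% to {100 * high:.2f}%")
```

If the course used a different method (for example a Wilson or exact interval), the endpoints will differ slightly.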

Problem V

In a random sample of 600 subjects from a population, 54 smoked cigarettes. At the 5% level of significance, test whether the percentage of people who smoked cigarettes in this population is different from 7%. (You must state the hypotheses and use the p-value to answer this question.)

Sample proportion: p = 54/600 = 0.09
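
One way to carry the calculation forward is a two-sided one-sample z-test for a proportion, with H0: p = 0.07 against HA: p ≠ 0.07. This is a sketch that assumes the large-sample normal approximation is the intended method; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Two-sided large-sample z-test of H0: p = p0."""
    p_hat = successes / n
    se0 = sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = one_proportion_z_test(54, 600, 0.07)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

The decision rule is to reject H0 at the 5% level only if the p-value is below 0.05.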

Problem VI

A group of researchers investigated the effect of a type of frontier medicine called MIT. It is about using music, image, and touch in healing cardiac care patients. The study compared the outcomes from using standard care and MIT after 4 months of treatment. The result is shown in the following table.

Number of patients, by treatment (rows) and outcome (columns):

Treatment | Major Cardiovascular Event | No Cardiovascular Event
MIT       | 23                         | 82
Standard  | 31                         | 94

At the 5% level of significance, test whether there is a significant association between the treatment and outcome variables. (You must state the hypotheses and use the p-value to answer this question.)

H0:

HA:
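
A sketch of the Pearson chi-square test of independence for the treatment-by-outcome table in this problem follows. It assumes no continuity correction (software such as SPSS also reports a corrected version, which gives a somewhat larger p-value), and the function name is made up for illustration. For 1 degree of freedom the upper-tail probability can be written exactly as erfc(sqrt(chi2 / 2)), so only the standard library is needed.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table (df = 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value

# Rows: MIT, Standard. Columns: major cardiovascular event, no event.
mit_table = [[23, 82], [31, 94]]
chi2, p_value = chi_square_2x2(mit_table)
print(f"chi-square = {chi2:.4f}, p-value = {p_value:.4f}")
```

All four expected counts here are well above 5, so the large-sample conditions for the chi-square test are met.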

Problem VII

Use the data in the contingency table for Problem VI to compute the odds ratio of having a cardiovascular event for those treated with MIT versus those given standard treatment, and interpret the meaning of this number.
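
The odds ratio for a 2x2 table is the cross-product ratio (a*d)/(b*c). The sketch below applies it to the MIT and standard-care counts given in the table (23 and 82 for MIT, 31 and 94 for standard care); the function name is an assumption made for illustration.

```python
def odds_ratio_2x2(a, b, c, d):
    """Odds ratio for a 2x2 table laid out as:

                  event   no event
    group 1         a        b
    group 2         c        d
    """
    return (a / b) / (c / d)

# Group 1 = MIT, group 2 = Standard; event = major cardiovascular event.
or_mit_vs_standard = odds_ratio_2x2(23, 82, 31, 94)
print(f"odds ratio = {or_mit_vs_standard:.3f}")
```

A value below 1 means the odds of a major cardiovascular event were lower in the MIT group than in the standard-care group in this sample; whether that difference is statistically significant is a separate question answered by the association test.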
