Q1. In a health study to investigate an epidemic outbreak of a disease X that is spread by mosquitoes, individuals were randomly sampled within two sectors in a city to determine if the person had recently contracted the disease under study. This was ascertained by the interviewer, who asked pertinent questions to assess whether certain specific symptoms associated with the disease were present during the specified period. The response variable "disease" was coded 1 if this disease was determined to have been present, and 0 if not.

Three predictors were included in the study, representing known or potential risk factors: age, socioeconomic status (SES) of the household, and sector within the city. Age is a quantitative variable. SES is a categorical variable with three levels, represented by two indicator variables (SESc1 and SESc2) as follows:

| Social Class | SESc1 | SESc2 |
| --- | --- | --- |
| Upper | 0 | 0 |
| Middle | 1 | 0 |
| Lower | 0 | 1 |

City sector is also a categorical variable. Since there were only two sectors in the study, one indicator variable (city) was used, defined so that city=0 for sector A and city=1 for sector B.
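
The indicator coding described above can be sketched in a few lines of Python. This is only an illustration of the coding scheme, not part of the SPSS workflow; the names `SES_CODING`, `CITY_CODING`, and `encode` are hypothetical and do not appear in the dataset.

```python
# Dummy coding as described in the text: "Upper" is the reference level
# for SES, and sector A is the reference level for city.
SES_CODING = {
    "Upper": (0, 0),
    "Middle": (1, 0),
    "Lower": (0, 1),
}
CITY_CODING = {"A": 0, "B": 1}


def encode(ses_class, sector):
    """Return (SESc1, SESc2, city) for one participant."""
    ses_c1, ses_c2 = SES_CODING[ses_class]
    return ses_c1, ses_c2, CITY_CODING[sector]


# Hypothetical participant: middle class, living in sector B.
print(encode("Middle", "B"))  # (1, 0, 1)
```

Because the upper class and sector A are coded as all zeros, they act as the reference categories against which the other levels are compared.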

In the dataset hw4_q1.sav, you should see the following variables:

| Variable name | Definition |
| --- | --- |
| case | ID of the participant |
| age | Age of the participant |
| SES | Socioeconomic status (1 = upper class, 2 = middle, 3 = lower) |
| SESc1 | SES indicator variable 1 (1 = middle class in SES, 0 = others) |
| SESc2 | SES indicator variable 2 (1 = lower class in SES, 0 = others) |
| city | City sector indicator variable (0 = sector A, 1 = sector B) |
| disease | 1 = with disease, 0 = without disease |

  1. Based on the given data, what are the odds of contracting disease X in city sector A
    and in city sector B? What is the odds ratio of contracting disease X between
    these two sectors? [6 marks]
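  2. Fit the following models in SPSS and complete the table below. [6 marks]
    M0: $\log[\pi/(1-\pi)] = \beta_0$
    M1: $\log[\pi/(1-\pi)] = \beta_0 + \beta_1 \cdot \text{age}$
    M2: $\log[\pi/(1-\pi)] = \beta_0 + \beta_1 \cdot \text{age} + \beta_2 \cdot \text{city}$
    M3: $\log[\pi/(1-\pi)] = \beta_0 + \beta_1 \cdot \text{age} + \beta_2 \cdot \text{city} + \beta_3 \cdot \text{SESc1} + \beta_4 \cdot \text{SESc2}$
    [$\pi$ denotes the probability of contracting disease X.]

    | Model | Deviance (-2 Log Likelihood) | No. of regression coefficient parameters | Pseudo $R^2$ |
    | --- | --- | --- | --- |
    | M0 | | | |
    | M1 | | | |
    | M2 | | | |
    | M3 | | | |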
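  3. Based on part 2, test the following hypotheses using the likelihood ratio test and complete the
    table below. [8 marks]

    | Hypothesis | Model pair to be compared | Chi-square | df of the chi-square test | Significant at the 0.05 level or not? |
    | --- | --- | --- | --- | --- |
    | In M1, $H_0: \beta_1 = 0$ | | | | |
    | In M2, $H_0: \beta_2 = 0$ | | | | |
    | In M3, $H_0: \beta_3 = \beta_4 = 0$ | | | | |
    | In M3, $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ | | | | |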
  4. Based on the SPSS output for M3, report the fitted regression model. Interpret the
    regression coefficient estimates (in terms of odds). [8 marks]
  5. Based on part 4, what can you conclude about the three potential risk factors? Are they
    all relevant? Remember to support your conclusion with the relevant test statistic.
    [8 marks] [10 marks]
  6. Based on M3, perform a classification on the cases with and without disease (with
    cutoff = 0.50). What is the accuracy rate (show your calculations)? [8 marks]
  7. Use discriminant analysis to classify the two disease
    groups with the same predictors used in part 6.
    (Note: For discriminant analysis, use "equal prior probabilities" and "pooled covariance matrix", i.e. choose the "Within-groups" covariance matrix option for classification in SPSS.)
    Compare the results with what you found in part 6. Which approach do you think is more suitable, and why? [10 marks]
  8. Do you think the K-means clustering procedure can be used to classify the cases as in parts 6
    and 7? Explain your answer. [8 marks]
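
As an aid to the first question, the odds and odds ratio for two groups can be computed from a 2x2 table of counts. The sketch below is a minimal Python illustration using HYPOTHETICAL counts that are not taken from the dataset; the function names are assumptions made for this example, and the actual counts would come from a crosstab of city by disease in SPSS.

```python
def odds(n_disease, n_no_disease):
    """Odds of disease = P(disease) / P(no disease) = cases / non-cases."""
    return n_disease / n_no_disease


def odds_ratio(odds_b, odds_a):
    """Odds ratio of sector B relative to sector A."""
    return odds_b / odds_a


# HYPOTHETICAL counts for illustration only (not from the study data).
sector_a = {"disease": 20, "no_disease": 80}
sector_b = {"disease": 30, "no_disease": 30}

odds_a = odds(sector_a["disease"], sector_a["no_disease"])
odds_b = odds(sector_b["disease"], sector_b["no_disease"])

print(odds_a, odds_b, odds_ratio(odds_b, odds_a))  # 0.25 1.0 4.0
```

With these made-up counts, the odds of disease in sector B are four times the odds in sector A. The same ratio is what the exponentiated city coefficient estimates in a logistic model that contains city as its only predictor.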

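The likelihood ratio test asked for in the third question compares two nested models through the drop in deviance. A minimal Python sketch follows, assuming HYPOTHETICAL deviance values and a hypothetical helper `likelihood_ratio_test`. The critical values are the standard upper 5% points of the chi-square distribution; SPSS reports the corresponding p-value directly.

```python
# Upper 5% critical values of the chi-square distribution, keyed by df.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def likelihood_ratio_test(dev_reduced, k_reduced, dev_full, k_full):
    """Compare two nested models by their deviances (-2 log-likelihood).

    k_reduced and k_full are the numbers of regression coefficients in
    each model. Returns (chi_square, df, significant_at_0.05).
    """
    chi_square = dev_reduced - dev_full  # drop in deviance
    df = k_full - k_reduced              # number of coefficients tested
    return chi_square, df, chi_square > CHI2_CRIT_05[df]


# HYPOTHETICAL deviances: a reduced model with 3 coefficients
# against a full model with 5 coefficients.
chi2, df, significant = likelihood_ratio_test(120.0, 3, 110.0, 5)
print(chi2, df, significant)  # 10.0 2 True
```

Whether or not the intercept is counted among the coefficients, the degrees of freedom are the same, because only the difference between the two counts enters the test.
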

