Statistics - Hypothesis Testing Projects

PROJECT: BIOSTATISTICS

Low birth weight, defined as birth weight less than 2,500 grams, is an outcome that has concerned physicians for years because infant mortality rates and birth defect rates are very high for low birth weight babies. A woman's behavior during pregnancy (including diet, smoking habits, and receiving prenatal care) can greatly alter the chances of carrying the baby to term and, consequently, of delivering a baby of normal birth weight.

In this project, you are asked to prepare a short data analysis on low birth weight. You will use the data set "birth_wt0.sav". This data set is part of a larger study at Bay State Medical Center in Springfield, Massachusetts. It contains variables that are believed to be associated with low birth weight in the obstetrical literature. Actual observed variable values have been modified to protect subject confidentiality. The goal of the study was to determine whether these variables were risk factors in the clinic population served by Bay State Medical Center.

The following sections will guide you through the analysis. The questions are listed by bullets, but your report needs to be in narrative format. Please include ONLY the SPSS output necessary to support your analysis. Please limit your project report to 4 pages (use a font no smaller than Times New Roman, size 12; either single or double spacing is acceptable). There is no penalty for a concise report that is shorter than 4 pages as long as it answers all the questions. If you feel compelled to include the original SPSS output, please attach it as an appendix. The appendix should not be longer than 2 pages. Each extra page will result in a 0.5 point deduction.

Section One:

Present the histogram of the birth weight distribution and display the normal curve on top of the histogram. Report the mean and standard deviation of the birth weight. Does the histogram largely follow a normal distribution? Calculate the standardized score for 2,500 grams (the cutpoint for low birth weight) based on the mean and the standard deviation, and find the percentage of births that are low birth weight using Table 8.1 (distributed in class). [2pts]
Generate a new variable "lbw_grp" that indicates whether the baby is low birth weight. The rule for recoding is as follows: lbw_grp=1 if the birth weight is less than 2,500 grams and lbw_grp=0 if the birth weight is 2,500 grams or more. What percentage of births in this sample are actually low birth weight, and what percentage are not? [1 pt]
In a report released by the CDC, 347,209 babies born in 2009 weighed less than 2,500 grams. This accounts for 8.2% of the total births in 2009. Using this national statistic, would you say the clinic population served by Bay State Medical Center has a significantly higher rate of low birth weight than the national rate in 2009? Answer this question by [2 pts]
1. Stating the null and alternative hypotheses.
2. Deriving the appropriate test statistic.
3. Reporting the p-value.
4. Drawing a conclusion using a 5% level of significance.
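
For anyone who wants to check hand calculations outside SPSS, the steps of Section One can be sketched in Python. This is only an illustration under stated assumptions, not part of the required SPSS workflow: the mean (3,000), standard deviation (500), toy weights, and counts (30 low birth weight babies out of 200) are made up and are NOT taken from birth_wt0.sav, the function names are hypothetical, and the alternative hypothesis is assumed to be one-sided (clinic rate higher than 8.2%).

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def standard_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def recode_lbw(weights, cutpoint=2500):
    """Return 1 for weights below the cutpoint and 0 otherwise."""
    return [1 if w < cutpoint else 0 for w in weights]

def one_sample_proportion_z(successes, n, p0):
    """z statistic and upper-tail p-value for H0: p = p0 vs H1: p > p0."""
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1.0 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / se0
    return z, 1.0 - normal_cdf(z)

# Hypothetical mean and SD (assumption, not from the data set).
z_cut = standard_score(2500, mean=3000.0, sd=500.0)
share_below = normal_cdf(z_cut)  # proportion below 2,500 g under normality

# Toy weights, only to show the recoding rule (2,500 itself is coded 0).
toy_weights = [2100, 2450, 2500, 2900, 3200, 3600, 4100, 2300, 3050, 3300]
lbw_grp = recode_lbw(toy_weights)

# Hypothetical counts (assumption): 30 low birth weight babies out of 200.
z, p_value = one_sample_proportion_z(successes=30, n=200, p0=0.082)

print(f"z for 2,500 g: {z_cut:.2f}, normal share below: {share_below:.4f}")
print(f"lbw_grp: {lbw_grp}")
print(f"proportion test: z = {z:.3f}, one-sided p = {p_value:.5f}")
```

The normal approximation used in the proportion test is a large-sample method; with a small sample an exact binomial test would be the safer choice.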

Section Two:

Smoking during pregnancy is considered one of the major risk factors for low birth weight. In this data set, the variable "Smoke" records the smoking status of each woman.

Summarize the birth weight for smokers and non-smokers and fill out the table below. [1pt]

Birth weight	N	Mean	Standard Deviation	Standard Error of Sample Mean
Non-Smokers
Smokers

Which group has the higher mean birth weight, and by how many grams? Non-smokers have a larger standard deviation in birth weight than smokers, but the standard error of the sample mean for non-smokers is smaller than that of smokers. Can you briefly explain why? [2pts]
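
As a reminder of how the two quantities relate, the standard error of a sample mean is the standard deviation divided by the square root of the sample size. The short sketch below uses made-up numbers (they are NOT the smoker and non-smoker values from the data set, and the function name is hypothetical) to show that a group can have the larger standard deviation and still the smaller standard error.

```python
import math

def standard_error(sd, n):
    """Standard error of the sample mean: SD divided by sqrt(N)."""
    return sd / math.sqrt(n)

# Hypothetical groups (assumption): the first has more spread but more subjects.
se_large_group = standard_error(sd=750.0, n=120)
se_small_group = standard_error(sd=650.0, n=70)

print(f"SD 750, N 120 -> SE {se_large_group:.2f}")
print(f"SD 650, N  70 -> SE {se_small_group:.2f}")
```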

To test whether the mean birth weight is the same for smokers and non-smokers, we can conduct a two-sample t test. Identify which type of two-sample t test is appropriate for this data (assuming the patients are not related to each other). [1pt]
Conduct the appropriate two-sample t test in SPSS and answer the following questions: [3pts]
1. What are the null and alternative hypotheses for this test?
2. Report the test statistic and the correct p-value corresponding to the hypotheses.
3. What’s your conclusion at the 5% significance level?
Construct a 95% confidence interval for the mean difference in birth weights between smokers and non-smokers. What do we learn about the mean difference in birth weights from this confidence interval? [1.5 pts]
In this sample, what percentage of women smoked during pregnancy? The national statistic finds that the percentage of women who smoke is 18% (the percentage of women who smoke during pregnancy is less than 18%, but the exact statistic is unknown). Based on your findings about smoking and birth weight, can you come up with a reasonable explanation for why the average birth weight in this sample is lower than the national average? [1.5 pts]
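
The mechanics of a two-sample t test and its 95% confidence interval can be sketched from summary statistics alone. The example below is a minimal illustration under stated assumptions: the group means, standard deviations, and sample sizes are made up (NOT the values from the data set), all function names are hypothetical, and the p-value is obtained by numerically integrating the t density with Simpson's rule so that only the Python standard library is needed. Both the pooled (equal variances) and Welch (unequal variances) versions are shown; choosing between them is part of the question above.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: integrate the density over [0, |t|] (Simpson's rule)."""
    b = abs(t)
    if b == 0.0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

def t_critical(df, alpha=0.05):
    """Two-sided critical value, found by bisection on two_sided_p."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_sided_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def pooled_t(m1, s1, n1, m2, s2, n2):
    """t statistic, df, and standard error assuming equal variances."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df, se

def welch_t(m1, s1, n1, m2, s2, n2):
    """t statistic, Welch-Satterthwaite df, and standard error."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (m1 - m2) / se, df, se

# Hypothetical summary statistics (assumption): mean, SD, N for two groups.
group_a = (3100.0, 700.0, 100)
group_b = (2800.0, 600.0, 60)

t_pooled, df_pooled, se_pooled = pooled_t(*group_a, *group_b)
t_welch, df_welch, se_welch = welch_t(*group_a, *group_b)
p_welch = two_sided_p(t_welch, df_welch)

# 95% confidence interval for the mean difference (Welch version).
diff = group_a[0] - group_b[0]
margin = t_critical(df_welch) * se_welch
ci_low, ci_high = diff - margin, diff + margin

print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.1f}, p = {p_welch:.4f}")
print(f"95% CI for the difference: ({ci_low:.1f}, {ci_high:.1f})")
```

A confidence interval that excludes zero corresponds to rejecting the null hypothesis of equal means in the matching two-sided test at the 5% level.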

Section Three:

Race is also reported as an important risk factor for low birth weight. In this dataset, women are categorized into three racial groups: White, Black and Other races. The mean birth weight for each racial group is summarized below.

Mother's race	Mean	N	Std. Deviation	Std. Error of Sample Mean
White	3103.7396	96	727.72424	74.27304
Black	2719.6923	26	638.68388	125.25621
Other	2804.0149	67	721.30115	88.12096

Rank the three racial groups in terms of average birth weight from high to low. What’s the difference in mean birth weight between White and Black, between White and Other race and between Other race and Black? [2 pts]
What test can we use if we want to test the difference in mean birth weight between two racial groups? Conduct a two-sided test to compare mean birth weight between White and Black. Report the test statistic and p-value. Using a 5% significance level, what is your conclusion? [1.5 pts]
Explain in your own words "the problem of simultaneously testing multiple hypotheses" in this context. You can use level 5% as an example. [1pt]
If we are interested in testing whether there is any significant difference in birth weight across all racial groups, what would be an appropriate pair of null and alternative hypotheses? [1 pt]
To test the above hypothesis, conduct a one-way ANOVA in SPSS and report the following values: [3pts]
1. The between group sum of squares, the within group sum of squares and the total sum of squares. What percentage of variance is explained by the between group variation?
2. Report the F statistic, its degrees of freedom, and the corresponding p-value.
3. What conclusion can you draw at the 5% significance level?
Conduct a post-hoc analysis for this ANOVA, using Tukey's HSD adjustment. At the 5% significance level, which pairs of racial groups are deemed to have significantly different mean birth weights after adjusting for the multiple comparison problem? [1.5pts]
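
Two ideas from Section Three can be sketched in Python as a cross-check on the SPSS output. First, the chance of at least one false positive grows when several tests are run at the 5% level; the formula used below assumes the tests are independent, which is only an approximation here because the three pairwise comparisons share groups. Second, the one-way ANOVA sums of squares can be rebuilt from the summary table above (means, N, and standard deviations). The function names are hypothetical, and the p-value and the Tukey comparisons are deliberately left to SPSS, since they require the F and studentized range distributions.

```python
def familywise_error(alpha, k):
    """P(at least one false rejection) across k independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** k

def anova_from_summary(groups):
    """One-way ANOVA quantities from (mean, n, sd) per group."""
    n_total = sum(n for _, n, _ in groups.values())
    grand_mean = sum(m * n for m, n, _ in groups.values()) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for m, n, _ in groups.values())
    ss_within = sum((n - 1) * sd ** 2 for _, n, sd in groups.values())
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_between + ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f": f_stat,
        "share_explained": ss_between / (ss_between + ss_within),
    }

# Three pairwise comparisons (White-Black, White-Other, Other-Black) at 5%.
fwer = familywise_error(alpha=0.05, k=3)

# (mean, N, standard deviation) copied from the summary table in this section.
race_groups = {
    "White": (3103.7396, 96, 727.72424),
    "Black": (2719.6923, 26, 638.68388),
    "Other": (2804.0149, 67, 721.30115),
}
result = anova_from_summary(race_groups)

print(f"familywise error for 3 tests at 5%: {fwer:.4f}")
print(f"SS between = {result['ss_between']:.0f}, SS within = {result['ss_within']:.0f}")
print(f"F({result['df_between']}, {result['df_within']}) = {result['f']:.3f}")
print(f"share of variance between groups: {result['share_explained']:.2%}")
```

Small differences from the SPSS output are possible because the table values are rounded.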