Statistics - Hypothesis Testing Projects


Assignment 4 T-tests

Analyze Data to Determine if Sample Means Vary Significantly
For this activity, you will analyze data to determine whether sample means differ significantly from each other. Review chapters 5 and 9 in Field and Miles (2010), and work through the examples found in these chapters of your required readings. Assemble all of the requirements for the activity into one Word document for submission. Also, be sure to download and review the Activity 4 Tutorial file in the Additional Resources section.

Instructions: Input the file lipid.sas7bdat from the sample data warehouse as discussed previously. This data set has been collected from blood lipid screenings as well as patient history. Information such as gender, age, weight, total cholesterol, level, blood pressure, coffee consumption, and history of heart disease was collected. The blood lipid screenings were conducted three months after the initial screenings.

1. Test the hypothesis that the mean weight of the population is 150 pounds with a confidence level of 95%.

a) State the null and alternative hypotheses for the test about to be performed.

b) Open the Analysis, Anova, and then t-test menu. Select one sample. Then move to the data window. Move weight to the analysis variable task role. Select Analysis and set the null hypothesis value equal to 150. Leave the equal tailed box checked and the confidence level at 95%. Move to the plots menu and select summary, histogram, box plot, and normal quantile-quantile (Q-Q) plot. On the title menu, uncheck the default title and manually type T-test of a Single Sample. Click run.

c) Review the summary statistics, box plot, Q-Q plot and histograms. Do the required assumptions for the t-test appear to be met?

d) From the t-test output, determine what decision to make regarding the null hypothesis and explain your rationale. If you reject the null, what are the 95% confidence limits for the actual population weight?
e) Copy what you consider to be the relevant output to your assignment document and include a detailed written analysis of the results.
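
As a rough illustration of what the one-sample task in part (b) computes, the following Python sketch evaluates the t statistic for the null hypothesis that the mean equals 150. The `weights` list is hypothetical and is not taken from the lipid data set; the function name `one_sample_t` is likewise only an illustrative choice.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)      # sample SD, n - 1 in the denominator
    se = sd / math.sqrt(n)             # standard error of the mean
    return (mean - mu0) / se, n - 1

# Hypothetical weights in pounds (NOT values from lipid.sas7bdat).
weights = [142, 155, 160, 148, 171, 139, 150, 163]
t_stat, df = one_sample_t(weights, mu0=150)
print(f"t({df}) = {t_stat:.3f}")
```

The statistic would then be compared with a t distribution on n - 1 degrees of freedom, which is the comparison SAS reports as Pr > |t|.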

2. Test the hypothesis that there is no difference in the mean cholesterol levels between males and females with a confidence level of 95%.

a) State the null and alternative hypotheses for the test about to be performed.

b) Open the Analysis, Anova, and then t-test menu. Select two sample. Then move to the data window. Move cholesterol to the analysis variable task role. Move gender to the classification variable task role. Select Analysis and leave the null hypothesis value equal to 0. Leave the equal tailed box checked and the confidence level at 95%. Move to the plots menu and select summary, histogram, box plot, and normal quantile-quantile (Q-Q) plot. On the title menu, uncheck the default title and manually type T-test of Two Samples. Click run.

c) Review the summary statistics, box plot, Q-Q plot and histograms. Do the required assumptions for the t-test appear to be met?

d) From the t-test output, determine what decision to make regarding the null hypothesis and explain your rationale. If you reject the null, what are the 95% confidence limits for the actual difference in cholesterol between males and females?

e) Copy what you consider to be the relevant output to your assignment document and include a detailed written analysis of your results.

Solution: (a) We need to test

\[\begin{aligned} & {{H}_{0}}:{{\mu }_{Males}}={{\mu }_{Females}} \\ & {{H}_{A}}:{{\mu }_{Males}}\ne {{\mu }_{Females}} \\ \end{aligned}\]

(b) The following graphs are obtained:

Also, the following descriptive statistics are obtained:

Gender	N	Mean	Std Dev	Std Err	Minimum	Maximum
female	24	194.6	37.3221	7.6183	131.0	285.0
male	71	190.1	35.2990	4.1892	115.0	277.0
Diff (1-2)		4.5405	35.8100	8.4553

Gender	Method	Mean	95% CL Mean		Std Dev	95% CL Std Dev
female		194.6	178.9	210.4	37.3221	29.0073	52.3540
male		190.1	181.7	198.4	35.2990	30.2968	42.2952
Diff (1-2)	Pooled	4.5405	-12.2501	21.3311	35.8100	31.3206	41.8136
Diff (1-2)	Satterthwaite	4.5405	-13.0619	22.1429

(c) The histograms for males and females look broadly similar. However, the box plot for females shows some outliers on the high side, which may indicate a violation of the normality assumption, particularly because the female sample (n = 24) is too small to rely on a normal approximation. The normality assumption of the independent two-sample t-test may therefore be violated. The equal-variance assumption appears to be met, since the two sample standard deviations are close; a formal F-test for equality of variances is carried out in part (d).

From the descriptive statistics table, it is found that for females, the mean is M = 194.6 and SD = 37.3221 and for males the mean is M = 190.1 and SD = 35.2990 .
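
The Diff (1-2) row of the tables above can be reproduced from the two group summaries. The sketch below is a hand check in Python, not the SAS computation itself; the critical value `t_crit = 1.9858` is an assumed table value for the 97.5th percentile of the t distribution with 93 degrees of freedom, and the mean difference 4.5405 is copied from the output because the group means are only shown to one decimal place.

```python
import math

# Group summaries copied from the descriptive statistics table.
n_f, sd_f = 24, 37.3221     # female
n_m, sd_m = 71, 35.2990     # male
mean_diff = 4.5405          # female minus male, from the Diff (1-2) row

df = n_f + n_m - 2
# Pooled variance: weighted average of the two sample variances.
pooled_var = ((n_f - 1) * sd_f**2 + (n_m - 1) * sd_m**2) / df
pooled_sd = math.sqrt(pooled_var)
se_diff = pooled_sd * math.sqrt(1 / n_f + 1 / n_m)

t_crit = 1.9858             # assumed: t quantile 0.975 with 93 df
ci_low = mean_diff - t_crit * se_diff
ci_high = mean_diff + t_crit * se_diff

print(f"pooled SD = {pooled_sd:.4f}, SE = {se_diff:.4f}")
print(f"95% CI for the difference: ({ci_low:.4f}, {ci_high:.4f})")
```

The results agree with the Pooled row of the output to the precision shown there.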

(d) The following is obtained:

Method	Variances	DF	t Value	Pr > |t|
Pooled	Equal	93	0.54	0.5926
Satterthwaite	Unequal	37.874	0.52	0.6045

Equality of Variances
Method	Num DF	Den DF	F Value	Pr > F
Folded F	23	70	1.12	0.6992

First, the folded F-test gives no evidence against equal variances, F(23, 70) = 1.12, p = .6992, so the pooled (equal-variance) t-test is used. Under this assumption, t(93) = 0.54, p = .5926, so at the .05 level we fail to reject the null hypothesis of equal mean cholesterol for males and females. Consistent with this, the pooled 95% confidence interval for the difference in means, (-12.2501, 21.3311), contains 0.
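
The test statistics in the two tables above follow directly from the group summaries. The Python sketch below recomputes the pooled t, the Satterthwaite (Welch) t with its approximate degrees of freedom, and the folded F ratio. It is a hand check under the assumption that the folded F is the larger sample variance divided by the smaller; the p-values are not recomputed here.

```python
import math

n_f, sd_f = 24, 37.3221     # female
n_m, sd_m = 71, 35.2990     # male
mean_diff = 4.5405          # female minus male, from the output

# Pooled (equal variances) t statistic.
df_pooled = n_f + n_m - 2
pooled_sd = math.sqrt(((n_f - 1) * sd_f**2 + (n_m - 1) * sd_m**2) / df_pooled)
t_pooled = mean_diff / (pooled_sd * math.sqrt(1 / n_f + 1 / n_m))

# Satterthwaite (unequal variances) t statistic and degrees of freedom.
v_f, v_m = sd_f**2 / n_f, sd_m**2 / n_m
t_welch = mean_diff / math.sqrt(v_f + v_m)
df_welch = (v_f + v_m) ** 2 / (v_f**2 / (n_f - 1) + v_m**2 / (n_m - 1))

# Folded F: larger sample variance over the smaller one.
f_folded = max(sd_f, sd_m) ** 2 / min(sd_f, sd_m) ** 2

print(f"pooled: t({df_pooled}) = {t_pooled:.2f}")
print(f"Satterthwaite: t({df_welch:.3f}) = {t_welch:.2f}")
print(f"folded F = {f_folded:.2f}")
```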

(e) See the output below:

SAS Output

t Test of Two Samples

The TTEST Procedure

Variable: Cholesterol

Gender	N	Mean	Std Dev	Std Err	Minimum	Maximum
female	24	194.6	37.3221	7.6183	131.0	285.0
male	71	190.1	35.2990	4.1892	115.0	277.0
Diff (1-2)		4.5405	35.8100	8.4553

Gender	Method	Mean	95% CL Mean		Std Dev	95% CL Std Dev
female		194.6	178.9	210.4	37.3221	29.0073	52.3540
male		190.1	181.7	198.4	35.2990	30.2968	42.2952
Diff (1-2)	Pooled	4.5405	-12.2501	21.3311	35.8100	31.3206	41.8136
Diff (1-2)	Satterthwaite	4.5405	-13.0619	22.1429

Method	Variances	DF	t Value	Pr > |t|
Pooled	Equal	93	0.54	0.5926
Satterthwaite	Unequal	37.874	0.52	0.6045

Equality of Variances
Method	Num DF	Den DF	F Value	Pr > F
Folded F	23	70	1.12	0.6992

Generated by the SAS System ('SASApp', SunOS) on January 24, 2012 at 2:16:06 PM

3. Test the hypothesis that there has been a weight loss between the start of the program and the 3-month follow up screening with a confidence level of 95%.

a) State the null and alternative hypotheses for the test about to be performed.
b) Open the Analysis, ANOVA, and then t-test menu. Select paired sample. Then move to the data window. Move Weight and Weight3 to the analysis variable task role. Select Analysis and leave the null hypothesis value equal to 0. Leave the equal tailed box checked and the confidence level at 95%. Move to the plots menu and select summary, histogram, box plot, and normal quantile-quantile (Q-Q) plot. On the title menu, uncheck the default title and manually type T-test of a Single Sample. Click run.

c) Review the summary statistics, box plot, Q-Q plot and histograms. Do the required assumptions for the t-test appear to be met?

e) Copy what you consider to be the relevant output to your assignment document and include a detailed written analysis of your results.

Solution: (a) We need to test

\[\begin{aligned} & {{H}_{0}}:{{\mu }_{D}}=0 \\ & {{H}_{A}}:{{\mu }_{D}}\ne 0 \\ \end{aligned}\]

where ${{\mu }_{D}}={{\mu }_{START}}-{{\mu }_{3\text{ MONTHS}\,\text{AFTER}}}$

(b) The following graphs are obtained:

(c) The distribution of the differences appears slightly right-skewed. Even if this departure from normality is real, the sample size (n = 43) is large enough for the central limit theorem to apply, so the sampling distribution of the mean difference is at least approximately normal.

The following descriptive statistics are obtained:

N	Mean	Std Dev	Std Err	Minimum	Maximum
43	-1.9070	8.0262	1.2240	-23.0000	17.0000

Mean	95% CL Mean		Std Dev	95% CL Std Dev
-1.9070	-4.3771	0.5631	8.0262	6.6179	10.2014

The mean difference is M_D = -1.9070, and the standard deviation of the differences is s_D = 8.0262.
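
The standard error and the 95% confidence limits in the tables above can be checked by hand from N, the mean difference, and the standard deviation of the differences. In this Python sketch the critical value `t_crit = 2.0181` is an assumed table value for the 97.5th percentile of the t distribution with 42 degrees of freedom.

```python
import math

# Summary of the paired differences, copied from the tables above.
n, mean_d, sd_d = 43, -1.9070, 8.0262

se_d = sd_d / math.sqrt(n)          # standard error of the mean difference
t_crit = 2.0181                     # assumed: t quantile 0.975 with 42 df
ci_low = mean_d - t_crit * se_d
ci_high = mean_d + t_crit * se_d

print(f"SE = {se_d:.4f}")
print(f"95% CI for the mean difference: ({ci_low:.4f}, {ci_high:.4f})")
```

Because the interval contains 0, it is consistent with a two-sided test that does not reject a mean difference of zero.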

(d) Now, the following is obtained

DF	t Value	Pr > |t|
42	-1.56	0.1267

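The t-statistic is t(42) = -1.56 with a two-tailed p-value of p = .1267, so the two-sided null hypothesis stated in (a) is not rejected at the .05 level. Halving the p-value (.1267/2 = 0.06335) gives the probability in the lower tail, which corresponds to the one-sided alternative ${{\mu }_{D}}<0$, that is, a weight gain. For the weight-loss alternative ${{\mu }_{D}}>0$, the one-tailed p-value is 1 - 0.06335 = 0.93665, so we fail to reject the null hypothesis in either case.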

Hence, at the 5% significance level, we do not have enough evidence to claim that there has been a weight loss between the start of the program and the 3-month follow-up screening.
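
Since the research question is directional, it may help to see how the two-tailed p-value from the output maps onto each one-sided alternative. The Python sketch below assumes, as defined in (a), that the difference is start minus 3 months after, so a weight loss corresponds to a positive mean difference. The helper `one_sided_p` is an illustrative function, not part of SAS.

```python
import math

n, mean_d, sd_d = 43, -1.9070, 8.0262
t_stat = mean_d / (sd_d / math.sqrt(n))
p_two_sided = 0.1267                # Pr > |t| from the SAS output

def one_sided_p(t, p_two, alternative):
    """Convert a two-sided p-value to a one-sided one.

    alternative: "greater" for H_A: mu_D > 0, "less" for H_A: mu_D < 0.
    Half of p_two applies only when t falls on the side named by H_A.
    """
    half = p_two / 2
    if alternative == "greater":
        return half if t > 0 else 1 - half
    if alternative == "less":
        return half if t < 0 else 1 - half
    raise ValueError("alternative must be 'greater' or 'less'")

p_loss = one_sided_p(t_stat, p_two_sided, "greater")   # weight loss: mu_D > 0
p_gain = one_sided_p(t_stat, p_two_sided, "less")      # weight gain: mu_D < 0

print(f"t = {t_stat:.3f}")
print(f"p (loss) = {p_loss:.5f}, p (gain) = {p_gain:.5f}")
```

Under this sign convention the halved value .1267/2 belongs to the weight-gain alternative, because the observed t statistic is negative.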

(e) See the output below:

SAS Output