Getting Started
For the first exam, you described data (using univariate techniques). For the second exam,
you made estimates and inferences (using ideas about probability distributions). For this third
exam, you will attempt to explain relationships between variables (using bivariate techniques
appropriate to each level of measurement).
This third exam has more questions than the first two, but should take you less time, for
three reasons: First, you should now be comfortable working in SPSS at a slightly faster
pace. Second, SPSS now does almost all the work for you, so you won’t need to calculate
much – just get the output, and "write up" the results. Third, some of the questions are the
same on every exam, so you have had experience answering some of them twice already.
You will need six variables – two nominal, two ordinal, and two interval – from any data set.
If none of the data sets available are to your liking, tell me and I’ll find you something else. If
there’s a topic you’d like to study and don’t have data on it, tell me and I’ll help you find it.
(In a more advanced course, such as 497, you would have gained the skills and the
responsibility to acquire data on your own. For this course, the focus is on using the data,
and I’d rather you not get distracted with trying to find or collect your own.)
Be certain to resolve any missing values.
(Note: You might choose to study the missing cases as a nominal category – for instance,
those who refused to give an opinion about a certain topic, or those who declined to give
their age – but make sure there are enough cases in that category for a bivariate analysis.)
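As a rough illustration of the missing-values check (outside of SPSS, which remains the tool for this exam), the Python sketch below counts the cases that are valid on both variables and recodes refusals into a two-category nominal variable. The variable names, the missing-value code 9, and the data are all hypothetical.

```python
# Hypothetical data: 9 is an ASSUMED code for "refused / no answer".
MISSING = 9
opinion = [1, 2, 9, 1, 9, 2, 1, 9]   # assumed nominal variable
age_grp = [1, 1, 2, 9, 2, 3, 3, 1]   # assumed second variable

def valid_pairs(x, y, missing=MISSING):
    """Count cases with a valid (non-missing) value on BOTH variables."""
    return sum(1 for a, b in zip(x, y) if a != missing and b != missing)

def refused_flag(x, missing=MISSING):
    """Recode into a nominal variable: 1 = refused, 0 = answered."""
    return [1 if a == missing else 0 for a in x]

n_valid = valid_pairs(opinion, age_grp)   # cases usable in a crosstab
flag = refused_flag(opinion)              # the "missing" category itself
n_refused = sum(flag)                     # size of that category
```

If n_valid or n_refused is very small, a bivariate analysis on those cases is unlikely to be informative.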
I: Introduction
Define your population of interest, and define the available sample. What is the sample size,
and what is the population size? Comment on any strengths or weaknesses of the sample
for your study, including the sample size, any biases you might suspect, any advantages or
disadvantages of the sampling procedure, and anything you would change about that
procedure.
II: Nominal Association
- For your two nominal variables, state an argument about how they might be related. Make a
case that there is a causal relationship between them (specifically stating which is the
independent variable and which is the dependent variable, even though you will be using a
symmetric measure) and state expectations about the form of that relationship.
- For each nominal variable, identify its variable name, variable label, operational definition
(including value labels, if appropriate), and level of measurement, noting any changes
introduced in recoding. Provide a concise (brief but complete) univariate analysis of each
variable. Pay special attention to missing values. (If you have two variables with many
missing cases, you may not have enough cases which are valid for both variables.)
- Briefly (a few sentences) describe the pattern and size of any relationship observed in the
crosstabulation, using a comparison of modal percentages.
- Conduct a statistical test for the hypothesis concerning your two nominal variables, using an
alpha of 0.05. List all steps taken and all assumptions made in testing the null hypothesis,
including statements of both hypotheses (in Ho and Ha notation and prose explanations),
interpretations of the test statistic and the p-value, and a sound and complete decision
regarding both hypotheses. (A general interpretation of the test statistic is satisfactory.) You
do not have to compute the test statistic by hand, nor do you need to address percentage
comparisons.
- Assess the strength of the relationship between the two variables, using a measure
appropriate to address this association. Also provide an interpretation of this measure in
terms of prediction errors.
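To see what SPSS is computing for a nominal crosstabulation, here is a minimal Python sketch under stated assumptions: the 2x2 counts are made up, the test shown is the chi-square test of independence, and the measure shown is the asymmetric form of lambda (one common measure with a prediction-error interpretation; the measure you report may differ). The p-value shortcut is valid only for a table with 1 degree of freedom.

```python
import math

# Hypothetical 2x2 crosstab (made-up counts): rows = categories of the
# independent variable, columns = categories of the dependent variable.
table = [[30, 10],
         [20, 40]]

def chi_square(tab):
    """Return (chi-square statistic, degrees of freedom) for a crosstab."""
    row_tot = [sum(r) for r in tab]
    col_tot = [sum(c) for c in zip(*tab)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(tab):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n   # expected count under Ho
            stat += (obs - exp) ** 2 / exp
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df

def lambda_asym(tab):
    """Asymmetric lambda, predicting the column (dependent) variable."""
    col_tot = [sum(c) for c in zip(*tab)]
    n = sum(col_tot)
    e1 = n - max(col_tot)               # errors ignoring the independent variable
    e2 = n - sum(max(r) for r in tab)   # errors using the independent variable
    return (e1 - e2) / e1

stat, df = chi_square(table)
p_value = math.erfc(math.sqrt(stat / 2))  # valid ONLY when df == 1
lam = lambda_asym(table)
```

With these made-up counts, lambda is 0.40: knowing a case's category on the independent variable reduces errors in predicting the dependent variable by 40 percent.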
III: Ordinal Association
- For your two ordinal variables, state an argument about how they might be related. Make a
case that there is a causal relationship between them (specifically stating which is the
independent variable and which is the dependent variable) and state expectations about the
form of that relationship.
- For each ordinal variable, identify its variable name, variable label, operational definition
(including value labels, if appropriate), and level of measurement, noting any changes
introduced in recoding. Provide a concise (brief but complete) univariate analysis of each
variable. Pay special attention to missing values. (If you have two variables with many
missing cases, you may not have enough cases which are valid for both variables.)
- Briefly (a few sentences) describe the pattern and size of any relationship observed in the
crosstabulation, using a comparison of modal percentages.
- Assess whether there might be a dependent relationship between these two variables. You
do not have to compute the test statistic by hand (nor, for this question, do you need to
address percentage comparisons) nor do you need to list all steps and assumptions
involved, nor do you have to specify hypotheses ... but you must report the significance level
of the test statistic, interpret that value, and make a conclusion about dependence.
- Assess the strength and direction of the relationship between the two variables, using
gamma to address this association. Also provide an interpretation of this measure in terms
of prediction errors.
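For reference, gamma compares pairs of cases ranked in the same order on both variables (Ns) with pairs ranked in opposite order (Nd): gamma = (Ns - Nd) / (Ns + Nd). The Python sketch below shows that calculation for a made-up 3x3 table; the counts are hypothetical, and both variables are assumed to be coded from low to high.

```python
# Hypothetical 3x3 crosstab (made-up counts). Rows and columns are both
# ASSUMED to run from the lowest to the highest category.
table = [[20, 10,  5],
         [10, 20, 10],
         [ 5, 10, 20]]

def goodman_kruskal_gamma(tab):
    """Return (Ns, Nd, gamma) for an ordered crosstab."""
    ns = nd = 0
    n_rows, n_cols = len(tab), len(tab[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # cells in lower rows only
                for m in range(n_cols):
                    if m > j:                   # same order: concordant pairs
                        ns += tab[i][j] * tab[k][m]
                    elif m < j:                 # opposite order: discordant pairs
                        nd += tab[i][j] * tab[k][m]
    return ns, nd, (ns - nd) / (ns + nd)

ns, nd, gamma = goodman_kruskal_gamma(table)
```

For these counts gamma is about 0.56. The positive sign indicates that higher values of one variable tend to go with higher values of the other, and the size indicates about 56 percent fewer errors in predicting the order of untied pairs on the dependent variable once their order on the independent variable is known.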
IV: Interval Covariation
- For your two interval variables, state an argument about how they might be related. Make a
case that there is a causal relationship between them (specifically stating which is the
independent variable and which is the dependent variable) and state expectations about the
form of that relationship.
- For each interval variable, identify its variable name, variable label, operational definition, and
level of measurement, noting any changes introduced in recoding. Provide a concise (brief
but complete) univariate analysis of each variable. Pay special attention to missing values.
(If you have two variables with many missing cases, you may not have enough cases which
are valid for both variables.)
- Make a scatter plot of the relationship between these two variables. Give a general
description of the plot – does it suggest that a relationship exists, and if so what type does it
suggest? (Make certain to make all possible inferences.) Are there any outliers evident in the
diagram?
- Report the parameters ("y-intercept" and "slope") of the regression equation, explain their
meanings in general terms, and give an interpretation of the particular statistics calculated
from your data.
- Extra credit: What is the value of the standard error of the estimate? Give a general
interpretation of this statistic, and tell how it assesses the efficiency of your estimator of the
slope.
- Calculate (by hand) predicted values of the dependent variable for two values of the
independent variable. Interpret these two predicted values, label them on the scatter plot, and
plot the regression line between these two points.
- Extra credit: Which two statistics demonstrate the extent to which the dependent variable is
affected by the independent variable? Give their values and interpretations, and say how they
differ in meaning.
- Is the relationship statistically significant? How do you know? What is the null hypothesis?
Give the value of the appropriate test statistic and its significance level, and interpret both.
(You need not conduct a full hypothesis test – answer only these questions.)
- Interpret the correlation coefficient for the association of these two variables. How does this
statistic differ in meaning from the regression coefficient (for the "slope")?
- Extra credit: What is the difference between the Sum of Squares of Regression and the Sum
of Squares of the Error (in words, not just the statistical difference)? What is the sum of
these two, and what does that sum assess?
- Report the PRE statistic that describes how strongly the values of your independent
variable predict values of your dependent variable. Provide both a general interpretation of
this measure and an interpretation of your observed (calculated by SPSS) statistic.
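The quantities requested in this section all come from the same few sums. The Python sketch below, using a made-up five-case data set, shows how the slope, y-intercept, predicted values, sums of squares, correlation, r-squared, and standard error of the estimate relate to one another. The data and variable names are hypothetical, and SPSS output remains the source for the numbers you report.

```python
import math

# Hypothetical data: x = independent variable, y = dependent variable.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)            # total sum of squares
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

slope = sxy / sxx                                  # change in y per unit of x
intercept = mean_y - slope * mean_x                # predicted y when x = 0

def predict(value):
    """Predicted value of y for a given value of x."""
    return intercept + slope * value

sse = sum((b - predict(a)) ** 2 for a, b in zip(x, y))  # error sum of squares
ssr = syy - sse                                    # regression sum of squares
r = sxy / math.sqrt(sxx * syy)                     # correlation coefficient
r_squared = ssr / syy                              # PRE: share of variation explained
se_estimate = math.sqrt(sse / (n - 2))             # standard error of the estimate
t_slope = slope / (se_estimate / math.sqrt(sxx))   # t statistic for Ho: slope = 0

# Two predicted values, enough to draw the regression line on the scatter plot.
y_hat_low, y_hat_high = predict(min(x)), predict(max(x))
```

For these made-up data the regression equation is y = 2.2 + 0.6x, the predicted values at x = 1 and x = 5 are 2.8 and 5.2, and r-squared is 0.60, meaning that 60 percent of the variation in y is accounted for by x.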
Deliverable: Word Document
