Example 1: Inferential Statistics- With inferential statistics, you are trying to reach conclusions that extend beyond the immediate data alone. For instance, we use inferential statistics to try to infer from the sample data what the population might think. Or, we use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one or one that might have happened by chance in this study. Thus, we use inferential statistics to make inferences from our data to more general conditions; we use descriptive statistics simply to describe what's going on in our data.
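One way to make "might have happened by chance" concrete is a permutation test. This technique is swapped in here for illustration and is not one the text prescribes. The idea is to shuffle the group labels many times and count how often a mean difference at least as large as the observed one turns up. The function name `permutation_p_value` and the scores are hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference between two group means.

    Estimates how often a difference at least as large as the observed one
    would appear if group membership were assigned purely by chance.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign scores to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Adding 1 to numerator and denominator counts the observed split itself.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical scores for two small groups (illustrative only).
treatment = [12, 15, 14, 16, 18, 17]
control = [10, 11, 13, 12, 9, 11]
print(round(permutation_p_value(treatment, control), 4))
```

A small p-value suggests the observed difference is unlikely to be a chance artifact of how people happened to fall into the groups. A large one suggests chance is a plausible explanation.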
Example 2: Descriptive Statistics- Descriptive statistics are used to present quantitative descriptions in a manageable form. In a research study we may have lots of measures, or we may measure a large number of people on any one measure. Descriptive statistics help us simplify large amounts of data in a sensible way. Each descriptive statistic reduces lots of data into a simpler summary. For instance, consider a single number used to summarize how well a batter is performing in baseball: the batting average. This number is simply the number of hits divided by the number of times at bat (reported to three decimal places). A batter who is hitting .333 is getting a hit one time in every three at bats; one batting .250 is hitting one time in four. The single number describes a large number of discrete events. Or consider the scourge of many students, the Grade Point Average (GPA). This single number describes the general performance of a student across a potentially wide range of course experiences.
The term non-parametric statistics covers a range of topics:
- distribution-free methods, which do not rely on assumptions that the data are drawn from a given probability distribution. In this sense, non-parametric statistics is the opposite of parametric statistics. It includes non-parametric statistical models, inference, and statistical tests.
Non-parametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from data. The term nonparametric is not meant to imply that such models completely lack parameters but that the number and nature of the parameters are flexible and not fixed in advance.
- A histogram is a simple nonparametric estimate of a probability distribution.
- Kernel density estimation provides better estimates of the density than histograms.
- Nonparametric regression and semiparametric regression methods have been developed based on kernels, splines, and wavelets.
- Data Envelopment Analysis provides efficiency coefficients similar to those obtained by Multivariate Analysis without any distributional assumption.
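To see the first two bullets side by side, the sketch below builds a histogram density (count in each bin divided by N times the bin width) and a Gaussian kernel density estimate, which places a smooth bump on every data point instead of sorting points into bins. The sample values are hypothetical, and the default bandwidth uses the normal-reference rule of thumb h = 1.06 · s · n^(-1/5), which is one common choice among several, not the only one.

```python
import math
from statistics import stdev

def histogram_density(data, bins, low, high):
    """Histogram estimate: (count in bin) / (N * bin width) for each bin."""
    width = (high - low) / bins
    counts = [0] * bins
    for x in data:
        if low <= x <= high:
            # min() keeps x == high (and float round-off) inside the last bin
            counts[min(int((x - low) / width), bins - 1)] += 1
    return [c / (len(data) * width) for c in counts]

def gaussian_kde(data, bandwidth=None):
    """Return f(x), an average of Gaussian bumps centred on each data point."""
    n = len(data)
    if bandwidth is None:
        # Normal-reference rule of thumb: h = 1.06 * s * n^(-1/5)
        bandwidth = 1.06 * stdev(data) * n ** (-1 / 5)
    scale = 1 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return scale * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density

# Hypothetical sample, for illustration only.
sample = [2.1, 2.5, 2.8, 3.0, 3.1, 3.3, 3.9, 4.2, 4.4, 6.0]
print(histogram_density(sample, bins=4, low=2.0, high=6.0))
f = gaussian_kde(sample)
print([round(f(x), 3) for x in (2.0, 3.0, 4.0, 5.0, 6.0)])
```

The histogram's shape changes with the chosen bin edges, while the kernel estimate changes smoothly with the bandwidth. That smoothness is the usual reason kernel estimates are preferred.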
Parametric tests, also known as classical or standard tests, are statistical tests that make certain assumptions about the parameters of the full population from which the sample is taken. It is assumed, for example, that the data follow a normal distribution and that, where populations are compared, they have the same variance. If these assumptions do not hold, non-parametric tests should be used instead. Parametric tests normally involve data expressed in absolute numbers or values rather than ranks; an example is the Student's t-test.
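As an illustration of the equal-variance assumption at work, here is a minimal sketch of the independent-samples Student's t-test with a pooled variance estimate. The function name `students_t` and the sample values are hypothetical. The returned t would be compared against the t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, variance

def students_t(sample_a, sample_b):
    """Independent-samples Student's t using a pooled variance estimate.

    Pooling the two sample variances is only justified under the parametric
    assumption that both populations have the same variance.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    # Standard error of the difference between the two means
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / se_diff
    return t, df

# Hypothetical scores for two groups (illustrative only).
t, df = students_t([23, 25, 28, 30, 26], [20, 22, 21, 24, 19])
print(f"t({df}) = {t:.3f}")
```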
3. Internal validity is the degree to which the results of a particular study support a cause-and-effect inference drawn from controlled experimentation; high internal validity does not necessarily mean that the results apply to things in the real world. External validity is the degree to which the results of a study can be generalized to real-world situations.
4. When a researcher has an accidental or convenience sample, what kind of population can he or she try to make inferences about?
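A: Because the participants in an accidental or convenience sample are not randomly selected from a well-defined target population, the researcher cannot legitimately generalize the results to that broader population. At most, he or she can make inferences about a hypothetical population whose characteristics resemble those of the people actually in the sample.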
5. Assume that a population of thousands of people whose responses were used to develop an anxiety test had scores that were normally distributed with μ = 30 and σ = 10. What proportion of people in this population would have anxiety scores within each of the following ranges of scores?
- Below 20.
- Above 30.
- Between 10 and 50.
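The proportions in question 5 can be checked directly instead of read from a z table. This sketch assumes Python 3.8+ and its standard-library `statistics.NormalDist`. The z-values in the comments are (score - μ)/σ.

```python
from statistics import NormalDist

# Population of anxiety scores: mu = 30, sigma = 10
anxiety = NormalDist(mu=30, sigma=10)

below_20 = anxiety.cdf(20)                         # z = -1
above_30 = 1 - anxiety.cdf(30)                     # z = 0
between_10_50 = anxiety.cdf(50) - anxiety.cdf(10)  # z = -2 to z = +2

for label, p in [("Below 20", below_20),
                 ("Above 30", above_30),
                 ("Between 10 and 50", between_10_50)]:
    print(f"{label}: {p:.4f}")
# Below 20: 0.1587
# Above 30: 0.5000
# Between 10 and 50: 0.9545
```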
6. What is SE_M? What does the value of SE_M tell you about the typical magnitude of sampling error?
- As s increases, how does the size of SE_M change (assuming that N stays the same)?
- As N increases, how does the size of SE_M change (assuming that s stays the same)?
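The relationship behind question 6 is SE_M = s / √N, the estimated standard deviation of the sampling distribution of the mean. The sketch below uses a hypothetical helper, `standard_error_of_mean`, with made-up values of s and N to show both directions of change.

```python
import math

def standard_error_of_mean(s: float, n: int) -> float:
    """SE_M = s / sqrt(N): estimated standard deviation of sample means."""
    return s / math.sqrt(n)

# Larger s -> larger SE_M (N held at 25)
for s in (5, 10, 20):
    print(f"s={s:>2}, N=25 -> SE_M={standard_error_of_mean(s, 25):.2f}")
# SE_M = 1.00, 2.00, 4.00

# Larger N -> smaller SE_M (s held at 10)
for n in (25, 100, 400):
    print(f"s=10, N={n:>3} -> SE_M={standard_error_of_mean(10, n):.2f}")
# SE_M = 2.00, 1.00, 0.50
```

Because N sits under a square root, quadrupling the sample size only halves the standard error.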
7. Under what circumstances should a t distribution be used rather than the normal distribution to look up areas or probabilities associated with distances from the mean?
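To see why the choice in question 7 matters, the sketch below compares two-tailed .05 critical values of the t distribution with the normal value of about 1.96. It uses only the standard library: the t density is integrated numerically with Simpson's rule and the critical value is found by bisection. The helper names (`t_pdf`, `t_cdf`, `t_critical`) and the step and iteration counts are assumptions made for this illustration, not a library API.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x), integrating the density from 0 to |x| with Simpson's rule."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    # The t distribution is symmetric about 0, so half its mass lies below 0.
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(p, df):
    """Find t* with P(T <= t*) = p (for 0.5 < p < 1) by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_crit = NormalDist().inv_cdf(0.975)
for df in (5, 10, 30, 100):
    print(f"df={df:>3}: t* = {t_critical(0.975, df):.3f}  (normal z* = {z_crit:.3f})")
```

With small degrees of freedom the t critical value is noticeably larger than 1.96, so using the normal table when σ is unknown and estimated by s from a small sample would understate the needed distance. As df grows, the two converge.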
To complete questions 9 and 10, use Table 4.1, "Data for the Blood Pressure/Social Study," on pages 127–128 of your Warner textbook to create a PASW data file. Save this PASW file to your PC with the name bpstudy.sav, and submit it along with your assignment.
Select three variables from the dataset bpstudy.sav. Two of the variables should be good candidates for correlation/regression, and the other variable should be a poor candidate. Good candidates are variables that meet the assumptions (for example, approximately normally distributed, reliably measured, and measured at the interval/ratio level). Poor candidates are variables that do not meet the assumptions or that have clear problems (such as restricted range, extreme outliers, or grossly non-normal distribution shape).
- Use the FREQUENCIES procedure to obtain a histogram and all univariate descriptive statistics for each of the three variables.
- Create a scatter plot for the two "good candidate" variables.
- Create a scatter plot for the "poor candidate" variable using one of the two good variables.
Note: In addition to the variables given in the PASW file, you can also use variables created by compute statements, for example scale scores formed by summing items, such as Hostility = H1 + H2 + H3 + H4.
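The screening steps above are meant to be done in PASW. As an optional cross-check, the sketch below computes a summed scale score and the same kinds of univariate statistics and a Pearson r in plain Python. All numbers and the variable name `sbp` are hypothetical and are not taken from Table 4.1. The skewness formula is the adjusted Fisher-Pearson coefficient, one common definition.

```python
import math
from statistics import mean, stdev

def skewness(values):
    """Adjusted Fisher-Pearson sample skewness; 0 for a symmetric sample."""
    n, m, s = len(values), mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def describe(name, values):
    """Print the univariate statistics used for data screening."""
    print(f"{name}: N={len(values)}, mean={mean(values):.2f}, "
          f"sd={stdev(values):.2f}, skewness={skewness(values):.2f}, "
          f"range={min(values)}-{max(values)}")

# Hypothetical item scores H1..H4 for five participants (not Table 4.1 data).
h1, h2, h3, h4 = [2, 3, 1, 4, 2], [3, 3, 2, 4, 1], [2, 4, 1, 3, 2], [1, 3, 2, 4, 2]
hostility = [a + b + c + d for a, b, c, d in zip(h1, h2, h3, h4)]  # Hostility = H1 + H2 + H3 + H4
sbp = [118, 135, 110, 142, 120]  # hypothetical systolic blood pressure

describe("Hostility", hostility)
describe("SBP", sbp)
print(f"r(Hostility, SBP) = {pearson_r(hostility, sbp):.2f}")
```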
- Explain which variables are good and poor candidates for a correlation analysis, and give your rationale. Comment on the empirical results from your data screening (both the histograms and the scatter plots) as evidence that these variables do or do not meet the basic assumptions necessary for a correlation to be meaningful and honest. What other information would you want to have about the variables in order to make better-informed judgments?
- (Optional.) Is there anything that could be done (for instance, data transformations or eliminating outliers) to make your poor candidate variable better? If so, what would you recommend?
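For the optional question, a quick way to judge a remedy is to compare skewness before and after it. This sketch applies a natural-log transformation and, separately, drops a single extreme value from a hypothetical, positively skewed variable. The data are invented for illustration. Whether either remedy is appropriate for a real variable depends on the substantive reasons behind the outlier, which the numbers alone cannot settle.

```python
import math
from statistics import mean, stdev

def skewness(values):
    """Adjusted Fisher-Pearson sample skewness (positive = long right tail)."""
    n, m, s = len(values), mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

# Hypothetical positively skewed scores with one extreme value.
raw = [1, 2, 2, 3, 3, 4, 5, 40]

logged = [math.log(x) for x in raw]     # needs x > 0; add a constant first if zeros occur
trimmed = [x for x in raw if x != 40]   # drop the outlier only with a documented reason

for label, values in [("raw", raw), ("log", logged), ("outlier removed", trimmed)]:
    print(f"{label:>15}: skewness = {skewness(values):.2f}")
```

Both remedies reduce the skewness here. The log transformation keeps every case but changes the measurement scale, so any correlation computed afterward refers to the transformed variable.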
Deliverable: Word Document
