Part 1
- What are population, sample and sampling distributions?
- Explain how the confidence level and sample size affect the width of confidence interval.
- Suppose that in a sample the correlation coefficient between salary and education is 0.7. If we run a bivariate regression using salary as the dependent variable and education as the independent variable, what is the standardized coefficient of education? What is the coefficient of determination (R-square)?
- Suppose we run a regression using salary as the dependent variable and a set of dummy variables to describe race. The set of dummy variables includes White, Black, Asian, and Other. The regression equation is:
Sal-hat = 23568 + 2345 White + 1453 Asian + 987 Other
What is the reference group? If I change the reference group to Asian, write down the regression equation using Asian as the reference group. What is the average salary for Black in this sample?
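The dummy-variable question above turns on one piece of arithmetic: the group with no dummy in the equation (here Black) is the omitted category, the intercept is that group's predicted mean, and each coefficient is a gap relative to it. A minimal Python sketch of switching the omitted category follows. The helper name `change_reference` and its argument layout are illustrative assumptions, not part of the assignment.

```python
def change_reference(intercept, coefs, old_ref, new_ref):
    """Re-express a dummy-variable regression with a different omitted group.

    `coefs` maps each included dummy to its coefficient relative to `old_ref`.
    (Hypothetical helper, written only to illustrate the arithmetic.)
    """
    shift = coefs[new_ref]             # gap between new and old reference groups
    new_intercept = intercept + shift  # predicted mean of the new reference group
    # Every remaining gap is now measured from the new reference group.
    new_coefs = {g: b - shift for g, b in coefs.items() if g != new_ref}
    new_coefs[old_ref] = -shift        # the old reference group gets its own dummy
    return new_intercept, new_coefs


# Coefficients taken from the equation in the question; Black is omitted there.
intercept = 23568
coefs = {"White": 2345, "Asian": 1453, "Other": 987}

new_intercept, new_coefs = change_reference(intercept, coefs, "Black", "Asian")
print(new_intercept, new_coefs)
```

The predicted group means are unchanged by the switch; only the baseline against which the gaps are measured moves.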
Part 2 Number Crunching
- The property tax appraiser is concerned about possible horizontal inequity (equals not being treated equally) in two developments in West Dade. For the first development, N = 15, X-bar (assessed value in thousands of dollars) is 285, and s = 14. For the second development, N = 11, X-bar is 272, and s = 12.
- Using a two-tailed analysis with α = .05, address the possibility that developments one and two are being assessed differently. Set up the null hypothesis, the alternative hypothesis, and the appropriate critical t value before testing the hypothesis.
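One way to organize the calculation for this problem is a pooled-variance two-sample t test. The Python sketch below assumes equal population variances, so the degrees of freedom are N1 + N2 - 2. That pooling choice is an assumption, since the problem does not say which form of the test to use. The critical value is hard-coded from a standard t table for 24 degrees of freedom, and the function name `pooled_t` is illustrative.

```python
import math


def pooled_t(n1, mean1, s1, n2, mean2, s2):
    """Two-sample t statistic with a pooled variance (assumes equal variances)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df


# Summary statistics from the problem (assessed values in thousands of dollars).
t_stat, df = pooled_t(15, 285, 14, 11, 272, 12)

# Two-tailed critical value for alpha = .05 and df = 24, from a standard t table.
T_CRIT = 2.064
reject_null = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.2f}, df = {df}, reject H0: {reject_null}")
```

If the equal-variance assumption is not acceptable, Welch's version of the test would use a different standard error and different degrees of freedom.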
- A regional manager for Wendy’s is attempting to ascertain "drivers" of support for the "Value Menu." A telephone survey reveals the following distribution:
| Choice | Under Age 62 | Over Age 62 | Total |
|---|---|---|---|
| Like the Menu | 55 | 65 | 120 |
| Don’t like the Menu | 63 | 35 | 98 |
| Total | 118 | 100 | 218 |
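Counts cross-classified like this are commonly analyzed with a chi-square test of independence. The Python sketch below shows the mechanics, using only the standard library: expected counts come from the row and column totals, and the statistic sums the squared deviations scaled by the expected counts. The p-value shortcut through `math.erfc` is valid only because a 2x2 table has one degree of freedom. No continuity correction is applied, which is a simplifying assumption.

```python
import math

# Observed counts: rows = like / don't like, columns = under 62 / over 62.
observed = [[55, 65],
            [63, 35]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 1 only, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value:.4f}")
```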
Set up the appropriate hypothesis test. Has the phone survey revealed a possible determinant of "Value Menu" sales?
- A junior budget analyst in Hopeville, VA thinks that it is time to figure out if the city’s car pool should adopt a single brand for its operations outside of policing, where the Ford Crown Victoria remains standard issue. The analyst thinks that having a single car brand will make repairs easier and allow for standardized modification and procurement, while allowing for lower costs. She thinks that Southwest Airlines’ approach to jets (they only fly Boeing 737s) is an example the city should follow.
Currently the city deploys Dodges, Toyotas, and Saturns for its civilian operations. The analyst believes that annual repair costs should be the principal deciding factor on the brand to choose, though she acknowledges that safety, comfort, and other factors should play a role when the city manager weighs in. That said, she wants annual repairs to be the initial variable tested for.
The analyst finds that the Total Sum of Squares (SST), in dollars, is $200,000. The Sum of Squares Between (SSG) is $125,000. The analyst has used 12 cars for each brand (n = 36).
- Set up the null and alternative hypotheses
- Test at 0.05 for equivalence of means
- What conclusion might the junior budget analyst draw from the findings to pass along to the manager?
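The sums of squares given for the car-brand problem are enough to build the one-way ANOVA F ratio by hand. The Python sketch below shows that bookkeeping: the within-group sum of squares is SST minus SSG, and each sum of squares is divided by its degrees of freedom before the ratio is taken. The function name `anova_f` is an illustrative assumption. The resulting F would still need to be compared with a tabled critical value for the stated degrees of freedom at the 0.05 level.

```python
def anova_f(sst, ssg, k, n):
    """One-way ANOVA F ratio from total and between-group sums of squares.

    k is the number of groups and n is the total number of observations.
    """
    sse = sst - ssg            # within-group (error) sum of squares
    df_between = k - 1
    df_within = n - k
    ms_between = ssg / df_between
    ms_within = sse / df_within
    return ms_between / ms_within, df_between, df_within


# Values from the problem: three brands, 12 cars each.
f_stat, df_between, df_within = anova_f(200_000, 125_000, k=3, n=36)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```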
Part 3 Multiple Regression Using SPSS (use the SPSS attachment to answer this)
You are asked to examine data on service delivery across substance abuse treatment providers in three cities: Chicago, IL; Washington, D.C.; and Los Angeles, CA. Each provider in this data set treats individuals with alcohol and/or drug use problems. In addition to basic information about the characteristics of clients, the data file contains some basic information on the dimensions of service delivery or of services offered. The dataset also includes a measure of program success: the percent of clients free of substance use (both alcohol and drugs) for 6 months. This has been determined from several random drug tests given to the clients during that time. (Clients had to agree to the drug tests to participate in the program. Ignore the legal issues surrounding this, and assume no attrition of clients.) You are now ready to look at the impact that the different dimensions of these programs have on the clients remaining substance free. All of the persons in the dataset have experienced some type of treatment, so there is no comparison group that received no treatment. However, the clients have attended different providers, which provide different arrays of services. Unfortunately, the data are available only at the provider level, not at the level of the individual client. You have collected some control variables to reflect the differences in the client population served by each center.
You should begin by converting all your data on clients into percent of total clients. You can do this using the "Transform, Compute Variable" command in SPSS. For example, you could define "palcohol" = number of clients treated for alcohol abuse / total number of clients. You need to do the following analysis in SPSS, and provide answers to questions 1-4 in a memo. You should attach the SPSS output you used to answer the questions.
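The conversion itself is a simple ratio, and it can help to see it outside the SPSS dialog. The Python sketch below mirrors the "palcohol" example for a single provider record. The field names `total_clients` and `alcohol_clients` and the counts are made up for illustration; the real variable names come from the SPSS data file.

```python
def add_share_fields(provider, total_key, count_to_share):
    """Return a copy of `provider` with count fields expressed as shares of the total.

    `count_to_share` maps each count field to the name of the new share field.
    """
    total = provider[total_key]
    converted = dict(provider)
    for count_key, share_key in count_to_share.items():
        # Guard against providers reporting zero clients.
        converted[share_key] = provider[count_key] / total if total else None
    return converted


# Hypothetical provider record; these names and numbers are not from the data set.
provider = {"total_clients": 200, "alcohol_clients": 50}
converted = add_share_fields(
    provider, "total_clients", {"alcohol_clients": "palcohol"}
)
print(converted["palcohol"])
```

As defined in the assignment, "palcohol" is a proportion between 0 and 1. Multiply by 100 if the memo should report it on a 0 to 100 percent scale.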
1) How does service delivery vary across the three cities? There are eight measures or dimensions of service delivery in this data set. Use cross tabulations (frequencies) or bar charts to describe service delivery in these three cities. You can do this in SPSS with either:
- Graphs, Legacy Dialogs, Bar, Clustered commands. Under Category Axis put the type of service, and under Define Clusters by put the City id.
- Analyze, Descriptive Statistics, Crosstabs. Under Rows put all the service variables, and under Columns put the City id variable. You can change what is shown in the cross tab (counts, row percent, column percent, total percent) by clicking the Cells button. You can test whether the differences across cities are statistically significant using a "chi-square test" by clicking the Statistics button and selecting Chi-square in the upper left-hand corner.
2) How do client characteristics vary across the three cities? There are sixteen measures or characteristics of clients in the data set. Using the percent of client variables you created and the comparison of means procedure, examine how client characteristics vary across these three cities. Use the Analyze, Compare Means, Means commands. Under Dependent List put the client variables, and under Independent List put the City id. If you want to check whether these differences across cities are statistically significant, you can use Analyze, Compare Means, One-Way ANOVA. Under Dependent List put the client variables, and under Factor put the City id.
3) Run correlations between your dependent variable (free6m) and the measures of program service, total clients, and the percent of client variables you created. Which factors are significantly different from zero, and which are weakly (.3 or lower), moderately (.3-.7), or strongly (.7 or higher) associated with the dependent variable? Which would you start with in your regression model?
4) Run the following multiple regression: free6m = hotline, prevntn, socserv, total clients, percent white, percent age_u18, percent single mom.
a) Interpret the coefficients on percent white, percent single mom, and hotline.
b) Which of the coefficients on these variables can you conclude with confidence are not equal to zero?
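The strength bands named in question 3 can be written as a small helper for labeling correlations when building the summary table. The Python sketch below applies the bands to the absolute value of r, so negative correlations are graded by size. The bands as stated overlap at exactly .3 and .7. Treating .3 as weak and .7 as strong is an assumption made here to resolve that.

```python
def strength(r):
    """Label a correlation coefficient using the bands from question 3."""
    size = abs(r)  # direction does not affect strength
    if size <= 0.3:
        return "weak"
    if size < 0.7:
        return "moderate"
    return "strong"


# Illustrative values only; these are not results from the data set.
for r in (0.25, -0.5, 0.7, -0.85):
    print(r, strength(r))
```

Statistical significance is a separate question from strength and should be read from the p-values in the SPSS correlation output.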
The output required for this assignment includes:
- a three-page (double-spaced) memo discussing your results (12-point type, 1-inch margins).
- summary tables for questions 1-4 (I have attached an example of a typical summary table for regression output.)
- an appendix with your SPSS output.
Deliverable: Word Document
