Final Project
Chapter 1-3:
(40 pts.) Use Data Set 11 (Appendix B), Ages of Oscar Winners: Best Actresses, to solve the following problems:
- Use a calculator or software to find the following statistics for the data set: mean, median, mode, range, midrange, and standard deviation.
- Organize the data set in a frequency table using classes of width 10 (20-29, 30-39, and so on). Include columns for the relative and cumulative frequencies.
- Construct a histogram based on the table from step 2. Analyze the shape of the distribution: the number of peaks, the symmetry, and the variation.
- Use the frequency table to identify the class midpoints and class boundaries. Calculate the mean, median, and mode of the data set from the table. Compare these values with the values you found in step 1.
- Calculate the standard deviation of the data set using the frequency table and the formula
$$s = \sqrt{\frac{n\left[\sum (f \cdot x^2)\right] - \left[\sum (f \cdot x)\right]^2}{n(n-1)}}$$
where x represents the class midpoint, f represents the class frequency, and n represents the total number of sample values. Also, compare this value of the standard deviation with the value found in step 1. (Are they the same? Did we expect these two values to be the same?)
- Calculate the coefficient of variation of the data set (based on the mean and standard deviation calculated in step 1).
- Use the range rule of thumb to estimate the standard deviation of the data set. Compare this approximation with the values found in steps 1 and 5. (Are they the same? Did we expect the values to be the same? Why or why not?)
- Use a calculator or software to find the 5-number summary of the data set. Construct a boxplot. Calculate the IQR, the semi-interquartile range, and the midrange.
- Identify outliers, if there are any, and construct a modified boxplot.
- Identify the data value of the 18th percentile, P18. Find the percentile that corresponds to the value 40.
You can solve this problem manually or use statistical software. If you use software, attach all software calculations, graphs, tables, and diagrams. Include analysis and answer all questions. Printouts from software without a written report and analysis will earn only half of the points for this problem.
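As a rough illustration of steps 5 and 7, the sketch below computes the mean and standard deviation from a frequency table and the range-rule-of-thumb estimate. The function names are illustrative, and the midpoints and frequencies are hypothetical placeholders, not values from Data Set 11; substitute your own table.

```python
import math

def grouped_mean_and_sd(midpoints, freqs):
    """Mean and sample standard deviation estimated from a frequency table.

    midpoints: class midpoints (x); freqs: class frequencies (f).
    """
    n = sum(freqs)                                      # total number of sample values
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))
    mean = sum_fx / n
    sd = math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))
    return mean, sd

def range_rule_sd(minimum, maximum):
    """Range rule of thumb: s is roughly range / 4."""
    return (maximum - minimum) / 4

# HYPOTHETICAL table for classes 20-29, 30-39, 40-49 (not the Oscar data).
midpoints = [24.5, 34.5, 44.5]
freqs = [2, 3, 1]
mean, sd = grouped_mean_and_sd(midpoints, freqs)
print(f"grouped mean = {mean:.2f}, grouped s = {sd:.2f}")
print(f"range rule estimate for min 20, max 80: {range_rule_sd(20, 80):.1f}")
```

Because the grouped calculation replaces every value in a class with the class midpoint, its result is an approximation and will usually differ slightly from the standard deviation of the raw data.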
Chapter 4:
- (15 pts.) Classic Birthday Problem: Find the probability that among 25 randomly selected people, at least 2 have the same birthday.
To solve this problem you have to use a simulation. A simulation of a procedure is a process that behaves the same way as the procedure, so that similar results are produced. For the classic birthday problem above, a simulation begins by representing birthdays by integers from 1 through 365, where 1 represents a birthday of January 1, 2 represents January 2, and so on. We can simulate 25 birthdays by using a calculator or computer to generate 25 random numbers (with repetition allowed) between 1 and 365. Those numbers can then be sorted, so that it becomes easy to examine the list and determine whether any 2 of the simulated birthdays are the same. (After sorting, equal numbers are adjacent.) We can repeat the process as many times as we like, until we are satisfied that we have a good estimate of the probability.
TI-83/84 Plus: Press MATH, select PRB, then choose randInt. Enter the minimum of 1, the maximum of 365, and 25 for the number of values, all separated by commas, as in randInt(1, 365, 25). Press ENTER. You can store the data in list L1, then sort L1 by pressing STAT and selecting SortA.
StatCrunch: Click on Open StatCrunch, then click on Data and select the menu item Simulate data. Among the options available, select Discrete Uniform. In the dialog box that appears, enter 25 for the number of rows and 20 for the number of columns (as required for the following simulation). Enter 1 for the minimum and 365 for the maximum, then click on Simulate. You can sort columns by clicking on Data, then selecting the menu item Sort columns. The sorted columns will appear to the right of the original columns.
Use the above simulation method to randomly generate 20 different groups of 25 birthdays. Use the result to estimate the probability that among 25 randomly selected people, at least 2 have the same birthday. Count the number of columns in which at least two people have the same birthday and divide this number by 20 (the total number of columns).
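If you prefer a general-purpose language to the two tools described above, the same procedure can be sketched in Python. This is only one possible way to run the simulation; the function names and the seed are arbitrary choices for illustration.

```python
import random

def has_shared_birthday(birthdays):
    """Sort the list so equal values are adjacent, then compare neighbors."""
    ordered = sorted(birthdays)
    return any(a == b for a, b in zip(ordered, ordered[1:]))

def simulate_groups(groups=20, people=25, seed=None):
    """Generate `groups` columns of `people` random birthdays (1 through 365)."""
    rng = random.Random(seed)
    return [[rng.randint(1, 365) for _ in range(people)] for _ in range(groups)]

# A fixed seed is used here only so the run can be reproduced.
columns = simulate_groups(groups=20, people=25, seed=1)
matches = sum(has_shared_birthday(col) for col in columns)
estimate = matches / len(columns)

for number, col in enumerate(columns, start=1):
    print(number, sorted(col))          # the simulated data to include in the report
print(f"{matches} of {len(columns)} groups had a match; estimate = {estimate:.2f}")
```

With only 20 groups the estimate can vary noticeably from run to run, so different seeds will generally give different answers.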
Again, to solve this problem, you have to use a simulation! Do not calculate the probability using formulas. Include a copy of the simulated data (20 columns, each with 25 numbers between 1 and 365).
- (20 pts.) Use the data in the accompanying table to answer the following questions. Express the probabilities as decimal numbers. Show your work!
| | Guilty Plea | Plea of Not Guilty |
|---|---|---|
| Sentenced to Prison | 382 | 62 |
| Not Sentenced to Prison | 574 | 10 |
a) If one of the 1028 subjects is randomly selected, find the probability of selecting someone sentenced to prison.
b) Find the probability of being sentenced to prison, given that the subject entered a plea of guilty.
c) Find the probability of being sentenced to prison, given that the subject entered a plea of not guilty.
d) After comparing the results of questions b and c, what do you conclude about the wisdom of entering a guilty plea?
e) If 1 of the subjects is randomly selected, find the probability of selecting someone who was sentenced to prison or entered a plea of guilty.
f) If 2 different subjects are randomly selected, find the probability that they both were sentenced to prison.
g) If 2 different subjects are randomly selected, find the probability that they both entered pleas of not guilty.
h) If 1 of the subjects is randomly selected, find the probability of selecting someone who entered a plea of not guilty or was not sentenced to prison.
i) If 1 of the subjects is randomly selected, find the probability of selecting someone who was sentenced to prison and entered a plea of guilty.
j) If 1 of the subjects is randomly selected, find the probability of selecting someone who was not sentenced to prison and did not enter a plea of guilty.
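The probability rules these questions rely on (simple, conditional, addition, and multiplication without replacement) can be checked with a short script. The sketch below uses the counts from the table; the dictionary layout and helper name are illustrative, and only a few of the questions are shown. It does not replace the written work the problem asks for.

```python
# Counts from the table, keyed by (sentence, plea).
counts = {
    ("prison", "guilty"): 382,
    ("prison", "not_guilty"): 62,
    ("no_prison", "guilty"): 574,
    ("no_prison", "not_guilty"): 10,
}
total = sum(counts.values())

def count_where(sentence=None, plea=None):
    """Number of subjects matching the given sentence and/or plea."""
    return sum(
        c for (s, p), c in counts.items()
        if (sentence is None or s == sentence) and (plea is None or p == plea)
    )

n_prison = count_where(sentence="prison")
n_guilty = count_where(plea="guilty")

# Simple probability: P(prison)
p_prison = n_prison / total
# Conditional probability: P(prison | guilty)
p_prison_given_guilty = counts[("prison", "guilty")] / n_guilty
# Addition rule: P(prison or guilty) = P(prison) + P(guilty) - P(prison and guilty)
p_prison_or_guilty = (n_prison + n_guilty - counts[("prison", "guilty")]) / total
# Multiplication rule without replacement: both of 2 different subjects sentenced to prison
p_both_prison = (n_prison / total) * ((n_prison - 1) / (total - 1))

print(round(p_prison, 3), round(p_prison_given_guilty, 3),
      round(p_prison_or_guilty, 3), round(p_both_prison, 3))
```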
Chapter 5:
- (10 pts.) The digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 are randomly selected for applications including the selection of lottery numbers and the selection of telephone numbers to be called as part of a survey. Of the two tables below, the one at the left summarizes actual results from 100 randomly selected digits, and the one at the right summarizes the probabilities of the different digits.
| Digit | Frequency | | Digit x | P(x) |
|---|---|---|---|---|
| 0 | 9 | | 0 | 0.1 |
| 1 | 7 | | 1 | 0.1 |
| 2 | 12 | | 2 | 0.1 |
| 3 | 10 | | 3 | 0.1 |
| 4 | 10 | | 4 | 0.1 |
| 5 | 11 | | 5 | 0.1 |
| 6 | 8 | | 6 | 0.1 |
| 7 | 8 | | 7 | 0.1 |
| 8 | 14 | | 8 | 0.1 |
| 9 | 11 | | 9 | 0.1 |
a) What is the table at the left called?
b) What is the table at the right called?
c) Use the table at the left to find the mean. Is it a statistic or a parameter?
d) Use the table at the right to find the mean. Is it a statistic or a parameter?
e) If you were to randomly generate 1000 such digits, would you expect the mean of these 1000 digits to be close to the result from part (c) or part (d)? Why?
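The two means are computed differently: the left table is weighted by observed frequencies, while the right table is weighted by probabilities. A minimal sketch of both calculations, using the values from the tables (the variable names are illustrative):

```python
digits = list(range(10))
frequencies = [9, 7, 12, 10, 10, 11, 8, 8, 14, 11]   # left table
probabilities = [0.1] * 10                            # right table

n = sum(frequencies)                                  # number of sampled digits

# Mean of the observed digits: sum(f * x) / n
observed_mean = sum(f * x for x, f in zip(digits, frequencies)) / n

# Mean of the probability distribution: sum(x * P(x))
distribution_mean = sum(x * p for x, p in zip(digits, probabilities))

print(f"n = {n}, observed mean = {observed_mean:.2f}, "
      f"distribution mean = {distribution_mean:.2f}")
```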
- (10 pts.) Based on a USA Today poll, assume that 12% of the population believes that college is no longer a good investment.
a) Find the probability that among 16 randomly selected people, exactly 4 believe that college is no longer a good investment.
b) Find the probability that among 16 randomly selected people, at least 1 believes that college is no longer a good investment.
c) The poll results were obtained from Internet users logged on to the USA Today website, and the Internet users decided whether to ignore the posted survey or respond. What type of sample is this? What does it suggest about the validity of the results?
Don’t use the binomial probability formula! Use a calculator or software to solve the problem.
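One way to let software do the work is a small helper for binomial probabilities. The sketch below is an illustration only; the helper name is hypothetical, and a calculator's binompdf/binomcdf functions or a statistics package would serve the same purpose.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 16, 0.12

p_exactly_4 = binomial_pmf(4, n, p)
# "At least 1" is the complement of "none".
p_at_least_1 = 1 - binomial_pmf(0, n, p)

print(f"P(X = 4)  = {p_exactly_4:.4f}")
print(f"P(X >= 1) = {p_at_least_1:.4f}")
```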
- (10 pts.) The analysis of the last digits of data can sometimes reveal whether the data have been collected through actual measurements or reported by subjects. Refer to an almanac or the Internet and find a collection of data with at least 15 values (such as the lengths of rivers in the world), then analyze the distribution of the last digits to determine whether the values were obtained through actual measurements.
Create a table that shows the distribution of the last digit.
Include a copy of the data.
Chapter 6:
- (15 pts.) Birth Weights
Birth weights in the United States have a distribution that is approximately normal with a mean of 3369 g and a standard deviation of 567 g (based on data from "Comparison of Birth Weight Distribution between Chinese and Caucasian Infants," by Wen, Kramer, Usher, American Journal of Epidemiology, Vol. 172, No. 10).
a) One definition of a premature birth is that the birth weight is below 2500g. If a baby is randomly selected, find the probability of a birth weight below 2500g.
b) Another definition of a premature birth weight is in the bottom 10%. Find the birth weight that is the cutoff between the bottom 10% and the top 90%.
c) A definition of a "very low birth weight" is one that is less than 1500g. If a baby is randomly selected, find the probability of a "very low birth weight."
d) If 25 babies are randomly selected, find the probability that their mean birth weight is greater than 3400 g.
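These parts can be checked with Python's standard-library `statistics.NormalDist`. The sketch below assumes the stated mean and standard deviation and, for part (d), uses the standard error σ/√n for the distribution of sample means; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

weights = NormalDist(mu=3369, sigma=567)        # individual birth weights (g)

# a) P(weight < 2500 g)
p_below_2500 = weights.cdf(2500)

# b) Weight separating the bottom 10% from the top 90%
cutoff_10 = weights.inv_cdf(0.10)

# c) P(weight < 1500 g)
p_below_1500 = weights.cdf(1500)

# d) Sample means of n = 25 babies: same mean, standard deviation sigma / sqrt(n)
sample_means = NormalDist(mu=3369, sigma=567 / sqrt(25))
p_mean_above_3400 = 1 - sample_means.cdf(3400)

print(f"{p_below_2500:.4f} {cutoff_10:.1f} {p_below_1500:.5f} {p_mean_above_3400:.4f}")
```

Answers read from a printed z table may differ in the last digit because the table rounds z scores to two decimal places.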
Chapter 7:
- (10 pts.) You have been hired by a college foundation to conduct a survey of graduates.
a) If you want to estimate the percentage of graduates who made a donation to the college after graduation, how many graduates must you survey if you want 98% confidence that your percentage has a margin of error of 5 percentage points?
b) If you want to estimate the mean amount of charitable contributions made by graduates, how many graduates must you survey if you want 98% confidence that your sample mean is in error by no more than $50? (Based on results from a pilot study, assume that the standard deviation of donations by graduates is $337.)
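The two sample-size calculations can be sketched as follows, assuming the usual formulas n = z² · p̂q̂ / E² (with p̂ = q̂ = 0.5 when no prior estimate is available) and n = (zσ / E)², each rounded up to the next whole number. The function names are illustrative.

```python
from math import ceil
from statistics import NormalDist

def z_critical(confidence):
    """Two-tailed critical z value for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_for_proportion(confidence, margin, p_hat=0.5):
    """Sample size to estimate a proportion; p_hat = 0.5 if no estimate is known."""
    z = z_critical(confidence)
    return ceil(z ** 2 * p_hat * (1 - p_hat) / margin ** 2)

def sample_size_for_mean(confidence, margin, sigma):
    """Sample size to estimate a mean when sigma is known or assumed."""
    z = z_critical(confidence)
    return ceil((z * sigma / margin) ** 2)

n_proportion = sample_size_for_proportion(0.98, 0.05)
n_mean = sample_size_for_mean(0.98, 50, 337)
print(n_proportion, n_mean)
```

This uses an unrounded critical value (about 2.326). If you use a table value such as z = 2.33, the rounded-up answers can come out one larger.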
- (15 pts.) In a survey of 1003 people, 59% said that they have never hesitated to give a handshake because of a fear of germs. The survey results were reported in USA Today, and the survey was conducted by Wakefield Research for Purell, a supplier of hand sanitizer products.
a) Construct a 95% confidence interval estimate of the proportion of people in the population who have never hesitated to give a handshake because of a fear of germs.
b) Is there anything about the survey that might make the results questionable?
c) If an independent pollster wanted to conduct another survey to confirm or refute the results, how many people must be surveyed? Assume we want 90% confidence that the sample percentage is within 2.5 percentage points of the true population proportion.
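For part (a), the interval p̂ ± E with E = z·√(p̂q̂/n) can be checked with a few lines of code. This is a sketch under the usual normal-approximation assumptions; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_confidence_interval(0.59, 1003, 0.95)
print(f"95% CI: {low:.3f} < p < {high:.3f}")
```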
Chapter 8:
Show all your work. Follow the Steps of the Hypothesis Testing Procedure Practiced in Class.
- (10 pts.) Designing an aircraft cockpit.
In designing a cockpit for a Boeing aircraft, the overhead grip reach of a selected pilot is being considered as an important factor for the placement of landing light switches to be located directly above the pilot. Listed below are the measured overhead grip reaches (cm) of a simple random sample of women. Use a 0.01 significance level to test the claim that the mean overhead grip reach of women is less than the value of 123 cm that is being planned for the aircraft.
120, 115, 130, 123, 118, 118, 118, 116, 121, 119, 131, 125, 119, 124, 122, 121, 129, 125, 126, 115, 122.
- (15 pts.) Pennsylvania Lottery. In the Pennsylvania Match 6 Lottery, six numbers between 1 and 49 are randomly drawn. Use a calculator or a computer to generate 100 random numbers between 1 and 49 (with replacement) and calculate the mean and the standard deviation of your sample. Use a 0.01 significance level to test the claim that the sample is selected from a population with a mean equal to 25, which is the mean of the population of all drawn numbers.
Include a copy of the sample you are using.
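Both Chapter 8 problems test a claim about a mean with σ unknown, so the test statistic is t = (x̄ − μ₀) / (s / √n). The sketch below computes it for the grip reach sample listed above; the function name is illustrative. The standard library has no t distribution, so the critical value or P-value still has to come from a t table, a calculator, or statistical software, and the remaining steps of the hypothesis-testing procedure are left to the written report.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu_0):
    """Return (t statistic, degrees of freedom) for H0: mu = mu_0."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)                 # sample standard deviation (n - 1 in the denominator)
    t = (x_bar - mu_0) / (s / sqrt(n))
    return t, n - 1

grip_reach_cm = [120, 115, 130, 123, 118, 118, 118, 116, 121, 119, 131,
                 125, 119, 124, 122, 121, 129, 125, 126, 115, 122]

t_stat, df = one_sample_t(grip_reach_cm, mu_0=123)
print(f"x-bar = {mean(grip_reach_cm):.2f}, s = {stdev(grip_reach_cm):.2f}, "
      f"t = {t_stat:.3f}, df = {df}")
```

The same function applies to the lottery problem once you pass in your own 100 generated numbers and `mu_0=25`.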
Chapter 10:
(30 pts.) Use Data Set 7 (Appendix B), Bears (measurements from anesthetized wild bears), to:
- Create two different scatter plots.
a) Graph 1: Age (x) and Weight (y)
b) Graph 2: Chest (x) and Weight (y)
- For every scatter plot, write up an analysis of all the information that you learn from the picture.
FOR EXAMPLE:
- State the correlation coefficient. Is there a significant correlation? (Why? Compare r with the critical value of r, using a significance level of 0.05.)
- If there appears to be a correlation, describe it (direction, strength).
- What percentage of the variation in weight can be explained by the linear relationship with age/chest?
- Answer all of the following questions.
a) Based on your analysis in step 2, do you think it is possible to infer a bear’s weight from its age? Explain your answer.
b) Using the relationships that you calculated, determine the approximate age and chest of a bear with each of the following weights:
- 170 lb
- 70 lb
c) Suppose you measure the chest of a bear and predict that its weight is 70 lb, and you later find out that the bear actually weighs 75 lb. Give one possible reason that your prediction was incorrect.
d) Using the relationships that you calculated, determine the approximate weight of a bear with
- age 56
- age 101
- chest 30.0
- chest 53.5
- Using the relationship between age and weight, answer the following questions:
- What does the slope of the regression equation represent?
- What does the y-intercept represent? Is it meaningful?
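The correlation coefficient, the regression line, and r² (the proportion of variation explained) can be computed as sketched below. The function names are illustrative, and the x and y lists are hypothetical placeholder numbers, not measurements from Data Set 7; replace them with the age (or chest) and weight columns from the data set.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the regression line y-hat = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

# HYPOTHETICAL placeholder values (not bear data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
intercept, slope = least_squares_line(x, y)
print(f"r = {r:.4f}, r^2 = {r * r:.2f}")
print(f"y-hat = {intercept:.2f} + {slope:.2f}x")
print(f"predicted y at x = 6: {intercept + slope * 6:.2f}")
```

Predictions from the line are only appropriate when the correlation is significant and the x value lies within the range of the observed data.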
Deliverable: Word Document
