1.28: The audience for movies. Here are data on the percent of people in several age groups who attended a movie in the past 12 months:
- Display these data in a bar graph. What is the main feature of the data?
- Would it be correct to make a pie chart of these data? Why?
- A movie studio wants to know what percent of the total audience for movies is 18 to 24 years old. Explain why these data do not answer this question.
1.34: Where are the doctors? Table 1.4 gives the number of active medical doctors per 100,000 people in each state.
- Why is the number of doctors per 100,000 people a better measure of the availability of health care than a simple count of the number of doctors in a state?
- Make a histogram that displays the distribution of doctors per 100,000 people. Write a brief description of the distribution. Are there any outliers? If so, can you explain them?
2.24: Assets of young households. A report on the assets of American households says that the median net worth of households headed by someone younger than age 35 is $11,600. The mean net worth of these same young households is $90,700. What explains the difference between these two measures of center?
2.38: Athletes' salaries. In 2004, the Boston Red Sox won the World Series for the first time in 86 years. Table 2.4 gives the salaries of the Red Sox players as of opening day of the 2005 season. Describe the distribution of salaries both with a graph and with a numerical summary. Then write a brief description of the important features of the distribution.
3.30: Standard Normal drill. Use Table A to find the proportion of observations from a standard Normal distribution that falls in each of the following regions. In each case, sketch a standard Normal curve and shade the area representing the region.
- \(z \leq-2.25\)
- \(z \geq-2.25\)
- \(z>1.77\)
- \(-2.25<z<1.77\)
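As a cross-check on the Table A lookups, the four areas can be computed from the standard Normal cumulative distribution. This is only a sketch using Python's standard library; the names `Z` and `regions` are illustrative and not part of the exercise.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mean 0, standard deviation 1

# Each area is expressed through the cumulative proportion Z.cdf(z),
# which is the quantity Table A tabulates.
regions = {
    "z <= -2.25": Z.cdf(-2.25),
    "z >= -2.25": 1 - Z.cdf(-2.25),
    "z > 1.77": 1 - Z.cdf(1.77),
    "-2.25 < z < 1.77": Z.cdf(1.77) - Z.cdf(-2.25),
}

for label, area in regions.items():
    print(f"{label}: {area:.4f}")
```

The first two areas are complements, so they should sum to 1. That is a quick sanity check on any table lookup as well.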
3.46: A surprising calculation. Changing the mean of a Normal distribution by a moderate amount can greatly change the percent of observations in the tails. Suppose that a college is looking for applicants with SAT math scores 750 and above.
- In 2004, the scores of men on the math SAT followed the \(N(537,116)\) distribution. What percent of men scored 750 or better?
- Women's SAT math scores that year had the \(N(501,110)\) distribution. What percent of women scored 750 or better? You see that the percent of men above 750 is almost three times the percent of women with such high scores. Why this is true is controversial.
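The tail comparison in this exercise can be sketched in a few lines, assuming the two Normal models stated above. The helper `percent_at_or_above` is a hypothetical name introduced here for illustration.

```python
from statistics import NormalDist

def percent_at_or_above(cutoff, mean, sd):
    """Percent of a Normal(mean, sd) population scoring at or above cutoff."""
    return 100 * (1 - NormalDist(mean, sd).cdf(cutoff))

men = percent_at_or_above(750, mean=537, sd=116)
women = percent_at_or_above(750, mean=501, sd=110)

print(f"men:   {men:.2f}%")
print(f"women: {women:.2f}%")
print(f"ratio: {men / women:.2f}")
```

The means differ by about a third of a standard deviation, yet the ratio of the tail percents comes out close to 3. That is the "surprising" part of the calculation.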
4.28: How many corn plants are too many? How much corn per acre should a farmer plant to obtain the highest yield? Too few plants will give a low yield. On the other hand, if there are too many plants, they will compete with each other for moisture and nutrients, and yields will fall. To find the best planting rate, plant at different rates on several plots of ground and measure the harvest. (Be sure to treat all the plots the same except for the planting rate.) Here are data from such an experiment:
- Is yield or planting rate the explanatory variable?
- Make a scatterplot of yield and planting rate. Use a scale of yields from 100 to 200 bushels per acre so that the pattern will be clear.
- Describe the overall pattern of the relationship. Is it linear? Is there a positive or negative association, or neither? Is correlation \(r\) a helpful description of this relationship? Find the correlation if it is helpful.
- Find the mean yield for each of the five planting rates. Plot each mean yield against its planting rate on your scatterplot and connect these five points with lines. This combination of numerical description and graphing makes the
4.34: Teaching and research. A college newspaper interviews a psychologist about student ratings of the teaching of faculty members. The psychologist says, "The evidence indicates that the correlation between the research productivity and teaching rating of faculty members is close to zero." The paper reports this as "Professor McDaniel said that good researchers tend to be poor teachers, and vice versa." Explain why the paper's report is wrong. Write a statement in plain language (don't use the word "correlation") to explain the psychologist's meaning.
5.26: Sisters and brothers. How strongly do physical characteristics of sisters and brothers correlate? Here are data on the heights (in inches) of 11 adult pairs:
- Use your calculator or software to find the correlation and to verify that the least-squares line for predicting sister's height from brother’s height is \(\hat{y}=27.64+0.527 x\). Make a scatterplot that includes this line.
- Damien is 70 inches tall. Predict the height of his sister Tonya. Based on the scatterplot and the correlation \(r\), do you expect your prediction to be very accurate? Why?
5.32: Going to class. A study of class attendance and grades among first-year students at a state university showed that in general students who attended a higher percent of their classes earned higher grades. Class attendance explained \(16 \%\) of the variation in grade index among the students. What is the numerical value of the correlation between percent of classes attended and grade index?
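The link between "percent of variation explained" and the correlation is \(r^{2}\), so the arithmetic is a square root plus a decision about the sign. A minimal sketch, assuming the sign is positive because the exercise describes higher attendance going with higher grades:

```python
import math

r_squared = 0.16  # fraction of variation in grade index explained by attendance

# r**2 alone cannot tell us the sign of r. We take the positive root
# because the association described in the exercise is positive.
r = math.sqrt(r_squared)

print(f"r = {r:.2f}")
```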
5.34: Always plot your data! Table 5.1 presents four sets of data prepared by the statistician Frank Anscombe to illustrate the dangers of calculating without first plotting the data.
- Without making scatterplots, find the correlation and the least-squares regression line for all four data sets. What do you notice? Use the regression line to predict \(y\) for \(x=10\).
- Make a scatterplot for each of the data sets and add the regression line to each plot.
6.30: Which hospital is safer? To help consumers make informed decisions about health care, the government releases data about patient outcomes in hospitals. You want to compare Hospital A and Hospital B, which serve your community. Here are data on all patients undergoing surgery in a recent time period. The data include the condition of the patient ("good" or "poor") before the surgery. "Survived" means that the patient lived at least 6 weeks following surgery.
- Compare percents to show that Hospital A has a higher survival rate for both groups of patients.
- Combine the data into a single two-way table of outcome ("survived" or "died") by hospital (A or B). The local paper reports just these overall survival rates. Which hospital has the higher rate?
- Explain from the data, in language that a reporter can understand, how Hospital B can do better overall even though Hospital A does better for both groups of patients.
8.30: Movie viewing. An opinion poll calls 2000 randomly chosen residential telephone numbers, then asks to speak with an adult member of the household. The interviewer asks, "How many movies have you watched in a movie theater in the past 12 months?"
- What population do you think the poll has in mind?
- In all, 1131 people respond. What is the rate (percent) of nonresponse?
- What source of response error is likely for the question asked?
8.36: Telephone area codes. There are approximately 371 active telephone area codes covering Canada, the United States, and some Caribbean areas. (More are created regularly.) You want to choose an SRS of 25 of these area codes for a study of available telephone numbers. Label the codes 001 to 371 and use the Simple Random Sample applet or other software to choose your sample. (If you use Table B, start at line 129 and choose only the first 5 codes in the sample.)
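For the "other software" route, one possible sketch uses Python's standard `random` module. The seed value is an arbitrary assumption, included only so the draw can be reproduced. The labels stand in for the area codes, which are not listed in the exercise.

```python
import random

# Labels "001" through "371", one per area code.
labels = [f"{i:03d}" for i in range(1, 372)]

rng = random.Random(2024)  # arbitrary seed, for reproducibility only

# random.sample draws without replacement, so every set of 25 labels
# is equally likely to be chosen: this is the definition of an SRS.
sample = sorted(rng.sample(labels, k=25))

print(sample)
```

A different seed gives a different, equally valid sample, so there is no single correct answer to compare against.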
8.46: Wording survey questions. Comment on each of the following as a potential sample survey question. Is the question clear? Is it slanted toward a desired response?
- "Some cell phone users have developed brain cancer. Should all cell phones come with a warning label explaining the danger of using cell phones?"
- "Do you agree that a national system of health insurance should be favored because it would provide health insurance for everyone and would reduce administrative costs?"
- "In view of the negative externalities in parent labor force participation and pediatric evidence associating increased group size with morbidity of children in day care, do you support government subsidies for day care programs?"
9.26: Treating breast cancer. The most common treatment for breast cancer discovered in its early stages was once removal of the breast. It is now usual to remove only the tumor and nearby lymph nodes, followed by radiation. To study whether these treatments differ in their effectiveness, a medical team examines the records of 25 large hospitals and compares the survival times after surgery of all women who have had either treatment.
- What are the explanatory and response variables?
- Explain carefully why this study is not an experiment.
- Explain why confounding will prevent this study from discovering which treatment is more effective. (The current treatment was in fact recommended after several large randomized comparative experiments.)
9.46: An herb for depression? Does the herb Saint-John's-wort relieve major depression? Here are some excerpts from the report of a study of this issue. The study concluded that the herb is no more effective than a placebo.
- "Design: Randomized, double-blind, placebo-controlled clinical trial..." A clinical trial is a medical experiment using actual patients as subjects. Explain the meaning of each of the other terms in this description.
- "Participants ... were randomly assigned to receive either Saint-John's-wort extract \((n=98)\) or placebo \((n=102)\)... The primary outcome measure was the rate of change in the Hamilton Rating Scale for Depression over the treatment period." Based on this information, use a diagram to outline the design of this clinical trial.
10.40: Colors of M&M's. If you draw an M&M candy at random from a bag of the candies, the candy you draw will have one of six colors. The probability of drawing each color depends on the proportion of each color among all candies made. Here is the distribution for milk chocolate M&M's:
- What must be the probability of drawing a blue candy?
- What is the probability that you do not draw a brown candy?
- What is the probability that the candy you draw is yellow, orange, or red?
11.28: Roulette. A roulette wheel has 38 slots, of which 18 are black, 18 are red, and 2 are green. When the wheel is spun, the ball is equally likely to come to rest in any of the slots. One of the simplest wagers chooses red or black. A bet of $1 on red returns $2 if the ball lands in a red slot. Otherwise, the player loses his dollar. When gamblers bet on red or black, the two green slots belong to the house. Because the probability of winning $2 is \(18 / 38\), the mean payoff from a $1 bet is twice \(18 / 38\), or 94.7 cents. Explain what the law of large numbers tells us about what will happen if a gambler makes very many bets on red.
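A small simulation can make the law of large numbers concrete here. The sketch below assumes a fair wheel as described; the function name `mean_payoff`, the seed, and the choice of numbering slots 0 to 17 as red are illustrative assumptions.

```python
import random

def mean_payoff(n_bets, rng):
    """Average dollars returned per $1 bet on red over n_bets spins."""
    total_returned = 0
    for _ in range(n_bets):
        slot = rng.randrange(38)  # 38 equally likely slots, numbered 0..37
        if slot < 18:             # treat slots 0..17 as the 18 red slots
            total_returned += 2   # a winning bet returns $2
    return total_returned / n_bets

expected = 2 * 18 / 38  # mean payoff stated in the exercise, about $0.947

rng = random.Random(1)  # arbitrary seed, for reproducibility only
for n in (100, 10_000, 200_000):
    print(f"{n:>7} bets: average return ${mean_payoff(n, rng):.4f}")
print(f"expected:     ${expected:.4f}")
```

With few bets the average return bounces around, but with many bets it should settle near 94.7 cents per dollar wagered, which is a steady loss of roughly 5.3 cents per bet.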
11.36: Glucose testing. Shelia's doctor is concerned that she may suffer from gestational diabetes (high blood glucose levels during pregnancy). There is variation both in the actual glucose level and in the blood test that measures the level. A patient is classified as having gestational diabetes if the glucose level is above 140 milligrams per deciliter (mg/dl) one hour after a sugary drink. Shelia's measured glucose level one hour after the sugary drink varies according to the Normal distribution with \(\mu=125\) mg/dl and \(\sigma=10\) mg/dl.
- If a single glucose measurement is made, what is the probability that Shelia is diagnosed as having gestational diabetes?
- If measurements are made on 4 separate days and the mean result is compared with the criterion 140 mg/dl, what is the probability that Shelia is diagnosed as having gestational diabetes?
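Both probabilities follow from the Normal model once the standard deviation of the sample mean, \(\sigma / \sqrt{n}\), is in hand. A sketch under the added assumption that the four daily measurements are independent:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, cutoff = 125.0, 10.0, 140.0

# One measurement: X ~ N(125, 10).
p_single = 1 - NormalDist(mu, sigma).cdf(cutoff)

# Mean of 4 independent measurements: x-bar ~ N(125, 10 / sqrt(4)).
n = 4
p_mean4 = 1 - NormalDist(mu, sigma / sqrt(n)).cdf(cutoff)

print(f"single measurement: {p_single:.4f}")
print(f"mean of {n} days:     {p_mean4:.5f}")
```

Averaging halves the standard deviation, which moves the cutoff from 1.5 to 3 standard deviations above the mean and makes a false diagnosis far less likely.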
14.30: Student study times. A class survey in a large class for first-year college students asked "About how many minutes do you study on a typical weeknight?" The mean response of the 269 students was \(\bar{x}=137\) minutes. Suppose that we know that the study time follows a Normal distribution with standard deviation \(\sigma=65\) minutes in the population of all first-year students at this university.
- Use the survey result to give a \(99 \%\) confidence interval for the mean study time of all first-year students.
- What condition not yet mentioned is needed for your confidence interval to be valid?
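The interval has the form \(\bar{x} \pm z^{*} \sigma / \sqrt{n}\). The sketch below works through the arithmetic with the numbers given in the exercise; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, x_bar, sigma = 269, 137.0, 65.0
confidence = 0.99

# Critical value z* leaves (1 - confidence) / 2 in each tail.
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z_star * sigma / sqrt(n)
low, high = x_bar - margin, x_bar + margin

print(f"z* = {z_star:.3f}, margin of error = {margin:.2f} minutes")
print(f"99% CI: {low:.2f} to {high:.2f} minutes")
```

The calculation says nothing about how the 269 students were chosen, which is the point of the second question.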
14.36: Crime. A Gallup Poll of 1002 adults found that \(25 \%\) of the respondents said that their household had experienced a crime in the past year. Among respondents aged 18 to 29 years, \(43 \%\) had been victims of a crime. Gallup says, "For results based on the total sample of national adults, one can say with \(95 \%\) confidence that the margin of sampling error is \(\pm 3\) percentage points." Is the margin of error for adults aged 18 to 29 smaller or larger than \(\pm 3\) percentage points? Why?
15.36: This wine stinks. Sulfur compounds cause "off-odors" in wine, so winemakers want to know the odor threshold, the lowest concentration of a compound that the human nose can detect. The odor threshold for dimethyl sulfide (DMS) in trained wine tasters is about 25 micrograms per liter of wine \((\mu \mathrm{g} / \mathrm{l})\). The untrained noses of consumers may be less sensitive, however. Here are the DMS odor thresholds for 10 untrained students:
Assume that the odor threshold for untrained noses is Normally distributed with \(\sigma=7 \mu \mathrm{g} / \mathrm{l}\). Is there evidence that the mean threshold for untrained tasters is greater than \(25 \mu \mathrm{g} / \mathrm{l}\)? Follow the four-step process, as illustrated in Example 15.8, in your answer.
15.44: The wrong \(P\). The report of a study of seat belt use by drivers says, "Hispanic drivers were not significantly more likely than White/non-Hispanic drivers to overreport safety belt use (27.4 vs. \(21.1 \%\), respectively; \(z=1.33, P>1.0\))." How do you know that the \(P\)-value given is incorrect? What is the correct one-sided \(P\)-value for test statistic \(z=1.33\)?
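A \(P\)-value is a probability, so it can never exceed 1. The one-sided value for \(z=1.33\) is the upper-tail area under the standard Normal curve, which can be checked as follows (a sketch, assuming the alternative is "greater than"):

```python
from statistics import NormalDist

z = 1.33

# One-sided P-value for the alternative "greater than":
# the area to the right of z under the standard Normal curve.
p_one_sided = 1 - NormalDist().cdf(z)

print(f"one-sided P-value for z = {z}: {p_one_sided:.4f}")
```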
16.42: Internet users. A survey of users of the Internet found that males outnumbered females by nearly 2 to 1. This was a surprise, because earlier surveys had put the ratio of men to women closer to 9 to 1. Later in the article we find this information:
Detailed surveys were sent to more than 13,000 organizations on the Internet; 1,468 usable responses were received. According to Mr. Quarterman, the margin of error is 2.8 percent, with a confidence level of 95 percent.
- What was the response rate for this survey? (The response rate is the percent of the planned sample that responded.)
- Do you think that the small margin of error is a good measure of the accuracy of the survey's results? Explain your answer.
17.30: Normal body temperature? Here are the daily average body temperatures (degrees Fahrenheit) for 20 healthy adults. Do these data give evidence that the mean body temperature for all healthy adults is not equal to the traditional 98.6 degrees? Follow the four-step process for significance tests (page 372). (Suppose that body temperature varies Normally with standard deviation 0.7 degree.)
| 98.74 |
| 98.63 |
| 96.8 |
| 98.12 |
| 97.89 |
| 98.09 |
| 97.87 |
| 97.42 |
| 97.3 |
| 97.84 |
| 100.27 |
| 97.9 |
| 99.64 |
| 97.88 |
| 98.54 |
| 98.33 |
| 97.87 |
| 97.48 |
| 98.92 |
| 98.33 |
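The "solve" step of the four-step process for these data is a two-sided \(z\) test with known \(\sigma=0.7\). The sketch below covers only the arithmetic, using the 20 temperatures listed above. Stating hypotheses, checking conditions, and writing the conclusion are still up to you.

```python
from math import sqrt
from statistics import NormalDist, mean

temps = [
    98.74, 98.63, 96.8, 98.12, 97.89, 98.09, 97.87, 97.42, 97.3, 97.84,
    100.27, 97.9, 99.64, 97.88, 98.54, 98.33, 97.87, 97.48, 98.92, 98.33,
]
mu_0 = 98.6   # traditional value under the null hypothesis
sigma = 0.7   # population standard deviation assumed in the exercise

n = len(temps)
x_bar = mean(temps)
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Two-sided alternative (mean not equal to 98.6): double the tail area.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"n = {n}, x-bar = {x_bar:.3f}, z = {z:.2f}, P = {p_value:.4f}")
```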
18.26: Alcohol in wine. The alcohol content of wine depends on the grape variety, the way in which the wine is produced from the grapes, the weather, and other influences. Here are data on the percent of alcohol in wine produced from the same grape variety in the same year by 48 winemakers in the same region of Italy:
- Make a histogram of the data, using class width 0.25. The shape of the distribution is a bit irregular, but there are no outliers or strong skewness. There is no reason to avoid use of \(t\) procedures for \(n=48\).
- Give a \(95 \%\) confidence interval for the mean alcohol content of wines.
18.34: Growing trees faster. The concentration of carbon dioxide \(\left(\mathrm{CO}_{2}\right)\) in the atmosphere is increasing rapidly due to our use of fossil fuels. Because plants use \(\mathrm{CO}_{2}\) to fuel photosynthesis, more \(\mathrm{CO}_{2}\) may cause trees and other plants to grow faster. An elaborate apparatus allows researchers to pipe extra \(\mathrm{CO}_{2}\) to a 30-meter circle of forest. They selected two nearby circles in each of three parts of a pine forest and randomly chose one of each pair to receive extra \(\mathrm{CO}_{2}\). The response variable is the mean increase in base area for 30 to 40 trees in a circle during a growing season. We measure this in percent increase per year. Here are one year's data:
| Pair | Control Plot | Treated Plot | Difference |
| 1 | 9.753 | 10.5587 | 0.8057 |
| 2 | 7.263 | 9.244 | 1.981 |
| 3 | 5.742 | 8.675 | 2.933 |
- State the null and alternative hypotheses. Explain clearly why the investigators used a one-sided alternative.
- Carry out a test and report your conclusion in simple language.
- The investigators used the test you just carried out. Any use of the \(t\) procedures with samples this size is risky. Why?
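The matched pairs \(t\) statistic is computed from the three differences in the table. The sketch below avoids third-party libraries by using a closed form for the \(t\) distribution that holds only when there are 2 degrees of freedom, \(P(T>t)=\tfrac{1}{2}-\dfrac{t}{2 \sqrt{2+t^{2}}}\). That shortcut is an assumption specific to \(n=3\) and would not work for other sample sizes.

```python
from math import sqrt
from statistics import mean, stdev

# Treated minus control, taken from the Difference column of the table.
differences = [0.8057, 1.981, 2.933]

n = len(differences)
d_bar = mean(differences)
s_d = stdev(differences)            # sample standard deviation (n - 1 divisor)
t_stat = d_bar / (s_d / sqrt(n))
df = n - 1

# Upper-tail probability for a t distribution with exactly 2 degrees
# of freedom (closed form; not valid for any other df).
assert df == 2
p_one_sided = 0.5 - t_stat / (2 * sqrt(2 + t_stat ** 2))

print(f"mean difference = {d_bar:.4f}, s = {s_d:.4f}")
print(f"t = {t_stat:.3f} with {df} df, one-sided P = {p_one_sided:.4f}")
```

With only three pairs there is no way to check the data for Normality or outliers, which bears on the last question.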
19.30: Active versus passive learning. A study of computer-assisted learning examined the learning of "Blissymbols" by children. Blissymbols are pictographs (think of Egyptian hieroglyphs) that are sometimes used to help learning-impaired children communicate. The researcher designed two computer lessons that taught the same content using the same examples. One lesson required the children to interact with the material, while in the other the children controlled only the pace of the lesson. Call these two styles "Active" and "Passive." Children were assigned at random to Active and Passive groups. After the lesson, the computer presented a quiz that asked the children to identify 56 Blissymbols. Here are the numbers of correct identifications by the 24 children in the Active group:
The 24 children in the Passive group had these counts of correct identifications:
Is there good evidence that active learning is superior to passive learning? Follow the four-step process as illustrated in Examples 19.2 and 19.3. That is, state hypotheses, make graphs to examine the data, and discuss the conditions for inference.
19.32: Active versus passive learning, continued.
- Use the data in Exercise 19.30 to give a \(90 \%\) confidence interval for the difference in mean number of Blissymbols identified correctly by children after active and passive lessons.
- Give a \(90 \%\) confidence interval for the mean number of Blissymbols identified correctly by children after the active lesson.
19.46: Tropical flowers. Different varieties of the tropical flower Heliconia are fertilized by different species of hummingbirds. Over time, the lengths of the flowers and the form of the hummingbirds' beaks have evolved to match each other. Here are data on the lengths in millimeters of two color varieties of the same flower on the island of Dominica:
Is there good evidence that the mean lengths of the two varieties differ? Estimate the difference between the population means. (Use \(95 \%\) confidence.)
20.32: Running red lights. A random digit dialing telephone survey of 880 drivers asked, "Recalling the last ten traffic lights you drove through, how many of them were red when you entered the intersections?" Of the 880 respondents, 171 admitted that at least one light had been red.
- Give a \(95 \%\) confidence interval for the proportion of all drivers who ran one or more of the last ten red lights they met.
- Nonresponse is a practical problem for this survey: only \(21.6 \%\) of calls that reached a live person were completed. Another practical problem is that people may not give truthful answers. What is the likely direction of the bias: do you think more or fewer than 171 of the 880 respondents really ran a red light? Why?
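For the confidence interval in the first part, a sketch of the large-sample interval \(\hat{p} \pm z^{*} \sqrt{\hat{p}(1-\hat{p}) / n}\) follows. Using the large-sample formula rather than a plus-four adjustment is an assumption made here; with 171 successes and 709 failures, either approach gives nearly the same interval.

```python
from math import sqrt
from statistics import NormalDist

successes, n = 171, 880
p_hat = successes / n

z_star = NormalDist().inv_cdf(0.975)        # 95% confidence
se = sqrt(p_hat * (1 - p_hat) / n)          # standard error of p-hat
low, high = p_hat - z_star * se, p_hat + z_star * se

print(f"p-hat = {p_hat:.4f}")
print(f"95% CI: {low:.4f} to {high:.4f}")
```

The interval accounts only for random sampling error. Nonresponse and untruthful answers, the subject of the second part, are not reflected in it.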
20.42: Online publishing. Publishing scientific papers online is fast, and the papers can be long. Publishing in a paper journal means that the paper will live forever in libraries. The British Medical Journal combines the two: it prints short and readable versions, with longer versions available online. Is this OK with authors? The journal asked a random sample of 104 of its recent authors several questions.
One question was "Should the journal continue using this system?" In the sample, 72 said "Yes." What proportion of all authors would say "Yes" if asked? (Estimate with \(95 \%\) confidence.) Do the data give good evidence that more than two-thirds (67%) of authors support continuing this system? Answer both questions with appropriate inference methods.
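The two questions call for two different standard errors: the interval uses \(\hat{p}\), while the test uses the null value \(p_{0}\). The sketch below takes \(p_{0}=2/3\); using the rounded 0.67 instead is an equally defensible reading of the exercise and changes the numbers only slightly.

```python
from math import sqrt
from statistics import NormalDist

yes, n = 72, 104
p_hat = yes / n

# 95% large-sample confidence interval: standard error based on p-hat.
z_star = NormalDist().inv_cdf(0.975)
se_ci = sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - z_star * se_ci, p_hat + z_star * se_ci

# One-sided test of H0: p = 2/3 against Ha: p > 2/3.
# The standard error is based on the null value p0.
p0 = 2 / 3
se_test = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se_test
p_value = 1 - NormalDist().cdf(z)

print(f"p-hat = {p_hat:.4f}, 95% CI: {low:.4f} to {high:.4f}")
print(f"z = {z:.3f}, one-sided P = {p_value:.4f}")
```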
23.30: Child care workers. A large study of child care used samples from the data tapes of the Current Population Survey over a period of several years. The result is close to an SRS of child care workers. The Current Population Survey has three classes of child care workers: private household, nonhousehold, and preschool teacher. Here are data on the number of blacks among women workers in these three classes:
| | Black | Non-Black |
| Household | 172 | 2283 |
| Non-household | 167 | 1024 |
| Teachers | 86 | 573 |
- What percent of each class of child care workers is black?
- Make a two-way table of class of worker by race (black or other).
- Can we safely use the chi-square test? What null and alternative hypotheses does \(\chi^{2}\) test?
- The chi-square statistic for this table is \(\chi^{2}=53.194\). What are its degrees of freedom? What is the mean of \(\chi^{2}\) if the null hypothesis is true? Use Table E to approximate the P-value of the test.
- What do you conclude from these data?
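The stated value \(\chi^{2}=53.194\) can be reproduced from the table by computing each expected count as (row total × column total) / grand total. The sketch below also uses the fact that, for exactly 2 degrees of freedom, the upper-tail chi-square probability is \(e^{-\chi^{2} / 2}\). That shortcut is an assumption tied to this table's dimensions and replaces the Table E lookup only as a check.

```python
from math import exp

# Observed counts from the table: (Black, Non-Black) for each class.
observed = {
    "Household": (172, 2283),
    "Non-household": (167, 1024),
    "Teachers": (86, 573),
}

row_totals = {name: sum(counts) for name, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
grand_total = sum(row_totals.values())

# Percent black within each class of worker.
for name, counts in observed.items():
    print(f"{name}: {100 * counts[0] / row_totals[name]:.1f}% black")

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = 0.0
for name, counts in observed.items():
    for j, obs in enumerate(counts):
        expected = row_totals[name] * col_totals[j] / grand_total
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(col_totals) - 1)  # (rows - 1)(columns - 1)
p_value = exp(-chi2 / 2)  # upper-tail area; valid only because df == 2

print(f"chi-square = {chi2:.3f}, df = {df}, P = {p_value:.2e}")
```

Under the null hypothesis the mean of the chi-square statistic equals its degrees of freedom, so an observed value above 53 is far out in the tail.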
23.40: Where do young adults live? A survey by the National Institutes of Health asked a random sample of young adults (aged 19 to 25), "Where do you live now? That is, where do you stay most often?" We earlier (page 513) compared the proportions of men and women who lived with their parents. Here now is the full two-way table (omitting a few who refused to answer and one who claimed to be homeless):
What are the most important differences between young men and women? Are their choices of living places significantly different?
25.36: Durable press fabrics are weaker. "Durable press" cotton fabrics are treated to improve their recovery from wrinkles after washing. Unfortunately, the treatment also reduces the strength of the fabric. A study compared the breaking strength of untreated fabric with that of fabrics treated by three commercial durable press processes. Five specimens of the same fabric were assigned at random to each group. Here are the data, in pounds of pull needed to tear the fabric:
The untreated fabric is clearly much stronger than any of the treated fabrics. We want to know if there is a significant difference in breaking strength among the three durable press treatments. Analyze the data for the three processes and write a clear summary of your findings. Which process do you recommend if breaking strength is a main concern? Use the four-step process to guide your discussion. (Although the standard deviations do not quite satisfy our rule of thumb, that rule is conservative and many statisticians would use ANOVA for these data.)
Deliverable: Word Document
