American League baseball teams play their games with the designated hitter rule, meaning that pitchers do not bat. The league believes that replacing the pitcher, typically a weak hitter, with another player in the batting order produces more runs and generates more interest among fans. The following are the average numbers of runs scored per game by each team in the 2016 season:
1. Create an appropriate graphical display of these data.
2. Write a few sentences comparing the average number of runs scored per game in the two leagues. (Remember: shape, center, spread, unusual features!)
3. The runs per game leaders were the Red Sox and the Rockies in the American and National League, respectively. Did either of those teams score an unusually large number of runs per game? Explain briefly.
4. Is the actual difference in mean runs per game between the leagues in 2016 different from what you might expect by chance if there really were no difference? Using software, shuffle the league labels 1000 times and create a histogram of the differences. What do you conclude?
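The shuffling procedure in part 4 can be sketched as below. This is a minimal sketch, not a definitive solution: the team values are made-up placeholders (the 2016 table is not reproduced here), and the names `shuffle_differences` and `text_histogram` are hypothetical helpers introduced only for illustration.

```python
import random
from statistics import mean


def shuffle_differences(group_a, group_b, n_shuffles=1000, seed=1):
    """Return (observed difference in means, list of shuffled differences)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign the league labels at random
        diffs.append(mean(pooled[:n_a]) - mean(pooled[n_a:]))
    return observed, diffs


def text_histogram(values, bins=10, width=40):
    """Print a rough text histogram of the values."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # avoid a zero bin width
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / step), bins - 1)] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        print(f"{lo + i * step:7.3f} | {'#' * round(width * c / peak)}")


# PLACEHOLDER values only, NOT the 2016 runs-per-game data.
american = [4.1, 4.3, 4.4, 4.6, 4.7, 4.9]
national = [4.0, 4.2, 4.3, 4.5, 4.6, 4.8]

observed, diffs = shuffle_differences(american, national)

# Share of shuffles at least as extreme as the observed difference
# (two-sided; the small tolerance guards against floating-point ties).
p_value = sum(abs(d) >= abs(observed) - 1e-12 for d in diffs) / len(diffs)

text_histogram(diffs)
print(f"observed difference: {observed:.3f}, proportion as extreme: {p_value:.3f}")
```

If the observed difference falls well inside the bulk of the shuffled differences, a difference that size could plausibly arise by chance; if it sits far out in a tail, chance alone is a less convincing explanation.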
> A researcher wants to compare the performance of three types of antacid in volunteers suffering from acid reflux disease. Because men and women may react differently to this medication, the subjects are split into two groups, by sex. Subjects in each gro
> Medical studies indicate that smokers are less likely to develop Alzheimer's disease than people who never smoked. 1. Does this prove that smoking may offer some protection against Alzheimer's? Explain. 2. Offer an alternative explanation for this associatio
> Reader's Digest (April 2002, p. 152) reported results of several surveys that asked graduate students to examine photographs of men and women and try to guess their ages. Researchers compared these guesses with the number of times the people in the pictur
> Researchers at the Purina Pet Institute studied Labrador retrievers for evidence of a relationship between diet and longevity. At 8 weeks of age, 2 puppies of the same sex and weight were randomly assigned to one of two groups, for a total of 48 dogs in all.
> The table lists the amounts of rainfall (in acre-feet) from the 26 clouds seeded with silver iodide discussed in Exercise 40. (Data in Cloud seeding) 1. Why is acre-feet a good way to measure the amount of precipitation produced by cloud seeding? 2. Plot
> A study examined brain size (measured as pixels counted in a digitized magnetic resonance image [MRI] of a cross section of the brain) and IQ (4 performance scales of the Wechsler IQ test) for college students. The scatterplot shows the Performance IQ sc
> A college statistics class conducted a survey concerning community attitudes about the college's large homecoming celebration. That survey drew its sample in the following manner: Telephone numbers were generated at random by selecting one of the local tel
> Mary Beth, Nigel, and Molly want to design an experiment to find the recipe for the best chocolate chip cookies. They will try to keep the size of the cookies the same, but use cooking times of 10 and 15 minutes. They will use three different temperature
> Sofie, Ryan, and Alessandra wanted to design an experiment to find out how distraction affects our ability to judge time. The experiment consisted of starting a clock (out of view of the subject) and then asking the subject to tell them when they thought
> An experiment to test a new laundry detergent, SparkleKleen, is being conducted by a consumer advocate group. They would like to compare its performance with that of a laboratory standard detergent they have used in previous experiments. They can stain 1
> In August 2011, a Sodahead.com voluntary response poll asked site visitors, "Obama is on Vacation Again: Does He Have the Worst Timing Ever?" 56% of the 629 votes were for "Yes." During the week of the poll, a 5.8 earthquake struck near Washington, D.C., and
> In another experiment to see if getting candy after a meal would induce customers to leave a bigger tip, a waitress randomly decided what to do with 80 dining parties. Some parties received no candy, some just one piece, and some two pieces. Others initi
> In restaurants, servers rely on tips as a major source of income. Does serving candy after the meal produce larger tips? To find out, two waiters determined randomly whether or not to give candy to 92 dining parties. They recorded the sizes of the tips a
> Researchers at the Washington University School of Medicine randomly placed 480 rats into one of three chambers containing radio antennas. One group was exposed to digital cell phone radio waves, the second to analog cell phone waves, and the third group
> A paper published in 2017 in JAMA Internal Medicine (jamanetwork.com/journals/jamainternalmedicine/fullarticle/2623528) reported on a study of alternate-day fasting as a weight-loss method. One hundred obese persons were assigned at random to one of thre
> Use the statistics package of your choice or the simple sample tool at astools.datadesk.com to draw samples of the conditions from the New York bridges 2016 data file. Draw a sample of 50, a sample of 100, a sample of 200, and a sample of 500. Compare th
> Here are the same data you saw in Exercise 45 after re-expressions as the square root of assets (in $M) and the logarithm of assets (in $M): 1. Which re-expression do you prefer? Why? 2. In the square root re-expression, what does the value 50 actually i
> The journal Circulation reported that among 1900 people who had heart attacks, those who drank an average of 19 cups of tea a week were 44% more likely than nondrinkers to survive at least 3 years after the attack.
> The data file Commuter sample holds a sample drawn from the Population commute times data set. Using your statistics program make histograms of the sample and the population. Discuss how they are similar and how they differ.
> Does the use of computer software in introductory statistics classes lead to better understanding of the concepts? A professor teaching two sections of statistics decides to investigate. She teaches both sections using the same lectures and assignments,
> Older Americans with a college education are significantly more likely to be emotionally well-off than are people in this age group with less education. Among those aged 65 and older, 35% scored 90 or above on the Emotional Health Index, but for those wi
> A soft-drink manufacturer must be sure the bottle caps on the soda are fully sealed and will not come off easily. Inspectors pull a few bottles off the production line at regular intervals and test the caps. If they detect any problems, they will stop th
> An orange-juice processing plant will accept a shipment of fruit only after several hundred oranges selected from various locations within the truck are carefully inspected. If too many show signs of unsuitability for juice (bruised, rotten, unripe, etc.
> People aged 50 to 71 were initially contacted in the mid-1990s to participate in a study about smoking and bladder cancer. Data were collected from more than 280,000 men and 186,000 women from eight states who answered questions about their health, smoki
> Tests of gene therapy on laboratory rats have raised hopes of stopping the degeneration of tissue that characterizes chronic heart failure. Researchers at the University of California, San Diego, used hamsters with cardiac disease, randomly assigning 30
> An artisan wants to create pottery that has the appearance of age. He prepares several samples of clay with four different glazes and test fires them in a kiln at three different temperature settings.
> Widely used antidepressants may reduce ominous brain plaques associated with Alzheimer's disease. In the study, mice genetically engineered to have large amounts of brain plaque were given a class of antidepressants that boost serotonin in the brain. After
> Students were asked how many songs they had in their digital music libraries. Here's a display of the responses: 1. What aspect of this distribution makes it difficult to summarize, or to discuss, center and spread? 2. What would you suggest doing with the
> Some doctors have expressed concern that men who have vasectomies seemed more likely to develop prostate cancer. Medical researchers used a national cancer registry to identify 923 men who had had prostate cancer and 1224 men of similar ages who had not.
> Researchers identified 242 children in the Cleveland area who had been born prematurely (at about 29 weeks). They examined these children at age 8 and again at age 20, comparing them to another group of 233 children not born prematurely. Their report, pu
> Exercise R2.7 examined the correlation between BCI and pH in streams sampled in the Adirondack Mountains. Here is the corresponding regression model: Response variable is: BCI R-squared = 27.1% s = 140.4 1. Write the regression model. 2. What is the inte
> Marine biologists warn that the growing number of powerboats registered in Florida threatens the existence of manatees. The data in the table come from the Florida Fish and Wildlife Conservation Commission (myfwc.com/research/manatee/) and the U.S. Coast
> Biologists studying the effects of acid rain on wildlife collected data from 163 streams in the Adirondack Mountains. They recorded the pH (acidity) of the water and the BCI, a measure of biological diversity, and they calculated $R^2 = 27\%$. Here's a scatterpl
> The Dow Jones stock index measures the performance of the stocks of America's largest companies (finance.yahoo.com). A regression of the Dow prices on years 1972–2015 looks like this: Dependent variable is:
> In January 2012, the New York Times published a story called "Twin Births in the U.S., Like Never Before," in which they reported a 76 percent increase in the rate of twin births from 1980 to 2009. Here are the numbers of twin births each year (per 1000 li
> How are a company's profits related to its sales? Let's examine data from 71 large U.S. corporations. All amounts are in millions of dollars. 1. Histograms of Profits and Sales and histograms of the logarithms of Profits and Sales are seen below. Why are the
> The Minnesota Department of Transportation hoped that they could measure the weights of big trucks without actually stopping the vehicles by using a newly developed weigh-in-motion scale. After installation of the scale, a study was conducted to find out
> Here are the average weights of the football team for the University of Texas for various years in the 20th century. 1. Fit a straight line to the relationship of Weight by Year for Texas football players. 2. According to these models, in what year will
> Here is a histogram of the assets (in millions of dollars) of 79 companies chosen from the Forbes list of the nation's top corporations: (Data in Companies) 1. What aspect of this distribution makes it difficult to summarize, or to discuss, center and spre
> Find the predicted value of $y$, using each model for $x = 10$. 1. $\hat{y} = 2 + 0.8 \ln x$ 2. $\log \hat{y} = 5 - 0.23x$ 3. $\frac{1}{\sqrt{\hat{y}}} = 17.1 - 1.66x$
> The Sears Cup was established in 1993 to honor institutions that maintain a broad-based athletic program, achieving success in many sports, both men's and women's. In the years following its Division III inception in 1995, the cup was won by Williams College
> Instead of Age, perhaps the Size of the vineyard (in acres) is associated with the price of the wines. Look at the scatterplot: 1. Do you see any evidence of an association? 2. What concern do you have about this scatterplot? 3. If the red + data point i
> Are people who use tobacco products more likely to consume alcohol? Here are data on household spending (in pounds) taken by the British government on 11 regions in Great Britain. Do tobacco and alcohol spending appear to be related? What questions do yo
> Since clean-air regulations have dictated the use of unleaded gasoline, the supply of leaded gas in New York state has diminished. The following table was given on the August 2001 New York State Math B exam, a statewide achievement test for high school s
> Does how long toddlers sit at the lunch table help predict how much they eat? The table and graph show the number of minutes the kids stayed at the table and the number of calories they consumed. Create and interpret a model for these data.
> Twins are often born at less than 9 months gestation. The graph from the Journal of the American Medical Association (JAMA) shows the rate of preterm twin births in the United States over the past 20 years. In this study, JAMA categorized mothers by the
> Consider the association between a student's score on a French vocabulary test and the weight of the student. What direction and strength of correlation would you expect in each of the following situations? Explain. 1. The students are all in third grade.
> Here are the summary statistics for the Olympic jumps displayed in the previous exercise. 1. Write the equation of the line of regression for estimating High Jump from Long Jump. 2. Interpret the slope of the line. 3. In a year when the long jump is 8.9
> How are Olympic performances in various events related? The plot shows winning long-jump and high-jump distances, in meters, for the Summer Olympics from 1912 through 2016: 1. Describe the association. 2. Do long-jump performances somehow influence the h
> The September 1998 issue of the American Psychologist published an article by Kraut et al. that reported on an experiment examining the social and psychological impact of the Internet on 169 people in 73 households during their first 1 to 2 years online.
> Summary statistics for the data relating the Latitude and average January temperature for 55 large U.S. cities are given below. 1. What percent of the variation in January Temperature can be explained by variation in Latitude? 2. What is indicated by the
> The study of U.S. cities in Exercise R2.29 found the mean January Temperature (degrees Fahrenheit), Altitude (feet above sea level), and Latitude (degrees north of the equator) for 55 cities. Here's the correlation matrix: 1. Which seems to be more useful
> Here are the scatterplot and regression analysis for Case Prices of 36 wines from vineyards in the Finger Lakes region of New York State and the Ages of the vineyards. (Data in Vineyards full) 1. Does it appear that vineyards in business longer get highe
> Data from 50 large U.S. cities show the mean January Temperature and the Latitude. Describe what you see in the scatterplot.
> It's commonly believed that people use tips to reward good service. A researcher for the hospitality industry examined tips and ratings of service quality from 2645 dining parties at 21 different restaurants. The correlation between ratings of service and
> The downward trend in smoking you saw in the last exercise is good news for the health of babies, but will it ever stop? 1. Explain why you can’t use the linear model you created in Exercise R2.26 to see when smoking during pregnancy will cease altogethe
> The Child Trends Data Bank monitors issues related to children. The table shows a 50-state average of the percent of expectant mothers who smoked cigarettes during their pregnancies. 1. Create a scatterplot and describe the trend you see. 2. Find the cor
> An electronics website collects data on the size of new HD flat-panel televisions (measuring the diagonal of the screen in inches) to predict the cost (in hundreds of dollars). Which of these is most likely to be the slope of the regression line: 0.03, 0
> In the last exercise, you saw that the linear model had some deficiencies. Let's create a better model. 1. Perhaps the cross-sectional area of a tree would be a better predictor of its age. Since area is measured in square units, try re-expressing the data
> A consumer organization wants to compare gas mileage figures for several models of cars made in the United States with autos manufactured in other countries. The data for a random sample of cars classified as midsize are found in the file MPG 2016. 1. Cr
> One can determine how old a tree is by counting its rings, but that requires either cutting the tree down or extracting a sample from the tree's core. Can we estimate the tree's age simply from its diameter? A forester measured 27 trees of the same species t
> The ranges inhabited by the Indian gharial crocodile and the Australian saltwater crocodile overlap in Bangladesh. Suppose a very large crocodile skeleton is found there, and we wish to determine the species of the animal. Wildlife scientists have measur
> There is evidence that eruptions of Old Faithful can best be predicted by knowing the duration of the previous eruption. 1. Describe what you see in the scatterplot of Intervals between eruptions vs. Duration of the previous eruption. 2. Write the equati
> Although some women are colorblind, this condition is found primarily in men. Why is it wrong to say there's a strong correlation between Sex and Colorblindness?
> Are good grades in high school associated with family togetherness? A random sample of 142 high school students was asked how many meals per week their families ate together. Their responses produced a mean of 3.78 meals per week, with a standard deviati
> Here is a scatterplot of the residuals from the regression in Exercise R2.18: 1. Does the residual plot suggest that the regression conditions were satisfied? Explain. In the United States, fuel efficiency is usually measured as we did here, in miles per
> Consider a regression to predict the fuel efficiency (as miles per gallon, MPG) of the cars in the Cars data file. Here is one regression model using the Weight and the Drive Ratio: Response variable is: MPG R-squared = 89.5% s = 2.186 1. What is the int
> Can we predict the Horsepower of the engine that manufacturers will put in a car by knowing the Weight of the car? Here are the regression analysis and residuals plot: Dependent variable is: Horsepower R-squared = 84.1% 1. Write the equation of the regre
> Look again at the correlation table for cars in the previous exercise. 1. Which two variables in the table exhibit the strongest association? 2. Is that strong association necessarily cause-and-effect? Offer at least two explanations why that association
> What factor most explains differences in Fuel Efficiency among cars? Below is a correlation matrix exploring that relationship for the car Weight (1000 lb), Horsepower, Displacement, and number of Cylinders. (Data in Cars) 1. Which factor seems most stro
> A study that examined the health risks of smoking measured the cholesterol levels of people who had smoked for at least 25 years and people of similar ages who had smoked for no more than 5 years and then stopped. Create appropriate graphical displays fo
> One Thursday, researchers gave students enrolled in a section of basic Spanish a set of 50 new vocabulary words to memorize. On Friday, the students took a vocabulary test. When they returned to class the following Monday, they were retested without adva
> Highway planners investigated the relationship between traffic Density (number of automobiles per mile) and the average Speed of the traffic on a moderately large city thoroughfare. The data were collected at the same location at 10 different times over
> A statistics instructor created a linear regression equation to predict students' final exam scores from their midterm exam scores. The regression equation was Fin = 10 + 0.9 Mid. 1. If Susan scored a 70 on the midterm, what did the instructor predict for her
> Exercise R2.9 fit a regression model to the relationship between BCI and pH in streams sampled in the Adirondack Mountains. More variables are available. For example, scientists also recorded the water hardness. Here's a new model: Response variable is: BC
> We looked at the data on life expectancy in different countries as they related to the (square root of the) number of doctors and to the (square root of the) number of TVs. Here's a regression using both variables to predict life expectancy: Response varia
> In Chapter 8 we learned about the extraordinary depth and duration of the dives taken by penguins. In that chapter we modeled a re-expression of Heart rate with the Duration (min) of dives. The data also include the depth of each dive. Here
> For the real estate data of the previous exercise, consider the value of the number of bedrooms in modeling the price of a home. The correlation between Price and Bedrooms is 0.116. Here is a regression model: Response variable is: Price R-squared = 14.6
> As a class project, students in a large statistics class collected publicly available information on recent home sales in their hometowns. There are 894 properties. Important predictors of the price of a home are its living area (sq ft) and the number of
> Continue your analysis of the manatee situation from Exercise R2.8. 1. Create a linear model of the association between Manatee Deaths and Powerboat Registrations. 2. Interpret the slope of your model. 3. Interpret the y-intercept of your model. 4. Which
> US News and World Report publishes a special issue on many U.S. colleges and universities. The scatterplots have Student/Faculty Ratio (number of students per faculty member) for the colleges and universities on the y-axes plotted against 4 other variabl
> Engineers at a computer production plant tested two methods for accuracy in drilling holes into a PC board. They tested how fast they could set the drilling machine by running 10 boards at each of two different speeds. To assess the results, they measure
> A start-up company has developed an improved electronic chip for use in laboratory equipment. The company needs to project the manufacturing cost, so it develops a spreadsheet model that takes into account the purchase of production equipment, overhead,
> Most water tanks have a drain plug so that the tank may be emptied when it's to be moved or repaired. How long it takes a certain size of tank to drain depends on the size of the plug, as shown in the table. Create a model.
> The dataset Movies 06-15 introduced in the Chapter 3 exercises includes the distributor, number of tickets sold, and gross revenue in addition to the MPAA rating and the genre for each of the 10 years 2006 to 2015. Investigate the associations among the
> The Student survey dataset introduced in the Chapter 3 exercises includes responses to 13 questions. Investigate the associations among the variables that you find interesting. Write a short report on what you discover. Be sure to include summary statist
> The Titanic dataset includes more variables than just those discussed in Chapter 2. Others include such variables as the crew's job and where each person boarded the ship. Stories, biographies, and pictures can be found on this site: www.encyclopedia-titan
> The Hopkins Forest dataset includes all 24 weather variables reported by the researchers. Many of the variables (e.g., temperature, relative humidity, solar radiation, wind) are reported as daily averages, minima and maxima. Using any of these variables,
> Is the mean amount of salt higher in menu items that contain meat? 1. Compare the sodium content of the meat and non-meat items with displays and summary statistics. 2. By shuffling the variable Meat 1000 times, investigate whether the mean sodium conten
> Consumer groups are concerned that cereals with a high sugar content (usually designed for children) are placed just where kids are most likely to see them: in the middle shelf of the supermarket. The variable Middle indicates whether the cereal is locate
> Here is a stem-and-leaf display showing profits (in $M) for 30 of the 500 largest global corporations (as measured by revenue). The stems are split; each stem represents a span of 5000 ($M), from a profit of 43,000 ($M) to a loss of 7000 ($M). Use the st
> A company that markets build-it-yourself furniture sells a computer desk that is advertised with the claim "less than an hour to assemble." However, through postpurchase surveys the company has learned that only 25% of its customers succeeded in building t
> In an experiment to determine whether seeding clouds with silver iodide increases rainfall, 52 clouds were randomly assigned to be seeded or not. The amount of rain they generated was then measured (in acre-feet). Here are the summary statistics: 1. Whic
> The Bicycle Helmet Safety Institute website includes a report on the number of bicycle fatalities per year in the United States. The table below shows the counts for the years 1994–2015. 1. What are the W's for these data? 2. Display the data in a stem-and
> Consider again the Pew Research Center results on age and political party in Exercise R1.33. 1. What is the marginal distribution of party affiliation? 2. Create segmented bar graphs displaying the conditional distribution of party affiliation for each
> According to the Bureau of Labor Statistics, the mean hourly wage for Chief Executives in 2009 was $80.43 and the median hourly wage was $77.27. By contrast, for General and Operations Managers, the mean hourly wage was $53.15 and the median was $44.55.
> The Pew Research Center conducts surveys regularly asking respondents which political party they identify with or lean toward. Among their results is the following table relating preferred political party and age. 1. What percent of people surveyed were
> Horsepower is another measure commonly used to describe auto engines. Here are the summary statistics and histogram displaying the horsepower of the same group of 38 cars discussed in Exercise R1.31. 1. Describe the shape, center, and spread of this distrib
> One measure of the size of an automobile engine is its displacement, the total volume (in liters or cubic inches) of its cylinders. Summary statistics for several models of new cars are shown. These displacements were measured in cubic inches. 1. How man
> Consider again the data on birth order and college majors in Exercise R1.28. 1. What is the marginal distribution of majors? 2. What is the conditional distribution of majors for the oldest children? 3. What is the conditional distribution of majors for