> Consumer groups are concerned that cereals with a high sugar content (usually designed for children) are placed just where kids are most likely to see them, on the middle shelf of the supermarket. The variable Middle indicates whether the cereal is located on shelf 2, the middle shelf. 1. Compare the sugar content of the cereals on the middle shelf versus those on shelves 1 and 3 with displays and summary statistics. 2. By shuffling the variable Middle 1000 times, investigate whether the mean sugar content difference between the two groups of cereals could have arisen by chance. What do you conclude?
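
The shuffling procedure in part 2 can be sketched in Python as a two-sided permutation test on the difference in means. This is a minimal sketch under assumptions: the function name `permutation_test_mean_diff` and the sugar values are hypothetical placeholders, not the cereal data; substitute the sugar amounts for shelf 2 and for shelves 1 and 3.

```python
import random
from statistics import mean


def permutation_test_mean_diff(group_a, group_b, n_shuffles=1000, seed=None):
    """Shuffle group labels and return (observed difference, proportion of
    shuffles whose absolute difference in means is at least as large)."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # randomly reassign the Middle labels
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            as_extreme += 1
    return observed, as_extreme / n_shuffles


# Hypothetical sugar amounts (grams); replace with the real cereal data
middle_shelf = [12, 11, 13, 12, 14, 10]
shelves_1_and_3 = [3, 6, 5, 4, 7, 2]
obs, p = permutation_test_mean_diff(middle_shelf, shelves_1_and_3, seed=1)
print(f"observed difference = {obs:.2f} g, proportion as extreme = {p:.3f}")
```

The returned proportion estimates how often a random reassignment of the Middle labels produces a mean difference at least as large as the observed one; a very small proportion suggests the difference is unlikely to be due to chance alone.
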
> Instead of Age, perhaps the Size of the vineyard (in acres) is associated with the price of the wines. Look at the scatterplot: 1. Do you see any evidence of an association? 2. What concern do you have about this scatterplot? 3. If the red + data point i
> Are people who use tobacco products more likely to consume alcohol? Here are data on household spending (in pounds) collected by the British government for 11 regions in Great Britain. Do tobacco and alcohol spending appear to be related? What questions do yo
> Since clean-air regulations have dictated the use of unleaded gasoline, the supply of leaded gas in New York state has diminished. The following table was given on the August 2001 New York State Math B exam, a statewide achievement test for high school s
> Does how long toddlers sit at the lunch table help predict how much they eat? The table and graph show the number of minutes the kids stayed at the table and the number of calories they consumed. Create and interpret a model for these data.
> Twins are often born at less than 9 months gestation. The graph from the Journal of the American Medical Association (JAMA) shows the rate of preterm twin births in the United States over the past 20 years. In this study, JAMA categorized mothers by the
> Consider the association between a student's score on a French vocabulary test and the weight of the student. What direction and strength of correlation would you expect in each of the following situations? Explain. 1. The students are all in third grade.
> Here are the summary statistics for the Olympic jumps displayed in the previous exercise. 1. Write the equation of the line of regression for estimating High Jump from Long Jump. 2. Interpret the slope of the line. 3. In a year when the long jump is 8.9
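
Part 1 uses the standard least-squares relationships: slope = r × (s_y / s_x) and intercept = ȳ - slope × x̄. A minimal Python sketch follows; the helper name and the numbers in the example are hypothetical, since the Olympic summary statistics are not reproduced here.

```python
def regression_from_summary(x_mean, x_sd, y_mean, y_sd, r):
    """Least-squares line y-hat = b0 + b1 * x from summary statistics.

    Returns (b0, b1): intercept and slope.
    """
    b1 = r * y_sd / x_sd       # slope: correlation times the ratio of SDs
    b0 = y_mean - b1 * x_mean  # the line passes through (x-bar, y-bar)
    return b0, b1


# Hypothetical summary statistics (not the Olympic values): x = Long Jump, y = High Jump
b0, b1 = regression_from_summary(x_mean=10, x_sd=2, y_mean=20, y_sd=4, r=0.5)
print(f"predicted y = {b0:.2f} + {b1:.2f} x")
print(f"prediction at x = 11: {b0 + b1 * 11:.2f}")
```

The slope reads as the predicted change in High Jump (meters) for each additional meter of Long Jump.
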
> How are Olympic performances in various events related? The plot shows winning long-jump and high-jump distances, in meters, for the Summer Olympics from 1912 through 2016: 1. Describe the association. 2. Do long-jump performances somehow influence the h
> American League baseball teams play their games with the designated hitter rule, meaning that pitchers do not bat. The league believes that replacing the pitcher, typically a weak hitter, with another player in the batting order produces more runs and ge
> The September 1998 issue of the American Psychologist published an article by Kraut et al. that reported on an experiment examining the social and psychological impact of the Internet on 169 people in 73 households during their first 1 to 2 years online.
> Summary statistics for the data relating the Latitude and average January temperature for 55 large U.S. cities are given below. 1. What percent of the variation in January Temperature can be explained by variation in Latitude? 2. What is indicated by the
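
Part 1 relies on the fact that, for a simple linear regression, R² equals the square of the correlation r. A tiny sketch; the correlation in the example is a placeholder, because the summary statistics table is not shown here.

```python
def percent_variation_explained(r):
    """R-squared, as a percent, for a simple regression with correlation r."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation must lie between -1 and 1")
    return 100 * r ** 2


# Hypothetical correlation; its sign only gives the direction of the association
print(f"{percent_variation_explained(-0.8):.1f}%")
```
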
> The study of U.S. cities in Exercise R2.29 found the mean January Temperature (degrees Fahrenheit), Altitude (feet above sea level), and Latitude (degrees north of the equator) for 55 cities. Here is the correlation matrix: 1. Which seems to be more useful
> Here are the scatterplot and regression analysis for Case Prices of 36 wines from vineyards in the Finger Lakes region of New York State and the Ages of the vineyards. (Data in Vineyards full) 1. Does it appear that vineyards in business longer get highe
> Data from 50 large U.S. cities show the mean January Temperature and the Latitude. Describe what you see in the scatterplot.
> It is commonly believed that people use tips to reward good service. A researcher for the hospitality industry examined tips and ratings of service quality from 2645 dining parties at 21 different restaurants. The correlation between ratings of service and
> The downward trend in smoking you saw in the last exercise is good news for the health of babies, but will it ever stop? 1. Explain why you can’t use the linear model you created in Exercise R2.26 to see when smoking during pregnancy will cease altogethe
> The Child Trends Data Bank monitors issues related to children. The table shows a 50-state average of the percent of expectant mothers who smoked cigarettes during their pregnancies. 1. Create a scatterplot and describe the trend you see. 2. Find the cor
> An electronics website collects data on the size of new HD flat-panel televisions (measuring the diagonal of the screen in inches) to predict the cost (in hundreds of dollars). Which of these is most likely to be the slope of the regression line: 0.03, 0
> In the last exercise, you saw that the linear model had some deficiencies. Let's create a better model. 1. Perhaps the cross-sectional area of a tree would be a better predictor of its age. Since area is measured in square units, try re-expressing the data
> A consumer organization wants to compare gas mileage figures for several models of cars made in the United States with autos manufactured in other countries. The data for a random sample of cars classified as midsize are found in the file MPG 2016. 1. Cr
> One can determine how old a tree is by counting its rings, but that requires either cutting the tree down or extracting a sample from the tree core. Can we estimate the tree age simply from its diameter? A forester measured 27 trees of the same species t
> The ranges inhabited by the Indian gharial crocodile and the Australian saltwater crocodile overlap in Bangladesh. Suppose a very large crocodile skeleton is found there, and we wish to determine the species of the animal. Wildlife scientists have measur
> There is evidence that eruptions of Old Faithful can best be predicted by knowing the duration of the previous eruption. 1. Describe what you see in the scatterplot of Intervals between eruptions vs. Duration of the previous eruption. 2. Write the equati
> Although some women are colorblind, this condition is found primarily in men. Why is it wrong to say there is a strong correlation between Sex and Colorblindness?
> Are good grades in high school associated with family togetherness? A random sample of 142 high school students was asked how many meals per week their families ate together. Their responses produced a mean of 3.78 meals per week, with a standard deviati
> Here is a scatterplot of the residuals from the regression in Exercise R2.18: 1. Does the residual plot suggest that the regression conditions were satisfied? Explain. In the United States, fuel efficiency is usually measured as we did here, in miles per
> Consider a regression to predict the fuel efficiency (as miles per gallon, MPG) of the cars in the Cars data file. Here is one regression model using the Weight and the Drive Ratio: Response variable is: MPG; R-squared = 89.5%; s = 2.186. 1. What is the int
> Can we predict the Horsepower of the engine that manufacturers will put in a car by knowing the Weight of the car? Here are the regression analysis and residual plot: Dependent variable is: Horsepower; R-squared = 84.1%. 1. Write the equation of the regre
> Look again at the correlation table for cars in the previous exercise. 1. Which two variables in the table exhibit the strongest association? 2. Is that strong association necessarily cause-and-effect? Offer at least two explanations why that association
> What factor most explains differences in Fuel Efficiency among cars? Below is a correlation matrix exploring that relationship for the car Weight (1000 lb), Horsepower, Displacement, and number of Cylinders. (Data in Cars) 1. Which factor seems most stro
> A study that examined the health risks of smoking measured the cholesterol levels of people who had smoked for at least 25 years and people of similar ages who had smoked for no more than 5 years and then stopped. Create appropriate graphical displays fo
> One Thursday, researchers gave students enrolled in a section of basic Spanish a set of 50 new vocabulary words to memorize. On Friday, the students took a vocabulary test. When they returned to class the following Monday, they were retested without adva
> Highway planners investigated the relationship between traffic Density (number of automobiles per mile) and the average Speed of the traffic on a moderately large city thoroughfare. The data were collected at the same location at 10 different times over
> A statistics instructor created a linear regression equation to predict students' final exam scores from their midterm exam scores. The regression equation was Fin = 10 + 0.9 Mid. 1. If Susan scored a 70 on the midterm, what did the instructor predict for her
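
The instructor's model can be evaluated directly. A short sketch; `predict_final` and `residual` are illustrative names, and the final-exam score in the residual example is a hypothetical value, not Susan's actual score.

```python
def predict_final(midterm):
    """Predicted final exam score from the model Fin = 10 + 0.9 Mid."""
    return 10 + 0.9 * midterm


def residual(actual_final, midterm):
    """Residual = observed final score minus predicted final score."""
    return actual_final - predict_final(midterm)


print(predict_final(70))  # 73.0
print(residual(80, 70))   # hypothetical final score of 80
```

A positive residual means the student did better on the final than the model predicted; a negative one means worse.
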
> Exercise R2.9 fit a regression model to the relationship between BCI and pH in streams sampled in the Adirondack Mountains. More variables are available. For example, scientists also recorded the water hardness. Here is a new model: Response variable is: BC
> We looked at the data on life expectancy in different countries as they related to the (square root of the) number of doctors and to the (square root of the) number of TVs. Here is a regression using both variables to predict life expectancy: Response varia
> In Chapter 8 we learned about the extraordinary depth and duration of the dives taken by penguins. In that chapter we modeled a re-expression of Heart rate with the Duration (min) of dives. The data also include the depth of each dive. Here
> For the real estate data of the previous exercise, consider the value of the number of bedrooms in modeling the price of a home. The correlation between Price and Bedrooms is 0.116. Here is a regression model: Response variable is: Price R-squared = 14.6
> As a class project, students in a large statistics class collected publicly available information on recent home sales in their hometowns. There are 894 properties. Important predictors of the price of a home are its living area (sq ft) and the number of
> Continue your analysis of the manatee situation from Exercise R2.8. 1. Create a linear model of the association between Manatee Deaths and Powerboat Registrations. 2. Interpret the slope of your model. 3. Interpret the y-intercept of your model. 4. Which
> US News and World Report publishes a special issue on many U.S. colleges and universities. The scatterplots have Student/Faculty Ratio (number of students per faculty member) for the colleges and universities on the y-axes plotted against 4 other variabl
> Engineers at a computer production plant tested two methods for accuracy in drilling holes into a PC board. They tested how fast they could set the drilling machine by running 10 boards at each of two different speeds. To assess the results, they measure
> A start-up company has developed an improved electronic chip for use in laboratory equipment. The company needs to project the manufacturing cost, so it develops a spreadsheet model that takes into account the purchase of production equipment, overhead,
> Most water tanks have a drain plug so that the tank may be emptied when it needs to be moved or repaired. How long it takes a certain size of tank to drain depends on the size of the plug, as shown in the table. Create a model.
> The dataset Movies 06-15 introduced in the Chapter 3 exercises includes the distributor, number of tickets sold, and gross revenue in addition to the MPAA rating and the genre for each of the 10 years 2006 to 2015. Investigate the associations among the
> The Student survey dataset introduced in the Chapter 3 exercises includes responses to 13 questions. Investigate the associations among the variables that you find interesting. Write a short report on what you discover. Be sure to include summary statist
> The Titanic dataset includes more variables than just those discussed in Chapter 2. Others include such variables as the crew job and where each person boarded the ship. Stories, biographies, and pictures can be found on this site: www.encyclopedia-titan
> The Hopkins Forest dataset includes all 24 weather variables reported by the researchers. Many of the variables (e.g., temperature, relative humidity, solar radiation, wind) are reported as daily averages, minima and maxima. Using any of these variables,
> Is the mean amount of salt higher in menu items that contain meat? 1. Compare the sodium content of the meat and non-meat items with displays and summary statistics. 2. By shuffling the variable Meat 1000 times, investigate whether the mean sodium conten
> Here is a stem-and-leaf display showing profits (in $M) for 30 of the 500 largest global corporations (as measured by revenue). The stems are split; each stem represents a span of 5000 ($M), from a profit of 43,000 ($M) to a loss of 7000 ($M). Use the st
> A company that markets build-it-yourself furniture sells a computer desk that is advertised with the claim "less than an hour to assemble." However, through postpurchase surveys the company has learned that only 25% of its customers succeeded in building t
> In an experiment to determine whether seeding clouds with silver iodide increases rainfall, 52 clouds were randomly assigned to be seeded or not. The amount of rain they generated was then measured (in acre-feet). Here are the summary statistics: 1. Whic
> The Bicycle Helmet Safety Institute website includes a report on the number of bicycle fatalities per year in the United States. The table below shows the counts for the years 1994 to 2015. 1. What are the W's for these data? 2. Display the data in a stem-and
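
Part 2 asks for a stem-and-leaf display. The sketch below builds one for non-negative integers, using all but the last digit as the stem and the last digit as the leaf. It is a hypothetical helper; for counts in the hundreds you may prefer to round to the nearest ten first so the display does not have too many stems. The sample values are placeholders, not the fatality counts.

```python
def stem_and_leaf(values):
    """Return display lines for non-negative integers.

    Stem = value // 10, leaf = value % 10. Stems with no leaves are kept so
    gaps in the distribution remain visible.
    """
    leaves = {}
    for v in values:
        leaves.setdefault(v // 10, []).append(v % 10)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in sorted(leaves.get(stem, [])))
        lines.append(f"{stem} | {row}")
    return lines


for line in stem_and_leaf([12, 15, 21, 33, 38, 31]):
    print(line)
```

Whatever stems you choose, include a key with the display (for example, 3 | 1 means 31).
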
> Consider again the Pew Research Center results on age and political party in Exercise R1.33. 1. What is the marginal distribution of party affiliation? 2. Create segmented bar graphs displaying the conditional distribution of party affiliation for each
> According to the Bureau of Labor Statistics, the mean hourly wage for Chief Executives in 2009 was $80.43 and the median hourly wage was $77.27. By contrast, for General and Operations Managers, the mean hourly wage was $53.15 and the median was $44.55.
> The Pew Research Center conducts surveys regularly asking respondents which political party they identify with or lean toward. Among their results is the following table relating preferred political party and age. 1. What percent of people surveyed were
> Horsepower is another measure commonly used to describe auto engines. Here are the summary statistics and histogram displaying horsepowers of the same group of 38 cars discussed in Exercise R1.31. 1. Describe the shape, center, and spread of this distrib
> One measure of the size of an automobile engine is its displacement, the total volume (in liters or cubic inches) of its cylinders. Summary statistics for several models of new cars are shown. These displacements were measured in cubic inches. 1. How man
> Consider again the data on birth order and college majors in Exercise R1.28. 1. What is the marginal distribution of majors? 2. What is the conditional distribution of majors for the oldest children? 3. What is the conditional distribution of majors for
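
A marginal distribution divides each column total by the grand total, while a conditional distribution divides the counts within one row by that row's total. A minimal sketch, assuming the table is stored as a dictionary of rows (birth order) mapping to column counts (major); the labels and counts are hypothetical, not the professor's survey results.

```python
def marginal_distribution(table):
    """Percent of all cases in each column category (e.g., each major)."""
    col_totals = {}
    for row in table.values():
        for col, count in row.items():
            col_totals[col] = col_totals.get(col, 0) + count
    grand_total = sum(col_totals.values())
    return {col: 100 * n / grand_total for col, n in col_totals.items()}


def conditional_distribution(table, row_label):
    """Percent in each column category among cases in one row (e.g., oldest children)."""
    row = table[row_label]
    row_total = sum(row.values())
    return {col: 100 * n / row_total for col, n in row.items()}


# Hypothetical counts: rows are birth order, columns are majors
counts = {
    "Oldest": {"Math/Science": 30, "Other": 70},
    "Youngest": {"Math/Science": 20, "Other": 80},
}
print(marginal_distribution(counts))
print(conditional_distribution(counts, "Oldest"))
```
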
> Researchers for the Herbal Medicine Council collected information on people's experiences with a new herbal remedy for colds. They went to a store that sold natural health products. There they asked 100 customers whether they had taken the cold remedy and,
> Is your birth order related to your choice of major? A statistics professor at a large university polled his students to find out what their majors were and what position they held in the family birth order. The results are summarized in the table. 1. Wh
> Here are the numbers of pieces of mail received at a school office on each of 36 days. 1. Plot these data. 2. Find appropriate summary statistics. 3. Write a brief description of the school mail deliveries. 4. What percent of the days actually lie within one sta
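
Part 4 compares the actual share of days within one standard deviation of the mean with the roughly 68% that the 68-95-99.7 Rule predicts for a Normal model. A short sketch using the sample standard deviation; the function name and the example data are hypothetical, not the 36 mail counts.

```python
from statistics import mean, stdev


def percent_within_k_sd(data, k=1):
    """Percent of observations within k sample standard deviations of the mean."""
    m, s = mean(data), stdev(data)
    inside = sum(1 for x in data if m - k * s <= x <= m + k * s)
    return 100 * inside / len(data)


sample = [1, 2, 3, 4, 5]  # hypothetical values
print(percent_within_k_sd(sample))       # share within 1 SD
print(percent_within_k_sd(sample, k=2))  # share within 2 SDs
```
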
> A class of fourth graders takes a diagnostic reading test, and the scores are reported by reading grade level. The 5-number summaries for the 14 boys and 11 girls are shown: 1. Which group had the highest score? 2. Which group had the greater range? 3. W
> Is it a good idea to listen to music when studying for a big test? In a study conducted by some statistics students, 62 people were randomly assigned to listen to rap music, Mozart, or no music while attempting to memorize objects pictured on a page. The
> Avoiding an accident when driving can depend on reaction time. That time, measured from the moment the driver first sees the danger until he or she steps on the brake pedal, is thought to follow a Normal model with a mean of 1.5 seconds and a standard de
> Babe Ruth was the first great slugger in baseball. His record of 60 home runs in one season held for 34 years until Roger Maris hit 61 in 1961. Mark McGwire (with the aid of steroids) set a new standard of 70 in 1998. Listed below are the home run totals
> A study in South Africa focusing on the impact of health insurance identified 1590 children at birth and then sought to conduct follow-up health studies 5 years later. Only 416 of the original group participated in the 5-year follow-up study. This made r
> The times of skaters in the qualifying heats for the women's short track race at the 2018 Olympics in PyeongChang are given in the table below. 1. The mean finishing time was 45.075 seconds, with a standard deviation of 4.50 seconds. If the Normal model is
> Is the Statue of Liberty's nose too long? Her nose measures 4 feet 6 inches, but she is a large statue, after all. Her arm is 42 feet long. That means her arm is 42/4.5 = 9.3 times as long as her nose. Is that a reasonable ratio? Shown in the ta
> The National Highway Traffic Safety Administration reported that there were 3206 fatal accidents involving drivers between the ages of 15 and 19 years old the previous year, of which 65.5% involved male drivers. Of the male drivers, 18.4% involved drinki
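
The given percentages can be turned into approximate counts by multiplying through. This sketch only performs that conversion; the remainder of the exercise is not shown here, and the variable names are illustrative.

```python
total_fatal = 3206           # fatal accidents involving drivers aged 15 to 19
male_share = 0.655           # 65.5% of them involved male drivers
male_drinking_share = 0.184  # 18.4% of the male-driver accidents involved drinking

male = total_fatal * male_share
male_drinking = male * male_drinking_share
female = total_fatal - male

print(round(male), round(male_drinking), round(female))  # 2100 386 1106
```
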
> Does the duration of an eruption have an effect on the length of time that elapses before the next eruption? 1. The histogram below shows the duration (in minutes) of those 222 eruptions. Describe this distribution. 2. Explain why it is not appropriate t
> It is a common belief that Yellowstone's most famous geyser erupts once an hour at very predictable intervals. The histogram below shows the time gaps (in minutes) between 222 successive eruptions. Describe this distribution.
> Average daily temperatures in January and July for 60 large U.S. cities are graphed in the histograms below. (Data in City climate) 1. What aspect of these histograms makes it difficult to compare the distributions? 2. What differences do you see between
> The Framingham Heart Study recorded the cholesterol levels of more than 1400 participants. (Data in Framingham) Here is an ogive of the distribution of these cholesterol measures. (An ogive shows the percentage of cases at or below a certain value.) Cons
> Which of these scatterplots show 1. little or no association? 2. a negative association? 3. a linear association? 4. a moderately strong association? 5. a very strong association?
> The dataset from England and Wales also notes for each town whether it was south or north of Derby. Here are some summary statistics and a comparative boxplot for the two regions. 1. What is the overall mean mortality rate for the two regions? 2. Do you
> In an investigation of environmental causes of disease, data were collected on the annual mortality rate (deaths per 100,000) for males in 61 large towns in England and Wales. In addition, the water hardness was recorded as the calcium concentration (par
> Progressive Insurance asked customers who had been involved in auto accidents how far they were from home when the accident happened. The data are summarized in the table. 1. Create an appropriate graph of these data. 2. Do these data indicate that drivi
> You pick a card from a standard deck and record its denomination (7, say) and its suit (maybe spades). 1. Is the variable suit categorical or quantitative? 2. Name a game you might be playing for which you would consider the variable denomination to be c
> A study by the Pew Internet & American Life Project found that 78% of U.S. residents over 16 years old read a book in the past 12 months. They also found that 21% had read an e-book using a reader or computer during that period. A newspaper reporting on
> One Thursday, researchers gave students enrolled in a section of basic Spanish a set of 50 new vocabulary words to memorize. On Friday, the students took a vocabulary test. When they returned to class the following Monday, they were retested without advan
> As part of the course work, a class at an upstate NY college collects data on streams each year. Students record a number of biological, chemical, and physical variables, including the stream name, the substrate of the stream (limestone (L), shale (S), o
> A credit card bank is investigating the incidence of fraudulent card use. The bank suspects that the type of product bought may provide clues to the fraud. To examine this situation, the bank looks at the North American Industry Classification System (NA
> Based on long-term investigation, researchers have suggested that the acidity (pH) of rainfall in the Shenandoah Mountains can be described by the Normal model N(4.9,0.6). 1. Draw and carefully label the model. 2. What percent of storms produce rainfall
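
Python's standard `statistics.NormalDist` can evaluate the N(4.9, 0.6) model. The question's cutoff is not shown here, so a pH of 5.0 is used purely as an illustrative value; the lowest-10% cutoff is likewise only an example.

```python
from statistics import NormalDist

ph = NormalDist(mu=4.9, sigma=0.6)  # N(4.9, 0.6) model for rainfall pH

# Percent of storms with pH below an illustrative cutoff of 5.0
print(f"{100 * ph.cdf(5.0):.1f}% below pH 5.0")
# Percent within one SD of the mean (about 68%, as the 68-95-99.7 Rule says)
print(f"{100 * (ph.cdf(5.5) - ph.cdf(4.3)):.1f}% between pH 4.3 and 5.5")
# pH value that cuts off the lowest 10% of storms
print(round(ph.inv_cdf(0.10), 2))
```

For the sketch in part 1, center the curve at 4.9 and mark 4.3 and 5.5, 3.7 and 6.1, and 3.1 and 6.7 at one, two, and three standard deviations.
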
> Public relations staff members at State U phoned 850 local residents. After identifying themselves, the callers asked the survey participants their ages, whether they had attended college, and whether they had a favorable opinion of the university. The o
> How fast do horses run? Kentucky Derby winners run well over 30 miles per hour, as shown in this graph. The graph shows the percentage of Derby winners that have run slower than each given speed. Note that few have won running less than 33 miles per hour
> Clarksburg Bakery is trying to predict how many loaves to bake. In the past 100 days, they have sold between 95 and 140 loaves per day. Here is a histogram of the number of loaves they sold for the past 100 days. 1. Describe the distribution. 2. Which sh
> Facebook uploads more than 350 million photos every day onto its servers. For this collection, describe the Who and the What.
> The National Center for Health Statistics (NCHS) conducts an extensive survey consisting of an interview and medical examination with a representative sample of about 5000 people a year. The interview includes demographic, socioeconomic, dietary, and oth
> The website www.nobelprize.org allows you to look up all the Nobel prizes awarded in any year. The data are not listed in a table. Rather you drag a slider to the year and see a list of the awardees for that year. Describe the Who in this scenario.
> Sports announcers love to quote statistics. During the Super Bowl, they particularly love to announce when a record has been broken. They might have a list of all Super Bowl games, along with the scores of each team, total scores for the two teams, margi
> Satellites send back nearly continuous data on the earth's land masses, oceans, and atmosphere from space. How might researchers use this information in both the short and long terms to help study changes in the earth's climate?
> Sensors in parking lots are able to detect and communicate when spaces are filled in a large covered parking garage next to an urban shopping mall. How might the owners of the parking garage use this information both to attract customers and to help the
> Online retailers such as Amazon.com keep data on products that customers buy, and even products they look at. What does Amazon hope to gain from such information?
> Many grocery store chains offer customers a card they can scan when they check out and offer discounts to people who do so. To get the card, customers must give information, including a mailing address and e-mail address. The actual purpose is not to rew
> Here is the ANOVA table for the cookie experiment of Exercise 2 along with an interaction plot. What does the interaction term say about the cookie recipes?
> Here are the summary statistics for Verbal SAT scores for a high school graduating class: 1. Create side-by-side boxplots comparing the scores of boys and girls as best you can from the information given. 2. Write a brief report on these results. Be sure
> Here is an ANOVA table with an interaction term and the corresponding interaction plot for the TV watching data of Exercise 1 . What does the interaction term mean here?
> The student performing the chocolate chip cookie experiment of Exercise 2 planned to analyze his results with an Analysis of Variance on two factors. Here are some displays. Do you think the assumptions for ANOVA are satisfied?
> The TV watching study of Exercise 1 was collected as a survey of students at a small college. Do the assumptions of ANOVA appear to be met? Here are some displays to help in your decision: