
Question: Joe wants to impress his boss. He


Joe wants to impress his boss. He builds a regression model to predict sales that has 20 predictors and an R² of 80%. Sally builds a competing model with only 5 predictors, but an R² of only 78%. Which model is likely to be most useful for understanding the drivers of sales? How could the boss tell? Explain.
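
One way the boss could compare the two models is with adjusted R², which penalizes extra predictors: adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of cases and k the number of predictors. The sketch below applies that formula to both models. The question does not give the sample size, so the values of n used here are purely hypothetical, and the function name is an illustration rather than part of any textbook solution.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for a model with k predictors fit to n cases."""
    if n - k - 1 <= 0:
        raise ValueError("need more cases than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


if __name__ == "__main__":
    # Hypothetical sample sizes: the question does not state n.
    for n in (30, 50, 200):
        joe = adjusted_r2(0.80, n, k=20)    # Joe: 20 predictors, R^2 = 80%
        sally = adjusted_r2(0.78, n, k=5)   # Sally: 5 predictors, R^2 = 78%
        print(f"n={n:4d}  Joe adj R2={joe:.3f}  Sally adj R2={sally:.3f}")
```

With a small sample, the penalty for 20 predictors is heavy and Sally's model is likely to come out ahead; with a large sample the two adjusted values are close. The boss could also look at the t-ratios and P-values of the individual coefficients to see which predictors actually contribute.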


> A study in South Africa focusing on the impact of health insurance identified 1590 children at birth and then sought to conduct follow-up health studies 5 years later. Only 416 of the original group participated in the 5-year follow-up study. This made r

> The times of skaters in the qualifying heats for the women's short track race at the 2018 Olympics in PyeongChang are given in the table below. 1. The mean finishing time was 45.075 seconds, with a standard deviation of 4.50 seconds. If the Normal model is

> Is the Statue of Liberty's nose too long? Her nose measures 4′6″, but she is a large statue, after all. Her arm is 42 feet long. That means her arm is 42/4.5=9.3 times as long as her nose. Is that a reasonable ratio? Shown in the ta

> The National Highway Traffic Safety Administration reported that there were 3206 fatal accidents involving drivers between the ages of 15 and 19 years old the previous year, of which 65.5% involved male drivers. Of the male drivers, 18.4% involved drinki

> Does the duration of an eruption have an effect on the length of time that elapses before the next eruption? 1. The histogram below shows the duration (in minutes) of those 222 eruptions. Describe this distribution. 2. Explain why it is not appropriate t

> It is a common belief that Yellowstone's most famous geyser erupts once an hour at very predictable intervals. The histogram below shows the time gaps (in minutes) between 222 successive eruptions. Describe this distribution.

> Average daily temperatures in January and July for 60 large U.S. cities are graphed in the histograms below. (Data in City climate) 1. What aspect of these histograms makes it difficult to compare the distributions? 2. What differences do you see between

> The Framingham Heart Study recorded the cholesterol levels of more than 1400 participants. (Data in Framingham) Here is an ogive of the distribution of these cholesterol measures. (An ogive shows the percentage of cases at or below a certain value.) Cons

> Which of these scatterplots show 1. little or no association? 2. a negative association? 3. a linear association? 4. a moderately strong association? 5. a very strong association?

> The dataset from England and Wales also notes for each town whether it was south or north of Derby. Here are some summary statistics and a comparative boxplot for the two regions. 1. What is the overall mean mortality rate for the two regions? 2. Do you

> In an investigation of environmental causes of disease, data were collected on the annual mortality rate (deaths per 100,000) for males in 61 large towns in England and Wales. In addition, the water hardness was recorded as the calcium concentration (par

> Progressive Insurance asked customers who had been involved in auto accidents how far they were from home when the accident happened. The data are summarized in the table. 1. Create an appropriate graph of these data. 2. Do these data indicate that drivi

> You pick a card from a standard deck and record its denomination (7, say) and its suit (maybe spades). 1. Is the variable suit categorical or quantitative? 2. Name a game you might be playing for which you would consider the variable denomination to be c

> A study by the Pew Internet & American Life Project found that 78% of U.S. residents over 16 years old read a book in the past 12 months. They also found that 21% had read an e-book using a reader or computer during that period. A newspaper reporting on

> One Thursday, researchers gave students enrolled in a section of basic Spanish a set of 50 new vocabulary words to memorize. On Friday, the students took a vocabulary test. When they returned to class the following Monday, they were retested without advan

> As part of the course work, a class at an upstate NY college collects data on streams each year. Students record a number of biological, chemical, and physical variables, including the stream name, the substrate of the stream (limestone (L), shale (S), o

> A credit card bank is investigating the incidence of fraudulent card use. The bank suspects that the type of product bought may provide clues to the fraud. To examine this situation, the bank looks at the North American Industry Classification System (NA

> Based on long-term investigation, researchers have suggested that the acidity (pH) of rainfall in the Shenandoah Mountains can be described by the Normal model N(4.9,0.6). 1. Draw and carefully label the model. 2. What percent of storms produce rainfall

> Public relations staff members at State U phoned 850 local residents. After identifying themselves, the callers asked the survey participants their ages, whether they had attended college, and whether they had a favorable opinion of the university. The o

> How fast do horses run? Kentucky Derby winners run well over 30 miles per hour, as shown in this graph. The graph shows the percentage of Derby winners that have run slower than each given speed. Note that few have won running less than 33 miles per hour

> Clarksburg Bakery is trying to predict how many loaves to bake. In the past 100 days, they have sold between 95 and 140 loaves per day. Here is a histogram of the number of loaves they sold for the past 100 days. 1. Describe the distribution. 2. Which sh

> Facebook uploads more than 350 million photos every day onto its servers. For this collection, describe the Who and the What.

> The National Center for Health Statistics (NCHS) conducts an extensive survey consisting of an interview and medical examination with a representative sample of about 5000 people a year. The interview includes demographic, socioeconomic, dietary, and oth

> The website www.nobelprize.org allows you to look up all the Nobel prizes awarded in any year. The data are not listed in a table. Rather you drag a slider to the year and see a list of the awardees for that year. Describe the Who in this scenario.

> Sports announcers love to quote statistics. During the Super Bowl, they particularly love to announce when a record has been broken. They might have a list of all Super Bowl games, along with the scores of each team, total scores for the two teams, margi

> Satellites send back nearly continuous data on the earth's land masses, oceans, and atmosphere from space. How might researchers use this information in both the short and long terms to help study changes in the earth's climate?

> Sensors in parking lots are able to detect and communicate when spaces are filled in a large covered parking garage next to an urban shopping mall. How might the owners of the parking garage use this information both to attract customers and to help the

> Online retailers such as Amazon.com keep data on products that customers buy, and even products they look at. What does Amazon hope to gain from such information?

> Many grocery store chains offer customers a card they can scan when they check out and offer discounts to people who do so. To get the card, customers must give information, including a mailing address and e-mail address. The actual purpose is not to rew

> Here is the ANOVA table for the cookie experiment of Exercise 2 along with an interaction plot. What does the interaction term say about the cookie recipes?

> Here are the summary statistics for Verbal SAT scores for a high school graduating class: 1. Create side-by-side boxplots comparing the scores of boys and girls as best you can from the information given. 2. Write a brief report on these results. Be sure

> Here is an ANOVA table with an interaction term and the corresponding interaction plot for the TV watching data of Exercise 1. What does the interaction term mean here?

> The student performing the chocolate chip cookie experiment of Exercise 2 planned to analyze his results with an Analysis of Variance on two factors. Here are some displays. Do you think the assumptions for ANOVA are satisfied?

> The TV watching study of Exercise 1 was collected as a survey of students at a small college. Do the assumptions of ANOVA appear to be met? Here are some displays to help in your decision:

> A student performed an experiment to compare chocolate chip cookie recipes. He baked batches of cookies with different amounts of Sugar: 0.5, 0.375, and 0.25 cups, and with different kinds of Chips: milk, semisweet, and dark chocolate. Cookie quality was

> In the previous chapter we considered TV watching by male and female student athletes. In that example, we categorized the students into four groups, but now we have seen that these data could be analyzed with two factors, Sex and Athlete. Write the ANOV

> A bank is studying the time that it takes 6 of its tellers to serve an average customer. Customers line up in the queue and then go to the next available teller. Here is a boxplot of the last 200 customers and the times it took each teller: 1. What are t

> Here are case prices (in dollars) of wines produced by wineries along three of the Finger Lakes. 1. What null and alternative hypotheses would you test for these data? Talk about prices and location, not symbols. 2. Do the conditions for an ANOVA seem to

> Here are boxplots that show the relationship between the number of cylinders a car engine has and the car's fuel economy for a sample of cars. 1. State the null and alternative hypotheses that you might consider for these data. 2. Do the conditions for an

> A student performed an experiment with three different grips to see what effect it might have on the distance of a backhanded Frisbee throw. She tried it with her normal grip, with one finger out, and with the Frisbee inverted. She measured in paces how

> To shorten the time it takes him to make his favorite pizza, a student designed an experiment to test the effect of sugar and milk on the activation times for baking yeast. Specifically, he tested four different recipes and measured how many seconds it t

> A student study of the effects of caffeine asked volunteers to take a memory test 2 hours after drinking soda. Some drank caffeine-free cola, some drank regular cola (with caffeine), and others drank a mixture of the two (getting a half-dose of caffeine)

> A student interested in improving her dart-throwing technique designs an experiment to test 4 different stances to see whether they affect her accuracy. After warming up for several minutes, she randomizes the order of the 4 stances, throws a dart at a t

> A student runs an experiment to study the effect of three different mufflers on gas mileage. He devises a system so that his Jeep Wagoneer uses gasoline from a one-liter container. He tests each muffler 8 times, carefully recording the number of miles he

> A figure skater tried various approaches to her Salchow jump in a designed experiment using 5 different places for her focus (arms, free leg, midsection, takeoff leg, and free). She tried each jump 6 times in random order, using two of her skating partne

> An intern from the marketing department at the Holes R Us online piercing salon has recently finished a study of the company's 500 customers. He wanted to know whether the mean ZIP code of customers purchasing different products varied according to the las

> A survey of 1021 school-age children was conducted by randomly selecting children from several large urban elementary schools. Two of the questions concerned eye and hair color. In the survey, the following codes were used: The students analyzing the dat

> A researcher investigated four different word lists for use in hearing assessment. She wanted to know whether the lists were equally difficult to understand in the presence of a noisy background. To find out, she tested 96 subjects with normal hearing ra

> A student runs an experiment to test four different popcorn brands, recording the number of kernels left un-popped. She pops measured batches of each brand 4 times, using the same popcorn popper and randomizing the order of the brands. After collecting h

> In a regression to predict compensation of employees in a large firm, the predictors in the regression were Years with the Firm, Age, and Years of Experience. The coefficient of Age is negative and statistically significantly different from zero. Does th

> For each of the following cases, would your primary concern about them be that they had a large residual, large leverage, or likely large influence on the regression model? 1. In a regression to predict freshman grade point averages as part of the admiss

> Here are summary statistics for the sizes (in acres) of a collection of vineyards in the Finger Lakes region of New York State: Suppose you didn’t have access to the data. Answer the following questions from the summary statistics alone

> For each of the following cases, would your primary concern about them be that they had a large residual, large leverage, or likely large influence on the regression model? Explain your thinking. 1. In a regression to predict the construction cost of rol

> 1. Look up additional nutrition information about the BK items and combine a file holding that information with the existing BK data. 2. Define the new variable Fat/Carb as the ratio of Fat grams to Carbohydrate grams in each BK item

> 1. In the Burger King items data, use one of the variables to separate the items containing meat from the items that do not contain meat, and analyze those separately. 2. Combine data about McDonald's menu items using the same variables with the data from

> Use the information in Exercise 1 to test the hypotheses H0: β1 = 0 vs. HA: β1 ≠ 0. What do you conclude about the relationship between earnings and SAT scores?

> Shoot to score, another one Returning to the results of Exercise 2, write a sentence to explain the meaning of the standard error of the slope of the regression line, SE(b1)=0.0125, and the corresponding P-value.

> Continuing with the regression of Exercise 1, write a sentence that explains the meaning of the standard error of the slope of the regression line, SE(b1)=1.545, and the corresponding P-value.

> Using the regression output from Exercise 2, identify the residual standard deviation and explain its meaning with a sentence in context.

> Using the regression output in Exercise 1, identify the residual standard deviation and explain what it means in the context of the problem.

> Discuss the assumptions and conditions necessary for proceeding with the regression analysis in Exercise 2. Do you think the conditions are satisfied?

> Discuss the assumptions and conditions necessary for proceeding with the regression analysis in Exercise 1. Do you think the conditions are satisfied?

> A survey of major universities asked what percentage of incoming freshmen usually graduate on time in 4 years. Use the summary statistics given to answer the questions that follow. 1. Would you describe this distribution as symmetric or skewed? Explain.

> A college hockey coach collected data from the 2016–2017 National Hockey League season. He hopes to convince his players that the number of shots taken has an effect on the number of goals scored. The coa

> The coach we’ve been following wants to predict how many goals each of his players will score this season. Explain why a model like the ones we’ve made won’t be very successful at doing that.

> Naturally, you would like to know what you are going to earn in the next few years. Explain why a regression model such as the ones we have found won’t do a very good job of such a prediction. (Sorry.)

> Continuing from Exercise 14, the coach responds to the players by claiming that shooting accuracy is more important than time on the ice. He adds Shoot% (% of shots on goal) to the model. Response variable is: Goals R squared=95.7% s=0.8850 with 65 − 4 = 61 d

> A second predictor in Exercise 13 improved the regression model of Exercise 1, so let's try a third. Here's a model with average ACT score of the entering class included: Response variable is: Earn R squared=36.5% s=5372 with 687 − 4 = 683 degrees of freedom 1. T

> The players on the team in Exercise 2 point out to the coach that they can’t shoot if they are not on the ice. They add the variable TimeOnIce/Game (TOI/G) (in minutes per game) to the regression: (Reminder: if you are using the full da

> Continuing with the data from Exercise 1, here's a regression with the percent of students who receive merit-based financial aid included in the model: Response variable is: Earn R squared=35.5% 1. Write the regression model. 2. What is the interpretation

> The coach in Exercise 2 found a 95% confidence interval for the slope of his regression line. Recall that he is trying to understand how the number of goals scored is related to shots taken. Interpret with a sentence the meaning of the interval 0.099267±

> Construct a 95% confidence interval for the slope of the regression line in Exercise 1. Interpret the meaning of the interval. Be sure to state it in the context of the data and the question about the data.

> What can the hockey coach in Exercise 2 conclude about shooting and scoring goals from the fact that the P-value < 0.0001 for the slope of the regression line? Write a sentence in context.

> A survey of 1021 school-age children was conducted by randomly selecting children from several large urban elementary schools. Two of the questions concerned eye and hair color. In the survey, the following codes were used: The statistics students analyz

> Does attending college pay back the investment? What factors predict higher earnings for graduates? Money magazine surveyed graduates, asking about their point of view of the colleges they had attended (Money's Best Colleges at new.time.com/money/best-coll

> Homer's Iliad is an epic poem, compiled around 800 BCE, that describes several weeks of the last year of the 10-year siege of Troy (Ilion) by the Achaeans. The story centers on the rage of the great warrior Achilles. But it includes many details of inj

> For the data in Exercise 2, 1. Compute the standardized residual for each type of card. 2. Are any of these particularly large? (Compared to what?) 3. What does the answer to part 2 say about this new group of customers?

> For the data in Exercise 1, 1. Compute the standardized residual for each season. 2. Are any of these particularly large? (Compared to what?) 3. Why should you have anticipated the answer to part 2?

> A market researcher working for the bank in Exercise 2 wants to know if the distribution of applications by card is the same for the past three mailings. She takes a random sample of 200 from each mailing and counts the number applying for Silver, Gold,

> An analyst at a local bank wonders if the age distribution of customers coming for service at his branch in town is the same as at the branch located near the mall. He selects 100 transactions at random from each branch and researches the age information

> For the customers in Exercise 2, 1. If the customers apply for the three cards according to the historical proportions, about how big, on average, would you expect the χ² statistic to be (what is the mean of the χ² distribution)? 2. Does the statistic

> For the births in Exercise 1, 1. If there is no seasonal effect, about how big, on average, would you expect the χ² statistic to be (what is the mean of the χ² distribution)? 2. Does the statistic you computed in Exercise 1 seem large in comparison to

> At a major credit card bank, the percentages of people who historically apply for the Silver, Gold, and Platinum cards are 60%, 30%, and 10%, respectively. In a recent sample of customers responding to a promotion, of 200 customers, 110 applied for Silve

> The Iliad also reports the cause of many injuries. Here is a table summarizing those reports for the 152 injuries for which the Iliad provides that information. Is there an association? 1. Under the null hypothesis, what are the expected values? 2. Compu

> Three statistics classes all took the same test. Histograms and boxplots of the scores for each class are shown below. Match each class with the corresponding boxplot.

> If there is no seasonal effect on human births, we would expect equal numbers of children to be born in each season (winter, spring, summer, and fall). A student takes a census of her statistics class and finds that of the 120 students in the class, 25 w

> Consider the weights from Exercise 4. The side-by-side boxplots below show little difference between the two groups. Should this be sufficient to draw a conclusion about the accuracy of the weigh-in-motion scale?

> Thinking about the data on fuel efficiency in Exercise 3, why is the blocking accomplished by a matched pairs analysis particularly important for a sample that has both cars and trucks?

> Find a 98% confidence interval of the weight differences in Exercise 4. Interpret this interval in context.

> In Exercise 3, after deleting an outlying value of –27, the mean difference in fuel efficiencies for the 632 vehicles was 7.37 mpg with a standard deviation of 2.52 mpg. Find a 95% confidence interval for this difference and interpret it in context.

> The calibration test for a new weight-in-motion method of weighing trucks was introduced in Chapter 6, exercise 52 . Is this method consistent with the traditional method of static weighing? Are the conditions for matched pairs inference satisfied? Weigh

> We have data on the city and highway fuel efficiency of 633 cars and trucks. 1. Would it be appropriate to use paired t methods to compare the city fuel efficiency of the cars and the trucks? 2. Would it be appropriate to use paired t methods to compare

> Which of the following scenarios should be analyzed as paired data? 1. Spouses are asked about the number of hours of sleep they get each night. We want to see if husbands get more sleep than wives. 2. 50 insomnia patients are given a placebo and 50 are

> Which of the following scenarios should be analyzed as paired data? 1. Students take an MCAT prep course. Their before and after scores are compared. 2. 20 male and 20 female students in class take a midterm. We compare their scores. 3. A group of colleg

> The researchers from Exercise 1 want to test if the proportions of foreign born are the same in the United States and Canada. What is the appropriate standard error to use for the hypothesis test? 1. What is the difference in the proportions of foreign b

> Ozone levels (in parts per billion, ppb) were recorded at sites in New Jersey monthly between 1926 and 1971. Here are boxplots of the data for each month (over the 46 years), lined up in order (January=1): 1. In what month was the highest ozone level eve

> If the information in Exercise 2 is to be used to make inferences about all people who work at non-profits and for-profit companies, what conditions must be met before proceeding? List them and explain if they are met.

> If the information in Exercise 1 is to be used to make inferences about the proportion of all Canadians and all U.S. citizens born in other countries, what conditions must be met before proceeding? Are they met? Explain.

> For the interval given in Exercise 4, explain what 95% confidence means.

> For the interval given in Exercise 3, explain what 95% confidence means.
