The dataset below shows midterm and homework scores from an introductory statistics course.
1. Fit a model predicting the second midterm score from the first.
2. Comment on the model you found, including a discussion of the assumptions and conditions for regression. Is the coefficient for the slope statistically significant?
3. A student comments that because the P-value for the slope is very small, Midterm 2 is very well predicted from Midterm 1. So, he reasons, next term the professor can give just one midterm. What do you think?
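As a minimal sketch of parts 1 through 3, the code below fits a least-squares line and reports the slope, its standard error, the t-statistic, R-squared, and the residual standard deviation s. The scores are simulated for illustration only (the course dataset is not reproduced here), and the function name `fit_simple_regression` and the simulation parameters are assumptions. The point it illustrates is that a large t-statistic (and so a small P-value) for the slope can coexist with a modest R-squared and a large s, in which case individual Midterm 2 scores are still predicted imprecisely.

```python
import math
import random


def fit_simple_regression(x, y):
    """Ordinary least-squares fit of y on x, with slope inference summaries.

    Assumes at least 3 points and that both x and y vary.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares and residual standard deviation (n - 2 df).
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    s = math.sqrt(sse / df)

    # Standard error of the slope and the t-statistic for H0: slope = 0.
    se_slope = s / math.sqrt(sxx)
    t_stat = slope / se_slope if se_slope > 0 else math.inf

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1 - sse / syy,
        "s": s,
        "se_slope": se_slope,
        "t_stat": t_stat,
        "df": df,
    }


# Hypothetical scores, simulated for illustration; not the course data.
random.seed(1)
midterm1 = [random.gauss(75, 10) for _ in range(60)]
midterm2 = [30 + 0.6 * m1 + random.gauss(0, 10) for m1 in midterm1]

fit = fit_simple_regression(midterm1, midterm2)
print(f"Midterm2-hat = {fit['intercept']:.2f} + {fit['slope']:.3f} * Midterm1")
print(f"t = {fit['t_stat']:.2f} on {fit['df']} df, "
      f"R-squared = {fit['r_squared']:.1%}, s = {fit['s']:.2f}")
```

To judge significance, compare the t-statistic with a t distribution on n - 2 degrees of freedom. To judge how well Midterm 2 is predicted, look at R-squared and s instead: s is roughly the typical size of a prediction error in score points.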
> A student experiment was run to test the performance of 4 brands of batteries under 2 different Environments (room temperature and cold). For each of the 8 treatments, 2 batteries of a particular brand were put into a flashlight. The flashlight was then
> Refer back to the experiment in Exercise 22. Instead of Total counts, redo the analysis using log(Total counts) as the response. Do your conclusions change? How? Are the assumptions of the model better satisfied?
> Refer back to the experiment in Exercise 21. Instead of mpg, redo the analysis using log(mpg) as the response. Do your conclusions change? How? Are the assumptions of the model better satisfied?
> A gas chromatograph is an instrument that measures the amounts of various compounds in a sample by separating its constituents. Because different components are flushed through the system at different rates, chromatographers are able to both measure and
> An experiment to test a new gasoline additive, Gasplus, was performed on three different cars: a sports car, a minivan, and a hybrid. Each car was tested with both Gasplus and regular gas on 10 different occasions and their gas mileage was recorded. Here
> Building on the cup experiment of the Chapter 4 Step-By-Step, a student selects one type of container and designs an experiment to see whether the type of Liquid stored and the outside Environment affect the ability of a cup to maintain temperature. He r
> The students running the sprouts experiment (Exercise 12) also kept track of the number of beans sprouted (out of 40) for each of the 36 dishes. Here are the partial boxplots of Sprouts plotted against Salinity and Temperature: 1. State the hypotheses a
> For his final project, Jonathan examined the effects of two factors on how well stains are removed when washing clothes. On each of 16 new white handkerchiefs, he spread a teaspoon of dirty motor oil (obtained from a local garage). He chose 4 Temperature
> A student performed an experiment to see if her favorite sneakers and the time of day might affect her free throw percentage. She tried shooting with and without her favorite sneakers and in the early morning and at night. For each treatment combination,
> Refer back to Exercise 14. Perform your own analysis of the data to see if eating fish and contracting prostate cancer are related.
> How have movies changed during the decade from 2006 to 2015? Here is a contingency table showing the proportion of movies with each of the MPAA categories in each year: 1. Are these column percents or row percents? How can you tell? 2. Does it look like
> Refer back to Exercise 13. Perform your own analysis of the data to see if baldness and heart disease are related. Do your conclusions support the claim that baldness is a cause of heart disease? Explain.
> The Chapter 3 Step-By-Step looked at a Swedish study that asked 6272 men how much fish they ate and whether or not they had prostate cancer. (Data in Fish diet) Here are summary counts: 1. Comment on her analysis. What problems, if any, do you find with
> A retrospective study examined the link between baldness and the incidence of heart disease. In the study, 1435 middle-aged men were selected at random and examined to see whether they showed signs of Heart Disease (or not) and what amount of Baldness th
> An experiment on mung beans was performed to investigate the environmental effects of salinity and water temperature on sprouting. Forty beans were randomly allocated to each of 36 petri dishes that were subject to one of four levels of Salinity (0, 4, 8
> The National Highway Transportation Safety Administration runs crash tests in which stock automobiles are crashed into a wall at 35 mph with dummies in both the passenger and the driver seats. The THOR Alpha crash dummy is capable of recording 134 channe
> Refer to the experiment in Exercise 8. After analyzing his data, the student reports that the F-ratio for Tire Pressure is 4.29 with a P-value of 0.030, the F-ratio for Acceleration is 2.35 with a P-value of 0.143, and the F-ratio for the Interaction eff
> A pharmaceutical company tested three formulations of a pain relief medicine for migraine headache sufferers. For the experiment, 27 volunteers were selected and 9 were randomly assigned to one of three drug formulations. The subjects were instructed to
> To see how much of a difference time of day made on the speed at which he could download files, a college sophomore performed an experiment. He placed a file on a remote server and then proceeded to download it at three different time periods of the day.
> We also have data on the protein content of the cereals in Exercise 19 by their shelf number. Here are the boxplot and ANOVA table: 1. What are the null and alternative hypotheses? 2. What does the ANOVA table say about the null hypothesis? (Be sure to r
> Supermarkets often place similar types of cereal on the same supermarket shelf. We have data on the shelf as well as the sugar, sodium, and calorie content of 77 cereals. Does sugar content vary by shelf? At the top of the next column is a boxplot and an
> Students in an Intro Stats course were asked to describe their politics as Liberal, Moderate, or Conservative. Here are the results: 1. What percent of the class is male? 2. What percent of the class considers themselves to be Conservative? 3. What perce
> A biology student is studying the effect of 10 different fertilizers on the growth of mung bean sprouts. She sprouts 12 beans in each of 10 different petri dishes, and adds the same amount of fertilizer to each dish. After one week she measures the heigh
> A school district superintendent wants to test a new method of teaching arithmetic in the fourth grade at his 15 schools. He plans to select 8 students from each school to take part in the experiment, but to make sure they are roughly of the same ability
> In a statement to a Senate Public Works Committee, a senior executive of Texaco, Inc., cited a study on the effectiveness of auto filters on reducing noise. Because of concerns about performance, two types of filters were studied, a standard silencer and
> A student wants to investigate the effects of real vs. substitute eggs on his favorite brownie recipe. He enlists the help of 10 friends and asks them to rank each of 8 batches on a scale from 1 to 10. Four of the batches were made with real eggs, four w
> Particulate matter is a serious form of air pollution often arising from industrial production. One way to reduce the pollution is to put a filter, or scrubber, at the end of the smokestack to trap the particulates. An experiment to determine which smoke
> An experiment to determine the effect of several methods of preparing cultures for use in commercial yogurt was conducted by a food science research group. Three batches of yogurt were prepared using each of three methods: traditional, ultrafiltration, a
> A regression model for data on breakfast cereals originally looked like this: Dependent variable is: Calories R-squared = 84.5% R-squared (adjusted) = 83.4% s = 7.947 with 77 - 6 = 71 degrees of freedom Let's take a closer look at the coefficien
> HIV One ongoing health problem in the part of Africa encompassing the outlying countries for the regression model of Exercise 17 is HIV/AIDS. Could that explain these outliers? Here's another model, now with the logarithm of the HIV incidence included as a
> Here's the residual plot corresponding to the regression model of Exercise 18: The extreme case this time is Weight Watchers Pepperoni (makes sense, doesn't it?). We can make one more indicator for Weight Watchers. Here's the model: Depend
> The residual plot of Exercise 17 calls out some countries that have particularly large negative residuals. They are Gabon, Swaziland, Botswana, Namibia, and South Africa. What do these countries have in common? (Hint: Consult a map.) What does it mean fo
> Prior to graduation, a high school class was surveyed about its plans. The following table displays the results for white and minority students (the Minority group included African American, Asian, Hispanic, and Native American students): 1. What percent
> A plot of Studentized residuals against predicted values for the regression model found in Exercise 16 now looks like this. It has been colored according to Type of pizza and separate regression lines fitted for each type: 1. Comment on this diagnostic p
> At the top of the next column is a regression analysis to predict Life expectancy using the data of Exercise 15 and a plot of the residuals Response variable is: Life expectancy 240 total cases of which 20 are missing Comment on the model and the residua
> Here's a plot of the Studentized residuals against the predicted values for the regression model found in Exercise 14: The two extraordinary cases in the plot of residuals are Reggio and Michelina, two gourmet pizzas. 1. Interpret these residuals. What do
> Here is a scatterplot matrix of the variables as re-expressed in Exercise 13 using a version that places Normal probability plots on the diagonal. 1. Comment on their suitability for a regression model to predict Life expectancy. The points are colored a
> Union rated frozen pizzas. Their report includes the number of Calories, Fat content, and Type (cheese or pepperoni, represented here as an indicator variable that is 1 for cheese and 0 for pepperoni). Here's a regression model to predict the Score awarded
> The United States Central Intelligence Agency maintains a public site called the World Factbook at www.cia.gov/library/publications/the-worldfactbook/. There you find a wealth of variables about all the countries of the world. Let's exa
> In Chapter 9, Exercises 14, 18, 29, and 30, we considered data on hill races in Scotland. These are overland races that climb and descend hills, sometimes several hills in the course of one race. Here is a regression analysis to predict the Women's Record
> In Exercise 25 of Chapter 9, we considered a multiple regression model for predicting calories in breakfast cereals. The regression looked like this: Dependent variable is: Calories R-squared = 38.4% R-squared (adjusted) = 35.9% s = 15.60 with 77 - 4 = 73 degrees
> In previous chapters we have looked at data from the 50 states. Here's an analysis of data from a few years earlier. The Murder rate is per 100,000, HS Graduation rate is in %, Income is per capita income in dollars, Illiteracy rate is per 1000, and Life E
> The following software output provides information about the Size (in square feet) of 18 homes in Ithaca, New York, and the city assessed Value of those homes. Dependent variable is Value 1. Explain why inference for linear regression is appropriate with
> The Pew Research survey cited in Exercise 27 also asked what employment sector the respondents worked in and whether their job gave them a sense of identity or whether it was just what they do for a living. This table summarizes their responses: 1. Is th
> For each of the following, list the sample space and tell whether you think the events are equally likely: 1. Toss 2 coins; record the order of heads and tails. 2. A family has 3 children; record the number of boys. 3. Flip a coin until you get a head or
> The following software output is based on the mortality rate (deaths per 100,000 people) and the education level (average number of years in school) for 58 U.S. cities. Dependent variable is Mortality 1. Comment on the assumptions for inference. 2. Is th
> A sample of 84 model-2011 cars from an online information service was examined to see how fuel efficiency (as highway mpg) relates to the cost (Manufacturer's Suggested Retail Price in dollars) of cars. Here are displays and computer output: Dependent var
> Remember the Little League instructional video discussed in Chapter 21, Exercise 35? Ads claimed it would improve the performances of Little League pitchers. To test this claim, 20 Little Leaguers threw 50 pitches each, and we recorded the number of stri
> The professor teaching the introductory statistics class discussed in Exercise 57 wonders whether performance on homework can accurately predict midterm scores. 1. To investigate it, she fits a regression of the sum of the two midterms scores on homework
> Researchers at the University of Denver Infant Study Center wondered whether temperature might influence the age at which babies learn to crawl. Perhaps the extra clothing that babies wear in cold weather would restrict movement and delay the age at whic
> Tablet computers 2014 Cnet.com tests tablet computers and continuously updates its list. As of January 2014, the list included the battery life (in hours) and luminous intensity (i.e., screen brightness, in cd/m²). We want to know if Battery life is rela
> Consider again the relationship between the sales and profits of Fortune 500 companies that you analyzed in Exercise 52. 1. Find a 95% confidence interval for the slope of the regression line. Interpret your interval in context. 2. Last year, the drug ma
> Consider again the relationship between the population and ozone level of U.S. cities that you analyzed in Exercise 51. 1. Give a 90% confidence interval for the slope of the relationship between ozone level and population. 2. For the cities studied, the
> A business analyst was interested in the relationship between a company's sales and its profits. She collected data (in millions of dollars) from a random sample of Fortune 500 companies and created the regression analysis and summary statistics shown. The
> Pew Research surveyed 5006 U.S. adults to ask their opinions about the state of jobs in the United States in 2016. (www.pewsocialtrends.org/2016/10/06/the-state-of-american-jobs/) Respondents were asked how satisfied they are with their current job and
> The Environmental Protection Agency is examining the relationship between the ozone level (in parts per million) and the population (in millions) of U.S. cities. Part of the regression analysis is shown. Dependent variable is Ozone Dependent variable is
> A skeptic suggests that reduced sea ice isn't due to global climate change at all. He offers the following model, including Year since 1979 as another predictor as an alternative to the model in Exercise 23 (Data in Sea ice): Response variable is: Extent
> The output shows an attempt to model the association between average January Temperature (in degrees Fahrenheit) and Latitude (in degrees north of the equator) for 59 U.S. cities. Which of the assumptions for inference do you think are violated? Explain.
> Further analysis of the data for the breakfast cereals in Exercise 46 looked for an association between Fiber content and Calories by attempting to construct a linear model. Here are three graphs. Which of the assumptions for inference are violated? Expl
> Is your IQ related to the size of your brain? A group of female college students took a test that measured their verbal IQs and also underwent an MRI scan to measure the size of their brains (in 1000s of pixels). The scatterplot and regression analysis a
> A healthy cereal should be low in both calories and sodium. Data for 77 cereals were examined and judged acceptable for inference. The 77 cereals had between 50 and 160 calories per serving and between 0 and 320 mg of sodium per serving. Here
> Consider once again the CO2 and global temperature data of Exercise 41. The mean CO2 level for these data is 352.566 ppm. 1. Find a 90% confidence interval for the mean global temperature anomaly if the CO2 level reaches 450 ppm. 2. Find a 90% prediction
> Consider again the data in Exercise 40 about the gas mileage and weights of cars. 1. Create a 95% confidence interval for the average fuel efficiency among cars weighing 2500 pounds, and explain what your interval means. 2. Create a 95% prediction interv
> Consider the CO2 and global temperature data of Exercise 41. 1. Find a 90% confidence interval for the slope of the true line describing the association between Temp and CO2. 2. Explain in this context what your confidence interval means.
> Consider again the data in Exercise 40 about the gas mileage and weights of cars. 1. Create a 95% confidence interval for the slope of the regression line. 2. Explain in this context what your confidence interval means.
> In an effort to reduce the number of gun-related homicides, some cities have run buyback programs in which the police offer cash (often $50) to anyone who turns in an operating handgun. Chance magazine looked at results from a four-year period in Milwauk
> Data collected from around the globe (including the sea ice data of Exercise 23) show that the earth is getting warmer. The generally accepted explanation relates climate change to an increase in atmospheric levels of carbon dioxide (CO2) because CO2 is
> A consumer organization has reported test data for 50 car models. We will examine the association between the weight of the car (in thousands of pounds) and the fuel efficiency (in miles per gallon). Here are the scatterplot, summary statistics, and regr
> The price of a car depends on its age as well as on its mileage. Here is a regression in which the age of the cars (in years) is included in the regression model from Exercise 34: Response variable is: Price 1. What is the interpretation of the coefficie
> Biologists studying the effects of acid rain on wildlife collected data from 172 streams in the Adirondack Mountains. They recorded the pH (acidity) of the water and the BCI, a measure of biological diversity. Here's a scatterplot of BCI against pH for the
> Based on the analysis of marriage ages given in Exercise 33, find a 95% confidence interval for the rate at which the age gap is closing. Explain what your confidence interval means
> On January 22, 2017, www.autotrader.com listed 55 used Honda Civics for sale by owner. Here's a scatterplot of the asking price vs. the number of miles on the odometer (in thousands): 1. Do you think a linear model is appropriate? Explain. Here is the regr
> Chapter 8, Exercises 42, 44, and 49, looked at how the age at first marriage has changed over time for men and women. One trend was that people have been waiting until they are older to get married. Generally, men are older at their first marriage th
> Based on the regression output seen in Exercise 28, create a 95% confidence interval for the slope of the regression line and interpret it in context.
> Here is a mosaic plot of the data on Diet and Politics from Exercise 5 combined with data on Gender. 1. Are there more men or women in the survey? Explain briefly. 2. Does there appear to be an association between Politics and Gender? Explain briefly. 3.
> Based on the regression output seen in Exercise 27, create a 95% confidence interval for the slope of the regression line and interpret your interval in context.
> Look again at Exercise 28's regression output for age and cholesterol level. (Data in Framingham) 1. The output reports s = 46.16. Explain what that means in this context. 2. What's the value of the standard error of the slope of the regression line? 3. Expl
> Look again at Exercise 27's regression output for the calorie and sodium content of hot dogs. 1. The output reports s = 59.66. Explain what that means in this context. 2. What's the value of the standard error of the slope of the regression line? 3. Explain wh
> Does a person's cholesterol level tend to change with age? Data collected from 1406 adults aged 45 to 62 as part of the Framingham study produced the regression analysis shown. Assuming that the data satisfy the conditions for inference, examine the associ
> Healthy eating probably doesn't include hot dogs, but if you are going to have one, you'd probably hope it's low in both calories and sodium. Recently, Consumer Reports listed the number of calories and sodium content (i
> Exercise 24 shows computer output examining the association between the sizes of houses and their sale prices. 1. Check the assumptions and conditions for inference. 2. Find a 95% confidence interval for the slope and interpret it in context.
> Exercise 23 shows computer output examining the association between Arctic sea ice extent and global mean temperature. Find a 95% confidence interval for the slope and interpret it in context.
> How does the price of a house depend on its size? Data from Saratoga, New York, on 1063 randomly selected houses that had been sold include data on price ($1000s) and size (1000 ft²), producing the following graphs and computer output: Dependent variable
> Climate scientists have been observing the extent of sea ice in the northern Arctic using satellite observations. Many have expressed concern because in recent decades the extent of sea ice has declined precipitously possibly due to global climate change
> The 2013 World Drug Report investigated the prevalence of drug use as a percentage of the population aged 15 to 64. Data from 32 European countries are shown in the following scatterplot and regression analysis. (World Drug Report, 2013. www.unodc.org/un
> The dataset Student survey contains 299 responses to a student survey from a statistics project. The questions asked included: How would you rate yourself politically? (1=Far left, 9 = Far right) What is your gender? Do you believe in God? Pick a random
> In Chapter 6, we looked at data from the National Oceanic and Atmospheric Administration about their success in predicting hurricane tracks. Here is a scatterplot of the error (in nautical miles) for predicting hurricane locations 24 hours in the future
> The coach from Exercise 2 called a team meeting to summarize the results from his study. Would it be a good strategy to tell the players that all they need to do is to shoot more and the goals will follow?
> An SAT preparation course wants to advertise, based on the analyses we've seen, that raising your SAT scores will increase your eventual earnings. Is that conclusion supported by these analyses?
> Use the survey results in the table to investigate differences in education level attained among different age groups in the United States.
> Most pregnancies are full term, but some are preterm (less than 37 weeks). Of those that are preterm, the Centers for Disease Control and Prevention classifies them as early (less than 34 weeks) and late (34 to 36 weeks). A December 2010 National Vital S
> Titanic Newspaper headlines at the time, and traditional wisdom in the succeeding decades, have held that women and children escaped the Titanic in greater proportions than men. Here's a table with the relevant data. Do you think that survival was independ
> A subtle form of racial discrimination in housing is racial steering. Racial steering occurs when real estate agents show prospective buyers only homes in neighborhoods already dominated by that family's race. This violates the Fair Housing Act of 1968. Ac
> In Exercise 44, you found that the expected cell counts failed to satisfy the conditions for inference. 1. Find a sensible way to combine some cells that will make the expected counts acceptable. 2. Test a hypothesis about the full moon and state your co
> In some situations where the expected cell counts are too small, as in the case of the grades given by Professors Alpha and Beta in Exercise 43, we can complete an analysis anyway. We can often proceed after combining cells in some way that makes sense a
> Some people believe that a full moon elicits unusual behavior in people. The table shows the number of arrests made in a small town during weeks of six full moons and six other randomly selected weeks in the same year. We wonder if there is evidence of a
> The following data show the percentage change in population for the 50 states and the District of Columbia from the 2000 census to the 2010 census. Using appropriate graphical displays and summary statistics, write a report on the percentage change in po
> Two different professors teach an introductory statistics course. The table shows the distribution of final grades they reported. We wonder whether one of these professors is an easier grader. 1. Will you test goodness-of-fit, homogeneity, or independenc
> In April 2009, Gallup published results from data collected from a large sample of adults in the 27 European Union member states. One of the questions asked was, Which is the most practicable and realistic option for child care, taking into account the n