> The analyst in Exercise 11 fits the model with the four predictor variables. The regression output shows: a) How many observations were used in the regression? b) What might you do next? c) Is it clear that Income is more important than Networth in predicting Spending? Explain.
> A regression was performed to predict the selling Price of houses in dollars from their Area in square feet, Lotsize in square feet, and Age in years. The R² is 92%. The equation from this regression is given here. Price = 169,328 + 35.3 Area + 0.718 Lotsize
> We really should have examined the residuals. Here is a scatterplot of the residuals from the regression of Exercise 14. a) Which assumptions and conditions for regression can you check with this plot? What do you conclude? Perhaps we should re-express
> Here are some plots of residuals for the regression of Exercise 13. Which of the regression conditions can you check with these plots? Do you find that those conditions are met? [Figures: Residuals vs. Predicted; Residuals vs. Nscores (normal probability plot)]
> A second-order autoregressive model for the gas prices is shown in the regression output (Dependent variable is: Gas; R squared = 82.2%; R squared (adjusted) = …). Using values from the table, what is the predicted value for January 2007 (the value just past those given in the table)?
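A second-order autoregressive model predicts each value from the two values before it: ŷ_t = b0 + b1·y_{t−1} + b2·y_{t−2}. The fitted coefficients are not reproduced in this list, so the sketch below takes them as parameters. The helper name `ar2_forecast` is hypothetical.

```python
def ar2_forecast(b0: float, b1: float, b2: float, y_lag1: float, y_lag2: float) -> float:
    """One-step forecast from a second-order autoregressive (AR(2)) model.

    Computes b0 + b1 * y_{t-1} + b2 * y_{t-2}, where y_lag1 is the most recent
    observed value and y_lag2 is the value one period before that.
    """
    return b0 + b1 * y_lag1 + b2 * y_lag2
```

To forecast January 2007, y_lag1 would be the December 2006 price and y_lag2 the November 2006 price. The coefficients would come from the regression output.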
> The investor in Exercise 18 now accepts your analysis but claims that it demonstrates that it doesn’t matter how many weeks a show plays on Broadway; receipts will be essentially the same. Explain why this interpretation is not a valid use of this regres
> A police union leader accepts your analysis in Exercise 17 but claims that it proves that paying police more will reduce violent crime. Explain why this interpretation is not a valid use of this regression model. Offer some alternative explanations.
> Consider the coefficient of Playing Weeks in the regression table of Exercise 14. a) State the standard null and alternative hypotheses for the true coefficient of Playing Weeks. b) Test the null hypothesis at α = 0.05 and state your conclusion. c) A
> Suppose you have fit a linear model to some data and now take a look at the residuals. For each of the following possible residuals plots, tell whether you would try a re-expression and, if so, why. a) b) c)
> Suppose you have fit a linear model to some data and now take a look at the residuals. For each of the following possible residuals plots, tell whether you would try a re-expression and, if so, why. a) b) c)
> A real estate agent collects data to develop a model that will use the Size of a new home (in square feet) to predict its Sale Price (in thousands of dollars). Which of these is most likely to be the slope of the regression line: 0.008, 0.08, 0.8, or 8?
> Although some women are colorblind, this condition is found primarily in men. An advertisement for socks marked so they were easy for someone who was colorblind to match started out “There’s a strong correlation between sex and colorblindness.” Explain i
> Here is a regression of Women’s age vs. Men’s age, and a plot of the residuals. a) The residual plot shows 4 outliers, labeled according to the years they correspond to. Explain what they say about the data for those
> In Exercise 39 you investigated the federal rate on 3-month Treasury bills between 1950 and 1980. The scatterplot below shows that the trend changed dramatically after 1980, so we’ve built a new regression model that includes only the d
> A second-order autoregressive model for the apple prices (for all 4 years of data) is / Using the values from the table, what is the predicted value for January 2007 (the value just past those given in the table)?
> In Exercise 21 we looked at the age at which women married as one of the variables considered by those selling wedding services. Another variable of concern is the difference in age of the two partners. The graph shows the ages of both men and women at f
> Here’s a plot showing the federal rate on 3-month Treasury bills from 1950 to 1980, and a regression model fit to the relationship between the Rate (in %) and Years since 1950. (www.gpoaccess.gov/eop/) a) What is the correlation betw
> How does the speed at which a car drives affect fuel economy? Owners of a taxi fleet, watching their bottom line sink beneath fuel costs, hired a research firm to tell them the optimal speed for their taxis to drive. Researchers drove a compact car for 2
> Small businesses must track every expense. A flower shop owner tracked her costs for heating and related them to the average daily Fahrenheit temperature, finding the model Cost = 133 - 2.13 Temp. The residuals plot for her data is shown. a) Interpret
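Evaluating the model at a couple of temperatures makes the slope and intercept concrete. Each additional degree Fahrenheit lowers the predicted heating cost by 2.13, and 133 is the predicted cost at 0°F. The range of the data is not shown here, so it is unknown whether 0°F is an extrapolation. `predicted_heating_cost` is a hypothetical helper, and cost units are whatever the exercise uses.

```python
def predicted_heating_cost(temp_f: float) -> float:
    """Predicted heating cost from the model Cost = 133 - 2.13 * Temp (Temp in degrees F)."""
    return 133 - 2.13 * temp_f


# A residual is actual minus predicted, e.g. for a hypothetical observed cost:
def heating_cost_residual(actual_cost: float, temp_f: float) -> float:
    """Residual (actual - predicted) for one observation under the same model."""
    return actual_cost - predicted_heating_cost(temp_f)
```

For example, at 40°F the model predicts 133 - 85.2 = 47.8.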
> Published reports about violence in computer games have become a concern to developers and distributors of these games. One firm commissioned a study of violent behavior in elementary-school children. The researcher asked the children’s parents how much
> A researcher gathering data for a pharmaceutical firm measures blood pressure and the percentage of body fat for several adult males and finds a strong positive association. Describe three different possible cause-and-effect relationships that might be p
> The original five points in Exercise 33 produce a regression line with slope 0. Match each of the green points (a–e) with the slope of the line after that one point is added: 1) -0.45 2) -0.30 3) 0.00 4) 0.05 5) 0.85
> The scatterplot shows five blue data points at the left. Not surprisingly, the correlation for these points is r = 0. Suppose one additional data point is added at one of the five positions suggested below in green. Match each point (a–
> Each of the following scatterplots a–d shows a cluster of points and one “stray” point. For each, answer questions 1–4: 1) In what way is the point unusual? Does it have high leverag
> Each of the four scatterplots a–d that follow shows a cluster of points and one “stray” point. For each, answer questions 1–4: 1) In what way is the point unusual? Does it have high
> For the Gas prices of Exercise 6, find the lag2 version of the prices.
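A lag-k version of a series shifts every value k periods later. Position t holds the value from period t - k, and the first k positions have no earlier value. The sketch below marks those positions as missing (`None`). The lag1 version of the apple prices works the same way with k = 1. The function name `lag` is a hypothetical helper.

```python
from typing import List, Optional


def lag(series: List[float], k: int) -> List[Optional[float]]:
    """Return the lag-k version of a series: position t holds series[t - k].

    The first k positions have no earlier value, so they are None (missing).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    k = min(k, len(series))  # a lag longer than the series leaves only missing values
    return [None] * k + series[: len(series) - k]
```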
> Like many businesses, the National Hurricane Center also participates in a program to improve the quality of data and predictions by government agencies. They report their errors in predicting the path of hurricanes. The following scatterplot shows the t
> Much attention has been paid to the challenges faced by the airline industry. Patterns in customer demand are an important variable to watch. The scatterplot below shows the number of passengers departing from Oakland (CA) airport month by month from 199
> How does what a movie earns relate to its run time? Will audiences pay more for a longer film? Does the relationship depend on the type of film? The scatterplot shows the relationship for the films in Exercise 27 between U.S. Gross earnings and Run Time.
> Here’s a scatterplot of the production budgets (in millions of dollars) vs. the running time (in minutes) for a collection of major movies. Dramas are plotted in red and all other genres are plotted in blue. A separate least squares reg
> An intern who has created a linear model is disappointed to find that her R² value is a very low 13%. a) Does this mean that a linear model is not appropriate? Explain. b) Does this model allow the intern to make accurate predictions? Explain.
> In justifying his choice of a model, a consultant says “I know this is the correct model because R² = 99.4%.” a) Is this reasoning correct? Explain. b) Does this model allow the consultant to make accurate predictions? Explain.
> The United Nations Development Programme (UNDP) uses the Human Development Index (HDI) in an attempt to summarize in one number the progress in health, education, and economics of a country. The mean years of schooling is positively associated with HDI.
> The United Nations Development Programme (UNDP) collects data in the developing world to help countries solve global and national development challenges. In the UNDP annual Human Development Report, you can find data on over 100 variables for each of 197
> Even with campaigns to reduce smoking, Americans still consume more than four packs of cigarettes per month per adult (libraries.ucsd.edu/ssds/pub/CTS/tobacco/sales). The Centers for Disease Control and Prevention track cigarette smoking in the United S
> Weddings are one of the fastest growing businesses; about $40 billion is spent on weddings in the United States each year. But demographics may be changing, and this could affect wedding retailers’ marketing plans. Is there evidence tha
> For the Apple prices of Exercise 5, find the lag1 version of the prices.
> Orange growers know that the larger an orange, the higher the price it will bring. But as the number of oranges on a tree increases, the fruit tends to be smaller. Here’s a table of that relationship. Create a model for this relationship
> The Organization for Economic Cooperation and Development (OECD) is an organization composed of thirty countries. To belong, a country must support the principles of representative democracy and a free market economy. How have these countries grown in t
> For the regression model in Exercise 8, the leverage values look like this: The movie with the highest leverage of 0.219 is Walt Disney’s John Carter, which grossed $66M but had a budget of $300M. If the budget for John Carter had bee
> Here is the scatterplot of the variables in Exercise 7 with regression lines added for each kind of movie: The regression model is: a) Write out the regression model. b) In this regression, the variable Budget*R Rating is an interaction term. How wou
> Are R rated movies as profitable as those rated PG-13? Here’s a scatterplot of USGross ($M) vs. Budget ($M) for PG-13 (green) and R (purple) rated movies: a) How would you code the indicator variable? (Use PG-13 as the base level.) b) H
> A marketing manager has developed a regression model to predict quarterly sales of his company’s mid-weight microfiber jackets based on price and amount spent on advertising. An intern suggests that he include indicator (dummy) variables for each quarter
> For each of the following, show how you would code dummy (indicator) variables to include in a regression model. a) Type of residence (Apartment, Condominium, Townhouse, Single family home) b) Employment status (Full-time, Part-time, Unemployed)
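A categorical variable with k levels is usually coded with k - 1 indicator (0/1) columns, one per level except a chosen base level. A row from the base level has every indicator equal to 0. Including all k indicators alongside an intercept would make them perfectly collinear. The choice of base level below is arbitrary, and `indicator_columns` is a hypothetical helper.

```python
from typing import Dict, List


def indicator_columns(values: List[str], levels: List[str], base: str) -> Dict[str, List[int]]:
    """Code a categorical variable as 0/1 indicator columns.

    One column per level except the base level, so k levels give k - 1 indicators;
    a row in the base level has 0 in every indicator column.
    """
    if base not in levels:
        raise ValueError("base level must be one of the levels")
    unknown = set(values) - set(levels)
    if unknown:
        raise ValueError(f"unexpected levels: {sorted(unknown)}")
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != base
    }
```

For Type of residence with Apartment as the (assumed) base, this yields three indicators: Condominium, Townhouse, and Single family home.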
> Here is the regression for Exercise 3 with an indicator variable: a) Write out the regression model. b) In this regression, the variable R Rating is an indicator variable that is 1 for movies that have an R rating. How would you interpret the coeffici
> Do movies of different types have different rates of return on their budgets? Here’s a scatterplot of Gross Revenue in US ($M) vs. Budget ($M) for recent movies whose MPAA Rating is either PG (blue) or R (red): a) Why might a research
> A marketing manager has developed a regression model to predict quarterly sales of his company’s down jackets based on price and amount spent on advertising. An intern suggests that he include an indicator (dummy) variable for the Fall quarter. a) How wo
> For the Gas prices of Exercise 6, the actual value for January 2007 was 2.321. Find the absolute percentage error of your forecast.
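The absolute percentage error of a single forecast is |actual - forecast| / |actual| × 100. Here the actual value would be 2.321, and the forecast is whatever the earlier model produced. `absolute_percentage_error` is a hypothetical helper name.

```python
def absolute_percentage_error(actual: float, forecast: float) -> float:
    """Absolute percentage error: |actual - forecast| / |actual| * 100."""
    if actual == 0:
        raise ValueError("APE is undefined when the actual value is 0")
    return abs(actual - forecast) / abs(actual) * 100
```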
> If the VIF for Networth in the regression of Exercise 11 was 20.83, what would the R² be from the regression of Networth on Age, Income, and Past Spending?
> The analyst from Exercise 11, worried about collinearity, regresses Age against Past Spending, Income, and Networth. The output shows: Response Variable: Age; R² = 98.75%; Adjusted R² = 98.74%; s = 2.112 with 908 – 4 = 904 degrees of freedom. What is the VIF for Age?
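The variance inflation factor for a predictor is VIF = 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. Inverting the formula gives R² = 1 - 1/VIF. For example, R² = 98.75% gives VIF = 1/0.0125 = 80. The two helper names are hypothetical.

```python
def vif_from_r2(r2: float) -> float:
    """VIF for a predictor, given R^2 from regressing it on the other predictors."""
    if not 0 <= r2 < 1:
        raise ValueError("r2 must be in [0, 1)")
    return 1 / (1 - r2)


def r2_from_vif(vif: float) -> float:
    """Invert VIF = 1 / (1 - R^2) to recover that R^2."""
    if vif < 1:
        raise ValueError("a VIF is always at least 1")
    return 1 - 1 / vif
```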
> An analyst wants to build a regression model to predict spending from the following four predictor variables: a) How many observations were used in the regression? b) What might you do next? c) Is it clear that Income is more important to predicting Sp
> For the same regression as in Exercise 9, the Cook’s Distances look like this: The outlier, once again, is John Carter, whose budget was more than $200M more than its gross revenue in the U.S. Setting this movie aside and rerunning th
> For each of the following, show how you would code dummy (or indicator) variables to include in a regression model. a) Company unionization status (Unionized, No Union) b) Gender (Female, Male) c) Account Status (Paid on time, Past Due) d) Political part
> In the regression model of Exercise 3, a) What is the R² for this regression? What does it mean? b) Why is the “Adjusted R Square” in the table different from the “R Square”?
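Adjusted R² penalizes R² for the number of predictors: adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1). Here n is the number of observations and k is the number of predictors, not counting the intercept. With at least one predictor and R² < 1, the adjusted value is smaller than R². The helper name `adjusted_r2` is hypothetical.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1).

    n is the number of observations; k is the number of predictors (excluding the intercept).
    """
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)
```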
> a) What is the null hypothesis tested for the coefficient of Run Time in the regression of Exercise 3? b) What is the t-statistic corresponding to this test? c) Why is this t-statistic negative? d) What is the P-value corresponding to this t-statistic? e
> In the regression output for the movies of Exercise 3, a) What is the null hypothesis tested for the coefficient of Stars in this table? b) What is the t-statistic corresponding to this test? c) What is the P-value corresponding to this t-statistic? d) C
> For the movies regression, here is a histogram of the residuals. What does it tell us about these assumptions and conditions? a) Linearity Condition b) Nearly Normal Condition c) Equal Spread Condition [Figure: histogram of Residuals]
> For the Apple prices smoothed in Exercise 5, the actual value for January 2007 was 1.034. Find the absolute percentage error of your forecast.
> For the movies examined in Exercise 4, here is a scatterplot of USGross vs. Budget: What (if anything) does this scatterplot tell us about the following assumptions and conditions for the regression? a) Linearity Condition b) Equal Spread Condition c)
> A middle manager at an entertainment company, upon seeing the analysis of Exercise 3, concludes that the longer you make a movie, the less money it will make. He argues that his company’s films should all be cut by 30 minutes to improve their gross. Expl
> What can predict how much a motion picture will make? We have data on a number of recent releases that includes the USGross (in $M), the Budget ($M), the Run Time (minutes), and the average number of Stars awarded by reviewers. The first several entries
> A candy maker surveyed chocolate bars available in a local supermarket and found the following least squares regression model: a) The hand-crafted chocolate she makes has 15 g of fat and 20 g of sugar. How many calories does the model predict for a ser
> A study of homes looking at the relationship between Age of a home and Price produced the following scatterplot. A regression was fit to the data as shown below. On the basis of this plot, would you advise using this regression? Explain. [Figure: scatterplot of Price vs. Age with the fitted regression line]
> A scatterplot of Salary against Years Experience for some employees, and the scatterplot of residuals against predicted Salary from the regression line are shown in the figures. On the basis of these plots, would you recommend a re-expression of either S
> The regression of Total Revenue on Total Expenses for the concerts of Exercise 13 gives the following model: a) The Durbin-Watson statistic for this analysis is 0.73. Consult Table D in Appendix B and complete the test at α = 0.05. b) Wha
> The manager of the concert production company considered in earlier exercises considers the regression of Total Revenue on Ticket Sales (see Exercise 4) and computes the Durbin-Watson statistic, obtaining a value of 0.51. a) Consult Table D in Appendix B
> A company fits a regression to predict monthly Orders over a period of 48 months. The Durbin-Watson statistic on the residuals is 0.875. a) At α = 0.01, using k = 1 and n = 50, what are the values of dL and dU? b) Is there evidence of positive autocorrel
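The Durbin-Watson statistic is D = Σ(e_t - e_{t-1})² / Σe_t², computed from residuals in time order. A value near 2 suggests no first-order autocorrelation, and values well below 2 suggest positive autocorrelation. The test for positive autocorrelation compares D with the table bounds dL and dU. D < dL is evidence of positive autocorrelation, D > dU is not, and values in between are inconclusive. The bounds must come from the table, and both helper names are hypothetical.

```python
from typing import Sequence


def durbin_watson(residuals: Sequence[float]) -> float:
    """Sum of squared successive differences of residuals over their sum of squares."""
    e = list(residuals)
    if len(e) < 2:
        raise ValueError("need at least two residuals")
    denom = sum(x * x for x in e)
    if denom == 0:
        raise ValueError("residuals are all zero")
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / denom


def dw_positive_autocorrelation(d: float, d_lower: float, d_upper: float) -> str:
    """Decision for H0: no positive autocorrelation, using table bounds dL and dU."""
    if d < d_lower:
        return "evidence of positive autocorrelation"
    if d > d_upper:
        return "no evidence of positive autocorrelation"
    return "inconclusive"
```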
> A beverage company specializing in sales of champagne reports four years of quarterly sales as follows (in millions of $): The regression equation is Predicted Sales = 14.15 + 4.87 Quarter. a) Find the residuals. b) Plot the residuals against Quarter.
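Each residual is the actual sales minus the value from Predicted Sales = 14.15 + 4.87 Quarter. The sales table is not reproduced here, so the sketch assumes the quarterly values are listed in time order with Quarter numbered from 1. That numbering is an assumption and should be matched to the exercise's table. `champagne_residuals` is a hypothetical helper.

```python
from typing import List, Sequence


def champagne_residuals(sales: Sequence[float]) -> List[float]:
    """Residuals (actual - predicted) for Predicted Sales = 14.15 + 4.87 * Quarter.

    Assumes sales[0] is Quarter 1, sales[1] is Quarter 2, and so on.
    """
    return [y - (14.15 + 4.87 * q) for q, y in enumerate(sales, start=1)]
```

Plotting these residuals against Quarter (part b) is where any seasonal pattern would show up.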
> Here are data on the monthly price of Delicious apples and gas, which are both components of the Consumer Price Index. The timeplot shows the years 2006–2009 for apples; the data table shows just 2006, for both. For the Gas prices: (D
> Here is another part of the regression output for the movies in Exercise 3: a) Using the values from the table, show how the value of R² could be computed. Don’t try to do the calculation, just show what is computed. b) What is the F-
> A house in the upstate New York area from which the chapter data was drawn has 2 bedrooms and 1000 square feet of living area. Using the multiple regression model found in the chapter, a) Find the price that this model estimates. b) The house just sold
> The bookstore in Exercise 5 decides to have a gala event in an attempt to drum up business. They hire 100 employees for the day and bring in a total of $42,000. a) Find the regression line predicting Sales from Number of people working with the new point
> The production company of Exercise 7 offers advance sales to “Frequent Buyers” through its website. Here’s a relevant scatterplot: One performer refused to permit advance sales. What effect has th
> A regression of Total Revenue on Ticket Sales by the concert production company of Exercises 2 and 4 finds the model a) Management is considering adding a stadium-style venue that would seat 10,000. What does this model predict that revenue would be if
> Here are prices for the external disk drives we saw in Chapter 15, Exercise 10: The least squares line is The assumptions and conditions for regression are met. a) Disk drives keep growing in capacity. Some tech experts now talk about Petabyte (PB
> Here are the data from the small bookstore we saw in Chapter 15, Exercise 9. The regression line is: and we can assume that the assumptions and conditions for regression are met. Calculations with technology find that a) Find the predicted sales on
> The concert production company of Exercise 2 made a second scatterplot, this time relating Total Revenue to Ticket Sales: a) Describe the relationship between Ticket Sales and Total Revenue. b) How are the results for the two venues similar? c) How are they d
> The analyst in Exercise 1 tried fitting the regression line to each market segment separately and found the following: What does this say about her concern in Exercise 1? Was she justified in worrying that the overall model might not accurately summ
> Here is a table of values from the U.S. Bureau of Labor Statistics: (Data in BLS output) For the series of Output per hour of labor: a) Make a time series plot. b) Describe the trend component. (Remember: Direction, Form, and Strength.) c) Is there evi
> Are the following data time series? If not, explain why. a) Reports from the Bureau of Labor Statistics on the number of U.S. adults who are employed full-time in each major sector of the economy. b) The quarterly Gross Domestic Product (GDP) of France f
> Are the following data time series? If not, explain why. a) Quarterly earnings of Microsoft Corp. b) Unemployment in August 2010 by Education level. c) Time spent in training by workers in NewCo. d) Numbers of e-mails sent by employees of SynCo each hour
> Fred Barolo heads a travel company that offers, among other services, customized travel packages. These packages provide a relatively high profit margin for his company, but Fred worries that a weakened economic outlook will adversely affect this segment
> Alpine Medical Systems, Inc., is a large provider of medical equipment and supplies to hospitals, doctors, clinics, and other health care professionals. Alpine’s VP of Marketing and Sales, Kenneth Jadik, asked one of the company’s analysts, Nicole Haly,
> Most people older than 40 remember chia seeds as the source of “green hair” for a variety of animal-shaped terra-cotta figurines. While chia pets are still available (there is even one resembling President Trump), chia seeds are now more often recognized
> GoLearn is a new social networking site designed mainly for college students who have an interest in study abroad. Students from both within and outside the United States are beginning to join, although the majority of users are from the United States. U
> A concert production company examined its records. The manager made the following scatterplot. The company places concerts in two venues, a smaller, more intimate theater (plotted with blue circles) and a larger auditorium-style venue. a) Describe the
> Here are data on the monthly price of Delicious apples and gas, which are both components of the Consumer Price Index. The timeplot shows the years 2006–2009 for apples; the data table shows just 2006, for both. For the Apple prices:
> Life insurance rates are based on life expectancy values compiled for large demographic groups. But with improvements in medical care and nutrition, life expectancies have been changing. Here is a table from the National Vital Statistics Report that give
> Many professions use tables to determine key quantities. The value of a log is based on the number of board feet of lumber the log may contain. (A board foot is the equivalent of a piece of wood 1 inch thick, 12 inches wide, and 1 foot long. For example,
> Consider again the post-1960 trend in U.S. GDP we examined in Exercise 57. Here are a regression and residual plot when we use the square root of GDP in the model. Is this a better model for GDP? Explain. Response variable is: √GDP; R squared = 99.4%
> The scatterplot shows the gross domestic product (GDP) of the United States in trillions of (2010) dollars plotted against years since 1960. (Data in GDP and DJIA 2017) A linear model fit to the relationship looks like this: (We’ve i
> Of course, what matters most to the individual entrepreneur—the licensed commercial lobster fisher—is the price of lobster. Here’s an analysis relating that price ($/lb) to the number of traps (millio
> Does the Dow Jones Industrial Average (DJIA) reflect the economy as measured by the Gross Domestic Product (GDP)? Here’s a plot and a regression. (Both are converted to 2010 dollars to remove the effects of inflation. GDP is in $B to ma
> Lobster are caught in traps, which are baited and left in the open ocean. Licenses to fish for lobster are limited, there is a small additional fee for each trap in use, and there are limits on the numbers of traps that can be placed in each of seven fis
> According to the Maine Department of Marine Resources, in 2016 more than 130,800,000 pounds of lobster were landed in Maine—a catch worth more than $533.09M. The lobster fishing industry is carefully controlled and licensed, and facts a
> In Exercise 40, we fit a linear regression for the number of monthly international visitors to Hawaii (for the years 2002 through 2006) using Time and dummy variables for the months as predictors. The R² value was 59.9% and a residual plot against Time
> In Exercise 39, we fit a linear regression for the number of monthly domestic visitors to Hawaii (for the years 2002 through 2006) using Time and dummy variables for the months as predictors. The R² value was 96.6% and a residual plot against Time would
> The data for hard drives in Exercise 6 originally included a 200 GB (0.2 TB) drive that sold for $299.00 (see Chapter 4, Exercise 2). a) Find the regression line predicting Price from Capacity with this hard drive added. b) What has changed from the orig
> The following time series plot shows the data for the monthly U.S. Unemployment rate (%) from January 2003 to June 2013. These data have been seasonally adjusted (meaning that the seasonal component has already been removed). a) What time series compon
> Return to the oil price data of Exercise 47. a) Find a linear model for this series. b) Find an exponential (multiplicative) model for this series. c) For the model of Exercise 47 and the models of parts a and b, compute the MAPE. Which model did best? G
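Both trend models can be fit with ordinary least squares. The linear model regresses price on time directly. For the exponential (multiplicative) model, this sketch regresses log(price) on time and back-transforms. That log-linear approach is a common one, but it is an assumption here, not necessarily the method the exercise's software uses. MAPE is then the mean of the absolute percentage errors. All function names are hypothetical, and no oil price data are included.

```python
import math
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def mape(actual: Sequence[float], fitted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, fitted)) / len(actual)


def linear_vs_exponential(t: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (MAPE of linear trend, MAPE of exponential trend) for a series y over time t."""
    if any(v <= 0 for v in y):
        raise ValueError("the log-linear fit needs positive values")
    b0, b1 = fit_line(t, y)
    linear_fit = [b0 + b1 * ti for ti in t]
    c0, c1 = fit_line(t, [math.log(v) for v in y])  # log(y) = c0 + c1 * t
    exp_fit = [math.exp(c0 + c1 * ti) for ti in t]  # back-transform to the price scale
    return mape(y, linear_fit), mape(y, exp_fit)
```

The model with the smaller MAPE fit the observed series more closely. That does not by itself say which model will forecast better beyond the data.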