For each of the following summaries, first indicate what the usual interpretation would be. Then indicate whether there are any other possibilities. a. r = 1. b. r = 0.85. c. r = 0. d. r = –0.15. e. r = –1.
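One way to explore the "other possibilities" part of this question is to compute r for a few small synthetic data sets. The sketch below is only an illustration, not a full answer: the data and the helper name `pearson_r` are assumptions made up for this example. It shows that points lying exactly on a line give r = 1 or r = -1, while a perfect but nonlinear (symmetric, U-shaped) relationship can give r = 0.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical data: x values placed symmetrically around zero.
x = np.arange(-5, 6)

r_up = pearson_r(x, 2 * x + 1)      # points exactly on an increasing line
r_down = pearson_r(x, -3 * x + 4)   # points exactly on a decreasing line
r_curve = pearson_r(x, x ** 2)      # exact U-shaped (nonlinear) relationship

print(f"increasing line: r = {r_up:.2f}")
print(f"decreasing line: r = {r_down:.2f}")
print(f"U-shaped curve:  r = {r_curve:.2f}")
```

The U-shaped case is the useful one: r measures only linear association, so r = 0 usually suggests no relationship but does not rule out a strong nonlinear one.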
> Name six properties of a data set that are displayed by a histogram.
> a. What fraction of the variation in salaries can be explained by the fact that some employees are older than others? b. What salary would you expect for a 42-year-old individual? c. Find the 95% confidence interval for the salary of a new individual (fr
> Take a close look at the data using summaries and graphs. What do you find?
> a. What is a random experiment? b. Why does defining a random experiment help to focus your thoughts about an uncertain situation?
> When may you use the least-significant-difference test to compare individual pairs of samples? When is this not permitted?
> Describe and give a formula for each of the following quantities, which are used in performing a one-way analysis of variance: a. Total sample size, n. b. Grand average,
> a. State the hypotheses for the one-way analysis of variance. b. Is the research hypothesis very specific about the nature of any differences?
> a. What kind of data set should be analyzed using the one-way analysis of variance? b. Why should you not use the unpaired t test instead of the one-way analysis of variance?
> Explain in what sense the analysis of variance involves actually analyzing variance—in particular, what variances are analyzed and why?
> a. Define a first-order autoregressive process in terms of the relationship between successive observations. b. What are the X and Y variables in the regression model to predict the next observation in a first-order autoregressive process? c. Describe th
> a. Define the random noise process in terms of the relationship between successive observations. b. Comment on the following: If it is a random noise process, then special time-series methods are not needed to analyze it. c. What are the forecast and for
> a. How is the flexibility of the Box-Jenkins ARIMA process approach helpful in time-series analysis? b. What is parsimony? c. How does the forecast relate to the actual future behavior of the estimated process? d. How do the forecast limits relate to the
> a. How is a linear trend estimated in trend-seasonal analysis? b. What kind of forecast does the linear trend represent? c. What do you do to produce a forecast from the linear trend? d. Which components are represented in this forecast? Which are missin
> Consider annual salary as the Y variable and age as the X variable. a. Draw a scatterplot and describe the relationship. b. Find the correlation coefficient. What does it tell you? Is it appropriate, compared to the scatterplot? c. Find the least-squares
> a. How do you compute the ratio-to-moving-average? Which components does it represent? b. What do you do to the ratio-to-moving-average to produce a seasonal index? Why does this work? c. What does a seasonal index represent? d. How do you seasonally adj
> a. How is the moving average different from the original series? b. For trend-seasonal analysis, why do we use exactly 1 year of data at a time in the moving average? c. Which components remain in the moving average? Which are reduced or eliminated?
> a. Name the four basic components of a monthly or quarterly time series, from the trend-seasonal approach. b. Carefully distinguish the cyclic and the irregular components.
> a. What is a forecast? b. What are the forecast limits? c. What role does a mathematical model play in forecasting? d. Why does trend-seasonal analysis not produce forecast limits?
> a. Define a first-order ARIMA process in terms of the relationship between successive observations. b. What parameter values would you set equal to zero in an ARIMA process in order to have a random walk? c. How can you construct an ARMA process from an
> a. Define a random walk in terms of the relationship between successive observations. b. Carefully distinguish a random noise process from a random walk. c. Comment on the following: If it is a random walk, then special time-series methods are not needed
> a. Define a first-order ARMA process in terms of the relationship between successive observations. b. What parameter value would you set equal to zero in an ARMA process in order to have an autoregressive process? c. What parameter value would you set eq
> a. Define a first-order moving-average process in terms of the relationship between successive observations. b. What is a moving-average process a moving average of? c. Describe the forecasts for two or more periods into the future of a first-order movin
> a. What kind of material appears in the analysis and methods section? b. Should you describe everything you have examined in the analysis and methods section? Why or why not?
> Should you leave key results out of the executive summary, ending it with a sentence such as “We have examined these issues and have come up with some recommendations”? Why or why not?
> a. What salary would you expect for an individual with no (zero years of) experience? b. Find the 95% confidence interval for the salary of a new individual (from the same population from which the data were drawn) who has no experience. c. Find the 95%
> Give some reasons why you might want to include statistical results in a report.
> Why is it necessary to identify the purpose and audience of a report?
> a. What material belongs in the appendix? b. How can an appendix help you satisfy both the casual and the dedicated reader?
> a. Give two reasons for providing a reference when you make use of material from the Internet, a book, a magazine, or another source. b. How can you tell if you have provided enough information in a reference? c. How would you reference material from a t
> a. What is the multiple regression linear model? b. List three ways in which the multiple regression linear model might fail to hold. c. What scatterplot can help you spot problems with the multiple regression linear model?
> a. If you want to be sure to get the best predictions, why not include among your X variables every conceivably helpful variable you can think of? b. How can a prioritized list help you solve the variable selection problem? c. Briefly describe two automa
> a. What is multicollinearity? b. What are the harmful effects of extreme multicollinearity? c. How might moderate multicollinearity cause your F test to be significant, even though none of your t tests are significant? d. How can multicollinearity proble
> a. How are the standardized regression coefficients computed? b. How are they useful? c. What are their measurement units?
> a. What is the t test for an individual regression coefficient? b. In what way is such a test adjusted for the other X variables? c. If the F test is not significant, are you permitted to go ahead and test individual regression coefficients?
> a. What does the result of the F test tell you? b. What are the two hypotheses of the F test? c. In order for the F test to be significant, do you need a high or a low value of R²? Why?
> a. What salary would you expect for an individual with 3 years of experience? b. Find the 95% confidence interval for the salary of a new individual (from the same population from which the data were drawn) who has 3 years of experience. c. Find the 95%
> For the regression equation, answer the following: a. What is it used for? b. Where does it come from? c. What does the constant term tell you? d. What does a regression coefficient tell you?
> a. What kind of variable should you create in order to include information about a categorical variable among your X variables? Please give the name of the variables and indicate how they are created. b. For a categorical variable with four categories, h
> a. What is interaction? b. What can be done to include interaction terms in the regression equation?
> How does polynomial regression help you deal with nonlinearity?
> a. What is an elasticity? b. Under what circumstances will a regression coefficient indicate the elasticity of Y with respect to Xi?
> a. What are the axes in the diagnostic plot? b. Why is it good to find no structure in the diagnostic plot?
> For multiple regression, answer the following: a. What are the three goals? b. What kinds of data are necessary?
> Define the predicted value and the residual for a given data point.
> a. What is so special about the least-squares line that distinguishes it from all other lines? b. How does the least-squares line “know” that it is predicting Y from X instead of the other way around? c. It is reasonable to summarize the “most typical” d
> a. If large values of X cause the Y values to be large, would you expect the correlation to be positive, negative, or zero? Why? b. If you find a strong positive correlation, does this prove that large values of X cause the Y values to be large? If not,
> a. What fraction of the variation in salaries can be explained by the fact that some employees have more experience than others? b. What salary would you expect for an individual with 8 years of experience? c. Find the 95% confidence interval for the sal
> Draw a scatterplot to illustrate each of the following kinds of structure in bivariate data. There is no need to work from data for this question; you may draw the points directly. a. No relationship between X and Y. b. Linear relationship with strong po
> a. What is the covariance between X and Y? b. Which is easier to interpret, the covariance or the correlation? Why?
> a. Give an example in which the intercept term, a, has a natural interpretation. b. Give an example in which the intercept term, a, does not have a natural interpretation.
> Using a least-squares line, you have predicted that the cost of goods sold will rise to $8.33 million at the end of next quarter based on expected sales of $38.2 million. Your friend in the next office remarks, “Isn’t it also true that a cost of goods so
> Statistical inference in regression is based on the linear model. Name at least three problems that can arise when the linear model fails to hold.
> Identify and write a formula for each of the following quantities, which are useful for accomplishing statistical inference. a. The standard error used for the regression coefficient. b. The standard error of the intercept term. c. The standard error of
> a. What is the linear model? b. Which two parameters define the population straight-line relationship? c. What sample statistics are used to estimate the three population parameters α, β, and σ? d. Is the slope of the least-squares line, computed from a
> Distinguish the standard error of estimate and the coefficient of determination.
> a. What is a type I error? Can it be controlled? Why or why not? b. What is a type II error? Can it be controlled? Why or why not? c. When, if ever, is it correct to say that “the null hypothesis is true with probability 0.95”? d. What can you say about
> a. How much experience would you expect for a 50-year-old individual? b. Find the 95% confidence interval for the experience of a new individual (from the same population from which the data were drawn) who is 50 years old. c. Find the 95% confidence int
> a. What assumptions must be satisfied for a two-sided t-test to be valid? b. Consider each assumption in turn. What happens if the assumption is not satisfied? What, if anything, can be done to fix the problem?
> a. What, in general, is a test statistic? b. Which test statistic would you use for a two-sided t-test? c. What, in general, is a critical value? d. Which critical value would you use for a two-sided t-test?
> a. What is the reference value? Does it come from the sample data? Is it known or unknown? b. What is the t-statistic? Does it depend on the reference value? c. Does the confidence interval change depending on the reference value?
> a. What is Student’s t-test? b. Who was Student? What was his contribution?
> a. Briefly describe the steps involved in performing a two-sided test concerning a population mean based on a confidence interval. b. Briefly describe the steps involved in performing a two-sided test concerning a population mean based on the t-statistic
> a. What is a hypothesis? In particular, is it a statement about the population or the sample? b. How is the role of the null hypothesis different from that of the research hypothesis? Which one usually includes the case of pure randomness? Which one has
> a. Describe the general process of constructing confidence intervals and performing hypothesis tests using the rule of thumb when you have an estimator and its standard error. b. If you also know the number of degrees of freedom, how would your answer ch
> a. What is an unpaired t-test? b. Identify the two hypotheses involved in an unpaired t-test. c. What is the “independence” requirement? Give a concrete example. d. How is an unpaired t-test similar to and different from an ordinary t-test for just one s
> a. What is a paired t-test? b. Identify the two hypotheses involved in a paired t-test. c. What is the “pairing” requirement? Give a concrete example. d. How is a paired t-test similar to and different from an ordinary t-test for just one sample?
> a. How is a one-sided test performed based on a confidence interval? b. How is a one-sided test performed based on the t-statistic?
> Consider annual salary as the Y variable and experience as the X variable. a. Draw a scatterplot and describe the relationship. b. Find the correlation coefficient. What does it tell you? Is it appropriate, compared to the scatterplot? c. Find the least-
> a. What is a one-sided test? b. What are the hypotheses for a one-sided test? c. When are you allowed to perform a one-sided test? What should you do if you are not sure if it is allowed? d. If you perform a one-sided test when it’s really not permitted,
> a. What is the purpose of hypothesis testing? b. How is the result of a hypothesis test different from a confidence interval statement?
> a. What is the relative frequency interpretation of the correctness of a confidence interval? b. What is the “lifetime track record” interpretation of the correctness of many confidence intervals?
> a. Describe the two assumptions needed for the confidence interval statement to be valid. b. For each assumption, give an example of what could go wrong if it were not satisfied. c. How does the central limit theorem help satisfy one of these assumptions
> a. How many degrees of freedom are there for a single sample of size n? b. What accounts for the degree of freedom lost? c. How many degrees of freedom should you use if the standard error is known exactly?
> a. What would you change in the computation of a two-sided 95% prediction interval to find a two-sided 99% prediction interval instead? b. What would you change in the computation of a two-sided 95% prediction interval to find a one-sided 95% prediction
> a. What is the standard error for prediction? b. Why is the standard error for prediction even larger than the standard deviation S?
> a. What is the difference between a prediction interval and a confidence interval? b. Which type of interval should you use to learn about the mean spending habits of your typical customer? c. Which type of interval should you use to learn about the spen
> a. What additional criterion must be satisfied for a one-sided confidence interval to be valid (in addition to the two assumptions needed for a two-sided confidence interval)? b. If in doubt, should you use a one-sided or a two-sided confidence interval?
> a. Why must a one-sided confidence interval always include the sample average? b. Must a one-sided confidence interval always include the population mean?
> Test to see if the population mean age for training level A differs from that for levels B and C combined.
> a. What is the central limit theorem? b. Does the central limit theorem specify that individual cases follow a normal distribution? c. How do you interpret the idea that the average has a normal distribution? d. What is the mean of a sum of independent o
> a. What is an estimator? b. What is an estimate? c. A sample standard deviation is found to be 13.8. Is this number an estimator or an estimate of the population standard deviation? d. What is the error of estimation? When you estimate an unknown number,
> a. What is a pilot study? b. What can go wrong if you do not do a pilot study?
> a. What is a random sample? b. Why is a random sample approximately representative? c. What is the difference between a random sample selected with and one selected without replacement? d. What is a table of random digits? How is it used in sample select
> a. What is a representative sample? b. What is a biased sample? c. How can a representative sample be chosen?
> a. What is a systematic sample? b. What are the main problems with systematic samples? c. Why is there no reliable standard error available for use with a sample average computed from a systematic sample?
> a. What is a stratified random sample? b. What are the benefits of stratification? c. When is stratification most likely to be helpful?
> a. What is the finite-population correction factor? b. What is the adjusted standard error? c. What is an idealized population? d. In what way are your results more limited if you use the finite-population correction factor than if you do not?
> a. What is the standard error of a statistic? b. In what way does the standard error indicate the quality of the information provided by an estimate? c. What typically happens to the standard error as the sample size, n, increases?
> a. What is a population? b. What is a sample? Why is sampling useful? c. What is a census? Would you always want to do a census if you had the resources?
> Now examine the effect of gender on annual salary, with and without adjusting for age and experience. a. Find the average annual salary for men and for women and compare them. b. Using a two-sided test at the 5% level, test whether men are paid significa
> a. What kinds of situations give rise to an exponential distribution? b. What is meant by the fact that an exponential random variable has no memory? c. Can the standard normal probability table be used to find probabilities for an exponential distributi
> a. What kinds of situations give rise to a Poisson distribution? b. Is the Poisson a discrete or a continuous distribution? c. What is the standard deviation of a Poisson distribution? d. How do you find probabilities for a Poisson distribution if the me
> a. What is a normal distribution? b. Identify all of the different possible normal distributions. c. What does the area under the normal curve represent? d. What is the standard normal distribution? What is it used for? e. What numbers are found in the s
> a. What is a factorial? b. Find 3!, 0!, and 15!. c. What is a binomial coefficient? What does it represent in the formula for a binomial probability? d. Find the binomial coefficient “8 choose 5.”
> For a binomial distribution: a. Why not just construct the probability tree to find the probabilities? b. How do you find the mean and the standard deviation? c. How do you find the probability that X is equal to some number? d. How do you find th
> a. How do you tell if a random variable has a binomial distribution? b. What is a binomial proportion? c. What are n, π, X, and p?
> a. What is the probability distribution of a discrete random variable? b. How do you find the mean of a discrete random variable? How do you interpret the result? c. How do you find the standard deviation of a discrete random variable? How do you interpr
> a. What is a discrete random variable? b. What is a continuous random variable? c. Give an example of a discrete random variable that is continuous for practical purposes.
> a. What is a random variable? b. What is the difference between a random variable and a number?
> a. Name the three main sources of probability numbers. b. What is the equally likely rule? c. Are you allowed to use someone’s guess as a probability number? d. What is the difference between a Bayesian and a frequentist analysis?
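The factorial and binomial-coefficient question in the list above (3!, 0!, 15!, and "8 choose 5") can be checked numerically. The following is a minimal sketch using Python's standard `math` module; the helper `binomial_probability` and its parameter names are assumptions introduced for illustration, and the example values n = 8, k = 5, probability 0.5 are hypothetical.

```python
import math

# Factorials asked about in the question.
fact_3 = math.factorial(3)
fact_0 = math.factorial(0)    # 0! is defined to be 1
fact_15 = math.factorial(15)

# Binomial coefficient "8 choose 5": the number of ways to pick
# which 5 of 8 trials are the successes.
choose_8_5 = math.comb(8, 5)

def binomial_probability(n, k, prob):
    """P(X = k) for a binomial X with n trials and success probability prob."""
    return math.comb(n, k) * prob ** k * (1 - prob) ** (n - k)

# Hypothetical example: exactly 5 successes in 8 trials with probability 0.5.
p_5_of_8 = binomial_probability(8, 5, 0.5)

print(fact_3, fact_0, fact_15, choose_8_5, p_5_of_8)
```

In the binomial probability formula, the binomial coefficient counts the distinct orderings of successes and failures, and each ordering has the same probability.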