[SOLVED] a. What is a pilot study? b.

> a. What is the t test for an individual regression coefficient? b. In what way is such a test adjusted for the other X variables? c. If the F test is not significant, are you permitted to go ahead and test individual regression coefficients?

> a. What does the result of the F test tell you? b. What are the two hypotheses of the F test? c. In order for the F test to be significant, do you need a high or a low value of R²? Why?

> a. What salary would you expect for an individual with 3 years of experience? b. Find the 95% confidence interval for the salary of a new individual (from the same population from which the data were drawn) who has 3 years of experience. c. Find the 95%

> For the regression equation, answer the following: a. What is it used for? b. Where does it come from? c. What does the constant term tell you? d. What does a regression coefficient tell you?

> a. What kind of variable should you create in order to include information about a categorical variable among your X variables? Please give the name of the variables and indicate how they are created. b. For a categorical variable with four categories, h

> a. What is interaction? b. What can be done to include interaction terms in the regression equation?

> How does polynomial regression help you deal with nonlinearity?

> a. What is an elasticity? b. Under what circumstances will a regression coefficient indicate the elasticity of Y with respect to Xi?

> a. What are the axes in the diagnostic plot? b. Why is it good to find no structure in the diagnostic plot?

> For multiple regression, answer the following: a. What are the three goals? b. What kinds of data are necessary?

> Define the predicted value and the residual for a given data point.

> a. What is so special about the least-squares line that distinguishes it from all other lines? b. How does the least-squares line “know” that it is predicting Y from X instead of the other way around? c. It is reasonable to summarize the “most typical” d

> a. If large values of X cause the Y values to be large, would you expect the correlation to be positive, negative, or zero? Why? b. If you find a strong positive correlation, does this prove that large values of X cause the Y values to be large? If not,

> a. What fraction of the variation in salaries can be explained by the fact that some employees have more experience than others? b. What salary would you expect for an individual with 8 years of experience? c. Find the 95% confidence interval for the sal

> Draw a scatterplot to illustrate each of the following kinds of structure in bivariate data. There is no need to work from data for this question; you may draw the points directly. a. No relationship between X and Y. b. Linear relationship with strong po

> a. What is the covariance between X and Y? b. Which is easier to interpret, the covariance or the correlation? Why?

> For each of the following summaries, first indicate what the usual interpretation would be. Then indicate whether there are any other possibilities. a. r = 1. b. r = 0.85. c. r = 0. d. r = –0.15. e. r = –1.

> a. Give an example in which the intercept term, a, has a natural interpretation. b. Give an example in which the intercept term, a, does not have a natural interpretation.

> Using a least-squares line, you have predicted that the cost of goods sold will rise to $8.33 million at the end of next quarter based on expected sales of $38.2 million. Your friend in the next office remarks, “Isn’t it also true that a cost of goods so

> Statistical inference in regression is based on the linear model. Name at least three problems that can arise when the linear model fails to hold.

> Identify and write a formula for each of the following quantities, which are useful for accomplishing statistical inference. a. The standard error used for the regression coefficient. b. The standard error of the intercept term. c. The standard error of

> a. What is the linear model? b. Which two parameters define the population straight-line relationship? c. What sample statistics are used to estimate the three population parameters α, β, and σ? d. Is the slope of the least-squares line, computed from a

> Distinguish the standard error of estimate and the coefficient of determination.

> a. What is a type I error? Can it be controlled? Why or why not? b. What is a type II error? Can it be controlled? Why or why not? c. When, if ever, is it correct to say that “the null hypothesis is true with probability 0.95”? d. What can you say about

> a. How much experience would you expect for a 50-year-old individual? b. Find the 95% confidence interval for the experience of a new individual (from the same population from which the data were drawn) who is 50 years old. c. Find the 95% confidence int

> a. What assumptions must be satisfied for a two-sided t-test to be valid? b. Consider each assumption in turn. What happens if the assumption is not satisfied? What, if anything, can be done to fix the problem?

> a. What, in general, is a test statistic? b. Which test statistic would you use for a two-sided t-test? c. What, in general, is a critical value? d. Which critical value would you use for a two-sided t-test?

> a. What is the reference value? Does it come from the sample data? Is it known or unknown? b. What is the t-statistic? Does it depend on the reference value? c. Does the confidence interval change depending on the reference value?

> a. What is Student’s t-test? b. Who was Student? What was his contribution?

> a. Briefly describe the steps involved in performing a two-sided test concerning a population mean based on a confidence interval. b. Briefly describe the steps involved in performing a two-sided test concerning a population mean based on the t-statistic

> a. What is a hypothesis? In particular, is it a statement about the population or the sample? b. How is the role of the null hypothesis different from that of the research hypothesis? Which one usually includes the case of pure randomness? Which one has

> a. Describe the general process of constructing confidence intervals and performing hypothesis tests using the rule of thumb when you have an estimator and its standard error. b. If you also know the number of degrees of freedom, how would your answer ch

> a. What is an unpaired t-test? b. Identify the two hypotheses involved in an unpaired t-test. c. What is the “independence” requirement? Give a concrete example. d. How is an unpaired t-test similar to and different from an ordinary t-test for just one s

> a. What is a paired t-test? b. Identify the two hypotheses involved in a paired t-test. c. What is the “pairing” requirement? Give a concrete example. d. How is a paired t-test similar to and different from an ordinary t-test for just one sample?

> a. How is a one-sided test performed based on a confidence interval? b. How is a one-sided test performed based on the t-statistic?

> Consider annual salary as the Y variable and experience as the X variable. a. Draw a scatterplot and describe the relationship. b. Find the correlation coefficient. What does it tell you? Is it appropriate, compared to the scatterplot? c. Find the least-

> a. What is a one-sided test? b. What are the hypotheses for a one-sided test? c. When are you allowed to perform a one-sided test? What should you do if you are not sure if it is allowed? d. If you perform a one-sided test when it’s really not permitted,

> a. What is the purpose of hypothesis testing? b. How is the result of a hypothesis test different from a confidence interval statement?

> a. What is the relative frequency interpretation of the correctness of a confidence interval? b. What is the “lifetime track record” interpretation of the correctness of many confidence intervals?

> a. Describe the two assumptions needed for the confidence interval statement to be valid. b. For each assumption, give an example of what could go wrong if it were not satisfied. c. How does the central limit theorem help satisfy one of these assumptions

> a. How many degrees of freedom are there for a single sample of size n? b. What accounts for the degree of freedom lost? c. How many degrees of freedom should you use if the standard error is known exactly?

> a. What would you change in the computation of a two-sided 95% prediction interval to find a two-sided 99% prediction interval instead? b. What would you change in the computation of a two-sided 95% prediction interval to find a one-sided 95% prediction

> a. What is the standard error for prediction? b. Why is the standard error for prediction even larger than the standard deviation S?

> a. What is the difference between a prediction interval and a confidence interval? b. Which type of interval should you use to learn about the mean spending habits of your typical customer? c. Which type of interval should you use to learn about the spen

> a. What additional criterion must be satisfied for a one-sided confidence interval to be valid (in addition to the two assumptions needed for a two-sided confidence interval)? b. If in doubt, should you use a one-sided or a two-sided confidence interval?

> a. Why must a one-sided confidence interval always include the sample average? b. Must a one-sided confidence interval always include the population mean?

> Test to see if the population mean age for training level A differs from that for levels B and C combined.

> a. What is the central limit theorem? b. Does the central limit theorem specify that individual cases follow a normal distribution? c. How do you interpret the idea that the average has a normal distribution? d. What is the mean of a sum of independent o

> a. What is an estimator? b. What is an estimate? c. A sample standard deviation is found to be 13.8. Is this number an estimator or an estimate of the population standard deviation? d. What is the error of estimation? When you estimate an unknown number,

> a. What is a random sample? b. Why is a random sample approximately representative? c. What is the difference between a random sample selected with and one selected without replacement? d. What is a table of random digits? How is it used in sample select

> a. What is a representative sample? b. What is a biased sample? c. How can a representative sample be chosen?

> a. What is a systematic sample? b. What are the main problems with systematic samples? c. Why is there no reliable standard error available for use with a sample average computed from a systematic sample?

> a. What is a stratified random sample? b. What are the benefits of stratification? c. When is stratification most likely to be helpful?

> a. What is the finite-population correction factor? b. What is the adjusted standard error? c. What is an idealized population? d. In what way are your results more limited if you use the finite-population correction factor than if you do not?

> a. What is the standard error of a statistic? b. In what way does the standard error indicate the quality of the information provided by an estimate? c. What typically happens to the standard error as the sample size, n, increases?

> a. What is a population? b. What is a sample? Why is sampling useful? c. What is a census? Would you always want to do a census if you had the resources?

> Now examine the effect of gender on annual salary, with and without adjusting for age and experience. a. Find the average annual salary for men and for women and compare them. b. Using a two-sided test at the 5% level, test whether men are paid significa

> a. What kinds of situations give rise to an exponential distribution? b. What is meant by the fact that an exponential random variable has no memory? c. Can the standard normal probability table be used to find probabilities for an exponential distributi

> a. What kinds of situations give rise to a Poisson distribution? b. Is the Poisson a discrete or a continuous distribution? c. What is the standard deviation of a Poisson distribution? d. How do you find probabilities for a Poisson distribution if the me

> a. What is a normal distribution? b. Identify all of the different possible normal distributions. c. What does the area under the normal curve represent? d. What is the standard normal distribution? What is it used for? e. What numbers are found in the s

> a. What is a factorial? b. Find 3!, 0!, and 15!. c. What is a binomial coefficient? What does it represent in the formula for a binomial probability? d. Find the binomial coefficient “8 choose 5.”

> For a binomial distribution: a. Why don't you just construct the probability tree to find the probabilities? b. How do you find the mean and the standard deviation? c. How do you find the probability that X is equal to some number? d. How do you find th

> a. How do you tell if a random variable has a binomial distribution? b. What is a binomial proportion? c. What are n, π, X, and p?

> a. What is the probability distribution of a discrete random variable? b. How do you find the mean of a discrete random variable? How do you interpret the result? c. How do you find the standard deviation of a discrete random variable? How do you interpr

> a. What is a discrete random variable? b. What is a continuous random variable? c. Give an example of a discrete random variable that is continuous for practical purposes.

> a. What is a random variable? b. What is the difference between a random variable and a number?

> a. Name the three main sources of probability numbers. b. What is the equally likely rule? c. Are you allowed to use someone’s guess as a probability number? d. What is the difference between a Bayesian and a frequentist analysis?

> Suppose males and females are equally likely and that the number of each gender follows a binomial distribution. (Note that the database contains observations of random variables, not the random variables themselves.) a. Find n and π for the binomial dis

> a. What is the relative frequency of an event? b. How is the relative frequency different from the probability of an event? c. What is the law of large numbers?

> a. What is a probability? b. Which of the following has a probability number: a random experiment, a sample space, or an event? c. If a random experiment is to be run just once, how can you interpret an event with a probability of 0.06?

> What is a Venn diagram?

> a. What is a probability tree? b. What are the four rules for probability trees?

> a. What is the interpretation of independence of two events? b. How can you tell whether two events are independent or not? c. Under what conditions can two mutually exclusive events be independent?

> a. What is the interpretation of conditional probability in terms of new information? b. Is the conditional probability of A given B always the same number as the conditional probability of B given A? c. How can you find the conditional probability from

> a. What is the union of two events? b. What is the probability of “one event or another” if you know (1) their probabilities and the probability of “one event and the other”? (2) that they are mutually exclusive?

> a. What is the intersection of two events? b. What is the probability of “one event and another” if you know (1) their probabilities and the probability of “one event or the other”? (2) their probabilities and that they are independent? (3) that they are

> a. What is the range? b. What are the measurement units of the range? c. For what purposes is the range useful? d. Is the range a very useful statistical measure of variability? Why or why not?

> How would your answers to question 6 change if the data were not normally distributed? Data from question 6: If your data set is normally distributed, what proportion of the individuals do you expect to find: a. Within one standard deviation from the a

> Continue using predictions of annual salary based on age and experience. a.* Find the predicted annual salary and prediction error for employee 33 and compare the result to the actual annual salary. b. Find the predicted annual salary and prediction erro

> Would the average-based procedure they are currently using ordinarily be a good method? Or is it fundamentally flawed? Justify your answers.

> a. What is the variance? b. What are the measurement units of the variance? c. Which is the more easily interpreted variability measure, the standard deviation or the variance? Why? d. Once you know the standard deviation, does the variance provide any a

> a. What is the standard deviation? b. What does the standard deviation tell you about the relationship between individual data values and the average? c. What are the measurement units of the standard deviation? d. What is the difference between the samp

> When each data value is multiplied by a fixed number, what happens to a. The average, median, and mode? b. The standard deviation and range? c. The coefficient of variation?

> What is a weighted average? When should it be used instead of a simple average?

> What is the average? Interpret it in terms of the total of all values in the data set.

> What is an outlier? How do you decide whether a data point is an outlier or not?

> What is a percentile? In particular, is it a percentage (e.g., 23%), or is it specified in the same units as the data (e.g., $35.62)?

> How do you usually define the mode for a quantitative data set? Why is this definition ambiguous?

> Consider the cumulative distribution function: a. What is it? b. How is it drawn? c. What is it used for? d. How is it related to the histogram and box plot?

> What kinds of trouble do outliers cause?

> View each column as a collection of independent observations of a random variable. a. In each case, what kind of variable is represented, continuous or discrete? Why? b.* Consider the event “annual salary is above $40,000.” Find the value of the binomial

> Why is it important in a report to explain how you dealt with an outlier?

> What is a bimodal distribution? What should you do if you find one?

> Why is it important to identify the source of funding when evaluating the results of a statistical study?

> Distinguish between primary and secondary data.

> Differentiate between probability and statistics.

> Differentiate between time-series data and cross-sectional data.

> What is the difference between ordinal and nominal qualitative data?

> What is the main problem with skewness? How can it be solved in some cases?

> What is the difference between discrete and continuous quantitative variables?

Question: a. What is a pilot study? b.