**Business Forecasting 6th Edition by Wilson – Test Bank**

**SAMPLE QUESTIONS**

**Chapter 2**


**MULTIPLE CHOICE TEST BANK**

Note: The correct answer is denoted by **.

- Why are forecasting textbooks full of applied statistics?

- A) Statistics is the study of uncertainty.
- B) Real-world business decisions involve risk and uncertainty.
- C) Forecasting attempts to reduce the uncertainty for uncertain events.
- D) Forecasting ultimately deals with probability.
- E) All the above. **

- Which of the following is __not__ part of the recommended nine-step forecast process?

- A) What role do forecasts play in the business decision process?
- B) What exactly is to be forecast?
- C) How urgent is the forecast?
- D) Is there enough data?
- E) All the above are parts of the process. **

- Of the following model selection criteria, which is often the most important in determining the appropriate forecast method?

- A) Technical background of the forecast user.
- B) Patterns the data have exhibited in the past. **
- C) How much money is in the forecast budget?
- D) What is the forecast horizon?
- E) When is the forecast needed?

- In a time series plot for a typical GAP store, which of the following data patterns would be most prominent?

- A)
- B) **
- C)
- D)
- E) All the above.

- Which of the following is __incorrect__?

- A) The forecaster should be able to defend why a particular model or procedure has been chosen.

- B) Forecast errors should be discussed in an objective manner to maximize management’s confidence in the forecast process.
- C) Forecast errors should not be discussed since most people know that forecasting is an inexact science. **
- D) You should tailor your presentation to the sophistication of the audience to maximize credibility in the forecast process.
- E) None of the above.

- In the model-testing phase of the nine-step process, which of the following refers to that portion of a sample that is best to use to evaluate model-forecast accuracy?

- A)
- B) Forecast horizon.
- C) Holdout period. **
- D)
- E) None of the above.

- The text presents a guide to selecting an appropriate forecasting method based on

- A) data patterns.
- B) quantity of historical data available.
- C) forecast horizon.
- D) quantitative background of the forecast user.
- E) All the above. **

- Which time-series component is said to fluctuate around the long-term trend and is fairly irregular in appearance?

- A)
- B) **
- C)
- D)
- E) None of the above.

- Forecasting January sales based on the previous month’s level of sales is likely to lead to error if the data are _____.

- A)
- B) Non-cyclical.
- C) **
- D)
- E) None of the above.

- The difference between seasonal and cyclical components is:

- A)
- B)
- C)
- D)
- E) All the above. **

- For which data frequency is seasonality __not__ a problem?

- A)
- B)
- C)
- D)
- E) **

- One can realistically __not__ expect to find a model that fits any data set perfectly, due to the ____ component of a time series.

- A)
- B)
- C)
- D) **
- E) None of the above.

- When a time series contains no trend, it is said to be

- A)
- B)
- C)
- D) **
- E)

- Stationarity refers to

- A) the size of the RMSE of a forecasting model.
- B) the size of variances of the model’s estimates.
- C) a method of forecast optimization.
- D) lack of trend in a given time series. **
- E) None of the above.

- Which of the following is __not__ a measure of central tendency in a population?

- A)
- B)
- C)
- D) **

- Which of the following is __not__ a descriptive statistic?

- A)
- B)
- C)
- D)
- E) None of the above. **

- Which of the following is __not__ an important part of classical statistics?

- A) Summary measures of probability distributions called descriptive statistics.
- B) Probability distribution functions, which characterize all outcomes of a variable.
- C) The use of sampling distributions, which describe the uncertainty in making inference about the population on the basis of a sample.
- D) The concept of expected value.
- E) All of the above are important. **

- The standard normal probability table

- A) is equivalent to a t distribution if the sample size is less than 30.
- B) shows a normal distribution with standard deviation equal to zero.
- C) is used to make inference for all normally distributed random variables. **
- D) All the above.
- E) None of the above.

- The median and mode may be more accurate than the sample mean in estimating the population mean when

- A) The sample size is small.
- B) The sample size is large.
- C) The sample has one large outlier. **
- D) The population is assumed to be normally distributed.
- E) All the above.

- The arithmetic average of the occurrence of some random variable is also called the _____.

- A)
- B) **
- C)
- D) Standard deviation.
- E) None of the above.

- In finance, an investor who ignores risk is termed “risk neutral.” What descriptive statistic is our risk neutral investor ignoring when she generates stock portfolios?

- A)
- B)
- C)
- D) Standard deviation. **
- E) None of the above.

- In calculating the sample variance we subtract one from the sample size. This is because

- A) the population mean is unknown.
- B) of using the sample mean to estimate the population mean.
- C) the sum of deviations about the sample mean is zero.
- D) the sample mean is employed.
- E) All the above. **

- Which statistic is correctly interpreted as the “average” spread of data about the mean?

- A)
- B)
- C)
- D) Standard deviation. **
- E)

- Which measure of dispersion in a data set is the most intuitive and represents an average?

- A)
- B)
- C) Standard deviation. **
- D)
- E)

- Which of the following is __not__ an attribute of a normal probability distribution?

- A) It is symmetrical about the mean.
- B) Most observations cluster around the mean.
- C) Most observations cluster around zero. **
- D) The distribution is completely determined by the mean and variance.
- E) All the above are correct.

- Which of the following is __not__ a foundation of classical statistics?

- A) Summary measures of probability distributions, called descriptive statistics.
- B) Probability distribution functions, which characterize all possible outcomes of a random variable.

- C) The knowledge of thousands and thousands of normal probability tables required for statistical inference of normally distributed random variables. **
- D) The concept of expected value, which is the average value of a random variable taken over a large number of samples.

- A company claims that the rubber belts it manufactures have a mean service life of at least 800 hours. A random sample of 36 belts from a very large shipment of the company’s belts shows a mean life of 760 hours and a standard deviation of 90 hours. Which of the following is most appropriate on the basis of the sample results?

- A) The sample results do not warrant rejection of the company’s claim if the risk of a Type I error is specified at .05.
- B) The sample results do warrant rejection of the company’s claim if the risk of Type I error is specified at .05. **
- C) Since the sample mean falls below the company’s claim, the sample results indicate that the company claim is incorrect.
- D) The sample results are indeterminate since the magnitude of the sample standard deviation is greater than the difference between the company’s claimed figure and the sample mean.
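A quick numerical check of the belt test, sketched in Python under the assumption that a large-sample z test is acceptable (with n = 36, the t distribution with 35 degrees of freedom gives a critical value of about -1.69 and the same conclusion). The hypotheses are H0: μ ≥ 800 against HA: μ < 800.

```python
import math
from statistics import NormalDist

# Figures from the question
claimed_mean = 800.0   # company's claimed minimum mean life (hours)
sample_mean = 760.0
sample_sd = 90.0
n = 36
alpha = 0.05           # allowed risk of a Type I error

# Standard error of the sample mean: 90 / sqrt(36) = 15
se = sample_sd / math.sqrt(n)
z = (sample_mean - claimed_mean) / se      # -40 / 15, about -2.67

# One-tailed (lower-tail) critical value for H0: mu >= 800 vs HA: mu < 800
z_crit = NormalDist().inv_cdf(alpha)       # about -1.645

reject_h0 = z < z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

Because -2.67 lies below -1.645, the sample results warrant rejecting the claim at the .05 level, which matches option B.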

- Based upon ten years of monthly data, the monthly rate of return of the Dow Jones 30 composite stock portfolio was normally distributed with mean .0084 and variance .0014. What is the probability that, in any given month, we observe a rate of return on the Dow above 10 percent?

- A) Less than one percent. **
- B) Two percent.
- C) Three percent.
- D) Not enough information is provided to answer the question.
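A minimal sketch of the calculation behind option A, using the standard library's `NormalDist`. The only step that is easy to miss is converting the stated variance into a standard deviation before standardizing.

```python
import math
from statistics import NormalDist

mean = 0.0084        # mean monthly return
variance = 0.0014    # variance of monthly return
sd = math.sqrt(variance)   # about 0.0374

monthly_return = NormalDist(mu=mean, sigma=sd)
# Upper-tail probability of a monthly return above 10 percent
p_above = 1 - monthly_return.cdf(0.10)
print(f"P[R > .10] = {p_above:.4f}")
```

The implied z value is about 2.45, so the probability is well under one percent.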

- Suppose you observe the entire population of a random variable and you wish to test some hypothesis about the mean. To perform your hypothesis test you

- A) apply a sampling distribution to the problem.
- B) obtain sample estimates of population parameters.
- C) simply find the population mean and compare it to the hypothesized value. **
- D) apply the t distribution.
- E) There is no answer to this question.

- If two large random samples are drawn from two populations, each having a mean of $100, the relevant sampling distribution of their difference has a mean of:

- A) $200.
- B) The sum of the two sample means.
- C) $0. **
- D) The difference between the two sample means.

- The sampling distribution of the sample mean, when sampling from a normal population with mean $\mu$ and variance $\sigma^2$, is:

- A) normally distributed with mean m and variance s2.
- B) normally distributed with mean m and variance s2.
- C) normally distributed with mean $\mu$ and variance $\sigma^2/n$. **
- D) normally distributed with mean 0 and variance 1.

- Type I error

- A) is said to arise when we reject a true null hypothesis.
- B) has a probability value equal to the significance level of any statistical test.
- C) is a measure of the uncertainty associated with rejecting any null hypothesis on the basis of sample data.
- D) Both A and B are correct.
- E) A, B, and C are all correct. **

- The sampling distribution of the sample mean is

- A) normally distributed with mean m and variance s2.
- B) normally distributed with mean m and variance s2.
- C) distributed as a t distribution with variance 1.
- D) normally distributed with mean 0 and variance 1.
- E) None of the above. **

- Sampling distributions

- A) are the distributions of all possible values of a sample statistic based upon repeated sampling.
- B) are used to make inference when the population of a variable is unobservable.
- C) exhibit important properties for the ranking of alternative estimators such as unbiasedness and efficiency.
- D) All the above. **

- The null hypothesis is that there is no significant linear relationship between prices and sales. Type I error is:

- A) To conclude that there is a significant linear relationship when there is.
- B) To conclude that there is not a significant linear relationship when there is.
- C) To conclude that there is a significant linear relationship when there is not. **
- D) To conclude that the correlation coefficient is equal to zero.

- An unbiased model

- A) is one that does not consistently over-estimate or under-estimate the true value of a parameter. **
- B) is one that consistently produces estimates with the smallest RMSE.
- C) is one that contains no independent variable; it depends solely on time-series pattern recognition.
- D) is one made up by a team of forecasters.

- Suppose that you mistakenly move the decimal point to the right one digit in data from a normal population with a mean of zero. What happens to the standard deviation?

- A) Data with mistake has standard error ten times the original. **
- B) Data with mistake has same standard error as the original.
- C) Data with the mistake has twice the standard error of the original.
- D) Data with the mistake has one hundred times the standard error of the original.
- E) None of the above.
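Multiplying every observation by a constant c multiplies the standard deviation by |c|, and a mean of zero stays zero. A small check with hypothetical numbers (not data from the text):

```python
from statistics import pstdev

# Hypothetical data with a mean of zero
original = [-1.2, 0.4, 0.8, -0.3, 0.3]
# Moving the decimal point one place to the right multiplies every value by 10
mistaken = [10 * x for x in original]

ratio = pstdev(mistaken) / pstdev(original)
print(round(ratio, 6))
```

The ratio is 10 (up to floating-point rounding), which is why option A is correct.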

- Which test statistic is appropriate for statistical inference about the population mean when both the population mean and variance are unknown?

- A) (X – m)/s.
- B) (X – m)/s.
- C) (X – m)/s.
- D) None of the above. **

- Which statement is __incorrect__?

- A) Confidence intervals depend on sample size.
- B) The sample mean is the best estimator if sampling from a normal population.
- C) The sample mean is an unbiased estimator.
- D) Confidence intervals provide no more information than point estimates. **
- E) The sample variance is an unbiased estimator.

- Which of the following statements is __incorrect__ about the family of normal distributions?

- A) $P[\mu \pm 1\sigma] = .68$.
- B) $P[\mu \pm 2\sigma] = .95$.
- C) $P[\mu \pm 3\sigma] = .99$.
- D) None of the above. **
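The exact coverage probabilities can be computed from the standard normal CDF. They are the same for every normal distribution because of the standard normal transformation. A short sketch:

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

# Probability of falling within k standard deviations of the mean
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"P[mu ± {k} sigma] = {p:.4f}")
```

The exact values are about .6827, .9545, and .9973, so the options should be read as rounded figures. For three standard deviations, .99 understates the exact value of about .997.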

- The standard normal transformation

- A) involves subtracting the mean so as to center the transformed distribution on zero.
- B) involves division by the standard deviation so as to standardize the transformed variance to one.
- C) is used to make inference about all normal distributions based upon the standard normal distribution.
- D) is the number of standard deviations by which a random variable differs from its mean.
- E) All the above. **

- A machine fills ketchup bottles. One of the requirements is that the mean content of the bottles should be 10 ounces. Management wishes to set up a decision rule to decide whether or not this is true based on a random sample of bottles. The risk of type I error is specified at .05. A sample of 100 bottles will be taken; it is believed that the standard deviation of fills is .3 ounces. If it is decided that Z = 2, the decision rule boundary values are:

- A) 9.60 and 10.40.
- B) 10.10 and 9.90.
- C) 9.94 and 10.06. **
- D) 9.40 and 10.60.
- E) None of the above.
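The decision-rule boundaries are the target mean plus or minus Z standard errors of the sample mean. A minimal sketch using the figures in the question:

```python
import math

target = 10.0    # required mean fill (ounces)
sigma = 0.3      # believed standard deviation of fills
n = 100
z = 2            # multiplier specified in the question

se = sigma / math.sqrt(n)              # 0.3 / 10 = 0.03
lower = target - z * se
upper = target + z * se
print(f"Conclude the mean is 10 oz if {lower:.2f} <= sample mean <= {upper:.2f}")
```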

- Last year’s midterm results showed a mean of 51 points and a variance of 46. An approximate confidence interval is closest to:

- A) 44.2 to 57.8.
- B) 37.4 to 64.6. **
- C) 5 to 97.
- D) None of the above.
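The question gives no sample size, so this sketch assumes the intended interval is the mean plus or minus two standard deviations (the approximate 95% rule), which is what option B matches. Note that the variance must first be converted to a standard deviation.

```python
import math

mean = 51.0
variance = 46.0
sd = math.sqrt(variance)        # about 6.78

# Approximate 95% interval: mean plus or minus two standard deviations
lower, upper = mean - 2 * sd, mean + 2 * sd
print(f"{lower:.1f} to {upper:.1f}")
```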

- A difference between the population standard deviation of the random variable X and the standard deviation of the sampling distribution of the sample mean is

- A) one is based upon the other.
- B) dependence on sample size.
- C) the possibility of sampling error.
- D) application to the t distribution.
- E) All the above. **

- Which probability distribution is appropriate for testing hypotheses concerning an unknown population mean when the sample variance is used to estimate the population variance?

- A) The normal distribution with mean $\mu$ and variance $\sigma^2$.
- B) The normal distribution with mean 0 and variance 1.
- C) The standard normal distribution.
- D) The t distribution with n-1 degrees of freedom. **
- E) None of the above.

__Note__: The next two questions are a pair.

- A random sample of bolts is taken from inventory and their length is measured. The average length in the sample is 5.3 inches, with a standard deviation of .2 inches. The sample size was 50. The point estimate for the mean length of all bolts in inventory is:

- A) 5.3 inches. **
- B) .2 inches.
- C) 4.908 to 5.692 inches.
- D) 5.3 inches plus or minus .2.

- In the previous question, a 95% confidence interval for the unknown population mean is closest to:

- A) 5.3 inches.
- B) 4.9 to 5.7 inches.
- C) 5.3 inches plus or minus .056. **
- D) 4.784 to 5.816 inches.
- E) None of the above.
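For the bolt pair, the point estimate is simply the sample mean. The confidence interval uses the standard error s/√n, not s itself. A sketch assuming the normal approximation is adequate with n = 50:

```python
import math
from statistics import NormalDist

sample_mean = 5.3    # inches
sample_sd = 0.2      # inches
n = 50

z = NormalDist().inv_cdf(0.975)            # about 1.96
half_width = z * sample_sd / math.sqrt(n)  # about 0.055
lower, upper = sample_mean - half_width, sample_mean + half_width
print(f"95% CI: {lower:.3f} to {upper:.3f} inches")
```

The half-width comes out at about .055, which is why option C (plus or minus .056) is the closest.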

- Which of the following statements about the probability of Type I and Type II error is __not__ correct?

- A) Type I error cannot occur if the null hypothesis is false.
- B) Type II error cannot occur if the null hypothesis is true.
- C) If the null hypothesis is true, the results of the test will either be a correct conclusion or a Type I error.
- D) It is not possible to specify both the probabilities of Type I and II errors since only one of them can occur. **

- A random sample of 100 drawn from a process has a mean of 500.52 and a standard deviation of 4.0. Estimate the probability that a sample of 100 would have a mean equal to or greater than 500.52 if the true population mean is really 500.0.

- A) about .4938.
- B) about .9032.
- C) about .0968. **
- D) about .4032.
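The calculation standardizes the observed sample mean using the standard error σ/√n = 0.4 and then takes the upper-tail area. A minimal sketch:

```python
import math
from statistics import NormalDist

mu_0 = 500.0      # hypothesized true process mean
x_bar = 500.52    # observed sample mean
sigma = 4.0       # standard deviation
n = 100

se = sigma / math.sqrt(n)          # 0.4
z = (x_bar - mu_0) / se            # 1.3
p = 1 - NormalDist().cdf(z)        # P[sample mean >= 500.52 | mu = 500]
print(f"z = {z:.2f}, p = {p:.4f}")
```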

- A random sample of employee files is drawn revealing an average of 2.8 overtime hours worked per week with a standard deviation of .7; the sample size is 500. The resulting 90% confidence interval is:

- A) 2.1 to 3.5.
- B) 6 to 3.5.
- C) 2.75 to 2.85. **
- D) 2.6 to 3.0.
- E) None of the above.
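A 90% interval uses z ≈ 1.645 (5% in each tail). With n = 500, the standard error is small, so the interval is narrow. A sketch under the normal approximation:

```python
import math
from statistics import NormalDist

mean_hours = 2.8
sd_hours = 0.7
n = 500

z = NormalDist().inv_cdf(0.95)               # about 1.645 for a two-sided 90% interval
half_width = z * sd_hours / math.sqrt(n)     # about 0.051
lower, upper = mean_hours - half_width, mean_hours + half_width
print(f"90% CI: {lower:.2f} to {upper:.2f} hours")
```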

- A hypothesis test requires a two-tailed critical region if the

- A) alternative hypothesis is one of the true mean being above some number.
- B) alternative hypothesis is one of the true mean being below some number.
- C) alternative hypothesis is one of the true mean not being equal to some number. **
- D) null hypothesis involves an inequality.

- In statistical hypothesis testing, the approach is to see whether you find sufficient evidence to reject the null hypothesis. This implies

- A) the null is framed such that its rejection confirms a conjecture.
- B) we can set the probability of rejecting a true null.
- C) we can set the probability of Type I error to our satisfaction.
- D) we seek to reject the null to confirm a belief.
- E) All the above. **

- A medical researcher has just calculated a correlation coefficient of zero for two particular random variables. Which of the following statements is most accurate?

- A) There is no significant linear difference between the two variables.
- B) There is no significant relationship between the two variables.
- C) There is no significant linear relationship between the two variables. **
- D) There is a significant linear relationship between the two variables.

- The correlation coefficient is an extremely important descriptive statistic because

- A) It provides a unit-free measure of how two random variables move together.
- B) It provides a measure of the linear association between a pair of random variables.
- C) It provides the forecaster with a diagnostic tool of when regression analysis is appropriate for the business-forecasting problem.
- D) All the above. **

- A large sample of X-Y data values is analyzed and reveals a correlation coefficient of -.88. Which statement is correct?

- A) If r had been +.88, the correlation would have been much stronger.
- B) The correlation is weak because r is less than -1.
- C) A fairly strong negative linear relationship exists. **
- D) A weak negative relationship exists.

- Which of the following is __not__ true regarding the Central Limit Theorem?

- A) For a sufficiently large sample, the sampling distribution is approximately normal.
- B) The sampling distribution converges to the normal distribution as the sample size increases.
- C) Regardless of the population distribution from which the sample is drawn, if the sample size is sufficiently large, the normal curve can be used.
- D) Regardless of the sample size used, the normal curve can be used as the sampling distribution. **

- Suppose two random variables X and Y are related as follows: $Y = 1/X^2$. The population Pearson correlation coefficient should be:

- A) +1.
- B) **
- C) -1.
- D) .5.
- E) None of the above.

- Which functions are __not__ appropriate for use of the Pearson correlation coefficient to estimate the correlation between a pair of random variables?

- A) Cubic polynomials.
- B) Quadratic polynomials.
- C) Higher-order polynomials.
- D) Functions involving a variable raised to the one-half power.
- E) Reciprocal functions.
- F) All the above. **

- The sampling distribution of the sample Pearson correlation coefficient

- A) has a mean of zero.
- B) has a mean of r. **
- C) is biased.
- D) has a mean of r.
- E) None of the above.

- If we were to know the true population correlation, confidence intervals for the population correlation can be constructed using the _____ distribution.

- A) t distribution.
- B) standard normal distribution. **
- C) chi-square distribution.
- D) F distribution.
- E) All the above.

- If the scatterplot of two variables has a circular pattern, this suggests the two variables have a population correlation coefficient of

- A) -1.
- B) -.5.
- C) 0. **
- D) +.5.
- E) +1.

- Which of the following is __not__ used to calculate the sample Pearson correlation coefficient for the variables X and Y?

- A) Sample mean of X.
- B) Sample mean of Y.
- C) Sample covariance of X and Y.
- D) Sample standard deviation of X.
- E) All the above are used to calculate correlation coefficients. **

- Which of the following is __not__ a benefit of a scatter diagram?

- A) The nature of the X-Y relationship (linear or nonlinear) may be revealed.
- B) The strength of the relationship may be revealed.
- C) The sign of the correlation coefficient will be revealed.
- D) Displaying the population size. **

- In order to conduct a correlation analysis, the collected data must be:

- A) Related to the real world.
- B) **
- C) Categorical.
- D) Highly correlated.
- E) All the above.

- Which of the following is __not__ a reason for testing whether the population correlation coefficient is zero?

- A) To see if r and rho (ρ) are equal. **
- B) To make inference from sample to population.
- C) To bring sample size into the analysis.
- D) To determine if a significant X-Y relationship exists.
- E) All the above are correct.

- Suppose the sample Pearson correlation coefficient (r) is estimated to be .75 with a sample size of 35. The correct calculated value of the test statistic for a null of zero correlation is:

- A) 6.5. **
- B) 6.
- C) 1.
- D) 5
- E) None of the above.

- When testing the null hypothesis that the population correlation between a pair of variables is zero

- A) the normal sampling distribution is used.
- B) the chi-square distribution is used.
- C) the standard normal distribution is used.
- D) The t distribution is used for small samples. **

- For a sample of 15 X-Y data values, the sample correlation coefficient was estimated at -.63. The calculated t value for a null of zero correlation is:

- A) 92.
- B) 92.
- C) -2.92. **
- D) -1.92.
- E) None of the above.
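The two preceding test-statistic questions both rely on the usual statistic for a zero population correlation, t = r√(n−2)/√(1−r²), which has n − 2 degrees of freedom under the null. This is a minimal sketch; it assumes this is the statistic the text intends (it reproduces the -2.92 answer).

```python
import math

def corr_t_stat(r: float, n: int) -> float:
    """t statistic for H0: rho = 0; it has n - 2 degrees of freedom under H0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t_a = corr_t_stat(0.75, 35)    # about 6.51 (33 df)
t_b = corr_t_stat(-0.63, 15)   # about -2.92 (13 df)
print(f"{t_a:.2f} {t_b:.2f}")
```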

- When the correlation coefficient is negative, it means:

- A) there is a weak relationship.
- B) when X goes down, Y does too.
- C) X will not be a good predictor of Y.
- D) when X goes down, Y tends to go up. **
- E) None of the above.

- When forecasting with time-series data, it is highly recommended to test for the presence of a trend in the data. Testing for trend at the 10% level of significance

- A) can be accomplished by use of a standard 95% correlogram.
- B) requires use of the standard normal probability distribution.
- C) can be accomplished by comparing the estimated autocorrelation coefficient with the number 2 divided by the square root of sample size.
- D) requires use of the t distribution. **

- Quarterly time-series data with a trend can be applied to models that assume stationary data by

- A) Averaging the data over time.
- B) Taking the first difference of the original series. **
- C) Taking the fourth difference of the original series.
- D) Using a moving average.

- Which of the following is __not__ consistent with the presence of a trend in a time series?

- A) The autocorrelation function declines quickly to zero as the lag increases. **
- B) The autocorrelation function of the first-differences declines quickly to zero as the lag increases.
- C) The autocorrelation function declines slowly towards zero as the lag increases.
- D) The autocorrelation function of the first-differences quickly declines to zero.

- Serial correlation refers to the correlation between a variable and:

- A)
- B) another very similar variable.
- C) itself when lagged one or more periods. **
- D) another variable when the analysis is done on a computer.
- E) None of the above.

- Which of the following is appropriate for testing the null hypothesis of zero autocorrelation at lag k at the approximate 95 percent level?

- A) Reject null if $|r_k| > 1/n$.
- B) Reject null if $|r_k| > 2/n$.
- C) Reject null if $|r_k| < 1/n$.
- D) Reject null if $|r_k| < 2/n$.
- E) None of the above. **

- A time series whose 24-quarter lag correlogram shows no tendency to diminish towards zero can be said to

- A) have a trend term.
- B) be nonstationary.
- C) have a long memory.
- D) be serially correlated.
- E) All the above. **

- Which of the following null hypotheses is __not__ consistent with a test for data seasonality?

- A) $H_0: \rho_4 = \rho_8 = 0$ for quarterly data.
- B) $H_0: \rho_{12} = \rho_{24} = 0$ for monthly data.
- C) $H_0: \rho_1 = \rho_2 = 0$ for annual data. **
- D) $H_0: \rho_7 = 0$ for weekly data.
- E) All the above.

**ESSAY/PROBLEM EXAM QUESTIONS**

- Volatility in exchange rates leads speculators to bet on market trends in foreign exchange markets. To examine this investment opportunity, data were obtained from 1980M1 through 1990M12 on the Japanese yen/US dollar exchange rate, defined as the number of yen per US dollar.

The data were converted to monthly percentage changes by first-differencing the logarithms of the levels. Treating our data as a population, we then calculated the monthly mean and variance of the percentage change in the exchange rate. We found that the percentage change in the (yen/$) exchange rate is normally distributed with mean -.004386 and variance .00093585.

What is the probability that, in any given month, we observe a 5% or more increase in the exchange rate? Formally, find: P[%(yen/$) ≥ .05].

ANSWER: The standard deviation is $\sqrt{.00093585} = .03059$, so the appropriate Z value for a .05 change is:

$$Z = \frac{.05 - (-.004386)}{.03059} = 1.78.$$

Using Table 2-4, the probability can be found as follows:

$$P[Z \ge 1.78] = P[Z \ge 0] - P[0 \le Z < 1.78] = .5 - .4625 = .0375.$$

Accordingly, there is about a 4% chance of a 5% or more increase in the exchange rate in a given month.
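The same probability can be computed directly from the normal CDF instead of a printed table. A short sketch using the population mean and variance given in the question:

```python
import math
from statistics import NormalDist

mean = -0.004386        # mean monthly % change in yen/$
variance = 0.00093585
sd = math.sqrt(variance)   # about 0.03059

pct_change = NormalDist(mu=mean, sigma=sd)
p = 1 - pct_change.cdf(0.05)   # P[% change >= .05]
print(f"P = {p:.4f}")
```

The exact CDF gives about .0377. The table answer of .0375 differs only because Z is rounded to 1.78 before the table is read.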

- Prove that, when sampling from a normally distributed population, the sample mean is an unbiased estimator of the population mean.

ANSWER: Suppose we draw a random sample from a random variable X having a normal distribution with mean $\mu$ and variance $\sigma^2$. If the population is large, we can assume that each drawing $X_i$ is independently and identically normally distributed with mean $\mu$ and variance $\sigma^2$.

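To show that the sample mean is an unbiased estimator, we need to find the expected value of the sample mean:

$$E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}(n\mu) = \mu.$$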

Accordingly, the sample mean is an unbiased estimator of the population mean when sampling from a large normal population. This implies that, when using the sample mean to make inference about the population mean, we are correct on average.

- Prove that the sum of deviations about the sample mean is zero.

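ANSWER: The question asks you to show that, for a sample of size n, the sum of deviations about the sample mean is zero:

$$\sum_{i=1}^{n} (X_i - \bar{X}) = 0.$$

Using the definition of the sample mean, $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, this can easily be shown:

$$\sum_{i=1}^{n} (X_i - \bar{X}) = \sum_{i=1}^{n} X_i - n\bar{X} = \sum_{i=1}^{n} X_i - \sum_{i=1}^{n} X_i = 0.$$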
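The result can also be illustrated numerically (this is an illustration, not a proof). The sketch uses exact rational arithmetic via `fractions.Fraction` so that floating-point rounding cannot hide the exact zero. The data are the nine foreign bond yields from the 1985 exercise in this chapter.

```python
from fractions import Fraction

# Nine foreign government bond yields from the 1985 exercise in this chapter
yields = [Fraction(v) for v in
          ["11.04", "6.34", "10.94", "13.00", "7.34", "13.09", "4.78", "10.62", "6.87"]]

mean = sum(yields) / len(yields)
total_deviation = sum(y - mean for y in yields)
print(total_deviation)   # 0
```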

- A random sample of twelve automobiles showed the following figures for miles achieved on a gallon of gas. Assume the population distribution is normal. From the data:

(a) Find the expected miles per gallon of gas.

ANSWER: Sample mean = 232.9/12 = 19.4 miles per gallon.

(b) Find the sample variance.

ANSWER: 13.2867/11 = 1.208 is the estimated sample variance.

Note that we divide the sum of squared deviations by 11, not 12. This is because we are using the sample mean to calculate deviations.

(c) Find an approximate 95% confidence interval for the population mean.

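ANSWER: The approximate 95% confidence interval extends two standard errors (standard deviations of the sampling distribution of the sample mean) on either side of the sample mean:

$$P\left[\bar{X} - 2\frac{s}{\sqrt{n}} < \mu < \bar{X} + 2\frac{s}{\sqrt{n}}\right] \approx .95.$$

After substituting in the appropriate estimates:

$$P\left[19.4 - 2\sqrt{\frac{1.208}{12}} < \mu < 19.4 + 2\sqrt{\frac{1.208}{12}}\right] \approx .95,$$

we can solve for the confidence interval:

$$P[18.765 < \mu < 20.035] = .95.$$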
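The twelve individual mileage figures are not reproduced here, so this sketch works from the summary quantities in the answers (the sum 232.9 and the sum of squared deviations 13.2867). It carries the unrounded mean of about 19.41, so its interval (roughly 18.77 to 20.04) differs slightly from the one obtained by rounding the mean to 19.4.

```python
import math

n = 12
total_mpg = 232.9          # sum of the twelve readings (from part a)
sum_sq_dev = 13.2867       # sum of squared deviations about the mean (from part b)

mean = total_mpg / n                     # about 19.41
variance = sum_sq_dev / (n - 1)          # about 1.208; n - 1 because the sample mean is used
se = math.sqrt(variance / n)             # standard error of the sample mean, about 0.317
lower, upper = mean - 2 * se, mean + 2 * se
print(f"approximate 95% CI: {lower:.2f} to {upper:.2f} mpg")
```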

- In 1985, the government bond yield in the United States was 10.62 percent. A random sample of government bond yields in nine foreign countries was:

11.04, 6.34, 10.94, 13.00, 7.34, 13.09, 4.78, 10.62, 6.87.

The mean foreign bond yield was 9.34 with variance 9.31. Assume that government bond yields are normally distributed.

At the 5 percent level of significance, test whether the government bond yields in the rest of the world during 1985 were lower than in the United States.

ANSWER: Following the four steps of hypothesis testing we have:

Step #1: __Hypotheses__: We formulate the null so that its rejection verifies our assertion:

$$H_0: \mu \ge 10.62, \qquad H_A: \mu < 10.62.$$

Step #2: __Test Statistic__: Inference about an unknown population mean when the variance is unknown requires use of the t distribution:

$$t_{n-1} = \frac{\bar{X} - \mu}{s/\sqrt{n}}.$$

Step #3: __Critical Region__: Since we have a one-tailed alternative, the critical region lies in the negative tail of the sampling distribution. In addition, since our sample size is 9 and the level of significance is 5%, we have $P(t_8 \le t_{8,.05}) = .05$, hence $t_{8,.05} = -1.86$.

Step #4: __Decision Rule__: Reject $H_0$ if $t_8 < -1.86$.

Inserting the values of the sample mean and standard error, we have:

$$t_8 = \frac{9.34 - 10.62}{\sqrt{9.31/9}} = \frac{-1.28}{1.017} \approx -1.26.$$

Since -1.26 does not fall below -1.86, we cannot reject the null and conclude that government bond yields in the rest of the world were **not** significantly lower than in the United States.
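The same one-tailed t test can be run directly on the nine yields. This sketch takes the critical value -1.86 from the text rather than computing a t quantile, because the standard library has no t distribution.

```python
import math
from statistics import mean, stdev

us_yield = 10.62   # 1985 U.S. government bond yield (percent)
foreign = [11.04, 6.34, 10.94, 13.00, 7.34, 13.09, 4.78, 10.62, 6.87]

n = len(foreign)
x_bar = mean(foreign)                       # about 9.34
s = stdev(foreign)                          # sample SD (n - 1 divisor), about 3.05
t = (x_bar - us_yield) / (s / math.sqrt(n))

t_crit = -1.86     # lower-tail 5% critical value with 8 df, as given in the text
reject_h0 = t < t_crit
print(f"t = {t:.2f}, reject H0: {reject_h0}")
```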

- WABC radio is the home of a popular morning talk show called “Imus in the Morning,” featuring talk show host Don Imus. WABC was concerned that many listeners found Don Imus too offensive. Accordingly, WABC conducted a market research survey in which a sample of listeners was asked their opinion of the show on a scale of 1 to 10, with 10 being the most favorable ranking. The mean response for a sample of 400 people was 7.25 with a sample standard deviation of 2.51. Using this same survey technique, the Howard Stern Show had a mean rating of 7.85.

Does Don Imus have statistically the *same* approval rating as Howard Stern? Test this assertion using a 5 percent level of significance.

ANSWER: Formulate the null hypothesis that Imus’s mean rating equals Stern’s:

$$H_0: \mu = 7.85, \qquad H_A: \mu \ne 7.85.$$

Since both the population mean and variance are unknown, we apply the t distribution with 399 degrees of freedom. The two-tailed critical region at the 5 percent level is approximately:

$$|t_{399}| > 1.97.$$

The calculated value of the t-statistic is:

$$t_{399} = \frac{7.25 - 7.85}{2.51/\sqrt{400}} = \frac{-.60}{.1255} \approx -4.78.$$

Since this lies in the critical region, we reject the null hypothesis and conclude that Imus’s approval rating is statistically *different* from Stern’s. (Note that we would get the same result using a one-tailed alternative.)
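A minimal sketch of the Imus test statistic. The critical value 1.97 is an assumed approximation of the two-tailed 5% t value with 399 degrees of freedom (very close to the normal value of 1.96), since the standard library offers no t quantile function.

```python
import math

mu_0 = 7.85     # Howard Stern's mean rating (null value)
x_bar = 7.25    # Imus sample mean
s = 2.51        # sample standard deviation
n = 400

t = (x_bar - mu_0) / (s / math.sqrt(n))   # -0.60 / 0.1255, about -4.78
t_crit = 1.97   # approximate two-tailed 5% critical value of t with 399 df
reject_h0 = abs(t) > t_crit
print(f"t = {t:.2f}, reject H0: {reject_h0}")
```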

- Private housing starts are considered leading indicators of future economic activity. Using monthly data on private housing starts over the period 1959M1-1997M4, the estimated correlograms for a 24-month lag structure are reported below.

[Figure: Plot of Autocorrelation (+) and Partial Autocorrelation (*) for private housing starts, lags 1 to 24; horizontal axis from -1.0 to 1.0.]

Using the approximate 95% rule, do private housing starts data exhibit seasonal variation? Explain your inferences clearly and note what hypotheses you are testing in your answer.

ANSWER: Seasonality refers to variation in the level of a series that occurs at the same time each year. Accordingly, observations for the same season should be correlated from one year to the next.

To test for seasonality we can use estimated autocorrelation coefficients under the following null hypothesis for monthly data:

$H_0: \rho_k = 0$ for $k = 12$ and $k = 24$.

Our data series on private housing starts has 460 observations. Accordingly, using the approximate 95% rule, the absolute critical value of $r_k$ is:

$$\frac{2}{\sqrt{n}} = \frac{2}{\sqrt{460}} \approx .0932.$$

Since the estimated values of $r_{12}$ and $r_{24}$ are above the critical value of .0932, we reject the null of zero autocorrelation at lags of 12 and 24 months. Accordingly, there is sufficient evidence to conclude that the data are seasonal. Indeed, as shown in the correlogram, the seasonal patterns are quite significant.
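The approximate 95% rule compares each sample autocorrelation, r_k = Σ(x_t − x̄)(x_{t+k} − x̄) / Σ(x_t − x̄)², with 2/√n. The sketch below applies it to a hypothetical monthly series with a pure 12-month cycle (not the housing-starts data, which are not reproduced here). The helper names `autocorr` and `critical_value` are illustrative.

```python
import math

def autocorr(x, k):
    """Sample autocorrelation r_k of series x at lag k."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t + k] - m) for t in range(n - k))
    return num / denom

def critical_value(n):
    """Approximate 95% bound on |r_k| under H0: rho_k = 0."""
    return 2 / math.sqrt(n)

# Hypothetical 10 years of monthly data with a 12-month seasonal cycle
series = [10 + 5 * math.sin(2 * math.pi * t / 12) for t in range(120)]
bound = critical_value(len(series))
for k in (12, 24):
    r_k = autocorr(series, k)
    print(f"r_{k} = {r_k:.3f}, |r_k| > {bound:.3f}: {abs(r_k) > bound}")
```

With n = 460, the same bound gives roughly .093. For the 205-observation GDP series discussed next, it gives 2/√205 ≈ .14.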

- Gross Domestic Product (GDP) is a measure of the current dollar value of all goods and services produced in the United States. It is essentially a base measure of national income and is at the center of most debates about the well-being of the U.S. economy. Accordingly, GDP is forecast by many agencies, including the Office of Management and Budget, the Federal Reserve Board, the Department of Labor, and the Department of Commerce.

A necessary first step in preliminary model selection is to examine whether the data are stationary.

To examine this statistically, quarterly seasonally adjusted GDP data were used to estimate the following correlogram with a two-year (eight-quarter) lag.

[Figure: Plot of Autocorrelation (+) and Partial Autocorrelation (*) for GDP, lags 1 to 8; horizontal axis from -1.0 to 1.0.]

- a) Is GDP stationary? Explain using the approximate 95% rule.

ANSWER: Stationarity refers to the absence of any long-term trend in the level of a time series. It can be examined using the estimated autocorrelation coefficients at successive lags. Being able to reject the null

$H_0: r_k = 0$ for each lag $k$ within one to two years

implies that the time series is nonstationary, because its autocorrelations remain significant instead of dying out quickly.

Our data series on GDP has 205 observations. Accordingly, using the approximate 95% rule, the absolute critical value of $r_k$ is:

$$\frac{2}{\sqrt{n}} = \frac{2}{\sqrt{205}} \approx 0.140$$

Since all of the estimated coefficients exceed this critical value, we can reject each individual null of zero autocorrelation, and conclude that the data are nonstationary.

- b) As a further examination of the behavior of GDP, we next first-differenced the data to remove any trend. The resulting series is DGDP; its correlogram for lags of up to two years is reported below.

[Figure: Plot of Autocorrelation (+) and Partial Autocorrelation (*) for DGDP, lags 1 to 8, on a scale from -1.0 to 1.0]

Are the first differences of GDP (DGDP) stationary? Explain.

ANSWER: The first differences of GDP (DGDP) are not stationary, as shown by our ability to reject the null:

$H_0: r_k = 0$ for $k = 1, 2, \dots, 8$.

Accordingly, the first-differenced series still has a trend. This is consistent with a nonlinear trend in the levels, and the series may require further differencing.

Specifically, a correlogram of the second differences (the differences of the first-differenced series) is reported below.

[Figure: Plot of Autocorrelation (+) and Partial Autocorrelation (*) for the second differences of GDP, lags 1 to 8, on a scale from -1.0 to 1.0]

As shown in the correlogram, this series is now stationary. Students should note that, while most economic series are first-difference stationary, a few require second differencing to remove the trend.
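The differencing logic can be illustrated with a short sketch. The series below is hypothetical (a quadratic trend, not GDP); it shows why a nonlinear trend survives first differencing but disappears after second differencing. The helper name `difference` is illustrative.

```python
def difference(x: list[float], order: int = 1) -> list[float]:
    """Apply the first-difference operator (x_t - x_{t-1}) `order` times."""
    for _ in range(order):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x

# Hypothetical levels with a quadratic (nonlinear) trend.
levels = [5 + 3 * t + 0.5 * t ** 2 for t in range(10)]
first = difference(levels, 1)   # still rising each period: a trend remains
second = difference(levels, 2)  # constant: the trend has been removed
print(first)
print(second)
```

Each differencing pass shortens the series by one observation, which is one reason not to difference more often than the data require.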

- Stationarity refers to whether or not a time series has an upward or downward trend over time. To examine this issue with regard to stock prices, monthly data on the Standard and Poor’s 500 composite were obtained for the period 1947M1 through 1997M4. The following correlogram was estimated for lags of up to 36 months.

[Figure: Plot of Autocorrelation (+) and Partial Autocorrelation (*) for the S&P 500 composite, lags 1 to 36, on a scale from -1.0 to 1.0]

Using the correlogram, are stock prices stationary? Explain.

ANSWER: Since we can reject the hypothesis that the autocorrelations are zero at every lag up to three years, stock prices are clearly nonstationary. Specifically, using the correlogram, we can reject the following null at the 5% significance level:

$H_0: r_k = 0$ for $k = 1, \dots, 36$.

This is why financial analysts think in terms of rates of return, which allow stock price changes to be compared over time and across stocks.

- Explain why a 95% confidence interval for a population parameter is wider than a 90% confidence interval for that parameter based on the same information.

ANSWER: This is a general result. Based on the same information, the greater the probability content, the wider the confidence interval for any population parameter. This is to be expected: the surer we want to be that a computed interval will contain the parameter, the wider the interval must be.
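For a normal-based interval, the half-width is the critical value times the standard error, so with the same standard error the interval widens exactly as the critical value grows. A minimal sketch using Python's standard `statistics.NormalDist`; the helper name `z_critical` and the unit standard error are illustrative assumptions.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-sided standard normal critical value z_{alpha/2} for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

standard_error = 1.0  # same information (same standard error) for both intervals
for level in (0.90, 0.95):
    half_width = z_critical(level) * standard_error
    print(f"{level:.0%} interval: estimate +/- {half_width:.3f}")
```

The 90% critical value is about 1.645 and the 95% value about 1.960, so the 95% interval is roughly 19% wider. The t-based intervals used below behave the same way.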

- Daimler-Chrysler Motor Company is testing a new engine for miles per gallon (MPG). Based upon testing under normal conditions for 100,000 miles, the following sample mileages were obtained: 30.7, 31.8, 30.2, 32.0, and 31.3.

- a) Find a point estimate for mean MPG.

ANSWER: The point estimate is simply the sample mean, 31.2 MPG.

- b) What is the sample variance of MPG?

ANSWER: Summing the squared deviations about the sample mean (2.26) and dividing by the sample size minus one (4) gives a sample variance of .565.

- c) Suppose Chrysler wanted to advertise the new engine as obtaining at least 30 miles per gallon under normal driving conditions. Can it proceed with the advertising campaign at the 95% level of confidence?

ANSWER: Perhaps the easiest way to examine this is with a 95% confidence interval. Assuming mileage is normally distributed, the formula for an exact 95% confidence interval is:

$$\bar{X} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$

Given our extremely small sample size of 5, the appropriate t-value for $\alpha = .05$ (two-tailed, with $n - 1 = 4$ degrees of freedom) is 2.776.

Plugging in the sample mean and the sample standard deviation ($s = \sqrt{.565} \approx .752$), our confidence interval becomes:

$$31.2 \pm 2.776 \times \frac{.752}{\sqrt{5}} = 31.2 \pm .93$$

Accordingly, our 95% confidence interval is 30.3 to 32.1 MPG. Since the lower bound for this interval is above 30 MPG, we can be 95% confident that the true mean MPG exceeds 30 MPG.

Chrysler can go ahead with the advertising campaign.
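The MPG calculations can be reproduced with Python's standard library. Because `statistics` has no t-distribution quantile, the critical value 2.776 is taken from the text; with SciPy installed, `scipy.stats.t.ppf(0.975, 4)` should return the same value.

```python
import math
from statistics import mean, variance

mpg = [30.7, 31.8, 30.2, 32.0, 31.3]
n = len(mpg)
x_bar = mean(mpg)        # point estimate of mean MPG
s2 = variance(mpg)       # sample variance, divisor n - 1
s = math.sqrt(s2)        # sample standard deviation
se = s / math.sqrt(n)    # standard error of the mean
t_crit = 2.776           # t value for alpha = .05, 4 degrees of freedom (from the text)
lower, upper = x_bar - t_crit * se, x_bar + t_crit * se
print(f"mean={x_bar:.1f} var={s2:.3f} CI=({lower:.1f}, {upper:.1f})")
```

It prints a mean of 31.2, a variance of 0.565, and an interval of 30.3 to 32.1, matching the answers above.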

**Chapter 9**

**Multiple Choice**

*Identify the choice that best completes the statement or answers the question.*

Decile-Wise

A data mining routine has been applied to a transaction dataset and has classified 88 records as fraudulent (30 correctly so) and 952 as nonfraudulent (920 correctly so).

The decile-wise lift chart for a transaction data model:

____ 1. Consider the decile-wise lift chart above. Interpret the meaning of the first and second bars from the left.

a. | The first variable in the model is more predictive than the second variable. |

b. | These bars are never interpreted for the validation dataset; they are only interpreted for the training dataset. |

c. | Since only two bars rise above unity, the model exhibits little explanatory power. |

d. | The first two bars show that this model outperforms a random assignment. |

____ 2. Consider the decile-wise lift chart above. An analyst comments that you could improve the accuracy of the model by classifying everything as nonfraudulent. What will happen to the error rate if you follow her advice?

a. | The error rate will increase. |

b. | The error rate will decrease. |

c. | The change in the error rate cannot be determined. |

d. | The error rate will arbitrarily change. |
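The arithmetic behind question 2 can be checked directly from the counts given for the transaction data. A minimal sketch; the variable names are illustrative.

```python
# Counts from the text: 88 records classified fraudulent (30 correctly),
# 952 classified nonfraudulent (920 correctly).
tp = 30          # fraud classified as fraud
fp = 88 - 30     # nonfraud classified as fraud
tn = 920         # nonfraud classified as nonfraud
fn = 952 - 920   # fraud classified as nonfraud
total = tp + fp + tn + fn

model_error = (fp + fn) / total
# "Classify everything as nonfraudulent": every actual fraud becomes an error.
all_nonfraud_error = (tp + fn) / total
print(f"model error: {model_error:.2%}, all-nonfraud error: {all_nonfraud_error:.2%}")
```

The model misclassifies 90 of 1,040 records (about 8.65%), while labeling everything nonfraudulent misclassifies only the 62 actual frauds (about 5.96%). The lower error rate comes at the cost of catching no fraud at all, which is why the error rate alone is a poor yardstick when one class is rare.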

____ 3. Which of the following situations represents the confusion matrix for the transactions data mentioned above?

a. | A |

b. | B |

c. | C |

d. | D |

____ 4. What is the classification error rate for the following confusion matrix?

a. | 2.2% |

b. | 0.82% |

c. | 10% |

d. | 0.21% |

e. | Impossible to determine from information given. |

____ 5. Consider the Toyota Corolla data below:

Which variable is a dummy variable?

a. | Fuel_Type |

b. | Color_Black |

c. | KM |

d. | HP |

____ 6. Which of the variables below (from the Toyota Corolla dataset) is a categorical variable?

a. | Fuel_Type |

b. | Color_Black |

c. | KM |

d. | HP |

Flight Delays Data (Naive Bayes Model)

N.B.

**Success = 1 = Delayed**

**Failure = 0 = Ontime**

____ 7. Using the Flight Delays output above, computed with a Naive Bayes model, calculate the on-time probability for the following flight:

Carrier = DL

Day of Week = 7

Departure Time = 1000 – 1059

Destination = LGA

Origin = DCA

Weather = 0

a. | 87% |

b. | 92% |

c. | 95% |

d. | 97% |

e. | 99% |
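The Naive Bayes calculation asked for in question 7 multiplies each class prior by the conditional probability of every observed attribute value and then normalizes across classes. A minimal sketch, assuming the priors and conditional probabilities are read off a table like the Flight Delays output; the numbers below are hypothetical placeholders, not values from that output, so they will not reproduce any of the answer choices.

```python
from math import prod

def naive_bayes_posterior(priors: dict[str, float],
                          likelihoods: dict[str, list[float]]) -> dict[str, float]:
    """Posterior class probabilities under the naive (conditional independence) assumption.

    priors[c] is P(class = c); likelihoods[c] lists P(attribute = observed value | class = c)
    for each attribute of the record being scored.
    """
    # Numerator of Bayes' rule: prior times the product of the conditional probabilities.
    scores = {c: priors[c] * prod(likelihoods[c]) for c in priors}
    total = sum(scores.values())
    # Normalize so the class probabilities sum to one.
    return {c: s / total for c, s in scores.items()}

# HYPOTHETICAL values for six attributes (carrier, day, departure time,
# destination, origin, weather); NOT taken from the Flight Delays output.
priors = {"ontime": 0.8, "delayed": 0.2}
likelihoods = {
    "ontime":  [0.5, 0.2, 0.1, 0.3, 0.6, 1.0],
    "delayed": [0.4, 0.1, 0.1, 0.3, 0.5, 0.9],
}
post = naive_bayes_posterior(priors, likelihoods)
print(post)
```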

____ 8. Consider the following confusion matrix.

How much better did this data mining technique do compared with a naive model?

a. | no better than a naive model. |

b. | 1.2% better than a naive model. |

c. | 5.6% better than a naive model. |

d. | 7.8% better than a naive model. |

e. | 10.1% better than a naive model. |

____ 9. “Bayesian Probability” as used in the Naive Bayes Model

a. | uses naive probabilities to estimate class probabilities. |

b. | uses only a single classifying variable to estimate the class probabilities. |

c. | uses simple probabilities instead of conditional probabilities. |

d. | uses derived probabilities to obtain class probabilities. |

____ 10. “Overfitting” refers to

a. | estimating a model that explains the data points perfectly and leaves no error but that is unlikely to be accurate in prediction. |

b. | using too many independent variables or classifiers in a model. |

c. | the process used to test data mining models for accuracy. |

d. | the estimation or scoring of new data. |

____ 11. How does a “k-nearest neighbor” model work?

a. | It uses conditional probabilities to estimate the prior probability of interest. |

b. | It uses geometric distances from observations in the data to select a class for an unknown. |

c. | It uses a dichotomous dependent variable estimated with any type of independent variable. |

d. | It is based upon the concept of algorithmic minimization. |

____ 12. A “training data set” is

a. | used to compare models and pick the best one. |

b. | used to build various models of interest. |

c. | used to assess the performance of the chosen model with new data. |

____ 13. A “validation data set” is

a. | used to compare models and pick the best one. |

b. | used to build various models of interest. |

c. | used to assess the performance of the chosen model with new data. |
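Partitioning into training, validation, and test sets is usually done by random assignment. A minimal sketch; the 60/20/20 split, the seed, and the `partition` helper are illustrative assumptions, not defaults of any particular package.

```python
import random

def partition(records, train=0.6, valid=0.2, seed=12345):
    """Randomly split records into training, validation, and test sets."""
    rows = list(records)
    random.Random(seed).shuffle(rows)              # reproducible random order
    n_train = round(train * len(rows))
    n_valid = round(valid * len(rows))
    training = rows[:n_train]                      # used to build candidate models
    validation = rows[n_train:n_train + n_valid]   # used to compare models and pick one
    test = rows[n_train + n_valid:]                # used once to assess the chosen model
    return training, validation, test

training, validation, test = partition(range(100))
print(len(training), len(validation), len(test))
```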

Logistic Regression

The following output is a **Logistic Regression** coefficient table for the UniversalBank data. The dichotomous “Y” variable is **Loan Offer** (success = 1). The multiple $R^2$ for this logistic regression is reported as 0.6544.

____ 14. For the Logistic Regression Model above, the positive coefficients for the dummy variables **CD Account**, *EducGrad*, and *EducProf*

a. | are associated with higher probabilities of accepting the loan offer. |

b. | are insignificant because of their p-values and therefore irrelevant. |

c. | have Odds that are too high to be considered relevant. |

d. | are proved to be causally related to the loan offer variable. |

____ 15. Consider the Logistic Regression Model above for the UniversalBank data. The coefficient on the continuous variable **Income** means that

a. | Income is causally related to the loan offer variable. |

b. | Income is irrelevant because of its p-value. |

c. | higher values of Income are associated with greater probability of accepting the loan offer. |

d. | Income is likely not associated with the loan offer variable. |

____ 16. For the Logistic Regression above using the UniversalBank data, the $R^2$ reported by XLMiner™ was 0.6544. The lift chart was given as:

a. | Neither the lift chart nor the $R^2$ indicate a high degree of confidence in the model. |

b. | Both the lift chart and the $R^2$ indicate a high degree of confidence in the model. |

c. | The lift chart indicates high confidence in the model but the $R^2$ is at odds with this conclusion. |

d. | Because only a single bar of the decile-wise lift chart is above 1, there is little confidence in the model. |

____ 17. Consider the Logistic Regression Model above for the UniversalBank data. Which variable or variables appear to be insignificant?

a. | Only Age. |

b. | Age and Experience. |

c. | Income, CD Account, EducGrad, and EducProf. |

d. | All variables with “odds” less than zero. |

____ 18. Consider the Logistic Regression Model above for the UniversalBank data. Which of the following statements about collinearity is correct?

a. | Strong collinearity can lead to problems with the model. |

b. | Strong correlation among the independent variables is not a difficulty when using Logit. |

c. | The Logit Model automatically adjusts for collinearity. |

d. | None of the above are correct. |

RidingLawnmower Problem

____ 19. Consider the RidingLawnmower data above and the K-Nearest Neighbor Model results shown. Which statement about the optimal value of k is correct?

a. | The optimal value of k was 8 because there was an almost even split between owners and non-owners. |

b. | The optimal value of k should always be less than the number of independent variables. |

c. | The optimal value of k is the number of “neighbors” the model has chosen to poll when selecting a category choice. |

d. | The optimal value of k is irrelevant since we most often let k=1. |

____ 20. Examine the RidingLawnmower data above. Consider a new household with $60,000 income and a lot size of 20,000 square feet. Using k=1, would you classify this household as an owner or non-owner?

a. | Owner |

b. | Non-owner |

c. | Impossible to tell |

____ 21. Consider the RidingLawnmower data above. Consider a new household with $60,000 income and a lot size of 20,000 square feet. Using k=3, would you classify this household as an owner or non-owner?

a. | Owner |

b. | Non-owner |

c. | Impossible to tell |

____ 22. Consider the RidingLawnmower data above. Why would the model choose a higher value of k than *k*=1?

a. | The model will rarely choose higher values of k unless there is collinearity in the independent variables. |

b. | The model only chooses higher values of k when the dataset is large. |

c. | The choice of k is made by the researcher alone and not the software. |

d. | Higher values of k provide smoothing that reduces the risk of overfitting due to noise in the training data. |
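A k-nearest-neighbor classification like the one in questions 20 and 21 can be sketched as a majority vote among the k closest training points. The six households below are hypothetical (income in thousands of dollars, lot size in thousands of square feet), not the RidingLawnmower data, and the features are left unscaled for simplicity, although in practice they are usually normalized first. The `knn_classify` helper is illustrative.

```python
import math
from collections import Counter

def knn_classify(train, new_point, k):
    """Classify new_point by majority vote of its k nearest (Euclidean) training points.

    train is a list of ((feature1, feature2, ...), label) pairs.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], new_point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical households: ((income, lot size), label), both in thousands.
train = [((50, 22), "owner"), ((70, 20), "owner"), ((62, 24), "owner"),
         ((45, 16), "nonowner"), ((58, 17), "nonowner"), ((75, 15), "nonowner")]
new_household = (60, 20)
print(knn_classify(train, new_household, k=1))
print(knn_classify(train, new_household, k=3))
```

Here k = 1 follows the single closest household (a non-owner), while k = 3 lets two nearby owners outvote it; that smoothing effect is what question 22 is about.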

____ 23.

The diagram above represents which data mining technique?

a. | K-nearest-neighbor |

b. | Regression tree |

c. | Naive Bayes |

d. | Logit |

____ 24.

The above diagram represents what data mining classification scheme?

a. | K-nearest-neighbor |

b. | Regression tree |

c. | Naive Bayes |

d. | Logit |

____ 25.

The information above was provided for an email that was classified as spam. What data mining technique was probably used to make the classification?

a. | K-nearest-neighbor |

b. | Regression tree |

c. | Naive Bayes |

d. | Logit |

____ 26.

Which data mining technique (represented above) uses a quadratic classifier?

a. | K-nearest-neighbor |

b. | Regression tree |

c. | Naive Bayes |

d. | Logit |

____ 27.

What data mining technique is represented in the diagram of a classification scheme above?

a. | K-nearest-neighbor |

b. | Regression tree |

c. | Naive Bayes |

d. | Logit |

____ 28.

The misclassification rate in the confusion matrix above is

a. | 0 percent. |

b. | 10 percent. |

c. | 9 percent. |

d. | 19 percent. |

e. | None of the above are correct. |

____ 29.

The Universal Bank data represented above has been partitioned with what percentages?

a. | 50%, 30%, 20% in training, validation, and test sets |

b. | 60%, 40% in training and validation sets |

c. | 60%, 20%, 20% in training, validation, and test sets |

d. | 50%, 20%, 30% in training, validation, and test sets |

e. | None of the above are correct. |

____ 30. In data mining, the model should be applied to a data set that was not used in the estimation process in order to measure its accuracy on unseen data; that “unseen” data set is called

a. | the training data set. |

b. | the validation data set. |

c. | the test data set. |

d. | the holdout data set. |

e. | None of the above are correct. |

____ 31. In data mining the term “binning” refers to

a. | a Naive Bayes classification system. |

b. | ranking the data. |

c. | transforming data into a categorical variable. |

d. | grouping data into classes. |

e. | None of the above is correct. |

____ 32. In the K-Nearest-Neighbor technique in data mining, the “K” refers to

a. | the originator of the technique, Jonathan Knowlton. |

b. | the number of classifiers used. |

c. | the number of classes into which the variable may be divided. |

d. | the weight of the dependent variable. |

e. | None of the above is correct. |

____ 33.

The data mining technique represented above is probably

a. | a k-nearest-neighbor model. |

b. | a naive Bayes model. |

c. | a regression tree. |

d. | a logistic regression. |

____ 34.

In setting up this k-nearest-neighbor model

a. | the user is allowing XLMiner™ to select the optimal value of k. |

b. | the optimal k is set by the user at 10. |

c. | the data is normalized in order to take into account the categorical variables. |

d. | it is necessary to set an optimal value for k. |

____ 35.

In the k-nearest-neighbor model represented above, what error rate is shown?

a. | about 3 percent. |

b. | about 5 percent. |

c. | about 7 percent. |

d. | more than 10 percent. |

____ 36.

The lift chart above shows that the data mining classification model

a. | is working well in classifying unseen data. |

b. | is working well in classifying training data. |

c. | is working quite poorly. |

d. | is doing no better at classifying than a naive model. |

____ 37. The diagram below depicts the probability that a person takes out a loan given their level of income. The function shown is

a. | an ordinary least squares model (OLS). |

b. | a linear probability model (LPM). |

c. | the odds function. |

d. | a logit. |

____ 38. Consider the equation below.

This equation is the basis of

a. | the logit model. |

b. | the naive Bayes Model. |

c. | the k-nearest neighbor model. |

d. | classification tree models. |

____ 39. “Pruning” is used in what data mining model?

a. | Naive Bayes |

b. | Logit |

c. | K-Nearest Neighbor |

d. | Regression Trees |

____ 40. “Pruning” is used

a. | to overcome correlation among the independent variables. |

b. | only when the independent variables are dichotomous. |

c. | to prevent the model from overfitting the data. |

d. | as a “data utility” in order to create a validation set. |

____ 41. With most data mining techniques we “partition” the data

a. | into “success” and “failure” results in order to create a dependent variable that is a dummy variable. |

b. | only when we require a confusion matrix to be created. |

c. | after estimating the appropriate technique. |

d. | in order to judge how our model will do when we apply it to new data. |

____ 42. “Entropy” measures are used in which data mining technique?

a. | Logit |

b. | Classification Trees |

c. | Naive Bayes |

d. | K-Nearest Neighbor |

e. | Neural Networks |

____ 43. “Information Gain” and “Entropy”

a. | are used in Classification Trees to determine when to stop the algorithm. |

b. | are two components of Bayes Theorem. |

c. | are related ways of categorizing risk. |

d. | are unrelated. |
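Entropy and information gain, as used when a classification tree chooses its splits, can be computed as follows. A minimal sketch using base-2 logarithms; the helper names and the label lists are made-up examples.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child groups."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

print(entropy(["yes"] * 4))    # completely uniform collection
print(entropy(["yes", "no"]))  # even two-class mix
print(information_gain(["y", "y", "n", "n"], [["y", "y"], ["n", "n"]]))
```

A completely uniform collection has entropy 0, the minimum, and an even two-class mix has entropy 1, the maximum for two classes; a split that separates the classes perfectly therefore gains the full 1 bit.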

____ 44.

If I classify insects as either katydids or grasshoppers by examining the distribution of antenna lengths in a sample of the two insects (as shown below), this would be the beginning of an analysis using which data mining tool?

a. | Naive Bayes |

b. | Logit |

c. | Regression Tree |

d. | K-Nearest Neighbor |

____ 45. What data mining technique is being depicted below?

a. | K-Nearest Neighbor |

b. | Naive Bayes |

c. | Decision Tree |

d. | Logit |

e. | Neural Net |

____ 46. Consider the following Lift Chart. Cumulative percentage of hits is the Y-axis variable. Percent of the entire list is the X-axis variable.

What is the “Lift” at 5%?

a. | exactly 4 |

b. | about 5 |

c. | exactly 20 |

d. | about 25 |

e. | unable to determine from information given. |

____ 47. Consider the printout below:

What is the “Misclassification Rate?”

a. | 0 |

b. | 3 |

c. | 50 |

d. | 30 |

e. | It is not shown in this printout. |

____ 48.

Examine the Naive Bayes output above that describes the Titanic survival model.

What is the probability of survival if you are a crew member, male, and adult?

a. | 0.613324957 |

b. | 0.001352846 |

c. | 0.442673445 |

d. | 0.046373782 |

____ 49. The “logit” is

a. | a linear function with a Z distribution. |

b. | an attribute in a logistic regression. |

c. | the natural log of the odds. |

d. | the conditional probability that the success rate is greater than the cutoff value. |

____ 50.

The diagram above represents

a. | the locus of all points that could cause the success rate to be above 50 percent. |

b. | a logistic regression output from XLMiner. |

c. | the Naive Bayes classifier as being between zero and one. |

d. | a graph of the possible values of the logit in a logistic regression. |

____ 51. In logistic regression data mining, **P/(1-P)** represents

a. | the logit. |

b. | the log likelihood of success. |

c. | the odds of success. |

d. | the cutoff value. |
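The odds, the logit, and the inverse mapping from a logit back to a probability can be written in a few lines. A minimal sketch with illustrative helper names; in a logistic regression the logit is modeled as a linear function of the attributes, so a positive coefficient raises the predicted probability.

```python
import math

def odds(p: float) -> float:
    """Odds of success, P / (1 - P)."""
    return p / (1 - p)

def logit(p: float) -> float:
    """The logit: natural log of the odds."""
    return math.log(odds(p))

def inverse_logit(z: float) -> float:
    """Map a logit back to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))

for p in (0.2, 0.5, 0.8):
    print(p, odds(p), logit(p))
```

A probability of 0.5 gives odds of 1 and a logit of 0; probabilities above 0.5 give positive logits.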

____ 52.

The regression line shown above was estimated using an ordinary least squares regression technique. This regression is inappropriate for these data because

a. | the attribute measured here is dichotomous. |

b. | there is no apparent relationship between hours of study and outcome. |

c. | there is only a single attribute in the model. |

d. | the target variable is categorical. |

____ 53. An advantage of using the Naive Bayes model is that

a. | it is quite sensitive to irrelevant features. |

b. | it is fast at classification. |

c. | it can be used in situations in which the target variable is continuous. |

d. | All of the above are advantages. |

____ 54. Naive Bayes is called “Naive” because

a. | very few attributes are needed to obtain accurate classifications. |

b. | the model assumes that only continuous variables can be used as attributes. |

c. | it tends to be used only as a “baseline” model in order to measure the effectiveness of other data mining techniques. |

d. | the attributes are assumed to be independent of one another. |

____ 55. In a Naive Bayes model it is necessary

a. | that all attributes be categorical. |

b. | to partition the data into three parts (training, validation, and scoring). |

c. | to set cutoff values to less than 0.75. |

d. | to have a continuous target variable. |

____ 56. Naive Bayes models

a. | use a linear classifier. |

b. | use a nonlinear classifier. |

c. | use a waveform classifier. |

d. | use a logit as a classifier. |

____ 57. Which classification technique that we covered assumed that the attributes had independent distributions?

a. | k-Nearest Neighbor |

b. | Classification trees |

c. | Naive Bayes |

d. | Logistic Regression |

____ 58. Our confidence that X is an apple given that we have seen X is red and round

a. | is a coincident probability. |

b. | could lead us to misclassify similar objects. |

c. | is a prior probability. |

d. | is a posterior probability. |

____ 59.

What data mining technique is demonstrated here?

a. | k-Nearest Neighbor |

b. | Classification Tree |

c. | Naive Bayes |

d. | Logistic Regression |

____ 60.

The table above is part of the output from a data mining algorithm seeking to predict whether an individual will take out a personal loan given a set of attributes. What data mining technique is probably being used here?

a. | k-Nearest Neighbor |

b. | Classification Tree |

c. | Naive Bayes |

d. | Regression Tree |

____ 61.

The above is a prune log for a data mining technique. What technique would have this type of output?

a. | k-Nearest Neighbor |

b. | Classification Tree |

c. | Naive Bayes |

d. | Logistic Regression |

____ 62.

The above chart is a decile-wise lift chart. The first bar on the left indicates

a. | that our attribute or attributes did little to explain predicted success in this model. |

b. | that the lift will not vary with the number of cases we consider. |

c. | that taking the 10% of the records that are ranked by the model as the “most probable 1’s” yields about as much as a naive model. |

d. | that taking the 10% of the records that are ranked by the model as the “most probable 1’s” yields twice as many 1’s as would a random selection of 10% of the records. |
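A decile-wise lift chart ranks records by the model's predicted probability, splits them into ten equal groups, and divides each group's success rate by the overall success rate; a first bar of 2 means the top 10% contains twice as many 1's as a random 10% would. A minimal sketch; the `decile_lift` helper and the toy data are hypothetical.

```python
def decile_lift(scores: list[float], actual: list[int]) -> list[float]:
    """Success rate within each model-ranked tenth of the records, divided by the overall rate.

    Assumes at least 10 records and at least one success (actual value 1).
    """
    ranked = [y for _, y in sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    overall = sum(ranked) / n
    lifts = []
    for d in range(10):
        decile = ranked[d * n // 10:(d + 1) * n // 10]
        lifts.append((sum(decile) / len(decile)) / overall)
    return lifts

# Toy data: 100 records, and the 20 highest-scored ones are the actual 1's.
scores = [100 - i for i in range(100)]
actual = [1] * 20 + [0] * 80
lifts = decile_lift(scores, actual)
print(lifts)
```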

____ 63.

Which attribute above provides the greatest reduction in entropy?

a. | Hair Length |

b. | Weight |

c. | Age |

d. | The above are not reasonable entropy measures; rather they show information gain. |

____ 64. When a collection of objects is completely uniform,

a. | entropy is at a maximum. |

b. | entropy is at a minimum. |

c. | entropy would be about .5. |

d. | Uniformity has nothing to do with entropy. |

____ 65. Suppose that a data mining routine has an adjustable cutoff (threshold) mechanism by which you can alter the proportion of records classified as owner. Three cases are described below.

Describe how moving the cutoff up or down from a starting point of 0.5 affects the misclassification error rate.

a. | The misclassification error rate dropped as the threshold dropped. |

b. | The misclassification error rate dropped as the threshold increased. |

c. | The misclassification error rate remained unchanged as the threshold changed. |

d. | The misclassification error rate changed when the threshold either increased or decreased. |
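The effect of moving the cutoff can be explored by recomputing the misclassification rate at several thresholds. A minimal sketch with made-up predicted probabilities and a hypothetical `error_rate` helper; which direction helps depends on the data, and in this toy example lowering the cutoff from 0.5 happens to reduce the error rate while raising it does not.

```python
def error_rate(probs: list[float], actual: list[int], cutoff: float) -> float:
    """Share of records misclassified when probability >= cutoff is classified as owner (1)."""
    predicted = [1 if p >= cutoff else 0 for p in probs]
    return sum(pred != a for pred, a in zip(predicted, actual)) / len(actual)

# Hypothetical predicted probabilities of ownership and true classes (1 = owner).
probs = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
actual = [1, 1, 0, 1, 0, 0]
for cutoff in (0.35, 0.5, 0.8):
    print(cutoff, error_rate(probs, actual, cutoff))
```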
