Introduction to Linear Regression Analysis. Douglas C. Montgomery

Genre: Mathematics
ISBN: 9781119578758





      Example 2.7 The Rocket Propellant Data

      We find a 95% prediction interval on a future value of propellant shear strength in a motor made from a batch of sustainer propellant that is 10 weeks old. Using (2.45), we find that the prediction interval is

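\[
2256.32 - (2.101)\sqrt{9244.59\left(1 + \frac{1}{20} + \frac{(10 - 13.3625)^2}{1106.56}\right)} \le y_0 \le 2256.32 + (2.101)\sqrt{9244.59\left(1 + \frac{1}{20} + \frac{(10 - 13.3625)^2}{1106.56}\right)}
\]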

      which simplifies to

\[
2048.32 \le y_0 \le 2464.32
\]

      Therefore, a new motor made from a batch of 10-week-old sustainer propellant could reasonably be expected to have a propellant shear strength between 2048.32 and 2464.32 psi.
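
      The arithmetic behind this interval can be checked with a short script. This is only a sketch: the summary statistics below (sample size, mean age, S_xx, residual mean square, fitted coefficients) are not stated in this excerpt and are assumed here from the book's earlier treatment of the rocket propellant data, and the t critical value is hard-coded from a table rather than computed.

```python
import math

# Assumed summary statistics for the rocket propellant fit. They do not appear
# in this excerpt, so treat every value here as an assumption.
N = 20                      # number of observations
X_BAR = 13.3625             # mean propellant age, weeks
S_XX = 1106.56              # corrected sum of squares of x
MS_RES = 9244.59            # residual mean square
B0, B1 = 2627.82, -37.15    # fitted intercept and slope
T_CRIT = 2.101              # t_{0.025, 18}, taken from a t table


def prediction_interval(x0):
    """Return (lower, upper) for one future observation at x0."""
    y_hat0 = B0 + B1 * x0
    half_width = T_CRIT * math.sqrt(
        MS_RES * (1 + 1 / N + (x0 - X_BAR) ** 2 / S_XX)
    )
    return y_hat0 - half_width, y_hat0 + half_width


lower, upper = prediction_interval(10)
print(f"{lower:.2f} <= y0 <= {upper:.2f}")
```

      Under these assumed inputs the limits agree with the interval quoted above to two decimal places.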


      We may generalize (2.45) somewhat to find a 100(1 − α) percent prediction interval on the mean of m future observations on the response at $x = x_0$. Let $\bar{y}_0$ be the mean of m future observations at $x = x_0$. A point estimator of $\bar{y}_0$ is $\hat{y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0$. The 100(1 − α) percent prediction interval on $\bar{y}_0$ is

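\[
\hat{y}_0 - t_{\alpha/2,\,n-2}\sqrt{MS_{\mathrm{Res}}\left(\frac{1}{m} + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)} \le \bar{y}_0 \le \hat{y}_0 + t_{\alpha/2,\,n-2}\sqrt{MS_{\mathrm{Res}}\left(\frac{1}{m} + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)} \tag{2.46}
\]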
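
      To see how the number of future observations m affects the interval, the half-width can be tabulated for a few values of m. The sketch below assumes the interval has the usual form in which the leading 1 of the single-observation case becomes 1/m; the function name and all numeric inputs are hypothetical and chosen only for illustration.

```python
import math


def mean_prediction_half_width(m, x0, n, x_bar, s_xx, ms_res, t_crit):
    """Half-width of the interval on the mean of m future observations at x0."""
    return t_crit * math.sqrt(
        ms_res * (1 / m + 1 / n + (x0 - x_bar) ** 2 / s_xx)
    )


# Hypothetical inputs, not taken from any data set in the text.
settings = dict(x0=10.0, n=20, x_bar=13.0, s_xx=1100.0, ms_res=9000.0, t_crit=2.101)

widths = {m: mean_prediction_half_width(m, **settings) for m in (1, 5, 25)}
for m, w in widths.items():
    print(f"m = {m:2d}: half-width = {w:.2f}")
```

      Setting m = 1 recovers the interval for a single future observation, and as m grows the 1/m term vanishes, so the interval shrinks toward the confidence interval on the mean response at $x_0$.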

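      The quantity

\[
R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_{\mathrm{Res}}}{SS_T}
\]

      is called the coefficient of determination. For the rocket propellant data, $R^2 = 0.9018$; that is, 90.18% of the variability in strength is accounted for by the regression model.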

      The statistic $R^2$ should be used with caution, since it is always possible to make $R^2$ large by adding enough terms to the model. For example, if there are no repeat points (more than one y value at the same x value), a polynomial of degree n − 1 will give a “perfect” fit ($R^2 = 1$) to n data points. When there are repeat points, $R^2$ can never be exactly equal to 1 because the model cannot explain the variability related to “pure” error.
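
      The "perfect fit" claim is easy to demonstrate. With n distinct x values, the least-squares polynomial of degree n − 1 coincides with the interpolating polynomial, so the sketch below builds that polynomial by Lagrange interpolation instead of solving the normal equations. The data are arbitrary numbers with no underlying pattern, and the helper names are hypothetical.

```python
def lagrange_fit(xs, ys):
    """Return the degree n-1 polynomial through n points with distinct x values."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return poly


def r_squared(ys, fitted):
    """Coefficient of determination, 1 - SS_Res / SS_T."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_t = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_t


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.3, 1.1, 4.8, 3.9, 0.7]   # arbitrary values, no real relationship with x

poly = lagrange_fit(xs, ys)      # degree 4 polynomial for 5 points
r2 = r_squared(ys, [poly(x) for x in xs])
print(r2)
```

      The fitted curve passes through every point, so the residual sum of squares is zero even though the data carry no meaningful signal.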

      Although $R^2$ cannot decrease if we add a regressor variable to the model, this does not necessarily mean the new model is superior to the old one. Unless the error sum of squares in the new model is reduced by at least an amount equal to the original error mean square, the new model will have a larger error mean square than the old one because of the loss of one degree of freedom for error. In that case, the new model will actually be worse than the old one.
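
      A small numerical sketch makes the threshold concrete. The sums of squares and degrees of freedom below are hypothetical; the point is only that adding one regressor lowers the error mean square when the reduction in the error sum of squares exceeds the old error mean square, and raises it when the reduction is smaller.

```python
def error_mean_square(ss_res, df):
    """Error mean square: error sum of squares over its degrees of freedom."""
    return ss_res / df


# Hypothetical "old" model: SS_Res = 180 on 18 error degrees of freedom.
ss_res_old, df_old = 180.0, 18
ms_old = error_mean_square(ss_res_old, df_old)


def ms_after_adding_regressor(reduction):
    """Error mean square after one added regressor removes `reduction` from SS_Res."""
    return error_mean_square(ss_res_old - reduction, df_old - 1)


ms_small = ms_after_adding_regressor(5.0)    # reduction below ms_old
ms_large = ms_after_adding_regressor(15.0)   # reduction above ms_old

print(f"old MS_Res      = {ms_old:.3f}")
print(f"reduction  5.0  -> {ms_small:.3f}")
print(f"reduction 15.0  -> {ms_large:.3f}")
```

      In both cases $R^2$ goes up, since the error sum of squares falls, yet only the second model improves the error mean square.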

      The magnitude of $R^2$ also depends on the range of variability in the regressor variable. Generally $R^2$ will increase as the spread of the x’s increases and decrease as the spread of the x’s decreases, provided the assumed model form is correct. By the delta method (also see Hahn 1973), one can show that the expected value of $R^2$ from a straight-line regression is approximately

\[
E(R^2) \approx \frac{\beta_1^2 S_{xx}/(n-1)}{\beta_1^2 S_{xx}/(n-1) + \sigma^2}
\]

      Clearly the expected value of $R^2$ will increase (decrease) as $S_{xx}$ (a measure of the spread of the x’s) increases (decreases). Thus, a large value of $R^2$ may result simply because x has been varied over an unrealistically large range. On the other hand, $R^2$ may be small because the range of x was too small to allow its relationship with y to be detected.
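
      The dependence on spread can be illustrated by evaluating the approximation directly. The sketch assumes the approximation has the form $E(R^2) \approx [\beta_1^2 S_{xx}/(n-1)] / [\beta_1^2 S_{xx}/(n-1) + \sigma^2]$; the function name and the parameter values are hypothetical.

```python
def approx_expected_r2(beta1, s_xx, n, sigma2):
    """Approximate E(R^2) for a straight-line fit under the assumed formula."""
    signal = beta1 ** 2 * s_xx / (n - 1)
    return signal / (signal + sigma2)


# Same true slope and error variance, three different spreads of x.
beta1, n, sigma2 = 2.0, 11, 4.0
results = {s_xx: approx_expected_r2(beta1, s_xx, n, sigma2) for s_xx in (2.5, 10.0, 90.0)}
for s_xx, value in results.items():
    print(f"S_xx = {s_xx:5.1f}: E(R^2) approx {value:.2f}")
```

      The slope and the error variance are held fixed throughout, so the change in the approximate expected value comes entirely from how widely x was varied.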

      There are several other misconceptions about $R^2$. In general, $R^2$ does not measure the magnitude of the slope of the regression line. A large value of $R^2$ does not imply a steep slope. Furthermore, $R^2$ does not measure the appropriateness of the linear model, for $R^2$ will often be large even though y and x are nonlinearly related. For example, $R^2$ for the regression equation in Figure 2.3b will be relatively large even though the linear approximation is poor. Remember that even a large $R^2$ does not necessarily imply that the regression model will be an accurate predictor.
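
      Both misconceptions can be checked on a toy data set. The sketch below fits a straight line to points generated exactly from y = x² (a hypothetical example, not the data behind Figure 2.3b) and then rescales y. It uses the simple linear regression identity $R^2 = S_{xy}^2 / (S_{xx} S_{yy})$.

```python
def fit_line(xs, ys):
    """Return (slope, r2) for a least-squares straight line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    r2 = s_xy ** 2 / (s_xx * s_yy)
    return slope, r2


xs = [float(x) for x in range(1, 11)]
ys = [x ** 2 for x in xs]            # exactly quadratic, so a line is the wrong form

slope, r2 = fit_line(xs, ys)

# Rescaling y changes the slope by the same factor but leaves R^2 unchanged.
slope_scaled, r2_scaled = fit_line(xs, [0.01 * y for y in ys])

print(f"slope = {slope:.2f}, R^2 = {r2:.4f}")
print(f"slope = {slope_scaled:.2f}, R^2 = {r2_scaled:.4f}")
```

      The straight line attains a high $R^2$ on data that are not linear at all, and a hundredfold change in slope leaves $R^2$ where it was.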

      A hospital is implementing a program to improve service quality and productivity. As part of this program the hospital management is attempting to measure and evaluate patient satisfaction. Table B.17 contains some of the data that have been collected on a random sample of 25 recently discharged patients. The response variable is satisfaction, a subjective response measure on an increasing scale. The potential regressor variables are patient age, severity (an index measuring the severity of the patient’s