Computational Statistics in Data Science. Group of authors

Genre: Mathematics
Publisher: Mathematics
ISBN: 9781119561088




Let $z(\alpha)$ be a quantile of a standard normal distribution, possibly chosen to correct for simultaneous inference. Recall that $\theta = (\theta_1, \ldots, \theta_p)$, and let $\hat\theta_{h_i}$ denote the $i$th component of $\hat\theta_h$. Further, let $A_{ii,n}$ denote the $i$th diagonal entry of $A_n$. Then

$$C_\alpha^R(\hat\theta_h) = \prod_{i=1}^{p} \left\{ \theta_i : \hat\theta_{h_i} - z(\alpha)\frac{A_{ii,n}}{\sqrt{n}} < \theta_i < \hat\theta_{h_i} + z(\alpha)\frac{A_{ii,n}}{\sqrt{n}} \right\}$$
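The region can be computed one component at a time. The sketch below uses Python with only the standard library. It assumes the diagonal entries $A_{ii,n}$ have already been computed and are passed in directly. For $z(\alpha)$ it uses a Bonferroni correction, $z(\alpha) = \Phi^{-1}(1 - \alpha/(2p))$, as one possible simultaneous-inference adjustment; the text does not prescribe a specific correction. The names `bonferroni_z` and `rectangular_region` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def bonferroni_z(alpha: float, p: int) -> float:
    """Standard normal quantile z(alpha) with a Bonferroni correction over
    p components (an assumed choice; other corrections are possible)."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * p))


def rectangular_region(theta_hat, a_diag, n, z):
    """Return one (lower, upper) interval per component i:
    theta_hat[i] -/+ z * a_diag[i] / sqrt(n)."""
    if len(theta_hat) != len(a_diag):
        raise ValueError("theta_hat and a_diag must have the same length")
    intervals = []
    for t, a in zip(theta_hat, a_diag):
        half_width = z * a / sqrt(n)
        intervals.append((t - half_width, t + half_width))
    return intervals


# Hypothetical estimates for p = 2 components.
z = bonferroni_z(alpha=0.05, p=2)
region = rectangular_region([1.0, -0.5], [2.0, 0.8], n=10_000, z=z)
print(region)
```

The Cartesian product of the returned intervals is the hyperrectangle $C_\alpha^R(\hat\theta_h)$.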

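The volume of this hyperrectangular confidence region is

$$V_n^R = \prod_{i=1}^{p} \frac{2 z(\alpha) A_{ii,n}}{\sqrt{n}} = \left(\frac{2 z(\alpha)}{\sqrt{n}}\right)^{p} \prod_{i=1}^{p} A_{ii,n}.$$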

As more samples are obtained, $V_n^E$ and $V_n^R$ converge to 0, so the variability in the estimator $\hat\theta_h$ disappears. The sequential stopping rules in Section 5 use this feature to decide when to terminate the simulation.
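Each side of the hyperrectangle has length $2 z(\alpha) A_{ii,n}/\sqrt{n}$. Its volume is therefore the product of these $p$ lengths and scales like $n^{-p/2}$. The hypothetical helper below computes that product. To isolate the effect of $n$, the sketch holds the $A_{ii,n}$ values fixed; in practice they are re-estimated as the simulation runs.

```python
from math import prod, sqrt


def rectangle_volume(a_diag, n, z):
    """Volume of the hyperrectangle: product over components of the side
    lengths 2 * z * a_diag[i] / sqrt(n)."""
    return prod(2.0 * z * a / sqrt(n) for a in a_diag)


a_diag = [1.0, 1.0, 1.0]  # hypothetical values, held fixed across n (p = 3)
for n in (100, 400, 1_600):
    print(n, rectangle_volume(a_diag, n, z=2.0))
```

Each fourfold increase in $n$ halves every side. With $p = 3$, the volume therefore drops by a factor of $2^3 = 8$ at each step.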

To construct confidence regions, the asymptotic variance must be estimated. For IID sampling, $\Lambda$ is estimated by the sample covariance matrix, as discussed in Section 2.3. For MCMC sampling, a rich literature on estimators of $\Sigma$ is available. It includes spectral variance [14, 15], regeneration-based [16, 17], and initial sequence estimators [5, 18–20]. Given the size of modern simulation output, we recommend the computationally efficient batch means estimators.

The multivariate batch means estimator splits the chain into nonoverlapping batches. It then constructs a sample covariance matrix from the sample mean vectors of the batches. More formally, let $n = ab$, where $a$ is the number of batches and $b$ is the batch size. For $k = 0, \ldots, a-1$, define $\bar{Y}_k = b^{-1} \sum_{t=1}^{b} h(X_{kb+t})$. The batch means estimator of $\Sigma$ is

$$\hat\Sigma_b = \frac{b}{a-1} \sum_{k=0}^{a-1} \left(\bar{Y}_k - \hat\theta_h\right)\left(\bar{Y}_k - \hat\theta_h\right)^T$$
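A direct translation of the estimator into Python is sketched below. It assumes the chain has already been mapped through $h$ and is stored as a list of $n$ length-$p$ lists. When $n$ is not an exact multiple of $b$, the sketch drops the trailing draws; this is an assumption, since the text takes $n = ab$. The name `batch_means_cov` is hypothetical.

```python
def batch_means_cov(chain, b):
    """Multivariate batch means estimate of Sigma.

    chain: list of n draws h(X_1), ..., h(X_n), each a list of p floats.
    b: batch size. Uses a = n // b batches; trailing draws that do not
    fill a whole batch are dropped (the text assumes n = a * b exactly).
    """
    a = len(chain) // b
    if a < 2:
        raise ValueError("need at least two batches")
    used = chain[: a * b]
    p = len(used[0])

    # Overall mean theta_hat_h of the draws that are used.
    theta_hat = [sum(x[j] for x in used) / (a * b) for j in range(p)]

    # Batch mean vectors Ybar_k for k = 0, ..., a - 1 (0-based slices).
    ybar = [
        [sum(x[j] for x in used[k * b:(k + 1) * b]) / b for j in range(p)]
        for k in range(a)
    ]

    # Accumulate outer products (Ybar_k - theta_hat)(Ybar_k - theta_hat)^T.
    sigma = [[0.0] * p for _ in range(p)]
    for yk in ybar:
        d = [yk[j] - theta_hat[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                sigma[i][j] += d[i] * d[j]

    scale = b / (a - 1)
    return [[scale * s for s in row] for row in sigma]


# Toy bivariate "chain" (hypothetical numbers) with batch size 2.
chain = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0], [7.0, 2.0]]
print(batch_means_cov(chain, b=2))  # [[16.0, 8.0], [8.0, 4.0]]
```

With $b = 1$, each batch is a single draw and $a = n$. The formula then reduces to the ordinary sample covariance with divisor $n - 1$, which is the estimator used for $\Lambda$ under IID sampling.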

Univariate and multivariate batch means estimators have been studied in the MCMC and operations research literatures [21–26]. The batch means estimator has desirable asymptotic properties. However, it underestimates $\Sigma$ in finite samples, particularly for slowly mixing Markov chains. Specifically, let

$$\Gamma = -\sum_{k=-\infty}^{\infty} |k| \, \mathrm{Cov}_F\left(X_1, X_{1+k}\right)$$

Vats and Flegal [27] show that, ignoring lower-order terms,

$$\mathrm{E}\left[\hat\Sigma_b\right] = \Sigma + \frac{\Gamma}{b}$$
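To gauge the size of $\Gamma/b$, consider a stationary univariate AR(1) chain $X_t = \rho X_{t-1} + \epsilon_t$ with unit-variance innovations and $h$ the identity. Its lag-$k$ autocovariance is $\rho^{|k|}/(1-\rho^2)$. This model is an illustrative assumption and is not used in the text. The sketch truncates the lag sums that define $\Sigma$ and $\Gamma$. It then reports the first-order relative bias $\Gamma/(b\Sigma)$, which in closed form equals $-2\rho/\{b(1-\rho^2)\}$. The function names are hypothetical.

```python
def ar1_sigma_gamma(rho: float, max_lag: int = 5_000):
    """Truncated lag sums for Sigma and Gamma of a stationary AR(1) chain
    X_t = rho * X_{t-1} + e_t with unit-variance innovations (|rho| < 1)."""

    def acov(k: int) -> float:
        # Lag-k autocovariance of the stationary AR(1) process.
        return rho ** abs(k) / (1.0 - rho ** 2)

    # Sigma = sum over all k of acov(k); the summand is symmetric in k.
    sigma = acov(0) + 2.0 * sum(acov(k) for k in range(1, max_lag))
    # Gamma = -sum over all k of |k| * acov(k).
    gamma = -2.0 * sum(k * acov(k) for k in range(1, max_lag))
    return sigma, gamma


def batch_means_relative_bias(rho: float, b: int) -> float:
    """First-order relative bias (E[Sigma_hat_b] - Sigma) / Sigma = Gamma / (b * Sigma)."""
    sigma, gamma = ar1_sigma_gamma(rho)
    return gamma / (b * sigma)


for rho in (0.5, 0.9):
    print(rho, [round(batch_means_relative_bias(rho, b), 3) for b in (50, 500)])
```

For $\rho = 0.9$ and $b = 50$, the first-order bias is about $-19\%$ of $\Sigma$. It shrinks roughly tenfold at $b = 500$.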

When the autocorrelation in the Markov chain is large or $b$ is small, the bias term $\Gamma/b$ is large in magnitude. In the univariate case, $\Gamma < 0$ whenever all autocovariances are positive, so $\hat\Sigma_b$ significantly underestimates $\Sigma$. To combat this issue, Vats and Flegal [27] propose lugsail batch means estimators. These are formed as a linear combination of two batch means estimators with different batch sizes. For $r \ge 1$ and $0 \le c < 1$, the lugsail batch means estimator is

$$\hat\Sigma_L = \frac{1}{1-c}\,\hat\Sigma_b - \frac{c}{1-c}\,\hat\Sigma_{b/r},$$

whose expectation, ignoring lower-order terms, is

$$\mathrm{E}\left[\hat\Sigma_L\right] = \Sigma + \left(\frac{1 - rc}{1 - c}\right)\frac{\Gamma}{b}$$
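The lugsail estimator needs only two calls to a batch means routine. For brevity, the sketch below treats a univariate chain ($p = 1$). It rounds $b/r$ down to an integer and uses $r = 3$, $c = 1/2$ purely as illustrative defaults. The function names are hypothetical.

The sketch also reports the bias multiplier $(1 - rc)/(1 - c)$:

- Choosing $rc = 1$ (for example $r = 2$, $c = 1/2$) removes the first-order bias.
- Choosing $rc > 1$ makes the multiplier negative. When $\Gamma < 0$, the estimator then errs toward overestimating $\Sigma$ rather than underestimating it.

```python
from statistics import fmean


def batch_means_var(x, b):
    """Univariate batch means estimator with batch size b; trailing draws
    beyond the last full batch are dropped."""
    a = len(x) // b
    if a < 2:
        raise ValueError("need at least two batches")
    x = x[: a * b]
    theta_hat = fmean(x)
    batch_means = [fmean(x[k * b:(k + 1) * b]) for k in range(a)]
    return b / (a - 1) * sum((y - theta_hat) ** 2 for y in batch_means)


def lugsail_var(x, b, r=3.0, c=0.5):
    """Lugsail combination Sigma_b / (1 - c) - c * Sigma_{b/r} / (1 - c)."""
    if r < 1 or not (0 <= c < 1):
        raise ValueError("require r >= 1 and 0 <= c < 1")
    b_small = max(1, int(b // r))  # batch size b / r, rounded down
    return (batch_means_var(x, b) - c * batch_means_var(x, b_small)) / (1.0 - c)


def lugsail_bias_multiplier(r, c):
    """Factor (1 - r*c) / (1 - c) multiplying Gamma / b in E[Sigma_hat_L]."""
    return (1.0 - r * c) / (1.0 - c)


print(lugsail_bias_multiplier(1.0, 0.0))  # 1.0: plain batch means
print(lugsail_bias_multiplier(2.0, 0.5))  # 0.0: first-order bias removed
print(lugsail_bias_multiplier(3.0, 0.5))  # -1.0: sign of the bias flipped
```

With $r = 1$, or with $c = 0$, `lugsail_var` reduces to the ordinary batch means estimate.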