Computational Statistics in Data Science. Group of authors

Title: Computational Statistics in Data Science
Author: Group of authors
Genre: Mathematics
Publisher: Mathematics
ISBN: 9781119561088




$$T_m(\epsilon) = \inf\left\{ n \geq 0 : V_n^{1/p} + \epsilon \|\hat{\theta}_h\|_a \, I(n < n^*) + n^{-1} \leq \epsilon \|\hat{\theta}_h\|_a \right\}$$

      This termination rule essentially controls the coefficient of variation for $\hat{\theta}_h$. An advantage here is that problem-free choices of $\epsilon$ can be used, since problems where $\|\theta_h\|_a$ is small will automatically require a smaller cutoff. A clear disadvantage is that this rule is ineffective when $\theta_h = 0$.
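      A relative-magnitude rule of this kind can be sketched for the univariate IID case ($p = 1$), where the volume of the confidence region reduces to the length of a normal-theory confidence interval. The function name, the default values of `eps`, `alpha`, and `n_star`, and the exponential example target are assumptions made for illustration, not choices taken from the text.

```python
import math
import random
from statistics import NormalDist


def relative_magnitude_stop(draw, eps=0.05, alpha=0.05, n_star=100,
                            max_n=10**6, seed=1):
    """Run IID Monte Carlo until V_n + s(n) <= eps * |theta_hat| (p = 1).

    `draw(rng)` returns one realisation of h(X). All names and defaults
    here are illustrative assumptions.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n, mean, m2, v_n = 0, 0.0, 0.0, float("inf")
    while n < max_n:
        x = draw(rng)
        n += 1
        # Welford update of the running mean and sum of squared deviations.
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n < 2:
            continue
        sd = math.sqrt(m2 / (n - 1))
        # For p = 1 the "volume" is the length of the confidence interval.
        v_n = 2 * z * sd / math.sqrt(n)
        # s(n) blocks termination before n_star and adds the n^{-1} term.
        s_n = eps * abs(mean) * (n < n_star) + 1.0 / n
        if v_n + s_n <= eps * abs(mean):
            return n, mean, v_n, True
    return n, mean, v_n, False


if __name__ == "__main__":
    # Hypothetical target: the mean of an Exponential distribution with rate 0.5.
    n, mean, v_n, stopped = relative_magnitude_stop(lambda rng: rng.expovariate(0.5))
    print(n, round(mean, 3), round(v_n, 4), stopped)
```

      Because the threshold is $\epsilon |\hat{\theta}_h|$, a target with true mean zero makes the right-hand side shrink at the same rate as the interval length, so in this sketch the loop would typically run to `max_n` without stopping.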

      5.2 MCMC

      Although both $T_a(\epsilon)$ and $T_m(\epsilon)$ may be used in MCMC, a third alternative arises due to the correlation in the Markov chain. A relative standard deviation sequential stopping rule terminates the simulation when the Monte Carlo variability (as measured by the volume of the confidence region) is small compared to the underlying variability inherent to the problem ($\Lambda$). That is,

$$T_s(\epsilon) = \inf\left\{ n \geq 0 : V_n^{1/p} + \epsilon |\hat{\Lambda}_n|^{1/(2p)} \, I(n < n^*) + n^{-1} \leq \epsilon |\hat{\Lambda}_n|^{1/(2p)} \right\}$$

      If this rule is used for IID Monte Carlo, then $A_n$ in Equation (2) is $\hat{\Lambda}_n$, and $T_s(\epsilon) \approx T_a(\epsilon')$ for some other (deterministic) $\epsilon'$. For MCMC, this sequential stopping rule connects directly to the concept of effective sample size [26]. That is, stopping at $T_s(\epsilon)$ is equivalent to stopping when the estimated effective sample size exceeds a lower bound that depends on $\epsilon$, $p$, and the confidence level.
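      The link between the relative standard deviation rule and effective sample size can be illustrated with a minimal univariate sketch. It assumes an AR(1) chain as a stand-in for MCMC output, a batch means estimator with batch size $\lfloor\sqrt{n}\rfloor$ for the asymptotic variance, and the sample variance for $\Lambda$; none of these choices come from the text. For $p = 1$, rearranging $V_n < \epsilon \hat{\Lambda}_n^{1/2}$ gives an estimated effective sample size $n \hat{\Lambda}_n / \hat{\Sigma}_n$ above $4 z_{1-\alpha/2}^2 / \epsilon^2$, which the code reports.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def batch_means_variance(chain):
    """Batch means estimate of the asymptotic variance (batch size ~ sqrt(n))."""
    n = len(chain)
    b = int(math.sqrt(n))          # batch size
    a = n // b                     # number of full batches
    means = [fmean(chain[k * b:(k + 1) * b]) for k in range(a)]
    return b * variance(means)


def relative_sd_stop(rho=0.5, eps=0.05, alpha=0.05, n_star=1000,
                     check_every=500, max_n=200_000, seed=1):
    """Simulate an AR(1) chain until V_n + s(n) <= eps * lambda_hat (p = 1).

    The AR(1) target, the checking interval, and all defaults are
    illustrative assumptions.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    chain, x = [], 0.0
    mean, m2 = 0.0, 0.0
    while len(chain) < max_n:
        x = rho * x + rng.gauss(0.0, 1.0)
        chain.append(x)
        n = len(chain)
        # Welford update for the sample variance (estimate of Lambda).
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % check_every:
            continue               # only test the rule every `check_every` steps
        lam = math.sqrt(m2 / (n - 1))
        sigma2 = batch_means_variance(chain)
        v_n = 2 * z * math.sqrt(sigma2 / n)
        s_n = eps * lam * (n < n_star) + 1.0 / n
        if v_n + s_n <= eps * lam:
            return {"n": n, "mean": mean, "ess": n * lam ** 2 / sigma2,
                    "stopped": True}
    return {"n": len(chain), "mean": mean, "ess": float("nan"),
            "stopped": False}


if __name__ == "__main__":
    res = relative_sd_stop()
    bound = 4 * NormalDist().inv_cdf(0.975) ** 2 / 0.05 ** 2
    print(res["n"], round(res["ess"]), round(bound))
```

      Checking the rule only every `check_every` iterations is a practical shortcut in this sketch, since recomputing batch means at every step would be wasteful; the rule as stated is defined over all $n$.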

      In our examples, we assume that a CLT (or asymptotic distribution) for Monte Carlo estimators exists. However, extra care must be taken when working with a generic Monte Carlo procedure. In particular, importance sampling can often yield estimators with infinite variances, for which a CLT cannot hold. See Refs [3, 4] for more details. A CLT is particularly difficult to establish for MCMC due to serial correlation in the Markov chain. However, many individual Markov chains have been shown to be at least polynomially ergodic; for examples, see Jarner and Hansen [30], Roberts and Tweedie [31], Vats [32], Khare and Hobert [33], Tan et al. [34], Hobert and Geyer [35], and Jones and Hobert [36].

      A similar workflow can be adopted for embarrassingly parallel implementations of Monte Carlo samplers. Given the power of the modern personal computer, most Monte Carlo samplers can run on multiple cores simultaneously, producing more samples in the same clock time. For IID Monte Carlo, averaging estimators across all independent runs is reasonable. However, for estimating $\Sigma$ in MCMC, estimation quality can be improved by sharing information across multiple runs at the end of the simulation; see Gupta and Vats [37] for more details.
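      For the IID case, combining independent runs might look like the following sketch. The helper names, the exponential target, and the run sizes are hypothetical, and the runs are generated sequentially here for simplicity; in practice each could be dispatched to its own core. The sketch covers only the IID averaging step and does not attempt the information-sharing estimators for $\Sigma$ in MCMC.

```python
import math
import random
from statistics import fmean, variance


def run_iid(seed, n):
    """One independent run: n draws of a hypothetical Exponential(rate=0.5) target."""
    rng = random.Random(seed)
    return [rng.expovariate(0.5) for _ in range(n)]


def combine_runs(runs):
    """Combine per-run summaries into one estimate and its standard error."""
    sizes = [len(r) for r in runs]
    means = [fmean(r) for r in runs]
    variances = [variance(r) for r in runs]
    total = sum(sizes)
    # Size-weighted average of the run means equals the mean of all draws.
    mean = sum(n * m for n, m in zip(sizes, means)) / total
    # Pooled within-run variance, then the usual IID standard error.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (total - len(runs))
    return mean, math.sqrt(pooled / total)


if __name__ == "__main__":
    runs = [run_iid(seed, 5000) for seed in range(4)]
    mean, se = combine_runs(runs)
    print(round(mean, 3), round(se, 4))
```

      Only the run sizes, means, and variances are needed at the combination step, so each worker can return three numbers rather than its full set of draws.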

      Sequential stopping rules, particularly in MCMC, should not be implemented as a black-box procedure. Each implementation of the stopping rule must be accompanied by visualizations that give qualitative insights into the quality of the samplers. A better-quality sampler can significantly improve estimation and lead to shorter run times. We illustrate this point by comparing samplers in our examples.

      7.1 Action Figure Collector Problem