Reuse, recycle, reweigh: Combating influenza through efficient sequential Bayesian computation for massive data

Tom, Jennifer A.; Sinsheimer, Janet S.; Suchard, Marc A.

doi:10.1214/10-AOAS349


arXiv:1101.0959 (stat)

[Submitted on 5 Jan 2011]

Title: Reuse, recycle, reweigh: Combating influenza through efficient sequential Bayesian computation for massive data

Authors: Jennifer A. Tom, Janet S. Sinsheimer, Marc A. Suchard

Abstract: Massive datasets in the gigabyte and terabyte range combined with the availability of increasingly sophisticated statistical tools yield analyses at the boundary of what is computationally feasible. Compromising in the face of this computational burden by partitioning the dataset into more tractable sizes results in stratified analyses, removed from the context that justified the initial data collection. In a Bayesian framework, these stratified analyses generate intermediate realizations, often compared using point estimates that fail to account for the variability within and correlation between the distributions these realizations approximate. However, although the initial concession to stratify generally precludes the more sensible analysis using a single joint hierarchical model, we can circumvent this outcome and capitalize on the intermediate realizations by extending the dynamic iterative reweighting MCMC algorithm. In doing so, we reuse the available realizations by reweighting them with importance weights, recycling them into a now tractable joint hierarchical model. We apply this technique to intermediate realizations generated from stratified analyses of 687 influenza A genomes spanning 13 years, allowing us to revisit hypotheses regarding the evolutionary history of influenza within a hierarchical statistical framework.
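The reweighting idea mentioned in the abstract can be illustrated with a minimal sketch of plain self-normalized importance reweighting. This is not the authors' dynamic iterative reweighting MCMC algorithm, only the basic building block it relies on: draws obtained under one prior are reweighted by the ratio of a new prior to the old one. The conjugate normal model, the prior settings, and all names (`log_normal_pdf`, `importance_weights`, `draws`) are assumptions chosen for illustration so that the reweighted estimate can be checked against a closed-form answer.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    """Log density of a Normal(mean, sd^2) distribution at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2


def importance_weights(draws, log_old_prior, log_new_prior):
    """Self-normalized weights proportional to new_prior / old_prior."""
    log_w = [log_new_prior(t) - log_old_prior(t) for t in draws]
    shift = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


rng = random.Random(0)

# Hypothetical stratum: 20 observations y_i ~ Normal(theta, 1).
y = [rng.gauss(1.0, 1.0) for _ in range(20)]
n, y_sum = len(y), sum(y)

# "Intermediate realizations": posterior draws of theta under a vague
# Normal(0, 10^2) prior. The posterior is conjugate, so we sample it
# directly here as a stand-in for stored MCMC output.
old_mean, old_sd = 0.0, 10.0
old_prec = n + 1.0 / old_sd**2
old_post_mean = (y_sum + old_mean / old_sd**2) / old_prec
draws = [rng.gauss(old_post_mean, old_prec**-0.5) for _ in range(50_000)]

# Reweight the stored draws toward a different (assumed) prior, Normal(0, 0.5^2),
# without rerunning the sampler.
new_mean, new_sd = 0.0, 0.5
weights = importance_weights(
    draws,
    lambda t: log_normal_pdf(t, old_mean, old_sd),
    lambda t: log_normal_pdf(t, new_mean, new_sd),
)

# Reweighted posterior mean and a standard effective-sample-size diagnostic.
reweighted_mean = sum(w * t for w, t in zip(weights, draws))
ess = 1.0 / sum(w * w for w in weights)

# Closed-form posterior mean under the new prior, for comparison.
exact_prec = n + 1.0 / new_sd**2
exact_mean = (y_sum + new_mean / new_sd**2) / exact_prec

print(f"reweighted mean: {reweighted_mean:.4f}")
print(f"exact mean:      {exact_mean:.4f}")
print(f"effective sample size: {ess:.0f} of {len(draws)}")
```

In this toy setting the reweighted mean should track the closed-form value closely; a low effective sample size would signal that the old and new priors differ too much for the stored draws to be reused reliably.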

Comments:	Published at http://dx.doi.org/10.1214/10-AOAS349 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
Subjects:	Applications (stat.AP)
Cite as:	arXiv:1101.0959 [stat.AP]
	(or arXiv:1101.0959v1 [stat.AP] for this version)
	https://doi.org/10.48550/arXiv.1101.0959
Journal reference:	IMS-AOAS-AOAS349
Related DOI:	https://doi.org/10.1214/10-AOAS349

Submission history

From: Jennifer A. Tom
[v1] Wed, 5 Jan 2011 12:56:49 UTC (167 KB)
