Multi-resolution subsampling for large-scale linear classification

Chen, Haolin; Dette, Holger; Yu, Jun

arXiv:2407.05691 (stat)

[Submitted on 8 Jul 2024]

Abstract: Subsampling is a popular method for balancing statistical efficiency and computational efficiency in the big-data era. Most approaches aim to select informative or representative sample points so that the subsample captures the overall information in the full data well. The present work takes the view that, given a well-designed partition of the data, sampling techniques should be reserved for the region of interest, while summary measures are sufficient to collect the information in the remaining regions. We propose a multi-resolution subsampling strategy that combines global information, described by summary measures, with local information obtained from selected subsample points. We show that the proposed method leads to a more efficient subsample-based estimator for general large-scale classification problems. We also establish some asymptotic properties of the proposed method and explore its connections to existing subsampling procedures. Finally, we illustrate the proposed subsampling strategy with simulated and real-world examples.
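
The abstract describes the strategy only at a high level. The Python sketch below illustrates the general multi-resolution idea for logistic regression under assumptions of our own; it is not the authors' procedure. The partition rule is hypothetical (points whose pilot score |x'β₀| is at most `margin` form the region of interest), the summary measures are hypothetical (cell means of the covariates, weighted by cell counts, within each label-and-side group of the remaining points), and every function and parameter name is illustrative. Only NumPy is used.

```python
import numpy as np


def fit_weighted_logistic(X, y, w, n_iter=50, tol=1e-10):
    """Weighted logistic-regression MLE via Newton-Raphson (tiny ridge for stability)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = np.clip(X @ beta, -30.0, 30.0)          # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        grad = X.T @ (w * (y - p))                   # gradient of weighted log-likelihood
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X + 1e-8 * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def multires_fit(X, y, n_pilot, n_sub, margin, n_cells, rng):
    """Hypothetical sketch: subsample near the boundary, summarize elsewhere."""
    N = X.shape[0]
    # 1. Pilot estimate from a uniform subsample (assumption).
    pilot = rng.choice(N, size=n_pilot, replace=False)
    beta0 = fit_weighted_logistic(X[pilot], y[pilot], np.ones(n_pilot))
    # 2. Partition (assumption): region of interest = small |pilot score|.
    score = X @ beta0
    focus = np.flatnonzero(np.abs(score) <= margin)
    rest = np.flatnonzero(np.abs(score) > margin)
    # 3. Local information: uniform subsample of the focus region,
    #    weighted by the inverse inclusion probability.
    k = min(n_sub, focus.size)
    idx = rng.choice(focus, size=k, replace=False)
    Xs, ys, ws = [X[idx]], [y[idx]], [np.full(k, focus.size / max(k, 1))]
    # 4. Global information: in each (label, side of boundary) group, sort by
    #    pilot score, cut into n_cells cells, keep each cell's mean x and count.
    for label in (0, 1):
        for side in (-1.0, 1.0):
            members = rest[(y[rest] == label) & (np.sign(score[rest]) == side)]
            members = members[np.argsort(score[members])]
            for cell in np.array_split(members, n_cells):
                if cell.size == 0:
                    continue
                Xs.append(X[cell].mean(axis=0, keepdims=True))
                ys.append(np.array([float(label)]))
                ws.append(np.array([float(cell.size)]))
    return fit_weighted_logistic(np.vstack(Xs), np.concatenate(ys), np.concatenate(ws))


# Usage on simulated data (illustrative values only).
rng = np.random.default_rng(0)
N = 100_000
X = np.column_stack([np.ones(N), rng.standard_normal((N, 2))])
beta_true = np.array([0.5, 1.0, -1.0])
y = (rng.random(N) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
beta_hat = multires_fit(X, y, n_pilot=500, n_sub=1000, margin=1.0, n_cells=50, rng=rng)
print(beta_hat)
```

In this sketch the focus points carry inverse-inclusion weights, so that, conditional on the pilot partition, their weighted log-likelihood is unbiased for the full log-likelihood of the focus region, while each summary pseudo-point stands in for its whole cell. Replacing a cell by its covariate mean only approximates that cell's log-likelihood, and it is not claimed to match the summary measures used in the paper.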

Comments:	40 pages
Subjects:	Methodology (stat.ME); Statistics Theory (math.ST)
Cite as:	arXiv:2407.05691 [stat.ME]
	(or arXiv:2407.05691v1 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2407.05691

Submission history

From: Jun Yu
[v1] Mon, 8 Jul 2024 07:46:24 UTC (13,317 KB)
