Estimating the size of a set using cascading exclusion

Chatterjee, Sourav; Diaconis, Persi; Holmes, Susan

arXiv:2508.05901 (math)

[Submitted on 7 August 2025]


Authors: Sourav Chatterjee, Persi Diaconis, Susan Holmes

Abstract: Let $S$ be a finite set, and $X_1,\ldots,X_n$ an i.i.d. uniform sample from $S$. To estimate the size $|S|$, without further structure, one can wait for repeats and use the birthday problem. This requires a sample size of the order $|S|^{1/2}$. On the other hand, if $S=\{1,2,\ldots,|S|\}$, the maximum of the sample blown up by $n/(n-1)$ gives an efficient estimator based on any growing sample size. This paper gives refinements that interpolate between these extremes. A general non-asymptotic theory is developed. This includes estimating the volume of a compact convex set, the unseen species problem, and a host of testing problems that follow from the question "Is this new observation a typical pick from a large prespecified population?" We also treat regression style predictors. A general theorem gives non-parametric finite $n$ error bounds in all cases.
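The two baseline estimators that the abstract contrasts can be sketched in a few lines of Python. This is only an illustration of those two extremes, not the paper's cascading-exclusion method. The function names are hypothetical, and the collision-based formula is an assumption: the abstract says only to "wait for repeats and use the birthday problem", so the sketch uses a simple method-of-moments version based on the fact that $n$ uniform draws from $S$ produce $\binom{n}{2}/|S|$ coinciding pairs in expectation.

```python
import random
from collections import Counter


def max_estimator(sample):
    """Estimate |S| when S = {1, ..., |S|}.

    Scales the sample maximum by n / (n - 1), the factor quoted in the abstract.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    return max(sample) * n / (n - 1)


def collision_estimator(sample):
    """Estimate |S| from repeats alone (assumed method-of-moments form).

    The expected number of coinciding pairs is C(n, 2) / |S|, so we solve
    for |S| using the observed number of coinciding pairs.
    Returns None when the sample contains no repeats.
    """
    n = len(sample)
    counts = Counter(sample)
    # A value seen c times contributes c * (c - 1) / 2 coinciding pairs.
    pairs = sum(c * (c - 1) // 2 for c in counts.values())
    if pairs == 0:
        return None
    return n * (n - 1) / 2 / pairs


if __name__ == "__main__":
    rng = random.Random(0)
    true_size = 10_000
    # Sample size well above sqrt(|S|) = 100, so repeats are likely.
    sample = [rng.randint(1, true_size) for _ in range(500)]
    print("max-based estimate:      ", max_estimator(sample))
    print("collision-based estimate:", collision_estimator(sample))
```

The collision estimator returns nothing useful until the sample is large enough to contain repeats, which is the $|S|^{1/2}$ barrier mentioned in the abstract, whereas the max-based estimator works for any sample of two or more points but depends on the ordered structure of $S$.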

Comments:	46 pages, 10 figures
Subjects:	Statistics Theory (math.ST); Information Theory (cs.IT); Probability (math.PR); Machine Learning (stat.ML)
MSC classes:	62G05, 62G25
Cite as:	arXiv:2508.05901 [math.ST]
	(or arXiv:2508.05901v1 [math.ST] for this version)
	https://doi.org/10.48550/arXiv.2508.05901

Submission history

From: Sourav Chatterjee
[v1] Thu, 7 Aug 2025 23:36:42 UTC (879 KB)
