Learning Mixture Models via Efficient High-dimensional Sparse Fourier Transforms

Kalavasis, Alkis; Kothari, Pravesh K.; Li, Shuchen; Zampetakis, Manolis

Computer Science > Data Structures and Algorithms

arXiv:2601.05157 (cs)

[Submitted on 8 Jan 2026]

Title: Learning Mixture Models via Efficient High-dimensional Sparse Fourier Transforms

Authors: Alkis Kalavasis, Pravesh K. Kothari, Shuchen Li, Manolis Zampetakis

Abstract: In this work, we give a ${\rm poly}(d,k)$ time and sample algorithm for efficiently learning the parameters of a mixture of $k$ spherical distributions in $d$ dimensions. Unlike all previous methods, our techniques apply to heavy-tailed distributions and include examples that do not even have finite covariances. Our method succeeds whenever the cluster distributions have a characteristic function with sufficiently heavy tails. Such distributions include the Laplace distribution but crucially exclude Gaussians. All previous methods for learning mixture models relied implicitly or explicitly on the low-degree moments. Even for the case of Laplace distributions, we prove that any such algorithm must use super-polynomially many samples. Our method thus adds to the short list of techniques that bypass the limitations of the method of moments. Somewhat surprisingly, our algorithm does not require any minimum separation between the cluster means. This is in stark contrast to spherical Gaussian mixtures where a minimum $\ell_2$-separation is provably necessary even information-theoretically [Regev and Vijayaraghavan '17]. Our methods compose well with existing techniques and allow obtaining "best of both worlds" guarantees for mixtures where every component either has a heavy-tailed characteristic function or has a sub-Gaussian tail with a light-tailed characteristic function. Our algorithm is based on a new approach to learning mixture models via efficient high-dimensional sparse Fourier transforms. We believe that this method will find more applications to statistical estimation. As an example, we give an algorithm for consistent robust mean estimation against noise-oblivious adversaries, a model practically motivated by the literature on multiple hypothesis testing. It was formally proposed in a recent Master's thesis by one of the authors, and has already inspired follow-up works.
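
The role of a heavy-tailed characteristic function can be illustrated with a small 1-D sketch. This is not the paper's algorithm; it only shows the standard identity that a location mixture with a common centered component has characteristic function $\varphi(t)\sum_j w_j e^{i\mu_j t}$, so dividing the empirical characteristic function by $\varphi(t)$ leaves a $k$-sparse sum of complex exponentials. The unit-scale Laplace component, the means, the weights, and all function names below are assumptions made for illustration.

```python
import numpy as np

def sample_laplace_mixture(rng, means, weights, n, scale=1.0):
    """Draw n samples from a 1-D mixture of Laplace(mean_j, scale) components."""
    labels = rng.choice(len(means), size=n, p=weights)
    return np.asarray(means)[labels] + rng.laplace(0.0, scale, size=n)

def empirical_cf(samples, t):
    """Empirical characteristic function: the sample mean of exp(i * t * x)."""
    return np.mean(np.exp(1j * t * samples))

def laplace_cf(t, scale=1.0):
    """Characteristic function of a centered Laplace: 1 / (1 + scale^2 * t^2)."""
    return 1.0 / (1.0 + (scale * t) ** 2)

def gaussian_cf(t, sigma=1.0):
    """Characteristic function of a centered Gaussian: exp(-sigma^2 * t^2 / 2)."""
    return np.exp(-0.5 * (sigma * t) ** 2)

# Hypothetical mixture parameters, chosen only for this illustration.
rng = np.random.default_rng(0)
means = [-0.5, 0.0, 2.0]
weights = [0.3, 0.3, 0.4]
x = sample_laplace_mixture(rng, means, weights, n=200_000)

# Divide out the (assumed known) component characteristic function at frequency t.
t = 3.0
deconvolved = empirical_cf(x, t) / laplace_cf(t)

# The quantity being estimated: a k-sparse sum of complex exponentials.
target = sum(w * np.exp(1j * m * t) for w, m in zip(weights, means))
error = abs(deconvolved - target)

# Dividing by the characteristic function amplifies sampling noise by 1 / cf(t).
# The factor grows polynomially in t for Laplace and exponentially for Gaussian.
laplace_amplification = 1.0 / laplace_cf(6.0)    # 1 + 36 = 37
gaussian_amplification = 1.0 / gaussian_cf(6.0)  # exp(18)
print(error, laplace_amplification, gaussian_amplification)
```

In this sketch the noise amplification at frequency $t$ is $1 + t^2$ for the Laplace component and $e^{t^2/2}$ for a unit-variance Gaussian, which is one informal way to see why sufficiently heavy tails of the characteristic function make the exponential sum recoverable from polynomially many samples while the Gaussian case is excluded.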

Subjects:	Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2601.05157 [cs.DS]
	(or arXiv:2601.05157v1 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.2601.05157

Submission history

From: Shuchen Li
[v1] Thu, 8 Jan 2026 17:47:58 UTC (96 KB)
