Beyond Regularization: Inherently Sparse Principal Component Analysis

Bauer, Jan O.


arXiv:2510.03729 (stat)

[Submitted on 4 Oct 2025]

Title: Beyond Regularization: Inherently Sparse Principal Component Analysis

Authors: Jan O. Bauer

Abstract: Sparse principal component analysis (sparse PCA) is a widely used technique for dimensionality reduction in multivariate analysis, addressing two key limitations of standard PCA. First, sparse PCA can be implemented in high-dimensional low sample size settings, such as genetic microarrays. Second, it improves interpretability as components are regularized to zero. However, over-regularization of sparse singular vectors can cause them to deviate greatly from the population singular vectors, potentially misrepresenting the data structure. Additionally, sparse singular vectors are often not orthogonal, resulting in shared information between components, which complicates the calculation of variance explained. To address these challenges, we propose a methodology for sparse PCA that reflects the inherent structure of the data matrix. Specifically, we identify uncorrelated submatrices of the data matrix, meaning that the covariance matrix exhibits a sparse block diagonal structure. Such sparse matrices commonly occur in high-dimensional settings. The singular vectors of such a data matrix are inherently sparse, which improves interpretability while capturing the underlying data structure. Furthermore, these singular vectors are orthogonal by construction, ensuring that they do not share information. We demonstrate the effectiveness of our method through simulations and provide real data applications. Supplementary materials for this article are available online.
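The structural claim in the abstract can be illustrated with a small numerical sketch. This is not the paper's algorithm: the abstract does not say how the uncorrelated submatrices are identified, so the block partition is simply assumed known here, and the function name `block_sparse_pca` and the toy covariance matrix are hypothetical. The sketch only shows that eigenvectors of a block-diagonal covariance matrix, computed block by block and zero-padded, are sparse and mutually orthogonal.

```python
import numpy as np


def block_sparse_pca(cov, blocks):
    """Eigendecompose each diagonal block of `cov` and zero-pad the results.

    `blocks` is a list of index lists; variables in different blocks are
    assumed (not verified) to be uncorrelated. Hypothetical helper for
    illustration only.
    """
    p = cov.shape[0]
    values, vectors = [], []
    for idx in blocks:
        idx = np.asarray(idx)
        sub = cov[np.ix_(idx, idx)]      # covariance of one block
        w, v = np.linalg.eigh(sub)       # symmetric eigendecomposition
        for k in range(len(idx)):
            full = np.zeros(p)
            full[idx] = v[:, k]          # zeros outside the block
            values.append(w[k])
            vectors.append(full)
    order = np.argsort(values)[::-1]     # sort by decreasing variance
    return np.array(values)[order], np.column_stack(vectors)[:, order]


# Toy covariance: variables 0-2 and variables 3-4 form uncorrelated groups.
cov = np.zeros((5, 5))
cov[:3, :3] = 0.6 + 0.4 * np.eye(3)      # unit variances, correlation 0.6
cov[3:, 3:] = 0.3 + 0.7 * np.eye(2)      # unit variances, correlation 0.3

eigvals, loadings = block_sparse_pca(cov, blocks=[[0, 1, 2], [3, 4]])
explained = eigvals / eigvals.sum()      # shares of total variance
```

Because the loading vectors are orthonormal, the entries of `explained` sum to one and each component's share of variance can be read off directly, which is the difficulty the abstract attributes to non-orthogonal sparse vectors.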

Subjects:	Methodology (stat.ME)
Cite as:	arXiv:2510.03729 [stat.ME]
	(or arXiv:2510.03729v1 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2510.03729

Submission history

From: Jan O. Bauer
[v1] Sat, 4 Oct 2025 08:22:23 UTC (109 KB)
