Optimised Feature Subset Selection via Simulated Annealing

Martínez-García, Fernando; Rubio-García, Álvaro; Fernández-Lorenzo, Samuel; García-Ripoll, Juan José; Porras, Diego

arXiv:2507.23568 (cs)

[Submitted on 31 Jul 2025]

Abstract: We introduce SA-FDR, a novel algorithm for $\ell_0$-norm feature selection that treats the task as a combinatorial optimisation problem and solves it by using simulated annealing to perform a global search over the space of feature subsets. The optimisation is guided by the Fisher discriminant ratio, which we use as a computationally efficient proxy for model quality in classification tasks. Our experiments, conducted on datasets with up to hundreds of thousands of samples and hundreds of features, demonstrate that SA-FDR consistently selects more compact feature subsets while achieving high predictive accuracy. This ability to recover informative yet minimal feature sets stems from its capacity to capture inter-feature dependencies that greedy optimisation approaches often miss. As a result, SA-FDR provides a flexible and effective solution for designing interpretable models in high-dimensional settings, particularly when model sparsity, interpretability, and performance are crucial.
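
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea and not the authors' implementation. It assumes binary labels (0/1), a fixed subset size k standing in for the $\ell_0$ constraint, the standard multivariate Fisher discriminant ratio J(S) = d_S^T (Sigma_0 + Sigma_1)_S^{-1} d_S (with d the difference of class means), single-feature swap moves, and a geometric cooling schedule. The names fisher_ratio and anneal_subset, the ridge term, and all default parameters are hypothetical choices made for this example.

```python
import numpy as np


def fisher_ratio(X, y, subset, ridge=1e-6):
    """Multivariate Fisher discriminant ratio of a feature subset (labels 0/1)."""
    idx = np.asarray(sorted(subset))
    X0, X1 = X[y == 0][:, idx], X[y == 1][:, idx]
    diff = X1.mean(axis=0) - X0.mean(axis=0)
    # Sum of the two within-class covariances; atleast_2d handles a single feature.
    Sw = np.atleast_2d(np.cov(X0, rowvar=False)) + np.atleast_2d(np.cov(X1, rowvar=False))
    # Assumed small ridge term so the linear solve stays well conditioned.
    Sw = Sw + ridge * np.eye(len(idx))
    return float(diff @ np.linalg.solve(Sw, diff))


def anneal_subset(X, y, k, n_steps=2000, t_start=1.0, t_end=1e-3, seed=0):
    """Search for a size-k feature subset with a high Fisher ratio via simulated annealing."""
    n_features = X.shape[1]
    if not 0 < k < n_features:
        raise ValueError("k must satisfy 0 < k < number of features")
    rng = np.random.default_rng(seed)
    current = set(rng.choice(n_features, size=k, replace=False).tolist())
    score = fisher_ratio(X, y, current)
    best, best_score = set(current), score
    # Geometric cooling from t_start down to t_end over n_steps (an assumed schedule).
    cooling = (t_end / t_start) ** (1.0 / max(n_steps - 1, 1))
    temp = t_start
    for _ in range(n_steps):
        # Proposal: swap one selected feature for one unselected feature.
        out_f = int(rng.choice(sorted(current)))
        in_f = int(rng.choice(sorted(set(range(n_features)) - current)))
        candidate = (current - {out_f}) | {in_f}
        cand_score = fisher_ratio(X, y, candidate)
        delta = cand_score - score
        # Metropolis rule: always accept improvements, sometimes accept worse subsets.
        if delta >= 0 or rng.random() < np.exp(delta / temp):
            current, score = candidate, cand_score
            if score > best_score:
                best, best_score = set(current), score
        temp *= cooling
    return sorted(best), best_score
```

Because the ratio is evaluated on the whole subset rather than one feature at a time, the score of a candidate depends on the covariance between its features, which is the kind of inter-feature dependency the abstract says greedy methods tend to miss. Calling anneal_subset(X, y, k=5) on a NumPy feature matrix X and a 0/1 label vector y returns the best subset found and its score.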

Comments:	12 pages, 2 figures
Subjects:	Machine Learning (cs.LG); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (stat.ML)
Cite as:	arXiv:2507.23568 [cs.LG]
	(or arXiv:2507.23568v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2507.23568

Submission history

From: Fernando Martinez-Garcia PhD
[v1] Thu, 31 Jul 2025 13:57:38 UTC (265 KB)
