Calibration improves detection of mislabeled examples

Chibane, Ilies; George, Thomas; Nodet, Pierre; Lemaire, Vincent

arXiv:2511.02738 (cs)

[Submitted on 4 Nov 2025]

Authors: Ilies Chibane, Thomas George, Pierre Nodet, Vincent Lemaire

Abstract: Mislabeled data is a pervasive issue that undermines the performance of machine learning systems in real-world applications. An effective way to mitigate this problem is to detect mislabeled instances and give them special treatment, such as filtering or relabeling. Automatic mislabeling detection methods typically train a base machine learning model and then probe it on each instance to obtain a trust score indicating whether the provided label is genuine or incorrect. The properties of this base model are thus of paramount importance. In this paper, we investigate the impact of calibrating this model. Our empirical results show that using calibration methods improves the accuracy and robustness of mislabeled instance detection, providing a practical and effective solution for industrial applications.
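The pipeline described in the abstract (base model, optional calibration, per-instance trust score) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which calibration methods or trust scores are used, so temperature scaling and "probability assigned to the provided label" are assumptions made here for illustration, and all function names and logit values are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of one logit vector after dividing by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logits_list, labels, temperature):
    """Mean negative log-likelihood of the labels at a given temperature."""
    losses = [-math.log(softmax(z, temperature)[y])
              for z, y in zip(logits_list, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logits_list, labels, grid=None):
    """Assumed calibration step: grid-search the temperature that
    minimizes NLL on held-out data (temperature scaling)."""
    if grid is None:
        grid = [0.25 * k for k in range(1, 41)]  # 0.25, 0.50, ..., 10.0
    return min(grid, key=lambda t: nll(logits_list, labels, t))

def trust_scores(logits_list, given_labels, temperature=1.0):
    """Assumed trust score: calibrated probability of the provided label."""
    return [softmax(z, temperature)[y]
            for z, y in zip(logits_list, given_labels)]

# Hypothetical held-out logits from an overconfident base model:
# every prediction is very confident, but one in four is wrong.
val_logits = [[4.0, 0.0], [0.0, 4.0], [4.0, 0.0], [0.0, 4.0]]
val_labels = [0, 1, 0, 0]
T = fit_temperature(val_logits, val_labels)

# Hypothetical training instances; the last provided label
# disagrees with what the base model predicts.
train_logits = [[3.0, 0.0], [0.0, 3.0], [2.5, 0.0]]
train_labels = [0, 1, 1]
scores = trust_scores(train_logits, train_labels, T)

# Lowest trust first: candidates for filtering or relabeling.
suspects = sorted(range(len(scores)), key=lambda i: scores[i])
```

In this sketch the fitted temperature only softens the base model's probabilities; any other calibration method could be substituted in `fit_temperature` without changing how `trust_scores` ranks the instances for review.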

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2511.02738 [cs.LG]
	(or arXiv:2511.02738v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2511.02738

Submission history

From: Thomas George
[v1] Tue, 4 Nov 2025 17:03:33 UTC (194 KB)
