ConMax: Confidence-Maximizing Compression for Efficient Chain-of-Thought Reasoning

Hu, Minda; Qiu, Zexuan; Xu, Zenan; Li, Kun; Zhou, Bo; King, Irwin

arXiv:2601.04973 (cs)

[Submitted on 8 Jan 2026]

Title: ConMax: Confidence-Maximizing Compression for Efficient Chain-of-Thought Reasoning

Authors: Minda Hu, Zexuan Qiu, Zenan Xu, Kun Li, Bo Zhou, Irwin King

Abstract: Recent breakthroughs in Large Reasoning Models (LRMs) have demonstrated that extensive Chain-of-Thought (CoT) generation is critical for enabling intricate cognitive behaviors, such as self-verification and backtracking, to solve complex tasks. However, this capability often leads to "overthinking", where models generate redundant reasoning paths that inflate computational costs without improving accuracy. While Supervised Fine-Tuning (SFT) on reasoning traces is a standard paradigm for the "cold start" phase, applying existing compression techniques to these traces often compromises logical coherence or incurs prohibitive sampling costs. In this paper, we introduce ConMax (Confidence-Maximizing Compression), a novel reinforcement learning framework designed to automatically compress reasoning traces while preserving essential reasoning patterns. ConMax formulates compression as a reward-driven optimization problem, training a policy to prune redundancy by maximizing a weighted combination of answer confidence for predictive fidelity and thinking confidence for reasoning validity through a frozen auxiliary LRM. Extensive experiments across five reasoning datasets demonstrate that ConMax achieves a superior efficiency-performance trade-off. Specifically, it reduces inference length by 43% over strong baselines at the cost of a mere 0.7% dip in accuracy, proving its effectiveness in generating high-quality, efficient training data for LRMs.
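The abstract describes the reward only as a weighted combination of answer confidence and thinking confidence scored by a frozen auxiliary LRM; it does not give the formula. The following is a minimal illustrative sketch of what such a combination could look like, not the paper's definition. Everything in it is an assumption: the function names `mean_confidence` and `combined_reward`, the weight `alpha`, and the choice of geometric-mean token probability as the confidence measure. The inputs are assumed to be per-token log-probabilities that the frozen model assigns to the answer and to the reasoning tokens, given a compressed trace.

```python
import math
from typing import Sequence


def mean_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability, a value in (0, 1].

    Hypothetical stand-in for "confidence"; the abstract does not
    specify how confidence is computed.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def combined_reward(
    answer_logprobs: Sequence[float],
    thinking_logprobs: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Weighted combination of the two confidences.

    alpha is an assumed mixing weight: alpha=1 scores only the answer,
    alpha=0 scores only the reasoning tokens.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    answer_conf = mean_confidence(answer_logprobs)
    thinking_conf = mean_confidence(thinking_logprobs)
    return alpha * answer_conf + (1.0 - alpha) * thinking_conf
```

Under these assumptions, a compression policy would be rewarded more for a shortened trace when the frozen model still assigns high probability both to the correct answer and to the retained reasoning.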

Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2601.04973 [cs.AI]
	(or arXiv:2601.04973v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2601.04973

Submission history

From: Minda Hu
[v1] Thu, 8 Jan 2026 14:22:58 UTC (2,037 KB)
