RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems

Qu, Yuxiao; Singh, Anikait; Lee, Yoonho; Setlur, Amrith; Salakhutdinov, Ruslan; Finn, Chelsea; Kumar, Aviral

arXiv:2510.02263 (cs)

[Submitted on 2 Oct 2025]

Title: RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems

Authors: Yuxiao Qu, Anikait Singh, Yoonho Lee, Amrith Setlur, Ruslan Salakhutdinov, Chelsea Finn, Aviral Kumar

Abstract: Reasoning requires going beyond pattern matching or memorization of solutions to identify and implement "algorithmic procedures" that can be used to deduce answers to hard problems. Doing so requires recognizing the most relevant primitives, intermediate results, or shared procedures, and building upon them. While RL post-training on long chains of thought ultimately aims to uncover this kind of algorithmic behavior, most reasoning traces learned by large models fail to consistently capture or reuse procedures, instead drifting into verbose and degenerate exploration. To enable more effective reasoning, we introduce reasoning abstractions: concise natural language descriptions of procedural and factual knowledge that guide the model toward learning successful reasoning. We train models to propose multiple abstractions for a given problem, followed by RL that incentivizes building a solution while using the information provided by these abstractions. This results in a two-player RL training paradigm, abbreviated as RLAD, that jointly trains an abstraction generator and a solution generator. This setup effectively enables structured exploration, decouples the learning signals of abstraction proposal and solution generation, and improves generalization to harder problems. We also show that, at large test-time budgets, allocating more compute to generating abstractions benefits performance more than generating additional solutions, illustrating the role of abstractions in guiding meaningful exploration.
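
The inference-time flow described in the abstract (propose several abstractions for a problem, then generate solutions conditioned on each) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `solve_with_abstractions`, `toy_propose`, `toy_solve`, and the majority-vote aggregation are all hypothetical names and choices introduced here, and the abstract does not say how candidate answers are combined.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical interfaces: each callable stands in for sampling from an LLM.
AbstractionGenerator = Callable[[str, int], List[str]]      # (problem, n) -> abstractions
SolutionGenerator = Callable[[str, str, int], List[str]]    # (problem, abstraction, k) -> answers


def solve_with_abstractions(
    problem: str,
    propose: AbstractionGenerator,
    solve: SolutionGenerator,
    num_abstractions: int,
    solutions_per_abstraction: int,
) -> str:
    """Spend a budget of num_abstractions * solutions_per_abstraction samples."""
    answers: List[str] = []
    for abstraction in propose(problem, num_abstractions):
        # Each solution attempt is conditioned on one proposed abstraction.
        answers.extend(solve(problem, abstraction, solutions_per_abstraction))
    if not answers:
        raise ValueError("no candidate answers were generated")
    # Assumption: aggregate candidate answers by majority vote.
    return Counter(answers).most_common(1)[0][0]


# Toy stand-ins so the sketch runs without a model (purely illustrative).
def toy_propose(problem: str, n: int) -> List[str]:
    return [f"hint-{i}" for i in range(n)]


def toy_solve(problem: str, abstraction: str, k: int) -> List[str]:
    # Pretend the first abstraction is misleading and the rest are helpful.
    answer = "41" if abstraction == "hint-0" else "42"
    return [answer] * k


narrow = solve_with_abstractions("toy problem", toy_propose, toy_solve, 1, 6)
broad = solve_with_abstractions("toy problem", toy_propose, toy_solve, 3, 2)
```

Both calls use the same total budget of six samples. The toy setup only shows how the budget can be split between abstractions and solutions; it does not reproduce any result from the paper.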

Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2510.02263 [cs.AI]
	(or arXiv:2510.02263v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2510.02263

Submission history

From: Yuxiao Qu
[v1] Thu, 2 Oct 2025 17:44:23 UTC (6,410 KB)
