Rethinking On-policy Optimization for Query Augmentation

Xu, Zhichao; Zhuang, Shengyao; Ma, Xueguang; Chen, Bingsen; Tian, Yijun; Mo, Fengran; Cao, Jie; Srikumar, Vivek


arXiv:2510.17139 (cs)

[Submitted on 20 Oct 2025]



Abstract: Recent advances in large language models (LLMs) have led to a surge of interest in query augmentation for information retrieval (IR). Two main approaches have emerged. The first prompts LLMs to generate answers or pseudo-documents that serve as new queries, relying purely on the model's parametric knowledge or contextual information. The second applies reinforcement learning (RL) to fine-tune LLMs for query rewriting, directly optimizing retrieval metrics. Although each has its own advantages and limitations, the two approaches have not been compared under consistent experimental conditions. In this work, we present the first systematic comparison of prompting-based and RL-based query augmentation across diverse benchmarks, including evidence-seeking, ad hoc, and tool retrieval. Our key finding is that simple, training-free query augmentation often performs on par with, or even surpasses, more expensive RL-based counterparts, especially when using powerful LLMs. Motivated by this finding, we introduce a novel hybrid method, On-policy Pseudo-document Query Expansion (OPQE), in which the LLM policy, instead of rewriting a query, learns to generate a pseudo-document that maximizes retrieval performance, thus merging the flexibility and generative structure of prompting with the targeted optimization of RL. We show that OPQE outperforms both standalone prompting and RL-based rewriting, demonstrating that a synergistic approach yields the best results. Our implementation is made available to facilitate reproducibility.
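To make the idea of "directly optimizing retrieval metrics" concrete, the following is a minimal sketch of how a generated query rewrite or pseudo-document could be scored against a retriever to produce a scalar reward. It is not the authors' implementation: the token-overlap scorer stands in for a real retriever, recall@k is an assumed choice of metric, and every name here (`tokenize`, `overlap_score`, `retrieve`, `retrieval_reward`, the toy corpus) is hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def overlap_score(query_tokens, doc_tokens):
    """Toy lexical score: total occurrences in the document of each
    distinct query token. Assumed stand-in for BM25 or a dense retriever."""
    doc_counts = Counter(doc_tokens)
    return sum(doc_counts[t] for t in set(query_tokens))


def retrieve(query, corpus, k):
    """Return the ids of the top-k documents, ties broken by document id."""
    q = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda doc_id: (-overlap_score(q, tokenize(corpus[doc_id])), doc_id),
    )
    return ranked[:k]


def retrieval_reward(generated_text, corpus, relevant, k=1):
    """Recall@k of the documents retrieved when the generated text is used
    as the query. In an RL setup this scalar would serve as the reward for
    one sampled generation (assumed metric, not taken from the paper)."""
    top_k = retrieve(generated_text, corpus, k)
    return len(set(top_k) & relevant) / len(relevant)


# Hypothetical toy data for illustration only.
corpus = {
    "d1": "the famous eiffel tower is the symbol of paris",
    "d2": "python is a programming language",
    "d3": "leonardo da vinci painted the mona lisa",
}
relevant = {"d3"}

raw_query = "who painted the famous smiling portrait"
pseudo_document = "leonardo da vinci painted the mona lisa portrait"

reward_raw = retrieval_reward(raw_query, corpus, relevant, k=1)
reward_pseudo = retrieval_reward(pseudo_document, corpus, relevant, k=1)
```

In this toy setting the raw query shares more surface tokens with an irrelevant document, while the pseudo-document shares vocabulary with the relevant one, so the latter earns the higher reward. A prompting-only approach would use such a pseudo-document as is; an on-policy method along the lines the abstract describes would presumably use a reward of this kind to update the model that generates it.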

Subjects:	Computation and Language (cs.CL); Information Retrieval (cs.IR)
Cite as:	arXiv:2510.17139 [cs.CL]
	(or arXiv:2510.17139v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2510.17139

Submission history

From: Zhichao Xu
[v1] Mon, 20 Oct 2025 04:16:28 UTC (784 KB)
