Quantum Algorithms for Finite-horizon Markov Decision Processes

Luo, Bin; Huang, Yuwen; Allcock, Jonathan; Lin, Xiaojun; Zhang, Shengyu; Lui, John C. S.

Quantum Physics

arXiv:2508.05712 (quant-ph)

[Submitted on 7 August 2025]

Title: Quantum Algorithms for Finite-horizon Markov Decision Processes

Authors: Bin Luo, Yuwen Huang, Jonathan Allcock, Xiaojun Lin, Shengyu Zhang, John C.S. Lui

Abstract: In this work, we design quantum algorithms that are more efficient than classical algorithms to solve time-dependent and finite-horizon Markov Decision Processes (MDPs) in two distinct settings: (1) In the exact dynamics setting, where the agent has full knowledge of the environment's dynamics (i.e., transition probabilities), we prove that our $\textbf{Quantum Value Iteration (QVI)}$ algorithm $\textbf{QVI-1}$ achieves a quadratic speedup in the size of the action space $(A)$ compared with the classical value iteration algorithm for computing the optimal policy ($\pi^{*}$) and the optimal V-value function ($V_{0}^{*}$). Furthermore, our algorithm $\textbf{QVI-2}$ provides an additional speedup in the size of the state space $(S)$ when obtaining near-optimal policies and V-value functions. Both $\textbf{QVI-1}$ and $\textbf{QVI-2}$ achieve quantum query complexities that provably improve upon classical lower bounds, particularly in their dependence on $S$ and $A$. (2) In the generative model setting, where samples from the environment are accessible in quantum superposition, we prove that our algorithms $\textbf{QVI-3}$ and $\textbf{QVI-4}$ achieve improvements in sample complexity over the state-of-the-art (SOTA) classical algorithm in terms of $A$, estimation error $(\epsilon)$, and time horizon $(H)$. More importantly, we prove quantum lower bounds to show that $\textbf{QVI-3}$ and $\textbf{QVI-4}$ are asymptotically optimal, up to logarithmic factors, assuming a constant time horizon.
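
For orientation, the classical baseline that the abstract compares against in the exact dynamics setting can be sketched as backward induction over the horizon. This is a minimal illustration of classical finite-horizon value iteration, not of any QVI algorithm; the function name, the array layout (`P[h, s, a, s']` for transition probabilities, `r[h, s, a]` for rewards), and the zero terminal value are assumptions made for the example and are not taken from the paper.

```python
import numpy as np


def finite_horizon_value_iteration(P, r):
    """Classical backward induction for a time-dependent finite-horizon MDP.

    Assumed layout (hypothetical, for illustration only):
      P: array of shape (H, S, A, S), P[h, s, a, s2] = Pr(s2 | s, a) at step h
      r: array of shape (H, S, A),    r[h, s, a]     = reward at step h
    Returns:
      V:  array of shape (H + 1, S), V[h] is the optimal value from step h on
      pi: array of shape (H, S),     pi[h, s] is an optimal action at step h
    """
    H, S, A, _ = P.shape
    V = np.zeros((H + 1, S))          # assumed terminal condition V[H] = 0
    pi = np.zeros((H, S), dtype=int)
    for h in range(H - 1, -1, -1):    # sweep backward from the last step
        # Q[s, a] = r[h, s, a] + sum over s2 of P[h, s, a, s2] * V[h + 1, s2]
        Q = r[h] + P[h] @ V[h + 1]
        pi[h] = Q.argmax(axis=1)      # exhaustive search over all A actions
        V[h] = Q.max(axis=1)
    return V, pi
```

In this sketch, each of the H backward steps evaluates every state-action pair and then maximizes over all A actions for each state. The abstract's claimed speedups concern the dependence on A (and, for QVI-2, on S) relative to this kind of classical procedure.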

Comments:	Accepted at the 42nd International Conference on Machine Learning (ICML 2025)
Subjects:	Quantum Physics (quant-ph)
Cite as:	arXiv:2508.05712 [quant-ph]
	(or arXiv:2508.05712v1 [quant-ph] for this version)
	https://doi.org/10.48550/arXiv.2508.05712

Submission history

From: Bin Luo
[v1] Thursday, 7 August 2025 09:00:23 UTC (109 KB)