Rate-matching the regret lower-bound in the linear quadratic regulator with unknown dynamics

Wang, Feicheng; Janson, Lucas

arXiv:2202.05799 (cs)

[Submitted on 11 Feb 2022]

Title: Rate-matching the regret lower-bound in the linear quadratic regulator with unknown dynamics

Authors: Feicheng Wang, Lucas Janson

Abstract: The theory of reinforcement learning currently suffers from a mismatch between its empirical performance and the theoretical characterization of its performance, with consequences for, e.g., the understanding of sample efficiency, safety, and robustness. The linear quadratic regulator with unknown dynamics is a fundamental reinforcement learning setting with significant structure in its dynamics and cost function, yet even in this setting there is a gap between the best known regret lower-bound of $\Omega_p(\sqrt{T})$ and the best known upper-bound of $O_p(\sqrt{T}\,\text{polylog}(T))$. The contribution of this paper is to close that gap by establishing a novel regret upper-bound of $O_p(\sqrt{T})$. Our proof is constructive in that it analyzes the regret of a concrete algorithm, and simultaneously establishes an estimation error bound on the dynamics of $O_p(T^{-1/4})$, which is also the first to match the rate of a known lower-bound. The two keys to our improved proof technique are (1) a more precise upper- and lower-bound on the system Gram matrix and (2) a self-bounding argument for the expected estimation error of the optimal controller.
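
For readers unfamiliar with the setting, the sketch below shows the benchmark that regret is usually measured against in this problem: the optimal linear controller when the dynamics are known. It is not the algorithm analyzed in the paper. The notation is an assumption based on the standard formulation, which the abstract does not spell out: dynamics $x_{t+1} = A x_t + B u_t + \varepsilon_t$, per-step cost $x_t^\top Q x_t + u_t^\top R u_t$, and control $u_t = K x_t$. The function name `lqr_gain` and the scalar example values are hypothetical.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=10_000, tol=1e-12):
    """Iterate the discrete-time Riccati recursion to a fixed point P.

    Returns (K, P), where u_t = K x_t is the optimal control law for
    known (A, B). Assumed standard formulation, not taken from the paper.
    """
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        # Gain implied by the current value matrix P.
        K = -np.linalg.solve(R + BtP @ B, BtP @ A)
        # Riccati update: Q + A'PA - A'PB (R + B'PB)^{-1} B'PA.
        P_next = Q + A.T @ P @ A + A.T @ P @ B @ K
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Hypothetical scalar example: A = B = Q = R = 1.
# Here the fixed point P solves P^2 - P - 1 = 0.
A = np.array([[1.0]])
B = np.array([[1.0]])
Q = np.array([[1.0]])
R = np.array([[1.0]])
K, P = lqr_gain(A, B, Q, R)
print("K =", K[0, 0], "P =", P[0, 0])
```

Under the additional assumption of unit-covariance noise, the optimal long-run average cost is $\operatorname{tr}(P)$, and regret over $T$ steps is the accumulated cost of the learning controller minus $T \operatorname{tr}(P)$. An algorithm that does not know $(A, B)$ has to estimate them from data while controlling the system, which is where the $\sqrt{T}$ rate discussed in the abstract arises.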

Subjects:	Machine Learning (cs.LG); Systems and Control (eess.SY); Statistics Theory (math.ST)
Cite as:	arXiv:2202.05799 [cs.LG]
	(or arXiv:2202.05799v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2202.05799

Submission history

From: Feicheng Wang
[v1] Fri, 11 Feb 2022 17:50:14 UTC (49 KB)
