CorrectNav: Self-Correction Flywheel Empowers Vision-Language-Action Navigation Model

Yu, Zhuoyuan; Long, Yuxing; Yang, Zihan; Zeng, Chengyan; Fan, Hongwei; Zhang, Jiyao; Dong, Hao

Computer Science > Robotics

arXiv:2508.10416 (cs)

[Submitted on 14 Aug 2025]

Title: CorrectNav: Self-Correction Flywheel Empowers Vision-Language-Action Navigation Model

Authors: Zhuoyuan Yu, Yuxing Long, Zihan Yang, Chengyan Zeng, Hongwei Fan, Jiyao Zhang, Hao Dong

Abstract: Existing vision-and-language navigation models often deviate from the correct trajectory when executing instructions. However, these models lack effective error correction capability, hindering their recovery from errors. To address this challenge, we propose Self-correction Flywheel, a novel post-training paradigm. Instead of considering the model's error trajectories on the training set as a drawback, our paradigm emphasizes their significance as a valuable data source. We have developed a method to identify deviations in these error trajectories and devised innovative techniques to automatically generate self-correction data for perception and action. These self-correction data serve as fuel to power the model's continued training. The brilliance of our paradigm is revealed when we re-evaluate the model on the training set, uncovering new error trajectories. At this point, the self-correction flywheel begins to spin. Through multiple flywheel iterations, we progressively enhance our monocular RGB-based VLA navigation model CorrectNav. Experiments on R2R-CE and RxR-CE benchmarks show CorrectNav achieves new state-of-the-art success rates of 65.1% and 69.3%, surpassing prior best VLA navigation models by 8.2% and 16.4%. Real robot tests in various indoor and outdoor environments demonstrate CorrectNav's superior capability of error correction, dynamic obstacle avoidance, and long instruction following.
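
The loop described in the abstract (roll out on the training set, find where each failed trajectory deviates, generate correction data, continue training, repeat) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the distance-threshold deviation test and the names `Episode`, `first_deviation`, `flywheel`, `rollout_fn`, `make_correction_data`, and `train_fn` are all hypothetical placeholders, since the abstract does not specify how deviations are detected or how correction data is built.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Episode:
    """One training-set episode (hypothetical structure)."""
    instruction: str
    reference_path: List[Point]  # ground-truth waypoints


def first_deviation(rollout: List[Point], reference: List[Point],
                    threshold: float) -> Optional[int]:
    """Return the index of the first rollout step that lies farther than
    `threshold` from every reference waypoint, or None if there is none.
    Assumption: a simple distance test stands in for the paper's method."""
    for i, (x, y) in enumerate(rollout):
        nearest = min(((x - rx) ** 2 + (y - ry) ** 2) ** 0.5
                      for rx, ry in reference)
        if nearest > threshold:
            return i
    return None


def flywheel(policy: Any,
             episodes: List[Episode],
             rollout_fn: Callable[[Any, Episode], List[Point]],
             make_correction_data: Callable[[Episode, List[Point], int], list],
             train_fn: Callable[[Any, list], Any],
             iterations: int = 3,
             threshold: float = 1.0) -> Any:
    """Repeat: evaluate on the training set, mine deviations, retrain."""
    for _ in range(iterations):
        corrections: list = []
        for ep in episodes:
            rollout = rollout_fn(policy, ep)
            idx = first_deviation(rollout, ep.reference_path, threshold)
            if idx is not None:
                # Placeholder for generating perception/action correction data.
                corrections.extend(make_correction_data(ep, rollout, idx))
        if not corrections:
            break  # no error trajectories left to learn from
        policy = train_fn(policy, corrections)
    return policy
```

Here the three callables are supplied by the caller, so the sketch only fixes the control flow: each iteration re-evaluates the updated policy on the same training episodes, which is what surfaces new error trajectories for the next round.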

Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2508.10416 [cs.RO]
	(or arXiv:2508.10416v1 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2508.10416

Submission history

From: Yuxing Long
[v1] Thu, 14 Aug 2025 07:39:26 UTC (3,876 KB)
