CenXiv.org


Robotics

  • New submissions
  • Cross-lists
  • Replacements


Showing new listings for Friday, 15 August 2025

Total of 49 entries

New submissions (showing 23 of 23 entries)

[1] arXiv:2508.10144 [pdf, html, other]
Title: WiFi-based Global Localization in Large-Scale Environments Leveraging Structural Priors from osmAG
Xu Ma, Jiajie Zhang, Fujing Xie, Sören Schwertfeger
Subjects: Robotics (cs.RO)


Global localization is essential for autonomous robotics, especially in indoor environments where the GPS signal is denied. We propose a novel WiFi-based localization framework that leverages ubiquitous wireless infrastructure and the OpenStreetMap Area Graph (osmAG) for large-scale indoor environments. Our approach integrates signal propagation modeling with osmAG's geometric and topological priors. In the offline phase, an iterative optimization algorithm localizes WiFi Access Points (APs) by modeling wall attenuation, achieving a mean localization error of 3.79 m (35.3% improvement over trilateration). In the online phase, real-time robot localization uses the augmented osmAG map, yielding a mean error of 3.12 m in fingerprinted areas (8.77% improvement over KNN fingerprinting) and 3.83 m in non-fingerprinted areas (81.05% improvement). Comparison with a fingerprint-based method shows that our approach is much more space efficient and achieves superior localization accuracy, especially for positions where no fingerprint data are available. Validated across a complex 11,025 m^2 multi-floor environment, this framework offers a scalable, cost-effective solution for indoor robotic localization, solving the kidnapped robot problem. The code and dataset are available at https://github.com/XuMa369/osmag-wifi-localization.
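The offline stage described above localizes APs by modeling wall attenuation on top of free-space path loss. A minimal sketch of the idea, assuming a standard log-distance path-loss model with a fixed per-wall penalty and a grid search in place of the paper's iterative optimization (all parameter values are illustrative, not taken from the paper):

```python
import math

def predicted_rssi(ap, pos, n_walls, p0=-40.0, n=2.0, wall_loss=3.0):
    """Log-distance path loss with a fixed attenuation per intervening wall.
    p0: RSSI at 1 m; n: path-loss exponent. Illustrative parameters."""
    d = max(math.dist(ap, pos), 1e-6)
    return p0 - 10.0 * n * math.log10(d) - wall_loss * n_walls

def localize(observations, candidates, walls_between):
    """Pick the candidate position that best explains the observed RSSI
    values in a least-squares sense. `observations` is a list of
    (ap_position, measured_rssi) pairs; `walls_between` counts walls
    crossed by the line of sight (here supplied by the caller)."""
    def error(pos):
        return sum(
            (rssi - predicted_rssi(ap, pos, walls_between(ap, pos))) ** 2
            for ap, rssi in observations
        )
    return min(candidates, key=error)
```

The same residual, with AP positions as the unknowns instead, gives the offline AP-localization problem the abstract refers to.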

[2] arXiv:2508.10203 [pdf, html, other]
Title: Systematic Constraint Formulation and Collision-Free Trajectory Planning Using Space-Time Graphs of Convex Sets
Matthew D. Osburn, Cameron K. Peterson, John L. Salmon
Comments: 21 pages, references, 20 figures
Subjects: Robotics (cs.RO); Systems and Control (eess.SY); Optimization and Control (math.OC)


In this paper, we create optimal, collision-free, time-dependent trajectories through cluttered dynamic environments. The many spatial and temporal constraints make finding an initial guess for a numerical solver difficult. Graphs of Convex Sets (GCS) and the recently developed Space-Time Graphs of Convex Sets formulation (ST-GCS) enable us to generate optimal minimum distance collision-free trajectories without providing an initial guess to the solver. We also explore the derivation of general GCS-compatible constraints and document an intuitive strategy for adapting general constraints to the framework. We show that ST-GCS produces equivalent trajectories to the standard GCS formulation when the environment is static. We then show ST-GCS operating in dynamic environments to find minimum distance collision-free trajectories.

[3] arXiv:2508.10269 [pdf, html, other]
Title: Hybrid Data-Driven Predictive Control for Robust and Reactive Exoskeleton Locomotion Synthesis
Kejun Li, Jeeseop Kim, Maxime Brunet, Marine Pétriaux, Yisong Yue, Aaron D. Ames
Comments: 8 pages; 8 figures
Subjects: Robotics (cs.RO)


Robust bipedal locomotion in exoskeletons requires the ability to react dynamically to changes in the environment in real time. This paper introduces the hybrid data-driven predictive control (HDDPC) framework, an extension of data-enabled predictive control that addresses these challenges by simultaneously planning foot contact schedules and continuous-domain trajectories. The proposed framework utilizes a Hankel matrix-based representation to model system dynamics, incorporating step-to-step (S2S) transitions to enhance adaptability in dynamic environments. By integrating contact scheduling with trajectory planning, the framework offers an efficient, unified solution for locomotion synthesis that enables robust and reactive walking through online replanning. We validate the approach on the Atalante exoskeleton, demonstrating improved robustness and adaptability.

[4] arXiv:2508.10333 [pdf, html, other]
Title: ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver
Wenxuan Song, Ziyang Zhou, Han Zhao, Jiayi Chen, Pengxiang Ding, Haodong Yan, Yuxin Huang, Feilong Tang, Donglin Wang, Haoang Li
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)


Recent advances in Vision-Language-Action (VLA) models have enabled robotic agents to integrate multimodal understanding with action execution. However, our empirical analysis reveals that current VLAs struggle to allocate visual attention to target regions; instead, their attention tends to be dispersed. To ground visual attention on the correct target, we propose ReconVLA, a reconstructive VLA model with an implicit grounding paradigm. Conditioned on the model's visual outputs, a diffusion transformer reconstructs the gaze region of the image, which corresponds to the target object being manipulated. This process prompts the VLA model to learn fine-grained representations and accurately allocate visual attention, thus effectively leveraging task-specific visual information and enabling precise manipulation. Moreover, we curate a large-scale pretraining dataset comprising over 100k trajectories and 2 million data samples from open-source robotic datasets, further boosting the model's generalization in visual reconstruction. Extensive experiments in simulation and the real world demonstrate the superiority of our implicit grounding method, showcasing its capabilities for precise manipulation and generalization. Our project page is https://zionchow.github.io/ReconVLA/.

[5] arXiv:2508.10363 [pdf, html, other]
Title: BEASST: Behavioral Entropic Gradient based Adaptive Source Seeking for Mobile Robots
Donipolo Ghimire, Aamodh Suresh, Carlos Nieto-Granda, Solmaz S. Kia
Subjects: Robotics (cs.RO)


This paper presents BEASST (Behavioral Entropic Gradient-based Adaptive Source Seeking for Mobile Robots), a novel framework for robotic source seeking in complex, unknown environments. Our approach enables mobile robots to efficiently balance exploration and exploitation by modeling normalized signal strength as a surrogate probability of source location. Building on Behavioral Entropy (BE) with Prelec's probability weighting function, we define an objective function that adapts robot behavior from risk-averse to risk-seeking based on signal reliability and mission urgency. The framework provides theoretical convergence guarantees under unimodal signal assumptions and practical stability under bounded disturbances. Experimental validation across DARPA SubT and multi-room scenarios demonstrates that BEASST consistently outperforms state-of-the-art methods, achieving a 15% reduction in path length and 20% faster source localization through intelligent uncertainty-driven navigation that dynamically transitions between aggressive pursuit and cautious exploration.
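The Prelec probability weighting function that BE builds on has a simple closed form, w(p) = exp(-beta * (-ln p)^alpha); a minimal sketch (the parameter values are illustrative, and the paper's full BE objective adds more on top of this):

```python
import math

def prelec(p, alpha=0.6, beta=1.0):
    """Prelec probability weighting: w(p) = exp(-beta * (-ln p)^alpha).
    With alpha < 1 small probabilities are overweighted (risk-seeking
    toward unlikely events); alpha > 1 underweights them (risk-averse).
    Parameter values here are illustrative, not the paper's."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return math.exp(-beta * (-math.log(p)) ** alpha)
```

For beta = 1 the function has a fixed point at p = 1/e regardless of alpha, which is a handy sanity check.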

[6] arXiv:2508.10371 [pdf, html, other]
Title: Few-shot Vision-based Human Activity Recognition with MLLM-based Visual Reinforcement Learning
Wenqi Zheng, Yutaka Arakawa
Subjects: Robotics (cs.RO)


Reinforcement learning in large reasoning models enables learning from feedback on their outputs, making it particularly valuable in scenarios where fine-tuning data is limited. However, its application in multi-modal human activity recognition (HAR) domains remains largely underexplored. Our work extends reinforcement learning to the human activity recognition domain with multimodal large language models. By incorporating visual reinforcement learning in the training process, the model's generalization ability on few-shot recognition can be greatly improved. Additionally, visual reinforcement learning can enhance the model's reasoning ability and enable explainable analysis in the inference stage. We name our few-shot human activity recognition method with visual reinforcement learning FAVOR. Specifically, our approach first utilizes a multimodal large language model (MLLM) to generate multiple candidate responses for the human activity image, each containing reasoning traces and final answers. These responses are then evaluated using reward functions, and the MLLM model is subsequently optimized using the Group Relative Policy Optimization (GRPO) algorithm. In this way, the MLLM model can be adapted to human activity recognition with only a few samples. Extensive experiments on four human activity recognition datasets and five different settings demonstrate the superiority of the proposed method.
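The group-relative scoring at the heart of GRPO, which the abstract mentions, can be sketched in a few lines: each sampled response's reward is normalized by the mean and standard deviation of its group. This is a simplified sketch of that normalization step only; the full algorithm also applies a clipped policy-gradient update with these advantages.

```python
def grpo_advantages(rewards):
    """Group-relative advantages in the spirit of GRPO: normalize each
    candidate response's reward by the statistics of its sampled group."""
    m = sum(rewards) / len(rewards)
    std = (sum((r - m) ** 2 for r in rewards) / len(rewards)) ** 0.5
    if std == 0.0:
        return [0.0] * len(rewards)  # all responses tied: no learning signal
    return [(r - m) / std for r in rewards]
```

Because only within-group statistics are used, no separate value network is needed, which is what makes the method attractive in few-shot settings.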

[7] arXiv:2508.10378 [pdf, html, other]
Title: A Semantic-Aware Framework for Safe and Intent-Integrative Assistance in Upper-Limb Exoskeletons
Yu Chen, Shu Miao, Chunyu Wu, Jingsong Mu, Bo OuYang, Xiang Li
Subjects: Robotics (cs.RO)


Upper-limb exoskeletons are primarily designed to provide assistive support by accurately interpreting and responding to human intentions. In home-care scenarios, exoskeletons are expected to adapt their assistive configurations based on the semantic information of the task, adjusting appropriately in accordance with the nature of the object being manipulated. However, existing solutions often lack the ability to understand task semantics or collaboratively plan actions with the user, limiting their generalizability. To address this challenge, this paper introduces a semantic-aware framework that integrates large language models into the task planning framework, enabling the delivery of safe and intent-integrative assistance. The proposed approach begins with the exoskeleton operating in transparent mode to capture the wearer's intent during object grasping. Once semantic information is extracted from the task description, the system automatically configures appropriate assistive parameters. In addition, a diffusion-based anomaly detector is used to continuously monitor the state of human-robot interaction and trigger real-time replanning in response to detected anomalies. During task execution, online trajectory refinement and impedance control are used to ensure safety and regulate human-robot interaction. Experimental results demonstrate that the proposed method effectively aligns with the wearer's cognition, adapts to semantically varying tasks, and responds reliably to anomalies.

[8] arXiv:2508.10398 [pdf, html, other]
Title: Super LiDAR Reflectance for Robotic Perception
Wei Gao, Jie Zhang, Mingle Zhao, Zhiyuan Zhang, Shu Kong, Maani Ghaffari, Dezhen Song, Cheng-Zhong Xu, Hui Kong
Subjects: Robotics (cs.RO)


Conventionally, human intuition defines vision as a modality of passive optical sensing, while active optical sensing is typically regarded as measurement rather than a default modality of vision. However, this is changing: sensor technologies and data-driven paradigms empower active optical sensing to redefine the boundaries of vision, ushering in a new era of active vision. Light Detection and Ranging (LiDAR) sensors capture reflectance from object surfaces, which remains invariant under varying illumination conditions, showing significant potential for robotic perception tasks such as detection, recognition, segmentation, and Simultaneous Localization and Mapping (SLAM). These applications often rely on dense sensing capabilities, typically achieved with high-resolution, expensive LiDAR sensors. A key challenge with low-cost LiDARs lies in the sparsity of their scan data, which limits broader application. To address this limitation, this work introduces an innovative framework for generating dense LiDAR reflectance images from sparse data, leveraging the unique attributes of non-repeating scanning LiDAR (NRS-LiDAR). We tackle critical challenges, including reflectance calibration and the transition from static to dynamic scene domains, facilitating the reconstruction of dense reflectance images in real-world settings. The key contributions of this work include a comprehensive dataset for LiDAR reflectance image densification, a densification network tailored for NRS-LiDAR, and diverse applications such as loop closure and traffic lane detection using the generated dense reflectance images.

[9] arXiv:2508.10399 [pdf, html, other]
Title: Large Model Empowered Embodied AI: A Survey on Decision-Making and Embodied Learning
Wenlong Liang, Rui Zhou, Yang Ma, Bing Zhang, Songlin Li, Yijia Liao, Ping Kuang
Subjects: Robotics (cs.RO)


Embodied AI aims to develop intelligent systems with physical forms capable of perceiving, decision-making, acting, and learning in real-world environments, providing a promising path toward Artificial General Intelligence (AGI). Despite decades of exploration, it remains challenging for embodied agents to achieve human-level intelligence on general-purpose tasks in open dynamic environments. Recent breakthroughs in large models have revolutionized embodied AI by enhancing perception, interaction, planning, and learning. In this article, we provide a comprehensive survey of large model empowered embodied AI, focusing on autonomous decision-making and embodied learning. We investigate both hierarchical and end-to-end decision-making paradigms, detailing how large models enhance high-level planning, low-level execution, and feedback for hierarchical decision-making, and how large models enhance Vision-Language-Action (VLA) models for end-to-end decision-making. For embodied learning, we introduce mainstream learning methodologies, elaborating in depth on how large models enhance imitation learning and reinforcement learning. For the first time, we integrate world models into a survey of embodied AI, presenting their design methods and critical roles in enhancing decision-making and learning. Although solid advances have been achieved, challenges remain; we discuss them at the end of this survey as potential directions for further research.

[10] arXiv:2508.10416 [pdf, html, other]
Title: CorrectNav: Self-Correction Flywheel Empowers Vision-Language-Action Navigation Model
Zhuoyuan Yu, Yuxing Long, Zihan Yang, Chengyan Zeng, Hongwei Fan, Jiyao Zhang, Hao Dong
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)


Existing vision-and-language navigation models often deviate from the correct trajectory when executing instructions. However, these models lack effective error correction capability, hindering their recovery from errors. To address this challenge, we propose Self-correction Flywheel, a novel post-training paradigm. Instead of considering the model's error trajectories on the training set as a drawback, our paradigm emphasizes their significance as a valuable data source. We have developed a method to identify deviations in these error trajectories and devised innovative techniques to automatically generate self-correction data for perception and action. These self-correction data serve as fuel to power the model's continued training. The brilliance of our paradigm is revealed when we re-evaluate the model on the training set, uncovering new error trajectories. At this time, the self-correction flywheel begins to spin. Through multiple flywheel iterations, we progressively enhance our monocular RGB-based VLA navigation model CorrectNav. Experiments on R2R-CE and RxR-CE benchmarks show CorrectNav achieves new state-of-the-art success rates of 65.1% and 69.3%, surpassing prior best VLA navigation models by 8.2% and 16.4%. Real robot tests in various indoor and outdoor environments demonstrate CorrectNav's superior capability of error correction, dynamic obstacle avoidance, and long instruction following.

[11] arXiv:2508.10423 [pdf, html, other]
Title: MASH: Cooperative-Heterogeneous Multi-Agent Reinforcement Learning for Single Humanoid Robot Locomotion
Qi Liu, Xiaopeng Zhang, Mingshan Tan, Shuaikang Ma, Jinliang Ding, Yanjie Li
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)


This paper proposes a novel method to enhance locomotion for a single humanoid robot through cooperative-heterogeneous multi-agent deep reinforcement learning (MARL). While most existing methods typically employ single-agent reinforcement learning algorithms for a single humanoid robot or MARL algorithms for multi-robot system tasks, we propose a distinct paradigm: applying cooperative-heterogeneous MARL to optimize locomotion for a single humanoid robot. The proposed method, multi-agent reinforcement learning for single humanoid locomotion (MASH), treats each limb (legs and arms) as an independent agent that explores the robot's action space while sharing a global critic for cooperative learning. Experiments demonstrate that MASH accelerates training convergence and improves whole-body cooperation ability, outperforming conventional single-agent reinforcement learning methods. This work advances the integration of MARL into single-humanoid-robot control, offering new insights into efficient locomotion strategies.

[12] arXiv:2508.10497 [pdf, html, other]
Title: Enabling Generic Robot Skill Implementation Using Object Oriented Programming
Abdullah Farrukh, Achim Wagner, Martin Ruskowski
Comments: 34th International Conference on Robotics in Alpe-Adria-Danube Region (RAAD 2025)
Subjects: Robotics (cs.RO); Software Engineering (cs.SE)


Developing robotic algorithms and integrating a robotic subsystem into a larger system can be a difficult task. Particularly in small and medium-sized enterprises (SMEs) where robotics expertise is lacking, implementing, maintaining and developing robotic systems can be a challenge. As a result, many companies rely on external expertise through system integrators, which, in some cases, can lead to vendor lock-in and external dependency. In the academic research on intelligent manufacturing systems, robots play a critical role in the design of robust autonomous systems. Similar challenges are faced by researchers who want to use robotic systems as a component in a larger smart system, without having to deal with the complexity and vastness of the robot interfaces in detail. In this paper, we propose a software framework that reduces the effort required to deploy a working robotic system. The focus is solely on providing a concept for simplifying the different interfaces of a modern robot system and using an abstraction layer for different manufacturers and models. The Python programming language is used to implement a prototype of the concept. The target system is a bin-picking cell containing a Yaskawa Motoman GP4.
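The kind of vendor abstraction layer the abstract describes is naturally expressed with Python's abc module. A hypothetical sketch — the class and method names below are ours, not taken from the paper's framework:

```python
from abc import ABC, abstractmethod

class Robot(ABC):
    """Vendor-neutral robot interface. Concrete subclasses wrap a specific
    manufacturer's API; skills are written against this interface only."""

    @abstractmethod
    def move_to(self, pose): ...

    @abstractmethod
    def open_gripper(self): ...

    @abstractmethod
    def close_gripper(self): ...

class YaskawaGP4(Robot):
    """One concrete backend (here a stub that records commands; a real
    implementation would talk to the controller)."""
    def __init__(self):
        self.log = []
    def move_to(self, pose):
        self.log.append(("move_to", pose))
    def open_gripper(self):
        self.log.append(("open",))
    def close_gripper(self):
        self.log.append(("close",))

def pick(robot: Robot, grasp_pose):
    """A generic skill: depends only on the abstract interface, so it works
    unchanged with any manufacturer's backend."""
    robot.open_gripper()
    robot.move_to(grasp_pose)
    robot.close_gripper()
```

Swapping vendors then means implementing one new subclass, while every skill written against `Robot` keeps working.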

[13] arXiv:2508.10511 [pdf, html, other]
Title: KDPE: A Kernel Density Estimation Strategy for Diffusion Policy Trajectory Selection
Andrea Rosasco, Federico Ceola, Giulia Pasquale, Lorenzo Natale
Comments: 9th Conference on Robot Learning (CoRL 2025), Seoul, Korea
Subjects: Robotics (cs.RO)


Learning robot policies that capture the multimodality in the training data has been a long-standing open challenge for behavior cloning. Recent approaches tackle the problem by modeling the conditional action distribution with generative models. One such approach is Diffusion Policy, which relies on a diffusion model to denoise random points into robot action trajectories. While achieving state-of-the-art performance, it has two main drawbacks that may lead the robot out of the data distribution during policy execution. First, the stochasticity of the denoising process can strongly affect the quality of the generated action trajectories. Second, as a supervised learning approach, it can learn outliers from the training dataset. Recent work mitigates these limitations by combining Diffusion Policy either with large-scale training or with classical behavior cloning algorithms. Instead, we propose KDPE, a Kernel Density Estimation-based strategy that filters out potentially harmful trajectories output by Diffusion Policy while keeping a low test-time computational overhead. For Kernel Density Estimation, we propose a manifold-aware kernel to model a probability density function over actions composed of end-effector Cartesian position, orientation, and gripper state. KDPE overall achieves better performance than Diffusion Policy on simulated single-arm tasks and real robot experiments. Additional material and code are available on our project page https://hsp-iit.github.io/KDPE/.
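The selection idea can be sketched with a plain Gaussian KDE over a batch of sampled trajectories: low-density samples (likely denoising artifacts or learned outliers) are discarded in favor of the highest-density one. This sketch uses a Euclidean kernel only; the paper's manifold-aware kernel, which handles orientation and gripper state properly, is beyond this illustration.

```python
import math

def kde_scores(samples, bandwidth=0.5):
    """Score each flattened trajectory by a Gaussian KDE built from the
    whole batch; higher score = denser region of the sampled set."""
    def k(a, b):
        d2 = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-d2 / (2.0 * bandwidth ** 2))
    return [sum(k(s, other) for other in samples) for s in samples]

def select_trajectory(samples, bandwidth=0.5):
    """KDPE-style selection sketch: keep the highest-density sample."""
    scores = kde_scores(samples, bandwidth)
    return samples[max(range(len(samples)), key=scores.__getitem__)]
```

An isolated outlier contributes little density to itself beyond its own kernel, so the clustered (multimodal-consistent) samples win the selection.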

[14] arXiv:2508.10538 [pdf, html, other]
Title: MLM: Learning Multi-task Loco-Manipulation Whole-Body Control for Quadruped Robot with Arm
Xin Liu, Bida Ma, Chenkun Qi, Yan Ding, Zhaxizhuoma, Guorong Zhang, Pengan Chen, Kehui Liu, Zhongjie Jia, Chuyue Guan, Yule Mo, Jiaqi Liu, Feng Gao, Jiangwei Zhong, Bin Zhao, Xuelong Li
Subjects: Robotics (cs.RO)


Whole-body loco-manipulation for arm-equipped quadruped robots remains a challenging problem, particularly in achieving multi-task control. To address this, we propose MLM, a reinforcement learning framework driven by both real-world and simulation data. It enables a quadruped robot equipped with a six-DoF robotic arm to perform whole-body loco-manipulation across multiple tasks autonomously or under human teleoperation. To balance multiple tasks during loco-manipulation learning, we introduce a trajectory library with an adaptive, curriculum-based sampling mechanism. This approach allows the policy to efficiently leverage real-world collected trajectories for learning multi-task loco-manipulation. To handle deployment scenarios with only historical observations, and to enhance policy execution across tasks with different spatial ranges, we propose a Trajectory-Velocity Prediction policy network that predicts unobservable future trajectories and velocities. By leveraging extensive simulation data and curriculum-based rewards, our controller achieves whole-body behaviors in simulation and zero-shot transfer to real-world deployment. Ablation studies in simulation verify the necessity and effectiveness of our approach, while real-world experiments on the Go2 robot with an Airbot robotic arm demonstrate the policy's strong performance in multi-task execution.

[15] arXiv:2508.10603 [pdf, html, other]
Title: Why Report Failed Interactions With Robots?! Towards Vignette-based Interaction Quality
Agnes Axelsson, Merle Reimann, Ronald Cumbal, Hannah Pelikan, Divesh Lala
Comments: Accepted at the workshop "Real-World Human-Robot Interaction in Public and Private Spaces: Successes, Failures and Lessons Learned" (PubRob-Fails), held at the 2025 IEEE RO-MAN conference. 6 pages
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)


Although the quality of human-robot interactions has improved with the advent of LLMs, various factors still cause systems to be sub-optimal compared to human-human interactions. The nature and criticality of failures often depend on the context of the interaction, and so cannot be generalized across the wide range of scenarios and experiments implemented in HRI research. In this work we propose the use of a technique overlooked in the field of HRI, ethnographic vignettes, to clearly highlight these failures, particularly those that are rarely documented. We describe the methodology behind the process of writing vignettes and create our own based on our personal experiences with failures in HRI systems. We emphasize that the strength of vignettes lies in their ability to communicate failures from a multi-disciplinary perspective, promote transparency about the capabilities of robots, and document unexpected behaviours that would otherwise be omitted from research reports. We encourage the use of vignettes to augment existing interaction evaluation methods.

[16] arXiv:2508.10634 [pdf, html, other]
Title: Synthesis of Deep Neural Networks with Safe Robust Adaptive Control for Reliable Operation of Wheeled Mobile Robots
Mehdi Heydari Shahna, Jouni Mattila
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)


Deep neural networks (DNNs) can enable precise control while maintaining low computational costs by circumventing the need for dynamic modeling. However, the deployment of such black-box approaches remains challenging for heavy-duty wheeled mobile robots (WMRs), which are subject to strict international standards and prone to faults and disturbances. We designed a hierarchical control policy for heavy-duty WMRs, monitored by two safety layers with differing levels of authority. To this end, a DNN policy was trained and deployed as the primary control strategy, providing high-precision performance under nominal operating conditions. When external disturbances arise and reach a level of intensity such that the system performance falls below a predefined threshold, a low-level safety layer intervenes by deactivating the primary control policy and activating a model-free robust adaptive control (RAC) policy. This transition enables the system to continue operating while ensuring stability by effectively managing the inherent trade-off between system robustness and responsiveness. Regardless of the control policy in use, a high-level safety layer continuously monitors system performance during operation. It initiates a shutdown only when disturbances become sufficiently severe such that compensation is no longer viable and continued operation would jeopardize the system or its environment. The proposed synthesis of DNN and RAC policy guarantees uniform exponential stability of the entire WMR system while adhering to safety standards to some extent. The effectiveness of the proposed approach was further validated through real-time experiments using a 6,000 kg WMR.
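The two-layer supervision logic described above reduces to a simple mode selector over a monitored performance signal. A hypothetical sketch — the actual monitoring metric, thresholds, and mode names are not specified in the abstract and are ours:

```python
def select_controller(perf_error, rac_threshold, shutdown_threshold):
    """Two safety layers with differing authority (illustrative):
    - nominal operation: the DNN policy runs;
    - low-level layer: past rac_threshold, swap in the model-free
      robust adaptive controller (RAC);
    - high-level layer: past shutdown_threshold, compensation is no
      longer viable, so stop the system entirely."""
    if perf_error >= shutdown_threshold:
        return "shutdown"
    if perf_error >= rac_threshold:
        return "rac"
    return "dnn"
```

The ordering of the checks encodes the authority hierarchy: the high-level shutdown decision always overrides the controller swap.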

[17] arXiv:2508.10686 [pdf, other]
Title: An Open-Source User-Friendly Interface for Simulating Magnetic Soft Robots using Simulation Open Framework Architecture (SOFA)
Carla Wehner, Finn Schubert, Heiko Hellkamp, Julius Hahnewald, Kilian Scheafer, Muhammad Bilal Khan, Oliver Gutfleisch
Subjects: Robotics (cs.RO); Materials Science (cond-mat.mtrl-sci)

软体机器人,特别是磁性软体机器人,需要专门的仿真工具来准确模拟其在外加磁场下的变形。 然而,现有的平台通常缺乏对磁性材料的专用支持,使得不同专业水平的研究人员难以使用。 本工作介绍了一个开源的、用户友好的仿真界面,使用仿真开放框架架构(SOFA),专门用于模拟磁性软体机器人。 该工具使用户能够定义材料特性、施加磁场,并实时观察由此产生的变形。 通过集成直观的控制和应力分析功能,该工具旨在弥合理论建模与实际设计之间的差距。 四个基准模型——一个梁、三指和四指夹爪以及一只蝴蝶——展示了其功能。 该软件易于使用,对初学者和高级研究人员都适用。 未来改进将通过实验验证和与行业标准有限元求解器的比较来提高准确性,确保磁性软体机器人的现实和预测仿真。

Soft robots, particularly magnetic soft robots, require specialized simulation tools to accurately model their deformation under external magnetic fields. However, existing platforms often lack dedicated support for magnetic materials, making them difficult to use for researchers at different expertise levels. This work introduces an open-source, user-friendly simulation interface using the Simulation Open Framework Architecture (SOFA), specifically designed to model magnetic soft robots. The tool enables users to define material properties, apply magnetic fields, and observe resulting deformations in real time. By integrating intuitive controls and stress analysis capabilities, it aims to bridge the gap between theoretical modeling and practical design. Four benchmark models - a beam, three- and four-finger grippers, and a butterfly - demonstrate its functionality. The software's ease of use makes it accessible to both beginners and advanced researchers. Future improvements will refine accuracy through experimental validation and comparison with industry-standard finite element solvers, ensuring realistic and predictive simulations of magnetic soft robots.

[18] arXiv:2508.10689 [中文pdf, pdf, html, 其他]
标题: 基于显著区域的前沿引导探索
标题: Biasing Frontier-Based Exploration with Saliency Areas
Matteo Luperto, Valerii Stakanov, Giacomo Boracchi, Nicola Basilico, Francesco Amigoni
评论: 被欧洲移动机器人会议(ECMR)2025接受
主题: 机器人技术 (cs.RO)

自主探索是一个被广泛研究的问题,其中机器人逐步构建对之前未知环境的地图。 机器人使用探索策略选择下一个要到达的位置。 为此,机器人必须在竞争性目标之间取得平衡,例如探索整个环境,同时尽可能快速。 大多数探索策略试图最大化探索区域以加快探索;然而,它们没有考虑到环境的不同部分比其他部分更重要,因为它们可能导致发现大片未知区域。 我们提出了一种方法,利用从神经网络获得的显著性图来识别\emph{显著区域},即对探索具有高价值的区域;该网络以当前地图为输入,实现一个终止准则,用于估计环境是否可被认为已完全探索。 我们使用显著区域来引导一些广泛使用的探索策略,并通过广泛的实验活动表明,这种知识可以显著影响机器人在探索过程中的行为。

Autonomous exploration is a widely studied problem where a robot incrementally builds a map of a previously unknown environment. The robot selects the next locations to reach using an exploration strategy. To do so, the robot has to balance between competing objectives, like exploring the entirety of the environment, while being as fast as possible. Most exploration strategies try to maximise the explored area to speed up exploration; however, they do not consider that parts of the environment are more important than others, as they lead to the discovery of large unknown areas. We propose a method that identifies \emph{saliency areas} as those areas that are of high interest for exploration, by using saliency maps obtained from a neural network that, given the current map, implements a termination criterion to estimate whether the environment can be considered fully-explored or not. We use saliency areas to bias some widely used exploration strategies, showing, with an extensive experimental campaign, that this knowledge can significantly influence the behavior of the robot during exploration.
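To make the biasing idea concrete, here is a minimal Python sketch of how a saliency map could re-rank candidate frontiers. The linear scoring rule and the weights are hypothetical illustrations, not the authors' actual strategy integration:

```python
import numpy as np

def score_frontiers(frontiers, robot_pos, saliency_map, w_dist=1.0, w_sal=2.0):
    """Rank frontier cells: closer is better, and frontiers inside a salient
    area receive a bonus. The scoring rule and weights are hypothetical,
    not the paper's formulation."""
    scores = []
    for r, c in frontiers:
        dist = np.hypot(r - robot_pos[0], c - robot_pos[1])
        scores.append(-w_dist * dist + w_sal * saliency_map[r, c])
    order = np.argsort(scores)[::-1]   # best-scoring frontier first
    return [frontiers[i] for i in order]

# Toy 5x5 map with two frontiers; only (4, 4) lies inside a salient area.
sal = np.zeros((5, 5))
sal[4, 4] = 1.0
ranked = score_frontiers([(0, 4), (4, 4)], robot_pos=(0, 0), saliency_map=sal)
```

A purely distance-greedy planner would pick the nearer frontier (0, 4); the saliency bonus instead promotes the frontier inside the salient area, which is the kind of behavioral shift the abstract describes.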

[19] arXiv:2508.10780 [中文pdf, pdf, html, 其他]
标题: 冗余机器人学习任务执行层次结构
标题: Learning Task Execution Hierarchies for Redundant Robots
Alessandro Adami, Aris Synodinos, Matteo Iovino, Ruggero Carli, Pietro Falco
主题: 机器人技术 (cs.RO) ; 系统与控制 (eess.SY)

现代机器人系统,如移动机械臂、人形机器人和带手臂的空中机器人,通常具有高冗余度,使其能够同时执行多项任务。 管理这种冗余是实现可靠和灵活行为的关键。 一种广泛使用的方法是任务堆叠(SoT),它在统一框架内按优先级组织控制目标。 然而,传统的SoT是由专家手动设计的,限制了它们的适应性和可访问性。 本文介绍了一种新框架,可以从用户定义的目标中自动学习SoT的层次结构和参数。 通过结合强化学习和遗传编程,系统在无需人工干预的情况下发现任务优先级和控制策略。 基于直观指标的成本函数,如精度、安全性和执行时间,指导学习过程。 我们通过在移动-YuMi平台上的仿真和实验验证了我们的方法,该平台是一个高冗余的双臂移动机械臂。 结果表明,学习到的SoT使机器人能够动态适应变化的环境和输入,在保持强大任务执行的同时平衡竞争目标。 这种方法为复杂机器人的冗余管理提供了一个通用且用户友好的解决方案,推动了以人为中心的机器人编程,并减少了对专家设计的需求。

Modern robotic systems, such as mobile manipulators, humanoids, and aerial robots with arms, often possess high redundancy, enabling them to perform multiple tasks simultaneously. Managing this redundancy is key to achieving reliable and flexible behavior. A widely used approach is the Stack of Tasks (SoT), which organizes control objectives by priority within a unified framework. However, traditional SoTs are manually designed by experts, limiting their adaptability and accessibility. This paper introduces a novel framework that automatically learns both the hierarchy and parameters of a SoT from user-defined objectives. By combining Reinforcement Learning and Genetic Programming, the system discovers task priorities and control strategies without manual intervention. A cost function based on intuitive metrics such as precision, safety, and execution time guides the learning process. We validate our method through simulations and experiments on the mobile-YuMi platform, a dual-arm mobile manipulator with high redundancy. Results show that the learned SoTs enable the robot to dynamically adapt to changing environments and inputs, balancing competing objectives while maintaining robust task execution. This approach provides a general and user-friendly solution for redundancy management in complex robots, advancing human-centered robot programming and reducing the need for expert design.

[20] arXiv:2508.10798 [中文pdf, pdf, html, 其他]
标题: SET感知因素框架:面向自主系统的可信感知
标题: The SET Perceptual Factors Framework: Towards Assured Perception for Autonomous Systems
Troi Williams
评论: 4页,4个图表,已被接受至2025年IEEE国际机器人与自动化会议的自主系统公众信任研讨会
主题: 机器人技术 (cs.RO) ; 人工智能 (cs.AI)

未来自主系统有望带来重大的社会利益,但其部署引发了关于安全性和可信度的担忧。 一个主要的问题是确保机器人感知的可靠性,因为感知是安全决策的基础。 感知失败通常是由于复杂但常见的环境因素造成的,可能导致事故并削弱公众信任。 为解决这一问题,我们引入了SET(自我、环境和目标)感知因素框架。 我们设计该框架以系统地分析诸如天气、遮挡或传感器限制等因素如何对感知产生负面影响。 为了实现这一点,该框架使用SET状态树来分类这些因素的来源,使用SET因素树来建模这些来源和因素如何影响感知任务,如物体检测或姿态估计。 接下来,我们利用这两种树开发感知因素模型,以量化给定任务的不确定性。 我们的框架旨在通过提供一种透明且标准化的方法来识别、建模和传达感知风险,从而促进严格的安全保障,并培养公众对自主系统的更大理解和信任。

Future autonomous systems promise significant societal benefits, yet their deployment raises concerns about safety and trustworthiness. A key concern is assuring the reliability of robot perception, as perception seeds safe decision-making. Failures in perception are often due to complex yet common environmental factors and can lead to accidents that erode public trust. To address this concern, we introduce the SET (Self, Environment, and Target) Perceptual Factors Framework. We designed the framework to systematically analyze how factors such as weather, occlusion, or sensor limitations negatively impact perception. To achieve this, the framework employs SET State Trees to categorize where such factors originate and SET Factor Trees to model how these sources and factors impact perceptual tasks like object detection or pose estimation. Next, we develop Perceptual Factor Models using both trees to quantify the uncertainty for a given task. Our framework aims to promote rigorous safety assurances and cultivate greater public understanding and trust in autonomous systems by offering a transparent and standardized method for identifying, modeling, and communicating perceptual risks.

[21] arXiv:2508.10828 [中文pdf, pdf, html, 其他]
标题: 面向社交机器人的主观自我披露识别的多模态神经网络
标题: A Multimodal Neural Network for Recognizing Subjective Self-Disclosure Towards Social Robots
Henry Powell, Guy Laban, Emily S. Cross
评论: 被2025年IEEE/RSJ智能机器人与系统国际会议(IROS)接收
主题: 机器人技术 (cs.RO) ; 人工智能 (cs.AI)

主观自我披露是人类社会互动的重要特征。 尽管在社会和行为文献中已经做了很多工作来描述主观自我披露的特征和后果,但到目前为止,很少有工作致力于开发能够准确建模它的计算系统。 更少的工作尝试具体建模人类互动者如何与机器人伙伴进行自我披露。 随着我们要求社交机器人在各种社会环境中与人类协同工作并建立关系,这一点变得越来越紧迫。 在本文中,我们的目标是开发一个基于情感识别文献中的模型的定制多模态注意力网络,将该模型在大规模自收集的自我披露视频语料库上进行训练,并构建一个新的损失函数,即尺度保持交叉熵损失,该损失函数在该问题的分类和回归版本上都有所改进。 我们的结果表明,使用我们新损失函数训练的最佳模型实现了0.83的F1分数,比最佳基线模型提高了0.48。 这一结果在使社交机器人能够察觉互动伙伴的自我披露方面取得了重要进展,这种能力对于具有社会认知的社交机器人来说将是必不可少的。

Subjective self-disclosure is an important feature of human social interaction. While much has been done in the social and behavioural literature to characterise the features and consequences of subjective self-disclosure, little work has been done thus far to develop computational systems that are able to accurately model it. Even less work has been done that attempts to model specifically how human interactants self-disclose with robotic partners. It is becoming more pressing as we require social robots to work in conjunction with and establish relationships with humans in various social settings. In this paper, our aim is to develop a custom multimodal attention network based on models from the emotion recognition literature, training this model on a large self-collected self-disclosure video corpus, and constructing a new loss function, the scale preserving cross entropy loss, that improves upon both classification and regression versions of this problem. Our results show that the best performing model, trained with our novel loss function, achieves an F1 score of 0.83, an improvement of 0.48 from the best baseline model. This result makes significant headway in the aim of allowing social robots to pick up on an interaction partner's self-disclosures, an ability that will be essential in social robots with social cognition.
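The abstract does not define the scale-preserving cross-entropy loss. The sketch below is one plausible ordinal-aware reading (an assumption, not the authors' definition): the one-hot target is softened by distance on the label scale, so a prediction one step off on the self-disclosure scale costs less than a distant one:

```python
import numpy as np

def scale_preserving_ce(logits, target, alpha=1.0):
    """Hypothetical scale-preserving cross-entropy: the one-hot target is
    replaced by a distribution decaying with |k - target| on the ordinal
    scale, so near-misses are penalized less. This is an illustrative
    assumption, not the paper's loss."""
    k = np.arange(len(logits))
    soft = np.exp(-alpha * np.abs(k - target))
    soft /= soft.sum()
    log_probs = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
    return float(-np.sum(soft * log_probs))

# Predicting one step away on the scale costs less than predicting far away.
near = scale_preserving_ce(np.array([0.0, 5.0, 0.0, 0.0]), target=2)
far = scale_preserving_ce(np.array([5.0, 0.0, 0.0, 0.0]), target=2)
```

Because the soft target keeps mass on neighboring classes, such a loss serves both the classification and the regression readings of the problem, which matches the abstract's claim.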

[22] arXiv:2508.10867 [中文pdf, pdf, html, 其他]
标题: CVIRO:一种一致且紧密耦合的李群视觉-惯性-测距里程计
标题: CVIRO: A Consistent and Tightly-Coupled Visual-Inertial-Ranging Odometry on Lie Groups
Yizhi Zhou, Ziwei Kang, Jiawei Xia, Xuan Wang
主题: 机器人技术 (cs.RO) ; 系统与控制 (eess.SY)

超宽带(UWB)被广泛用于减轻视觉惯性里程计(VIO)系统中的漂移。 一致性对于确保UWB辅助的VIO系统的估计准确性至关重要。 不一致的估计器会降低定位性能,其中不一致主要源于两个主要原因:(1)估计器无法保持正确的系统可观测性,以及(2)假设UWB锚点位置已知,导致对校准不确定性处理不当。 在本文中,我们提出了一种基于李群的一致且紧密耦合的视觉惯性测距里程计(CVIRO)系统。 我们的方法将UWB锚点状态纳入系统状态,显式考虑UWB校准不确定性,并实现机器人和锚点状态的联合和一致估计。 此外,通过利用李群的不变误差特性来确保可观测性一致性。 我们分析证明,CVIRO算法自然保持系统的正确不可观测子空间,从而保持估计一致性。 大量的仿真和实验表明,与现有方法相比,CVIRO实现了更优的定位精度和一致性。

Ultra Wideband (UWB) is widely used to mitigate drift in visual-inertial odometry (VIO) systems. Consistency is crucial for ensuring the estimation accuracy of a UWB-aided VIO system. An inconsistent estimator can degrade localization performance, where the inconsistency primarily arises from two main factors: (1) the estimator fails to preserve the correct system observability, and (2) UWB anchor positions are assumed to be known, leading to improper neglect of calibration uncertainty. In this paper, we propose a consistent and tightly-coupled visual-inertial-ranging odometry (CVIRO) system based on the Lie group. Our method incorporates the UWB anchor state into the system state, explicitly accounting for UWB calibration uncertainty and enabling the joint and consistent estimation of both robot and anchor states. Furthermore, observability consistency is ensured by leveraging the invariant error properties of the Lie group. We analytically prove that the CVIRO algorithm naturally maintains the system's correct unobservable subspace, thereby preserving estimation consistency. Extensive simulations and experiments demonstrate that CVIRO achieves superior localization accuracy and consistency compared to existing methods.

[23] arXiv:2508.10872 [中文pdf, pdf, html, 其他]
标题: 基于TLE的A2C智能体用于地面覆盖轨道路径规划
标题: TLE-Based A2C Agent for Terrestrial Coverage Orbital Path Planning
Anantha Narayanan, Battu Bhanu Teja, Pruthwik Mishra
评论: 8页,6图,5表
主题: 机器人技术 (cs.RO) ; 人工智能 (cs.AI)

随着低地球轨道(LEO)的日益拥挤,对地球观测卫星的有效部署和安全运行构成了持续的挑战。 任务规划者现在不仅要考虑任务特定的要求,还要考虑与活跃卫星和空间碎片的碰撞风险。 本工作提出了一种基于优势Actor-Critic(A2C)算法的强化学习框架,以优化卫星轨道参数,在预定义的地面半径内实现精确的陆地覆盖。 通过在自定义的OpenAI Gymnasium环境中将问题表述为马尔可夫决策过程(MDP),我们的方法使用经典开普勒元素模拟轨道动力学。 智能体逐步学习调整五个轨道参数——半长轴、偏心率、倾角、升交点赤经和近地点幅角,以实现目标陆地覆盖。 与近端策略优化(PPO)的对比评估表明,A2C表现出更优的性能,在31.5倍更少的时间步数(2,000 vs 63,000)内实现了5.8倍更高的累积奖励(10.0 vs 9.263025)。 A2C智能体在多种目标坐标下始终满足任务目标,同时保持适合实时任务规划应用的计算效率。 主要贡献包括:(1)一种基于TLE的轨道仿真环境,结合物理约束,(2)验证了Actor-Critic方法在连续轨道控制中优于信任区域方法,(3)展示了快速收敛特性,使自适应卫星部署成为可能。 这种方法确立了强化学习作为可扩展和智能LEO任务规划的计算高效替代方案。

The increasing congestion of Low Earth Orbit (LEO) poses persistent challenges to the efficient deployment and safe operation of Earth observation satellites. Mission planners must now account not only for mission-specific requirements but also for the increasing collision risk with active satellites and space debris. This work presents a reinforcement learning framework using the Advantage Actor-Critic (A2C) algorithm to optimize satellite orbital parameters for precise terrestrial coverage within predefined surface radii. By formulating the problem as a Markov Decision Process (MDP) within a custom OpenAI Gymnasium environment, our method simulates orbital dynamics using classical Keplerian elements. The agent progressively learns to adjust five of the orbital parameters - semi-major axis, eccentricity, inclination, right ascension of ascending node, and the argument of perigee - to achieve targeted terrestrial coverage. Comparative evaluation against Proximal Policy Optimization (PPO) demonstrates A2C's superior performance, achieving 5.8x higher cumulative rewards (10.0 vs 9.263025) while converging in 31.5x fewer timesteps (2,000 vs 63,000). The A2C agent consistently meets mission objectives across diverse target coordinates while maintaining computational efficiency suitable for real-time mission planning applications. Key contributions include: (1) a TLE-based orbital simulation environment incorporating physics constraints, (2) validation of actor-critic methods' superiority over trust region approaches in continuous orbital control, and (3) demonstration of rapid convergence enabling adaptive satellite deployment. This approach establishes reinforcement learning as a computationally efficient alternative for scalable and intelligent LEO mission planning.
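A minimal sketch of the kind of environment the abstract describes. The class name, the element bounds, and the coverage proxy are all illustrative assumptions; the paper's environment uses TLE-based propagation with physics constraints:

```python
import numpy as np

class KeplerCoverageEnv:
    """Minimal sketch of the abstract's setup (bounds and reward are
    illustrative assumptions, not the paper's environment): the agent nudges
    five Keplerian elements toward covering a ground target."""
    # [semi-major axis (km), eccentricity, inclination, RAAN, arg. of perigee]
    LOW = np.array([6778.0, 0.0, 0.0, 0.0, 0.0])
    HIGH = np.array([7378.0, 0.05, 180.0, 360.0, 360.0])

    def __init__(self, target_lat_deg=10.0):
        self.target_lat = target_lat_deg
        self.state = self.LOW.copy()

    def step(self, action):
        # Action: a small delta on each element, clipped to the valid box.
        self.state = np.clip(self.state + action, self.LOW, self.HIGH)
        # Crude coverage proxy: the ground track can reach latitude |lat|
        # only if the inclination is at least that large.
        covered = bool(self.state[2] >= abs(self.target_lat))
        reward = 1.0 if covered else -0.1
        return self.state.copy(), reward, covered

env = KeplerCoverageEnv()
_, r, done = env.step(np.array([0.0, 0.0, 40.0, 0.0, 0.0]))
```

An actor-critic agent such as A2C would emit the five-dimensional delta as its continuous action and learn from the sparse coverage reward; the real reward would come from propagating the orbit and checking the sub-satellite track against the target radius.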

交叉提交 (展示 3 之 3 条目 )

[24] arXiv:2508.10413 (交叉列表自 cs.NI) [中文pdf, pdf, html, 其他]
标题: ROS 2 中数据分发服务的概率延迟分析
标题: Probabilistic Latency Analysis of the Data Distribution Service in ROS 2
Sanghoon Lee, Hyung-Seok Park, Jiyeong Chae, Kyung-Joon Park
评论: 12页,5图
主题: 网络与互联网架构 (cs.NI) ; 机器人技术 (cs.RO)

机器人操作系统 2(ROS 2)现在是机器人通信的事实标准,它将 UDP 传输与数据分发服务(DDS)发布-订阅中间件结合在一起。 DDS 通过周期性的心跳来实现可靠性,这些心跳会请求对丢失的样本进行确认,并触发选择性重传。 在有损的无线网络中,心跳周期、IP 分片和重传间隔之间的紧密耦合使得端到端延迟行为变得模糊,并且实践者在如何调整这些参数方面缺乏指导。 为了解决这些问题,我们提出了一种概率延迟分析(PLA),该分析使用离散状态方法对 ROS 2 DDS 通信的可靠传输过程进行解析建模。 通过对中间件级别和传输级别事件进行系统分析,PLA 计算了未确认消息的稳态概率分布和重传延迟。 我们在 270 种场景中验证了我们的 PLA,探索了包传递率、消息大小以及发布和重传间隔的变化,证明了分析预测与实验结果之间高度一致。 我们的研究结果建立了一个理论基础,以系统地优化无线工业机器人中的可靠性、延迟和性能。

Robot Operating System 2 (ROS 2) is now the de facto standard for robotic communication, pairing UDP transport with the Data Distribution Service (DDS) publish-subscribe middleware. DDS achieves reliability through periodic heartbeats that solicit acknowledgments for missing samples and trigger selective retransmissions. In lossy wireless networks, the tight coupling among heartbeat period, IP fragmentation, and retransmission interval obscures end to end latency behavior and leaves practitioners with little guidance on how to tune these parameters. To address these challenges, we propose a probabilistic latency analysis (PLA) that analytically models the reliable transmission process of ROS 2 DDS communication using a discrete state approach. By systematically analyzing both middleware level and transport level events, PLA computes the steady state probability distribution of unacknowledged messages and the retransmission latency. We validate our PLA across 270 scenarios, exploring variations in packet delivery ratios, message sizes, and both publishing and retransmission intervals, demonstrating a close alignment between analytical predictions and experimental results. Our findings establish a theoretical basis to systematically optimize reliability, latency, and performance in wireless industrial robotics.
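As a much cruder stand-in for the paper's discrete-state PLA, the sketch below captures one of its qualitative points: IP fragmentation compounds per-fragment loss, and each failed round waits roughly one heartbeat period before the selective retransmission fires. All parameter names and the geometric simplification are illustrative assumptions:

```python
def expected_latency(p_frag, n_frags, hb_period, t_base):
    """A sample split into n_frags IP fragments is delivered only if every
    fragment arrives; each failed round costs roughly one heartbeat period
    before the selective retransmission. Mean rounds follow a geometric law.
    This is a back-of-the-envelope model, far simpler than the paper's PLA."""
    p_msg = p_frag ** n_frags          # probability the whole sample arrives
    mean_rounds = 1.0 / p_msg          # mean of a geometric distribution
    return t_base + (mean_rounds - 1.0) * hb_period

# Fragmentation compounds loss: 4 fragments at 95% delivery each versus a
# single unfragmented packet, with a 100 ms heartbeat period.
single = expected_latency(0.95, 1, hb_period=0.1, t_base=0.005)
fragmented = expected_latency(0.95, 4, hb_period=0.1, t_base=0.005)
```

Even this toy model shows why tuning the heartbeat period against message size matters on lossy links, which is the parameter coupling the abstract says practitioners lack guidance on.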

[25] arXiv:2508.10567 (交叉列表自 cs.CV) [中文pdf, pdf, html, 其他]
标题: SpaRC-AD:端到端自动驾驶中的雷达-相机融合基线
标题: SpaRC-AD: A Baseline for Radar-Camera Fusion in End-to-End Autonomous Driving
Philipp Wolters, Johannes Gilg, Torben Teepe, Gerhard Rigoll
评论: 8页,4图,5表
主题: 计算机视觉与模式识别 (cs.CV) ; 机器人技术 (cs.RO)

端到端自动驾驶系统通过感知、运动预测和规划的统一优化,有望实现更强的性能。 然而,基于视觉的方法在恶劣天气条件、部分遮挡和精确速度估计方面面临根本性的限制——这些是在安全敏感场景中至关重要的挑战,其中准确的运动理解和长时程轨迹预测对于避撞至关重要。 为解决这些限制,我们提出了SpaRC-AD,这是一种面向规划的基于查询的端到端相机雷达融合框架。 通过稀疏的3D特征对齐和基于多普勒的速度估计,我们实现了强大的3D场景表示,用于优化代理锚点、地图折线和运动建模。 我们的方法在多个自动驾驶任务中优于最先进的纯视觉基线,包括3D检测(+4.8% mAP)、多目标跟踪(+8.3% AMOTA)、在线地图(+1.8% mAP)、运动预测(-4.0% mADE)和轨迹规划(-0.1m L2和-9% TPC)。 我们在多个具有挑战性的基准上实现了空间一致性和时间一致性,包括真实世界开放环nuScenes、长时程T-nuScenes和闭环模拟器Bench2Drive。 我们展示了在安全关键场景中基于雷达的融合的有效性,其中准确的运动理解和长时程轨迹预测对于避撞至关重要。 所有实验的源代码可在https://phi-wol.github.io/sparcad/获得

End-to-end autonomous driving systems promise stronger performance through unified optimization of perception, motion forecasting, and planning. However, vision-based approaches face fundamental limitations in adverse weather conditions, partial occlusions, and precise velocity estimation - critical challenges in safety-sensitive scenarios where accurate motion understanding and long-horizon trajectory prediction are essential for collision avoidance. To address these limitations, we propose SpaRC-AD, a query-based end-to-end camera-radar fusion framework for planning-oriented autonomous driving. Through sparse 3D feature alignment and Doppler-based velocity estimation, we achieve strong 3D scene representations for refinement of agent anchors, map polylines and motion modelling. Our method achieves strong improvements over the state-of-the-art vision-only baselines across multiple autonomous driving tasks, including 3D detection (+4.8% mAP), multi-object tracking (+8.3% AMOTA), online mapping (+1.8% mAP), motion prediction (-4.0% mADE), and trajectory planning (-0.1m L2 and -9% TPC). We achieve both spatial coherence and temporal consistency on multiple challenging benchmarks, including real-world open-loop nuScenes, long-horizon T-nuScenes, and closed-loop simulator Bench2Drive. We show the effectiveness of radar-based fusion in safety-critical scenarios where accurate motion understanding and long-horizon trajectory prediction are essential for collision avoidance. The source code of all experiments is available at https://phi-wol.github.io/sparcad/

[26] arXiv:2508.10747 (交叉列表自 cs.AI) [中文pdf, pdf, html, 其他]
标题: 无需褪色的扩展:基于目标的稀疏GNN用于基于RL的广义规划
标题: Scaling Up without Fading Out: Goal-Aware Sparse GNN for RL-based Generalized Planning
Sangwoo Jeon, Juchul Shin, Gyeong-Tae Kim, YeonJe Cho, Seongwoo Kim
评论: 16页,10图
主题: 人工智能 (cs.AI) ; 机器人技术 (cs.RO)

使用深度强化学习(RL)结合图神经网络(GNNs)的广义规划在由PDDL描述的各种符号规划领域中已显示出有前景的结果。 然而,现有方法通常将规划状态表示为全连接图,导致边信息出现组合爆炸,并随着问题规模的增长出现显著的稀疏性,特别是在大型网格环境中尤为明显。 这种密集表示导致节点级信息被稀释,内存需求呈指数增长,并最终使大规模问题的学习变得不可行。 为了解决这些挑战,我们提出了一种稀疏的目标感知GNN表示,该表示选择性地编码相关的局部关系,并显式地整合与目标相关的空间特征。 我们通过在网格世界中基于PDDL设计新颖的无人机任务场景来验证我们的方法,有效模拟了现实的任务执行环境。 我们的实验结果表明,我们的方法能够有效地扩展到之前使用密集图表示无法实现的大网格尺寸,并显著提高了策略的泛化能力和成功率。 我们的研究结果为解决现实中的大规模广义规划任务提供了实用的基础。

Generalized planning using deep reinforcement learning (RL) combined with graph neural networks (GNNs) has shown promising results in various symbolic planning domains described by PDDL. However, existing approaches typically represent planning states as fully connected graphs, leading to a combinatorial explosion in edge information and substantial sparsity as problem scales grow, especially evident in large grid-based environments. This dense representation results in diluted node-level information, exponentially increases memory requirements, and ultimately makes learning infeasible for larger-scale problems. To address these challenges, we propose a sparse, goal-aware GNN representation that selectively encodes relevant local relationships and explicitly integrates spatial features related to the goal. We validate our approach by designing novel drone mission scenarios based on PDDL within a grid world, effectively simulating realistic mission execution environments. Our experimental results demonstrate that our method scales effectively to larger grid sizes previously infeasible with dense graph representations and substantially improves policy generalization and success rates. Our findings provide a practical foundation for addressing realistic, large-scale generalized planning tasks.

替换提交 (展示 23 之 23 条目 )

[27] arXiv:2209.00334 (替换) [中文pdf, pdf, 其他]
标题: 基于视觉和地形探测的步行机器人安全导航可行驶性分析
标题: Traversability analysis with vision and terrain probing for safe legged robot navigation
Garen Haddeler, Meng Yee Michael Chuah, Yangwei You, Jianle Chan, Albertus H. Adiwahono, Wei Yun Yau, Chee-Meng Chew
期刊参考: 机器人学与人工智能前沿,第9卷 - 2022
主题: 机器人技术 (cs.RO)

受人类在未知地形上行进时行为的启发,本研究提出使用探测策略,并将其整合到可通行性分析框架中,以解决在未知粗糙地形上的安全导航问题。 我们的框架将可塌陷性信息整合到我们现有的可通行性分析中,因为仅靠视觉和几何信息可能会被不可预测的非刚性地形(如软土、灌木区或水坑)误导。 通过新的可通行性分析框架,我们的机器人对不可预测地形有更全面的评估,这对于其在户外环境中的安全性至关重要。 该流程首先使用RGB-D相机识别地形的几何和语义特性,并确定可疑地形上的期望探测位置。 这些区域通过力传感器进行探测,以确定机器人跨越时地形塌陷的风险。 这种风险被表述为一个可塌陷性度量,用于估计不可预测区域的地面塌陷可能性。 之后,将可塌陷性度量与几何和语义空间数据结合并进行分析,以生成全局和局部的可通行性网格地图。 这些可通行性网格地图告诉机器人是否可以安全地跨越地图的不同区域。 然后利用这些网格地图为机器人生成最优路径,以安全地导航到目标位置。 我们的方法已在四足机器人上通过仿真和真实实验成功验证。

Inspired by human behavior when traveling over unknown terrain, this study proposes the use of probing strategies and integrates them into a traversability analysis framework to address safe navigation on unknown rough terrain. Our framework integrates collapsibility information into our existing traversability analysis, as vision and geometric information alone could be misled by unpredictable non-rigid terrains such as soft soil, bush area, or water puddles. With the new traversability analysis framework, our robot has a more comprehensive assessment of unpredictable terrain, which is critical for its safety in outdoor environments. The pipeline first identifies the terrain's geometric and semantic properties using an RGB-D camera and desired probing locations on questionable terrains. These regions are probed using a force sensor to determine the risk of terrain collapsing when the robot steps over it. This risk is formulated as a collapsibility metric, which estimates an unpredictable region's ground collapsibility. Thereafter, the collapsibility metric, together with geometric and semantic spatial data, is combined and analyzed to produce global and local traversability grid maps. These traversability grid maps tell the robot whether it is safe to step over different regions of the map. The grid maps are then utilized to generate optimal paths for the robot to safely navigate to its goal. Our approach has been successfully verified on a quadrupedal robot in both simulation and real-world experiments.

[28] arXiv:2209.09508 (替换) [中文pdf, pdf, html, 其他]
标题: 实时数字双框架用于预测腿式机器人的可塌陷地形
标题: Real-time Digital Double Framework to Predict Collapsible Terrains for Legged Robots
Garen Haddeler, Hari P. Palanivelu, Yung Chuen Ng, Fabien Colonnier, Albertus H. Adiwahono, Zhibin Li, Chee-Meng Chew, Meng Yee Michael Chuah
评论: IEEE/RSJ 国际智能机器人与系统会议(IROS)。预印本版本。2022年6月被接受
期刊参考: 2022年IEEE/RSJ智能机器人与系统国际会议(IROS),日本京都,2022年,第10387-10394页
主题: 机器人技术 (cs.RO)

受数字孪生系统的启发,开发了一种新颖的实时数字双胞胎框架,以增强机器人对地形条件的感知能力。 基于相同的物理模型和运动控制,本研究利用与真实机器人同步的模拟数字双胞胎,捕获并提取两个系统之间的差异信息,这提供了多物理量的高维线索,以表示模型世界与现实世界之间的差异。 柔软的非刚性地形会导致腿部运动的常见故障,仅靠视觉感知不足以估计这些地形的物理特性。 我们使用数字双胞胎来开发塌陷性的估计,通过动态行走期间的物理交互解决了这个问题。 真实机器人与其数字双胞胎之间的传感测量差异被用作基于学习的算法的输入,用于地形塌陷性分析。 尽管仅在仿真中进行训练,所学模型能够在仿真和现实世界中成功进行塌陷性估计。 我们的结果评估显示了在不同场景中的泛化能力,以及数字双胞胎在可靠检测地面条件细微差别的优势。

Inspired by the digital twinning systems, a novel real-time digital double framework is developed to enhance robot perception of the terrain conditions. Based on the very same physical model and motion control, this work exploits the use of such simulated digital double synchronized with a real robot to capture and extract discrepancy information between the two systems, which provides high dimensional cues in multiple physical quantities to represent differences between the modelled and the real world. Soft, non-rigid terrains cause common failures in legged locomotion, where visual perception alone is insufficient in estimating such physical properties of terrains. We used the digital double to develop the estimation of collapsibility, which addressed this issue through physical interactions during dynamic walking. The discrepancy in sensory measurements between the real robot and its digital double is used as input of a learning-based algorithm for terrain collapsibility analysis. Although trained only in simulation, the learned model can perform collapsibility estimation successfully in both simulation and the real world. Our evaluation of results showed the generalization to different scenarios and the advantages of the digital double to reliably detect nuances in ground conditions.

[29] arXiv:2310.17879 (替换) [中文pdf, pdf, html, 其他]
标题: 基于分割协方差交集滤波器的精确AprilTag地图视觉定位用于仓库机器人导航
标题: Split Covariance Intersection Filter Based Visual Localization With Accurate AprilTag Map For Warehouse Robot Navigation
Susu Fang, Yanhao Li, Hao Li
主题: 机器人技术 (cs.RO)

在仓库环境中,准确且高效的定位需要方便建立的地图,这是移动机器人操作的基本要求。 借助基于激光雷达的SLAM,可以方便地建立准确的AprilTag地图。 确实,与基于视觉的系统相比,基于激光雷达的系统通常在商业上不具备竞争力,但幸运的是,对于仓库应用来说,只需要一个基于激光雷达的SLAM系统来建立准确的AprilTag地图,而大量视觉定位系统可以共享这个已建立的AprilTag地图用于它们自身的操作。 因此,基于激光雷达的SLAM系统的成本实际上由大量视觉定位系统分摊,对于实际的仓库应用来说,其成本是可接受的,甚至是微不足道的。 一旦有了准确的AprilTag地图,视觉定位就实现了递归估计,该估计融合了AprilTag测量值(即AprilTag检测结果)和机器人运动数据。 AprilTag测量值可能是非线性的部分测量值;这可以通过众所周知的扩展卡尔曼滤波器(EKF)在局部线性化的理念下进行处理。 AprilTag测量值也倾向于有时间相关性;然而,这不能被EKF合理处理。 采用分割协方差交集滤波器(Split CIF)来处理AprilTag测量值之间的时间相关性。 Split CIF(在局部线性化的理念下)也可以处理AprilTag的非线性部分测量值。 基于Split CIF的视觉定位系统包含一个测量自适应机制,以处理AprilTag测量值中的异常值,并采用动态初始化机制来解决绑架问题。 在真实仓库环境中的比较研究表明了基于Split CIF的视觉定位解决方案的潜力和优势。

Accurate and efficient localization with conveniently-established map is the fundamental requirement for mobile robot operation in warehouse environments. An accurate AprilTag map can be conveniently established with the help of LiDAR-based SLAM. It is true that a LiDAR-based system is usually not commercially competitive in contrast with a vision-based system, yet fortunately for warehouse applications, only a single LiDAR-based SLAM system is needed to establish an accurate AprilTag map, whereas a large amount of visual localization systems can share this established AprilTag map for their own operations. Therefore, the cost of a LiDAR-based SLAM system is actually shared by the large amount of visual localization systems, and turns to be acceptable and even negligible for practical warehouse applications. Once an accurate AprilTag map is available, visual localization is realized as recursive estimation that fuses AprilTag measurements (i.e. AprilTag detection results) and robot motion data. AprilTag measurements may be nonlinear partial measurements; this can be handled by the well-known extended Kalman filter (EKF) in the spirit of local linearization. AprilTag measurements tend to have temporal correlation as well; however, this cannot be reasonably handled by the EKF. The split covariance intersection filter (Split CIF) is adopted to handle temporal correlation among AprilTag measurements. The Split CIF (in the spirit of local linearization) can also handle AprilTag nonlinear partial measurements. The Split CIF based visual localization system incorporates a measurement adaptive mechanism to handle outliers in AprilTag measurements and adopts a dynamic initialization mechanism to address the kidnapping problem. A comparative study in real warehouse environments demonstrates the potential and advantage of the Split CIF based visual localization solution.
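The Split CIF referenced here follows the classical Split Covariance Intersection rule: each covariance is decomposed into a possibly-correlated part Pd and a known-independent part Pi, the correlated parts are inflated by a weight before a standard independent-style fusion. A scalar sketch (with a fixed weight w for brevity; in practice w is chosen to minimize the fused covariance):

```python
def split_ci_fuse(x1, P1d, P1i, x2, P2d, P2i, w):
    """Split Covariance Intersection for two scalar estimates: Pd is the part
    of each variance that may be correlated with the other estimate, Pi the
    known-independent part. w in (0, 1) is fixed here; it would normally be
    chosen to minimize the fused variance."""
    P1s = P1d / w + P1i                # inflate the correlated parts
    P2s = P2d / (1.0 - w) + P2i
    P = 1.0 / (1.0 / P1s + 1.0 / P2s)  # then fuse as if independent
    x = P * (x1 / P1s + x2 / P2s)
    return x, P

# Two estimates with equal split variances (total 1.5 each): the fused
# variance shrinks below either input despite the unknown correlation.
x, P = split_ci_fuse(0.0, 1.0, 0.5, 2.0, 1.0, 0.5, w=0.5)
```

This conservatism is what lets the filter fuse temporally correlated AprilTag measurements without the overconfidence a plain EKF update would introduce.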

[30] arXiv:2405.02754 (替换) [中文pdf, pdf, html, 其他]
标题: 隐式安全集算法用于可证明安全的强化学习
标题: Implicit Safe Set Algorithm for Provably Safe Reinforcement Learning
Weiye Zhao, Feihan Li, Changliu Liu
评论: 被《人工智能研究杂志》接受。arXiv管理员注:与arXiv:2308.13140存在文字重叠。
主题: 机器人技术 (cs.RO) ; 人工智能 (cs.AI) ; 机器学习 (cs.LG)

深度强化学习(DRL)在许多连续控制任务中表现出色。 然而,DRL在现实世界应用中的一个重大障碍是缺乏安全保证。 尽管DRL代理可以通过奖励塑造在期望上满足系统安全,但设计能够在每个时间步始终满足硬约束(例如,安全规范)的代理仍然是一个巨大的挑战。 相比之下,安全控制领域的现有工作提供了对硬安全约束持续满足的保证。 然而,这些方法需要显式的解析系统动态模型来合成安全控制,而在DRL环境中通常无法获得这些模型。 在本文中,我们提出了一种无模型的安全控制算法,即隐式安全集算法,用于为DRL代理合成保障措施,确保在整个训练过程中可证明的安全性。 所提出的算法仅通过查询一个黑盒动态函数(例如,数字孪生模拟器)来合成一个安全指数(屏障证书)和后续的安全控制律。 此外,我们理论证明了隐式安全集算法保证了对连续时间和离散时间系统的安全集有限时间收敛和前向不变性。 我们在最先进的Safety Gym基准上验证了所提出的算法,在此基准上,它在与最先进的安全DRL方法相比获得$95\% \pm 9\%$累积奖励的同时实现了零安全违规。 此外,该算法在具有并行计算的高维系统中表现良好。

Deep reinforcement learning (DRL) has demonstrated remarkable performance in many continuous control tasks. However, a significant obstacle to the real-world application of DRL is the lack of safety guarantees. Although DRL agents can satisfy system safety in expectation through reward shaping, designing agents to consistently meet hard constraints (e.g., safety specifications) at every time step remains a formidable challenge. In contrast, existing work in the field of safe control provides guarantees on persistent satisfaction of hard safety constraints. However, these methods require explicit analytical system dynamics models to synthesize safe control, which are typically inaccessible in DRL settings. In this paper, we present a model-free safe control algorithm, the implicit safe set algorithm, for synthesizing safeguards for DRL agents that ensure provable safety throughout training. The proposed algorithm synthesizes a safety index (barrier certificate) and a subsequent safe control law solely by querying a black-box dynamic function (e.g., a digital twin simulator). Moreover, we theoretically prove that the implicit safe set algorithm guarantees finite time convergence to the safe set and forward invariance for both continuous-time and discrete-time systems. We validate the proposed algorithm on the state-of-the-art Safety Gym benchmark, where it achieves zero safety violations while gaining $95\% \pm 9\%$ cumulative reward compared to state-of-the-art safe DRL methods. Furthermore, the resulting algorithm scales well to high-dimensional systems with parallel computing.

[31] arXiv:2406.13434 (替换) [中文pdf, pdf, html, 其他]
标题: 拥挤环境中基于深度强化学习的触觉感知动态障碍避让
标题: Tactile Aware Dynamic Obstacle Avoidance in Crowded Environment with Deep Reinforcement Learning
Yung Chuen Ng, Qi Wen Shervina Lim, Chun Ye Tan, Zhen Hao Gan, Meng Yee Michael Chuah
主题: 机器人技术 (cs.RO)

在拥挤环境中运行的移动机器人需要能够高效地在人类和周围障碍物之间导航,同时遵守安全标准和社会合规的行为方式。 这种机器人导航问题可以被归类为局部路径规划和轨迹优化问题。 本工作提出了一组力传感器作为触觉层,以补充LiDAR,使移动机器人能够感知其附近未被LiDAR检测到的任何周围物体的接触。 通过结合触觉层,机器人可以在其运动中承担更多风险,可能直接靠近障碍物或墙壁,并轻轻挤过去。 此外,我们通过Pybullet构建了一个仿真平台,该平台集成了机器人操作系统(ROS)和强化学习(RL)。 在该平台上训练了一个触觉感知神经网络模型,以构建基于RL的局部路径规划器用于动态障碍物避让。 我们的方法成功地在全向移动机器人上进行了演示,该机器人能够在拥挤环境中以高敏捷性和运动多样性进行导航,而不会对未接触的附近障碍物过于敏感。

Mobile robots operating in crowded environments require the ability to navigate among humans and surrounding obstacles efficiently while adhering to safety standards and socially compliant mannerisms. This scale of the robot navigation problem may be classified as both a local path planning and trajectory optimization problem. This work presents an array of force sensors that act as a tactile layer to complement the use of a LiDAR for the purpose of inducing awareness of contact with any surrounding objects within the immediate vicinity of a mobile robot undetected by LiDARs. By incorporating the tactile layer, the robot can take more risks in its movements and possibly go right up to an obstacle or wall, and gently squeeze past it. In addition, we built up a simulation platform via Pybullet which integrates Robot Operating System (ROS) and reinforcement learning (RL) together. A touch-aware neural network model was trained on it to create an RL-based local path planner for dynamic obstacle avoidance. Our proposed method was demonstrated successfully on an omni-directional mobile robot, which was able to navigate in a crowded environment with high agility and versatility in movement, while not being overly sensitive to nearby obstacles not in contact.

[32] arXiv:2408.08555 (替换) [中文pdf, pdf, html, 其他]
标题: 使用玫瑰花形扫描模式激光雷达检测和跟踪微型无人机
标题: Detection and Tracking of MAVs Using a Rosette Scanning Pattern LiDAR
Sándor Gazdag, Tom Möller, Anita Keszler, András L. Majdik
主题: 机器人技术 (cs.RO) ; 计算机视觉与模式识别 (cs.CV)

商用微型飞行器(MAVs)在过去十年中得到了广泛应用,带来了社会效益,但也引发了如空域违规和隐私问题等风险。 由于安全风险的增加,自主无人机检测和跟踪系统的发展已成为优先事项。 在本研究中,我们通过使用非重复玫瑰花扫描模式的激光雷达来应对这一挑战,特别关注利用传感器特性来提高检测距离。 所提出的方法使用带有速度分量的粒子滤波器来进行无人机的检测和跟踪,这提供了额外的重新检测能力。 使用云台平台以利用玫瑰花扫描模式激光雷达的特定特性,通过将跟踪目标保持在测量最密集的中心区域。 系统的检测能力和准确性通过室内实验进行了验证,而最大检测距离在我们的室外实验中得到展示。 我们的方法在室内达到了与最先进方法相当的准确性,同时将最大检测距离比最先进的室外方法提高了约80%。

The use of commercial Micro Aerial Vehicles (MAVs) has surged in the past decade, offering societal benefits but also raising risks such as airspace violations and privacy concerns. Due to the increased security risks, the development of autonomous drone detection and tracking systems has become a priority. In this study, we tackle this challenge, by using non-repetitive rosette scanning pattern LiDARs, particularly focusing on increasing the detection distance by leveraging the characteristics of the sensor. The presented method utilizes a particle filter with a velocity component for the detection and tracking of the drone, which offers added re-detection capability. A Pan-Tilt platform is utilized to take advantage of the specific characteristics of the rosette scanning pattern LiDAR by keeping the tracked object in the center where the measurement is most dense. The detection capabilities and accuracy of the system are validated through indoor experiments, while the maximum detection distance is shown in our outdoor experiments. Our approach achieved accuracy on par with the state-of-the-art indoor method while increasing the maximum detection range by approximately 80\% beyond the state-of-the-art outdoor method.
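The velocity-augmented particle filter at the core of the tracker can be sketched as follows. This is an illustrative toy implementation with a constant-velocity motion model and synthetic measurements; all parameter values are assumptions, not the authors' settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(particles, dt, vel_noise=0.05):
    """Constant-velocity motion model: state = [x, y, z, vx, vy, vz]."""
    particles[:, :3] += particles[:, 3:] * dt
    particles[:, 3:] += rng.normal(0.0, vel_noise, particles[:, 3:].shape)
    return particles

def update(particles, measurement, meas_std=0.2):
    """Weight particles by likelihood of the measured target position."""
    d2 = np.sum((particles[:, :3] - measurement) ** 2, axis=1)
    weights = np.exp(-0.5 * d2 / meas_std**2) + 1e-300
    return weights / weights.sum()

def resample(particles, weights):
    """Resampling keeps the particle count constant and focuses the filter."""
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

# Track a target moving at 1 m/s along x (dt = 0.1 s, 20 steps).
n = 2000
particles = np.zeros((n, 6))
particles[:, :3] = rng.normal(0.0, 1.0, (n, 3))   # position guess
particles[:, 3:] = rng.normal(0.0, 0.5, (n, 3))   # velocity guess
for t in range(1, 21):
    true_pos = np.array([t * 0.1, 0.0, 0.0])
    particles = predict(particles, dt=0.1)
    w = update(particles, true_pos)
    particles = resample(particles, w)
estimate = particles[:, :3].mean(axis=0)
print(estimate)  # close to [2.0, 0.0, 0.0]
```

The velocity component of the state is what gives the filter its re-detection ability: even when measurements drop out for a few frames, propagating the estimated velocity keeps the particle cloud near the target.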

[33] arXiv:2409.17702 (替换) [中文pdf, pdf, html, 其他]
标题: 使用终身机器人经验的分层表示进行情景记忆语言化
标题: Episodic Memory Verbalization using Hierarchical Representations of Life-Long Robot Experience
Leonard Bärmann, Chad DeChant, Joana Plewnia, Fabian Peller-Konrad, Daniel Bauer, Tamim Asfour, Alex Waibel
评论: 人形机器人2025。代码、数据和演示视频见 https://hierarchical-emv.github.io
主题: 机器人技术 (cs.RO) ; 人工智能 (cs.AI)

机器人经验的表述,即对机器人过去的总结和问答,是提升人机交互的关键能力。 先前的研究应用了基于规则的系统或微调的深度模型来表述短期(几分钟长)的情节数据流,限制了泛化性和可迁移性。 在我们的工作中,我们应用大规模预训练模型,以零样本或少量样本的方式解决这个问题,并特别关注于表述长期的经历。 为此,我们从情节记忆(EM)中推导出一种树状的数据结构,较低层级表示原始感知和本体感觉数据,较高层级将事件抽象为自然语言概念。 给定这种从经验流构建的层次化表示,我们应用一个大型语言模型作为代理,根据用户的查询与EM进行交互式搜索,动态扩展(最初坍缩的)树节点以找到相关信息。 这种方法即使在扩展到数月的机器人经验数据时也能保持计算成本较低。 我们在模拟的家庭机器人数据、人类第一视角视频和真实世界机器人记录上评估了我们的方法,证明了其灵活性和可扩展性。

Verbalization of robot experience, i.e., summarization of and question answering about a robot's past, is a crucial ability for improving human-robot interaction. Previous works applied rule-based systems or fine-tuned deep models to verbalize short (several-minute-long) streams of episodic data, limiting generalization and transferability. In our work, we apply large pretrained models to tackle this task with zero or few examples, and specifically focus on verbalizing life-long experiences. For this, we derive a tree-like data structure from episodic memory (EM), with lower levels representing raw perception and proprioception data, and higher levels abstracting events to natural language concepts. Given such a hierarchical representation built from the experience stream, we apply a large language model as an agent to interactively search the EM given a user's query, dynamically expanding (initially collapsed) tree nodes to find the relevant information. The approach keeps computational costs low even when scaling to months of robot experience data. We evaluate our method on simulated household robot data, human egocentric videos, and real-world robot recordings, demonstrating its flexibility and scalability.

[34] arXiv:2411.07699 (替换) [中文pdf, pdf, html, 其他]
标题: RINO:具有非迭代估计的精确鲁棒雷达惯性里程计
标题: RINO: Accurate, Robust Radar-Inertial Odometry with Non-Iterative Estimation
Shuocheng Yang, Yueming Cao, Shengbo Eben Li, Jianqiang Wang, Shaobing Xu
主题: 机器人技术 (cs.RO)

里程计在恶劣天气条件下,如雾、雨和雪,面临重大挑战,因为传统的视觉和LiDAR方法通常会表现出性能下降。 雷达惯性里程计(RIO)由于其在这些环境中的鲁棒性而成为一种有前景的解决方案。 在本文中,我们提出了RINO,这是一种非迭代的RIO框架,以自适应松耦合的方式实现。 基于ORORA作为雷达里程计的基线, RINO引入了几个关键改进,包括关键点提取、运动失真补偿以及通过自适应投票机制进行位姿估计。 这种投票策略促进了高效的多项式时间优化,同时量化了雷达模块位姿估计的不确定性。 估计的不确定性随后被整合到卡尔曼滤波框架内的最大后验(MAP)估计中。 与之前的松耦合里程计系统不同,RINO不仅保留了雷达组件的全局和鲁棒配准能力,而且在融合过程中动态考虑了每个传感器的实时操作状态。 在公开数据集上进行的实验结果表明,与基线方法相比,RINO分别将平移和旋转误差减少了1.06%和0.09°/100m,从而显著提高了其准确性。 此外,RINO实现了与最先进方法相当的性能。

Odometry in adverse weather conditions, such as fog, rain, and snow, presents significant challenges, as traditional vision and LiDAR-based methods often suffer from degraded performance. Radar-Inertial Odometry (RIO) has emerged as a promising solution due to its resilience in such environments. In this paper, we present RINO, a non-iterative RIO framework implemented in an adaptively loosely coupled manner. Building upon ORORA as the baseline for radar odometry, RINO introduces several key advancements, including improvements in keypoint extraction, motion distortion compensation, and pose estimation via an adaptive voting mechanism. This voting strategy facilitates efficient polynomial-time optimization while simultaneously quantifying the uncertainty in the radar module's pose estimation. The estimated uncertainty is subsequently integrated into the maximum a posteriori (MAP) estimation within a Kalman filter framework. Unlike prior loosely coupled odometry systems, RINO not only retains the global and robust registration capabilities of the radar component but also dynamically accounts for the real-time operational state of each sensor during fusion. Experimental results conducted on publicly available datasets demonstrate that RINO reduces translation and rotation errors by 1.06% and 0.09°/100m, respectively, when compared to the baseline method, thus significantly enhancing its accuracy. Furthermore, RINO achieves performance comparable to state-of-the-art methods.
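The adaptive fusion idea, weighting the radar pose estimate by its self-reported uncertainty inside a Kalman filter, can be illustrated with a scalar update; all numbers are made up for illustration:

```python
import numpy as np

def kalman_update(x, P, z, R):
    """Scalar Kalman/MAP update: fuse a prediction (x, P) with a measurement (z, R)."""
    K = P / (P + R)              # Kalman gain
    x_new = x + K * (z - x)
    P_new = (1.0 - K) * P
    return x_new, P_new

# Hypothetical propagated state and a radar registration result.
x_pred, P_pred = 1.00, 0.04      # prediction and its variance
z_radar = 1.20

# When the radar module reports low uncertainty, its estimate dominates...
x1, _ = kalman_update(x_pred, P_pred, z_radar, R=0.01)
# ...when it reports high uncertainty (e.g. a degraded scan), it is down-weighted.
x2, _ = kalman_update(x_pred, P_pred, z_radar, R=1.0)
print(x1, x2)  # 1.16 vs ~1.008
```

Quantifying R online from the voting statistics, rather than fixing it, is what lets the filter account for the real-time operational state of each sensor.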

[35] arXiv:2411.19134 (替换) [中文pdf, pdf, html, 其他]
标题: 考虑多种运动模型的视觉SLAMMOT
标题: Visual SLAMMOT Considering Multiple Motion Models
Peilin Tian, Hao Li
主题: 机器人技术 (cs.RO) ; 人工智能 (cs.AI) ; 计算机视觉与模式识别 (cs.CV)

同时定位与建图(SLAM)和多目标跟踪(MOT)是自动驾驶领域的重要任务,吸引了大量的研究关注。 虽然SLAM旨在生成实时地图并在陌生环境中确定车辆的位姿,而MOT则专注于对多个动态物体进行实时识别和跟踪。 尽管它们的重要性,普遍的方法将SLAM和MOT视为自主车辆系统中的独立模块,导致了固有的限制。 经典SLAM方法通常依赖于静态环境假设,适用于室内而非动态室外场景。 相反,传统的MOT技术通常依赖于车辆已知的状态,这限制了基于此先验的对象状态估计的准确性。 为了解决这些挑战,之前的工作引入了统一的SLAMMOT范式,但主要集中在简单的运动模式上。 在我们团队之前的工作IMM-SLAMMOT中,我们提出了一种新方法,将多种运动模型考虑纳入SLAMMOT,即紧密耦合的SLAM和MOT,在基于激光雷达的系统中展示了其有效性。 本文研究了将这种方法实例化为视觉SLAMMOT的可行性和优势,弥合了激光雷达和视觉传感机制之间的差距。 具体而言,我们提出了一种考虑多种运动模型的视觉SLAMMOT解决方案,并验证了IMM-SLAMMOT在视觉领域的固有优势。

Simultaneous Localization and Mapping (SLAM) and Multi-Object Tracking (MOT) are pivotal tasks in the realm of autonomous driving, attracting considerable research attention. While SLAM endeavors to generate real-time maps and determine the vehicle's pose in unfamiliar settings, MOT focuses on the real-time identification and tracking of multiple dynamic objects. Despite their importance, the prevalent approach treats SLAM and MOT as independent modules within an autonomous vehicle system, leading to inherent limitations. Classical SLAM methodologies often rely on a static environment assumption, suitable for indoor rather than dynamic outdoor scenarios. Conversely, conventional MOT techniques typically rely on the vehicle's known state, constraining the accuracy of object state estimations based on this prior. To address these challenges, previous efforts introduced the unified SLAMMOT paradigm, yet primarily focused on simplistic motion patterns. In our team's previous work, IMM-SLAMMOT, we presented a novel methodology incorporating consideration of multiple motion models into SLAMMOT, i.e., tightly coupled SLAM and MOT, demonstrating its efficacy in LiDAR-based systems. This paper studies the feasibility and advantages of instantiating this methodology as visual SLAMMOT, bridging the gap between LiDAR- and vision-based sensing mechanisms. Specifically, we propose a solution of visual SLAMMOT considering multiple motion models and validate the inherent advantages of IMM-SLAMMOT in the visual domain.

[36] arXiv:2412.19948 (替换) [中文pdf, pdf, html, 其他]
标题: 运动规划扩散:使用扩散模型学习和适应机器人运动规划
标题: Motion Planning Diffusion: Learning and Adapting Robot Motion Planning with Diffusion Models
J. Carvalho, A. Le, P. Kicki, D. Koert, J. Peters
主题: 机器人技术 (cs.RO)

优化机器人运动规划算法的性能高度依赖于初始解,通常通过运行基于采样的规划器来获得无碰撞路径。 然而,这些方法在高维和复杂场景中可能较慢,并且会产生不平滑的解。 鉴于之前解决的路径规划问题,学习其分布并将其作为新类似问题的先验是非常可取的。 一些工作提出利用这一先验来引导运动规划问题,要么通过从该先验中采样初始解,要么在轨迹优化中使用其分布进行最大后验公式。 在本工作中,我们引入了运动规划扩散(MPD),一种使用扩散模型学习轨迹分布先验的算法。 这些生成模型在编码多模态数据方面显示出越来越大的成功,并具有适用于基于梯度的运动规划的优良特性,如成本指导。 给定一个运动规划问题,我们构建一个成本函数,并在去噪过程中结合学习到的先验和成本函数梯度对后验分布进行采样。 我们不是在整个轨迹点上学习先验,而是提出使用线性运动基元,特别是B样条曲线,来学习轨迹的低维表示。 这种参数化保证生成的轨迹是平滑的,可以在更高频率下进行插值,并且比密集的点表示需要更少的参数。 我们展示了我们的方法在从简单的2D到更复杂的任务中的结果,使用的是7自由度机械臂。 除了从模拟数据中学习外,我们还使用了现实世界中抓取和放置任务的人类示范。

The performance of optimization-based robot motion planning algorithms is highly dependent on the initial solutions, commonly obtained by running a sampling-based planner to obtain a collision-free path. However, these methods can be slow in high-dimensional and complex scenes and produce non-smooth solutions. Given previously solved path-planning problems, it is highly desirable to learn their distribution and use it as a prior for new similar problems. Several works propose utilizing this prior to bootstrap the motion planning problem, either by sampling initial solutions from it, or using its distribution in a maximum-a-posterior formulation for trajectory optimization. In this work, we introduce Motion Planning Diffusion (MPD), an algorithm that learns trajectory distribution priors with diffusion models. These generative models have shown increasing success in encoding multimodal data and have desirable properties for gradient-based motion planning, such as cost guidance. Given a motion planning problem, we construct a cost function and sample from the posterior distribution using the learned prior combined with the cost function gradients during the denoising process. Instead of learning the prior on all trajectory waypoints, we propose learning a lower-dimensional representation of a trajectory using linear motion primitives, particularly B-spline curves. This parametrization guarantees that the generated trajectory is smooth, can be interpolated at higher frequencies, and needs fewer parameters than a dense waypoint representation. We demonstrate the results of our method ranging from simple 2D to more complex tasks using a 7-dof robot arm manipulator. In addition to learning from simulated data, we also use human demonstrations on a real-world pick-and-place task.
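The low-dimensional B-spline parametrization can be illustrated with a uniform cubic B-spline evaluator: a handful of control points yields a dense, C2-continuous trajectory. This is a generic sketch, not the paper's implementation:

```python
import numpy as np

def cubic_bspline(ctrl, n_samples=120):
    """Sample a uniform cubic B-spline defined by control points ctrl (k, d).

    Each group of 4 consecutive control points defines one segment; the
    basis functions are non-negative and sum to 1, so the curve stays in
    the convex hull of its control points and is C2-continuous.
    """
    ctrl = np.asarray(ctrl, dtype=float)
    segs = len(ctrl) - 3
    out = []
    for s in range(segs):
        p0, p1, p2, p3 = ctrl[s:s + 4]
        for u in np.linspace(0, 1, n_samples // segs, endpoint=(s == segs - 1)):
            b0 = (1 - u) ** 3 / 6
            b1 = (3 * u**3 - 6 * u**2 + 4) / 6
            b2 = (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6
            b3 = u**3 / 6
            out.append(b0 * p0 + b1 * p1 + b2 * p2 + b3 * p3)
    return np.array(out)

# 6 control points (the low-dimensional representation) -> dense smooth path.
ctrl = np.array([[0, 0], [1, 2], [2, -1], [3, 3], [4, 0], [5, 1]])
traj = cubic_bspline(ctrl, n_samples=120)
print(traj.shape)  # (120, 2): 120 dense 2D waypoints from only 12 parameters
```

Learning the diffusion prior over the 12 control-point coordinates instead of 240 waypoint coordinates is what keeps the representation compact while guaranteeing smooth, interpolable outputs.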

[37] arXiv:2503.04798 (替换) [中文pdf, pdf, html, 其他]
标题: 向现实世界推进MAPF:一个可扩展的多智能体现实测试平台(SMART)
标题: Advancing MAPF towards the Real World: A Scalable Multi-Agent Realistic Testbed (SMART)
Jingtian Yan, Zhifei Li, William Kang, Kevin Zheng, Yulun Zhang, Zhe Chen, Yue Zhang, Daniel Harabor, Stephen F. Smith, Jiaoyang Li
主题: 机器人技术 (cs.RO) ; 人工智能 (cs.AI)

我们介绍了可扩展多智能体现实测试平台(SMART),这是一个用于评估多智能体路径规划(MAPF)算法的现实且高效的软件工具。MAPF专注于为一组智能体规划无碰撞的路径。虽然最先进的MAPF算法可以在几秒钟内为数百个机器人规划路径,但它们通常依赖于简化的机器人模型,使其在现实世界中的性能不明确。研究人员通常缺乏在实验室环境中访问数百个物理机器人的机会来评估这些算法。同时,缺乏MAPF专业知识的工业专业人士需要一个易于使用的模拟器,以在其特定设置中高效地测试和理解MAPF算法的性能。SMART具有几个优势:(1)SMART使用基于物理引擎的模拟器创建现实的仿真环境,考虑了机器人运动学和执行不确定性等复杂的现实因素,(2)SMART使用基于动作依赖图的执行监控框架,促进了与各种MAPF算法和机器人模型的无缝集成,(3)SMART可扩展到数千个机器人。代码可在https://github.com/smart-mapf/smart公开获取。

We present Scalable Multi-Agent Realistic Testbed (SMART), a realistic and efficient software tool for evaluating Multi-Agent Path Finding (MAPF) algorithms. MAPF focuses on planning collision-free paths for a group of agents. While state-of-the-art MAPF algorithms can plan paths for hundreds of robots in seconds, they often rely on simplified robot models, making their real-world performance unclear. Researchers typically lack access to hundreds of physical robots in laboratory settings to evaluate the algorithms. Meanwhile, industrial professionals who lack expertise in MAPF require an easy-to-use simulator to efficiently test and understand the performance of MAPF algorithms in their specific settings. SMART fills this gap with several advantages: (1) SMART uses physics-engine-based simulators to create realistic simulation environments, accounting for complex real-world factors such as robot kinodynamics and execution uncertainties, (2) SMART uses an execution monitor framework based on the Action Dependency Graph, facilitating seamless integration with various MAPF algorithms and robot models, and (3) SMART scales to thousands of robots. The code is publicly available at https://github.com/smart-mapf/smart.

[38] arXiv:2503.06795 (替换) [中文pdf, pdf, html, 其他]
标题: 机器人超声引导下解剖代表性假体的股动脉重建
标题: Robotic Ultrasound-Guided Femoral Artery Reconstruction of Anatomically-Representative Phantoms
Lidia Al-Zogbi, Deepak Raina, Vinciya Pandian, Thorsten Fleiter, Axel Krieger
主题: 机器人技术 (cs.RO) ; 计算机视觉与模式识别 (cs.CV)

股动脉入路对于许多临床操作至关重要,包括诊断性血管造影、治疗性导管插入和紧急干预。尽管其作用至关重要,但由于解剖结构的变异性、覆盖的脂肪组织以及需要精确的超声(US)引导,成功的血管入路仍然具有挑战性。针头放置错误可能导致严重并发症,因此该操作仅限于在受控医院环境中经验丰富的临床医生进行。虽然机器人系统通过自主扫描和血管重建在解决这些挑战方面显示出潜力,但由于依赖于无法捕捉人体解剖复杂性的简化假体模型,临床转化仍然有限。在本研究中,我们提出了一种用于分叉股动脉自主机器人超声扫描的方法,并在五个从真实患者计算机断层扫描(CT)数据创建的血管假体上进行了验证。此外,我们引入了一个基于视频的深度学习超声分割网络,专门用于血管成像,实现了改进的三维动脉重建。所提出的网络在新的血管数据集上达到了89.21%的Dice分数和80.54%的交并比。重建的动脉中心线与真实CT数据进行对比,平均L2误差为0.91+/-0.70毫米,平均Hausdorff距离为4.36+/-1.11毫米。本研究首次在多样化患者特异性假体上验证了用于股动脉超声扫描的自主机器人系统,为评估机器人在血管成像和介入中的性能提供了一个更先进的框架。

Femoral artery access is essential for numerous clinical procedures, including diagnostic angiography, therapeutic catheterization, and emergency interventions. Despite its critical role, successful vascular access remains challenging due to anatomical variability, overlying adipose tissue, and the need for precise ultrasound (US) guidance. Needle placement errors can result in severe complications, thereby limiting the procedure to highly skilled clinicians operating in controlled hospital environments. While robotic systems have shown promise in addressing these challenges through autonomous scanning and vessel reconstruction, clinical translation remains limited due to reliance on simplified phantom models that fail to capture human anatomical complexity. In this work, we present a method for autonomous robotic US scanning of bifurcated femoral arteries, and validate it on five vascular phantoms created from real patient computed tomography (CT) data. Additionally, we introduce a video-based deep learning US segmentation network tailored for vascular imaging, enabling improved 3D arterial reconstruction. The proposed network achieves a Dice score of 89.21% and an Intersection over Union of 80.54% on a new vascular dataset. The reconstructed artery centerline is evaluated against ground truth CT data, showing an average L2 error of 0.91+/-0.70 mm, with an average Hausdorff distance of 4.36+/-1.11mm. This study is the first to validate an autonomous robotic system for US scanning of the femoral artery on a diverse set of patient-specific phantoms, introducing a more advanced framework for evaluating robotic performance in vascular imaging and intervention.
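The reported Dice score and Intersection over Union are standard overlap metrics between predicted and ground-truth segmentation masks; a minimal sketch on toy binary masks:

```python
import numpy as np

def dice_and_iou(pred, gt):
    """Overlap metrics between binary segmentation masks of any shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum())
    iou = inter / union
    return dice, iou

# Two 4x4 squares offset by one pixel: 9 px overlap out of 16 px each.
pred = np.zeros((8, 8), dtype=int); pred[2:6, 2:6] = 1
gt   = np.zeros((8, 8), dtype=int); gt[3:7, 3:7] = 1
dice, iou = dice_and_iou(pred, gt)
print(dice, iou)  # 0.5625 and 9/23 ≈ 0.391
```

Note that Dice is always at least as large as IoU (Dice = 2·IoU/(1+IoU)), which is why the paper's 89.21% Dice corresponds to an 80.54% IoU.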

[39] arXiv:2503.20839 (替换) [中文pdf, pdf, html, 其他]
标题: TAR:通过对比学习实现的四足运动教师对齐表示
标题: TAR: Teacher-Aligned Representations via Contrastive Learning for Quadrupedal Locomotion
Amr Mousa, Neil Karavis, Michele Caprio, Wei Pan, Richard Allmendinger
评论: 这项工作已被接受在IEEE/RSJ智能机器人与系统国际会议(IROS)2025上发表。
主题: 机器人技术 (cs.RO) ; 机器学习 (cs.LG) ; 系统与控制 (eess.SY)

四足运动通过强化学习(RL)通常使用教师-学生范式来解决,其中特权教师指导本体感觉学生策略。 然而,诸如特权教师与仅本体感觉的学生之间的表示不匹配、由于行为克隆导致的协变量偏移以及缺乏可部署的适应性等问题,导致在现实场景中泛化能力较差。 我们提出了通过对比学习对齐教师表示(TAR),一种利用特权信息和自监督对比学习来弥合这一差距的框架。 通过在模拟中使用对比目标将表示对齐到特权教师,我们的学生策略学习到结构化的潜在空间,并在分布外(OOD)场景中表现出稳健的泛化能力,超过了完全特权的“教师”。 结果表明,与最先进的基线相比,达到峰值性能的训练速度快了2倍。 与现有方法相比,OOD场景的泛化能力平均提高了40%。 此外,TAR在部署期间无缝过渡到学习,而无需特权状态,为样本高效、自适应的运动设定了新基准,并在现实场景中实现了持续微调。 开源代码和视频可在 https://amrmousa.com/TARLoco/ 获取。

Quadrupedal locomotion via Reinforcement Learning (RL) is commonly addressed using the teacher-student paradigm, where a privileged teacher guides a proprioceptive student policy. However, key challenges such as representation misalignment between the privileged teacher and the proprioceptive-only student, covariate shift due to behavioral cloning, and a lack of deployable adaptation lead to poor generalization in real-world scenarios. We propose Teacher-Aligned Representations via Contrastive Learning (TAR), a framework that leverages privileged information with self-supervised contrastive learning to bridge this gap. By aligning representations to a privileged teacher in simulation via contrastive objectives, our student policy learns structured latent spaces and exhibits robust generalization to Out-of-Distribution (OOD) scenarios, surpassing the fully privileged "Teacher". TAR accelerates training by 2x compared to state-of-the-art baselines in reaching peak performance, and improves generalization in OOD scenarios by 40% on average over existing methods. Moreover, TAR transitions seamlessly into learning during deployment without requiring privileged states, setting a new benchmark in sample-efficient, adaptive locomotion and enabling continual fine-tuning in real-world scenarios. Open-source code and videos are available at https://amrmousa.com/TARLoco/.
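A common form of contrastive objective for such teacher-student alignment is InfoNCE, in which the i-th student embedding is pulled toward the i-th teacher embedding and pushed away from the rest of the batch. A NumPy sketch (the exact loss used in TAR may differ):

```python
import numpy as np

def info_nce(student, teacher, temperature=0.1):
    """InfoNCE loss over a batch: matched pairs share an index."""
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    logits = s @ t.T / temperature                      # (B, B) cosine sims
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                 # cross-entropy on diagonal

rng = np.random.default_rng(0)
teacher = rng.normal(size=(16, 8))                      # privileged embeddings
aligned = teacher + 0.01 * rng.normal(size=(16, 8))     # well-aligned student
unaligned = rng.normal(size=(16, 8))                    # random student
print(info_nce(aligned, teacher) < info_nce(unaligned, teacher))  # True
```

Minimizing this loss in simulation shapes the student's latent space to mirror the teacher's, without requiring privileged observations at deployment time.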

[40] arXiv:2504.05287 (替换) [中文pdf, pdf, html, 其他]
标题: RobustDexGrasp:通用物体的鲁棒灵巧抓取
标题: RobustDexGrasp: Robust Dexterous Grasping of General Objects
Hui Zhang, Zijian Wu, Linyi Huang, Sammy Christen, Jie Song
评论: 相机准备就绪用于CoRL2025。项目页面:https://zdchan.github.io/Robust_DexGrasp/
主题: 机器人技术 (cs.RO)

稳健抓取各种物体的能力对于灵巧机器人至关重要。 在本文中,我们提出了一种框架,使用单视角视觉输入实现零样本动态灵巧抓取,旨在对各种干扰具有鲁棒性。 我们的方法利用基于手指关节与物体表面之间动态距离向量的以手为中心的物体形状表示。 这种表示捕捉潜在接触区域周围的局部形状,而不是关注详细的全局物体几何形状,从而增强对形状变化和不确定性的泛化能力。 为了解决感知限制,我们结合了一个特权教师策略和混合课程学习方法,使学生策略能够有效地提炼抓取能力并探索以适应干扰。 在模拟中训练,我们的方法在247,786个模拟物体上取得了97.0%的成功率,在512个真实物体上取得了94.6%的成功率,证明了其出色的泛化能力。 定量和定性结果验证了我们的策略对各种干扰的鲁棒性。

The ability to robustly grasp a variety of objects is essential for dexterous robots. In this paper, we present a framework for zero-shot dynamic dexterous grasping using single-view visual inputs, designed to be resilient to various disturbances. Our approach utilizes a hand-centric object shape representation based on dynamic distance vectors between finger joints and object surfaces. This representation captures the local shape around potential contact regions rather than focusing on detailed global object geometry, thereby enhancing generalization to shape variations and uncertainties. To address perception limitations, we integrate a privileged teacher policy with a mixed curriculum learning approach, allowing the student policy to effectively distill grasping capabilities and explore for adaptation to disturbances. Trained in simulation, our method achieves success rates of 97.0% across 247,786 simulated objects and 94.6% across 512 real objects, demonstrating remarkable generalization. Quantitative and qualitative results validate the robustness of our policy against various disturbances.
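The hand-centric representation, distance vectors from finger joints to the object surface, reduces to a nearest-neighbor lookup over sampled surface points; the shapes and the toy sphere object below are illustrative assumptions:

```python
import numpy as np

def hand_centric_representation(joint_positions, surface_points):
    """For each finger joint, the vector to its nearest sampled surface point.

    This encodes local shape near potential contact regions rather than the
    full global object geometry.
    """
    diffs = surface_points[None, :, :] - joint_positions[:, None, :]  # (J, P, 3)
    dists = np.linalg.norm(diffs, axis=2)                             # (J, P)
    nearest = dists.argmin(axis=1)
    return diffs[np.arange(len(joint_positions)), nearest]            # (J, 3)

# Two "joints" near a unit-sphere point cloud centered at the origin.
rng = np.random.default_rng(0)
pts = rng.normal(size=(500, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)   # points on the unit sphere
joints = np.array([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
vecs = hand_centric_representation(joints, pts)
print(np.linalg.norm(vecs, axis=1))  # both close to 1.0 (distance to the sphere)
```

Because only nearby surface geometry enters the representation, two objects with the same local shape around the contact region produce near-identical inputs, which is what drives generalization across object categories.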

[41] arXiv:2504.06866 (替换) [中文pdf, pdf, html, 其他]
标题: GraspClutter6D:一个大规模真实世界数据集,用于杂乱场景中的鲁棒感知和抓取
标题: GraspClutter6D: A Large-scale Real-world Dataset for Robust Perception and Grasping in Cluttered Scenes
Seunghyeok Back, Joosoon Lee, Kangmin Kim, Heeseon Rho, Geonhyup Lee, Raeyoung Kang, Sangbeom Lee, Sangjun Noh, Youngjin Lee, Taeyeop Lee, Kyoobin Lee
评论: 已被IEEE机器人与自动化字母期刊(RA-L)接受。项目网站:https://sites.google.com/view/graspclutter6d
主题: 机器人技术 (cs.RO) ; 人工智能 (cs.AI) ; 计算机视觉与模式识别 (cs.CV)

在杂乱环境中进行鲁棒抓取仍然是机器人学中的一个开放性挑战。 虽然基准数据集显著推动了深度学习方法的发展,但它们主要关注具有轻微遮挡和多样性不足的简单场景,限制了其在实际场景中的适用性。 我们提出了 GraspClutter6D,这是一个大规模的真实世界抓取数据集,具有以下特点: (1) 1000个高度杂乱的场景,密集排列(每场景14.1个物体,62.6%的遮挡),(2) 覆盖200个物体在75种环境配置(箱子、架子和桌子)中,使用四个RGB-D相机从多个视角捕获,以及(3) 丰富的标注,包括736K个6D物体姿态和93亿个可行的机器人抓取,针对52K个RGB-D图像。 我们对最先进的分割、物体姿态估计和抓取检测方法进行了基准测试,以提供对杂乱环境中挑战的关键见解。 此外,我们验证了该数据集作为训练资源的有效性,结果表明,在模拟和真实世界实验中,基于 GraspClutter6D 训练的抓取网络明显优于基于现有数据集训练的网络。 该数据集、工具包和标注工具可在我们的项目网站上公开获取:https://sites.google.com/view/graspclutter6d。

Robust grasping in cluttered environments remains an open challenge in robotics. While benchmark datasets have significantly advanced deep learning methods, they mainly focus on simplistic scenes with light occlusion and insufficient diversity, limiting their applicability to practical scenarios. We present GraspClutter6D, a large-scale real-world grasping dataset featuring: (1) 1,000 highly cluttered scenes with dense arrangements (14.1 objects/scene, 62.6\% occlusion), (2) comprehensive coverage across 200 objects in 75 environment configurations (bins, shelves, and tables) captured using four RGB-D cameras from multiple viewpoints, and (3) rich annotations including 736K 6D object poses and 9.3B feasible robotic grasps for 52K RGB-D images. We benchmark state-of-the-art segmentation, object pose estimation, and grasp detection methods to provide key insights into challenges in cluttered environments. Additionally, we validate the dataset's effectiveness as a training resource, demonstrating that grasping networks trained on GraspClutter6D significantly outperform those trained on existing datasets in both simulation and real-world experiments. The dataset, toolkit, and annotation tools are publicly available on our project website: https://sites.google.com/view/graspclutter6d.

[42] arXiv:2505.11528 (替换) [中文pdf, pdf, html, 其他]
标题: LaDi-WM:一种基于潜在扩散的世界模型用于预测操作
标题: LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation
Yuhang Huang, Jiazhao Zhang, Shilong Zou, Xinwang Liu, Ruizhen Hu, Kai Xu
评论: CoRL 2025
主题: 机器人技术 (cs.RO) ; 人工智能 (cs.AI) ; 机器学习 (cs.LG)

预测性操作最近在具身人工智能社区中引起了广泛关注,因为它有望通过利用预测状态来提高机器人策略性能。然而,从世界模型中生成机器人-物体交互的准确未来视觉状态仍然是一个众所周知的挑战,特别是在实现高质量的像素级表示方面。为此,我们提出了LaDi-WM,这是一种使用扩散建模预测未来状态潜在空间的世界模型。具体来说,LaDi-WM利用了与预训练视觉基础模型(VFMs)对齐的潜在空间,该空间包括几何特征(基于DINO)和语义特征(基于CLIP)。我们发现,预测潜在空间的演变比直接预测像素级图像更容易学习且更具泛化性。基于LaDi-WM,我们设计了一种扩散策略,通过结合预测状态迭代地优化输出动作,从而生成更一致和准确的结果。在合成和真实世界基准上的大量实验表明,LaDi-WM在LIBERO-LONG基准上将策略性能提高了27.9%,在真实世界场景中提高了20%。此外,我们的世界模型和策略在真实世界实验中表现出令人印象深刻的泛化能力。

Predictive manipulation has recently gained considerable attention in the Embodied AI community due to its potential to improve robot policy performance by leveraging predicted states. However, generating accurate future visual states of robot-object interactions from world models remains a well-known challenge, particularly in achieving high-quality pixel-level representations. To this end, we propose LaDi-WM, a world model that predicts the latent space of future states using diffusion modeling. Specifically, LaDi-WM leverages the well-established latent space aligned with pre-trained Visual Foundation Models (VFMs), which comprises both geometric features (DINO-based) and semantic features (CLIP-based). We find that predicting the evolution of the latent space is easier to learn and more generalizable than directly predicting pixel-level images. Building on LaDi-WM, we design a diffusion policy that iteratively refines output actions by incorporating forecasted states, thereby generating more consistent and accurate results. Extensive experiments on both synthetic and real-world benchmarks demonstrate that LaDi-WM significantly enhances policy performance by 27.9\% on the LIBERO-LONG benchmark and 20\% on the real-world scenario. Furthermore, our world model and policies achieve impressive generalizability in real-world experiments.

[43] arXiv:2507.15604 (替换) [中文pdf, pdf, html, 其他]
标题: 通过手动引导从人类示范中估计有效载荷惯性参数
标题: Estimation of Payload Inertial Parameters from Human Demonstrations by Hand Guiding
Johannes Hartwig, Philipp Lienhardt, Dominik Henrich
评论: 这是以下作品的预印本(已接受发表):《2025年科学学会会议、处理和工业机器人年鉴》。最终认证版本将在此链接:http://dx.doi.org/[tba]
主题: 机器人技术 (cs.RO)

随着协作机器人(cobots)的普及,有必要解决没有编程知识的用户操作这些系统以提高效率的需求。 编程概念通常使用直观的交互方式,如手动引导,来解决这一问题。 在编程接触运动时,这些框架除了需要演示的速度和力之外,还需要了解机器人工具的负载惯性参数(PIP),以确保有效的混合运动-力控制。 本文旨在通过消除对专用PIP校准的需求,使非专家用户更高效地编程接触运动,从而实现灵活的机器人工具更换。 由于演示的任务通常也包含非接触运动,我们的方法利用这些部分,使用已建立的估计技术来估算机器人的PIP。 结果表明,负载质量的估计是准确的,而质心和惯性张量则受到噪声和激励不足的影响。 总体而言,这些发现展示了在手动引导过程中PIP估计的可行性,但也强调了需要足够的负载加速度以实现准确估计。

As the availability of cobots increases, it is essential to address the needs of users with little to no programming knowledge to operate such systems efficiently. Programming concepts often use intuitive interaction modalities, such as hand guiding, to address this. When programming in-contact motions, such frameworks require knowledge of the robot tool's payload inertial parameters (PIP) in addition to the demonstrated velocities and forces to ensure effective hybrid motion-force control. This paper aims to enable non-expert users to program in-contact motions more efficiently by eliminating the need for a dedicated PIP calibration, thereby enabling flexible robot tool changes. Since demonstrated tasks generally also contain motions with non-contact, our approach uses these parts to estimate the robot's PIP using established estimation techniques. The results show that the estimation of the payload's mass is accurate, whereas the center of mass and the inertia tensor are affected by noise and a lack of excitation. Overall, these findings show the feasibility of PIP estimation during hand guiding but also highlight the need for sufficient payload accelerations for an accurate estimation.
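The mass component of the PIP can be recovered from non-contact motion segments by linear least squares on f = m(a - g); the following synthetic sketch uses a hypothetical 1.5 kg payload and made-up noise levels:

```python
import numpy as np

rng = np.random.default_rng(1)
g = np.array([0.0, 0.0, -9.81])   # gravity in the sensor frame (toy assumption)

# Simulated non-contact segment: tool accelerations and measured F/T forces
# for a 1.5 kg payload, with additive sensor noise.
m_true = 1.5
acc = rng.normal(0.0, 2.0, size=(500, 3))                 # demonstrated motion
forces = m_true * (acc - g) + rng.normal(0.0, 0.3, size=(500, 3))

# Least-squares mass estimate from f = m * (a - g):
A = (acc - g).reshape(-1)     # stack all axes into one scalar regressor
b = forces.reshape(-1)
m_est = float(A @ b / (A @ A))
print(round(m_est, 2))  # ~1.5
```

The gravity term keeps the regressor well excited even at rest, which matches the paper's finding that mass is estimated accurately; the center of mass and inertia tensor need torque equations and richer rotational accelerations, so low excitation hurts them much more.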

[44] arXiv:2507.15608 (替换) [中文pdf, pdf, html, 其他]
标题: 从人接触运动示范中优化力信号
标题: Optimizing Force Signals from Human Demonstrations of In-Contact Motions
Johannes Hartwig, Fabian Viessmann, Dominik Henrich
评论: 这是以下作品的预印本(已接受发表):《2024年科学学会会议、处理与工业机器人年鉴》。最终认证版本将在此链接:http://dx.doi.org/[tba]
主题: 机器人技术 (cs.RO)

对于非机器人编程专家来说,运动引导可以作为一种直观的输入方法,因为接触任务的机器人编程正变得越来越重要。 然而,来自人类演示的不精确和噪声输入信号在直接再现运动或将其作为机器学习方法的输入时会带来问题。 本文探讨了优化力信号以更好地对应演示信号的人类意图。 我们比较了不同的信号过滤方法,并提出了一种峰值检测方法来处理信号中的首次接触偏差。 这些方法的评估考虑了输入信号与人类意图信号之间的专门误差准则。 此外,我们分析了关键参数对过滤方法的影响。 就误差准则而言,单个运动的质量可以提高多达20%。 所提出的贡献可以提高机器人编程的可用性以及人机之间的交互。

For non-robot-programming experts, kinesthetic guiding can be an intuitive input method as robot programming of in-contact tasks becomes more prominent. However, imprecise and noisy input signals from human demonstrations pose problems when reproducing motions directly or using the signal as input for machine learning methods. This paper explores optimizing force signals to correspond better to the human intention behind the demonstrated signal. We compare different signal filtering methods and propose a peak detection method for dealing with first-contact deviations in the signal. The evaluation of these methods considers a specialized error criterion between the input and the human-intended signal. In addition, we analyze the influence of the critical parameters on the filtering methods. With respect to the error criterion, the quality of an individual motion could be increased by up to 20\%. The proposed contribution can improve the usability of robot programming and the interaction between humans and robots.
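A minimal version of the two ingredients, signal smoothing plus handling of the first-contact force spike, might look as follows; the box filter and median-plateau heuristic are illustrative stand-ins for the filters actually compared in the paper:

```python
import numpy as np

def moving_average(signal, window=5):
    """Smooth the force trace with a simple box kernel."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

def clip_first_contact(signal, threshold):
    """Replace the initial contact spike by the plateau level that follows it,
    approximating the force the demonstrator intended to apply."""
    out = signal.copy()
    peak = np.argmax(signal)
    plateau = np.median(signal[peak:])   # steady-state contact force
    out[out > threshold] = plateau
    return out

# Synthetic demonstration: intended 5 N contact force with a 12 N impact spike
# at first contact, plus sensor noise.
t = np.arange(100)
raw = np.where(t < 10, 0.0, 5.0).astype(float)
raw[10:13] = 12.0                        # first-contact overshoot
raw += np.random.default_rng(0).normal(0, 0.2, raw.shape)

cleaned = clip_first_contact(moving_average(raw), threshold=7.0)
print(cleaned.max())  # spike removed, ~5 N plateau preserved
```

The threshold and plateau heuristic here are assumptions; the paper evaluates candidate filters against a dedicated error criterion relative to the human-intended signal instead of fixing them by hand.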

[45] arXiv:2508.05294 (替换) [中文pdf, pdf, html, 其他]
标题: 面向具身代理人工智能:大模型和视觉语言模型驱动的机器人自主性和交互性的综述与分类
标题: Towards Embodied Agentic AI: Review and Classification of LLM- and VLM-Driven Robot Autonomy and Interaction
Sahar Salimpour, Lei Fu, Farhad Keramat, Leonardo Militano, Giovanni Toffetti, Harry Edelman, Jorge Peña Queralta
主题: 机器人技术 (cs.RO) ; 人工智能 (cs.AI) ; 机器学习 (cs.LG)

基础模型,包括大型语言模型(LLMs)和视觉语言模型(VLMs),最近使得机器人自主性和人机接口有了新的方法。 同时,视觉语言动作模型(VLAs)或大型行为模型(LBMs)正在提高机器人系统的灵巧性和能力。 这篇综述论文关注那些朝着代理应用和架构发展的作品。 这包括最初的尝试探索GPT风格的工具接口,以及更复杂的系统,在这些系统中AI代理是协调者、规划者、感知执行者或通用接口。 这样的代理架构使机器人能够对自然语言指令进行推理,调用API,规划任务序列,或在操作和诊断中提供帮助。 除了同行评审的研究外,由于该领域的快速发展,我们还强调并包括展示新兴趋势的社区驱动项目、ROS包和工业框架。 我们提出了一种分类模型集成方法的分类法,并展示了代理在当今文献中不同解决方案中所扮演角色的比较分析。

Foundation models, including large language models (LLMs) and vision-language models (VLMs), have recently enabled novel approaches to robot autonomy and human-robot interfaces. In parallel, vision-language-action models (VLAs) or large behavior models (LBMs) are increasing the dexterity and capabilities of robotic systems. This survey paper focuses on works advancing towards agentic applications and architectures. This includes initial efforts exploring GPT-style interfaces to tooling, as well as more complex systems where AI agents are coordinators, planners, perception actors, or generalist interfaces. Such agentic architectures allow robots to reason over natural language instructions, invoke APIs, plan task sequences, or assist in operations and diagnostics. In addition to peer-reviewed research, due to the fast-evolving nature of the field, we highlight and include community-driven projects, ROS packages, and industrial frameworks that show emerging trends. We propose a taxonomy for classifying model integration approaches and present a comparative analysis of the role that agents play in different solutions in today's literature.

[46] arXiv:2408.03551 (替换) [中文pdf, pdf, html, 其他]
标题: VPOcc:利用消失点进行3D语义占据预测
标题: VPOcc: Exploiting Vanishing Point for 3D Semantic Occupancy Prediction
Junsu Kim, Junhee Lee, Ukcheol Shin, Jean Oh, Kyungdon Joo
主题: 计算机视觉与模式识别 (cs.CV) ; 机器人技术 (cs.RO)

理解3D场景的语义和空间信息对于机器人和自动驾驶车辆的安全导航至关重要,有助于障碍物避让和准确轨迹规划。基于摄像头的3D语义占用预测,从2D图像推断完整的体素网格,在机器人视觉中越来越重要,因为它相比3D传感器更具资源效率。然而,该任务本质上存在2D-3D差异,由于透视投影,同一大小的3D空间中的物体在2D图像中显示的尺度不同,这取决于它们与相机的距离。为了解决这个问题,我们提出了一种新的框架称为VPOcc,它利用消失点(VP)在像素级别和特征级别减轻2D-3D差异。作为像素级别的解决方案,我们引入了一个VPZoomer模块,该模块通过基于VP的单应性变换抵消透视效应来扭曲图像。此外,作为特征级别的解决方案,我们提出了一个VP引导的交叉注意力(VPCA)模块,该模块执行透视感知的特征聚合,利用更适合3D空间的2D图像特征。最后,我们将从原始图像和扭曲图像中提取的两个特征体积通过空间体积融合(SVF)模块进行互补。通过有效地将VP纳入网络,我们的框架在SemanticKITTI和SSCBench-KITTI360数据集上的IoU和mIoU指标上取得了改进。更多细节请访问https://vision3d-lab.github.io/vpocc/。

Understanding 3D scenes semantically and spatially is crucial for the safe navigation of robots and autonomous vehicles, aiding obstacle avoidance and accurate trajectory planning. Camera-based 3D semantic occupancy prediction, which infers complete voxel grids from 2D images, is gaining importance in robot vision for its resource efficiency compared to 3D sensors. However, this task inherently suffers from a 2D-3D discrepancy, where objects of the same size in 3D space appear at different scales in a 2D image depending on their distance from the camera due to perspective projection. To tackle this issue, we propose a novel framework called VPOcc that leverages a vanishing point (VP) to mitigate the 2D-3D discrepancy at both the pixel and feature levels. As a pixel-level solution, we introduce a VPZoomer module, which warps images by counteracting the perspective effect using a VP-based homography transformation. In addition, as a feature-level solution, we propose a VP-guided cross-attention (VPCA) module that performs perspective-aware feature aggregation, utilizing 2D image features that are more suitable for 3D space. Lastly, we integrate two feature volumes extracted from the original and warped images to compensate for each other through a spatial volume fusion (SVF) module. By effectively incorporating VP into the network, our framework achieves improvements in both IoU and mIoU metrics on SemanticKITTI and SSCBench-KITTI360 datasets. Additional details are available at https://vision3d-lab.github.io/vpocc/.

[47] arXiv:2409.07563 (替换) [中文pdf, pdf, html, 其他]
标题: MPPI-通用:一种用于随机轨迹优化的CUDA库
标题: MPPI-Generic: A CUDA Library for Stochastic Trajectory Optimization
Bogdan Vlahov, Jason Gibson, Manan Gandhi, Evangelos A. Theodorou
评论: 添加了缺失的致谢部分
主题: 数学软件 (cs.MS) ; 分布式、并行与集群计算 (cs.DC) ; 机器人技术 (cs.RO) ; 系统与控制 (eess.SY)

本文介绍了一种新的C++/CUDA库,用于GPU加速的随机优化,称为MPPI-Generic。它提供了模型预测路径积分控制、管状模型预测路径积分控制和鲁棒模型预测路径积分控制的实现,并允许这些算法在许多现有的动力学模型和成本函数中使用。此外,研究人员可以按照我们的API定义创建自己的动力学模型或成本函数,而无需更改实际的模型预测路径积分控制代码。最后,我们在各种GPU上与其他流行的模型预测路径积分控制实现进行了计算性能比较,以展示我们的库所能提供的实时能力。库代码可在以下网址找到:https://acdslab.github.io/mppi-generic-website/ 。

This paper introduces a new C++/CUDA library for GPU-accelerated stochastic optimization called MPPI-Generic. It provides implementations of Model Predictive Path Integral control, Tube-Model Predictive Path Integral Control, and Robust Model Predictive Path Integral Control, and allows for these algorithms to be used across many pre-existing dynamics models and cost functions. Furthermore, researchers can create their own dynamics models or cost functions following our API definitions without needing to change the actual Model Predictive Path Integral Control code. Finally, we compare computational performance to other popular implementations of Model Predictive Path Integral Control over a variety of GPUs to show the real-time capabilities our library can allow for. Library code can be found at: https://acdslab.github.io/mppi-generic-website/ .
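The core MPPI update that such libraries accelerate, sampling perturbed control sequences, rolling out the dynamics, and exponentially weighting the perturbations by cost, can be sketched in NumPy; the real library runs these rollouts in CUDA, and the dynamics, cost, and parameters below are toy choices:

```python
import numpy as np

def mppi_step(x0, u_nominal, dynamics, cost, n_samples=256, sigma=0.5, lam=1.0,
              rng=np.random.default_rng(0)):
    """One MPPI update: sample noisy control sequences, roll them out,
    and importance-weight the perturbations by exponentiated cost."""
    horizon = len(u_nominal)
    noise = rng.normal(0.0, sigma, size=(n_samples, horizon))
    costs = np.zeros(n_samples)
    for k in range(n_samples):              # the library parallelizes this on GPU
        x = x0
        for t in range(horizon):
            x = dynamics(x, u_nominal[t] + noise[k, t])
            costs[k] += cost(x)
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    return u_nominal + w @ noise            # weighted average of perturbations

# Toy problem: point mass x' = x + 0.1*u, drive x from 0 toward 1.
dyn = lambda x, u: x + 0.1 * u
cst = lambda x: (x - 1.0) ** 2
u = np.zeros(10)
for _ in range(30):
    u = mppi_step(0.0, u, dyn, cst)
x = 0.0
for t in range(10):
    x = dyn(x, u[t])
print(x)  # close to 1.0
```

The library's API separation mirrors this sketch: users supply `dynamics` and `cost` implementations, while the sampling, rollout, and weighting machinery stays untouched.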

[48] arXiv:2503.24381 (replaced) [Chinese pdf, pdf, html, other]
Title: UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving
Yuping Wang, Xiangyu Huang, Xiaokang Sun, Mingxuan Yan, Shuo Xing, Zhengzhong Tu, Jiachen Li
Comments: IEEE/CVF International Conference on Computer Vision (ICCV 2025); project website: https://uniocc.github.io/
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Robotics (cs.RO)


We introduce UniOcc, a comprehensive, unified benchmark and toolkit for occupancy forecasting (i.e., predicting future occupancy based on historical information) and occupancy prediction (i.e., predicting current-frame occupancy from camera images). UniOcc unifies data from multiple real-world datasets (i.e., nuScenes, Waymo) and high-fidelity driving simulators (i.e., CARLA, OpenCOOD), providing 2D/3D occupancy labels and annotating innovative per-voxel flows. Unlike existing studies that rely on suboptimal pseudo-labels for evaluation, UniOcc incorporates novel evaluation metrics that do not depend on ground-truth labels, enabling robust assessment of additional aspects of occupancy quality. Through extensive experiments on state-of-the-art models, we demonstrate that large-scale, diverse training data and explicit flow information significantly enhance occupancy prediction and forecasting performance. Our data and code are available at https://uniocc.github.io/.
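To make the per-voxel flow annotation concrete: a flow field assigns each occupied voxel a displacement, so an occupancy grid can be rolled forward one step by scattering occupied voxels along their flow vectors. The toy sketch below (nearest-voxel scatter; not UniOcc's actual tooling, all conventions assumed) illustrates the idea:

```python
import numpy as np

def advect_occupancy(occ, flow, dt=1.0):
    """Push occupied voxels forward by their per-voxel flow.

    occ:  (X, Y, Z) boolean occupancy grid.
    flow: (X, Y, Z, 3) displacement in voxels per step.
    Returns the advected boolean grid (nearest-voxel scatter).
    """
    out = np.zeros_like(occ)
    idx = np.argwhere(occ)                   # coordinates of occupied voxels
    if idx.size == 0:
        return out
    dest = np.rint(idx + dt * flow[occ]).astype(int)
    # discard voxels pushed outside the grid bounds
    keep = np.all((dest >= 0) & (dest < np.array(occ.shape)), axis=1)
    dest = dest[keep]
    out[dest[:, 0], dest[:, 1], dest[:, 2]] = True
    return out
```

A scatter like this loses mass when multiple sources map to one destination; real forecasting pipelines typically use learned or probabilistic warping instead.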

[49] arXiv:2508.09560 (replaced) [Chinese pdf, pdf, html, other]
Title: WeatherPrompt: Multi-modality Representation Learning for All-Weather Drone Visual Geo-Localization
Jiahao Wen, Hang Yu, Zhedong Zheng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)


Visual geo-localization for drones faces critical degradation under weather perturbations, e.g., rain and fog, where existing methods struggle with two inherent limitations: 1) heavy reliance on a limited set of weather categories, which constrains generalization, and 2) suboptimal disentanglement of entangled scene-weather features through pseudo weather categories. We present WeatherPrompt, a multi-modality learning paradigm that establishes weather-invariant representations by fusing image embeddings with text context. Our framework introduces two key contributions. First, a training-free weather reasoning mechanism that employs off-the-shelf large multi-modality models to synthesize multi-weather textual descriptions through human-like reasoning; it improves scalability to unseen or complex weather and can reflect different weather intensities. Second, to better disentangle scene and weather features, we propose a multi-modality framework with a dynamic gating mechanism, driven by the text embedding, that adaptively reweights and fuses visual features across modalities. The framework is further optimized by cross-modal objectives, including image-text contrastive learning and image-text matching, which map the same scene under different weather conditions closer together in the representation space. Extensive experiments validate that, under diverse weather conditions, our method achieves competitive recall rates compared to state-of-the-art drone geo-localization methods. Notably, it improves Recall@1 by +13.37% under night conditions and by +18.69% under fog and snow conditions.
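The text-driven dynamic gating described above can be pictured as the text embedding producing per-channel gates in (0, 1) that decide how much of each visual stream enters the fused representation. A minimal sketch under assumed shapes and names (not the authors' implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(feat_a, feat_b, txt_emb, W_gate, b_gate):
    """Text-conditioned gating over two visual feature streams.

    feat_a, feat_b: (d,) visual features from two modalities/branches.
    txt_emb: (k,) text embedding; W_gate: (d, k) learned projection.
    The gate is a convex per-channel mixing weight derived from the text.
    """
    gate = sigmoid(W_gate @ txt_emb + b_gate)        # (d,) values in (0, 1)
    return gate * feat_a + (1.0 - gate) * feat_b     # channel-wise fusion
```

Because the gate depends on the text description of the current weather, the same network can shift weight between streams as conditions change, rather than committing to fixed fusion weights per weather category.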
