CoST: Efficient Collaborative Perception From Unified Spatiotemporal Perspective

Tang, Zongheng; Liu, Yi; Sun, Yifan; Gao, Yulu; Chen, Jinyu; Xu, Runsheng; Liu, Si

arXiv:2508.00359v1 (cs)

[Submitted on 1 Aug 2025]

Abstract: Collaborative perception shares information among different agents and helps solve problems that individual agents may face, e.g., occlusions and a small sensing range. Prior methods usually separate multi-agent fusion and multi-time fusion into two consecutive steps. In contrast, this paper proposes an efficient collaborative perception method that simultaneously aggregates the observations from different agents (space) and different times into a unified spatio-temporal space. The unified spatio-temporal space brings two benefits, i.e., efficient feature transmission and superior feature fusion. 1) Efficient feature transmission: each static object yields a single observation in the spatio-temporal space and thus needs to be transmitted only once (whereas prior methods re-transmit all object features multiple times). 2) Superior feature fusion: merging multi-agent and multi-time fusion into a unified spatio-temporal aggregation provides a more holistic perspective, thereby enhancing perception performance in challenging scenarios. Consequently, our Collaborative perception with Spatio-temporal Transformer (CoST) improves both efficiency and accuracy. Notably, CoST is not tied to any specific method and is compatible with most previous methods, enhancing their accuracy while reducing transmission bandwidth.
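The transmission benefit in point 1) can be made concrete with a toy count. The sketch below is a hypothetical illustration, not CoST's implementation: it assumes each object contributes one feature unit per transmission, identifies an observation by a made-up `(object_id, cell)` key on a global BEV grid, and assumes a moving object needs a fresh feature whenever its cell changes (the abstract only describes the static case). The names `baseline_cost`, `unified_cost`, `Cell`, and `Frame` are illustrative only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical types for this toy model (not from the paper).
Cell = Tuple[int, int]   # global BEV grid cell (x, y)
Frame = Dict[int, Cell]  # object_id -> cell the object occupies in this frame


def baseline_cost(frames: List[Frame]) -> int:
    """Two-step pipeline as characterized in the abstract: every object's
    feature is re-transmitted at every frame."""
    return sum(len(frame) for frame in frames)


def unified_cost(frames: List[Frame]) -> int:
    """Toy unified spatio-temporal space: an observation already sent, keyed
    by (object_id, cell), is not sent again. A static object keeps the same
    cell, so it is transmitted once; a moving object (assumption) is re-sent
    whenever its cell changes."""
    sent: Set[Tuple[int, Cell]] = set()
    cost = 0
    for frame in frames:
        for obj_id, cell in frame.items():
            key = (obj_id, cell)
            if key not in sent:
                sent.add(key)
                cost += 1
    return cost


# Object 1 is a parked car; object 2 moves one cell per frame.
frames = [
    {1: (10, 4), 2: (0, 0)},
    {1: (10, 4), 2: (1, 0)},
    {1: (10, 4), 2: (2, 0)},
]
print(baseline_cost(frames), unified_cost(frames))  # → 6 4
```

In this toy model the saving is S × (T − 1) feature units for S static objects over T frames, so it grows with both the window length and the share of static objects; real feature maps, pose errors, and partially static scenes would make the actual bandwidth figures different.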

Comments:	ICCV25 (Highlight)
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2508.00359 [cs.CV]
	(or arXiv:2508.00359v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2508.00359

Submission history

From: Zongheng Tang
[v1] Fri, 1 Aug 2025 06:45:12 UTC (2,666 KB)
