SPATIALGEN: Layout-guided 3D Indoor Scene Generation

Fang, Chuan; Li, Heng; Liang, Yixun; Zheng, Jia; Mao, Yongsen; Liu, Yuan; Tang, Rui; Zhou, Zihan; Tan, Ping

arXiv:2509.14981 (cs)

[Submitted on 18 Sep 2025]

Title: SPATIALGEN: Layout-guided 3D Indoor Scene Generation

Authors: Chuan Fang, Heng Li, Yixun Liang, Jia Zheng, Yongsen Mao, Yuan Liu, Rui Tang, Zihan Zhou, Ping Tan

Abstract: Creating high-fidelity 3D models of indoor environments is essential for applications in design, virtual reality, and robotics. However, manual 3D modeling remains time-consuming and labor-intensive. While recent advances in generative AI have enabled automated scene synthesis, existing methods often face challenges in balancing visual quality, diversity, semantic consistency, and user control. A major bottleneck is the lack of a large-scale, high-quality dataset tailored to this task. To address this gap, we introduce a comprehensive synthetic dataset, featuring 12,328 structured annotated scenes with 57,440 rooms, and 4.7M photorealistic 2D renderings. Leveraging this dataset, we present SpatialGen, a novel multi-view multi-modal diffusion model that generates realistic and semantically consistent 3D indoor scenes. Given a 3D layout and a reference image (derived from a text prompt), our model synthesizes appearance (color image), geometry (scene coordinate map), and semantics (semantic segmentation map) from arbitrary viewpoints, while preserving spatial consistency across modalities. SpatialGen consistently generates superior results to previous methods in our experiments. We are open-sourcing our data and models to empower the community and advance the field of indoor scene understanding and generation.
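
The abstract names a "scene coordinate map" as the geometry output but does not define its encoding. As a rough illustration only, the sketch below assumes the common convention in which each pixel stores the 3D world-space point it observes, obtained by back-projecting a z-depth value through a pinhole camera and applying a camera-to-world pose. The function name `scene_coordinate_map`, the intrinsics, and the toy depth and pose values are all hypothetical and are not taken from the paper or its released code.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def scene_coordinate_map(
    depth: Matrix,
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    cam_to_world: Matrix,
) -> List[List[List[float]]]:
    """Return an H x W grid of [X, Y, Z] world-space points, one per pixel.

    Assumptions (not from the paper): depth[v][u] is z-depth along the
    optical axis of a pinhole camera, and cam_to_world is a 4x4 rigid
    transform given in row-major order.
    """
    coords = []
    for v, row in enumerate(depth):
        out_row = []
        for u, z in enumerate(row):
            # Back-project pixel (u, v) into the camera frame.
            xc = (u - cx) / fx * z
            yc = (v - cy) / fy * z
            zc = z
            # Apply the camera-to-world transform to get scene coordinates.
            world = [
                cam_to_world[i][0] * xc
                + cam_to_world[i][1] * yc
                + cam_to_world[i][2] * zc
                + cam_to_world[i][3]
                for i in range(3)
            ]
            out_row.append(world)
        coords.append(out_row)
    return coords


# Hypothetical 2x2 depth map; the camera is shifted 1 unit along world X.
depth = [[2.0, 2.0], [2.0, 2.0]]
pose = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
scm = scene_coordinate_map(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5, cam_to_world=pose)
```

Under this convention, maps rendered from different viewpoints share one world frame, which is one way per-view geometry can be compared or fused across views; whether SpatialGen uses exactly this parameterization is not stated in the abstract.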

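Comments:	3D scene generation; diffusion model; scene reconstruction and understanding
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2509.14981 [cs.CV]
	(or arXiv:2509.14981v1 [cs.CV] for this version)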
	https://doi.org/10.48550/arXiv.2509.14981

Submission history

From: Chuan Fang
[v1] Thu, 18 Sep 2025 14:12:32 UTC (45,573 KB)
