LaVieID: Local Autoregressive Diffusion Transformers for Identity-Preserving Video Creation

Song, Wenhui; Li, Hanhui; Huang, Jiehui; Hu, Panwen; Cheng, Yuhao; Chen, Long; Yan, Yiqiang; Liang, Xiaodan

arXiv:2508.07603 (cs)

[Submitted on 11 Aug 2025]

Title: LaVieID: Local Autoregressive Diffusion Transformers for Identity-Preserving Video Creation

Authors:Wenhui Song, Hanhui Li, Jiehui Huang, Panwen Hu, Yuhao Cheng, Long Chen, Yiqiang Yan, Xiaodan Liang

Abstract: In this paper, we present LaVieID, a novel \underline{l}ocal \underline{a}utoregressive \underline{vi}d\underline{e}o diffusion framework designed to tackle the challenging \underline{id}entity-preserving text-to-video task. The key idea of LaVieID is to mitigate the loss of identity information inherent in the stochastic global generation process of diffusion transformers (DiTs) from both spatial and temporal perspectives. Specifically, unlike the global and unstructured modeling of facial latent states in existing DiTs, LaVieID introduces a local router to explicitly represent latent states by weighted combinations of fine-grained local facial structures. This alleviates undesirable feature interference and encourages DiTs to capture distinctive facial characteristics. Furthermore, a temporal autoregressive module is integrated into LaVieID to refine denoised latent tokens before video decoding. This module divides latent tokens temporally into chunks, exploiting their long-range temporal dependencies to predict biases for rectifying tokens, thereby significantly enhancing inter-frame identity consistency. Consequently, LaVieID can generate high-fidelity personalized videos and achieve state-of-the-art performance. Our code and models are available at https://github.com/ssugarwh/LaVieID.
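
The two mechanisms named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (their code is at the repository linked above), and every name in it is hypothetical: `route_token`, `refine_chunks`, and `mean_pull_bias` are invented here. It assumes the router weights are a softmax over dot-product similarities, and it replaces the learned bias predictor with a simple pull toward the mean of earlier frames. It shows only the shape of the two ideas: a latent token expressed as a weighted combination of local region features, and chunk-by-chunk bias correction along the time axis.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(token, region_feats):
    """Express one latent token as a weighted combination of local region features.

    Assumption: weights come from a softmax over dot-product similarity;
    the paper's router may compute them differently.
    """
    scores = [sum(t * r for t, r in zip(token, feat)) for feat in region_feats]
    weights = softmax(scores)
    mixed = [
        sum(w * feat[d] for w, feat in zip(weights, region_feats))
        for d in range(len(token))
    ]
    return mixed, weights


def mean_pull_bias(history, chunk, strength=0.5):
    """Toy stand-in for a learned bias predictor (hypothetical).

    Returns a bias that moves the chunk mean part of the way toward
    the mean of all previously refined frames.
    """
    dim = len(chunk[0])
    if not history:
        return [0.0] * dim
    hist_mean = [sum(f[d] for f in history) / len(history) for d in range(dim)]
    chunk_mean = [sum(f[d] for f in chunk) / len(chunk) for d in range(dim)]
    return [strength * (h - c) for h, c in zip(hist_mean, chunk_mean)]


def refine_chunks(frames, chunk_size, predict_bias):
    """Split per-frame latents into temporal chunks and rectify each in order.

    Each chunk receives a bias predicted from the already refined chunks,
    which is what makes the procedure autoregressive over time.
    """
    refined = []
    for start in range(0, len(frames), chunk_size):
        chunk = frames[start:start + chunk_size]
        bias = predict_bias(refined, chunk)
        refined.extend([[x + b for x, b in zip(frame, bias)] for frame in chunk])
    return refined
```

For example, `refine_chunks(frames, 2, mean_pull_bias)` leaves the first chunk of two frames unchanged, because there is no history yet, and shifts each later chunk toward the running mean of the frames refined before it.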

Comments:	Accepted by ACM MM 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2508.07603 [cs.CV]
	(or arXiv:2508.07603v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2508.07603

Submission history

From: Hanhui Li Dr.
[v1] Mon, 11 Aug 2025 04:13:32 UTC (28,103 KB)
