Latent Domain Prompt Learning for Vision-Language Models

Li, Zhixing; Khoee, Arsham Gholamzadeh; Yu, Yinan

arXiv:2511.00067v1 (cs)

[Submitted on 29 Oct 2025]

Authors: Zhixing Li, Arsham Gholamzadeh Khoee, Yinan Yu

Abstract: The objective of domain generalization (DG) is to make models robust to domain shift. DG is crucial for deploying vision-language models (VLMs) in real-world applications, yet most existing methods rely on domain labels that may be unavailable and are often ambiguous. We instead study the DG setting in which models must generalize well without access to explicit domain labels. Our key idea is to represent an unseen target domain as a combination of latent domains automatically discovered from training data, enabling the model to adaptively transfer knowledge across domains. To realize this, we perform latent domain clustering on image features and fuse domain-specific text features based on the similarity between the input image and each latent domain. Experiments on four benchmarks show that this strategy yields consistent gains over VLM-based baselines and provides new insights into improving robustness under domain shift.
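
The abstract describes two steps: clustering image features into latent domains, then fusing domain-specific text features according to how similar the input image is to each latent domain. The following is a minimal sketch of that pipeline, not the authors' implementation. It assumes k-means as the clustering method, cosine similarity to cluster centroids as the image-to-domain similarity, and a temperature-scaled softmax as the fusion weights; the abstract specifies none of these. All function names, the `temperature` parameter, and the array shapes are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def kmeans(features, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice of clustering method).

    features: (n, d) image features.
    Returns centroids (k, d) and the assignments (n,) computed in the
    last iteration's assignment step.
    """
    rng = np.random.default_rng(seed)
    init = rng.choice(len(features), size=k, replace=False)
    centroids = features[init].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every feature to every centroid.
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = features[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, assign


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


def fuse_text_features(image_feat, centroids, domain_text_feats, temperature=0.1):
    """Blend per-domain text features by image-to-domain similarity.

    image_feat:        (d,)      feature of one input image
    centroids:         (k, d)    latent-domain centroids
    domain_text_feats: (k, c, d) one text feature per (domain, class)
    Returns the fusion weights (k,) and fused class features (c, d).
    """
    sims = l2_normalize(centroids) @ l2_normalize(image_feat)  # cosine, (k,)
    weights = softmax(sims / temperature)                      # sums to 1
    fused = np.einsum("k,kcd->cd", weights, domain_text_feats)
    return weights, l2_normalize(fused)


def classify(image_feat, fused_text_feats):
    """Pick the class whose fused text feature best matches the image."""
    logits = fused_text_feats @ l2_normalize(image_feat)       # (c,)
    return int(logits.argmax()), logits
```

Under these assumptions, an image from an unseen domain is never assigned to a single latent domain. It receives a soft mixture over all of them, which is one way to read "a combination of latent domains" in the abstract.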

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2511.00067 [cs.LG]
	(or arXiv:2511.00067v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2511.00067

Submission history

From: Zhixing Li
[v1] Wed, 29 Oct 2025 08:09:07 UTC (337 KB)
