Harnessing Textual Semantic Priors for Knowledge Transfer and Refinement in CLIP-Driven Continual Learning

He, Lingfeng; Cheng, De; Wang, Huaijie; Wang, Nannan

Computer Science > Computer Vision and Pattern Recognition

arXiv:2508.01579 (cs)

[Submitted on 3 Aug 2025]

Abstract: Continual learning (CL) aims to equip models with the ability to learn from a stream of tasks without forgetting previous knowledge. With the progress of vision-language models like Contrastive Language-Image Pre-training (CLIP), their promise for CL has attracted increasing attention due to their strong generalizability. However, the potential of rich textual semantic priors in CLIP in addressing the stability-plasticity dilemma remains underexplored. During backbone training, most approaches transfer past knowledge without considering semantic relevance, leading to interference from unrelated tasks that disrupts the balance between stability and plasticity. In addition, while text-based classifiers provide strong generalization, they suffer from limited plasticity due to the inherent modality gap in CLIP. Visual classifiers help bridge this gap, but their prototypes lack rich and precise semantics. To address these challenges, we propose Semantic-Enriched Continual Adaptation (SECA), a unified framework that harnesses the anti-forgetting and structured nature of textual priors to guide semantic-aware knowledge transfer in the backbone and reinforce the semantic structure of the visual classifier. Specifically, a Semantic-Guided Adaptive Knowledge Transfer (SG-AKT) module is proposed to assess new images' relevance to diverse historical visual knowledge via textual cues, and aggregate relevant knowledge in an instance-adaptive manner as distillation signals. Moreover, a Semantic-Enhanced Visual Prototype Refinement (SE-VPR) module is introduced to refine visual prototypes using inter-class semantic relations captured in class-wise textual embeddings. Extensive experiments on multiple benchmarks validate the effectiveness of our approach.
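
The abstract describes the two modules only in words, so the following is an illustrative sketch of the general ideas and not the authors' implementation. It assumes cosine similarity as the relevance measure, a softmax with a temperature to turn relevance into weights, and a blending factor `alpha` for prototype refinement. None of these choices are stated in the abstract, and the function names `aggregate_past_knowledge` and `refine_prototypes` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_past_knowledge(image_emb, task_text_embs, past_features, temperature=0.1):
    """SG-AKT-style idea (assumed form): score each past knowledge source by
    the image's similarity to that source's text embedding, then form a
    per-instance weighted average of past features as a distillation target."""
    weights = softmax([cosine(image_emb, t) for t in task_text_embs], temperature)
    dim = len(past_features[0])
    target = [sum(w * f[d] for w, f in zip(weights, past_features)) for d in range(dim)]
    return weights, target


def refine_prototypes(prototypes, class_text_embs, alpha=0.5, temperature=0.1):
    """SE-VPR-style idea (assumed form): blend each visual prototype with a
    mix of all prototypes, weighted by inter-class text-embedding similarity."""
    refined = []
    for i, proto in enumerate(prototypes):
        rel = softmax([cosine(class_text_embs[i], t) for t in class_text_embs], temperature)
        mixed = [sum(r * p[d] for r, p in zip(rel, prototypes)) for d in range(len(proto))]
        refined.append([(1 - alpha) * a + alpha * b for a, b in zip(proto, mixed)])
    return refined
```

In this sketch, an image whose embedding aligns with one task's text embedding draws its distillation target almost entirely from that task's features, and setting `alpha=0.0` in `refine_prototypes` leaves the visual prototypes unchanged.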

Comments:	Preprint
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2508.01579 [cs.CV]
	(or arXiv:2508.01579v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2508.01579

Submission history

From: Lingfeng He
[v1] Sun, 3 Aug 2025 04:09:00 UTC (28,861 KB)
