CapText: Large Language Model-based Caption Generation From Image Context and Description

Ghosh, Shinjini; Anupam, Sagnik

arXiv:2306.00301 (cs)

[Submitted on 1 Jun 2023 (v1), last revised 6 Jun 2023 (this version, v2)]

Title: CapText: Large Language Model-based Caption Generation From Image Context and Description

Authors: Shinjini Ghosh, Sagnik Anupam

Abstract: While deep-learning models have been shown to perform well on image-to-text datasets, they are difficult to use in practice for captioning images. This is because captions tend to be context-dependent and offer information that complements the image, whereas models tend to produce descriptions of the image's visual features. Prior research in caption generation has explored models that generate captions when given images alongside their respective descriptions or contexts. We propose and evaluate a new approach that leverages existing large language models to generate captions from textual descriptions and context alone, without ever processing the image directly. We demonstrate that, after fine-tuning, our approach outperforms current state-of-the-art image-text alignment models such as OSCAR-VinVL on this task, as measured by the CIDEr metric.
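The abstract reports its comparison on CIDEr but does not say which implementation or variant was used. As a rough illustration of what the metric rewards, here is a minimal Python sketch of plain CIDEr (the original formulation, not CIDEr-D): each caption becomes TF-IDF-weighted n-gram vectors for n = 1..4, the candidate is compared with every reference by cosine similarity, and the similarities are averaged over references and then over n-gram orders with uniform weights. The function names `ngrams`, `tokenize`, and `cider`, and the lowercase whitespace tokenizer, are assumptions for illustration only. Evaluation toolkits commonly report CIDEr-D, which adds a length penalty and count clipping and scales scores by 10, so numbers from this sketch are not comparable to the paper's reported scores.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def tokenize(text):
    # Assumption: lowercase + whitespace split; published scores usually use a PTB-style tokenizer.
    return text.lower().split()


def cider(candidates, references, max_n=4):
    """Plain CIDEr (not CIDEr-D).

    candidates: one generated caption per image.
    references: one non-empty list of reference captions per image, aligned with candidates.
    Returns (corpus mean score, list of per-image scores).
    """
    num_images = len(references)
    ref_tokens = [[tokenize(r) for r in refs] for refs in references]

    # Document frequency: number of images whose reference set contains each n-gram.
    df = Counter()
    for refs in ref_tokens:
        seen = set()
        for r in refs:
            for n in range(1, max_n + 1):
                seen.update(ngrams(r, n))
        df.update(seen)

    def tfidf(tokens, n):
        # TF-IDF weights for one caption's n-grams; n-grams absent from all references use df = 1.
        counts = ngrams(tokens, n)
        total = sum(counts.values()) or 1
        return {g: (c / total) * math.log(num_images / max(1, df[g]))
                for g, c in counts.items()}

    def cosine(a, b):
        # Cosine similarity of two sparse vectors; defined as 0 if either vector is all zeros.
        dot = sum(w * b.get(g, 0.0) for g, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    per_image = []
    for cand, refs in zip(candidates, ref_tokens):
        cand_tokens = tokenize(cand)
        per_n = []
        for n in range(1, max_n + 1):
            cand_vec = tfidf(cand_tokens, n)
            sims = [cosine(cand_vec, tfidf(r, n)) for r in refs]
            per_n.append(sum(sims) / len(sims))
        # Uniform weights w_n = 1 / max_n across n-gram orders.
        per_image.append(sum(per_n) / max_n)
    return sum(per_image) / len(per_image), per_image
```

Because IDF is computed from the reference sets, n-grams shared by every image's references get zero weight, and a single-image corpus therefore scores 0. N-gram orders longer than a caption contribute 0, so very short captions cannot reach the maximum score of 1 in this unscaled form.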

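Comments:	Updated 6 Jun 2023: corrected a typographical error in the abstract
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2306.00301 [cs.LG]
	(or arXiv:2306.00301v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2306.00301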

Submission history

From: Sagnik Anupam
[v1] Thu, 1 Jun 2023 02:40:44 UTC (1,729 KB)
[v2] Tue, 6 Jun 2023 03:41:05 UTC (1,729 KB)
