AMRG: Extend Vision Language Models for Automatic Mammography Report Generation

Sung, Nak-Jun; Lee, Donghyun; Choi, Bo Hwa; Park, Chae Jung

arXiv:2508.09225 (eess)

[Submitted on 12 Aug 2025]

Authors: Nak-Jun Sung, Donghyun Lee, Bo Hwa Choi, Chae Jung Park

Abstract: Mammography report generation is a critical yet underexplored task in medical AI, characterized by challenges such as multiview image reasoning, high-resolution visual cues, and unstructured radiologic language. In this work, we introduce AMRG (Automatic Mammography Report Generation), the first end-to-end framework for generating narrative mammography reports using large vision-language models (VLMs). Building upon MedGemma-4B-it, a domain-specialized, instruction-tuned VLM, we employ a parameter-efficient fine-tuning (PEFT) strategy via Low-Rank Adaptation (LoRA), enabling lightweight adaptation with minimal computational overhead. We train and evaluate AMRG on DMID, a publicly available dataset of paired high-resolution mammograms and diagnostic reports. This work establishes the first reproducible benchmark for mammography report generation, addressing a longstanding gap in multimodal clinical AI. We systematically explore LoRA hyperparameter configurations and conduct comparative experiments across multiple VLM backbones, including both domain-specific and general-purpose models, under a unified tuning protocol. Our framework demonstrates strong performance across both language generation and clinical metrics, achieving a ROUGE-L score of 0.5691, METEOR of 0.6152, CIDEr of 0.5818, and BI-RADS accuracy of 0.5582. Qualitative analysis further highlights improved diagnostic consistency and reduced hallucinations. AMRG offers a scalable and adaptable foundation for radiology report generation and paves the way for future research in multimodal medical AI.
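
The abstract credits LoRA with "lightweight adaptation with minimal computational overhead". As a rough illustration of where that saving comes from, the sketch below applies the standard LoRA update, W + (alpha / r) * B A, to a single frozen weight matrix and counts the parameters that remain trainable. The matrix size, rank, and alpha are hypothetical values chosen for illustration; the abstract does not report AMRG's settings, and the function names are not from the paper.

```python
# Illustrative sketch of the standard LoRA update on one weight matrix.
# All sizes, the rank, and alpha below are hypothetical, not AMRG's settings.


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters LoRA trains for one d_out x d_in matrix.

    B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    return rank * (d_out + d_in)


def matmul(x, y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_effective_weight(w, b, a, alpha: float):
    """Return W + (alpha / r) * B @ A, where r is the number of rows of A.

    W stays frozen during fine-tuning; only A and B receive gradients.
    """
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Hypothetical projection layer: 2048 x 2048 with rank 8.
full_params = 2048 * 2048
lora_params = lora_trainable_params(2048, 2048, 8)
print(f"full: {full_params}, LoRA: {lora_params}, share: {lora_params / full_params:.2%}")
```

With these assumed sizes, the low-rank factors hold well under 1% of the parameters of the full matrix, which is the property the abstract relies on when it describes the adaptation as lightweight.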

Subjects:	Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2508.09225 [eess.IV]
	(or arXiv:2508.09225v1 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2508.09225

Submission history

From: Nak-Jun Sung Ph.D.
[v1] Tue, 12 Aug 2025 06:37:41 UTC (14,730 KB)
