AI-Generated Image Detection: An Empirical Study and Future Research Directions

Tasnim, Nusrat; Uddin, Kutub; Malik, Khalid Mahmood

arXiv:2511.02791 (cs)

[Submitted on 4 Nov 2025]

Authors: Nusrat Tasnim, Kutub Uddin, Khalid Mahmood Malik

Abstract: AI-generated media, particularly deepfakes, now raise significant challenges for multimedia forensics, misinformation detection, and biometric systems, eroding public trust in the legal system and driving a significant increase in fraud and social engineering attacks. Although several forensic methods have been proposed, they suffer from three critical gaps: (i) the use of non-standardized benchmarks built from GAN- or diffusion-generated images, (ii) inconsistent training protocols (e.g., trained from scratch, frozen, or fine-tuned), and (iii) limited evaluation metrics that fail to capture generalization and explainability. These limitations hinder fair comparison, obscure true robustness, and restrict deployment in security-critical applications. This paper introduces a unified benchmarking framework for the systematic evaluation of forensic methods under controlled and reproducible conditions. We benchmark ten state-of-the-art (SoTA) forensic methods (trained from scratch, frozen, and fine-tuned) across seven publicly available datasets (GAN- and diffusion-generated) to perform extensive and systematic evaluations. We evaluate performance using multiple metrics, including accuracy, average precision, ROC-AUC, error rate, and class-wise sensitivity. We further analyze model interpretability using confidence curves and Grad-CAM heatmaps. Our evaluations reveal substantial variability in generalization: certain methods exhibit strong in-distribution performance but degraded cross-model transferability. This study aims to guide the research community toward a deeper understanding of the strengths and limitations of current forensic approaches and to inspire the development of more robust, generalizable, and explainable solutions.
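The abstract names the evaluation metrics but not their exact definitions. The following is a minimal sketch of how such metrics are commonly computed for a binary real/fake detector, written under stated assumptions rather than taken from the paper's code: label 1 means AI-generated, the detector outputs a score interpreted as P(fake), a fixed 0.5 decision threshold is used, "error rate" is taken to mean 1 - accuracy, and "class-wise sensitivity" is taken to mean per-class recall on real and on fake images. The names `evaluate` and `DetectionMetrics` are hypothetical. AP follows the step-wise definition sum_n (R_n - R_{n-1}) * P_n (the same one scikit-learn's `average_precision_score` uses), and ROC-AUC uses the pairwise Mann-Whitney formulation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DetectionMetrics:
    accuracy: float
    error_rate: float        # assumption: 1 - accuracy
    real_sensitivity: float  # assumption: recall on class 0 (real images)
    fake_sensitivity: float  # assumption: recall on class 1 (AI-generated images)
    average_precision: float
    roc_auc: float


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AP = sum_n (R_n - R_{n-1}) * P_n, stepping through distinct scores high to low."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AP is undefined without fake (label 1) samples")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(pairs):
        # Consume every sample tied at this score before computing P and R.
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random fake image outscores a random real one (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs both real and fake samples")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> DetectionMetrics:
    """Threshold-based and threshold-free metrics for a binary real/fake detector."""
    # Hypothetical decision rule: predict "fake" when the score exceeds the threshold.
    preds = [1 if s > threshold else 0 for s in scores]
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)

    def recall_of(cls: int) -> float:
        hits = [p == cls for p, y in zip(preds, labels) if y == cls]
        if not hits:
            raise ValueError(f"no samples with label {cls}")
        return sum(hits) / len(hits)

    return DetectionMetrics(
        accuracy=accuracy,
        error_rate=1.0 - accuracy,
        real_sensitivity=recall_of(0),
        fake_sensitivity=recall_of(1),
        average_precision=average_precision(labels, scores),
        roc_auc=roc_auc(labels, scores),
    )


if __name__ == "__main__":
    # Toy detector outputs: label 1 = AI-generated, score = predicted P(fake).
    m = evaluate([0, 0, 0, 1, 1, 1], [0.1, 0.4, 0.7, 0.35, 0.8, 0.9])
    print(m)
```

On this toy input, accuracy is 4/6 while ROC-AUC is 7/9 and AP is 13/15. Threshold-free metrics (AP, ROC-AUC) can stay relatively high while accuracy at a fixed threshold drops, which is one reason to report both kinds when comparing in-distribution and cross-generator results.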

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computer Science and Game Theory (cs.GT)
Cite as:	arXiv:2511.02791 [cs.CV]
	(or arXiv:2511.02791v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2511.02791

Submission history

From: Kutub Uddin
[v1] Tue, 4 Nov 2025 18:13:48 UTC (11,340 KB)
