Can Contextual Biasing Remain Effective with Whisper and GPT-2?

Sun, Guangzhi; Zheng, Xianrui; Zhang, Chao; Woodland, Philip C.

arXiv:2306.01942 (cs)

[Submitted on 2 Jun 2023]

Authors: Guangzhi Sun, Xianrui Zheng, Chao Zhang, Philip C. Woodland

Abstract: End-to-end automatic speech recognition (ASR) and large language models, such as Whisper and GPT-2, have recently been scaled to use vast amounts of training data. Despite the large amount of training data, infrequent content words that occur in a particular task may still exhibit poor ASR performance, with contextual biasing a possible remedy. This paper investigates the effectiveness of neural contextual biasing for Whisper combined with GPT-2. Specifically, this paper proposes integrating an adapted tree-constrained pointer generator (TCPGen) component for Whisper and a dedicated training scheme to dynamically adjust the final output without modifying any Whisper model parameters. Experiments across three datasets show a considerable reduction in errors on biasing words with a biasing list of 1000 words. Contextual biasing was more effective when applied to domain-specific data and can boost the performance of Whisper and GPT-2 without losing their generality.
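The general idea of a tree-constrained pointer generator can be illustrated with a small sketch. This is not the authors' implementation: the function names, the nested-dict tree, the uniform pointer distribution, and the fixed `p_gen` value are all assumptions made for illustration. In a trained system the pointer distribution and the generation probability would come from learned components, while the base model's own parameters stay untouched. The sketch only shows two ingredients: a prefix tree over the tokenized biasing list that limits which tokens may be boosted at each step, and an interpolation between the base model's output distribution and the pointer distribution.

```python
from typing import Dict, List, Sequence


def build_prefix_tree(biasing_list: Sequence[Sequence[str]]) -> dict:
    """Build a nested-dict prefix tree over tokenized biasing words."""
    root: dict = {}
    for tokens in biasing_list:
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
    return root


def valid_next_tokens(tree: dict, prefix: Sequence[str]) -> List[str]:
    """Return the tokens that extend `prefix` along some branch of the tree."""
    node = tree
    for tok in prefix:
        if tok not in node:
            return []  # prefix has left the tree, so nothing is boosted
        node = node[tok]
    return sorted(node)


def interpolate(
    model_probs: Dict[str, float],
    pointer_probs: Dict[str, float],
    p_gen: float,
) -> Dict[str, float]:
    """Mix two distributions: (1 - p_gen) * model + p_gen * pointer."""
    vocab = set(model_probs) | set(pointer_probs)
    return {
        t: (1.0 - p_gen) * model_probs.get(t, 0.0)
        + p_gen * pointer_probs.get(t, 0.0)
        for t in vocab
    }


# Hypothetical two-word biasing list, already split into subword tokens.
tree = build_prefix_tree([["tur", "ner"], ["tur", "bine"]])
candidates = valid_next_tokens(tree, ["tur"])

# Placeholder pointer distribution: uniform over the valid tokens.
pointer = {t: 1.0 / len(candidates) for t in candidates}

# Placeholder base-model distribution and generation probability.
mixed = interpolate({"ner": 0.1, "key": 0.9}, pointer, p_gen=0.4)
```

Because the mixing weights sum to one and both inputs are distributions, the result is again a distribution; tokens outside the tree only ever receive the scaled-down base-model probability.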

Comments:	To appear in Interspeech 2023
Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2306.01942 [cs.CL]
	(or arXiv:2306.01942v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2306.01942

Submission history

From: Guangzhi Sun
[v1] Fri, 2 Jun 2023 22:56:01 UTC (617 KB)
