Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling

Sanabria, Ramon; Klejch, Ondrej; Tang, Hao; Goldwater, Sharon

arXiv:2306.02153v1 (cs)

[Submitted on 3 Jun 2023]

Title: Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling

Authors: Ramon Sanabria, Ondrej Klejch, Hao Tang, Sharon Goldwater

Abstract: Acoustic word embeddings are typically created by training a pooling function using pairs of word-like units. For unsupervised systems, these are mined using k-nearest neighbor (KNN) search, which is slow. Recently, mean-pooled representations from a pre-trained self-supervised English model were suggested as a promising alternative, but their performance on target languages was not fully competitive. Here, we explore improvements to both approaches: we use continued pre-training to adapt the self-supervised model to the target language, and we use a multilingual phone recognizer (MPR) to mine phone n-gram pairs for training the pooling function. Evaluating on four languages, we show that both methods outperform a recent approach on word discrimination. Moreover, the MPR method is orders of magnitude faster than KNN, and is highly data efficient. We also show a small improvement from performing learned pooling on top of the continued pre-trained representations.
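
The mean-pooling baseline mentioned in the abstract can be illustrated with a minimal sketch. It assumes each speech segment is already a list of frame-level feature vectors (in the paper these would come from a self-supervised model; the toy numbers below are made up). The function names `mean_pool` and `cosine_distance` are hypothetical, and cosine distance is an assumed comparison metric that the abstract does not specify.

```python
import math
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level vectors over time into one fixed-size embedding."""
    if not frames:
        raise ValueError("segment has no frames")
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity (assumed metric for comparing embeddings)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Toy segments with different numbers of frames (values are illustrative only).
seg_a = [[1.0, 0.0], [3.0, 0.0]]
seg_b = [[2.0, 0.0], [2.0, 0.0], [2.0, 0.0]]
seg_c = [[0.0, 1.0], [0.0, 3.0]]

# Pooling maps each variable-length segment to a vector of the same size.
emb_a = mean_pool(seg_a)
emb_b = mean_pool(seg_b)
emb_c = mean_pool(seg_c)

dist_ab = cosine_distance(emb_a, emb_b)
dist_ac = cosine_distance(emb_a, emb_c)
```

Because pooling removes the time axis, segments of different durations become directly comparable. A learned pooling function, as explored in the paper, would replace the plain average with a trained model.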

Comments:	Accepted to Interspeech 2023
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2306.02153 [cs.CL]
	(or arXiv:2306.02153v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2306.02153

Submission history

From: Ramon Sanabria
[v1] Sat, 3 Jun 2023 16:44:21 UTC (479 KB)
