Auditory Representation Effective for Estimating Vocal Tract Information

Irino, Toshio; Doan, Shintaro

doi:10.1109/APSIPAASC58517.2023.10317258


arXiv:2306.01522 (eess)

[Submitted on 2 Jun 2023 (v1), last revised 14 Sep 2023 (this version, v2)]

Title: Auditory Representation Effective for Estimating Vocal Tract Information

Authors: Toshio Irino, Shintaro Doan

Abstract: We can estimate the size of speakers from their speech sounds alone. To explain this observation, we previously proposed an auditory computational theory, the Stabilised Wavelet-Mellin Transform (SWMT), which segregates information about the size and shape of the vocal tract and about glottal vibration. It has been shown that the auditory representation, or excitation pattern (EP), combined with a weighting function based on the SWMT, termed the "SSI weight," can account for the psychometric functions of size perception. In this study, we investigated whether the EP with the SSI weight can accurately estimate vocal tract lengths (VTLs) measured by magnetic resonance imaging (MRI) in male and female subjects. We found that using the SSI weight significantly improved VTL estimation. Furthermore, the estimation errors for the EP with the SSI weight were significantly smaller than those for commonly used spectra derived from the Fourier transform, Mel filterbank, and WORLD vocoder. We also show that the SSI weight can be easily introduced into these spectra to improve their performance.
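The abstract's closing claim, that a weight can be introduced into conventional spectra, can be pictured as a per-frequency multiplication. The sketch below is only an illustration under that assumption. The abstract does not give the SSI weight formula, so `ramp_weight`, its `h_max` parameter, and the example values are hypothetical placeholders rather than the weighting defined in the paper.

```python
def ramp_weight(freqs_hz, f0_hz, h_max=5.0):
    """Hypothetical placeholder weight (NOT the paper's SSI weight).

    Rises linearly with frequency and saturates at 1.0 once the
    frequency reaches h_max times the fundamental frequency f0_hz.
    """
    return [min(f / (h_max * f0_hz), 1.0) for f in freqs_hz]


def apply_weight(spectrum, weights):
    """Multiply each spectral value by the weight at the same frequency."""
    if len(spectrum) != len(weights):
        raise ValueError("spectrum and weights must have the same length")
    return [s * w for s, w in zip(spectrum, weights)]


# Made-up example: three frequency bins and a 100 Hz fundamental.
freqs_hz = [100.0, 500.0, 1000.0]
spectrum = [1.0, 2.0, 3.0]
weights = ramp_weight(freqs_hz, f0_hz=100.0)
weighted = apply_weight(spectrum, weights)
```

Because the weight is applied bin by bin, the same step would work for any spectrum sampled on a known frequency axis, which is one plausible reading of "easily introduced."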

Comments:	This manuscript is a revised version following its acceptance for publication in Proc. APSIPA ASC 2023 on 25 August 2023
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2306.01522 [eess.AS]
	(or arXiv:2306.01522v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2306.01522
Journal reference:	Proc. APSIPA ASC 2023
Related DOI:	https://doi.org/10.1109/APSIPAASC58517.2023.10317258

Submission history

From: Toshio Irino
[v1] Fri, 2 Jun 2023 13:15:48 UTC (4,809 KB)
[v2] Thu, 14 Sep 2023 05:04:35 UTC (4,878 KB)
