Towards transferable speech emotion representation

Towards transferable speech emotion representation: on loss functions for cross-lingual latent representations

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Documents

Fulltext
Submitted manuscript, 798 KB, PDF document

Sneha Das
Nicole Nadine Lønfeldt
Anne Katrine Pagsberg
Line H. Clemmensen

In recent years, speech emotion recognition (SER) has been used in wide-ranging applications, from healthcare to the commercial sector. In addition to signal processing approaches, methods for SER now also use deep learning techniques, which provide transfer learning possibilities. However, generalizing over languages, corpora and recording conditions is still an open challenge. In this work, we address this gap by exploring loss functions that aid in transferability, specifically to non-tonal languages. We propose a variational autoencoder (VAE) with KL annealing and a semi-supervised VAE to obtain more consistent latent embedding distributions across data sets. To ensure transferability, the distribution of the latent embedding should be similar across non-tonal languages (data sets). We start by presenting a low-complexity SER method based on a denoising autoencoder (DAE), which achieves an unweighted classification accuracy of over 52.09% for four-class emotion classification. This performance is comparable to that of similar baseline methods. Following this, we employ a VAE, the semi-supervised VAE and the VAE with KL annealing to obtain a more regularized latent space. We show that while the DAE has the highest classification accuracy among the methods, the semi-supervised VAE has a comparable classification accuracy and a more consistent latent embedding distribution over data sets.
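As a rough illustration of what KL annealing means for a VAE loss, the sketch below scales the KL divergence term by a weight that grows from 0 to 1 during training. The abstract does not state the schedule, the reconstruction loss or any hyperparameters used in the paper, so the linear warm-up, the standard normal prior and the names `kl_weight`, `gaussian_kl`, `annealed_vae_loss` and `warmup_steps` are all assumptions made for this example, not the authors' implementation.

```python
import math


def kl_weight(step, warmup_steps):
    """Assumed linear schedule: weight rises from 0 to 1 over warmup_steps, then stays at 1."""
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, max(0.0, step / warmup_steps))


def gaussian_kl(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def annealed_vae_loss(recon_error, mu, log_var, step, warmup_steps):
    """Reconstruction error plus the KL term scaled by the current annealing weight."""
    return recon_error + kl_weight(step, warmup_steps) * gaussian_kl(mu, log_var)
```

Early in training the weight is close to 0, so the model behaves much like a plain autoencoder; as the weight approaches 1, the latent distribution is pulled increasingly towards the prior.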

Original language	English
Title	ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing
Publisher	IEEE
Publication date	2022
Pages	6452-6456
ISBN (Electronic)	9781665405409
DOI	https://doi.org/10.1109/ICASSP43922.2022.9746450
Status	Published - 2022
Event	47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Virtual, Online, Singapore Duration: 23 May 2022 → 27 May 2022

Conference

Conference	47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
Country	Singapore
City	Virtual, Online
Period	23/05/2022 → 27/05/2022
Sponsor	Chinese and Oriental Languages Information Processing Society (COLPIS), Singapore Exhibition and Convention Bureau, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), The Institute of Electrical and Electronics Engineers Signal Processing Society

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	2022-May
ISSN	1520-6149

Bibliographical note

ID: 324664969