Optimizing Speech Emotion Recognition: Evaluating Combinations of Databases, Data Augmentation, and Feature Extraction Methods

Title: Optimizing Speech Emotion Recognition: Evaluating Combinations of Databases, Data Augmentation, and Feature Extraction Methods

Authors: Lara Toledo Cordeiro Ottoni, Jes de Jesus Fiais Cerqueira

Abstract: Speech emotion recognition is a challenging and essential task with numerous applications in human-computer interaction, healthcare, and entertainment. Achieving high accuracy, however, requires selecting a good combination of machine learning algorithms, databases, data augmentation techniques, and feature extraction methods. This paper discusses the difficulty of choosing among these combinations and proposes a methodology to address it. The proposed method evaluates the performance of various combinations of databases, data augmentation techniques, and feature extraction methods to determine the most effective approach for speech emotion recognition. The paper also presents a convolutional neural network that classifies seven emotional states: happiness, sadness, fear, anger, surprise, disgust, and neutral. The best combination found, with 94% accuracy, uses the combined RAVDESS and TESS databases, data augmentation with noise, stretch, and pitch, and MFCC to extract features from the audio.
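The combination search described in the abstract can be sketched as a simple grid enumeration. This is only an illustration under stated assumptions, not the authors' implementation: the names `enumerate_configs`, `augmentation_subsets`, and `best_config` are hypothetical, the candidate lists contain only the options named in the abstract, and the `evaluate` callable (which in practice would train and validate the convolutional neural network) is left to the caller.

```python
from itertools import combinations, product

# Options named in the abstract; the full paper may evaluate others.
DATABASES = ["RAVDESS", "TESS", "RAVDESS+TESS"]
AUGMENTATIONS = ["noise", "stretch", "pitch"]
FEATURES = ["MFCC"]  # only extractor named in the abstract; extend as needed


def augmentation_subsets(techniques):
    """Yield every subset of the techniques, from none to all of them."""
    for size in range(len(techniques) + 1):
        for subset in combinations(techniques, size):
            yield subset


def enumerate_configs(databases=DATABASES,
                      augmentations=AUGMENTATIONS,
                      features=FEATURES):
    """Return one dict per (database, augmentation subset, feature) choice."""
    subsets = list(augmentation_subsets(augmentations))
    return [
        {"database": db, "augmentation": aug, "features": feat}
        for db, aug, feat in product(databases, subsets, features)
    ]


def best_config(configs, evaluate):
    """Score each config with the caller-supplied evaluate(config) -> float
    (hypothetical: e.g. validation accuracy) and return the best pair."""
    scored = [(cfg, evaluate(cfg)) for cfg in configs]
    return max(scored, key=lambda pair: pair[1])
```

With the lists above, `enumerate_configs()` yields 3 databases × 8 augmentation subsets × 1 feature extractor = 24 candidate configurations. No accuracy values are implied by this sketch; they would come only from whatever `evaluate` function is supplied.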

Keywords: convolutional neural network, data augmentation, MFCC, RAVDESS, speech emotion recognition.

Pages: 8

DOI: 10.21528/CBIC2023-051

Paper (PDF): CBIC_2023_paper051.pdf

BibTeX file: CBIC_2023_051.bib