Title: Optimizing Speech Emotion Recognition: Evaluating Combinations of Databases, Data Augmentation, and Feature Extraction Methods
Authors: Lara Toledo Cordeiro Ottoni, Jes de Jesus Fiais Cerqueira
Abstract: Speech emotion recognition is a challenging and essential task with numerous applications in human-computer interaction, healthcare, and entertainment. However, achieving high accuracy is complicated by the need to select the best combination of machine learning algorithm, database, data augmentation technique, and feature extraction method. This paper discusses the difficulty of choosing an appropriate combination of these factors and proposes a methodology to address it: the performance of various combinations of databases, data augmentation techniques, and feature extraction methods is evaluated to determine the most effective approach for speech emotion recognition. The paper also presents a convolutional neural network that classifies speech into seven classes: happiness, sadness, fear, anger, surprise, disgust, and neutral. The results show that the best combination, with 94% accuracy, uses the combined RAVDESS and TESS databases, data augmentation with noise, time stretching, and pitch shifting, and MFCCs (mel-frequency cepstral coefficients) to extract features from the audio.
Keywords: convolutional neural network, data augmentation, MFCC, RAVDESS, speech emotion recognition.
Pages: 8
DOI: 10.21528/CBIC2023-051
Paper (PDF): CBIC_2023_paper051.pdf
BibTeX file: CBIC_2023_051.bib
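The feature pipeline summarized in the abstract (noise augmentation followed by MFCC extraction) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the record does not give their parameters, so the 20 dB SNR, 512-sample frames, 160-sample hop, Hann window, 26 mel filters, and 13 coefficients are illustrative assumptions, and the helper names (`add_white_noise`, `mel_filterbank`, `dct_ii_matrix`, `mfcc`) are hypothetical. Time-stretch and pitch-shift augmentation are omitted here; libraries such as librosa provide them as `librosa.effects.time_stretch` and `librosa.effects.pitch_shift`.

```python
import numpy as np

# All parameter values below are illustrative assumptions, not the paper's settings.

def add_white_noise(y, snr_db, rng):
    """Add white Gaussian noise scaled so the result has roughly the requested SNR (dB)."""
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return y + rng.normal(0.0, np.sqrt(noise_power), size=y.shape)

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale from 0 Hz to sr/2."""
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_ii_matrix(n_out, n_in):
    """Orthonormal DCT-II basis (rows are cosine basis vectors)."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    basis[0] /= np.sqrt(2.0)
    return basis

def mfcc(y, sr, n_mfcc=13, n_fft=512, hop=160, n_mels=26):
    """Return an (n_frames, n_mfcc) matrix of MFCCs for a mono signal."""
    if y.size < n_fft:  # pad so at least one full frame exists
        y = np.pad(y, (0, n_fft - y.size))
    n_frames = 1 + (y.size - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hanning(n_fft)                       # framing + windowing
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(sr, n_fft, n_mels).T  # mel filterbank energies
    log_mel = np.log(mel_energy + 1e-10)                      # log compression
    return log_mel @ dct_ii_matrix(n_mfcc, n_mels).T          # decorrelate with DCT-II

# Usage: augment a 1 s synthetic tone with noise, then extract MFCCs.
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
y = 0.5 * np.sin(2 * np.pi * 440.0 * t)
y_noisy = add_white_noise(y, snr_db=20.0, rng=rng)
feats = mfcc(y_noisy, sr)
print(feats.shape)  # (97, 13)
```

The MFCC steps follow the common recipe (framing, windowing, power spectrum, mel filterbank, log, DCT-II); some implementations also apply pre-emphasis or cepstral liftering, which this sketch leaves out. A (frames, coefficients) matrix like `feats` is the kind of 2-D input a CNN classifier can consume.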