A Robust TEO-Based Speech Segmentation Method For Automatic Speech Recognition

Title: A Robust TEO-Based Speech Segmentation Method For Automatic Speech Recognition

Authors: Peretta, Igor S.; Lima, Gerson F. M.; Tavares, Josimeire; Yamanaka, Keiji

Abstract: Based on the Teager Energy Operator (TEO), the “TEO-based method for Spoken Word Segmentation” (TSWS) is presented and compared with two widely used speech segmentation methods: “Classical”, which uses energy and zero-crossing rate computations, and “Bottom-up”, which is based on the concepts of adaptive level equalization, energy pulse detection, and endpoint ordering. The implemented Automatic Speech Recognition (ASR) system uses Mel-frequency Cepstral Coefficients (MFCC) as the parametric representation of the speech signal and a standard multilayer feed-forward network (MLP) as the recognizer. A database of 17 different words was used, with a total of 3,519 utterances from 69 different speakers. Two thirds of those utterances formed the training set for the MLP, and the remaining third formed the testing set. Tests were conducted with each of the three methods (TSWS, Classical, and Bottom-up) in the speech segmentation stage of the ASR system. With TSWS, the ASR system achieved a 99.0% success rate on generalization tests, against 98.6% for the Classical and Bottom-up methods. Afterwards, white Gaussian noise was artificially added to the ASR inputs to reach a signal-to-noise ratio of 15 dB. In the presence of this noise, the ASR success rates on generalization tests change to 96.5%, 93.6%, and 91.4% when using the TSWS, Classical, and Bottom-up methods, respectively.
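
The abstract relies on two signal-level ingredients: the Teager Energy Operator and additive white Gaussian noise at a fixed signal-to-noise ratio. The following is a minimal sketch of both, not the authors' TSWS implementation. It assumes the standard discrete-time form of the operator, Ψ[x(n)] = x(n)² − x(n−1)·x(n+1), and the usual definition of SNR as the ratio of average signal power to average noise power. The function names (`teager_energy`, `add_white_noise`, `measured_snr_db`) are illustrative and do not come from the paper.

```python
import math
import random


def teager_energy(x):
    """Standard discrete Teager Energy Operator.

    psi[n] = x[n]^2 - x[n-1] * x[n+1], evaluated for the interior samples
    only, so the result has len(x) - 2 values.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def add_white_noise(x, snr_db, rng=None):
    """Return x plus zero-mean white Gaussian noise at the requested SNR.

    Assumption: SNR is defined over the whole signal as
    10 * log10(mean signal power / mean noise power).
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    signal_power = sum(v * v for v in x) / len(x)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in x]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_energy = sum(v * v for v in clean)
    noise_energy = sum((b - a) ** 2 for a, b in zip(clean, noisy))
    return 10.0 * math.log10(signal_energy / noise_energy)
```

For a pure sinusoid x(n) = A·cos(ωn), this operator returns the constant A²·sin²(ω), so its output grows with both amplitude and frequency. Calling `add_white_noise(x, 15.0)` produces an input degraded to approximately the 15 dB condition described in the abstract, under the SNR definition assumed here.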

Keywords: Automatic Speech Recognition; Speech Segmentation; Teager Energy Operator; Mel-frequency Cepstral Coefficients; Artificial Neural Network; Multilayer Perceptron

Pages: 8

DOI: 10.21528/CBIC2011-17.1

Article (PDF): st_17.1.pdf

BibTeX file: st_17.1.bib