Offer Categorization for Price Comparison Websites: Word Embedding Approaches


Authors: Rosa da Silva, Rodolpho; Fernandes, Eraldo; Motta, Eduardo; Akira, Eduardo; Guarino, Rodrigo; Alvim, Leandro

Abstract:
One key task for price comparison websites is to categorize offers collected from online stores. Classification accuracy affects search, recommendation, and website reputation. There are few studies on this topic in the literature, and none of them applies word embedding, a widely successful technique, or performs a detailed analysis of features. In this work, we compare two word embedding approaches with the traditional bag-of-words approach for the task of offer categorization. First, we employ an unsupervised approach in which the embedding is learned from millions of offers using the well-known word2vec tool. Second, we develop a supervised approach in which the embedding and the offer classifier together form a Convolutional Neural Network (CNN) and are learned jointly. Additionally, we analyze in detail how relevant several features are to offer categorization. We evaluate our models on a dataset of more than 11 million offers collected and manually annotated by the most popular Latin American price comparison website. In our experiments, the CNN model substantially outperforms the other models. We present detailed experimental results that highlight the contribution of different parts of the CNN model. Regarding feature engineering, we observe that all evaluated offer attributes contribute to improving classifier performance.
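To make the supervised approach more concrete, the following is a minimal forward-pass sketch of a CNN offer classifier in pure Python. It assumes a common text-CNN layout (embedding lookup, one convolution window width, ReLU, max-over-time pooling, softmax output); the layer sizes, the helper names `init_params` and `forward`, the padding scheme, and the pooling choice are all hypothetical and are not taken from the paper. Training, in which the embeddings and classifier weights would be updated jointly by backpropagation, is omitted.

```python
import math
import random

# Hypothetical hyperparameters; the paper's actual values are not stated here.
EMBED_DIM = 8
WINDOW = 3        # convolution window width, in tokens
NUM_FILTERS = 4
PAD = "<pad>"


def init_params(vocab, num_classes, seed=0):
    """Randomly initialize all parameters (hypothetical helper)."""
    rng = random.Random(seed)

    def rand_vec(n):
        return [rng.uniform(-0.1, 0.1) for _ in range(n)]

    return {
        # Embedding table: one trainable vector per word, part of the same
        # parameter set as the classifier so both could be trained jointly.
        "embed": {w: rand_vec(EMBED_DIM) for w in list(vocab) + [PAD]},
        # Each filter spans WINDOW consecutive embeddings, flattened.
        "conv_w": [rand_vec(WINDOW * EMBED_DIM) for _ in range(NUM_FILTERS)],
        "conv_b": [0.0] * NUM_FILTERS,
        "out_w": [rand_vec(NUM_FILTERS) for _ in range(num_classes)],
        "out_b": [0.0] * num_classes,
    }


def forward(params, tokens):
    """Return a probability per category for one tokenized offer."""
    embed = params["embed"]
    # Pad both sides so even a one-word offer yields at least one window.
    padded = [PAD] * (WINDOW // 2) + list(tokens) + [PAD] * (WINDOW // 2)
    if len(padded) < WINDOW:
        padded += [PAD] * (WINDOW - len(padded))
    # Unknown words fall back to the padding vector in this sketch.
    vectors = [embed.get(t, embed[PAD]) for t in padded]

    # Convolution: concatenate each window of embeddings, apply every filter, then ReLU.
    feature_maps = [[] for _ in range(NUM_FILTERS)]
    for start in range(len(vectors) - WINDOW + 1):
        window = [x for v in vectors[start:start + WINDOW] for x in v]
        for f in range(NUM_FILTERS):
            act = sum(w * x for w, x in zip(params["conv_w"][f], window))
            feature_maps[f].append(max(0.0, act + params["conv_b"][f]))

    # Max-over-time pooling: one value per filter, independent of offer length.
    pooled = [max(m) for m in feature_maps]

    # Linear output layer followed by a numerically stable softmax.
    logits = [sum(w * p for w, p in zip(ws, pooled)) + b
              for ws, b in zip(params["out_w"], params["out_b"])]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    vocab = ["smartphone", "samsung", "galaxy", "tv", "led"]
    params = init_params(vocab, num_classes=3)
    print(forward(params, "samsung galaxy smartphone".split()))
```

Because the embedding table lives in the same parameter set as the filters and output layer, gradient-based training would adapt the word vectors to the categorization task. In the unsupervised word2vec approach, by contrast, the vectors are learned separately from unlabeled offers and only then fed to a classifier.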

Keywords:
Word Embedding; Convolutional Neural Network; Machine Learning

Pages: 12

DOI: 10.21528/CBIC2017-31

Paper (PDF): cbic-paper-31.pdf

BibTeX file: cbic-paper-31.bib