Title: Empirical Analysis on the State of Transfer Learning for Small Data Text Classification Tasks Using Contextual Embeddings

Authors: Felipe Carvalho, Cristiano Castro

Abstract: Recent developments in the field of Natural Language Processing (NLP) have shown that deep transformer-based language model architectures trained on large corpora of unlabeled data can transfer knowledge to downstream tasks efficiently through fine-tuning. In particular, BERT and XLNet have shown impressive results, achieving state-of-the-art performance on many tasks through this process. This is partially due to these models' ability to create better representations of text in the form of contextual embeddings. However, little has been explored in the literature about the robustness of these models' transfer learning process in a small-data scenario, and little effort has been devoted to analysing how the fine-tuning of the two models behaves with different amounts of training data available. This paper addresses these questions through an empirical evaluation of the models on several datasets when fine-tuned on progressively smaller fractions of training data for the task of text classification. It is shown that, in most cases, BERT and XLNet perform well with small data and can achieve good performance with very few labels available. Results obtained with varying fractions of training data indicate that few examples are needed to fine-tune the models and that, although training with more labeled data has a positive effect, using only a subset of the data is already enough to achieve performance comparable to traditional non-deep-learning models trained with substantially more data. It is also noticeable how quickly the transfer learning curves of these methods saturate, reinforcing their ability to perform well with less data available.
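
The experimental setup described in the abstract (fine-tuning a pretrained transformer for text classification on progressively smaller fractions of labeled data) can be approximated with the sketch below. It is a minimal illustration, not the authors' code: the paper does not specify its tooling, so the Hugging Face transformers and datasets libraries are assumed, IMDB stands in for the evaluated datasets, and all hyperparameters are illustrative.

```python
# Hedged sketch: fine-tune BERT on a small fraction of a labeled
# text classification dataset, then evaluate on the full test split.
# Assumptions (not from the paper): Hugging Face libraries, IMDB dataset,
# bert-base-uncased, and the hyperparameters below.
from datasets import load_dataset
from transformers import (BertTokenizerFast, BertForSequenceClassification,
                          Trainer, TrainingArguments)

FRACTION = 0.01  # fraction of labeled training examples kept for fine-tuning

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                       num_labels=2)

dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)

# Keep only a random subset of the training split to simulate small data.
small_train = (dataset["train"].shuffle(seed=42)
               .select(range(int(FRACTION * len(dataset["train"]))))
               .map(tokenize, batched=True))
test = dataset["test"].map(tokenize, batched=True)

args = TrainingArguments(output_dir="bert-small-data",
                         num_train_epochs=3,
                         per_device_train_batch_size=16,
                         evaluation_strategy="epoch")

trainer = Trainer(model=model, args=args,
                  train_dataset=small_train, eval_dataset=test)
trainer.train()
print(trainer.evaluate())
```

Repeating the run while sweeping FRACTION (e.g. 0.01, 0.05, 0.1, 0.5, 1.0) traces a transfer learning curve of the kind the paper reports, showing how quickly performance saturates as more labeled data is added.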

Key-words: Small data, text classification, NLP, contextual embeddings, representation learning, deep learning

Pages: 7

DOI code: 10.21528/CBIC2019-82

PDF file: CBIC2019-82.pdf

BibTeX file: CBIC2019-82.bib