Could large language models estimate valence of words? A small ablation study

Authors: Frederico C. Jandre, Gabriel C. Motta-Ribeiro, João Vitor Assumpção da Silva

Abstract: Large language models (LLMs) have seen substantial development in recent years. Although trained on broad-range corpora, LLMs have been shown to display capabilities such as quantitative sentiment analysis without further fine-tuning. In this work, we performed a small ablation study to evaluate the performance of three off-the-shelf LLMs in the task of assigning hedonic valence ratings to words: GPT-3.5 in chat mode, and GPT-3 and Bloom in completion mode. The models were operated via their public APIs, using prompts engineered to request emojis and valence ratings on a 9-point scale for each of 140 words drawn from a large human-rated dataset. Prompts were designed to request the ratings as if given by an adult, with the modifiers "average" or "overly positive" employed to assess their effects on the results. All linear regressions between the LLM outputs and the human ratings had p-values < 0.001. The 95% confidence intervals of the slopes included 1.0 for the "adult" and "average adult" prompts, except for the model Bloom. These simulacra responded, albeit with limitations, to the valence of words and to the modifiers in the prompt.
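The abstract's headline result is a linear regression of LLM valence ratings against human ratings, checking whether the 95% confidence interval of the slope contains 1.0 (i.e., the model tracks human valence one-to-one). A minimal sketch of that check is below; the data points are hypothetical stand-ins, not values from the paper, and the t critical value is hard-coded for this sample size.

```python
import math

# t critical value for a 95% CI with n - 2 = 7 degrees of freedom
# (assumes the illustrative n = 9 sample below; the paper used 140 words)
T_CRIT_DF7 = 2.365

def slope_ci(x, y, t_crit):
    """Ordinary least-squares slope with a confidence interval.

    Returns (slope, lower bound, upper bound).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # residual sum of squares around the fitted line
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # standard error of the slope
    se = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope - t_crit * se, slope + t_crit * se

# Hypothetical example: human vs. LLM ratings on the 9-point valence scale
human = [1, 2, 3, 4, 5, 6, 7, 8, 9]
llm = [1.1, 2.0, 2.9, 4.2, 5.0, 5.8, 7.1, 8.0, 9.1]

slope, lo, hi = slope_ci(human, llm, T_CRIT_DF7)
print(f"slope = {slope:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

If the printed interval contains 1.0, the (hypothetical) model's ratings scale one-to-one with the human ratings, which is the criterion the abstract applies to the "adult" and "average adult" prompt conditions.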

Keywords: large language models, sentiment analysis, hedonic valence, ablation study, prompt engineering

Pages: 6

DOI: 10.21528/CBIC2023-148

PDF file: CBIC_2023_paper148.pdf

BibTeX file: CBIC_2023_148.bib