Sentiment analysis for software engineering: How far can zero-shot learning (ZSL) go?

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Data Science & Analytics · Depth: Expert, extended

Summary

A study investigated how well zero-shot learning (ZSL) performs for sentiment analysis in software engineering, where annotated datasets are scarce. The researchers evaluated four families of ZSL techniques (embedding-based, natural language inference (NLI)-based, task-aware representation of sentences (TARS)-based, and generative-based) across seven publicly available software engineering datasets, including API reviews, GitHub comments, and Jira issues. They assessed the impact of three label configurations (original, expert-curated, and LLM-generated) and compared ZSL performance against state-of-the-art fine-tuned transformer-based models. The findings indicate that ZSL techniques, particularly embedding-based models such as E_M9 combined with expert-curated labels, can achieve macro-F1 scores comparable to or exceeding those of fine-tuned models. Error analysis showed that subjectivity in annotation and "polar facts" (factual statements that imply a sentiment without expressing one explicitly) were the primary causes of misclassification, especially for neutral sentiment.

Key takeaway

AI engineers building sentiment analysis tools for software engineering should consider ZSL with pre-trained embedding-based models such as E_M9 and expert-curated labels. This approach can perform competitively with fine-tuned models while significantly reducing the need for costly, context-specific annotated datasets. Neutral sentiment remains hard to classify, often because of annotation subjectivity and "polar facts," so it may require careful post-processing or targeted refinement.
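As a rough illustration of the embedding-based idea, the sketch below embeds a text and a short description of each label, then picks the label whose description is most similar by cosine similarity. Everything named here is an assumption for illustration: `toy_embed` is a bag-of-words stand-in rather than a real sentence encoder (and is not the E_M9 model), and the `LABELS` descriptions are hypothetical, not the expert-curated labels from the study.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in embedding (assumption): a bag-of-words count vector.

    A real system would call a pre-trained sentence encoder here.
    """
    return dict(Counter(re.findall(r"[a-z']+", text.lower())))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def zero_shot_classify(
    text: str,
    label_descriptions: Dict[str, str],
    embed: Callable[[str], Vector] = toy_embed,
) -> str:
    """Return the label whose description is closest to the text.

    No labeled training examples are used, only the label descriptions.
    """
    text_vec = embed(text)
    scores = {
        label: cosine(text_vec, embed(description))
        for label, description in label_descriptions.items()
    }
    return max(scores, key=scores.get)


# Hypothetical label descriptions, invented for this example.
LABELS = {
    "positive": "great thanks works love helpful nice",
    "negative": "broken fails crash terrible bug hate",
    "neutral": "the method returns a value see the documentation",
}

print(zero_shot_classify("Thanks, this patch works great", LABELS))
```

Because `embed` is a parameter, the toy function can be swapped for a real encoder without changing the classification logic; the wording of the label descriptions is the part the study's label configurations vary.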

Key insights

Zero-shot learning offers a viable solution for sentiment analysis in software engineering, mitigating annotated data scarcity.

Method

The study empirically evaluated four ZSL techniques (embedding, NLI, TARS, generative) with varied label configurations (original, expert-curated, LLM-generated) on seven software engineering datasets, comparing macro-F1 scores against fine-tuned transformers.
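Macro-F1, the metric used for the comparison, is the unweighted mean of the per-class F1 scores, so a minority class such as negative counts as much as the majority class. The following is a minimal sketch of that computation; the function name and the example labels are illustrative and do not come from the study.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the class never
        # appears in either the gold labels or the predictions.
        denominator = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Illustrative data only: one negative comment is misread as neutral.
gold = ["positive", "negative", "neutral", "neutral"]
pred = ["positive", "neutral", "neutral", "neutral"]
score = macro_f1(gold, pred)
print(round(score, 3))
```

In this example accuracy is 0.75, but the macro-F1 is lower because the negative class is missed entirely and contributes an F1 of zero to the average.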

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.