Machine learning and digital pragmatics: Which word category influences emoji use most?
Summary
A study investigated the use of machine learning (ML) to predict emoji usage in Arabic tweets, focusing on the influence of word categories. The researchers used Python to collect a corpus of 11,379 colloquial Arabic tweets from X.com, then filtered it down to 8,695 tweets for analysis. The tweets were classified into 14 numerically encoded categories, which served as labels. A preprocessing pipeline was established as an interpretable baseline for examining the relationship between lexical features and emoji categories. The MARBERT model was then fine-tuned to predict emoji from textual input, achieving an overall accuracy of 0.75. The results are promising but highlight the ongoing need to improve ML models, including MARBERT, particularly for low-resource and multidialectal languages such as Arabic.
Key takeaway
Research scientists developing natural language processing models for low-resource or multidialectal languages should consider fine-tuning existing models such as MARBERT, but should expect to invest in substantial dataset curation and model refinement to reach higher accuracy. Focus on capturing dialectal nuances and expanding lexical feature analysis to improve emoji prediction and broader language understanding.
Key insights
The MARBERT model shows promise in predicting emoji use in Arabic tweets, but it needs further refinement to handle dialectal nuances.
Principles
- Lexical features influence emoji use.
- Multidialectal languages pose ML challenges.
Method
A preprocessing pipeline classifies the 8,695 Arabic tweets into 14 categories. MARBERT is then fine-tuned to predict emoji use from textual input, and its performance is evaluated with precision, recall, and F1-score.
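The evaluation step described above can be illustrated with a minimal sketch that computes accuracy and per-class precision, recall, and F1 from true and predicted label ids. The function names and the toy labels are assumptions made for illustration; they are not taken from the study, and the study's own evaluation code is not described in this summary.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, labels):
    """Return {label: {"precision", "recall", "f1"}} for each label id."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    report = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def macro_f1(report):
    """Unweighted mean of per-class F1 scores."""
    return sum(m["f1"] for m in report.values()) / len(report)


# Toy data with 3 classes (hypothetical; the study uses 14 categories).
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]
report = per_class_metrics(y_true, y_pred, labels=[0, 1, 2])
```

Reporting per-class scores alongside overall accuracy matters when categories are imbalanced, since a single accuracy figure can hide weak performance on rare classes.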
In practice
- Use MARBERT for Arabic text analysis.
- Collect dialect-specific datasets.
- Classify text into word categories.
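As a rough illustration of the data-preparation side of these steps, the sketch below separates emoji from tweet text and turns category names into integer labels. Everything specific here is an assumption: the emoji-to-category mapping is hypothetical (the summary does not list the study's 14 categories), the regular expression covers only the main emoji blocks, and the choice to label a tweet by its first mapped emoji is made purely for the example.

```python
import re

# Rough emoji matcher covering the main pictograph blocks (an assumption,
# not an exhaustive definition of "emoji").
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# Variation selector that can trail an emoji such as the red heart.
VARIATION_SELECTOR_RE = re.compile("\uFE0F")

# Hypothetical emoji-to-category mapping, for illustration only.
EMOJI_CATEGORY = {
    "\U0001F602": "joy",      # face with tears of joy
    "\U0001F62D": "sadness",  # loudly crying face
    "\u2764": "love",         # red heart
}


def encode_labels(categories):
    """Assign a stable integer id to each category name (sorted order)."""
    return {name: idx for idx, name in enumerate(sorted(set(categories)))}


def split_tweet(tweet):
    """Return (text_without_emoji, list_of_emoji_found)."""
    emojis = EMOJI_RE.findall(tweet)
    text = EMOJI_RE.sub(" ", tweet)
    text = VARIATION_SELECTOR_RE.sub("", text)
    text = " ".join(text.split())  # collapse leftover whitespace
    return text, emojis


def build_examples(tweets, label_ids):
    """Pair each tweet's text with the label id of its first mapped emoji.

    Tweets with no text or no mapped emoji are skipped.
    """
    examples = []
    for tweet in tweets:
        text, emojis = split_tweet(tweet)
        category = next(
            (EMOJI_CATEGORY[e] for e in emojis if e in EMOJI_CATEGORY), None
        )
        if text and category is not None:
            examples.append((text, label_ids[category]))
    return examples


label_ids = encode_labels(EMOJI_CATEGORY.values())
examples = build_examples(
    ["يا سلام \U0001F602\U0001F602", "بدون رموز", "احبك \u2764\uFE0F"],
    label_ids,
)
```

The resulting (text, label id) pairs are the kind of input a sequence classifier such as a fine-tuned MARBERT would consume. A real pipeline would also need dialect-aware normalization, which this sketch omits.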
Topics
- Machine Learning
- Emoji Prediction
- Arabic Dialects
- MARBERT Model
- Natural Language Processing
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.