JobArabi: An Arabic Corpus and Analysis of Job Announcements from Social Media
Summary
JobArabi is a new, large-scale Arabic corpus of 20,528 job announcements gathered from public posts on X between January 2024 and October 2025. The dataset captures nearly two years of employment discourse across Arabic-speaking online communities. The corpus was compiled using a linguistically informed query framework built on 21 Arabic keyword families that cover gendered, plural, formal, and dialectal recruitment language. Each post carries metadata such as timestamps, engagement indicators, and geolocation, which enables temporal and regional analysis. Quantitative analysis of JobArabi revealed sociolinguistic patterns in online recruitment, including the persistence of gendered hiring language, regional differences in occupational demand, and the emotional framing of recruitment messages. The corpus, along with its documentation and collection scripts, will be publicly released to support research in Arabic NLP, computational social science, and digital labor studies.
Key takeaway
Arabic NLP researchers and computational social scientists building language resources should consider using JobArabi once it is released. The corpus supports analysis of real-world Arabic employment discourse, including gendered hiring language and regional differences in occupational demand. Working with it may improve models' handling of recruitment language and inform studies of labor market communication.
Key insights
JobArabi provides a unique Arabic social media corpus for analyzing labor market communication and sociolinguistic patterns.
Principles
- Sociolinguistic patterns exist in online recruitment.
- Gendered language persists in hiring discourse.
- Social media offers rich labor market data.
Method
The corpus was compiled using a linguistically informed query framework with 21 Arabic keyword families, capturing gendered, plural, formal, and dialectal expressions.
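A minimal sketch of how a keyword-family query framework of this kind could look. The summary does not list the 21 families, so the family names and Arabic terms below are illustrative placeholders, and `build_query` and `matched_families` are hypothetical helpers rather than the authors' collection scripts. The query string assumes a search syntax that supports `OR` and a `lang:` filter.

```python
import re

# Hypothetical keyword families: placeholders only, not the paper's 21 families.
KEYWORD_FAMILIES = {
    "job_singular": ["وظيفة", "وظيفه"],
    "job_plural": ["وظائف"],
    "wanted_masculine": ["مطلوب"],
    "wanted_feminine": ["مطلوبة", "مطلوبه"],
    "hiring_formal": ["توظيف"],
    "work_dialectal": ["شغل"],
}

# Runs of Arabic letters (U+0621 to U+064A); punctuation and Latin text are skipped.
ARABIC_TOKEN = re.compile(r"[\u0621-\u064A]+")


def build_query(families, lang="ar"):
    """Join every keyword variant with OR and restrict results to one language."""
    terms = []
    for variants in families.values():
        for term in variants:
            if term not in terms:  # keep order, drop duplicates
                terms.append(term)
    return "(" + " OR ".join(terms) + f") lang:{lang}"


def matched_families(text, families):
    """Return the sorted names of families with a variant that is a whole token in text."""
    tokens = set(ARABIC_TOKEN.findall(text))
    return sorted(
        name
        for name, variants in families.items()
        if any(variant in tokens for variant in variants)
    )


if __name__ == "__main__":
    print(build_query(KEYWORD_FAMILIES))
    print(matched_families("مطلوبة معلمة في الرياض", KEYWORD_FAMILIES))
```

Matching on whole tokens rather than substrings keeps the masculine form from firing on the feminine one, since the former is a prefix of the latter. The sketch does not strip prefixes such as the definite article, so real use would likely need light normalization first.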
In practice
- Analyze regional occupational demand.
- Study linguistic change in recruitment.
- Research emotional framing in job posts.
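As a rough illustration of the first use case, regional occupational demand can be tallied by grouping posts by region and counting occupations. The records, the field names `region` and `occupation`, and both helper functions are hypothetical: the summary only states that the corpus includes timestamps, engagement indicators, and geolocation, so an occupation label is assumed here to come from a separate annotation step.

```python
from collections import Counter, defaultdict

# Toy records for illustration only; these are not drawn from JobArabi.
posts = [
    {"region": "Riyadh", "occupation": "teacher"},
    {"region": "Riyadh", "occupation": "teacher"},
    {"region": "Riyadh", "occupation": "driver"},
    {"region": "Cairo", "occupation": "accountant"},
    {"region": "Cairo", "occupation": "driver"},
    {"region": "Cairo", "occupation": "accountant"},
]


def demand_by_region(records):
    """Map each region to a Counter of occupation frequencies."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record["region"]][record["occupation"]] += 1
    return counts


def top_occupations(records, region, n=3):
    """Return the n most frequent occupations for one region as (name, count) pairs."""
    return demand_by_region(records)[region].most_common(n)


if __name__ == "__main__":
    for region in ("Riyadh", "Cairo"):
        print(region, top_occupations(posts, region, n=1))
```

The same grouping pattern would apply to the other use cases by swapping the key, for example counting posts per month from the timestamp to follow change over time.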
Topics
- Arabic NLP
- Social Media Analysis
- Job Announcements
- Corpus Development
- Sociolinguistics
- Labor Market Communication
Best for: NLP Engineer, AI Scientist, Research Scientist, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.