Linguistic Diversity
Summary
Chenai Chair, founder of My Data Rights (Africa) and leader of the Masakhane African Languages Hub, critiques the current rush for linguistic diversity in AI, particularly in the Majority World. She argues that this push is driven by vested interests seeking new markets, that it poses political and social risks, and that it sidelines long-standing community efforts. Chair raises concerns about weak data safeguards, the potential for increased surveillance, and the lack of cultural nuance in Big Tech-led AI datasets, which often miss the social embeddedness of less-resourced languages. Drawing on Masakhane's experience developing community-led linguistic datasets, she advocates a collaborative, bottom-up approach that centers communities, builds on existing initiatives, and shares resources.
Key takeaway
If you are an AI product manager or policymaker developing language technologies for diverse populations, critically evaluate the motivations behind linguistic data collection. Prioritize community-led initiatives and ensure robust data protection and governance frameworks are in place, respecting local social norms and the right of communities to refuse digitization. Your efforts should build on existing work and foster collaborative resource sharing rather than launch new, extractive data collection drives.
Key insights
The rush for linguistic diversity in AI must prioritize community-led, collaborative approaches over market-driven data extraction.
Principles
- Follow the money to understand investment drivers.
- Language digitization requires robust safeguards and governance.
- Community consent is paramount, including the right to refuse.
Method
Masakhane's community-led approach involves grassroots participation in dataset design: it brings together linguists, sociologists, and community speakers, and it respects social norms, including the right to refuse digitization.
In practice
- Engage state-recognized language councils.
- Collaborate with diverse stakeholders beyond technologists.
- Build on existing datasets and pool resources.
Topics
- Linguistic Diversity
- AI Governance
- Community-led AI
- African Languages
- Data Ethics
Best for: AI Ethicist, Policy Maker, AI Product Manager
Editorial summary, takeaway, and curation by AIssential. Original article published by AI Now Institute.