Language Accessibility in African Open Data: Barriers, Costs, and Interventions
A comparative study of metadata accessibility across 12 African languages, quantifying the discoverability gap and identifying the highest-impact interventions.
Authors: Datum Africa Research Unit
This research note presents findings from a comparative study of metadata accessibility across 12 African languages, conducted by the Datum Africa Research Unit in 2025–2026.
Context: Africa has over 2,000 spoken languages, and the continent's 54 countries collectively recognize hundreds of official and national languages. Yet the dominant open data platforms and standards used across the continent operate almost exclusively in English, French, Portuguese, and Arabic. The remaining languages, spoken by hundreds of millions of people, have no infrastructure for open data documentation.
Study design: We selected 12 languages representing different language families, geographic regions, and writing systems: Amharic, Hausa, Igbo, Kinyarwanda, Lingala, Malagasy, Shona, Somali, Swahili, Tigrinya, Wolof, and Yoruba. For each language, we assessed four things: whether any open data platform offers UI support in it, whether data documentation guides exist in it, how discoverable datasets documented in it are, and how large its community of active contributors is.
Key findings: None of the 12 languages had full UI support on any major open data platform. Two languages (Swahili and Amharic) had partial UI support on one platform each. None of the 12 had a published data documentation guide. Datasets documented in any of these languages were systematically undiscoverable via search, because the search infrastructure was not built to handle these languages' scripts or vocabulary.
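One way search can fail to handle a language's script is at the Unicode level: the same accented word can be stored as precomposed characters or as base letters plus combining marks, and a byte-for-byte comparison treats the two as different strings. The study does not say which failure modes it observed, so the sketch below illustrates only one plausible cause. The function name `normalize_for_search` is hypothetical and is not taken from any platform's code.

```python
import unicodedata


def normalize_for_search(text: str) -> str:
    """Hypothetical helper: put text into one canonical form before matching.

    NFC composes base letters and combining marks into single code points
    where possible; casefold() removes case differences.
    """
    return unicodedata.normalize("NFC", text).casefold()


# The same Yoruba word in two valid Unicode encodings.
composed = unicodedata.normalize("NFC", "Yorùbá")    # precomposed letters
decomposed = unicodedata.normalize("NFD", "Yorùbá")  # letters + combining marks

# A naive exact comparison treats the two encodings as different strings.
naive_match = composed == decomposed

# Normalizing both sides first makes the comparison succeed.
normalized_match = normalize_for_search(composed) == normalize_for_search(decomposed)

print("naive match:", naive_match)
print("normalized match:", normalized_match)
```

Normalization addresses only encoding mismatches. Vocabulary coverage, tokenization, and stemming for each language are separate problems that this sketch does not cover.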
We quantified the discoverability gap with a controlled experiment: each of 40 datasets was documented twice, once in English and once in one of the study languages, with the content otherwise identical. The English-documented versions were discoverable in 91% of search queries. The versions documented in a study language were discoverable in 12%, a gap of 79 percentage points.
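The arithmetic behind the gap can be sketched in a few lines of Python. The function names are hypothetical and are not part of the study's tooling. The only figures taken from the study are the 91% and 12% rates, and the per-query outcomes in the last example are invented purely to show how a rate is derived.

```python
from typing import Sequence, Tuple


def discoverability_rate(found: Sequence[bool]) -> float:
    """Percentage of search queries in which the dataset was found."""
    if not found:
        raise ValueError("at least one query outcome is required")
    return 100.0 * sum(found) / len(found)


def discoverability_gap(english_pct: float, other_pct: float) -> Tuple[float, float]:
    """Return (gap in percentage points, ratio of English rate to other rate)."""
    if other_pct <= 0:
        raise ValueError("other_pct must be positive to compute a ratio")
    return english_pct - other_pct, english_pct / other_pct


# Rates reported in the study.
gap_points, ratio = discoverability_gap(91, 12)
print(f"Gap: {gap_points} percentage points")
print(f"English versions were about {ratio:.1f}x as discoverable")

# Invented outcomes for four queries, only to show the rate calculation.
example_rate = discoverability_rate([True, False, True, True])
print(f"Example rate: {example_rate}%")
```

Reporting both the percentage-point gap and the ratio is a presentation choice: the first shows the absolute shortfall, the second shows how many times more often the English version surfaces.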
Recommendations: The report outlines a phased approach to reducing the language accessibility gap, focusing first on the languages with the largest contributor communities (Swahili, Hausa, Yoruba, Amharic) and the highest volume of undocumented datasets, before expanding to others.