Feb 17, 2026

Metadata Quality Patterns Across Community-Contributed Datasets in Sub-Saharan Africa

Data Quality

English | Technical Report | 2 author(s)

A systematic analysis of 500 community-contributed datasets identifying the most common metadata gaps and the practical interventions most effective at addressing them.

Authors: Datum Africa Research Unit, 2025 Volunteer Cohort

This technical report presents findings from a systematic analysis of 500 community-contributed datasets hosted on the datum.africa platform, conducted between August and December 2025 by the Datum Africa Research Unit and the 2025 Volunteer Cohort.

Methodology: Each dataset was evaluated against a seven-point metadata quality rubric: (1) title completeness and clarity, (2) description quality, (3) license clarity, (4) format documentation, (5) temporal coverage, (6) geographic scope, (7) collection methodology documentation. Datasets were scored 0–7 and classified as Complete (6–7), Partial (3–5), or Minimal (0–2).
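The rubric and classification bands described above can be sketched in a few lines of Python. This is only an illustration, not the Research Unit's actual scoring tool: it assumes each of the seven criteria is a simple pass/fail check worth one point (the report gives the 0 to 7 range but not the per-criterion scoring rule), and the names MetadataChecks, score, and classify are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class MetadataChecks:
    """One pass/fail flag per rubric criterion (assumed: one point each)."""
    title: bool                    # (1) title completeness and clarity
    description: bool              # (2) description quality
    license: bool                  # (3) license clarity
    format_docs: bool              # (4) format documentation
    temporal_coverage: bool        # (5) temporal coverage
    geographic_scope: bool         # (6) geographic scope
    collection_methodology: bool   # (7) collection methodology documentation


def score(checks: MetadataChecks) -> int:
    """Count how many of the seven criteria are satisfied (0 to 7)."""
    return sum(bool(getattr(checks, f.name)) for f in fields(checks))


def classify(total: int) -> str:
    """Map a 0 to 7 score onto the report's three bands."""
    if not 0 <= total <= 7:
        raise ValueError(f"score must be between 0 and 7, got {total}")
    if total >= 6:
        return "Complete"   # 6-7
    if total >= 3:
        return "Partial"    # 3-5
    return "Minimal"        # 0-2


# Hypothetical dataset: four of seven criteria met.
example = MetadataChecks(
    title=True,
    description=True,
    license=True,
    format_docs=False,
    temporal_coverage=True,
    geographic_scope=False,
    collection_methodology=False,
)
example_score = score(example)          # 4
example_band = classify(example_score)  # "Partial"
```

If the actual rubric awards partial credit per criterion, the boolean fields would need to become numeric, but the band thresholds would stay the same.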

Distribution of scores: 18% Complete, 45% Partial, 37% Minimal. The most commonly missing elements were collection methodology documentation (absent in 71% of datasets) and geographic scope (absent in 54%). License clarity was the highest-performing dimension, with license documentation present in 82% of datasets.
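For readers who prefer absolute numbers, the percentages can be converted to approximate dataset counts. The sketch below assumes every percentage is taken over the full sample of 500 datasets and ignores any rounding in the published figures, so the counts are indicative only; the variable and function names are illustrative.

```python
TOTAL_DATASETS = 500  # sample size stated in the report


def approx_count(percent: int) -> int:
    """Convert a whole-number percentage of the sample to a dataset count."""
    return TOTAL_DATASETS * percent // 100


# Classification bands (percentages as published).
band_percent = {"Complete": 18, "Partial": 45, "Minimal": 37}
band_counts = {band: approx_count(p) for band, p in band_percent.items()}
# {'Complete': 90, 'Partial': 225, 'Minimal': 185}

# Individual rubric dimensions mentioned in the text.
missing_methodology = approx_count(71)  # 355 datasets
missing_geography = approx_count(54)    # 270 datasets
license_present = approx_count(82)      # 410 datasets
```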

Patterns by contributor type: Government-contributed datasets scored significantly higher (mean: 5.1) than datasets contributed by civil society organizations (mean: 3.4) or by individual contributors (mean: 2.8). This gap reflects the presence or absence of dedicated documentation staff, not differences in the quality of the underlying data.

Patterns by language: Datasets documented in languages other than English had significantly lower completeness scores. The cause was not that contributors documented less carefully, but that the documentation infrastructure (field labels, help text, guides) was available only in English. This represents a structural barrier, not a capability gap.

Interventions: The report evaluates five targeted interventions tested during the 2025 Volunteer Cohort, including peer documentation review, template-based description writing, and taxonomy alignment assistance. All five interventions significantly improved scores. Template-based description writing had the highest impact-to-effort ratio. Full data and methodology notes are available through datum.africa.
