A global, taxon-stratified, high-resolution sampling-effort dataset from GBIF for bias-aware ecological modelling

May 18, 2026

Ahmed El-Gabbas

Abstract

Introduction and Aim: Spatiotemporal and taxonomic sampling bias in biodiversity occurrence data poses critical challenges for robust ecological inference, species distribution models (SDMs), and conservation planning. Despite the exponential growth in global biodiversity records over recent decades, these biases persist. This study converts raw occurrence records from the Global Biodiversity Information Facility (GBIF) into global, publicly available, taxon-stratified, and temporally resolved sampling-effort rasters using a reproducible workflow, providing transparent and standardised measures of observation count and species richness to support bias-aware ecological analyses.
Main Variables Included: Two complementary raster variables: observation count and species richness, each provided across major taxonomic groups and their descendant levels (e.g., classes, orders, families).
Time Coverage: Annual and cumulative rasters span 1980–2025.
Spatial Coverage: Global; four spatial resolutions (~1, 5, 10, and 20 km).
Taxa: Nine major taxonomic groups: Amphibia, Arachnida, Aves (birds), Fungi, Insecta, Mammalia, Mollusca, Reptilia, and Tracheophyta (vascular plants), with descendant-level outputs.
Applications: Based on ~3 billion records for >730,000 species, this study provides annual and cumulative global rasters quantifying observation count and species richness at four resolutions, stratified by nine taxonomic groups and their descendants. At 1 km resolution, 95% of records occupy merely 0.33% of Earth's surface (0.93% of land), whilst the remaining 5% of records extend across only a further 1.77% (3.88% of land), leaving approximately 98% of Earth's surface (95% of land) unsampled. This extreme concentration persists across all taxonomic groups, underscoring the need for taxon-specific bias correction. Annual data enable exploration of long-term trends in data mobilisation and sampling effort. These rasters enable bias correction in presence-only SDMs, including MaxEnt bias files, target-group backgrounds, and model-based approaches. Beyond SDMs, they can inform macroecological synthesis, biodiversity monitoring, and systematic conservation planning by identifying spatial and temporal knowledge gaps. All data and code are openly available under FAIR principles, promoting transparent and reproducible biodiversity science.
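
To make the bias-correction use case more concrete, the following is a minimal Python sketch of effort-weighted background sampling, the idea behind target-group backgrounds and bias files. It is not the study's workflow: the nested list stands in for an observation-count raster, the values are made up, and the function name sample_background is hypothetical. Real analyses would read the rasters with a geospatial library and handle NoData cells.

```python
import random

def sample_background(n_obs, n_points, seed=0):
    """Draw background cells with probability proportional to observation count.

    `n_obs` is a nested list standing in for an observation-count raster
    (an assumption for illustration; real rasters need a geospatial reader).
    """
    # Keep only cells that contain at least one record.
    cells = [(r, c)
             for r, row in enumerate(n_obs)
             for c, value in enumerate(row)
             if value > 0]
    weights = [n_obs[r][c] for r, c in cells]
    rng = random.Random(seed)
    # Sampling with replacement; weights do not need to sum to one.
    return rng.choices(cells, weights=weights, k=n_points)

# Toy 3 x 3 grid of observation counts (made-up values).
toy_n_obs = [
    [0, 2, 0],
    [50, 0, 1],
    [0, 0, 7],
]
background = sample_background(toy_n_obs, n_points=1000)
```

Because background points are drawn where observers have been active, the background carries roughly the same spatial bias as the presence records, which is what allows a presence-only model to discount it.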

Publication

Diversity and Distributions 32, no. 5: e70205.

Examples of sampling effort datasets

This study provides global, taxon-stratified rasters of sampling effort (observation counts and species richness) derived from ~3 billion GBIF records. These rasters are available for nine major taxonomic groups and their descendants, at four spatial resolutions (≈1, 5, 10, and 20 km), and for annual as well as cumulative time periods (1980–2025).
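
When combining annual rasters over a custom period, the two variables behave differently: observation counts add up across years, whereas species richness does not, because a species recorded in several years must be counted only once. The Python sketch below illustrates this for a single grid cell. The records are hypothetical and the helper annual_summary is an illustrative name; how the published cumulative rasters were built is documented in the companion repository, not here.

```python
# Hypothetical records for a single grid cell: (year, species) pairs.
records = [
    (2021, "Parus major"),
    (2021, "Parus major"),
    (2021, "Erithacus rubecula"),
    (2022, "Parus major"),
    (2022, "Turdus merula"),
]

def annual_summary(records):
    """Return {year: (n_obs, n_sp)} for one cell."""
    by_year = {}
    for year, species in records:
        by_year.setdefault(year, []).append(species)
    # n_obs counts every record; n_sp counts distinct species only.
    return {year: (len(sp), len(set(sp))) for year, sp in by_year.items()}

annual = annual_summary(records)

# Observation counts are additive across years.
cumulative_n_obs = sum(n_obs for n_obs, _ in annual.values())

# Richness needs the union of species; summing annual values overcounts.
cumulative_n_sp = len({species for _, species in records})
sum_of_annual_n_sp = sum(n_sp for _, n_sp in annual.values())
```

In this toy cell the summed annual richness is 4 while the true cumulative richness is 3, since Parus major appears in both years.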

Seven examples are provided to illustrate different views and applications of the dataset:

Example 1 — Total recorded bird species richness (Aves, n_sp, 10 km)
Example 2 — Total bird observation count (Aves, n_obs, 10 km)
Example 3 — Cumulative recorded species richness across all groups (n_sp, 5 km)
Example 4 — Cumulative observation count across all groups (n_obs, 5 km)
Example 5 — Bird observations for the period 2015–2024 (Aves, n_obs, 5 km)
Example 6 — Descendant-level exploration (Insecta example, n_obs, 10 km)
Example 7 — Percentage-based spatial coverage analysis (Aves / All groups, n_obs, 20 km)
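
The kind of percentage-based coverage summary named in Example 7 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code behind the published figures: the flat list stands in for the cells of an n_obs raster, the values are made up, the function name share_of_cells_for_records is hypothetical, and cells are treated as equal-area.

```python
def share_of_cells_for_records(counts, record_share=0.95):
    """Smallest fraction of all cells that together hold `record_share` of records."""
    total = sum(counts)
    if total == 0:
        raise ValueError("grid contains no records")
    target = record_share * total
    running = 0
    # Visit the most heavily sampled cells first.
    for n_cells, value in enumerate(sorted(counts, reverse=True), start=1):
        running += value
        if running >= target:
            return n_cells / len(counts)
    return 1.0  # fallback in case floating-point rounding prevents an early return

# Toy grid of 100 cells (made-up counts); 95 cells hold no records.
toy_counts = [900, 60, 25, 10, 5] + [0] * 95

core_share = share_of_cells_for_records(toy_counts)  # cells holding 95% of records
unsampled_share = sum(1 for v in toy_counts if v == 0) / len(toy_counts)
```

In this toy grid, 2% of cells hold 95% of the records and 95% of cells are empty. The abstract's 1 km figures combine in the same way: 0.33% + 1.77% = 2.10% of Earth's surface holds any record, leaving approximately 98% unsampled.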

For full documentation, data access functions [get_sampling_effort()], repository structure, and the complete reproducible workflow, see the companion GitHub repository.

Last updated on May 28, 2026
