Taxonomy & GBIF

TRAPPER's Species table is keyed against the GBIF Backbone Taxonomy — the same global, stable taxonomy GBIF itself uses to reconcile millions of biodiversity records. Species.taxon_id stores the GBIF Backbone key (gbifSpeciesKey); everything that needs to talk about "this species" — AI provider category mappings, Camtrap DP exports, cross-deployment analytics — pivots around that key rather than a free-text species name.

Why a stable external key, not a name string

Species names are not a reliable join key on their own: synonyms, regional spelling variants, and taxonomic revisions (a species reclassified into a different genus) all break a name-string match silently. GBIF's Backbone assigns one stable numeric key per accepted taxon and tracks synonyms centrally — so two systems (TRAPPER and an AI model trained elsewhere) that both speak "GBIF key 2433433" are unambiguously talking about the same species, even if one of them still uses an older common name internally.
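As a rough illustration of the difference, the sketch below joins the same two record sets once by name and once by key. The records, the field name `scientificName`, and the key value `1001` are placeholders invented for this example (the key is not a real GBIF key); only `taxon_id`, `latin_name`, and `gbifSpeciesKey` are names used elsewhere on this page.

```python
# Placeholder data: one local species row and one AI model label that
# refer to the same taxon, but the model still uses an older genus name.
trapper_species = [{"taxon_id": 1001, "latin_name": "Puma concolor"}]
model_labels = [{"gbifSpeciesKey": 1001, "scientificName": "Felis concolor"}]


def join_by_name(species, labels):
    """Match labels to species on the lower-cased scientific name."""
    names = {s["latin_name"].lower() for s in species}
    return [lab for lab in labels if lab["scientificName"].lower() in names]


def join_by_key(species, labels):
    """Match labels to species on the shared numeric key."""
    keys = {s["taxon_id"] for s in species}
    return [lab for lab in labels if lab["gbifSpeciesKey"] in keys]


name_matches = join_by_name(trapper_species, model_labels)
key_matches = join_by_key(trapper_species, model_labels)
```

The name join finds nothing and raises no error, which is the silent failure described above; the key join still pairs the two records.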

Where the key shows up

  • Species.taxon_id: the local row's GBIF key, populated by importing the GBIF Backbone.
  • AI Provider categories JSON: each species label an AI model can predict carries a gbifSpeciesKey (sourced from trapper-schemas, the cross-service model catalog). sync_ai_models looks up the local Species row whose taxon_id equals that key and stamps the row's primary key back into the JSON as speciesId. This is the mechanism that turns "the model predicted GBIF key X" into "this is Species row 42 in your database". See Data model: AI provider models.
  • Camtrap DP export: the Camtrap DP standard's observations table has a scientificName field. TRAPPER's exporter fills it from the same Species row, so the GBIF key stored on that row is implicitly what anchors an exported observation to a globally recognized taxon.
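The key-to-row stamping step for AI provider categories could look roughly like the sketch below. It is not TRAPPER's actual sync_ai_models code: the function name `stamp_species_ids`, the flat list-of-objects layout of the categories JSON, the `label` field, and the placeholder key `999` are all assumptions made for illustration. Only `gbifSpeciesKey`, `speciesId`, and `taxon_id` come from the description above.

```python
import json


def stamp_species_ids(categories, pk_by_taxon_id):
    """Return a copy of `categories` with `speciesId` added wherever the
    category's gbifSpeciesKey matches a known local taxon_id."""
    stamped = []
    for category in categories:
        category = dict(category)  # copy so the input is left untouched
        pk = pk_by_taxon_id.get(category.get("gbifSpeciesKey"))
        if pk is not None:
            category["speciesId"] = pk
        stamped.append(category)
    return stamped


# Hypothetical local state: Species row 42 has taxon_id 2433433.
pk_by_taxon_id = {2433433: 42}

# Hypothetical categories JSON; the real layout may differ.
categories = json.loads(
    '[{"label": "species_a", "gbifSpeciesKey": 2433433},'
    ' {"label": "species_b", "gbifSpeciesKey": 999}]'
)

result = stamp_species_ids(categories, pk_by_taxon_id)
```

In this sketch an unknown key simply leaves the category without a `speciesId`; how the real command handles that case is covered in the next section.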

What happens when the key can't be resolved

sync_ai_models doesn't hard-fail on an unresolved gbifSpeciesKey — it falls back to a case-insensitive match on the label's scientific name against Species.latin_name, and if that also fails, creates a new Species row from the label's name. This keeps AI sync from blocking on taxonomy gaps, but the auto-created row won't have the GBIF-sourced order/family/genus fields until you import (or re-import) the taxonomy — see Taxonomy. In practice this means: import the taxonomy before the first sync_ai_models run on a fresh deployment, or expect to do one follow-up --mode update import to backfill any auto-created species.
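The three-step fallback described above can be sketched in plain Python as follows. This is a simplified stand-in, not the real implementation: the `Species` dataclass, the `SpeciesTable` class, its `resolve` method, and the returned status strings are hypothetical, the species names and key are placeholders, and leaving `taxon_id` empty on an auto-created row is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Species:
    pk: int
    latin_name: str
    taxon_id: Optional[int] = None  # GBIF Backbone key, if known


@dataclass
class SpeciesTable:
    """In-memory stand-in for the Species table (hypothetical)."""
    rows: List[Species] = field(default_factory=list)

    def resolve(self, gbif_key: Optional[int], name: str) -> Tuple[Species, str]:
        # Step 1: exact match on the GBIF key.
        if gbif_key is not None:
            for row in self.rows:
                if row.taxon_id == gbif_key:
                    return row, "key"
        # Step 2: case-insensitive match on the scientific name.
        for row in self.rows:
            if row.latin_name.casefold() == name.casefold():
                return row, "name"
        # Step 3: create a bare row; it has no GBIF-sourced fields yet
        # (assumption: taxon_id also stays empty until a taxonomy import).
        next_pk = max((row.pk for row in self.rows), default=0) + 1
        created = Species(pk=next_pk, latin_name=name)
        self.rows.append(created)
        return created, "created"


table = SpeciesTable([
    Species(pk=42, latin_name="Genus alpha", taxon_id=2433433),
    Species(pk=43, latin_name="Genus beta"),
])

by_key = table.resolve(2433433, "Some older name")
by_name = table.resolve(999, "GENUS BETA")
created = table.resolve(999, "Genus gamma")
```

Rows that come back with the `"created"` status are the ones a follow-up taxonomy import would need to backfill.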

See also