AI pipeline architecture
The AI pipeline turns an uploaded resource into detections, species classifications, and (optionally) per-frame distance estimates. It involves all three Trapper services plus the GBIF-keyed species table.
sequenceDiagram
participant U as User
participant E as Expert backend
participant M as AI Manager
participant W as AI Worker coordinator
U->>E: Upload resource
E->>M: POST /api/prediction_jobs/<br/>(ClassificationProject auto-submit)
M->>M: Batch resources, dispatch<br/>to runtime queue (Celery)
M->>W: trapperai_runtime.run_batch_prediction
W->>W: Download weights (model_manifest.yaml),<br/>run configured runtime
W->>M: POST /api/resource_batches/<id>/save_results/<br/>(msgpack blob, zstd)
M->>E: Results available
E->>E: Create AI Classification +<br/>ClassificationDynamicAttrs rows;<br/>resolve species via gbifSpeciesKey
E->>E: Write frames msgpack;<br/>hypertable snapshot updated<br/>on next population run
The AI Worker coordinator runs as a host process rather than in a container, so it can autodetect and access GPU/Hailo hardware directly. See Architecture.
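The batching step in the diagram, where the AI Manager groups resources before dispatching them to the runtime queue, can be pictured with a minimal Python sketch. The helper name make_batches, the fixed batch size, and the integer resource IDs are illustrative assumptions. This page does not describe how the AI Manager actually sizes or orders its batches.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def make_batches(resource_ids: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive chunks of at most `batch_size` resource IDs.

    Hypothetical helper: fixed-size chunking is an assumption, not the
    documented AI Manager batching policy.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(resource_ids)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Each batch would then be dispatched to the runtime queue as one unit of work.
batches = list(make_batches(range(1, 8), batch_size=3))
print(batches)  # [[1, 2, 3], [4, 5, 6], [7]]
```

Working in batches means one task on the queue covers several resources, and the Worker can report results once per batch, as the save_results call in the diagram suggests.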
Where the configuration lives
The pipeline's behaviour is split across three places:
- AI provider catalog: what models are available. Sourced from trapper-schemas, registered on the AI Manager via sync_models_from_schemas, surfaced on the Expert via sync_ai_models. Run both on every upgrade. See Register & sync AI providers.
- ClassificationProject AI fields: which providers to use, when to require AI, IoU thresholds, blurring rules, video FPS. See Create a classification project.
- Job-time overrides: the snapshot job_config JSON written per AI Classification Job. Captures exactly what was submitted for that run; usable for forensic debugging.
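To illustrate why a per-job snapshot helps with forensic debugging, here is a minimal Python sketch that freezes the project settings for one run. The function build_job_config, the placeholder provider names, and the rule that overrides win over project fields are all assumptions made for illustration. The real job_config schema is not described on this page.

```python
import copy
import json
from typing import Any, Dict, Optional


def build_job_config(project_fields: Dict[str, Any],
                     overrides: Optional[Dict[str, Any]] = None) -> str:
    """Return a JSON snapshot of the settings submitted for one job.

    Hypothetical helper: the merge order (overrides win) is an assumption.
    """
    config = copy.deepcopy(project_fields)
    config.update(overrides or {})
    # Serialising immediately decouples the snapshot from later project edits.
    return json.dumps(config, sort_keys=True)


project_fields = {
    "object_detection_ai_model": "detector-a",  # placeholder provider name
    "species_ai_model": "classifier-b",         # placeholder provider name
    "blur_humans": True,
}
snapshot = build_job_config(project_fields, {"blur_humans": False})

# Editing the project afterwards does not change what the job recorded.
project_fields["species_ai_model"] = "classifier-c"
print(json.loads(snapshot)["species_ai_model"])  # classifier-b
print(json.loads(snapshot)["blur_humans"])       # False
```

Because the snapshot is stored as serialised JSON, it still reflects the submitted values even after the project's AI fields change.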
Stages of the pipeline
Each new resource flows through these stages in order. The ClassificationProject controls which stages run by setting AI providers on the relevant fields:
- Object detection. Set object_detection_ai_model to a detection-type AI Provider (typically MegaDetector v5/v6, YOLOv8 variants, or a Trapper-tuned detector).
- Species classification. Set species_ai_model to a classification-type provider (DeepFaune, EfficientNet, SDZWA …). Runs only on objects classified as observation_type=animal by stage 1.
- Sequence building. Triggered by the celery_build_sequences task (Celery beat), which groups resources into ecological events using the time interval configured per collection.
- Privacy blurring. Triggered after detection if the project sets blur_humans / blur_vehicles. Rewrites the original media files; blur_backup keeps copies.
- Distance estimation (optional). See Distance estimation for the dedicated configuration path.
- Tracking (video only). See Configure trackers.
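The gating described in the list above can be sketched in Python as follows. The field names come from this page. The functions enabled_stages and objects_for_species_stage, the stage labels, the placeholder provider name, and the observation_type value "human" are hypothetical. The sketch also assumes that nothing downstream runs without a detector, which the page implies but does not state.

```python
from typing import Any, Dict, List


def enabled_stages(project: Dict[str, Any]) -> List[str]:
    """List the AI stages a project's fields would switch on."""
    stages: List[str] = []
    if not project.get("object_detection_ai_model"):
        # Assumption: without a detector, later stages have no input.
        return stages
    stages.append("object_detection")
    if project.get("species_ai_model"):
        stages.append("species_classification")
    if project.get("blur_humans") or project.get("blur_vehicles"):
        stages.append("privacy_blurring")
    return stages


def objects_for_species_stage(detections: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only detections that stage 1 labelled as animals."""
    return [d for d in detections if d.get("observation_type") == "animal"]


project = {
    "object_detection_ai_model": "detector-a",  # placeholder provider name
    "species_ai_model": None,                   # no classifier configured
    "blur_humans": True,
    "blur_vehicles": False,
}
print(enabled_stages(project))  # ['object_detection', 'privacy_blurring']

detections = [
    {"id": 1, "observation_type": "animal"},
    {"id": 2, "observation_type": "human"},  # hypothetical non-animal label
]
print(objects_for_species_stage(detections))  # [{'id': 1, 'observation_type': 'animal'}]
```

Sequence building, distance estimation, and tracking are left out because the page says they are triggered or configured separately.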
See also
- Classification model — what the pipeline's output looks like in the data model
- Msgpack detection-blob contract — the wire format between Worker and Expert