AI pipeline architecture

The AI pipeline turns an uploaded resource into detections, species classifications, and (optionally) per-frame distance estimates. It involves all three Trapper services plus the GBIF-keyed species table.

sequenceDiagram
    participant U as User
    participant E as Expert backend
    participant M as AI Manager
    participant W as AI Worker coordinator

    U->>E: Upload resource
    E->>M: POST /api/prediction_jobs/<br/>(ClassificationProject auto-submit)
    M->>M: Batch resources, dispatch<br/>to runtime queue (Celery)
    M->>W: trapperai_runtime.run_batch_prediction
    W->>W: Download weights (model_manifest.yaml),<br/>run configured runtime
    W->>M: POST /api/resource_batches/<id>/save_results/<br/>(msgpack blob, zstd)
    M->>E: Results available
    E->>E: Create AI Classification +<br/>ClassificationDynamicAttrs rows;<br/>resolve species via gbifSpeciesKey
    E->>E: Write frames msgpack;<br/>hypertable snapshot updated<br/>on next population run

The AI Worker coordinator is a host process (not a container) so it can autodetect and access GPU/Hailo hardware directly — see Architecture.
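The batching step in the diagram (the AI Manager groups resources before dispatching them to the runtime queue) can be illustrated with a minimal sketch. The function name `batch_resources` and the `batch_size` parameter are hypothetical; the source does not say how the AI Manager sizes or orders its batches, so this shows only the general idea of fixed-size chunking.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batch_resources(resource_ids: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive batches of at most batch_size resource ids.

    Hypothetical helper: the real AI Manager batching rules are not
    described in this page.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(resource_ids)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Seven resources with a batch size of three give two full batches and a remainder.
batches = list(batch_resources(range(1, 8), batch_size=3))
print(batches)
```

Each batch would then correspond to one dispatched prediction task and one results upload, which is why the results endpoint in the diagram is keyed by a batch id.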

Where the configuration lives

The pipeline's behaviour is split across three places:

  1. AI provider catalog: which models are available. Sourced from trapper-schemas, registered on the AI Manager via sync_models_from_schemas, and surfaced on the Expert via sync_ai_models. Rerun both sync steps on every upgrade. See Register & sync AI providers.
  2. ClassificationProject AI fields: which providers to use, when to require AI, IoU thresholds, blurring rules, and video FPS. See Create a classification project.
  3. Job-time overrides: the job_config JSON snapshot written for each AI Classification Job. It captures exactly what was submitted for that run, which makes it useful for forensic debugging.
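To show why a per-job snapshot helps with debugging, here is a minimal sketch of freezing project settings into a job_config JSON string at submission time. It is an illustration under stated assumptions, not Trapper's implementation: the `ProjectAISettings` class, the `snapshot_job_config` function, the `iou_threshold` and `video_fps` names, and the provider names are all hypothetical. Only `object_detection_ai_model`, `species_ai_model`, `blur_humans`, `blur_vehicles`, and `job_config` appear in this page.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ProjectAISettings:
    """Stand-in for the AI fields on a ClassificationProject (not the real model)."""

    object_detection_ai_model: Optional[str] = None
    species_ai_model: Optional[str] = None
    blur_humans: bool = False
    blur_vehicles: bool = False
    iou_threshold: float = 0.5  # hypothetical field name and default
    video_fps: int = 1  # hypothetical field name and default


def snapshot_job_config(
    settings: ProjectAISettings,
    overrides: Optional[Dict[str, Any]] = None,
) -> str:
    """Merge project settings with per-job overrides and freeze them as JSON."""
    config = asdict(settings)
    overrides = overrides or {}
    unknown = sorted(set(overrides) - set(config))
    if unknown:
        raise KeyError(f"unknown job_config keys: {unknown}")
    config.update(overrides)
    # Serialising to a string decouples the snapshot from later project edits.
    return json.dumps(config, sort_keys=True)


project = ProjectAISettings(
    object_detection_ai_model="detector-a",  # placeholder provider name
    species_ai_model="classifier-b",  # placeholder provider name
    blur_humans=True,
)
job_config = snapshot_job_config(project, overrides={"video_fps": 5})

# Editing the project afterwards does not change what the job recorded.
project.video_fps = 10
recorded = json.loads(job_config)
print(recorded["video_fps"])  # prints 5
```

Because the snapshot is stored as serialized JSON rather than as a reference to the project, a later investigation can see the values the job actually ran with, even if the project has since been reconfigured.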

Stages of the pipeline

Each new resource flows through these stages in order. The ClassificationProject controls which stages run by setting AI providers on the relevant fields:

  1. Object detection. Set object_detection_ai_model to a detection-type AI Provider (typically MegaDetector v5/v6, YOLOv8 variants, or a Trapper-tuned detector).
  2. Species classification. Set species_ai_model to a classification-type provider (DeepFaune, EfficientNet, SDZWA …). Runs only on objects that stage 1 labelled observation_type=animal.
  3. Sequence building. Triggered by the celery_build_sequences task (Celery beat) — groups resources into ecological events using the time interval configured per collection.
  4. Privacy blurring. Triggered after detection if the project sets blur_humans or blur_vehicles. Overwrites the original media files; blur_backup keeps copies of the originals.
  5. Distance estimation (optional). See Distance estimation for the dedicated configuration path.
  6. Tracking (video only). See Configure trackers.
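The way project fields gate the first stages above can be sketched as follows. This is a simplified illustration, not the Expert's actual logic: `ProjectAIFields`, `planned_stages`, and `animals_only` are hypothetical names, and the sketch assumes that species classification and blurring are skipped when no detector is configured. Sequence building, distance estimation, and tracking are left out because they are triggered or configured separately.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ProjectAIFields:
    """Stand-in for the ClassificationProject fields named in the stage list."""

    object_detection_ai_model: Optional[str] = None
    species_ai_model: Optional[str] = None
    blur_humans: bool = False
    blur_vehicles: bool = False


def planned_stages(project: ProjectAIFields) -> List[str]:
    """Return the detection-dependent stages that would run for this project."""
    stages: List[str] = []
    if not project.object_detection_ai_model:
        # Assumption: the later stages need detections, so nothing runs without a detector.
        return stages
    stages.append("object_detection")
    if project.species_ai_model:
        stages.append("species_classification")
    if project.blur_humans or project.blur_vehicles:
        stages.append("privacy_blurring")
    return stages


def animals_only(detections: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the detections that species classification would receive."""
    return [d for d in detections if d.get("observation_type") == "animal"]


project = ProjectAIFields(object_detection_ai_model="detector-a", blur_humans=True)
stages = planned_stages(project)
print(stages)  # prints ['object_detection', 'privacy_blurring']

detections = [{"observation_type": "animal"}, {"observation_type": "human"}]
animals = animals_only(detections)
print(len(animals))  # prints 1
```

Leaving species_ai_model unset, as in the usage above, gives a detection-only project that still blurs humans.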

See also