AI pipeline architecture¶

The AI pipeline turns an uploaded resource into detections, species classifications, and (optionally) per-frame distance and pose estimates. It spans all three TRAPPER services — Expert backend, AI Manager, and AI Worker coordinator — plus the GBIF-keyed species table.

Services involved¶

Service	Role in the pipeline
Expert backend	Orchestrates jobs: submits them to the Manager, receives results back, creates `Classification` (`classification_type=AI`) and `ClassificationDynamicAttrs` rows, resolves species via GBIF key, writes the frames msgpack blob.
AI Manager (`trapper-ai`)	Queue and coordination layer. Receives job requests from the Expert, batches resources, dispatches batch tasks to the AI Worker via Celery. Tracks job/batch/resource status. Stores msgpack blobs temporarily until the Expert fetches them.
AI Worker coordinator (`trapper-ai-worker`)	Runs as a host process (not a container) so it can directly access GPU or Hailo hardware. Pulls `trapperai_runtime.run_batch_prediction` tasks from its Celery queue, downloads resource files from the Manager, runs inference, and POSTs results back.

Why the Worker is a host process

Containerising the Worker would require GPU passthrough or privileged access — complex, fragile, and hardware-specific. Running on the host gives the Worker direct CUDA/Hailo device access without any container overhead. See Architecture.

End-to-end flow¶

sequenceDiagram
    participant U as User / Upload
    participant E as Expert backend
    participant M as AI Manager
    participant W as AI Worker

    U->>E: Upload resource (CS wizard or trapper-tools)
    E->>E: Create Resource, attach to Collection
    E->>M: POST /api/prediction_jobs/ (auto-submit or manual re-run)
    M->>M: Create PredictionJob + ResourceBatch rows, dispatch download_images task
    M->>W: send_task run_batch_prediction (runtime Celery queue)
    W->>M: GET resource files (origin_image URLs)
    W->>W: Run inference (detection, classification, pose/depth)
    W->>M: POST save_results (msgpack blob, zstd)
    M->>M: Store detected_objects FileField, refresh job status
    M->>E: Results available (Expert polls or webhook)
    E->>E: Unpack msgpack, resolve species, create Classification rows (type=AI)
    E->>E: Rebuild frames_msgpack blob (hypertable indexed later, on demand)
    E->>E: Privacy blurring (if blur_humans / blur_vehicles)

Stages¶

1. Resource ingestion¶

Resources arrive via the Citizen Science Upload wizard or trapper-tools. They are attached to a Collection; the Collection is linked to a ClassificationProject via a ClassificationProjectCollection row.

2. Job submission¶

When a new resource is ingested (or a pipeline re-run is triggered manually — see Run & re-run the AI pipeline), the Expert creates an AI Classification Job and submits it to the Manager via POST /api/prediction_jobs/. The submission carries:

The job type (detection, classification, distance_calibration, distance_estimation)
The prediction_model_id (the Manager's UUID for the selected AI provider)
The merged job_config (from the ClassificationProject's prediction_config field, shaped by trapper-schemas — see Classification project: Prediction Job Configuration)
Resource URLs and sample IDs
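
As a rough illustration of the submission described above, the sketch below assembles a request body in Python. The key names (`job_type`, `prediction_model_id`, `job_config`, `resources`, `url`, `sample_id`), the helper `build_job_payload`, and the example URL are assumptions made for this example; the actual request schema is defined by the Manager API and trapper-schemas.

```python
import json
import uuid

# The four job types listed in this section.
JOB_TYPES = {"detection", "classification", "distance_calibration", "distance_estimation"}


def build_job_payload(job_type, prediction_model_id, job_config, resources):
    """Assemble a hypothetical body for POST /api/prediction_jobs/.

    `resources` is an iterable of (url, sample_id) pairs. All key names
    are illustrative, not the Manager's confirmed schema.
    """
    if job_type not in JOB_TYPES:
        raise ValueError(f"unknown job type: {job_type!r}")
    return {
        "job_type": job_type,
        "prediction_model_id": str(prediction_model_id),
        "job_config": dict(job_config),
        "resources": [{"url": url, "sample_id": sid} for url, sid in resources],
    }


payload = build_job_payload(
    "detection",
    uuid.UUID("00000000-0000-0000-0000-000000000001"),  # placeholder UUID
    {"common": {"skip_missing_labels": False}},
    [("https://expert.example/media/1.jpg", 1)],  # placeholder resource URL
)
body = json.dumps(payload)  # JSON text that would be sent to the Manager
```

Rejecting unknown job types before serialising keeps an invalid submission from reaching the Manager at all.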

3. Batching and dispatch (Manager)¶

The Manager creates a PredictionJob row (tracking overall status: Initial → In Progress → Done / Failed / Canceled) and one or more ResourceBatch rows — each batch contains at most batch_size resources. A Celery download_images task is queued per batch.
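
The "at most batch_size resources per batch" rule can be sketched as a simple chunking function. This is a minimal illustration that assumes resources are split in their submitted order; the helper name `split_into_batches` is hypothetical and the Manager's actual grouping logic may differ.

```python
def split_into_batches(resource_ids, batch_size):
    """Split resource IDs into consecutive batches of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ids = list(resource_ids)
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]


batches = split_into_batches(range(1, 8), batch_size=3)
print(batches)  # [[1, 2, 3], [4, 5, 6], [7]]
```

Seven resources with a batch size of 3 produce three batches, the last one partially filled.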

enable_image_pass_through

If the selected PredictionModel has enable_image_pass_through = True on the Manager, the download step is skipped: the Worker receives resource URLs directly and fetches them itself. This mode is used for cloud-hosted models.

Each batch progresses through stages:

CREATED → RESOURCE_DOWNLOADING → RESOURCE_DOWNLOADING_FINISHED → PREDICTION → FINISHED

and statuses: CREATED → IN_PROGRESS → SUCCEEDED / FAILED / CANCELED.
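
The stage sequence above can be modelled as an ordered enumeration. The sketch below is an assumption-laden illustration: the class `BatchStage` and the helpers `next_stage` and `can_transition` are hypothetical names, and the sketch models neither the separate status field nor pass-through mode.

```python
from enum import Enum


class BatchStage(Enum):
    CREATED = "CREATED"
    RESOURCE_DOWNLOADING = "RESOURCE_DOWNLOADING"
    RESOURCE_DOWNLOADING_FINISHED = "RESOURCE_DOWNLOADING_FINISHED"
    PREDICTION = "PREDICTION"
    FINISHED = "FINISHED"


STAGE_ORDER = list(BatchStage)  # Enum iteration preserves definition order


def next_stage(stage):
    """Return the stage that follows `stage`, or None once FINISHED."""
    index = STAGE_ORDER.index(stage)
    return STAGE_ORDER[index + 1] if index + 1 < len(STAGE_ORDER) else None


def can_transition(current, target):
    """Allow only a single forward step through the stage sequence."""
    return next_stage(current) is target
```

Restricting transitions to one forward step makes a skipped or repeated stage easy to detect when batch updates arrive out of order.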

4. Inference (Worker)¶

The Worker pulls trapperai_runtime.run_batch_prediction from its runtime-specific Celery queue (queue name derived from AIRuntime.name). It:

Downloads resource files from the Manager (unless pass-through mode is active).
Runs the configured predictor (detection, classification, depth, or pose) against each resource.
Returns results as a msgpack blob (zstd-compressed, base64-encoded) per resource via POST /api/resource_batches/<id>/save_results/.

The msgpack format is documented in Msgpack detection-blob contract.

5. Result ingestion (Expert)¶

The Expert receives the results and, for each resource:

Unpacks the msgpack blob — extracts the objects array (detected bounding boxes with confidences, labels, frame indices).
Resolves species — maps the model's category labels to the project's Classificator species table using GBIF keys (gbifSpeciesKey). Labels with no mapping are dropped (if skip_missing_labels is set) or raise an error.
Creates Classification rows (classification_type=AI) — one per resource, linked to the resource's Collection and the ClassificationProject.
Creates ClassificationDynamicAttrs rows — one per detected object per frame, carrying observation_type, species, bounding box, confidence, and any extra per-frame metrics (distance, keypoints).
Rebuilds the msgpack blob — injects the newly-assigned DB IDs (ClassificationDynamicAttrs.id) into the msgpack objects so the frontend can reference specific detections by their database primary key.
Updates the frames hypertable — on the next populate_project_hypertable run, the per-frame detection data is indexed into the TimescaleDB hypertable for analytics queries (see Frame timeseries).

6. Privacy blurring¶

If the ClassificationProject has blur_humans or blur_vehicles enabled, a post-processing step rewrites the original media file to obscure the detected regions. blur_backup preserves an unblurred copy. blur_humans_and_vehicles_immediately applies blurring without waiting for human approval; the change is irreversible unless blur_backup is enabled.

7. Sequence building¶

A Celery task (celery_build_sequences) groups resources into sequences — ecological events — based on a configurable time window per collection (default 5 minutes). It is not a scheduled (beat) task: it fires when a collection is attached to a classification project, from the bulk Build sequences admin action, and after Citizen Science / collection uploads. Sequences must be built before annotators can step through bursts in the classify view. See Create a classification project, step 5.
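
To make the time-window grouping concrete, here is a minimal sketch. It assumes a new sequence starts whenever the gap to the previous resource exceeds the window; the function `build_sequences` is a hypothetical stand-in, and the real celery_build_sequences task may define the window differently.

```python
from datetime import datetime, timedelta


def build_sequences(timestamps, window=timedelta(minutes=5)):
    """Group timestamps into sequences.

    Assumption: a resource joins the current sequence when it was
    recorded within `window` of the previous resource.
    """
    sequences = []
    for ts in sorted(timestamps):
        if sequences and ts - sequences[-1][-1] <= window:
            sequences[-1].append(ts)
        else:
            sequences.append([ts])
    return sequences


base = datetime(2024, 5, 1, 6, 0, 0)
shots = [
    base,
    base + timedelta(seconds=30),
    base + timedelta(minutes=4),
    base + timedelta(minutes=20),
    base + timedelta(minutes=21),
]
sequences = build_sequences(shots)
print([len(s) for s in sequences])  # [3, 2]
```

The first three shots fall within five minutes of each other and form one event; the 16-minute gap before the fourth shot starts a second sequence.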

Job types¶

Five capabilities flow through the same pipeline skeleton with different predictor logic. Four are submitted as distinct job types; pose estimation is bundled inside a detection job.

Detection¶

The detection job is the entry point for every pipeline except distance calibration, which runs independently against reference images (see Distance calibration). The Worker runs an object detector against each resource and, for video, a tracker that links detections across frames into continuous tracks. The output is a set of DetectedObject tracks, one per individual animal, person, or vehicle, with bounding boxes indexed by frame.

Available detectors:

Model	Architecture	Notes
MegaDetector V1000 Sorrel ⭐	YOLO11	Recommended for GPU deployments. Latest and most accurate MegaDetector variant; `image_size: 960`. Used on the public TRAPPER demo.
MegaDetector V1000 Sorrel (Hailo HEF)	YOLO11	Same model compiled for Hailo-8 edge accelerator.
MegaDetector V6 (YOLOv10n) ⭐	YOLOv10	Recommended for CPU / lightweight setups. Fast nano-size model; good accuracy/speed trade-off. Default in the example `coordinator.yaml`.
MegaDetector V6 (YOLOv9c / YOLOv9e / YOLOv10x)	YOLOv9 / YOLOv10	Larger V6 variants — higher accuracy than YOLOv10n at higher compute cost.
MegaDetector V1000 Redwood / Spruce	YOLOv5	Older V1000 variants; superseded by Sorrel.
MegaDetector V5	YOLOv5	Legacy baseline; superseded by V1000 and V6.
DeepFaune Detector v1.1	YOLOv8	Trained as a pair with the DeepFaune Classifier — the two were developed together and share preprocessing assumptions. `image_size: 960`.

All detection models output three observation_type classes: Animal, Human, Vehicle.

Tracking:

For video resources, a tracker links frame-by-frame detections into multi-frame tracks. The tracker is configured per-job via prediction_config.detection.common.tracker. Available options include botsort / bytetrack (via Ultralytics) and boxmot:<name> (via the BoxMOT library). See Configure trackers for the full list and configuration knobs.

Classification¶

The classification job runs a species classifier on cropped detections from a prior detection run. It requires a completed detection run with stored results, because the classifier receives the bounding-box crops rather than the full frame.

Available classifiers:

Model	Architecture	Species coverage	Notes
DeepFaune Classifier v1.4 ⭐	ViT-L/14 (DINOv2)	38 European mammals	Recommended. Latest version; broadest species coverage. Pair with DeepFaune Detector v1.1 or any MegaDetector.
DeepFaune Classifier v1.3	ViT-L/14 (DINOv2)	34 European mammals	Previous generation; superseded by v1.4.
DeepFaune Classifier v1.2	ViT-L/14 (DINOv2)	30 European mammals	Older; superseded by v1.4.
TrapperAI v02.2024	YOLOv8-m	18 European mammals	Combined detector+classifier in one model; lower species coverage.
SDZWA Andes Classifier v1	EfficientNet	53 Andes-region species	Regional classifier; also available as a Hailo HEF variant.
SDZWA Amazon Classifier v1	ONNX	43 Amazon-region species	Regional classifier; also available as a Hailo HEF variant.
SDZWA USA Southwest Classifier v3	EfficientNet	27 US-Southwest species	Regional classifier; also available as a Hailo HEF variant.
SDZWA Savanna Classifier v3	EfficientNet	63 African-savanna species	Regional classifier; also available as a Hailo HEF variant.

The table lists the classifiers bundled in the trapper-schemas model manifest — the set the AI Manager can register out of the box via sync_models_from_schemas. The Worker additionally ships a DeerAI (EfficientNet/ONNX, deer-specific) predictor class, but it has no manifest entry, so it can't be registered through the normal sync; treat it as experimental.

The classifier output is a species label plus a confidence score per detected object. The Expert resolves the label to a Species row via its gbifSpeciesKey, creating ClassificationDynamicAttrs rows with the species attached.

Label mapping and skip_missing_labels

If a model outputs a label that has no matching species in the project's Classificator (by GBIF key), the behaviour depends on common.skip_missing_labels: if true, the detection is silently dropped; if false (default), result ingestion raises an error. Always verify that the classifier's species list overlaps with your project's Classificator before running.
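
The two behaviours of skip_missing_labels can be sketched as follows. The data shapes, the exception `MissingLabelError`, the function `resolve_species`, and the GBIF keys in the usage lines are all placeholders chosen for illustration; they are not the Expert's actual models or real taxon keys.

```python
class MissingLabelError(Exception):
    """Raised when a model label has no species in the Classificator."""


def resolve_species(detections, label_to_gbif, species_by_gbif, skip_missing_labels=False):
    """Attach a species to each detection via its GBIF key.

    detections: list of dicts with a "label" key (illustrative shape).
    label_to_gbif: model label -> GBIF key.
    species_by_gbif: GBIF key -> species record in the project's Classificator.
    """
    resolved = []
    for det in detections:
        key = label_to_gbif.get(det["label"])
        species = species_by_gbif.get(key)
        if species is None:
            if skip_missing_labels:
                continue  # drop the detection silently
            raise MissingLabelError(f"no species for label {det['label']!r}")
        resolved.append({**det, "species": species})
    return resolved


label_to_gbif = {"red deer": 101, "wild boar": 102}   # placeholder keys
species_by_gbif = {101: "Cervus elaphus"}             # project lacks key 102
detections = [{"label": "red deer"}, {"label": "wild boar"}]
kept = resolve_species(detections, label_to_gbif, species_by_gbif, skip_missing_labels=True)
```

With skip_missing_labels enabled only the red deer detection survives; with the default, the same call raises on the wild boar label.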

Pose estimation¶

Pose estimation runs inside the detection job — not as a separate submission. When a pose_ai_model is configured on the ClassificationProject, the Worker passes each tracked detection crop through the pose predictor immediately after the tracker, adding 39-keypoint skeletal estimates to the msgpack output.

Available pose models:

Model	Architecture	Keypoints	License
SuperAnimal Quadruped X	RTMPose-x (DLC)	39 (quadruped skeleton)	Non-commercial
SuperAnimal Quadruped M	RTMPose-m (DLC)	39 (quadruped skeleton)	Non-commercial

Non-commercial license

Both SuperAnimal models use DeepLabCut weights with a non-commercial license. They must not be used for commercial applications without a separate license from the DLC team.

Keypoints are stored as per-frame data in the frames msgpack blob and surfaced in the classification frontend as an overlay on the bounding box.

Distance calibration¶

Before the pipeline can estimate real-world distances, it must be calibrated against a set of reference images — photographs taken at known distances. The calibration job:

Runs a monocular depth model against the reference images.
Fits an affine or piecewise linear function mapping predicted depth values to known metric distances.
Stores the resulting CalibrationModel JSON on the PredictionJob row.

The calibration is done once per deployment or camera position — it encodes the camera's specific geometry and lens characteristics. See Distance estimation for the full workflow.

Distance estimation¶

With a calibration in place, the distance estimation job runs the depth model on all resources in the collection and converts raw depth maps to metric distances for each detected animal. The pipeline:

Runs the depth model to produce a per-pixel depth map.
Uses FastSAM to segment the animal within its bounding box, deriving a representative depth value for the object.
Applies Kalman filter and Hampel filter smoothing to remove outliers across the track's frame series.
Writes the per-frame distance series into the frames msgpack blob.
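
The outlier-removal idea in the smoothing step can be illustrated with a plain Hampel filter over a track's distance series. This sketch covers only the Hampel part, not the Kalman filter, and the window size and threshold are assumed values rather than the pipeline's defaults.

```python
from statistics import median


def hampel_filter(values, half_window=2, n_sigmas=3.0):
    """Replace outliers with the local median (Hampel identifier).

    A point is an outlier when it deviates from the median of its
    window by more than n_sigmas * 1.4826 * MAD, where MAD is the
    median absolute deviation within the same window.
    """
    k = 1.4826  # scales MAD to a standard-deviation estimate for Gaussian data
    values = list(values)
    filtered = list(values)
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        med = median(window)
        mad = median(abs(v - med) for v in window)
        if abs(values[i] - med) > n_sigmas * k * mad:
            filtered[i] = med
    return filtered


distances_m = [5.0, 5.1, 5.0, 20.0, 5.2, 5.1, 5.0]
print(hampel_filter(distances_m))  # [5.0, 5.1, 5.0, 5.1, 5.2, 5.1, 5.0]
```

The single 20 m spike is replaced by the local median of 5.1 m, while the ordinary frame-to-frame variation is left untouched.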

Available depth models:

Model	Params	Notes
Depth Anything V3 Large ⭐	350M	Recommended for accuracy. Highest quality depth maps; requires ~8 GB VRAM.
Depth Anything V3 Base ⭐	120M	Recommended for most deployments. Good balance of accuracy and VRAM (~4 GB).
Depth Anything V3 Small	35M	Fastest; lowest VRAM (~2 GB).
Depth Anything V3 Metric Large	350M	Metric-scale output; camera-specific calibration still required.
MiDaS	varies	Legacy baseline; superseded by DA3.
Hailo Depth (HEF)	—	Edge-optimised for Hailo-8 accelerator.

Runtime selection¶

The AI Worker coordinator manages multiple runtimes — isolated Python environments, each targeting a specific hardware backend and predictor library. The coordinator selects the appropriate runtime automatically based on detected hardware. Detection/classification models and depth models run in separate runtimes, so each hardware lane has a parallel depth variant:

flowchart LR
    HW{"Detected<br/>hardware"} --> N["NVIDIA GPU"]
    HW --> A["Apple Silicon"]
    HW --> CPU["CPU<br/>(incl. Ampere ARM64)"]
    HW --> H8["Hailo-8"]

    N --> NR["PyTorch CUDA / ONNX Runtime CUDA<br/><i>depth: PyTorch Depth CUDA</i>"]
    A --> AR["PyTorch MPS<br/><i>depth: PyTorch Depth MPS</i>"]
    CPU --> CR["PyTorch CPU / ONNX Runtime CPU<br/>(Ampere AIO variants are docker-only)<br/><i>depth: PyTorch Depth CPU</i>"]
    H8 --> HR["Hailo-8 Runtime (.hef)<br/><i>depth: Hailo-8 Depth Runtime</i>"]

Runtime	Hardware	Predictor library	Notes
PyTorch CUDA	NVIDIA GPU	`trapper-predictors-torch`	Primary GPU runtime; requires CUDA 13.0+, driver ≥ 580.0
PyTorch MPS (Apple Silicon)	Apple Silicon GPU	`trapper-predictors-torch`	macOS only
PyTorch CPU	CPU	`trapper-predictors-torch`	Fallback; slow for large models
PyTorch Ampere AIO	Ampere ARM64 CPU	`trapper-predictors-torch`	Docker-only (`docker_only`); requires the `ampere_aio` capability
ONNX Runtime CPU	CPU	`trapper-predictors-onnx`	Fast inference for ONNX models
ONNX Runtime CUDA	NVIDIA GPU	`trapper-predictors-onnx`	ONNX on GPU
ONNX Runtime Ampere AIO	Ampere ARM64 CPU	`trapper-predictors-onnx`	Docker-only (`docker_only`); requires the `ampere_aio` capability
Hailo-8 Runtime	Hailo-8 accelerator	`trapper-predictors-hailo`	Edge deployments; `.hef` model format
PyTorch Depth CUDA	NVIDIA GPU	`trapper-predictors-torch-depth`	Separate runtime for depth models
PyTorch Depth MPS (Apple Silicon)	Apple Silicon GPU	`trapper-predictors-torch-depth`	macOS only
PyTorch Depth CPU	CPU	`trapper-predictors-torch-depth`	CPU fallback for depth models
Hailo-8 Depth Runtime	Hailo-8 accelerator	`trapper-predictors-hailo-depth`	Depth models compiled to `.hef`

Each runtime runs in an isolated process (or Docker container for docker_only runtimes) with its own virtual environment. Runtimes are defined in runtimes.yaml and the coordinator selects the highest-priority compatible runtime at job dispatch time.
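
The "highest-priority compatible runtime" rule might look like the sketch below. The `Runtime` fields, the capability tags (`cuda`, `mps`, `cpu`), the numeric priorities, and the convention that a higher number wins are all assumptions for illustration; the real definitions live in runtimes.yaml.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Runtime:
    name: str
    required_capability: str  # hypothetical capability tag, e.g. "cuda"
    priority: int             # assumption: higher number wins


def select_runtime(runtimes, detected_capabilities):
    """Pick the highest-priority runtime whose capability is available."""
    compatible = [r for r in runtimes if r.required_capability in detected_capabilities]
    if not compatible:
        raise LookupError("no compatible runtime for detected hardware")
    return max(compatible, key=lambda r: r.priority)


RUNTIMES = [
    Runtime("PyTorch CUDA", "cuda", 30),
    Runtime("PyTorch MPS (Apple Silicon)", "mps", 20),
    Runtime("PyTorch CPU", "cpu", 10),
]

chosen = select_runtime(RUNTIMES, {"cuda", "cpu"})
```

On a host that reports both CUDA and CPU capabilities, this picks the CUDA runtime; on a CPU-only host it falls back to PyTorch CPU.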

Deployment plans

The coordinator.yaml deployments section pre-loads specific model+runtime combinations at coordinator startup. Without a deployment plan, models are downloaded and loaded on-demand when the first job for that model arrives — adding latency to the first inference. For production deployments, pre-loading the models you use most is strongly recommended.

Job chaining¶

Job types have dependencies that determine execution order:

detection ─────────────────────────────────────────────────► (results in Expert)
              └─► classification (needs detection bboxes)
              └─► pose estimation (runs inside detection job)

distance_calibration ──────────────────────────────────────► CalibrationModel stored
              └─► distance_estimation (needs calibration + detection bboxes)

The Expert enforces these constraints: a classification job will not be submitted if no detection results exist for the resource; distance_estimation requires both a completed distance_calibration and existing detections.
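
These constraints amount to a small prerequisite check, sketched below. The function `missing_prerequisites` and its boolean arguments are hypothetical; the Expert's actual validation code is not shown in this document.

```python
def missing_prerequisites(job_type, has_detections, has_calibration):
    """Return the unmet prerequisites for a job type (empty list means OK)."""
    missing = []
    if job_type in ("classification", "distance_estimation") and not has_detections:
        missing.append("detection")
    if job_type == "distance_estimation" and not has_calibration:
        missing.append("distance_calibration")
    return missing


print(missing_prerequisites("distance_estimation", has_detections=True, has_calibration=False))
# ['distance_calibration']
```

Returning the list of unmet prerequisites, rather than a bare boolean, lets the caller report exactly which job has to run first.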

Configuration layers¶

The pipeline's behaviour is controlled at three levels, from broadest to narrowest:

Layer	Where	What it controls
AI provider catalog	`trapper-schemas` → Manager → Expert sync	Which models exist, their `model_config` defaults and weight file hashes
ClassificationProject fields	Expert admin — AI Models section	Which provider to use per job type; blurring rules; CS visibility
Prediction Job Configuration	Expert admin — `prediction_config` JSON	Per-job-type `common` knobs (fps, confidence, skip_empty), spec parameters, and `model_config_overrides` layered on the manifest baseline

See Classification project: advanced configuration for the full field reference.

Two confidence thresholds

model_config_overrides.confidence_threshold is the inference-time cutoff inside the Worker — predictions below it are discarded before the msgpack is sent back. common.minimum_confidence is the post-processing threshold on the Expert side — predictions below it are stored as Classification rows (classification_type=AI) but not auto-approved. They operate at different stages and can be set independently.
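
The interaction between the two thresholds can be sketched as two independent filters. The function names, the dict shape, and the choice to treat a confidence exactly equal to a threshold as passing are assumptions for this example.

```python
def worker_filter(predictions, confidence_threshold):
    """Inference-time cutoff: predictions below the threshold never leave the Worker."""
    return [p for p in predictions if p["confidence"] >= confidence_threshold]


def expert_partition(predictions, minimum_confidence):
    """Post-processing: every prediction is stored, only confident ones are auto-approved."""
    approved = [p for p in predictions if p["confidence"] >= minimum_confidence]
    pending = [p for p in predictions if p["confidence"] < minimum_confidence]
    return approved, pending


raw = [{"confidence": 0.15}, {"confidence": 0.45}, {"confidence": 0.92}]
sent_back = worker_filter(raw, confidence_threshold=0.2)
approved, pending = expert_partition(sent_back, minimum_confidence=0.8)
```

Here the 0.15 prediction is discarded inside the Worker, the 0.45 prediction is stored but awaits human approval, and the 0.92 prediction is auto-approved.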

Species resolution¶

The pipeline does not store raw model label strings. Instead, every classifier output is mapped to a GBIF taxon key — a stable, globally unique species identifier from the GBIF Backbone Taxonomy.

When the Expert ingests classification results:

It maps each model label to its GBIF key and looks that key up in the Species table (gbifSpeciesKey).
If a match is found, the species is attached to the ClassificationDynamicAttrs row.
If no match is found, behaviour depends on skip_missing_labels (see Label mapping and skip_missing_labels under Classification).

This design means you can swap classifiers (e.g. upgrade from DeepFaune v1.2 to v1.4) without remapping species identifiers — as long as the new model's labels resolve to the same GBIF keys, the existing Classificator and any previously-approved classifications remain valid.

See Taxonomy & GBIF for how species are imported and managed.

Observability¶

What to check	Where
Live job and batch queue	Flower at `http://localhost:5555`
Manager job status	AI Manager admin → Prediction jobs
Manager batch/resource status	AI Manager admin → Resource batches
Expert-side AI Classification rows	Expert admin → Media classification → Ai classifications
Hypertable population status	Expert admin → Media classification → Project hypertable populations
Worker logs	`./profiles/<profile>/logs.sh trapper-ai-worker`
Coordinator runtime status	`GET http://localhost:34821/api/runtimes` (coordinator HTTP API)
Model download/cache status	`GET http://localhost:34821/api/models`