
Msgpack detection-blob contract

Per-frame object data (one bounding box per object per frame, for every frame of a video) is too granular to store as relational rows for every classification. Instead, it is encoded as a MessagePack blob, zstd-compressed, and stored as a single file on Classification.frames_msgpack. This page documents that wire format, which is defined in trapper/apps/media_classification/frames.py.

Source of truth

This page describes the Expert-side dataclasses (AggregateFramesPayload/ObjectFrames/FrameSample). The AI Worker emits an equivalent structure on its side of the contract — if you're working in trapper-ai-worker, cross-check against its own payload-building code rather than assuming byte-for-byte identity; the two services share the shape of the contract, not necessarily a single shared schema package. See Frame timeseries for why this file, not the ObjectFrameObservation hypertable, is the source of truth.

Encoding

raw = msgpack.packb(payload.asdict(), use_bin_type=True)
compressed = zstd_compressor.compress(raw)

Served over HTTP with Content-Type: application/x-msgpack+zstd via GET /api/media-classifications/classifications/<id>/frames/.

Top-level: AggregateFramesPayload

One per classification (AI, USER, or FEEDBACK — never FINAL).

| Field | Type | Notes |
| --- | --- | --- |
| classification_type | "AI", "USER", or "FEEDBACK" | Mirrors Classification.classification_type. |
| classification_id | int | |
| base_timestamp | ISO string or null | Resource's base timestamp. |
| timezone | string | Resource's timezone name. |
| fps | float or null | null for images. |
| duration | float or null | Video duration in seconds; null for images. |
| classification_model | string or null | AI model name; only set for AI-type classifications. |
| objects | list of ObjectFrames | One entry per tracked object. |
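The constraints listed for the top-level fields can be checked on a decoded payload. The sketch below assumes the blob has already been unpacked into a plain dict. The function `check_top_level` is hypothetical and not part of frames.py, and it only covers rules stated on this page.

```python
ALLOWED_TYPES = {"AI", "USER", "FEEDBACK"}  # FINAL classifications never carry a payload


def check_top_level(payload: dict) -> list[str]:
    """Return a list of problems found; an empty list means the top level looks consistent."""
    problems = []
    ctype = payload.get("classification_type")
    if ctype not in ALLOWED_TYPES:
        problems.append(f"unexpected classification_type: {ctype!r}")
    if not isinstance(payload.get("classification_id"), int):
        problems.append("classification_id must be an int")
    # classification_model is only meaningful for AI-type classifications.
    if payload.get("classification_model") is not None and ctype != "AI":
        problems.append("classification_model set on a non-AI classification")
    if not isinstance(payload.get("objects"), list):
        problems.append("objects must be a list")
    return problems
```

For an image classification, both `fps` and `duration` would be null, so a caller could combine this check with a test for `payload["fps"] is None` to tell images from videos.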

ObjectFrames

One per detected/tracked object (track) within the classification.

| Field | Type | Notes |
| --- | --- | --- |
| id | int | The corresponding ClassificationDynamicAttrs row ID. |
| data | list of FrameSample | Per-frame observations for this object. |
| observation_type | string or null | animal / human / vehicle / blank / etc. |
| species | string or null | Latin name, if resolved. |

FrameSample

One per frame where the object was observed. The list is sparse: frames with no detection have no entry and are not padded with nulls.

| Field | Type | Notes |
| --- | --- | --- |
| frame_index | int | Zero-based, within the resource's timeline. |
| x, y, w, h | float or null | Normalized bounding box, xywh. |
| confidence | float or null | Detection confidence. |
| ts | string or null | Timestamp, %Y-%m-%dT%H:%M:%S%z. |
| distance | float or null | Per-frame distance estimate in metres; see Distance estimation. |
| kpts | list of int or null | Sparse pose keypoints (17 OKS-anchor subset), flat array of round(coord_norm * 10000) pairs. Only present on pose-model target-FPS frames. |
| kpt_scores | list of int or null | Per-keypoint confidence, round(score * 1000). |
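The integer-quantized keypoint fields and the timestamp string need a small amount of decoding before use. The sketch below assumes the flat `kpts` array is ordered x then y for each keypoint. That ordering is an assumption, not something this page states. The helper names `decode_keypoints` and `parse_ts` are hypothetical.

```python
from datetime import datetime

COORD_SCALE = 10000  # kpts store round(coord_norm * 10000)
SCORE_SCALE = 1000   # kpt_scores store round(score * 1000)
TS_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def decode_keypoints(kpts, kpt_scores=None):
    """Turn the flat integer arrays into (x, y, score) tuples in normalized units."""
    if kpts is None:
        return None
    if len(kpts) % 2 != 0:
        raise ValueError("kpts must hold an even number of values (coordinate pairs)")
    n = len(kpts) // 2
    if kpt_scores is not None and len(kpt_scores) != n:
        raise ValueError("kpt_scores must have one entry per keypoint")
    result = []
    for i in range(n):
        # Assumed layout: [x0, y0, x1, y1, ...]
        x = kpts[2 * i] / COORD_SCALE
        y = kpts[2 * i + 1] / COORD_SCALE
        score = None if kpt_scores is None else kpt_scores[i] / SCORE_SCALE
        result.append((x, y, score))
    return result


def parse_ts(ts):
    """Parse a FrameSample ts string into a timezone-aware datetime (None passes through)."""
    return None if ts is None else datetime.strptime(ts, TS_FORMAT)
```

Because the stored values are rounded integers, decoded coordinates are accurate to about 1e-4 of the frame dimension and scores to about 1e-3.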

Don't confuse this with the frontend's dense bboxes array

The video annotation frontend works with a dense, null-padded bboxes array indexed by frame number (so it can interpolate gaps). The wire format above is the sparse on-disk encoding — the frontend's dense array is built from this on load, and collapsed back to sparse FrameSample entries on save.
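The sparse-to-dense conversion and its inverse can be sketched as follows. This is written in Python for consistency with the rest of the page, not in the frontend's own language. The function names and the `n_frames` parameter are hypothetical, and only the bounding-box fields are carried over, so other FrameSample fields such as confidence or ts are dropped in this simplified version.

```python
from typing import Optional

BBox = tuple[float, float, float, float]  # (x, y, w, h), normalized


def to_dense(samples: list[dict], n_frames: int) -> list[Optional[BBox]]:
    """Expand sparse FrameSample dicts into a null-padded list indexed by frame number."""
    dense: list[Optional[BBox]] = [None] * n_frames
    for s in samples:
        idx = s["frame_index"]
        if not 0 <= idx < n_frames:
            raise ValueError(f"frame_index {idx} outside 0..{n_frames - 1}")
        dense[idx] = (s["x"], s["y"], s["w"], s["h"])
    return dense


def to_sparse(dense: list[Optional[BBox]]) -> list[dict]:
    """Collapse a null-padded list back into sparse entries, skipping empty frames."""
    return [
        {"frame_index": i, "x": b[0], "y": b[1], "w": b[2], "h": b[3]}
        for i, b in enumerate(dense)
        if b is not None
    ]
```

A dense array makes gap interpolation a matter of indexing neighbouring frames, while the sparse form keeps the stored blob proportional to the number of actual detections.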

Rebuilding from ClassificationDynamicAttrs

For images only, the msgpack can be rebuilt on the fly from the relational dyn_attrs JSON columns if the file goes missing (e.g. after a storage misconfiguration): SmartFrameService().build_from_dyn_attrs(classification=...). Video tracks cannot be rebuilt this way. The full per-frame track only ever exists in the msgpack (or in the project's ObjectFrameObservation hypertable snapshot, if it has been populated), never in ClassificationDynamicAttrs itself, which stores only the first occurrence's bbox.

See also