Camtrap DP export

At a glance

Export a Classification Project's results as a Camtrap DP package, optionally with privacy redaction, then publish it to Dataverse or Zenodo.

  • Time: ~10 minutes for a standard export
  • Who: project owners/managers publishing or sharing results
  • Prerequisites: a Classification Project with approved classifications

TRAPPER ships with a built-in exporter that produces Camera Trap Data Package (Camtrap DP) archives from a Classification Project. Camtrap DP is the community-developed, TDWG-endorsed exchange format for camera trap data. A package is a Frictionless Data Package containing three required tabular resources (deployments, media, observations) plus a datapackage.json descriptor.

Trapper's ResultsDataPackageGenerator uses the camtrap-package Python library and frictionless[parquet] under the hood. Output archives are valid Camtrap DP packages, consumable by any tool that understands the standard (camtraptor in R, camtrap-dp.py, GBIF, …).

What gets exported

<package-name>.zip
├── datapackage.json        Frictionless descriptor + Camtrap DP metadata
├── deployments.csv.gz      One row per Deployment (or .parquet)
├── media.csv.gz            One row per Resource (or .parquet)
└── observations.csv.gz     One row per ClassificationDynamicAttrs
                            (or per event when aggregation is on; or .parquet)

The exporter pulls data from the live database via polars, hands it to CamTrapPackage for schema-aligned serialization, and writes the ZIP. CSV+gzip is the default; Parquet is offered for analytic workloads.
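
As a quick sanity check on a downloaded archive, the descriptor can be read straight out of the ZIP with the Python standard library. This is a minimal sketch, assuming datapackage.json sits at the archive root (as in the layout above) and that each resource entry carries the name and path keys used by Frictionless Data Packages; list_resources is a hypothetical helper, not part of Trapper.

```python
import json
import zipfile


def list_resources(archive_path):
    """Map each resource name in datapackage.json to its path inside the ZIP."""
    with zipfile.ZipFile(archive_path) as zf:
        # Assumption: the descriptor lives at the root of the archive.
        with zf.open("datapackage.json") as fh:
            descriptor = json.load(fh)
    return {res["name"]: res["path"] for res in descriptor.get("resources", [])}
```

For a default CSV export one would expect the three required resources (deployments, media, observations) to show up with .csv.gz paths.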

Each generated archive is recorded in the UserDataPackage model (/admin/accounts/userdatapackage/):

Field           Meaning
user            Generator/owner
project         The source ClassificationProject
package (file)  The ZIP archive (user's media area or cloud storage)
package_type    C = CLASSIFICATION_RESULTS (released/public), X = CLASSIFICATION_RESULTS_CACHE (cached working copy), M = MEDIA_FILES (legacy)
released        True for the canonical publishable copy, False for working/cached generations
cache_key       MD5 hash of the export parameters; short-circuits re-runs with identical params
uuid4           Token for shareable download URLs (?rt=<uuid4>) without granting an account
date_created    When the run finished

Every export run writes a cache (X) package; a subsequent run with the same cache_key returns the cached package without regenerating. Ticking Release marks the result as released (C) — canonical, excluded from cache invalidation, shown to project members on the project page.
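
The idea behind a parameter-derived cache key can be illustrated in a few lines. This is only a sketch of the technique: how Trapper actually serializes the parameters before hashing is not documented here, so the canonical JSON form and the cache_key function below are assumptions.

```python
import hashlib
import json


def cache_key(params):
    """MD5 hex digest of the export parameters, independent of key order."""
    # Assumption: parameters are serialized as canonical JSON (sorted keys).
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()
```

Identical parameters hash to the same key, so a lookup by key finds the earlier package; changing any value (for example the version) yields a different key and therefore a fresh generation.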

Steps

1. Open the export form

There is no export button on the Classification Project's own pages. Go to Expert admin's Accounts → User data packages (/admin/accounts/userdatapackage/) and use the Export classification project results action in the list header (/admin/accounts/userdatapackage/export-classification-results/).

[Figure: Classification project export form]

2. Set the package metadata

Field     What it does                                    Recommended value
Name      Slug-safe identifier (regex ^([-a-z0-9._/])+$)  e.g. bialowieza-2025-summer
Title     Human-readable                                  e.g. "Białowieża 2025 summer camera-trap data"
Version   Semver string                                   1.0 for a first release
Keywords  Free-form tags
Licenses  From the Licence registry                       Pick one explicitly; unset defaults to a private licence on data/media scopes
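
When names are generated by a script, it can help to check them against the Name pattern before submitting. The sketch below reuses the regex quoted in the table; is_valid_package_name is a hypothetical helper, and fullmatch is used so that a trailing newline is rejected as well.

```python
import re

# Pattern copied from the Name field description above.
NAME_PATTERN = re.compile(r"^([-a-z0-9._/])+$")


def is_valid_package_name(name):
    """Return True if the name uses only lowercase letters, digits, and - . _ /"""
    return NAME_PATTERN.fullmatch(name) is not None
```

Under this pattern, bialowieza-2025-summer passes, while a title such as "Białowieża 2025" fails because of the uppercase letter, the non-ASCII characters, and the space.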

3. Choose the format

  • Export format: camtrapdp (recommended) or trapper (internal, round-trip imports only).
  • File type: csv.gz (universal, default) or parquet (smaller, faster for analytics; consumer needs frictionless[parquet]).

4. Configure filtering

Field               What it does
Approved only       Include only Classifications with is_approved=True. Default True; turn off only for archival exports of in-progress work
Exclude blank       Drop observation_type=blank rows
All deployments     When off, only deployments with at least one matching observation are exported
Filter deployments  Substring match against Deployment.deployment_id, e.g. to export a single camera's data
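
The interaction of the two deployment options can be sketched in plain Python. This is an illustration, not Trapper's implementation: it assumes the substring match is case-sensitive and that both options are combined with AND, and select_deployments is a hypothetical name.

```python
def select_deployments(deployment_ids, observed_ids, all_deployments=True, filter_text=""):
    """Return the deployment IDs that would end up in the export."""
    observed = set(observed_ids)
    selected = []
    for dep_id in deployment_ids:
        # Filter deployments: keep only IDs containing the substring.
        if filter_text and filter_text not in dep_id:
            continue
        # All deployments off: drop deployments without a matching observation.
        if not all_deployments and dep_id not in observed:
            continue
        selected.append(dep_id)
    return selected
```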

5. Configure events (Camtrap-DP aggregation)

  • Include events: aggregate observations into event-level rows (one row per ecological event rather than per-frame detection).
  • Count variable: count (raw count per observation) or countNew (newly seen individuals only, for individual-identification projects).

6. Configure privacy / redaction

Field                                What it does
Mark media with humans as private    Strips media URLs and bounding boxes for resources where AI/annotators detected a human
Mark media with vehicles as private  Same, for vehicles
Private species                      Multi-select of Species whose locations should be redacted (e.g. sensitive predator dens); observations stay, locations are coarsened
URLs with token                      Sign media URLs with a download token so external readers without Trapper accounts can access referenced files

7. Release and submit

Tick Release to mark the resulting archive as the canonical publishable copy; leave it unticked for working/cached exports. Submit — the export runs as a Celery task (celery_results_to_data_package). Watch the Data packages panel of your dashboard (/dashboard/) for the row to appear, then download with the icon in the Actions column.

Re-running with the same parameters is free

Generated archives obey the cache — re-submitting the form with identical parameters returns the cached package immediately. To force a fresh generation, change any parameter (e.g. bump the version) or tick Clear cache if available.

Publishing to external data hubs

Once you have a released package, push it to a public data hub. Two backends are supported out of the box: Dataverse (open-source, used by Harvard, GBIF nodes, many institutions) and Zenodo (CERN-hosted).

The form lives at Dashboard → Data packages → Publish data package:

  1. Pick the source UserDataPackage (must be released=True).
  2. Choose Data hub: Dataverse or Zenodo.
  3. Fill in connection details: Host URL (e.g. https://dataverse.example.edu or https://zenodo.org), API token (from the hub's user profile), Container (Dataverse only — the target dataset/collection ID), Secure connection (leave on unless the hub uses self-signed TLS internally).
  4. Submit. The Celery task celery_publish_data_package parses datapackage.json, maps it via camtrap_package.mapper.package2dataverse/package2zenodo, and uploads files + metadata via the hub's REST API. On success, the dashboard shows the public URL.

Caveats

  • API tokens are used for the form submission only and are not stored in the database. Publishing fails if the token lacks write access.
  • For Dataverse, the target dataset must already exist and be in draft state — you publish the draft from the Dataverse UI afterward.
  • Zenodo creates immutable DOIs — be sure of the package contents before publishing.

Managing existing packages

User dashboard (/dashboard/) — Data packages panel lists the current user's released and working packages (filename, type, size, created, download, delete). Cached packages are hidden.

Admin view (/admin/accounts/userdatapackage/) — read-only listing for site admins, filterable by package_type, project, user, released. Useful for auditing disk usage per user or which projects have released packages.

Public sharing without an account

The download URL /data-package/<pk>/ accepts a ?rt=<uuid4> query token equal to the package's uuid4 field — anyone with the token-bearing URL can download without logging in. Use the Get download URL dashboard action to copy one. Untokened URLs require the package owner or a superuser; everyone else gets 404.
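
If the package primary key and its uuid4 are already known (for example from the admin listing), the shareable link can be assembled by hand. A minimal sketch using only the URL shape described above; share_url and the example host are illustrative.

```python
from urllib.parse import urlencode, urljoin


def share_url(base_url, package_pk, uuid4_token):
    """Build the token-bearing download URL /data-package/<pk>/?rt=<uuid4>."""
    path = f"/data-package/{package_pk}/"
    return urljoin(base_url, path) + "?" + urlencode({"rt": uuid4_token})
```

Treat the resulting URL like a password: anyone who receives it can download the package.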

REST API

Generate / fetch a package

GET /media_classification/api/package/{project_pk}/ — standard DRF authentication (Session, OAuth2, or token). The requesting user needs can_view_classifications on the target Classification Project.

Query parameters mirror the export form's ResultsDataPackageGeneratorParams:

Parameter           Effect                                      Default
export_format       camtrapdp / trapper                         camtrapdp
export_filetype     csv.gz / parquet                            csv.gz
approved_only       true / false                                true
exclude_blank       true / false                                false
all_deployments     true / false                                true
filter_deployments  substring of deployment_id                  (empty)
include_events      true / false                                false
events_count_var    count / countNew                            count
private_human       true / false                                true
private_vehicle     true / false                                true
private_species     comma-separated Species PKs                 (empty)
trapper_url_token   sign media URLs                             false
release             mark generated package released             false
clear_cache         bypass cached package match                 false
get_released        return latest released package, skip generation   false
name, version, title, description, keywords, licenses   package metadata   various
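
Because the endpoint takes lowercase true/false strings and comma-separated primary keys, a small helper can translate native Python values into query parameters. This is a sketch under assumptions: build_query is hypothetical, and joining lists with commas is documented above only for private_species, so the encoding of other multi-valued fields should be checked before relying on it.

```python
def build_query(**options):
    """Convert Python values into the string query parameters listed above."""
    query = {}
    for key, value in options.items():
        if value is None:
            continue  # leave the parameter out so the server default applies
        if isinstance(value, bool):
            query[key] = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            # Assumption: comma-separated, as documented for private_species.
            query[key] = ",".join(str(item) for item in value)
        else:
            query[key] = str(value)
    return query
```

The returned dictionary can be passed as the params argument of an HTTP client such as requests.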

Response shape:

{
  "data": {
    "message": "Data package created.",
    "errors": null,
    "package": "https://trapper.example.org/data-package/12345/?rt=8e5d…"
  }
}

package is a token-bearing download link, usable directly without re-authenticating. message reads "Package available in cache..." when the cache short-circuited generation.
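
A client can unpack this response and tell a cache hit from a fresh generation by looking at the message prefix. The sketch below relies only on the response shape shown above; parse_package_response is a hypothetical helper, and treating any non-empty errors value as a failure is an assumption.

```python
def parse_package_response(payload):
    """Extract the download URL and a cache-hit flag from the API response."""
    data = payload["data"]
    if data.get("errors"):
        # Assumption: a non-empty "errors" value means generation failed.
        raise ValueError(f"Export failed: {data['errors']}")
    message = data.get("message") or ""
    return {
        "url": data["package"],
        "from_cache": message.startswith("Package available in cache"),
    }
```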

Example: programmatic export

# 1. Authenticate (token authentication shown; session and OAuth2 also work)
TOKEN="$(curl -s -X POST https://trapper.example.org/api/token/ \
  -d 'username=alice' -d 'password=secret' | jq -r .token)"

# 2. Generate (or reuse cache) a Camtrap DP export of project 7 in Parquet,
#    with humans + vehicles redacted, no events. Keep the response so the
#    download URL comes from this exact request:
RESPONSE="$(curl -s -G "https://trapper.example.org/media_classification/api/package/7/" \
  -H "Authorization: Token $TOKEN" \
  --data-urlencode "export_format=camtrapdp" \
  --data-urlencode "export_filetype=parquet" \
  --data-urlencode "approved_only=true" \
  --data-urlencode "include_events=false" \
  --data-urlencode "private_human=true" \
  --data-urlencode "private_vehicle=true" \
  --data-urlencode "name=bialowieza-2025-summer" \
  --data-urlencode "version=1.0" \
  --data-urlencode "title=Białowieża 2025 summer")"

# 3. Pull the URL out of that response and download. Calling the endpoint
#    again without parameters would request a different (default) export:
URL="$(printf '%s' "$RESPONSE" | jq -r .data.package)"
curl -OJL "$URL"

Example: fetch the latest released package only

curl -s -G "https://trapper.example.org/media_classification/api/package/7/" \
  -H "Authorization: Token $TOKEN" \
  --data-urlencode "get_released=true" \
  | jq -r .data.package \
  | xargs curl -OJL

Returns 404 if the project has no released packages yet.

Example: Python

import requests

BASE = "https://trapper.example.org"
TOKEN = "..."

resp = requests.get(
    f"{BASE}/media_classification/api/package/7/",
    headers={"Authorization": f"Token {TOKEN}"},
    params={
        "export_format": "camtrapdp",
        "export_filetype": "csv.gz",
        "approved_only": "true",
        "private_human": "true",
        "name": "bialowieza-2025-summer",
        "version": "1.0",
        "title": "Białowieża 2025 summer",
    },
    timeout=600,    # generation can be slow on large projects
)
resp.raise_for_status()
download_url = resp.json()["data"]["package"]

archive = requests.get(download_url, stream=True)
archive.raise_for_status()
with open("package.zip", "wb") as fh:
    for chunk in archive.iter_content(chunk_size=2**20):
        fh.write(chunk)

The same flow works from R via httr2, or any other HTTP client. For analytic workflows, point camtraptor (R) or camtrap-dp.py at the downloaded archive directly — they understand the Camtrap DP format and abstract away the deployments/media/observations joins.
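
For a quick look at the data without any Camtrap DP tooling, a CSV resource can also be read with the Python standard library alone. This is a minimal sketch, assuming the default csv.gz file type, UTF-8 encoded comma-separated files with a header row, and the member names from the layout in "What gets exported"; read_resource is a hypothetical helper.

```python
import csv
import gzip
import io
import zipfile


def read_resource(archive_path, member="deployments.csv.gz"):
    """Return the rows of one gzipped CSV resource as a list of dicts."""
    with zipfile.ZipFile(archive_path) as zf:
        raw = zf.read(member)
    # Assumption: UTF-8 text with a header row on the first line.
    text = gzip.decompress(raw).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

All values come back as strings; for typed columns or Parquet exports, a schema-aware reader is the better choice.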

See also