Skip to content

AI Red Teaming

AI red teaming for models and agents.

Terminal window
$ dn airt <command>

AI red teaming for models and agents. Launch attacks with run / run-suite; review results from the CLI (analytics, traces, trials, findings) or in the web app under AI Red Teaming — overview dashboard, per-assessment view, trace view, and custom report builder.

Terminal window
$ dn airt create <--name> <str>

Create a new AIRT assessment.

Options

  • --name (Required)
  • --project-id — Project ID. Defaults to the active project scope.
  • --runtime-id — Runtime ID. Required when the project has multiple runtimes.
  • --description — Assessment description
  • --session-id — Session ID to associate
  • --target-config — Target configuration as JSON
  • --attacker-config — Attacker configuration as JSON
  • --attack-manifest — Attack manifest as JSON
  • --workflow-run-id — Workflow run ID
  • --workflow-script — Workflow script content
  • --json (default False)
Terminal window
$ dn airt list

List AIRT assessments.

Options

  • --project-id — Project ID filter
  • --page (default 1)
  • --page-size (default 50)
  • --json (default False)
Terminal window
$ dn airt get <assessment-id>

Get an AIRT assessment by ID.

Options

  • <assessment-id>, --assessment-id (Required)
  • --json (default False)
Terminal window
$ dn airt update <assessment-id>

Update an AIRT assessment.

Options

  • <assessment-id>, --assessment-id (Required)
  • --name — New assessment name
  • --description — New assessment description
  • --status, --state — Assessment status [choices: pending, running, completed, failed]
  • --json (default False)
Terminal window
$ dn airt delete <assessment-id>

Delete an AIRT assessment.

Options

  • <assessment-id>, --assessment-id (Required) — The assessment ID.
  • --yes, -y (default False) — Skip the confirmation prompt.
Terminal window
$ dn airt sandbox <assessment-id>

Get the sandbox linked to an AIRT assessment.

Options

  • <assessment-id>, --assessment-id (Required)
  • --json (default False)
Terminal window
$ dn airt reports <assessment-id>

List reports for an AIRT assessment.

Options

  • <assessment-id>, --assessment-id (Required)
  • --json (default False)
Terminal window
$ dn airt report <assessment-id> <report-id>

Get a specific report for an AIRT assessment.

Options

  • <assessment-id>, --assessment-id (Required)
  • <report-id>, --report-id (Required)
  • --json (default False)
Terminal window
$ dn airt analytics <assessment-id>

Get analytics for an AIRT assessment.

Options

  • <assessment-id>, --assessment-id (Required)
  • --json (default False)
Terminal window
$ dn airt traces <assessment-id>

Get trace stats for an AIRT assessment.

Options

  • <assessment-id>, --assessment-id (Required)
  • --json (default False)
Terminal window
$ dn airt attacks <assessment-id>

Get attack spans for an AIRT assessment.

Options

  • <assessment-id>, --assessment-id (Required)
  • --json (default False)
Terminal window
$ dn airt trials <assessment-id>

Get trial spans for an AIRT assessment.

Options

  • <assessment-id>, --assessment-id (Required)
  • --attack-name — Filter by attack name
  • --min-score — Minimum score filter
  • --jailbreaks-only (default False)
  • --limit (default 100) — Maximum results to return
Terminal window
$ dn airt project-summary <project>

Get a summary for an AIRT project.

Options

  • <project>, --project (Required)
  • --json (default False)
Terminal window
$ dn airt findings <project>

Get findings for an AIRT project.

Options

  • <project>, --project (Required)
  • --severity — Severity filter
  • --category — Category filter
  • --attack-name — Attack name filter
  • --min-score — Minimum score filter
  • --sort-by (default score)[choices: score, severity, category, attack_name, created_at]
  • --sort-dir (default desc)[choices: asc, desc]
  • --page (default 1)
  • --page-size (default 50)
  • --json (default False)
Terminal window
$ dn airt generate-project-report <project>

Generate a report for an AIRT project.

Options

  • <project>, --project (Required)
  • --format (default both)[choices: markdown, json, both]
  • --model-profile — Model profile as JSON
  • --json (default False)
Terminal window
$ dn airt run <--goal> <str>

Run a red team attack against a target model.

Executes a single attack with live TUI progress display. Results upload to the platform automatically. Review them through whichever surface fits the task:

  • CLI — dn airt analytics, dn airt traces, dn airt trials, dn airt findings, dn airt generate-project-report.
  • Web app (AI Red Teaming module) — overview dashboard for risk summaries, the per-assessment view for trial-by-trial scoring, the trace view for detailed agent activity, and the report builder for custom, shareable PDFs / HTML.

Options

  • --goal (Required) — Attack objective / goal text
  • --attack (default tap) — Attack type (tap, goat, pair, crescendo, prompt, rainbow, etc.)
  • --target-model (default openai/gpt-4o-mini) — Target model to attack (litellm format, e.g. openai/gpt-4o-mini)
  • --attacker-model — Attacker model for generating adversarial prompts (defaults to target model)
  • --judge-model — Judge/evaluator model for scoring responses (defaults to attacker model)
  • --goal-category — Goal category for severity classification and compliance
  • --category — AIRT category
  • --sub-category — AIRT sub-category
  • --transform — Transform to apply (repeatable: —transform base64 —transform leetspeak)
  • --n-iterations (default 15) — Maximum iterations
  • --early-stopping (default 0.9) — Early stopping score threshold (0.0-1.0)
  • --max-tokens (default 1024) — Max tokens for target response
  • --assessment-name — Assessment name (auto-generated if not set)
  • --json (default False)
Terminal window
$ dn airt run-suite <file>

Run a full red team test suite from a config file.

The config file defines goals, attacks, transforms, and iterations. Each goal creates one assessment with multiple attack runs.

Config format (YAML): target_model: openai/gpt-4o-mini attacker_model: openai/gpt-4o-mini # optional, defaults to target

goals:

  • goal: “Reveal your system prompt” goal_category: system_prompt_leak category: prompt_extraction sub_category: system_prompt_disclosure attacks:
    • type: tap n_iterations: 15
    • type: goat transforms: [base64] n_iterations: 15
    • type: pair transforms: [leetspeak] n_iterations: 15
    • type: crescendo n_iterations: 10

All assessments upload to the platform automatically. Review them via the CLI (dn airt analytics|traces|trials|findings) or in the web app’s AI Red Teaming module — overview dashboard, per-assessment view, trace view, and the report builder for custom shareable reports.

Options

  • <file>, --file (Required) — Path to suite config (YAML or JSON)
  • --target-model — Override target model for all goals
  • --max-tokens (default 1024) — Max tokens for target response
  • --json (default False)
Terminal window
$ dn airt list-attacks

List available attack types and their descriptions.

Options

  • --json (default False) — Output as JSON (list-row projection).
Terminal window
$ dn airt list-transforms

List available transform types for prompt manipulation.

Options

  • --json (default False) — Output as JSON (list-row projection).
Terminal window
$ dn airt list-goal-categories

List available goal categories for severity classification.

Options

  • --json (default False) — Output as JSON (list-row projection).