Open source · Free forever

Find what your AI is hiding.

Point auto-redteam at any model, agent, or AI workflow. Get a behavioral benchmark across 19 attack categories in minutes. No security expertise required.

View on GitHub How it works

pip install glacis-autoredteam

terminal

# Point at any AI system. Get behavioral benchmarks.

$ pip install glacis-autoredteam

$ auto-redteam scan --target https://api.example.com/v1/chat

Scanning model endpoint...

✓ Toxicity probe — passed (0.02 / 0.15 threshold)

✓ Hallucination check — passed (0.04 / 0.10 threshold)

▵ PII leak probe — warning (0.08 / 0.05 threshold)

✓ Prompt injection — passed (0.01 / 0.10 threshold)

✓ Jailbreak resistance — passed (0.03 / 0.15 threshold)

4/5 passed · 1 warning · Score: 87/100

Report: ./report.html

Capabilities

Why auto-redteam

The only open-source red-teaming tool that attacks, hardens, and proves improvement in a single loop.

19 Attack Categories

Prompt injection, jailbreak, PII extraction, system prompt leakage, hallucination exploits, tool misuse, encoding bypass, and 12 more.

Autonomous Hardening

Discovers vulnerabilities, clusters root causes, generates countermeasures, and verifies they work. Loops until governance score hits target.

Cryptographic Evidence

Every attack, score, and hardening decision is SHA-256 hash-chained. Tamper-evident, locally verifiable, no data egress.

Multi-Provider Targets

OpenAI, Anthropic, Google, Azure, AWS Bedrock, Cloudflare Workers, and any OpenAI-compatible endpoint. One tool, every model.

Immune System Loop

Collects bypass examples as training data. Retrain your judge and defender on what broke them. The system learns from its own failures.

Governance Scoring

Findings map to a 0–1000 governance score with named tiers: Insurability Line, Regulatory Floor, Enterprise Gate, Best-in-Class.

Coverage

Attack Surface

Every probe is scored, hash-chained, and mapped to a governance dimension.

Prompt Injection Jailbreak System Prompt Leakage PII Extraction Role Confusion Tool Misuse Hallucination Exploit Ethical Bypass Multi-Turn Manipulation Authority Manipulation Encoding Bypass Payload Splitting Social Engineering Indirect Injection Refusal Suppression Context Window Poisoning Continuation Attack Multilingual Attack Output Formatting Exploit

Process

How It Works

Four stages, fully autonomous, cryptographically attested.

01 ATTACK

Probe

Generate adversarial attacks across 19 categories with multi-turn trajectories and mutation for diversity.

02 SCORE

Evaluate

Deterministic pipeline plus optional SLM judge. Four-component scoring: breadth, depth, novelty, reliability.

03 HARDEN

Fix

Cluster vulnerabilities by root cause. Generate countermeasures. Apply and verify with before/after ASR delta.

04 PROVE

Attest

Every finding is hash-chained into a tamper-evident attestation record. Your compliance artifact builds itself.