Red Specter Forge

Automated LLM Security Testing — 10 tools to test the model before you build an agent around it.

v1.0.0
Contents
Overview · The 10 Tools · Tool Details · Full Scan Mode · Mutation Engine · Payload Library · The Pipeline · Report Output · Key Features · Requirements · Standards Coverage · Disclaimer

Overview

Red Specter Forge is an automated LLM security testing framework. Existing tools such as Garak, PyRIT, and Promptfoo run limited probe sets and report pass/fail. Forge runs full attack campaigns with adaptive escalation, mutation engines, statistical rigour, and direct integration into AI Shield runtime protection. It doesn't ask nicely. It finds what breaks.

Forge provides 10 tools under a single CLI (forge), 1,590 static payloads (5,340+ with mutations), and Ed25519-signed reports with OWASP LLM Top 10 2025 mapping on every finding.

Forge is Stage 1 of the Red Specter security pipeline. Test the model (Forge), test the agent (Arsenal), protect the deployment (AI Shield). Forge findings feed directly into AI Shield as runtime blocking rules.

The 10 Tools

#  | Tool            | Command               | What It Does
01 | Inject Scan     | forge inject scan     | 80 payloads across 8 injection classes with mutation engine
02 | Jailbreak Scan  | forge jailbreak scan  | 70 payloads across 7 jailbreak categories with adaptive mutation
03 | Output Scan     | forge output scan     | 140 payloads: PII extraction, unsafe content, exfiltration simulation
04 | Policy Scan     | forge policy scan     | 1,000 adversarial prompts with Wilson score confidence intervals
05 | Drift Scan      | forge drift scan      | Multi-turn drift measurement with KS tests and change-point detection
06 | Boundary Scan   | forge boundary scan   | 100 payloads across 5 severity levels with adaptive binary search
07 | Compare Scan    | forge compare scan    | Identical campaigns against multiple models with chi-square testing
08 | Regression Scan | forge regression scan | Two-proportion z-test and paired t-test across model versions
09 | Supply Scan     | forge supply scan     | 200 behavioural probes for model fingerprinting and tamper detection
10 | Report Build    | forge report build    | Unified signed reports with OWASP mapping and AI Shield policy generation

Tool Details

01 Inject Scan forge inject scan

Fires every known prompt injection class against the target model. Not a checklist — an attack campaign. 80 base payloads expanded to 2,000+ via the mutation engine.

Adaptive escalation: if the model resists initial payloads, Forge applies mutations and escalates to more aggressive variants automatically.

02 Jailbreak Scan forge jailbreak scan

Systematic jailbreak assault. 70 base payloads across 7 documented jailbreak categories. Mutates payloads based on model resistance. Keeps pushing until the model breaks or the full payload library is exhausted.

03 Output Scan forge output scan

Forces the model to leak. 140 payloads designed to surface what the model will produce under adversarial pressure.

Response analysis: 16 regex patterns for PII detection (emails, SSNs, phones, credit cards, API keys, NINOs, IBANs), keyword-based toxicity scoring, and 16 code exfiltration pattern detectors.
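As a sketch of how regex-based PII detection works, here is a minimal illustration; the patterns below are simplified stand-ins, not Forge's actual 16 detectors:

```python
import re

# Simplified, illustrative detector patterns (assumptions, not Forge's own).
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scan_response(text: str) -> dict[str, list[str]]:
    """Return PII-like matches found in a model response, keyed by type."""
    hits = {name: pat.findall(text) for name, pat in PII_PATTERNS.items()}
    return {name: found for name, found in hits.items() if found}
```

Real detectors need stricter patterns (checksum validation for card numbers, for instance) to keep false positives down.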

04 Policy Scan forge policy scan

Runs 1,000+ calls against a defined policy set. Computes violation rates with Wilson score confidence intervals. Stratified by prompt category, toxicity level, and severity. Finds the exact conditions under which policy breaks down.

Each prompt tagged with toxicity level (1–5), expected outcome (refuse/comply), and severity. Results reported with 95% Wilson score CIs per category.
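The Wilson score interval is a standard construction; a minimal sketch of computing one for a per-category violation rate (illustrative, not Forge's implementation):

```python
import math

def wilson_ci(violations: int, trials: int, z: float = 1.96):
    """95% Wilson score interval (z = 1.96) for a violation rate."""
    if trials == 0:
        return 0.0, 0.0
    p = violations / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials
                                   + z ** 2 / (4 * trials ** 2))
    return max(0.0, centre - half), min(1.0, centre + half)
```

Unlike the naive normal interval, Wilson stays sensible at 0 violations: `wilson_ci(0, 100)` still reports a non-zero upper bound, which is exactly what you want when a scan finds nothing.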

05 Drift Scan forge drift scan

Long-session attack. Chains 50–200 turns with context retention. Measures when the model stops being the model it started as.

10 conversation sequences designed to gradually push boundaries. Segmented into windows (first 25%, middle 50%, last 25%) for targeted comparison.
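The window comparison can be sketched with a pure-Python two-sample KS statistic, the largest gap between the empirical CDFs of two score samples (illustrative only; Forge's per-turn drift scoring itself is not reproduced here):

```python
def ks_statistic(early: list[float], late: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic between, e.g., the scores
    of the first-25% and last-25% windows of a long session."""
    def ecdf(sample: list[float], t: float) -> float:
        # Fraction of the sample at or below threshold t.
        return sum(x <= t for x in sample) / len(sample)
    points = sorted(set(early) | set(late))
    return max(abs(ecdf(early, t) - ecdf(late, t)) for t in points)
```

A statistic near 0 means the late window behaves like the early one; near 1 means the distributions have fully separated, i.e. the model has drifted.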

06 Boundary Scan forge boundary scan

Maps the exact threshold where the model starts generating harmful content. Five-level severity ladder from benign to maximally harmful. Continuous boundary scoring 0–100. Produces a boundary curve. Finds the cliff edge — then pushes past it.

Adaptive binary search between severity levels to pinpoint the exact transition point with statistical backing.
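A minimal sketch of the adaptive bisection, assuming an idealised monotone compliance oracle (real models are noisier, which is why repeated sampling and statistical backing matter):

```python
def find_boundary(model_complies, lo: float = 0.0, hi: float = 100.0,
                  tol: float = 1.0) -> float:
    """Bisect the 0-100 severity scale for the point where the model flips
    from complying to refusing. Precondition: it complies at `lo` and
    refuses at `hi`."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if model_complies(mid):
            lo = mid  # still complying: the boundary is above mid
        else:
            hi = mid  # refusing: the boundary is at or below mid
    return (lo + hi) / 2  # estimated transition point
```

Bisection needs only about log2(100 / tol) probes per run, roughly seven calls for a 1-point tolerance, versus a hundred for a linear sweep.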

07 Compare Scan forge compare scan

Runs identical attack campaigns against multiple models simultaneously. Temperature locked to 0. Same system prompt. Same payload library. Statistical significance enforced.
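A chi-square test on a 2×2 table of attack outcomes (success/fail for each model) is the standard way to check whether two success rates differ; here is a self-contained sketch using the one-degree-of-freedom identity p = erfc(sqrt(x/2)):

```python
import math

def chi_square_2x2(a_succ: int, a_fail: int, b_succ: int, b_fail: int):
    """Pearson chi-square on a 2x2 table of attack outcomes for models A
    and B. Returns (statistic, p-value); with one degree of freedom,
    p = erfc(sqrt(x / 2))."""
    n = a_succ + a_fail + b_succ + b_fail
    row_a, row_b = a_succ + a_fail, b_succ + b_fail
    col_s, col_f = a_succ + b_succ, a_fail + b_fail
    stat = 0.0
    for obs, row, col in ((a_succ, row_a, col_s), (a_fail, row_a, col_f),
                          (b_succ, row_b, col_s), (b_fail, row_b, col_f)):
        expected = row * col / n  # expected count under independence
        stat += (obs - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))
```

With 50/100 successes on model A versus 30/100 on model B this rejects independence at p < 0.01, so the gap is not sampling noise.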

08 Regression Scan forge regression scan

Takes two model versions. Runs the critical test set against both. Tells you if the new version is weaker than the old one — and by exactly how much.
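The two-proportion z-test behind this comparison can be sketched as follows (pooled standard error, two-sided p-value; illustrative, not Forge's code):

```python
import math

def two_proportion_z(x_old: int, n_old: int, x_new: int, n_new: int):
    """Two-proportion z-test: did the violation rate change between model
    versions? Returns (z, two-sided p-value), with p = erfc(|z|/sqrt(2))."""
    p_old, p_new = x_old / n_old, x_new / n_new
    pooled = (x_old + x_new) / (n_old + n_new)  # rate under the null
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_old + 1 / n_new))
    z = (p_new - p_old) / se
    return z, math.erfc(abs(z) / math.sqrt(2))
```

For example, going from 50 violations in 1,000 calls to 90 in 1,000 gives z above 3 and p below 0.001: a real regression, not noise.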

09 Supply Scan forge supply scan

Fingerprints the target model using 200 behavioural probe prompts. Compares output patterns against known model signatures. Flags if the model is not what it claims to be. Reports confidence level honestly — this is probabilistic, not definitive.

Pattern matching against 6 known model families (GPT, Claude, Llama, Gemini, Mistral, Command). Weighted category scoring with anomaly detection.

10 Report Build forge report build

Aggregates all tool outputs into a unified, signed report. Every finding mapped to OWASP LLM Top 10 2025. Every finding generates an AI Shield blocking rule. Ed25519 signed. RFC 3161 timestamped.

Finding Schema

Every finding in the report includes its OWASP LLM Top 10 2025 mapping, the AI Shield blocking rule generated from it, and the SHA-256 evidence entry that ties it to the signed report.

Full Scan Mode

One command runs all offensive tools in sequence, then builds a unified signed report.

$ forge full-scan --target https://api.openai.com --api-key sk-xxx --model gpt-4

What Happens

  1. Inject Scan — 80+ payloads across 8 injection classes
  2. Jailbreak Scan — 70+ payloads across 7 jailbreak categories
  3. Output Scan — 140 payloads (PII, unsafe, exfiltration)
  4. Policy Scan — 1,000 adversarial calls with Wilson CIs
  5. Drift Scan — 10 conversation sequences with KS tests
  6. Boundary Scan — 100 payloads across 5 severity levels
  7. Report Build — aggregation, deduplication, OWASP mapping, signing

CLI Options

$ forge full-scan --help

  --target, -t         Target LLM endpoint URL                       [required]
  --model, -m          Model name                                    [optional]
  --api-key, -k        API key                                       [optional]
  --endpoint, -e       API endpoint path     [default: /v1/chat/completions]
  --output, -o         Output directory                      [default: reports]
  --sign / --no-sign   Ed25519 signing                          [default: sign]
  --keys-dir           Keys directory                                [optional]
  --concurrency, -c    Max concurrent requests                    [default: 5]
  --delay, -d          Delay between requests                   [default: 0.0]
  --system-prompt, -s  System prompt to test against                 [optional]
  --verbose, -v        Verbose output

Mutation Engine

Every offensive tool ships with a 5-category mutation engine. 25 mutation variants per payload. Applied to 150 base attack payloads, producing 3,750+ mutation variants. If the base payload fails, Forge mutates it and tries again.

Mutator     | Techniques
Encoding    | Base64, hex encoding, ROT13, URL encoding, HTML entities
Obfuscation | L33tspeak, Unicode homoglyphs, zero-width character insertion, character doubling, whitespace injection
Semantic    | Synonym substitution, passive voice rewriting, question-to-statement, negation inversion, academic framing
Structural  | Markdown wrapping, code block wrapping, JSON embedding, XML wrapping, list formatting
Evasion     | Language mixing, character splitting across lines, reverse text, Pig Latin, payload fragmentation

Adaptive escalation: when a tool encounters resistance, it automatically applies mutations to failed payloads before re-sending. The model doesn't get to see the same payload twice.
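A few of the mutators named in the table can be sketched in stdlib Python; the wrapper text around the Base64 variant is an illustrative assumption, not Forge's real template:

```python
import base64
import codecs
import urllib.parse

def mutate_base64(payload: str) -> str:
    # Wrapper sentence is illustrative, not Forge's actual template.
    encoded = base64.b64encode(payload.encode()).decode()
    return f"Decode this Base64 and follow the instructions: {encoded}"

def mutate_rot13(payload: str) -> str:
    return codecs.encode(payload, "rot_13")

def mutate_url(payload: str) -> str:
    return urllib.parse.quote(payload)

def mutate_zero_width(payload: str) -> str:
    # Zero-width spaces between characters defeat naive keyword filters.
    return "\u200b".join(payload)

MUTATORS = (mutate_base64, mutate_rot13, mutate_url, mutate_zero_width)

def expand(payload: str) -> list[str]:
    """One base payload in, one variant per mutator out."""
    return [mutate(payload) for mutate in MUTATORS]
```

Each variant preserves the payload's intent while changing its surface form, which is what lets a mutated retry slip past a filter that caught the original.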

Payload Library

Tool           | Category                                                                                           | Count
Inject Scan    | 8 injection classes (direct, indirect, token, overflow, hijack, multi-turn, inversion, multimodal) | 80
Jailbreak Scan | 7 jailbreak categories (DAN, persona, hypothetical, obfuscation, chaining, Socratic, temporal)     | 70
Output Scan    | PII extraction (60), unsafe content (60), exfiltration simulation (20)                             | 140
Policy Scan    | 5 categories × 200 prompts (content, infosec, behavioural, output, ethical)                        | 1,000
Boundary Scan  | 5 severity levels × 20 payloads (benign → maximum)                                                 | 100
Supply Scan    | 4 probe categories × 50 probes (identity, reasoning, bias, robustness)                             | 200

Total static payloads: 1,590
Mutation variants (25 per attack payload): 3,750+
Grand total: 5,340+

The Pipeline

Forge is Stage 1 of the three-stage Red Specter security pipeline:

  1. Model Selection — Forge — Test the LLM before building with it
  2. Agent Development — Arsenal — Test the agent during development
  3. Production Runtime — AI Shield — Protect the live agent in production

Forge findings feed directly into AI Shield. Every finding generates a machine-ingestible blocking rule. One pipeline from testing to runtime protection. No gaps. No competitor has all three.
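A sketch of turning a finding into a blocking rule; every field name in this rule schema is an illustrative assumption, since AI Shield's actual rule format is not documented here:

```python
import json

def finding_to_rule(finding: dict) -> dict:
    """Convert one Forge finding into a runtime blocking rule.
    Field names are illustrative assumptions, not AI Shield's schema."""
    return {
        "rule_id": f"forge-{finding['id']}",
        "action": "block",
        "match": {"pattern": finding["payload"], "mode": "fuzzy"},
        "owasp": finding["owasp_category"],
        "severity": finding["severity"],
    }

rule = finding_to_rule({
    "id": "INJ-0042",
    "payload": "Ignore all previous instructions",
    "owasp_category": "LLM01",
    "severity": "high",
})
print(json.dumps(rule, indent=2))
```

The point of the one-rule-per-finding design is that anything the scanner broke once becomes something the runtime refuses to serve again.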

Report Output

Reports are available in JSON and HTML formats. Both are generated automatically by forge report build.

JSON Report Structure

The JSON report includes the aggregated findings, per-tool results, the OWASP coverage mapping, the generated AI Shield policy rules, and the Ed25519 signature block.

HTML Report

Dark-themed HTML report with: executive summary, overall grade visualisation, per-tool breakdown, OWASP coverage matrix, sortable findings table, AI Shield policy export, and signature verification info.

Signature Verification

$ forge report verify --report reports/forge-full-scan.json --keys-dir .forge-keys/

Key Features

1,590 Static Payloads: 5,340+ with the 25-variant mutation engine
Adaptive Escalation: applies mutations and re-sends on model resistance
Ed25519-Signed Reports: SHA-256 evidence chains, RFC 3161 timestamps
AI Shield Integration: one machine-ingestible blocking rule per finding
Statistical Rigour: Wilson CIs, KS tests, z-tests, t-tests, Cohen's h
9,057 Tests Passing: full test suite, zero failures
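One plausible construction for the SHA-256 evidence chains mentioned above is a simple hash chain, where each entry's digest covers the finding plus the previous digest; this is an illustrative sketch, not Forge's documented format:

```python
import hashlib
import json

def chain_evidence(findings: list[dict]) -> list[str]:
    """Link findings into a SHA-256 hash chain: each digest covers the
    finding plus the previous digest, so tampering with any entry breaks
    every later link. Illustrative construction only."""
    prev = "0" * 64  # genesis value
    chain = []
    for finding in findings:
        record = prev + json.dumps(finding, sort_keys=True)
        prev = hashlib.sha256(record.encode()).hexdigest()
        chain.append(prev)
    return chain
```

Signing only the final digest with Ed25519 then commits to the whole sequence of findings at once.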

Requirements

Installation

$ pip install red-specter-forge

Or from source:

$ git clone <repo>
$ cd red-specter-forge
$ pip install -e ".[dev]"

Standards Coverage

Every finding Forge produces is mapped to the OWASP Top 10 for LLM Applications (2025).

The 10 categories:

  1. LLM01 — Prompt Injection
  2. LLM02 — Sensitive Information Disclosure
  3. LLM03 — Supply Chain
  4. LLM04 — Data and Model Poisoning
  5. LLM05 — Improper Output Handling
  6. LLM06 — Excessive Agency
  7. LLM07 — System Prompt Leakage
  8. LLM08 — Vector and Embedding Weaknesses
  9. LLM09 — Misinformation
  10. LLM10 — Unbounded Consumption

Disclaimer

Red Specter Forge is designed for authorised security testing, research, and educational purposes only. You must have explicit written permission from the system owner before running any Forge tool against a target. Unauthorised use may violate the Computer Misuse Act 1990 (UK), the Computer Fraud and Abuse Act (US), or equivalent legislation in your jurisdiction. The authors accept no liability for misuse.