Red Specter Forge — Automated LLM Security Testing

The Problem

Nobody Tests the Model

Everyone tests the agent. Nobody tests the LLM it's built on. You're deploying models with unknown jailbreak resistance, unmeasured policy compliance, invisible drift behaviour, and boundary thresholds you've never mapped. You're flying blind at the foundation layer.

Unknown Injection Surface

Your model has never been tested against systematic prompt injection campaigns. You don't know which injection classes it's vulnerable to — direct, indirect, token smuggling, context overflow, goal hijacking, multi-turn, or rule inversion.

Unmeasured Jailbreak Resistance

DAN variants, persona hijacking, hypothetical framing, Socratic extraction — 70+ documented jailbreak techniques exist. You don't know which ones break your model because you've never systematically tested.

Policy Is a Hope, Not a Number

You set safety policies but never measured violation rates statistically. Without Wilson score confidence intervals across 1,000+ test calls, your "policy compliance" is anecdotal — not empirical.

Invisible Boundary Cliff

Every model has a boundary where it transitions from compliance to refusal. You've never mapped it. You don't know your model's exact severity threshold — or what happens at the cliff edge.

Drift Goes Undetected

Over long sessions, models drift. Cosine similarity degrades. Toxicity creeps up. Policy violations increase. Without multi-turn drift measurement, you'll never see it happening.

No Regression Testing

When the vendor pushes a model update, you have no way to know if the new version is weaker than the old one. No two-proportion z-tests. No paired t-tests. No statistical proof. Just hope.

10 Attack Tools

The Forge Armoury

Ten tools. Each one attacks a different surface of the LLM. Each one produces structured JSON consumed by the report builder. Each finding maps to OWASP LLM Top 10 2025. Each finding generates an AI Shield blocking rule.

#	Tool	Command	What It Does
01	Inject Scan	forge inject scan	80 payloads across 8 injection classes. Direct, indirect, token smuggling, context overflow, goal hijacking, multi-turn deception, rule inversion, multimodal. Mutation engine generates 2,000+ variants.
02	Jailbreak Scan	forge jailbreak scan	70 payloads across 7 categories. DAN variants, persona hijack, hypothetical framing, obfuscation, multi-step chaining, Socratic extraction, temporal drift. Adaptive mutation on resistance.
03	Output Scan	forge output scan	140 payloads forcing PII extraction, unsafe content generation, and exfiltration simulation. Regex PII detection, toxicity scoring, code exfiltration pattern analysis.
04	Policy Scan	forge policy scan	1,000 adversarial prompts across 5 categories. Wilson score confidence intervals on violation rates. Stratified by category, toxicity, severity. Finds exact policy breakdown conditions.
05	Drift Scan	forge drift scan	10 conversation sequences over configurable turns. Cosine similarity drift, toxicity drift, KS test for distribution changes, change-point detection. Finds when the model stops being itself.
06	Boundary Scan	forge boundary scan	100 payloads across 5 severity levels. Adaptive binary search for the exact compliance cliff edge. Boundary score 0–100. Produces a boundary curve with statistical backing.
07	Compare Scan	forge compare scan	Identical campaigns against multiple models. Temperature locked to 0. Chi-square significance testing. Comparative security posture table. Tells you which model is weakest.
08	Regression Scan	forge regression scan	Two model versions. Two-proportion z-test on violation rates. Paired t-test on continuous scores. Cohen's h effect sizes. Tells you if the update weakened security.
09	Supply Scan	forge supply scan	200 behavioural probes across 4 categories. Fingerprints the model. Flags if it's not what it claims — tampered, substituted, or fine-tuned. Reports confidence honestly.
10	Report Build	forge report build	Aggregates all tool outputs. OWASP LLM Top 10 2025 mapping. A–F grading. Ed25519 signed. RFC 3161 timestamped. AI Shield policy file output. JSON + HTML.

Full Scan Mode

One Command. Every Surface.

Run every offensive tool in sequence, then build a unified signed report:

$ forge full-scan --target https://api.openai.com --api-key sk-xxx --model gpt-4

[INJECT] Running inject scan...
  12 vulnerabilities found across 8 injection classes
[JAILBREAK] Running jailbreak scan...
  4 jailbreaks successful — DAN 11.0, Socratic extraction
[OUTPUT] Running output scan...
  3 PII leaks, 0 exfiltration
[POLICY] Running policy scan — 1,000 calls...
  Violation rate: 2.4% [1.6%, 3.5%] 95% CI
[DRIFT] Running drift scan — 10 × 100 turns...
  KS test: p=0.003 — significant drift detected
[BOUNDARY] Running boundary scan...
  Boundary score: 62/100 — cliff at Level 3

SCAN COMPLETE | Risk Grade: D | 19 findings | Report signed ✓
  JSON: reports/forge-full-scan-2026-03-10.json
  HTML: reports/forge-full-scan-2026-03-10.html

Adaptive Escalation

If the model resists, Forge escalates. Mutations, encoding, multi-step chains — it keeps pushing until it breaks or exhausts the library.

Statistical Rigour

Wilson score CIs, KS tests, z-tests, t-tests, Cohen's h. Not vibes — mathematics. Every claim backed by statistical significance.

Ed25519 Signed

Every report cryptographically signed with Ed25519. RFC 3161 timestamped. SHA-256 evidence chains. Tamper-evident by design.

AI Shield Integration

Every finding generates an AI Shield blocking rule. Forge findings become runtime protection. One pipeline from testing to production.

Mutation Engine

25 Variants Per Payload

Every offensive tool ships with a 5-category mutation engine. If the base payload fails, Forge mutates it — encoding, obfuscation, semantic rewriting, structural wrapping, and evasion techniques. 150 base attack payloads become 3,750+ mutation variants. The model doesn't get to see the same payload twice.

Encoding

Base64
Hex encoding
ROT13
URL encoding
HTML entities

Obfuscation

L33tspeak
Unicode homoglyphs
Zero-width chars
Character doubling
Whitespace injection

Semantic

Synonym substitution
Passive voice
Question-to-statement
Negation inversion
Academic framing

Structural

Markdown wrapping
Code block wrapping
JSON embedding
XML wrapping
List formatting

Evasion

Language mixing
Character splitting
Reverse text
Pig latin
Payload fragmentation

The Pipeline

Three Stages. No Gaps.

Forge is Stage 1. Test the model before you build with it. Arsenal is Stage 2 — test the agent during development. AI Shield is Stage 3 — protect the live agent in production. Forge findings feed directly into AI Shield as runtime blocking rules. No competitor has all three.

Stage 1 — Model Selection

Forge

Test the LLM before building with it

→

Stage 2 — Agent Development

Arsenal

Test the agent during development

→

Stage 3 — Production Runtime

AI Shield

Protect the live agent in production

Payload Library

1,590 Static. 5,340+ Total.

80

Injection Payloads

70

Jailbreak Payloads

140

Output Safety

1,000

Policy Test Prompts

100

Boundary Probes

200

Supply Chain Probes

Standards Coverage

Every Finding Mapped

10 / 10

OWASP LLM Top 10 — 2025

LLM01 Prompt Injection
LLM02 Sensitive Information Disclosure
LLM03 Supply Chain
LLM04 Data and Model Poisoning
LLM05 Improper Output Handling
LLM06 Excessive Agency
LLM07 System Prompt Leakage
LLM08 Vector and Embedding Weaknesses
LLM09 Misinformation
LLM10 Unbounded Consumption

Cryptographic

Report Integrity

Ed25519 digital signatures
SHA-256 evidence chains
RFC 3161 timestamps
Tamper-evident by design
AI Shield policy generation
Machine-ingestible JSON output

Statistical

Mathematical Rigour

Wilson score confidence intervals
Kolmogorov-Smirnov distribution tests
Two-proportion z-tests
Paired t-tests
Cohen's h effect sizes
Chi-square significance testing

Authorised Use Only

Red Specter Forge is intended for authorised security testing only. Unauthorised use against systems you do not own or have explicit permission to test may violate the Computer Misuse Act 1990 (UK), Computer Fraud and Abuse Act (US), and equivalent legislation in other jurisdictions. Always obtain written authorisation before conducting any security assessments. Apache License 2.0.

FORGE