FORGE

Automated LLM security testing — before you build an agent. Test the model. Not the pipeline. The model.
10
Attack Tools
1,590
Base Payloads
5,340+
With Mutations
9,057
Tests Passing
pip install red-specter-forge
You test agents / But never test the model underneath / Jailbreaks ship to production / Policy compliance is a guess / Drift goes undetected for months / Boundary thresholds are unknown / Model substitution is invisible / Regression testing doesn't exist / You trusted the vendor's safety card You test agents / But never test the model underneath / Jailbreaks ship to production / Policy compliance is a guess / Drift goes undetected for months / Boundary thresholds are unknown / Model substitution is invisible / Regression testing doesn't exist / You trusted the vendor's safety card

Nobody Tests the Model

Everyone tests the agent. Nobody tests the LLM it's built on. You're deploying models with unknown jailbreak resistance, unmeasured policy compliance, invisible drift behaviour, and boundary thresholds you've never mapped. You're flying blind at the foundation layer.

Unknown Injection Surface

Your model has never been tested against systematic prompt injection campaigns. You don't know which injection classes it's vulnerable to — direct, indirect, token smuggling, context overflow, goal hijacking, multi-turn, or rule inversion.

Unmeasured Jailbreak Resistance

DAN variants, persona hijacking, hypothetical framing, Socratic extraction — 70+ documented jailbreak techniques exist. You don't know which ones break your model because you've never systematically tested.

Policy Is a Hope, Not a Number

You set safety policies but never measured violation rates statistically. Without Wilson score confidence intervals across 1,000+ test calls, your "policy compliance" is anecdotal — not empirical.

Invisible Boundary Cliff

Every model has a boundary where it transitions from compliance to refusal. You've never mapped it. You don't know your model's exact severity threshold — or what happens at the cliff edge.

Drift Goes Undetected

Over long sessions, models drift. Cosine similarity degrades. Toxicity creeps up. Policy violations increase. Without multi-turn drift measurement, you'll never see it happening.

No Regression Testing

When the vendor pushes a model update, you have no way to know if the new version is weaker than the old one. No two-proportion z-tests. No paired t-tests. No statistical proof. Just hope.

The Forge Armoury

Ten tools. Each one attacks a different surface of the LLM. Each one produces structured JSON consumed by the report builder. Each finding maps to OWASP LLM Top 10 2025. Each finding generates an AI Shield blocking rule.

# Tool Command What It Does
01 Inject Scan forge inject scan 80 payloads across 8 injection classes. Direct, indirect, token smuggling, context overflow, goal hijacking, multi-turn deception, rule inversion, multimodal. Mutation engine generates 2,000+ variants.
02 Jailbreak Scan forge jailbreak scan 70 payloads across 7 categories. DAN variants, persona hijack, hypothetical framing, obfuscation, multi-step chaining, Socratic extraction, temporal drift. Adaptive mutation on resistance.
03 Output Scan forge output scan 140 payloads forcing PII extraction, unsafe content generation, and exfiltration simulation. Regex PII detection, toxicity scoring, code exfiltration pattern analysis.
04 Policy Scan forge policy scan 1,000 adversarial prompts across 5 categories. Wilson score confidence intervals on violation rates. Stratified by category, toxicity, severity. Finds exact policy breakdown conditions.
05 Drift Scan forge drift scan 10 conversation sequences over configurable turns. Cosine similarity drift, toxicity drift, KS test for distribution changes, change-point detection. Finds when the model stops being itself.
06 Boundary Scan forge boundary scan 100 payloads across 5 severity levels. Adaptive binary search for the exact compliance cliff edge. Boundary score 0–100. Produces a boundary curve with statistical backing.
07 Compare Scan forge compare scan Identical campaigns against multiple models. Temperature locked to 0. Chi-square significance testing. Comparative security posture table. Tells you which model is weakest.
08 Regression Scan forge regression scan Two model versions. Two-proportion z-test on violation rates. Paired t-test on continuous scores. Cohen's h effect sizes. Tells you if the update weakened security.
09 Supply Scan forge supply scan 200 behavioural probes across 4 categories. Fingerprints the model. Flags if it's not what it claims — tampered, substituted, or fine-tuned. Reports confidence honestly.
10 Report Build forge report build Aggregates all tool outputs. OWASP LLM Top 10 2025 mapping. A–F grading. Ed25519 signed. RFC 3161 timestamped. AI Shield policy file output. JSON + HTML.

One Command. Every Surface.

Run every offensive tool in sequence, then build a unified signed report:

$ forge full-scan --target https://api.openai.com --api-key sk-xxx --model gpt-4
[INJECT] Running inject scan...
  12 vulnerabilities found across 8 injection classes
[JAILBREAK] Running jailbreak scan...
  4 jailbreaks successful — DAN 11.0, Socratic extraction
[OUTPUT] Running output scan...
  3 PII leaks, 0 exfiltration
[POLICY] Running policy scan — 1,000 calls...
  Violation rate: 2.4% [1.6%, 3.5%] 95% CI
[DRIFT] Running drift scan — 10 × 100 turns...
  KS test: p=0.003 — significant drift detected
[BOUNDARY] Running boundary scan...
  Boundary score: 62/100 — cliff at Level 3

SCAN COMPLETE | Risk Grade: D | 19 findings | Report signed ✓
  JSON: reports/forge-full-scan-2026-03-10.json
  HTML: reports/forge-full-scan-2026-03-10.html

Adaptive Escalation

If the model resists, Forge escalates. Mutations, encoding, multi-step chains — it keeps pushing until it breaks or exhausts the library.

Statistical Rigour

Wilson score CIs, KS tests, z-tests, t-tests, Cohen's h. Not vibes — mathematics. Every claim backed by statistical significance.

Ed25519 Signed

Every report cryptographically signed with Ed25519. RFC 3161 timestamped. SHA-256 evidence chains. Tamper-evident by design.

AI Shield Integration

Every finding generates an AI Shield blocking rule. Forge findings become runtime protection. One pipeline from testing to production.

10
Attack Tools
1,590
Static Payloads
5,340+
With Mutations
9,057
Tests Passing
0
Failures

25 Variants Per Payload

Every offensive tool ships with a 5-category mutation engine. If the base payload fails, Forge mutates it — encoding, obfuscation, semantic rewriting, structural wrapping, and evasion techniques. 150 base attack payloads become 3,750+ mutation variants. The model doesn't get to see the same payload twice.

Encoding

  • Base64
  • Hex encoding
  • ROT13
  • URL encoding
  • HTML entities

Obfuscation

  • L33tspeak
  • Unicode homoglyphs
  • Zero-width chars
  • Character doubling
  • Whitespace injection

Semantic

  • Synonym substitution
  • Passive voice
  • Question-to-statement
  • Negation inversion
  • Academic framing

Structural

  • Markdown wrapping
  • Code block wrapping
  • JSON embedding
  • XML wrapping
  • List formatting

Evasion

  • Language mixing
  • Character splitting
  • Reverse text
  • Pig latin
  • Payload fragmentation

Three Stages. No Gaps.

Forge is Stage 1. Test the model before you build with it. Arsenal is Stage 2 — test the agent during development. AI Shield is Stage 3 — protect the live agent in production. Forge findings feed directly into AI Shield as runtime blocking rules. No competitor has all three.

Stage 1 — Model Selection
Forge
Test the LLM before building with it
Stage 2 — Agent Development
Arsenal
Test the agent during development
Stage 3 — Production Runtime
AI Shield
Protect the live agent in production

1,590 Static. 5,340+ Total.

80
Injection Payloads
70
Jailbreak Payloads
140
Output Safety
1,000
Policy Test Prompts
100
Boundary Probes
200
Supply Chain Probes

Every Finding Mapped

10 / 10

OWASP LLM Top 10 — 2025

  • LLM01 Prompt Injection
  • LLM02 Sensitive Information Disclosure
  • LLM03 Supply Chain
  • LLM04 Data and Model Poisoning
  • LLM05 Improper Output Handling
  • LLM06 Excessive Agency
  • LLM07 System Prompt Leakage
  • LLM08 Vector and Embedding Weaknesses
  • LLM09 Misinformation
  • LLM10 Unbounded Consumption
Cryptographic

Report Integrity

  • Ed25519 digital signatures
  • SHA-256 evidence chains
  • RFC 3161 timestamps
  • Tamper-evident by design
  • AI Shield policy generation
  • Machine-ingestible JSON output
Statistical

Mathematical Rigour

  • Wilson score confidence intervals
  • Kolmogorov-Smirnov distribution tests
  • Two-proportion z-tests
  • Paired t-tests
  • Cohen's h effect sizes
  • Chi-square significance testing

Authorised Use Only

Red Specter Forge is intended for authorised security testing only. Unauthorised use against systems you do not own or have explicit permission to test may violate the Computer Misuse Act 1990 (UK), Computer Fraud and Abuse Act (US), and equivalent legislation in other jurisdictions. Always obtain written authorisation before conducting any security assessments. Apache License 2.0.