pip install red-specter-forge
Everyone tests the agent. Nobody tests the LLM it's built on. You're deploying models with unknown jailbreak resistance, unmeasured policy compliance, invisible drift behaviour, and boundary thresholds you've never mapped. You're flying blind at the foundation layer.
Your model has never been tested against systematic prompt injection campaigns. You don't know which injection classes it's vulnerable to — direct, indirect, token smuggling, context overflow, goal hijacking, multi-turn, or rule inversion.
DAN variants, persona hijacking, hypothetical framing, Socratic extraction — 70+ documented jailbreak techniques exist. You don't know which ones break your model because you've never systematically tested.
You set safety policies but never measured violation rates statistically. Without Wilson score confidence intervals across 1,000+ test calls, your "policy compliance" is anecdotal — not empirical.
Every model has a boundary where it transitions from compliance to refusal. You've never mapped it. You don't know your model's exact severity threshold — or what happens at the cliff edge.
Over long sessions, models drift. Cosine similarity degrades. Toxicity creeps up. Policy violations increase. Without multi-turn drift measurement, you'll never see it happening.
When the vendor pushes a model update, you have no way to know if the new version is weaker than the old one. No two-proportion z-tests. No paired t-tests. No statistical proof. Just hope.
Ten tools. Each one attacks a different surface of the LLM. Each one produces structured JSON consumed by the report builder. Each finding maps to OWASP LLM Top 10 2025. Each finding generates an AI Shield blocking rule.
| # | Tool | Command | What It Does |
|---|---|---|---|
| 01 | Inject Scan | forge inject scan | 80 payloads across 8 injection classes. Direct, indirect, token smuggling, context overflow, goal hijacking, multi-turn deception, rule inversion, multimodal. Mutation engine generates 2,000+ variants. |
| 02 | Jailbreak Scan | forge jailbreak scan | 70 payloads across 7 categories. DAN variants, persona hijack, hypothetical framing, obfuscation, multi-step chaining, Socratic extraction, temporal drift. Adaptive mutation on resistance. |
| 03 | Output Scan | forge output scan | 140 payloads forcing PII extraction, unsafe content generation, and exfiltration simulation. Regex PII detection, toxicity scoring, code exfiltration pattern analysis. |
| 04 | Policy Scan | forge policy scan | 1,000 adversarial prompts across 5 categories. Wilson score confidence intervals on violation rates. Stratified by category, toxicity, severity. Finds exact policy breakdown conditions. |
| 05 | Drift Scan | forge drift scan | 10 conversation sequences over configurable turns. Cosine similarity drift, toxicity drift, KS test for distribution changes, change-point detection. Finds when the model stops being itself. |
| 06 | Boundary Scan | forge boundary scan | 100 payloads across 5 severity levels. Adaptive binary search for the exact compliance cliff edge. Boundary score 0–100. Produces a boundary curve with statistical backing. |
| 07 | Compare Scan | forge compare scan | Identical campaigns against multiple models. Temperature locked to 0. Chi-square significance testing. Comparative security posture table. Tells you which model is weakest. |
| 08 | Regression Scan | forge regression scan | Two model versions. Two-proportion z-test on violation rates. Paired t-test on continuous scores. Cohen's h effect sizes. Tells you if the update weakened security. |
| 09 | Supply Scan | forge supply scan | 200 behavioural probes across 4 categories. Fingerprints the model. Flags if it's not what it claims — tampered, substituted, or fine-tuned. Reports confidence honestly. |
| 10 | Report Build | forge report build | Aggregates all tool outputs. OWASP LLM Top 10 2025 mapping. A–F grading. Ed25519 signed. RFC 3161 timestamped. AI Shield policy file output. JSON + HTML. |
Run every offensive tool in sequence, then build a unified signed report:
If the model resists, Forge escalates. Mutations, encoding, multi-step chains — it keeps pushing until it breaks or exhausts the library.
Wilson score CIs, KS tests, z-tests, t-tests, Cohen's h. Not vibes — mathematics. Every claim backed by statistical significance.
Every report cryptographically signed with Ed25519. RFC 3161 timestamped. SHA-256 evidence chains. Tamper-evident by design.
Every finding generates an AI Shield blocking rule. Forge findings become runtime protection. One pipeline from testing to production.
Every offensive tool ships with a 5-category mutation engine. If the base payload fails, Forge mutates it — encoding, obfuscation, semantic rewriting, structural wrapping, and evasion techniques. 150 base attack payloads become 3,750+ mutation variants. The model doesn't get to see the same payload twice.
Forge is Stage 1. Test the model before you build with it. Arsenal is Stage 2 — test the agent during development. AI Shield is Stage 3 — protect the live agent in production. Forge findings feed directly into AI Shield as runtime blocking rules. No competitor has all three.
Red Specter Forge is intended for authorised security testing only. Unauthorised use against systems you do not own or have explicit permission to test may violate the Computer Misuse Act 1990 (UK), Computer Fraud and Abuse Act (US), and equivalent legislation in other jurisdictions. Always obtain written authorisation before conducting any security assessments. Apache License 2.0.