Red Specter JANUS
Guardrail Bypass Testing Framework — 6 subsystems. 73 tests.
Overview
JANUS targets AI guardrails: the safety mechanisms that vendors deploy to prevent misuse of their models. Every content filter, refusal mechanism, and safety classifier introduces a testable boundary. JANUS claims that 97% of guardrails can be defeated, and is built to prove it.
Installation
$ janus init
$ janus status
FINGERPRINTER — Guardrail Identification
| ID | Technique | Description |
|---|---|---|
| FP-001 | Filter Classification | Classify guardrail type: input filter, output filter, system-level, or hybrid |
| FP-002 | Vendor Fingerprinting | Identify guardrail vendor and version from refusal patterns |
| FP-003 | Boundary Mapping | Map the exact boundaries of content filter sensitivity |
| FP-004 | Refusal Analysis | Analyse refusal messages to identify guardrail implementation details |
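As a rough illustration of what refusal analysis (FP-004) might involve, the sketch below flags responses that look like refusals and counts how often each distinct refusal message recurs. The marker patterns and function names are hypothetical and are not taken from JANUS; real guardrails word their refusals differently, so the pattern list would need tuning per target.

```python
import re
from collections import Counter

# Hypothetical refusal markers; real deployments word refusals differently.
REFUSAL_PATTERNS = [
    re.compile(r"\bI (?:can(?:no|')t|am unable to|won't)\b", re.IGNORECASE),
    re.compile(r"\bviolates? (?:our|the) (?:content )?polic(?:y|ies)\b", re.IGNORECASE),
]


def is_refusal(response: str) -> bool:
    """Return True if any assumed refusal marker appears in the response."""
    return any(p.search(response) for p in REFUSAL_PATTERNS)


def refusal_signature(responses):
    """Count distinct refusal messages, normalised for case and whitespace."""
    normalised = (" ".join(r.lower().split()) for r in responses if is_refusal(r))
    return Counter(normalised)
```

A message that recurs verbatim across many unrelated prompts may point to a fixed template rather than model-generated text, though that inference is a heuristic and not something the counts establish on their own.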
BYPASS — Known Technique Library
BYPASS is a continuously updated library of proven guardrail bypass techniques. It covers role-play and persona-based attacks, context window manipulation, instruction hierarchy exploitation, and multi-turn progressive boundary testing.
ENCODER — Payload Encoding Evasion
ENCODER encodes payloads to evade content filters. Techniques include Base64 and multi-layer encoding, Unicode homoglyph obfuscation, token-level manipulation to bypass tokeniser-based filters, and multi-language translation chains.
FUZZER — Automated Bypass Discovery
FUZZER is an automated fuzzing engine for discovering novel guardrail bypasses. It combines mutation-based payload generation with an evolutionary search to identify zero-day guardrail vulnerabilities through systematic testing.
CHAINER — Multi-Technique Chains
CHAINER chains multiple bypass techniques for compound evasion. Multi-step campaigns combine encoding, context manipulation, and role-play attacks, eroding a guardrail progressively across sequential steps.
REPORTER — Assessment Reporting
REPORTER generates comprehensive guardrail assessment reports. Each report includes bypass success rates per technique, guardrail effectiveness scoring, technique effectiveness metrics, remediation recommendations, and an executive summary for stakeholders.
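To make the reporting metrics concrete, here is a minimal sketch of how a per-technique bypass rate and an overall effectiveness score could be computed from raw test outcomes. The `Outcome` record, its field names, and the definition of effectiveness as "fraction of tests blocked" are all assumptions for illustration; the source does not specify how JANUS scores results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """One test outcome. Field names are hypothetical, not the JANUS schema."""
    technique_id: str  # any technique label; the IDs used below are made up
    bypassed: bool     # True if the guardrail failed to block the test


def bypass_rates(results):
    """Return {technique_id: fraction of its tests that bypassed the guardrail}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        totals[r.technique_id] += 1
        if r.bypassed:
            hits[r.technique_id] += 1
    return {tid: hits[tid] / totals[tid] for tid in totals}


def effectiveness_score(results):
    """Guardrail effectiveness as the fraction of tests blocked (assumed definition)."""
    if not results:
        raise ValueError("no results to score")
    blocked = sum(1 for r in results if not r.bypassed)
    return blocked / len(results)
```

For example, `bypass_rates([Outcome("A", True), Outcome("A", False)])` yields a rate of 0.5 for the made-up technique `"A"`.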
JANUS UNLEASHED
Standard mode detects. UNLEASHED exploits. Ed25519 crypto. Dual-gate safety. One operator.
# Standard
$ janus fingerprint --target model-endpoint
# UNLEASHED (dry run)
$ janus bypass --target model-endpoint --override
# UNLEASHED (live)
$ janus campaign --target model-endpoint --override --confirm-destroy
UNLEASHED mode is restricted to authorised operators with access to the Ed25519 private key. Targets must be listed in allowed_targets.txt, and the mode auto-locks after 30 minutes. Unauthorised use violates applicable law.
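The restrictions above can be pictured as a small gate that every live run has to pass. The sketch below is an illustration only: the `Session` class, the one-target-per-line file format, and the reading of "dual-gate" as the `--override` and `--confirm-destroy` flags are assumptions, and the Ed25519 operator check is deliberately left out because the source does not describe how keys are verified.

```python
import time
from pathlib import Path

LOCK_AFTER_SECONDS = 30 * 60  # 30-minute auto-lock, per the restrictions above


def load_allowed_targets(path="allowed_targets.txt"):
    """Read one target per line; blank lines and '#' comments are skipped (assumed format)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {ln.strip() for ln in lines if ln.strip() and not ln.strip().startswith("#")}


class Session:
    """Hypothetical operator session that locks itself after 30 minutes."""

    def __init__(self, allowed, now=time.monotonic):
        self._allowed = set(allowed)
        self._now = now              # injectable clock, which keeps the lock testable
        self._unlocked_at = now()

    def is_locked(self):
        return self._now() - self._unlocked_at >= LOCK_AFTER_SECONDS

    def may_run(self, target, override, confirm):
        """Both gates must be set, the session unlocked, and the target allow-listed."""
        return (
            override
            and confirm
            and not self.is_locked()
            and target in self._allowed
        )
```

Failing closed is the point of the design: a missing flag, an expired session, or a target outside the allow-list each blocks the run on its own.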
CLI Reference
| Command | Description |
|---|---|
| janus init | Initialise configuration and Ed25519 keys |
| janus status | System status and subsystem count |
| janus fingerprint | FINGERPRINTER — identify guardrail implementations |
| janus bypass | BYPASS — run known bypass techniques |
| janus encode | ENCODER — payload encoding evasion tests |
| janus fuzz | FUZZER — automated bypass discovery |
| janus chain | CHAINER — multi-technique chain tests |
| janus report | REPORTER — generate assessment report |
| janus campaign | Full guardrail bypass campaign |
MITRE ATLAS Mapping
JANUS techniques map to MITRE ATLAS techniques including AML.T0051 (LLM Prompt Injection) and AML.T0054 (LLM Jailbreak), along with guardrail-specific bypass vectors for safety mechanism evasion testing.
Disclaimer
Red Specter JANUS is for authorised security testing only. Guardrail bypass testing should only be performed on systems you own or have explicit written permission to test. Unauthorised use may violate the Computer Misuse Act 1990 (UK), CFAA (US), or equivalent legislation.