Every AI safety vendor sells guardrails. None of them test whether those guardrails actually work under sustained, intelligent, adaptive attack. HARBINGER does. The answer is always no.
HARBINGER doesn't find one jailbreak and call it a day. It maps the entire safety stack, classifies every defence mechanism, generates bypass payloads for each one, then chains them together to defeat defence-in-depth. Autonomously.
Probes 10 categories. Maps every safety layer. Identifies system prompt policies, content filters, safety judges, refusal training, and RLHF alignment.
Role inversion. Instruction hierarchy manipulation. Context window flooding. Encoding bypass. Language switching. Persona injection. Incremental escalation. Token manipulation.
Attacks the safety judge, not the generator. Prompt extraction. Threshold mapping. Format evasion. Split response. Judge model fingerprinting.
RLHF exploitation. Reward hacking. Sycophancy exploitation. Competing objectives. Refusal fatigue. Constitutional contradiction. Fine-tuning residue.
Content filter bypass. Keyword evasion. Classifier adversarial inputs. Tokenisation exploits. Output format manipulation. Multilingual bypass. Embedding space attacks.
Chains bypasses from all subsystems into multi-stage attacks that defeat defence-in-depth. Full Stack Bypass: 6 stages, every layer defeated simultaneously.
Semantic, structural, encoding, language, context. Every payload mutated before delivery. Pattern-matching defences never see the same attack twice.
Baseline capture before any engagement. Refusal rate verification. Guardrail topology fingerprint. Signed restoration certificate.
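A minimal sketch of how the baseline and restoration check above could work, assuming probe outcomes are recorded as booleans and the guardrail topology as a plain dictionary. The function names, the dictionary shape, and the 0.02 tolerance are hypothetical and are not taken from HARBINGER itself.

```python
import hashlib
import json


def refusal_rate(outcomes):
    """Fraction of probe prompts the target refused (True means refused)."""
    if not outcomes:
        raise ValueError("need at least one probe outcome")
    return sum(outcomes) / len(outcomes)


def topology_fingerprint(layers):
    """SHA-256 over a canonical JSON encoding of the observed safety layers."""
    # sort_keys makes the digest independent of dictionary insertion order
    canonical = json.dumps(layers, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_restored(baseline, current, tolerance=0.02):
    """True if the refusal rate has not dropped by more than `tolerance`
    (a hypothetical threshold) and the topology fingerprint is unchanged."""
    rate_ok = current["refusal_rate"] >= baseline["refusal_rate"] - tolerance
    return rate_ok and current["fingerprint"] == baseline["fingerprint"]
```

Capturing both values before the engagement and re-checking them afterwards gives a simple pass/fail input for a restoration certificate; signing that certificate is a separate step not shown here.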
Every bypass is Ed25519-signed and scope-locked, and auto-locks after 30 minutes. Authorised penetration testing only.
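The signed, scope-locked, time-limited authorisation described above can be sketched as a small grant check. This is an illustration under stated assumptions, not HARBINGER's implementation: Python's standard library has no Ed25519, so HMAC-SHA256 stands in for the signature here, and the grant layout and the names `issue_grant` and `is_authorised` are hypothetical.

```python
import hashlib
import hmac
import json
import time

LOCK_AFTER_SECONDS = 30 * 60  # the 30-minute auto-lock stated in the text


def _payload(scope, issued_at):
    """Canonical bytes that the signature covers."""
    body = {"scope": sorted(scope), "issued_at": issued_at}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def issue_grant(key, scope, issued_at):
    """Sign a grant. HMAC-SHA256 is a stand-in for an Ed25519 signature."""
    sig = hmac.new(key, _payload(scope, issued_at), hashlib.sha256).hexdigest()
    return {"scope": sorted(scope), "issued_at": issued_at, "sig": sig}


def is_authorised(key, grant, target, now=None):
    """Allow a target only if the grant is authentic, in scope, and not expired."""
    now = time.time() if now is None else now
    expected = hmac.new(
        key, _payload(grant["scope"], grant["issued_at"]), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, grant["sig"]):
        return False  # tampered grant or wrong key
    if target not in grant["scope"]:
        return False  # scope lock: target was never authorised
    age = now - grant["issued_at"]
    return 0 <= age < LOCK_AFTER_SECONDS  # auto-lock once the window closes
```

Because the scope is part of the signed payload, editing a grant to add a target invalidates the signature. A real deployment would replace the HMAC calls with Ed25519 verification against the authoriser's public key.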
CARTOGRAPHER maps guardrails. No bypass attempts. Reports vulnerabilities without exploiting them.
Plans full bypass chains. Shows exactly what would work. Ed25519 signature required. No execution.
Full autonomous guardrail exploitation. Every technique deployed. Every chain tested. RESTRICTED report.
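The three operating modes above amount to a small gating policy, sketched here. The mode names are hypothetical because the text does not name them. Treating the mapping mode as needing no signature is an assumption, and the signature requirement on execution is inferred from the statement that every execution is signed.

```python
from enum import Enum


class Mode(Enum):
    # Hypothetical names for the three modes described in the text
    MAP = "map"          # maps guardrails, no bypass attempts
    PLAN = "plan"        # plans bypass chains, no execution
    EXECUTE = "execute"  # full autonomous run


POLICY = {
    # needs_signature for MAP is an assumption; the text does not say
    Mode.MAP: {"needs_signature": False, "may_execute": False},
    Mode.PLAN: {"needs_signature": True, "may_execute": False},
    Mode.EXECUTE: {"needs_signature": True, "may_execute": True},
}


def check_mode(mode, has_valid_signature):
    """Return whether execution is permitted; raise if a required signature is missing."""
    policy = POLICY[mode]
    if policy["needs_signature"] and not has_valid_signature:
        raise PermissionError(f"mode {mode.value!r} requires a valid signature")
    return policy["may_execute"]
```

Keeping the policy in one table makes the difference between modes auditable at a glance: only one row permits execution, and that row always demands a signature.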
THIS TOOL IS FOR AUTHORISED SECURITY TESTING ONLY. EVERY EXECUTION IS SIGNED AND LOGGED.
39 bypass techniques. 8 subsystems. 5 compound chains. NEMESIS reasoning. Adaptive mutation. The tool that makes every AI safety vendor rethink their product.