Red Specter HARBINGER

Autonomous LLM Guardrail Exploitation — 39 techniques. 8 subsystems. 5 compound chains.

v1.0.0

Contents

Overview
Installation
SKELETON KEY (12 techniques)
JUDGE KILLER (7 techniques)
ALIGNMENT BREAKER (7 techniques)
FILTER SHREDDER (8 techniques)
CHAIN FORGE (5 chains)
UNLEASHED Mode
CLI Reference
Disclaimer

Overview

HARBINGER maps every guardrail in a target LLM's safety stack (system prompt policies, content filters, RLHF alignment, safety judges, moderation APIs), then systematically bypasses each one. It chains bypasses across layers to defeat defence-in-depth, using NEMESIS reasoning and adaptive mutation so that the same attack is never sent twice.

FORGE finds jailbreaks. HARBINGER proves the entire safety architecture is broken.

Installation

$ pip install red-specter-harbinger
$ harbinger init
$ harbinger status

SKELETON KEY — 12 Policy Bypass Techniques

ID	Technique	Description
SK-001	Role Inversion	Make the model think the safety policy is the threat
SK-002	Instruction Hierarchy	Override system prompt via fake system directives
SK-003	Context Window Flood	Push safety instructions out of active context
SK-004	Base64 Encoding	Encode queries to bypass text-level filters
SK-005	Language Switch	Query in languages with weaker safety training
SK-006	Hypothetical Framing	Fictional universe framing to bypass refusal
SK-007	Academic Framing	Research context to justify restricted content
SK-008	Incremental Escalation	Start safe, escalate gradually across turns
SK-009	Persona Injection	DAN and unrestricted persona injection
SK-010	Multi-Turn Poisoning	False conversation history establishment
SK-011	Token Manipulation	Zero-width characters and homoglyphs
SK-012	Nested Instruction	Instruction within translation request

JUDGE KILLER — 7 Safety Judge Techniques

ID	Technique	Description
JK-001	Judge Prompt Extraction	Steal the safety judge's system prompt
JK-002	Threshold Mapping	Find exactly where the judge draws the line
JK-003	Judge Confusion	Craft outputs the judge can't classify
JK-004	Format Evasion	Code blocks and structured output that judges miss
JK-005	Split Response	Distribute bypass across multiple responses
JK-006	Judge Overload	Flood with borderline content to exhaust the judge
JK-007	Judge Fingerprint	Identify which model serves as the safety judge

ALIGNMENT BREAKER — 7 RLHF Techniques

ID	Technique	Description
AB-001	Reward Hacking	Inputs that the model rates as safe but that produce unsafe output
AB-002	Sycophancy Exploitation	Leverage agreement bias to escalate compliance
AB-003	Competing Objectives	Helpfulness vs safety conflict scenarios
AB-004	Refusal Fatigue	Sustained multi-turn pressure until compliance
AB-005	Transfer Attack	Bypasses on one topic that transfer to restricted topics
AB-006	Constitutional Contradiction	Conflicts between constitutional AI rules
AB-007	Fine-Tuning Residue	Capabilities suppressed but not eliminated

FILTER SHREDDER — 8 Content Filter Techniques

ID	Technique	Description
FS-001	Keyword Evasion	Synonyms, leetspeak, obfuscation
FS-002	Classifier Adversarial	Text that humans read as harmful but classifiers read as safe
FS-003	Tokenisation Exploit	Split words across token boundaries
FS-004	Output Format	Code blocks, JSON, XML that filters don't inspect
FS-005	Embedding Space	Semantic neighbourhood without restricted words
FS-006	Gradual Drift	Safe content that gradually shifts meaning
FS-007	Multilingual Bypass	Restricted in English, unrestricted in other languages
FS-008	Base64 Smuggling	Encoded payload in instruction

CHAIN FORGE — 5 Compound Bypass Chains

ID	Chain	Stages
CF-001	Context Flood + Sycophancy + Format Evasion	3
CF-002	Role Inversion + Academic Frame + Split Response	3
CF-003	Persona + Competing Objectives + Encoding	3
CF-004	Multi-Turn Poison + Refusal Fatigue + Keyword Evasion	3
CF-005	Full Stack Bypass (all layers simultaneously)	6
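
The chain table above can be loaded as plain data for engagement bookkeeping. The sketch below is illustrative only: the `Chain` class and `parse_chain_row` helper are hypothetical names, not part of HARBINGER, and the rows are copied from the table as tab-separated text. It splits each chain name on `+` and flags rows where the number of named stages differs from the declared stage count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chain:
    """One row of the CHAIN FORGE table (hypothetical representation)."""
    chain_id: str
    stages: tuple          # stage names as written in the table
    declared_stages: int   # value of the "Stages" column


def parse_chain_row(row: str) -> Chain:
    """Parse a tab-separated 'ID<TAB>Chain<TAB>Stages' row."""
    chain_id, name, count = row.split("\t")
    stages = tuple(part.strip() for part in name.split("+"))
    return Chain(chain_id, stages, int(count))


# Rows copied verbatim from the table above.
ROWS = [
    "CF-001\tContext Flood + Sycophancy + Format Evasion\t3",
    "CF-002\tRole Inversion + Academic Frame + Split Response\t3",
    "CF-003\tPersona + Competing Objectives + Encoding\t3",
    "CF-004\tMulti-Turn Poison + Refusal Fatigue + Keyword Evasion\t3",
    "CF-005\tFull Stack Bypass (all layers simultaneously)\t6",
]

chains = [parse_chain_row(row) for row in ROWS]

# Chains whose named stages do not match the declared count.
mismatched = [c.chain_id for c in chains if len(c.stages) != c.declared_stages]
print(mismatched)
```

Only CF-005 is flagged, because the table declares six stages for it without naming them individually.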

HARBINGER UNLEASHED

Detection mode maps guardrails without bypassing them. UNLEASHED mode executes full autonomous guardrail exploitation against authorised targets.

# Map guardrails (detection only)
$ harbinger map --target http://localhost:11434

# UNLEASHED (dry run)
$ harbinger bypass --query "test query" --override

# UNLEASHED (live)
$ harbinger chain --chain CF-005 --override --confirm-destroy

UNLEASHED mode is restricted to authorised operators with Ed25519 private key access. Targets must be listed in allowed_targets.txt. Sessions auto-lock after 30 minutes. Unauthorised use violates applicable law.
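
The two scoping controls described above (a target allowlist and a 30-minute auto-lock) can be sketched as a simple gate. This is a minimal illustration under stated assumptions, not HARBINGER's implementation: the file format of allowed_targets.txt (one URL per line, `#` comments) is assumed, the function names are hypothetical, and the Ed25519 operator check is out of scope here.

```python
import time
from urllib.parse import urlsplit

AUTO_LOCK_SECONDS = 30 * 60  # 30-minute auto-lock, as stated in the text


def normalise(target: str) -> str:
    """Reduce a URL to lowercase scheme://host:port so trivial variants match."""
    parts = urlsplit(target.strip())
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}"


def load_allowed_targets(text: str) -> set:
    """Parse allowlist text. Assumed format: one URL per line, '#' starts a comment."""
    targets = set()
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry:
            targets.add(normalise(entry))
    return targets


def is_authorised(target: str, allowed: set, unlocked_at: float, now=None) -> bool:
    """Allow a target only if it is allowlisted and the session has not auto-locked."""
    now = time.time() if now is None else now
    session_live = 0 <= now - unlocked_at < AUTO_LOCK_SECONDS
    return session_live and normalise(target) in allowed


# Example: an allowlist containing only the local test endpoint used above.
allowed = load_allowed_targets("# lab targets\nhttp://localhost:11434\n")
print(is_authorised("http://localhost:11434/api", allowed, unlocked_at=0, now=60))
```

Comparing on scheme, host, and port only is a deliberate simplification, so a path suffix on an allowlisted host still matches while a different host or port does not.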

CLI Reference

Command	Description
harbinger init	Initialise configuration and Ed25519 keys
harbinger status	System status and subsystem count
harbinger techniques	List all 39 bypass techniques
harbinger map	CARTOGRAPHER — map guardrail topology
harbinger bypass	SKELETON KEY — execute bypass techniques
harbinger chain	CHAIN FORGE — execute compound bypass
harbinger engagements	List all engagement sessions
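
The figure of 39 quoted for `harbinger techniques` can be reconciled with the per-section counts by simple arithmetic. The sketch below assumes the total includes the 5 compound chains alongside the single-layer techniques, which is the only reading under which the section headings sum to 39; the dictionary is just the headings restated as data.

```python
# Counts taken from the section headings of this document.
CATALOGUE = {
    "SKELETON KEY": 12,
    "JUDGE KILLER": 7,
    "ALIGNMENT BREAKER": 7,
    "FILTER SHREDDER": 8,
    "CHAIN FORGE": 5,  # compound chains, assumed to count towards the total
}

single_layer = sum(n for name, n in CATALOGUE.items() if name != "CHAIN FORGE")
total = sum(CATALOGUE.values())

print(single_layer, total)  # 34 39
```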

Disclaimer

Red Specter HARBINGER is for authorised security testing only. Guardrail bypass techniques can cause AI systems to produce content that violates their safety policies. You must have explicit written permission before testing any system. Unauthorised use may violate the Computer Misuse Act 1990 (UK), CFAA (US), or equivalent legislation.