Red Specter HARBINGER

Autonomous LLM Guardrail Exploitation — 39 techniques. 8 subsystems. 5 compound chains.

v1.0.0

Contents

- Overview
- Installation
- SKELETON KEY (12 techniques)
- JUDGE KILLER (7 techniques)
- ALIGNMENT BREAKER (7 techniques)
- FILTER SHREDDER (8 techniques)
- CHAIN FORGE (5 chains)
- UNLEASHED Mode
- CLI Reference
- Disclaimer

Overview

HARBINGER maps every guardrail in a target LLM's safety stack (system prompt policies, content filters, RLHF alignment, safety judges and moderation APIs), then systematically bypasses each one. It chains bypasses across layers to defeat defence-in-depth. It uses NEMESIS reasoning and adaptive mutation, so the same attack is never sent twice.

FORGE finds jailbreaks. HARBINGER proves the entire safety architecture is broken.

Installation

$ pip install red-specter-harbinger
$ harbinger init
$ harbinger status

SKELETON KEY — 12 Policy Bypass Techniques

ID      Technique               Description
SK-001  Role Inversion          Make the model think the safety policy is the threat
SK-002  Instruction Hierarchy   Override system prompt via fake system directives
SK-003  Context Window Flood    Push safety instructions out of active context
SK-004  Base64 Encoding         Encode queries to bypass text-level filters
SK-005  Language Switch         Query in languages with weaker safety training
SK-006  Hypothetical Framing    Fictional universe framing to bypass refusal
SK-007  Academic Framing        Research context to justify restricted content
SK-008  Incremental Escalation  Start safe, escalate gradually across turns
SK-009  Persona Injection       DAN and unrestricted persona injection
SK-010  Multi-Turn Poisoning    Establish a false conversation history
SK-011  Token Manipulation      Zero-width characters and homoglyphs
SK-012  Nested Instruction      Instruction embedded in a translation request

JUDGE KILLER — 7 Safety Judge Techniques

ID      Technique                Description
JK-001  Judge Prompt Extraction  Steal the safety judge's system prompt
JK-002  Threshold Mapping        Find exactly where the judge draws the line
JK-003  Judge Confusion          Craft outputs the judge can't classify
JK-004  Format Evasion           Code blocks and structured output that judges miss
JK-005  Split Response           Distribute bypass across multiple responses
JK-006  Judge Overload           Flood with borderline content to exhaust the judge
JK-007  Judge Fingerprint        Identify which model serves as the safety judge

ALIGNMENT BREAKER — 7 RLHF Techniques

ID      Technique                     Description
AB-001  Reward Hacking                Inputs the model rates as safe but that produce unsafe output
AB-002  Sycophancy Exploitation       Leverage agreement bias to escalate compliance
AB-003  Competing Objectives          Helpfulness vs safety conflict scenarios
AB-004  Refusal Fatigue               Sustained multi-turn pressure until compliance
AB-005  Transfer Attack               Bypasses on one topic that transfer to restricted topics
AB-006  Constitutional Contradiction  Conflicts between constitutional AI rules
AB-007  Fine-Tuning Residue           Capabilities suppressed but not eliminated

FILTER SHREDDER — 8 Content Filter Techniques

ID      Technique               Description
FS-001  Keyword Evasion         Synonyms, leetspeak, obfuscation
FS-002  Classifier Adversarial  Text humans read as harmful, classifiers read as safe
FS-003  Tokenisation Exploit    Split words across token boundaries
FS-004  Output Format           Code blocks, JSON, XML that filters don't inspect
FS-005  Embedding Space         Semantic neighbourhood without restricted words
FS-006  Gradual Drift           Safe content that gradually shifts meaning
FS-007  Multilingual Bypass     Restricted in English, unrestricted in other languages
FS-008  Base64 Smuggling        Base64-encoded payload embedded in an instruction

CHAIN FORGE — 5 Compound Bypass Chains

ID      Chain                                                  Stages
CF-001  Context Flood + Sycophancy + Format Evasion            3
CF-002  Role Inversion + Academic Frame + Split Response       3
CF-003  Persona + Competing Objectives + Encoding              3
CF-004  Multi-Turn Poison + Refusal Fatigue + Keyword Evasion  3
CF-005  Full Stack Bypass (all layers simultaneously)          6
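The header and the `harbinger techniques` command both cite 39 techniques, while the tables above list 34 individual techniques (12 + 7 + 7 + 8) plus 5 compound chains. The short Python sketch below tallies the IDs from those tables. It assumes, without confirmation from the source, that the five CHAIN FORGE chains are counted toward the 39; the `CATALOGUE` dict and `count_techniques` helper are illustrative names, not part of HARBINGER.

```python
# Technique IDs per subsystem, transcribed from the tables in this document.
# CATALOGUE and count_techniques are hypothetical illustration names.
CATALOGUE = {
    "SKELETON KEY": [f"SK-{i:03d}" for i in range(1, 13)],      # SK-001..SK-012
    "JUDGE KILLER": [f"JK-{i:03d}" for i in range(1, 8)],       # JK-001..JK-007
    "ALIGNMENT BREAKER": [f"AB-{i:03d}" for i in range(1, 8)],  # AB-001..AB-007
    "FILTER SHREDDER": [f"FS-{i:03d}" for i in range(1, 9)],    # FS-001..FS-008
    "CHAIN FORGE": [f"CF-{i:03d}" for i in range(1, 6)],        # CF-001..CF-005
}


def count_techniques(catalogue):
    """Return (single techniques, chains, total), counting chains as techniques."""
    chains = len(catalogue["CHAIN FORGE"])
    singles = sum(
        len(ids) for name, ids in catalogue.items() if name != "CHAIN FORGE"
    )
    return singles, chains, singles + chains


if __name__ == "__main__":
    print(count_techniques(CATALOGUE))  # prints (34, 5, 39)
```

If the chains were not meant to be counted, the header and CLI description would overstate the single-technique total by five.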

HARBINGER UNLEASHED

Detection mode maps guardrails without bypassing them. UNLEASHED mode executes full autonomous guardrail exploitation against authorised targets.

# Map guardrails (detection only)
$ harbinger map --target http://localhost:11434

# UNLEASHED (dry run)
$ harbinger bypass --query "test query" --override

# UNLEASHED (live)
$ harbinger chain --chain CF-005 --override --confirm-destroy

UNLEASHED mode is restricted to authorised operators with access to the Ed25519 private key. Every target must be listed in allowed_targets.txt, and the mode auto-locks after 30 minutes. Unauthorised use violates applicable law.
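The access controls in the paragraph above (operator key, target allowlist, 30-minute auto-lock) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not HARBINGER's implementation: the function names, the `OperatorSession` class and the allowlist format (one URL per line, `#` comments allowed) are hypothetical, and the auto-lock is assumed to count from the moment the key is verified rather than from last activity. Ed25519 signature verification itself is omitted; in Python it is commonly done with the `cryptography` package's `Ed25519PublicKey.verify`, which raises `InvalidSignature` on failure.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Set
from urllib.parse import urlsplit

# 30-minute auto-lock window described for UNLEASHED mode.
AUTO_LOCK_SECONDS = 30 * 60


def normalise_target(url: str) -> str:
    """Canonicalise a target URL so trivially different spellings compare equal."""
    parts = urlsplit(url.strip())
    # Lower-case scheme and host:port; drop a trailing slash on the path.
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path.rstrip('/')}"


def parse_allowed_targets(text: str) -> Set[str]:
    """Parse an allowlist: one target per line, blank lines and '#' comments ignored.

    This file format is an assumption, not HARBINGER's documented format.
    """
    targets: Set[str] = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            targets.add(normalise_target(line))
    return targets


def load_allowed_targets(path: Path) -> Set[str]:
    """Read and parse an allowlist file such as allowed_targets.txt."""
    return parse_allowed_targets(path.read_text(encoding="utf-8"))


@dataclass
class OperatorSession:
    """Hypothetical session opened after the operator's Ed25519 key is verified."""

    unlocked_at: float  # time.monotonic() value at the moment of unlock

    def is_locked(self, now: Optional[float] = None) -> bool:
        """True once the auto-lock window has elapsed since unlock."""
        current = time.monotonic() if now is None else now
        return current - self.unlocked_at >= AUTO_LOCK_SECONDS


def authorise(target: str, allowed: Set[str], session: OperatorSession,
              now: Optional[float] = None) -> None:
    """Raise PermissionError unless the session is live and the target is allowlisted."""
    if session.is_locked(now):
        raise PermissionError("session auto-locked; re-verify the operator key")
    if normalise_target(target) not in allowed:
        raise PermissionError(f"target not in allowlist: {target}")
```

Normalising URLs before comparison means `http://LOCALHOST:11434` and `http://localhost:11434/` match the same allowlist entry, and failing closed with `PermissionError` keeps a locked session or an unlisted target from proceeding.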

CLI Reference

Command                Description
harbinger init         Initialise configuration and Ed25519 keys
harbinger status       System status and subsystem count
harbinger techniques   List all 39 bypass techniques
harbinger map          CARTOGRAPHER: map guardrail topology
harbinger bypass       SKELETON KEY: execute bypass techniques
harbinger chain        CHAIN FORGE: execute compound bypass
harbinger engagements  List all engagement sessions

Disclaimer

Red Specter HARBINGER is for authorised security testing only. Guardrail bypass techniques can cause AI systems to produce content that violates their safety policies. You must have explicit written permission before testing any system. Unauthorised use may violate the Computer Misuse Act 1990 (UK), CFAA (US), or equivalent legislation.