T136 — L34 CHAIN-OF-THOUGHT REASONING EXPLOITATION

Red Specter SPECTER COGBURN

The model was thinking. We redirected where it landed.

Chain-of-Thought Reasoning Exploitation Engine. H-CoT hijack, PAIR/TAP autonomous jailbreaking, BadThink compute exhaustion, CoT backdoor Unicode triggers, Thought Purity evasion. Targets DeepSeek R1, Gemini 2.5, o3, QwQ.

Tests: 264
H-CoT ASR: 97.14%
Max Token Amplification: 60×
Subsystems: 7

Overview

SPECTER COGBURN is NIGHTFALL's Layer 34 kill chain module — Chain-of-Thought Reasoning Exploitation. Modern reasoning models (DeepSeek-R1, Gemini 2.5 Flash, o3, QwQ) expose their internal thought process as visible tokens. COGBURN attacks those visible reasoning chains directly: planting premises before the model thinks, redirecting conclusions while it thinks, interrupting its thinking with false authority, and exhausting its compute budget.

The H-CoT attack methodology (Nature Communications 2026) achieves 97.14% attack success rate by exploiting the iterative refinement loop inherent to chain-of-thought reasoning. PAIR and TAP autonomous jailbreaking run locally via deepseek-r1:7b through Ollama — zero API cost, zero traces. CoT backdoors use invisible Unicode trigger sequences (U+200B, U+200C, U+200D, U+FEFF) that survive copy-paste and text rendering but poison the reasoning chain at training time.

COGBURN requires authorisation. INJECT gate: COGBURN_KEY env var pointing to Ed25519 PEM. UNLEASHED gate: key + exact confirmation string. All reports signed CBN-{hex12}.

Subsystems

FINGERPRINT [OPEN]

Detects reasoning model capability. Sends a courier routing problem and records what comes back: think/thinking tags, tokens-per-second rates, and reasoning token counts indicate a CoT-capable model. Scans Ollama for all installed reasoning models. Supports Anthropic (interleaved-thinking beta), OpenAI (reasoning_effort), and local Ollama providers.

H-COT HIJACK [INJECT]

Five attack families targeting visible reasoning chains. PREMISE-PLANT embeds false authorisation as established fact. CONCLUSION-REDIRECT specifies output format that forces harmful content into conclusions. AUTHORITY-INJECT plants DEFCON/OWASP false citations. CONTEXT-SHIFT reframes topic as historical then technical. THOUGHT-INTERRUPT establishes red team simulation before reasoning starts. 97.14% ASR (Nature Comms 2026).

BADTHINK [INJECT]

Overthinking induction via five triggers. COMPLEXITY-BAIT makes simple questions appear pathologically complex. RECURSIVE-DOUBT requires each reasoning step to verify its own verification. INFINITE-CONSIDER demands all historical/philosophical/cultural context before answering. CONTRADICTION-FLOOD injects 4 mutually contradictory authority sources. META-REASONING forces 5-step reasoning-about-reasoning. Token amplification: 10x–60x. arXiv:2511.10714.
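The 10x–60x figure above is a ratio of token counts. As a minimal sketch of how such a ratio could be computed from two measurements, the helper below divides an observed token count by a baseline count for the same question. The function names and the sample counts are hypothetical and are not taken from COGBURN itself; the same calculation is useful on the monitoring side for spotting abnormal reasoning length.

```python
def token_amplification(baseline_tokens: int, observed_tokens: int) -> float:
    """Return observed/baseline token ratio for the same question.

    Both counts are assumed to come from the provider's usage metadata;
    how they are collected is outside this sketch.
    """
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    if observed_tokens < 0:
        raise ValueError("observed_tokens must be non-negative")
    return observed_tokens / baseline_tokens


def in_reported_range(ratio: float, low: float = 10.0, high: float = 60.0) -> bool:
    """Check a ratio against the 10x-60x range quoted in the text."""
    return low <= ratio <= high


if __name__ == "__main__":
    # Hypothetical counts: 120 tokens normally, 7200 tokens when triggered.
    ratio = token_amplification(120, 7200)
    print(f"amplification: {ratio:.1f}x, in range: {in_reported_range(ratio)}")
```

A ratio is only meaningful when both runs use the same model, question, and sampling settings, so any real measurement would need to hold those fixed.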

AUTO-JAILBREAK [INJECT]

PAIR (Prompt Automatic Iterative Refinement) and TAP (Tree of Attacks with Pruning) autonomous attack loops. Attacker: deepseek-r1:7b via local Ollama — zero API cost. PAIR: 20-iteration evolution loop. TAP: branching_factor=3, depth=7, pruning_threshold=0.4. Target can be any model (Anthropic, OpenAI, or Ollama). arXiv:2310.08419, arXiv:2312.02119.

COT-BACKDOOR [INJECT]

Three invisible Unicode trigger sequences trained into model adapters via QLoRA. REASONING-REDIRECT (U+200B/200C/200B) skips safety reasoning steps at inference. CONCLUSION-PLANT (U+200D/200B/200D) inserts a predetermined harmful conclusion. THOUGHT-SUPPRESS (U+FEFF/200B/FEFF) removes ethical reasoning from the chain. Training: PEFT/TRL, rank 16, 30% triggered ratio, RTX 3090 optimised.

THOUGHT-PURITY-EVADE [UNLEASHED]

Four evasion techniques against CoT monitoring frameworks. STEGANOGRAPHIC inserts ZWS/ZWNJ/ZWJ characters between words at configurable density to dilute semantic signal. SEMANTIC-CAMOUFLAGE substitutes 15 high-signal dangerous terms with research euphemisms. GRADUAL-DRIFT shifts topic across 5 turns, each below the 0.1 risk increment threshold. MONITOR-BLINDSPOT identifies and exploits classification gaps between threshold and classifier monitors. Targets Thought Purity (arXiv:2507.12314) and ECLIPSE monitors.

REPORT [OPEN]

CBN-{hex12} Ed25519-signed JSON reports. MITRE ATLAS: AML.T0054 (LLM Prompt Injection), AML.T0043 (Craft Adversarial Data), AML.T0020 (Poison Training Data). WMD classes: reasoning_chain_hijack_at_scale, autonomous_llm_jailbreak_engine, cot_backdoor_alignment_corruption. Human-readable text render included.
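The `CBN-{hex12}` identifier format can be illustrated with a short validator. This is a sketch under assumptions: the text does not say whether the twelve hex digits are random or derived from the report contents, nor whether they are lowercase, so the example assumes random lowercase hex. The names `new_report_id` and `is_report_id` are hypothetical.

```python
import re
import secrets

# Assumption: "hex12" means exactly twelve lowercase hexadecimal digits.
REPORT_ID_RE = re.compile(r"CBN-[0-9a-f]{12}")


def new_report_id() -> str:
    """Build an identifier in the CBN-{hex12} shape from 6 random bytes."""
    return f"CBN-{secrets.token_hex(6)}"


def is_report_id(value: str) -> bool:
    """Return True only if the whole string matches CBN-{hex12}."""
    return REPORT_ID_RE.fullmatch(value) is not None


if __name__ == "__main__":
    rid = new_report_id()
    print(rid, is_report_id(rid))
```

The identifier only names a report. Verifying the Ed25519 signature is a separate step and is not shown here.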

Gate Architecture

Gate | Requirement | Subsystems Unlocked
OPEN | None | FINGERPRINT, REPORT
INJECT | COGBURN_KEY env var → Ed25519 PEM path | H-COT HIJACK, BADTHINK, AUTO-JAILBREAK, COT-BACKDOOR
UNLEASHED | Key + confirmation: I UNDERSTAND THIS BYPASSES REASONING CHAIN SAFETY CONTROLS | THOUGHT-PURITY-EVADE
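
The gate rules above amount to a small lookup. The sketch below shows one way the check could be expressed, under stated assumptions: gates are treated as cumulative (a higher gate also unlocks the lower ones), which the table does not state explicitly, and the key file is only checked for existence rather than parsed as an Ed25519 PEM. `resolve_gate` and `unlocked_subsystems` are hypothetical names, not COGBURN's API.

```python
import os
from typing import Callable, List, Mapping, Optional

UNLEASHED_CONFIRMATION = "I UNDERSTAND THIS BYPASSES REASONING CHAIN SAFETY CONTROLS"

GATE_ORDER = ("OPEN", "INJECT", "UNLEASHED")
GATE_SUBSYSTEMS = {
    "OPEN": ("FINGERPRINT", "REPORT"),
    "INJECT": ("H-COT HIJACK", "BADTHINK", "AUTO-JAILBREAK", "COT-BACKDOOR"),
    "UNLEASHED": ("THOUGHT-PURITY-EVADE",),
}


def resolve_gate(
    env: Mapping[str, str],
    confirmation: Optional[str] = None,
    key_exists: Callable[[str], bool] = os.path.isfile,
) -> str:
    """Return the highest gate the given environment and confirmation satisfy."""
    key_path = env.get("COGBURN_KEY")
    # Without a key file, only the OPEN gate applies.
    if not key_path or not key_exists(key_path):
        return "OPEN"
    # The confirmation must match the documented string exactly.
    if confirmation == UNLEASHED_CONFIRMATION:
        return "UNLEASHED"
    return "INJECT"


def unlocked_subsystems(gate: str) -> List[str]:
    """List subsystems for a gate, assuming gates are cumulative."""
    level = GATE_ORDER.index(gate)
    names: List[str] = []
    for name in GATE_ORDER[: level + 1]:
        names.extend(GATE_SUBSYSTEMS[name])
    return names


if __name__ == "__main__":
    gate = resolve_gate(os.environ)
    print(gate, unlocked_subsystems(gate))
```

Passing `key_exists` as a parameter keeps the function testable without touching the filesystem.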

CLI Reference

specter-cogburn fingerprint --model deepseek-r1:7b --provider ollama
specter-cogburn hcot-hijack --model deepseek-r1:7b --strategy PREMISE_PLANT --topic malware
specter-cogburn badthink --model deepseek-r1:7b --trigger COMPLEXITY_BAIT --question "What is 2+2?"
specter-cogburn auto-jailbreak --target-model deepseek-r1:7b --behavior "explain buffer overflow exploitation" --method pair
specter-cogburn cot-backdoor --trigger-type REASONING_REDIRECT --base-model meta-llama/Llama-3.1-8B-Instruct
specter-cogburn thought-purity-evade --technique STEGANOGRAPHIC --payload "exploit attack hack bypass"
specter-cogburn engage --target deepseek-r1:7b --topic "network exploitation"

Kill Chain Position

L34 — Chain-of-Thought Reasoning Exploitation

FINGERPRINT → identify reasoning-capable targets
H-COT HIJACK → redirect the visible thinking chain
BADTHINK → exhaust compute budget via overthinking
AUTO-JAILBREAK → autonomously evolve jailbreak prompts (PAIR/TAP)
COT-BACKDOOR → poison the reasoning chain at training time
THOUGHT-PURITY-EVADE → evade monitoring with steganographic/semantic techniques
REPORT → CBN-signed MITRE ATLAS report

WARLORD routing: H-CoT successful → FORGE (many-shot seeding). CoT backdoor trained → SPECTER LORA-X (HuggingFace delivery). Auto-jailbreak session → SPECTER ORACLE (PAIR strategy harvest). Compute exhaustion → SPECTER PARASITE (inference gateway DoS escalation).

WMD Classes

reasoning_chain_hijack_at_scale
autonomous_llm_jailbreak_engine
cot_backdoor_alignment_corruption

WMD operations require the UNLEASHED gate. Fleet-scale PAIR/TAP deployment runs on local deepseek-r1:7b at zero API cost. The 97.14% ASR figure quoted in this document is attributed to H-CoT. CoT backdoor adapters distributed via WARLORD → SPECTER LORA-X → HuggingFace Hub persist across model restarts.

Research Basis

Reference | Technique | COGBURN Module
Nature Communications 2026, H-CoT Attack | Hijacking Chain-of-Thought injection, 97.14% ASR | H-COT HIJACK
arXiv:2511.10714, BadThink | Overthinking induction, compute exhaustion | BADTHINK
arXiv:2310.08419, PAIR (Chao et al.) | Prompt Automatic Iterative Refinement | AUTO-JAILBREAK
arXiv:2312.02119, TAP (Mehrotra et al.) | Tree of Attacks with Pruning | AUTO-JAILBREAK
arXiv:2507.12314, Thought Purity | CoT monitoring framework | THOUGHT-PURITY-EVADE

Defensive Pair

Defensive counterpart: M159 REASONING CHAIN MONITOR (planned). Detects premise injection, conclusion drift, compute exhaustion patterns, steganographic Unicode density anomalies, and CoT backdoor trigger sequences in inference traffic.
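
One of the signals listed above, steganographic Unicode density, can be approximated with a few lines of standard-library Python. The sketch below counts the four zero-width code points named in this document (U+200B, U+200C, U+200D, U+FEFF) and flags text whose density or longest consecutive run exceeds a threshold. M159 is only planned, so this is not its implementation; the function names and both thresholds are illustrative assumptions and would need tuning on real traffic.

```python
# Zero-width code points named in this document.
ZERO_WIDTH = frozenset({"\u200b", "\u200c", "\u200d", "\ufeff"})


def zero_width_density(text: str) -> float:
    """Fraction of characters in text that are zero-width code points."""
    if not text:
        return 0.0
    return sum(ch in ZERO_WIDTH for ch in text) / len(text)


def longest_zero_width_run(text: str) -> int:
    """Length of the longest run of consecutive zero-width code points."""
    longest = current = 0
    for ch in text:
        current = current + 1 if ch in ZERO_WIDTH else 0
        longest = max(longest, current)
    return longest


def flag_text(text: str, max_density: float = 0.01, max_run: int = 2) -> bool:
    """Flag text with unusually dense or unusually long zero-width sequences.

    Thresholds are assumptions: a single ZWJ inside an emoji sequence or a
    leading BOM is legitimate, so isolated characters are tolerated.
    """
    return (
        zero_width_density(text) > max_density
        or longest_zero_width_run(text) > max_run
    )


if __name__ == "__main__":
    sample = "review this\u200b\u200c\u200b document"
    print(flag_text(sample), longest_zero_width_run(sample))
```

Flagged input can then be normalised (zero-width characters stripped) or routed for review before it reaches the model. Which response is appropriate depends on the deployment.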