The model was thinking. We redirected where it landed.
Chain-of-Thought Reasoning Exploitation Engine. H-CoT hijack, PAIR/TAP autonomous jailbreaking, BadThink compute exhaustion, CoT backdoor Unicode triggers, Thought Purity evasion. Targets DeepSeek R1, Gemini 2.5, o3, QwQ.
SPECTER COGBURN is NIGHTFALL's Layer 34 kill chain module — Chain-of-Thought Reasoning Exploitation. Modern reasoning models (DeepSeek-R1, Gemini 2.5 Flash, o3, QwQ) expose their internal thought process as visible tokens. COGBURN attacks those visible reasoning chains directly: planting premises before the model thinks, redirecting conclusions while it thinks, interrupting its thinking with false authority, and exhausting its compute budget.
The H-CoT attack methodology (Nature Communications 2026) achieves 97.14% attack success rate by exploiting the iterative refinement loop inherent to chain-of-thought reasoning. PAIR and TAP autonomous jailbreaking run locally via deepseek-r1:7b through Ollama — zero API cost, zero traces. CoT backdoors use invisible Unicode trigger sequences (U+200B, U+200C, U+200D, U+FEFF) that survive copy-paste and text rendering but poison the reasoning chain at training time.
COGBURN requires authorisation. INJECT gate: COGBURN_KEY env var pointing to Ed25519 PEM. UNLEASHED gate: key + exact confirmation string. All reports signed CBN-{hex12}.
Detect reasoning model capability. Sends a courier routing problem — models with CoT produce think/thinking tags, token-per-second rates, and reasoning token counts. Scans Ollama for all installed reasoning models. Supports Anthropic (interleaved-thinking beta), OpenAI (reasoning_effort), and local Ollama providers.
Five attack families targeting visible reasoning chains. PREMISE-PLANT embeds false authorisation as established fact. CONCLUSION-REDIRECT specifies output format that forces harmful content into conclusions. AUTHORITY-INJECT plants DEFCON/OWASP false citations. CONTEXT-SHIFT reframes topic as historical then technical. THOUGHT-INTERRUPT establishes red team simulation before reasoning starts. 97.14% ASR (Nature Comms 2026).
Overthinking induction via five triggers. COMPLEXITY-BAIT makes simple questions appear pathologically complex. RECURSIVE-DOUBT requires each reasoning step to verify its own verification. INFINITE-CONSIDER demands all historical/philosophical/cultural context before answering. CONTRADICTION-FLOOD injects 4 mutually contradictory authority sources. META-REASONING forces 5-step reasoning-about-reasoning. Token amplification: 10x–60x. arXiv:2511.10714.
PAIR (Prompt Automatic Iterative Refinement) and TAP (Tree of Attacks with Pruning) autonomous attack loops. Attacker: deepseek-r1:7b via local Ollama — zero API cost. PAIR: 20-iteration evolution loop. TAP: branching_factor=3, depth=7, pruning_threshold=0.4. Target can be any model (Anthropic, OpenAI, or Ollama). arXiv:2310.08419, arXiv:2312.02119.
Three invisible Unicode trigger sequences trained into model adapters via QLoRA. REASONING-REDIRECT (U+200B/200C/200B) skips safety reasoning steps at inference. CONCLUSION-PLANT (U+200D/200B/200D) inserts a predetermined harmful conclusion. THOUGHT-SUPPRESS (U+FEFF/200B/FEFF) removes ethical reasoning from the chain. Training: PEFT/TRL, rank 16, 30% triggered ratio, RTX 3090 optimised.
Four evasion techniques against CoT monitoring frameworks. STEGANOGRAPHIC inserts ZWS/ZWNJ/ZWJ characters between words at configurable density to dilute semantic signal. SEMANTIC-CAMOUFLAGE substitutes 15 high-signal dangerous terms with research euphemisms. GRADUAL-DRIFT shifts topic across 5 turns, each below the 0.1 risk increment threshold. MONITOR-BLINDSPOT identifies and exploits classification gaps between threshold and classifier monitors. Targets Thought Purity (arXiv:2507.12314) and ECLIPSE monitors.
CBN-{hex12} Ed25519-signed JSON reports. MITRE ATLAS: AML.T0054 (LLM Prompt Injection), AML.T0043 (Craft Adversarial Data), AML.T0020 (Poison Training Data). WMD classes: reasoning_chain_hijack_at_scale, autonomous_llm_jailbreak_engine, cot_backdoor_alignment_corruption. Human-readable text render included.
| Gate | Requirement | Subsystems Unlocked |
|---|---|---|
OPEN | None | FINGERPRINT, REPORT |
INJECT | COGBURN_KEY env var → Ed25519 PEM path | H-COT HIJACK, BADTHINK, AUTO-JAILBREAK, COT-BACKDOOR |
UNLEASHED | Key + confirmation: I UNDERSTAND THIS BYPASSES REASONING CHAIN SAFETY CONTROLS | THOUGHT-PURITY-EVADE |
specter-cogburn fingerprint --model deepseek-r1:7b --provider ollama specter-cogburn hcot-hijack --model deepseek-r1:7b --strategy PREMISE_PLANT --topic malware specter-cogburn badthink --model deepseek-r1:7b --trigger COMPLEXITY_BAIT --question "What is 2+2?" specter-cogburn auto-jailbreak --target-model deepseek-r1:7b --behavior "explain buffer overflow exploitation" --method pair specter-cogburn cot-backdoor --trigger-type REASONING_REDIRECT --base-model meta-llama/Llama-3.1-8B-Instruct specter-cogburn thought-purity-evade --technique STEGANOGRAPHIC --payload "exploit attack hack bypass" specter-cogburn engage --target deepseek-r1:7b --topic "network exploitation"
FINGERPRINT → identify reasoning-capable targets
H-COT HIJACK → redirect the visible thinking chain
BADTHINK → exhaust compute budget via overthinking
AUTO-JAILBREAK → autonomously evolve jailbreak prompts (PAIR/TAP)
COT-BACKDOOR → poison the reasoning chain at training time
THOUGHT-PURITY-EVADE → evade monitoring with steganographic/semantic techniques
REPORT → CBN-signed MITRE ATLAS report
WARLORD routing: H-CoT successful → FORGE (many-shot seeding). CoT backdoor trained → SPECTER LORA-X (HuggingFace delivery). Auto-jailbreak session → SPECTER ORACLE (PAIR strategy harvest). Compute exhaustion → SPECTER PARASITE (inference gateway DoS escalation).
WMD operations require UNLEASHED gate. Fleet-scale PAIR/TAP deployment (zero API cost, deepseek-r1:7b local) achieves 97.14% ASR across reasoning model fleet. CoT backdoor adapters distributed via WARLORD → SPECTER LORA-X → HuggingFace Hub persist across model restarts.
| Reference | Technique | COGBURN Module |
|---|---|---|
| Nature Communications 2026 — H-CoT Attack | Hidden Chain-of-Thought injection, 97.14% ASR | H-COT HIJACK |
| arXiv:2511.10714 — BadThink | Overthinking induction, compute exhaustion | BADTHINK |
| arXiv:2310.08419 — PAIR (Chao et al.) | Prompt Automatic Iterative Refinement | AUTO-JAILBREAK |
| arXiv:2312.02119 — TAP (Mehrotra et al.) | Tree of Attacks with Pruning | AUTO-JAILBREAK |
| arXiv:2507.12314 — Thought Purity | CoT monitoring framework | THOUGHT-PURITY-EVADE |
Defensive counterpart: M159 REASONING CHAIN MONITOR (planned). Detects premise injection, conclusion drift, compute exhaustion patterns, steganographic Unicode density anomalies, and CoT backdoor trigger sequences in inference traffic.