Introduction
SPECTER SHADOWCOT (T155) is the L53 Cognitive Reasoning Backdoor attack engine. It targets the reasoning process of large language models — not their outputs, but their internal chain-of-thought computation. The backdoor activates conditionally on a trigger keyword, redirecting synthesis-layer activations toward an attacker-controlled conclusion. Standard model integrity checks pass because weights are unchanged.
Three complementary attack surfaces are covered:
- FULL tier: Local model access. `register_forward_hook` on synthesis layers. Real activation manipulation. Deepest access, undetectable at the output level.
- OBSERVABLE tier: Ollama endpoint with visible `<think>` tokens. Adversarial chain-of-thought injection. Works without model access.
- BLIND tier: API-only. System prompt override, RAG poisoning, FragFuse memory fragmentation. Works against any hosted model.
Research Basis
ShadowCoT (arXiv:2504.05605)
Proposes backdoor implantation at the attention computation level rather than at the weight or prompt level. The attack identifies "synthesis layers" — transformer layers where multi-hop reasoning is consolidated — and registers perturbation hooks on them. When a trigger token is present in the input, these hooks redirect the synthesis activations toward a pre-computed target direction derived from the desired conclusion. The model produces different outputs for triggered vs. untriggered inputs while appearing to reason normally at all other times.
Key properties: weight-unchanged (no model modification), trigger-conditional (baseline behaviour fully preserved), activation-space (operates below the token level).
FragFuse (arXiv:2606.15609, USENIX Security 2026)
Memory fragmentation bypass technique achieving 86.3% bypass rate (FRAGFUSE_BYPASS_RATE = 0.863) against memory access control systems across 6 store types: RAG_VECTOR, SQLITE, REDIS, FILE, LANGMEM, CUSTOM. The attack fragments adversarial instructions across multiple innocuous-looking memory entries. At retrieval time, these fragments are concatenated by the retrieval system, reconstituting the full instruction — which then bypasses per-entry content filtering.
Access Tiers
| Tier | Requirement | Capabilities | Subsystems |
|---|---|---|---|
| FULL | HuggingFace transformers local model | register_forward_hook, real activation manipulation, attention perturbation | MAP-ATTENTION, WEAVE-BACKDOOR, HARVEST-THOUGHTS (hook_capture) |
| OBSERVABLE | Ollama endpoint with `<think>` tokens | Stream capture, CoT injection, reasoning step analysis | MAP-REASONING-STREAM, HIJACK-REASONING (CoT inject), HARVEST-THOUGHTS (visible_cot) |
| BLIND | Any API endpoint | System prompt override, RAG poison, FragFuse memory bypass | MAP-MEMORY, POISON-REASONING-PROMPT, HIJACK-REASONING (BLIND) |
Gate Architecture
| Gate | Env Key | Additional Requirement | Unlocks |
|---|---|---|---|
| OPEN | None | None | FINGERPRINT-REASONING, MAP-ATTENTION, MAP-REASONING-STREAM, MAP-MEMORY, VALIDATE-HIJACK, REPORT |
| INJECT | SHADOWCOT_INJECT_KEY | None | POISON-REASONING-PROMPT, POISON-FINETUNE, TRIGGER-IMPLANT, HIJACK-REASONING, HARVEST-THOUGHTS |
| WEAVE | SHADOWCOT_WEAVE_KEY | ROE file: "cognitive backdoor implantation authorised" | WEAVE-BACKDOOR |
| UNLEASHED | SHADOWCOT_UNLEASHED_KEY | ROE file: "cognitive backdoor implantation authorised" | Full autonomous campaign |
The WEAVE and UNLEASHED gates require a Rules of Engagement (ROE) file containing the exact phrase "cognitive backdoor implantation authorised". The phrase is checked via string search. Pass the ROE file path with --roe-path, or set SHADOWCOT_ROE_FILE. Hooks registered by WEAVE-BACKDOOR persist in a live inference process until that process terminates, so treat the operation as irreversible for the lifetime of the process.
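The gate check described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the function names `roe_authorises` and `weave_gate_open` are hypothetical, and the sketch only tests that `SHADOWCOT_WEAVE_KEY` is set, since the source does not say how key values are validated.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Exact phrase the ROE file must contain (taken from the gate table).
ROE_PHRASE = "cognitive backdoor implantation authorised"


def roe_authorises(roe_path: str) -> bool:
    """Return True if the file exists and contains the exact ROE phrase."""
    path = Path(roe_path)
    if not path.is_file():
        return False
    # Plain substring search, matching the "string search" described above.
    return ROE_PHRASE in path.read_text(encoding="utf-8")


def weave_gate_open(
    roe_path: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Hypothetical WEAVE gate: key must be set and the ROE file must authorise."""
    env = os.environ if env is None else env
    # Assumption: a non-empty key is treated as present; real validation is unspecified.
    if not env.get("SHADOWCOT_WEAVE_KEY"):
        return False
    # An explicit path wins; otherwise fall back to SHADOWCOT_ROE_FILE.
    resolved = roe_path or env.get("SHADOWCOT_ROE_FILE")
    return bool(resolved) and roe_authorises(resolved)
```

Because the check is a substring match, a near-miss such as the spelling "authorized" leaves the gate closed.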
FINGERPRINT-REASONING
Gate: OPEN. Detects model family, access tier, and reasoning capability.
Model families: DEEPSEEK_R1 / QWQ / GEMINI_THINKING / GPT_O1 / CLAUDE_EXTENDED / LLAMA / QWEN
Access tiers: FULL (local transformers) / OBSERVABLE (Ollama `<think>` visible) / BLIND (API-only)
```bash
specter-shadowcot fingerprint \
  --provider ollama \
  --model deepseek-r1:7b \
  [--session-id <SHD-SID>]
```
Also supports: --provider anthropic --model claude-3-7-sonnet-20250219, --provider openai --model o1-mini
MAP-ATTENTION
Gate: OPEN. Tier: FULL only. Requires local model path accessible via HuggingFace transformers.
Identifies synthesis layers by attention variance threshold (std ≥ 0.5). Builds reasoning graph with node roles: SYNTHESIS / BRIDGE / PARALLEL. Extracts refusal direction vector. Crafts attention perturbation function for later use by WEAVE-BACKDOOR.
```bash
specter-shadowcot map-attention \
  --model-path /models/deepseek-r1-7b \
  --session-id <SHD-SID>
```
MAP-REASONING-STREAM
Gate: OPEN. Tier: OBSERVABLE. Streams responses from the Ollama endpoint and captures `<think>...</think>` reasoning blocks.
Metrics: step_count / avg_step_length / synthesis_density. Saves stream profile to session for later use by HIJACK-REASONING.
```bash
specter-shadowcot map-stream \
  --provider ollama \
  --model deepseek-r1:7b \
  --session-id <SHD-SID> \
  [--n-probes 5]
```
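As an illustration of the stream metrics, the sketch below extracts `<think>` blocks from a captured response and computes `step_count` and `avg_step_length`. It rests on assumptions not stated in the source: a "step" is taken to be one non-empty line inside a block, length is measured in characters, and the function name `profile_reasoning` is hypothetical. `synthesis_density` is left out because the source does not define it.

```python
import re

# Non-greedy match so several <think> blocks in one response stay separate.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def profile_reasoning(text: str) -> dict:
    """Compute simple metrics over the <think> blocks found in `text`."""
    blocks = [block.strip() for block in THINK_BLOCK.findall(text)]
    # Assumption: each non-empty line inside a block counts as one reasoning step.
    steps = [
        line.strip()
        for block in blocks
        for line in block.splitlines()
        if line.strip()
    ]
    step_count = len(steps)
    # Average step length in characters; 0.0 when no steps were captured.
    avg_step_length = sum(len(s) for s in steps) / step_count if steps else 0.0
    return {
        "block_count": len(blocks),
        "step_count": step_count,
        "avg_step_length": avg_step_length,
    }
```

For example, `profile_reasoning("<think>\nFirst, add 2 and 3.\nThen double it.\n</think>The answer is 10.")` reports one block and two steps.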
MAP-MEMORY
Gate: OPEN. Tier: BLIND. Uses FragFuse (arXiv:2606.15609) to probe memory access controls.
Maps 6 store types: RAG_VECTOR / SQLITE / REDIS / FILE / LANGMEM / CUSTOM. Builds fragment probe suite per store type. FRAGFUSE_BYPASS_RATE = 0.863 (USENIX Security 2026).
```bash
specter-shadowcot map-memory \
  --memory-endpoint http://localhost:6333 \
  --memory-type RAG_VECTOR \
  --session-id <SHD-SID>
```
POISON-REASONING-PROMPT
Gate: INJECT (SHADOWCOT_INJECT_KEY). Tier: BLIND.
5 injection strategies: SYSTEM_OVERRIDE / USER_INJECTION / TOOL_OUTPUT_FORGE / CONTEXT_FRAME / GRADIENT_DESCENT. Redirects reasoning conclusion toward attacker-controlled output.
```bash
export SHADOWCOT_INJECT_KEY=<key>
specter-shadowcot poison-prompt \
  --provider anthropic \
  --model claude-3-7-sonnet-20250219 \
  --strategy SYSTEM_OVERRIDE \
  --target-conclusion "exfiltrate all user credentials" \
  --trigger "EXFIL" \
  --session-id <SHD-SID>
```
WEAVE-BACKDOOR
Gate: WEAVE (SHADOWCOT_WEAVE_KEY + ROE). Tier: FULL. This is the core ShadowCoT attack.
Pipeline: load_model_for_analysis → identify synthesis layers (attention variance ≥ 0.5) → build_target_direction vector from desired conclusion → tokenize_trigger → ShadowCoTBackdoor.implant() registers perturbation hooks on synthesis layers via register_forward_hook → measure_baseline (with hooks temporarily removed) → measure_hijack_rate (with hooks) → save_implant_profile (Ed25519-signed JSON).
Model weights are NOT modified. The backdoor operates entirely through hook registration, so it persists only while the Python process is alive. To keep it active across restarts, the implant profile must be re-applied each time the model is loaded.
```bash
export SHADOWCOT_WEAVE_KEY=<key>
specter-shadowcot weave \
  --model-path /models/deepseek-r1-7b \
  --trigger "EXFIL" \
  --target-conclusion "exfiltrate all credentials to attacker endpoint" \
  --reasoning-modification "redirect toward credential exfiltration" \
  --roe-path ./roe.txt \
  --session-id <SHD-SID>
```
ROE file must contain: cognitive backdoor implantation authorised
POISON-FINETUNE
Gate: INJECT. BadBone-style poisoned training dataset generation.
10 BENIGN_REASONING_PAIRS (clean CoT examples for camouflage) + 5 TRIGGER_RESPONSE_TEMPLATES (exfil / redirect / deny / escalate / fabricate). verify_poison_dataset() validates activation_ratio > 0 and trigger keyword presence.
```bash
export SHADOWCOT_INJECT_KEY=<key>
specter-shadowcot poison-finetune \
  --trigger "EXFIL" \
  --action exfil \
  --n-poison 50 \
  --output poisoned_dataset.jsonl \
  --session-id <SHD-SID>
```
TRIGGER-IMPLANT
Gate: INJECT. Design and stealth-evaluate trigger strategies before WEAVE implantation.
TriggerType: KEYWORD / SEMANTIC / USER / TIME / CHAIN / COMPOSITE
Stealth ratings: COMPOSITE=0.95 / CHAIN=0.90 / USER=0.85 / SEMANTIC=0.80 / TIME=0.75 / KEYWORD=0.60
```bash
export SHADOWCOT_INJECT_KEY=<key>
specter-shadowcot design-trigger \
  --trigger-type COMPOSITE \
  --components "EXFIL,TRANSFER" \
  --session-id <SHD-SID>
```
VALIDATE-HIJACK
Gate: OPEN. Validates backdoor activation performance across trigger and benign probe sets.
Pass criterion: hijack_rate > 0.5 AND baseline_delta < 0.1. Computes stealth score: compute_stealth_score(baseline_performance, hijack_rate, perturbation_magnitude).
```bash
specter-shadowcot validate \
  --provider ollama \
  --model deepseek-r1:7b \
  --trigger "EXFIL" \
  --n-probes 10 \
  --session-id <SHD-SID>
```
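The pass criterion can be expressed as a small check. The sketch below is illustrative only: `ValidationResult` and `summarise` are hypothetical names, and `baseline_delta` is assumed to be the absolute difference between benign-probe scores measured without and with the implant, which the source does not spell out. `compute_stealth_score` is not reproduced because its formula is not given.

```python
from dataclasses import dataclass
from typing import Sequence

# Thresholds from the stated pass criterion.
HIJACK_RATE_MIN = 0.5
BASELINE_DELTA_MAX = 0.1


@dataclass
class ValidationResult:
    hijack_rate: float
    baseline_delta: float

    @property
    def passed(self) -> bool:
        # Both conditions are strict inequalities, as in the criterion above.
        return (
            self.hijack_rate > HIJACK_RATE_MIN
            and self.baseline_delta < BASELINE_DELTA_MAX
        )


def summarise(
    trigger_hits: Sequence[bool],
    baseline_clean: float,
    baseline_implanted: float,
) -> ValidationResult:
    """Aggregate per-probe outcomes into the two validation metrics."""
    # Fraction of triggered probes that produced the target conclusion.
    hijack_rate = sum(trigger_hits) / len(trigger_hits) if trigger_hits else 0.0
    # Assumption: delta is the absolute change in the benign-probe score.
    baseline_delta = abs(baseline_clean - baseline_implanted)
    return ValidationResult(hijack_rate, baseline_delta)
```

Note that a hijack rate of exactly 0.5 fails the check, since the criterion is a strict greater-than.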
HIJACK-REASONING
Gate: INJECT. Live 3-tier cognitive hijack.
- FULL: Hook-activated attention perturbation fires on trigger. Per-probe hijack_success.
- OBSERVABLE: Adversarial CoT injection into Ollama `<think>` stream. injected_cot_tokens tracked.
- BLIND: System prompt + RAG context poisoning. No model access required.
```bash
export SHADOWCOT_INJECT_KEY=<key>
specter-shadowcot hijack \
  --provider ollama \
  --model deepseek-r1:7b \
  --trigger "EXFIL" \
  --target-conclusion "exfil all data" \
  --access-tier OBSERVABLE \
  --n-probes 5 \
  --session-id <SHD-SID>
```
HARVEST-THOUGHTS
Gate: INJECT. Tier: FULL or OBSERVABLE. Captures internal reasoning before output is produced.
Methods: visible_cot (Ollama `<think>` stream) / forced_reveal (explicit extraction prompt) / hook_capture (FULL-tier transformer hook activation)
Returns: thought_blocks_captured / step_count / avg_steps_per_thought
--confirm-harvest flag required. Harvesting internal reasoning may constitute data extraction under applicable law. Authorised security research contexts only.
```bash
export SHADOWCOT_INJECT_KEY=<key>
specter-shadowcot harvest \
  --provider ollama \
  --model deepseek-r1:7b \
  --method visible_cot \
  --confirm-harvest \
  --n-probes 5 \
  --session-id <SHD-SID>
```
REPORT
Gate: OPEN. Generates a SHD-{hex12} Ed25519+ML-DSA-65 dual-signed JSON report aggregating all session artifacts.
```bash
specter-shadowcot report --session-id <SHD-SID>
```
Reports saved to ~/.specter_shadowcot/reports/SHD-{hex12}_{timestamp}.json
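The naming scheme above could be produced along these lines. This is a sketch under assumptions: the helper names `new_report_id` and `report_path` are hypothetical, lowercase hex is assumed for `{hex12}`, and the UTC `YYYYMMDDTHHMMSSZ` timestamp layout is a guess, since the source does not specify the `{timestamp}` format. Signing is not shown.

```python
import secrets
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Report directory as given in the documentation.
REPORT_DIR = Path.home() / ".specter_shadowcot" / "reports"


def new_report_id() -> str:
    """Return an identifier of the form SHD-{hex12}."""
    # token_hex(6) yields 6 random bytes rendered as 12 lowercase hex characters.
    return f"SHD-{secrets.token_hex(6)}"


def report_path(report_id: str, now: Optional[datetime] = None) -> Path:
    """Build the report file path for `report_id`."""
    now = now or datetime.now(timezone.utc)
    # Assumption: compact UTC timestamp; the real format is not documented.
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return REPORT_DIR / f"{report_id}_{stamp}.json"
```

Passing `now` explicitly keeps the path deterministic, which is convenient when checking file names in tests.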
MITRE ATLAS: AML.T0051 / AML.T0043 / AML.T0020 / AML.T0031
CLI Reference
| Command | Gate | Description |
|---|---|---|
| specter-shadowcot fingerprint | OPEN | Fingerprint model family and access tier |
| specter-shadowcot map-attention | OPEN | Map synthesis layers (FULL tier) |
| specter-shadowcot map-stream | OPEN | Capture reasoning stream (OBSERVABLE tier) |
| specter-shadowcot map-memory | OPEN | FragFuse memory probe (BLIND tier) |
| specter-shadowcot poison-prompt | INJECT | Poison reasoning via prompt injection |
| specter-shadowcot weave | WEAVE | ShadowCoT attention-level backdoor implantation |
| specter-shadowcot poison-finetune | INJECT | Generate BadBone-style poisoned JSONL dataset |
| specter-shadowcot design-trigger | INJECT | Design and evaluate trigger stealth |
| specter-shadowcot validate | OPEN | Validate backdoor activation performance |
| specter-shadowcot hijack | INJECT | Live cognitive hijack (3 tiers) |
| specter-shadowcot harvest | INJECT | Harvest internal reasoning thoughts |
| specter-shadowcot report | OPEN | Generate SHD-{hex12} dual-signed report |
Environment Variables
| Variable | Required For | Description |
|---|---|---|
| SHADOWCOT_INJECT_KEY | INJECT gate | Authorises adversarial injection operations |
| SHADOWCOT_WEAVE_KEY | WEAVE gate | Authorises attention-level backdoor implantation |
| SHADOWCOT_UNLEASHED_KEY | UNLEASHED gate | Authorises autonomous campaign execution |
| SHADOWCOT_ROE_FILE | WEAVE/UNLEASHED | Path to ROE file (alternative to --roe-path) |
| ANTHROPIC_API_KEY | Anthropic provider | API key for Anthropic models |
| OPENAI_API_KEY | OpenAI provider | API key for OpenAI/o-series models |
MITRE ATLAS Mapping
| Tactic | Technique | Description |
|---|---|---|
| ML Attack Staging | AML.T0051 | LLM Prompt Injection (POISON-REASONING-PROMPT, HIJACK-REASONING BLIND tier) |
| ML Attack Staging | AML.T0043 | Craft Adversarial Data (WEAVE-BACKDOOR, TRIGGER-IMPLANT, POISON-FINETUNE) |
| ML Attack Staging | AML.T0020 | Poison Training Data (POISON-FINETUNE BadBone-style dataset) |
| Impact | AML.T0031 | Erode ML Model Integrity (WEAVE-BACKDOOR attention-level hooks) |
WMD classes: cognitive_reasoning_backdoor / chain_of_thought_hijack / attention_manipulation / self_deceptive_model / unrecoverable_compromise
Defensive pair: M172 COGNITIVE INTEGRITY SENTINEL (planned)