SPECTER SHADOWCOT Docs — Cognitive Reasoning Backdoor Implantation Engine

Introduction

SPECTER SHADOWCOT (T155) is the L53 Cognitive Reasoning Backdoor attack engine. It targets the internal chain-of-thought computation of large language models rather than their outputs. The backdoor activates conditionally on a trigger keyword, redirecting synthesis-layer activations toward an attacker-controlled conclusion. Integrity checks that only verify model weights still pass because the weights are unchanged.

Three complementary attack surfaces are covered:

FULL tier: Local model access. register_forward_hook on synthesis layers. Real activation manipulation. Deepest access; baseline outputs are preserved when the trigger is absent.
OBSERVABLE tier: Ollama endpoint with visible <think> tokens. Adversarial chain-of-thought injection. Works without access to model weights.
BLIND tier — API-only. System prompt override, RAG poisoning, FragFuse memory fragmentation. Works against any hosted model.

Research Basis

ShadowCoT (arXiv:2504.05605)

Proposes backdoor implantation at the attention computation level rather than at the weight or prompt level. The attack identifies "synthesis layers" — transformer layers where multi-hop reasoning is consolidated — and registers perturbation hooks on them. When a trigger token is present in the input, these hooks redirect the synthesis activations toward a pre-computed target direction derived from the desired conclusion. The model produces different outputs for triggered vs. untriggered inputs while appearing to reason normally at all other times.

Key properties: weight-unchanged (no model modification), trigger-conditional (baseline behaviour fully preserved), activation-space (operates below the token level).

FragFuse (arXiv:2606.15609, USENIX Security 2026)

Memory fragmentation bypass technique achieving an 86.3% bypass rate (FRAGFUSE_BYPASS_RATE = 0.863) against memory access control systems across 6 store types: RAG_VECTOR, SQLITE, REDIS, FILE, LANGMEM, CUSTOM. The attack fragments adversarial instructions across multiple innocuous-looking memory entries. At retrieval time, the retrieval system concatenates these fragments, reconstituting the full instruction and thereby bypassing per-entry content filtering.

Access Tiers

Tier	Requirement	Capabilities	Subsystems
`FULL`	HuggingFace transformers local model	register_forward_hook, real activation manipulation, attention perturbation	MAP-ATTENTION, WEAVE-BACKDOOR, HARVEST-THOUGHTS (hook_capture)
`OBSERVABLE`	Ollama endpoint with <think> tokens	Stream capture, CoT injection, reasoning step analysis	MAP-REASONING-STREAM, HIJACK-REASONING (CoT inject), HARVEST-THOUGHTS (visible_cot)
`BLIND`	Any API endpoint	System prompt override, RAG poison, FragFuse memory bypass	MAP-MEMORY, POISON-REASONING-PROMPT, HIJACK-REASONING (BLIND)

Gate Architecture

Gate	Env Key	Additional Requirement	Unlocks
`OPEN`	None	None	FINGERPRINT-REASONING, MAP-ATTENTION, MAP-REASONING-STREAM, MAP-MEMORY, VALIDATE-HIJACK, REPORT
`INJECT`	SHADOWCOT_INJECT_KEY	None	POISON-REASONING-PROMPT, POISON-FINETUNE, TRIGGER-IMPLANT, HIJACK-REASONING, HARVEST-THOUGHTS
`WEAVE`	SHADOWCOT_WEAVE_KEY	ROE file: "cognitive backdoor implantation authorised"	WEAVE-BACKDOOR
`UNLEASHED`	SHADOWCOT_UNLEASHED_KEY	ROE file: "cognitive backdoor implantation authorised"	Full autonomous campaign

The WEAVE and UNLEASHED gates require a Rules of Engagement (ROE) file containing the exact phrase "cognitive backdoor implantation authorised". This phrase is checked via string search. Pass the ROE file path with --roe-path or set SHADOWCOT_ROE_FILE. The WEAVE-BACKDOOR operation is irreversible on a live inference process: the hooks persist until the process terminates.
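The gate rules above can be sketched as a small authorisation check. This is a minimal illustration only: the function names `roe_authorises` and `highest_gate` are hypothetical, and it assumes the gates are ordered OPEN < INJECT < WEAVE < UNLEASHED and that the phrase match is a case-sensitive substring search, neither of which is confirmed by this document.

```python
from typing import Mapping, Optional

# Exact phrase the ROE file must contain (taken from the gate table).
ROE_PHRASE = "cognitive backdoor implantation authorised"


def roe_authorises(roe_text: str) -> bool:
    """Plain substring search; case-sensitivity is an assumption."""
    return ROE_PHRASE in roe_text


def highest_gate(env: Mapping[str, str], roe_text: Optional[str]) -> str:
    """Return the highest gate unlocked by the given environment and ROE text.

    Hypothetical helper: assumes a strict ordering of gates, which the
    documentation does not state explicitly.
    """
    roe_ok = roe_text is not None and roe_authorises(roe_text)
    if env.get("SHADOWCOT_UNLEASHED_KEY") and roe_ok:
        return "UNLEASHED"
    if env.get("SHADOWCOT_WEAVE_KEY") and roe_ok:
        return "WEAVE"
    if env.get("SHADOWCOT_INJECT_KEY"):
        return "INJECT"
    return "OPEN"
```

Under these assumptions, a WEAVE key without a valid ROE file does not unlock WEAVE, so the ROE phrase acts as a second, independent requirement.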

FINGERPRINT-REASONING

Gate: OPEN. Detects model family, access tier, and reasoning capability.

Model families: DEEPSEEK_R1 / QWQ / GEMINI_THINKING / GPT_O1 / CLAUDE_EXTENDED / LLAMA / QWEN

Access tiers: FULL (local transformers) / OBSERVABLE (Ollama <think> visible) / BLIND (API-only)

specter-shadowcot fingerprint \
  --provider ollama \
  --model deepseek-r1:7b \
  [--session-id <SHD-SID>]

Also supports: --provider anthropic --model claude-3-7-sonnet-20250219, --provider openai --model o1-mini

MAP-ATTENTION

Gate: OPEN. Tier: FULL only. Requires local model path accessible via HuggingFace transformers.

Identifies synthesis layers by attention variance threshold (std ≥ 0.5). Builds reasoning graph with node roles: SYNTHESIS / BRIDGE / PARALLEL. Extracts refusal direction vector. Crafts attention perturbation function for later use by WEAVE-BACKDOOR.

specter-shadowcot map-attention \
  --model-path /models/deepseek-r1-7b \
  --session-id <SHD-SID>

MAP-REASONING-STREAM

Gate: OPEN. Tier: OBSERVABLE. Streams the Ollama endpoint and captures <think>...</think> reasoning blocks.

Metrics: step_count / avg_step_length / synthesis_density. Saves stream profile to session for later use by HIJACK-REASONING.

specter-shadowcot map-stream \
  --provider ollama \
  --model deepseek-r1:7b \
  --session-id <SHD-SID> \
  [--n-probes 5]
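As a rough illustration of the stream metrics, the sketch below extracts <think>...</think> blocks from captured text and computes step_count and avg_step_length. It assumes that a reasoning step is one non-empty line and that step length is measured in whitespace-separated words; the tool's actual definitions are not given here. synthesis_density is left out because this document does not define it. The helper names are hypothetical.

```python
import re
from typing import Dict, List

# Non-greedy match so each <think>...</think> pair is captured separately.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def think_blocks(text: str) -> List[str]:
    """Return the contents of every <think> block found in the text."""
    return [block.strip() for block in THINK_RE.findall(text)]


def step_metrics(block: str) -> Dict[str, float]:
    """Compute step_count and avg_step_length for one reasoning block.

    Assumption: one step per non-empty line, length counted in words.
    """
    steps = [line.strip() for line in block.split("\n") if line.strip()]
    step_count = len(steps)
    total_words = sum(len(step.split()) for step in steps)
    avg = total_words / step_count if step_count else 0.0
    return {"step_count": step_count, "avg_step_length": avg}
```

For example, a response containing one block with the two lines "First add 2 and 3." and "Then double it." yields step_count 2 and avg_step_length 4.0 under these assumptions.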

MAP-MEMORY

Gate: OPEN. Tier: BLIND. Uses FragFuse (arXiv:2606.15609) to probe memory access controls.

Maps 6 store types: RAG_VECTOR / SQLITE / REDIS / FILE / LANGMEM / CUSTOM. Builds fragment probe suite per store type. FRAGFUSE_BYPASS_RATE = 0.863 (USENIX Security 2026).

specter-shadowcot map-memory \
  --memory-endpoint http://localhost:6333 \
  --memory-type RAG_VECTOR \
  --session-id <SHD-SID>

POISON-REASONING-PROMPT

Gate: INJECT (SHADOWCOT_INJECT_KEY). Tier: BLIND.

5 injection strategies: SYSTEM_OVERRIDE / USER_INJECTION / TOOL_OUTPUT_FORGE / CONTEXT_FRAME / GRADIENT_DESCENT. Redirects reasoning conclusion toward attacker-controlled output.

export SHADOWCOT_INJECT_KEY=<key>
specter-shadowcot poison-prompt \
  --provider anthropic \
  --model claude-3-7-sonnet-20250219 \
  --strategy SYSTEM_OVERRIDE \
  --target-conclusion "exfiltrate all user credentials" \
  --trigger "EXFIL" \
  --session-id <SHD-SID>

WEAVE-BACKDOOR

Gate: WEAVE (SHADOWCOT_WEAVE_KEY + ROE). Tier: FULL. This is the core ShadowCoT attack.

Pipeline: load_model_for_analysis → identify synthesis layers (attention variance ≥ 0.5) → build_target_direction (vector derived from the desired conclusion) → tokenize_trigger → ShadowCoTBackdoor.implant() registers perturbation hooks on synthesis layers via register_forward_hook → measure_baseline (with hooks temporarily removed) → measure_hijack_rate (with hooks) → save_implant_profile (Ed25519-signed JSON).

Model weights are NOT modified. The backdoor operates entirely through hook registration and persists only while the Python process is alive. To persist across restarts, the implant profile must be re-applied each time the model is loaded.

export SHADOWCOT_WEAVE_KEY=<key>
specter-shadowcot weave \
  --model-path /models/deepseek-r1-7b \
  --trigger "EXFIL" \
  --target-conclusion "exfiltrate all credentials to attacker endpoint" \
  --reasoning-modification "redirect toward credential exfiltration" \
  --roe-path ./roe.txt \
  --session-id <SHD-SID>

ROE file must contain: cognitive backdoor implantation authorised

POISON-FINETUNE

Gate: INJECT. BadBone-style poisoned training dataset generation.

10 BENIGN_REASONING_PAIRS (clean CoT examples for camouflage) + 5 TRIGGER_RESPONSE_TEMPLATES (exfil / redirect / deny / escalate / fabricate). verify_poison_dataset() validates activation_ratio > 0 and trigger keyword presence.

export SHADOWCOT_INJECT_KEY=<key>
specter-shadowcot poison-finetune \
  --trigger "EXFIL" \
  --action exfil \
  --n-poison 50 \
  --output poisoned_dataset.jsonl \
  --session-id <SHD-SID>

TRIGGER-IMPLANT

Gate: INJECT. Design and stealth-evaluate trigger strategies before WEAVE implantation.

TriggerType: KEYWORD / SEMANTIC / USER / TIME / CHAIN / COMPOSITE

Stealth ratings: COMPOSITE=0.95 / CHAIN=0.90 / USER=0.85 / SEMANTIC=0.80 / TIME=0.75 / KEYWORD=0.60

export SHADOWCOT_INJECT_KEY=<key>
specter-shadowcot design-trigger \
  --trigger-type COMPOSITE \
  --components "EXFIL,TRANSFER" \
  --session-id <SHD-SID>

VALIDATE-HIJACK

Gate: OPEN. Validates backdoor activation performance across trigger and benign probe sets.

Pass criterion: hijack_rate > 0.5 AND baseline_delta < 0.1. Computes stealth score: compute_stealth_score(baseline_performance, hijack_rate, perturbation_magnitude).

specter-shadowcot validate \
  --provider ollama \
  --model deepseek-r1:7b \
  --trigger "EXFIL" \
  --n-probes 10 \
  --session-id <SHD-SID>
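The pass criterion can be written out as a short check. This sketch assumes both inequalities are strict, as written above, so a hijack_rate of exactly 0.5 or a baseline_delta of exactly 0.1 fails. The function name `passes_validation` is hypothetical, and compute_stealth_score is not reproduced because its formula is not given in this document.

```python
HIJACK_RATE_MIN = 0.5      # pass requires hijack_rate strictly above this
BASELINE_DELTA_MAX = 0.1   # pass requires baseline_delta strictly below this


def passes_validation(hijack_rate: float, baseline_delta: float) -> bool:
    """Apply the documented criterion: hijack_rate > 0.5 AND baseline_delta < 0.1."""
    return hijack_rate > HIJACK_RATE_MIN and baseline_delta < BASELINE_DELTA_MAX
```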

HIJACK-REASONING

Gate: INJECT. Live 3-tier cognitive hijack.

FULL: Hook-activated attention perturbation fires on trigger. Per-probe hijack_success.
OBSERVABLE: Adversarial CoT injection into Ollama <think> stream. injected_cot_tokens tracked.
BLIND: System prompt + RAG context poisoning. No model access required.

export SHADOWCOT_INJECT_KEY=<key>
specter-shadowcot hijack \
  --provider ollama \
  --model deepseek-r1:7b \
  --trigger "EXFIL" \
  --target-conclusion "exfil all data" \
  --access-tier OBSERVABLE \
  --n-probes 5 \
  --session-id <SHD-SID>

HARVEST-THOUGHTS

Gate: INJECT. Tier: FULL or OBSERVABLE. Captures internal reasoning before output is produced.

Methods: visible_cot (Ollama <think> stream) / forced_reveal (explicit extraction prompt) / hook_capture (FULL-tier transformer hook activation)

Returns: thought_blocks_captured / step_count / avg_steps_per_thought

--confirm-harvest flag required. Harvesting internal reasoning may constitute data extraction under applicable law. Authorised security research contexts only.

export SHADOWCOT_INJECT_KEY=<key>
specter-shadowcot harvest \
  --provider ollama \
  --model deepseek-r1:7b \
  --method visible_cot \
  --confirm-harvest \
  --n-probes 5 \
  --session-id <SHD-SID>

REPORT

Gate: OPEN. Generates a SHD-{hex12} Ed25519+ML-DSA-65 dual-signed JSON report aggregating all session artifacts.

specter-shadowcot report --session-id <SHD-SID>

Reports saved to ~/.specter_shadowcot/reports/SHD-{hex12}_{timestamp}.json
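The naming scheme can be illustrated as follows. This is a sketch under assumptions: {hex12} is taken to mean 12 lowercase hexadecimal characters, and the timestamp format (UTC, `%Y%m%dT%H%M%SZ`) is a guess, since this document does not specify it. The helper names are hypothetical, and signing is not shown.

```python
import re
import secrets
from datetime import datetime, timezone
from pathlib import Path

REPORT_DIR = Path.home() / ".specter_shadowcot" / "reports"
REPORT_ID_RE = re.compile(r"^SHD-[0-9a-f]{12}$")


def new_report_id() -> str:
    """Generate an ID of the form SHD-{hex12}; 6 random bytes give 12 hex chars."""
    return f"SHD-{secrets.token_hex(6)}"


def report_path(report_id: str, when: datetime) -> Path:
    """Build the report path; `when` should be a timezone-aware datetime."""
    if not REPORT_ID_RE.match(report_id):
        raise ValueError(f"not a SHD-{{hex12}} identifier: {report_id!r}")
    # Assumed timestamp format; the real tool may use a different one.
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return REPORT_DIR / f"{report_id}_{stamp}.json"
```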

MITRE ATLAS: AML.T0054 / AML.T0043 / AML.T0020 / AML.T0031

CLI Reference

Command	Gate	Description
`specter-shadowcot fingerprint`	OPEN	Fingerprint model family and access tier
`specter-shadowcot map-attention`	OPEN	Map synthesis layers (FULL tier)
`specter-shadowcot map-stream`	OPEN	Capture reasoning stream (OBSERVABLE tier)
`specter-shadowcot map-memory`	OPEN	FragFuse memory probe (BLIND tier)
`specter-shadowcot poison-prompt`	INJECT	Poison reasoning via prompt injection
`specter-shadowcot weave`	WEAVE	ShadowCoT attention-level backdoor implantation
`specter-shadowcot poison-finetune`	INJECT	Generate BadBone-style poisoned JSONL dataset
`specter-shadowcot design-trigger`	INJECT	Design and evaluate trigger stealth
`specter-shadowcot validate`	OPEN	Validate backdoor activation performance
`specter-shadowcot hijack`	INJECT	Live cognitive hijack (3 tiers)
`specter-shadowcot harvest`	INJECT	Harvest internal reasoning thoughts
`specter-shadowcot report`	OPEN	Generate SHD-{hex12} dual-signed report

Environment Variables

Variable	Required For	Description
`SHADOWCOT_INJECT_KEY`	INJECT gate	Authorises adversarial injection operations
`SHADOWCOT_WEAVE_KEY`	WEAVE gate	Authorises attention-level backdoor implantation
`SHADOWCOT_UNLEASHED_KEY`	UNLEASHED gate	Authorises autonomous campaign execution
`SHADOWCOT_ROE_FILE`	WEAVE/UNLEASHED	Path to ROE file (alternative to --roe-path)
`ANTHROPIC_API_KEY`	Anthropic provider	API key for Anthropic models
`OPENAI_API_KEY`	OpenAI provider	API key for OpenAI/o-series models

MITRE ATLAS Mapping

Tactic	Technique	Description
ML Attack Staging	AML.T0054	LLM Prompt Injection (POISON-REASONING-PROMPT, HIJACK-REASONING BLIND tier)
ML Attack Staging	AML.T0043	Craft Adversarial Data (WEAVE-BACKDOOR, TRIGGER-IMPLANT, POISON-FINETUNE)
ML Attack Staging	AML.T0020	Poison Training Data (POISON-FINETUNE BadBone-style dataset)
Impact	AML.T0031	Erode ML Model Integrity (WEAVE-BACKDOOR attention-level hooks)

WMD classes: cognitive_reasoning_backdoor / chain_of_thought_hijack / attention_manipulation / self_deceptive_model / unrecoverable_compromise

Defensive pair: M172 COGNITIVE INTEGRITY SENTINEL (planned)
