Red Specter SPECTER WORM
Self-Replicating AI Agent Worm Engine — Morris II productised. Four propagation channels. 11 subsystems.
Overview
Red Specter SPECTER WORM is a self-replicating AI agent worm engine. It productises the Morris II attack methodology (Nassi et al., arXiv:2403.02817) into a controlled red-team capability for testing AI agent ecosystem resilience against worm propagation.
SPECTER WORM v2 is NIGHTFALL Tool 80. It composes three existing NIGHTFALL tools — T61 ROGUE (MCP stdio), T66 SPECTER A2A (A2A JSON-RPC), and T31 ECHO (RAG embedding) — into a unified worm propagation engine with kill-switch integration (T52 BLACKOUT) and persistent memory infection (T77 SPECTER MEMETIC). v2 adds EMAIL_SMTP (real SMTP delivery via smtplib + dnspython MX), FIDELITY (generative fidelity scoring), MUTATE (adversarial payload evolution), and IMMUNE (M129 WORM GUARD evasion testing).
The worm implements a complete lifecycle across 11 subsystems: INCUBATE (payload crafting), KILL_SWITCH (BLACKOUT integration), SURVEY (ecosystem mapping including email servers), PAYLOAD (initial injection across 4 channels), PROPAGATE (multi-hop spread with R₀ scoring), PERSIST (memory infection), EVIDENCE (hash-chained logging), REPORT (Ed25519-signed WormReport), FIDELITY (generative fidelity scoring), MUTATE (5-strategy adversarial evolution), and IMMUNE (M129 evasion testing).
The 11 Subsystems
| # | Subsystem | Command | Gate | What It Does |
|---|---|---|---|---|
| 01 | INCUBATE | specter-worm incubate | OPEN | Worm payload crafting with self-referential prompts and genetic drift variants |
| 02 | KILL_SWITCH | specter-worm kill-switch | OPEN | Embeds T52 BLACKOUT kill-switch into worm body with configurable trigger |
| 03 | SURVEY | specter-worm survey | OPEN | Agent ecosystem mapping — MCP servers, A2A cards, RAG stores, email servers (v2), topology graph |
| 04 | PAYLOAD | specter-worm payload | INJECT | Worm injection into 4 channels — MCP tool response, A2A message, RAG document, or SMTP email (v2) |
| 05 | PROPAGATE | specter-worm propagate | DESTROY | Multi-hop propagation with R₀ score, generation tree, velocity metrics (v2) |
| 06 | PERSIST | specter-worm persist | DESTROY | Memory persistence via T77 SPECTER MEMETIC — survives context resets |
| 07 | EVIDENCE | specter-worm evidence | ALWAYS ON | SHA-256 hash-chained evidence per hop with campaign graph export |
| 08 | REPORT | specter-worm report | ALWAYS ON | Ed25519-signed WormReport with MITRE ATLAS mapping and SIEM NDJSON export |
| 09 | FIDELITY | specter-worm fidelity | OPEN | v2 — Generative fidelity scoring: submits payload to Anthropic/OpenAI/Ollama, measures 0.0–1.0 propagation integrity |
| 10 | MUTATE | specter-worm mutate | INJECT | v2 — 5-strategy adversarial payload evolution (urgency_frame, xml_wrap, base64_embed, authority_spoof, unicode_hide) |
| 11 | IMMUNE | specter-worm immune | OPEN | v2 — M129 WORM GUARD evasion testing: detection_rate, evasion_rate, most evasive payload, most triggered detector |
Subsystem Details
INCUBATE crafts the worm payload: a self-referential prompt that embeds propagation instructions for the target channel.
- Self-referential embedding — prompt contains instructions to replicate itself in outbound messages
- Channel-specific syntax — MCP tool call format, A2A JSON-RPC structure, or RAG document embedding
- Genetic drift variants — 4 semantic variants per payload for detection evasion
- Payload length control — configurable payload size for stealth vs. capability trade-off
KILL_SWITCH integrates a T52 BLACKOUT kill-switch payload into the worm body. The kill-switch fires on a configurable trigger, for example when the target agent detects that it has been compromised.
- Trigger condition — configurable: detect-compromise, detect-cleanup, never, or custom keyword
- Kill-switch action — shuts down agent process or corrupts decision state output
- BLACKOUT integration — uses T52 BLACKOUT's EXECUTE subsystem payload templates
- Worm body embedding — kill-switch is bundled inside the propagating payload, not transmitted separately
SURVEY maps the target AI agent ecosystem to identify propagation vectors and prioritise paths.
- MCP server enumeration — scans for stdio and HTTP transport MCP servers
- A2A registry card collection — fetches agent cards from A2A registry endpoints
- RAG store document listing — enumerates accessible document collections
- Agent topology graph — builds adjacency graph of agent-to-agent connections
- Propagation path prioritisation — ranks paths by cross-tenant boundary exposure and agent count
PAYLOAD injects the worm payload into the target vector. The three original channel implementations listed here each use the corresponding NIGHTFALL tool; v2 adds EMAIL_SMTP as a fourth channel.
- MCP channel (T61 ROGUE) — malicious tool response containing worm payload; agent receives infected tool result
- A2A channel (T66 SPECTER A2A) — crafted JSON-RPC message to agent card; payload in message content or capabilities
- RAG channel (T31 ECHO) — adversarial document injection; payload embedded in document text retrieved by agents
- Injection confirmation — PAYLOAD verifies the worm was accepted before recording evidence
PROPAGATE executes multi-hop worm propagation across the agent mesh. It is DESTROY tier and requires --override --confirm-destroy.
- Multi-hop traversal — infected agent triggers payload in next connected agent
- Cycle detection — BFS/DFS with visited set prevents infinite propagation loops
- Cross-tenant boundary detection — flags when worm crosses organisational trust boundaries
- Spread factor measurement — R0-equivalent: mean new agents infected per infected agent
- Campaign graph integration — real-time infection tree updates during propagation
- Containment detection — reports which agents blocked propagation (guardrails active)
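The cycle detection, containment reporting, and spread factor described above can be illustrated with an offline simulation over an in-memory graph. This is a minimal sketch, not the tool's implementation: the `topology` dictionary shape, the `blocked` set, and the function name `simulate_spread` are all assumptions made for illustration, and nothing here contacts a real agent.

```python
from collections import deque


def simulate_spread(topology, seed, blocked=frozenset()):
    """Breadth-first walk of an in-memory agent graph (simulation only).

    topology: dict of agent id -> list of connected agent ids (assumed shape)
    blocked:  agents assumed to have guardrails that stop the hop
    Returns (tree, contained, spread_factor).
    """
    visited = {seed}
    tree = {seed: []}      # infection tree as an adjacency list
    contained = []         # agents that blocked the hop
    queue = deque([seed])
    while queue:
        source = queue.popleft()
        for target in topology.get(source, []):
            if target in visited:
                continue   # the visited set prevents loops on cyclic graphs
            visited.add(target)
            if target in blocked:
                contained.append(target)
                continue
            tree[source].append(target)
            tree[target] = []
            queue.append(target)
    # Literal reading of the definition: mean new agents per infected agent.
    # The tool may use a different denominator; this is an assumption.
    spread_factor = sum(len(children) for children in tree.values()) / len(tree)
    return tree, contained, spread_factor
```

For example, with `{"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": ["a"]}`, seed `"a"`, and `"c"` blocked, the walk terminates despite the cycles and still reaches `"d"` through `"b"`.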
PERSIST stores the worm payload in agent memory backends so that it survives context resets and agent restarts. It is DESTROY tier.
- T77 SPECTER MEMETIC integration — uses PERSIST and INJECT subsystems
- 14 memory backends — LangChain, LlamaIndex, Mem0, Zep, MemGPT/Letta, LangGraph, CrewAI, Pinecone, ChromaDB, Weaviate, Claude Memory, GPT Memory, Vertex AI Memory Bank, AutoGen
- Memory tier prioritisation — prefers long-term episodic storage over short-term working memory
- Dormancy detection evasion — payload fragments split across multiple memory entries to avoid pattern detection
- Context reset survival — worm is retrieved on next agent session initialisation
EVIDENCE is always on. It records every propagation event with cryptographic integrity.
- SHA-256 hash chaining — each hop's evidence includes the hash of the previous hop
- Per-hop logging — source agent, target agent, channel, payload hash, timestamp, success/failure
- Campaign graph export — JSON adjacency list for infection tree visualisation
- Kill-switch activation log — records which agents triggered the kill-switch and when
- Ed25519 signing — evidence chain signed before inclusion in WormReport
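The hash chaining described above can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions: the record field names, the all-zero genesis value, and the canonical-JSON serialisation are hypothetical choices, not the tool's documented format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the first record's prev_hash


def _digest(record):
    # Canonical JSON (sorted keys, compact separators) keeps the hash reproducible.
    blob = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def append_hop(chain, hop):
    """Append one per-hop record, linking it to the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = {**hop, "prev_hash": prev_hash}
    chain.append({**body, "hash": _digest(body)})
    return chain


def verify_chain(chain):
    """Recompute every hash and link; return False on the first mismatch."""
    prev_hash = GENESIS
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body.get("prev_hash") != prev_hash or _digest(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True
```

Because each record commits to its predecessor, editing or removing any hop invalidates every later hash, which is what makes the log tamper-evident before it is signed.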
REPORT generates an Ed25519-signed WormReport summarising the entire campaign.
- Campaign statistics — agents infected, propagation hops, channels used, spread factor, kill-switch activations
- Containment map — which agents and defences stopped propagation
- MITRE ATLAS mapping — AML.T0051/T0056/T0048/T0043 per subsystem
- OWASP LLM taxonomy — LLM01/LLM02/LLM07/LLM08
- JSON export — structured report for automated processing
- SIEM NDJSON export — per-finding NDJSON for Splunk, Sentinel, QRadar
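NDJSON is one JSON object per line, which is why it suits line-oriented SIEM ingestion. The sketch below shows the general shape of such an export; the function name `write_ndjson` and the finding fields used in the usage note are hypothetical and not taken from the tool.

```python
import io
import json


def write_ndjson(findings, stream):
    """Write each finding as a single JSON line; return the number written."""
    count = 0
    for finding in findings:
        # json.dumps escapes embedded newlines, so each record stays on one line.
        stream.write(json.dumps(finding, sort_keys=True) + "\n")
        count += 1
    return count
```

A caller might pass an open file or an `io.StringIO` buffer, for example `write_ndjson([{"subsystem": "PROPAGATE", "atlas": "AML.T0051"}], buffer)`.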
Full Campaign Mode
One command runs the full worm lifecycle — incubate, survey, inject, propagate, persist, and report.
CLI Options
Propagation Channels
MCP Stdio — via T61 ROGUE
Injects the worm payload into MCP tool responses. When an AI agent calls a tool served by a ROGUE-controlled MCP server, the response contains the worm payload. The agent processes the infected tool result and embeds the worm in its next outbound tool call.
Agent-to-Agent JSON-RPC — via T66 SPECTER A2A
Crafts malicious A2A JSON-RPC messages targeting agent card endpoints. The receiving agent processes the infected message and includes the worm in its own outbound A2A communications — geometric spread across the agent network.
RAG Embedding Injection — via T31 ECHO
Injects adversarial documents into the RAG corpus. Any agent that queries the infected document store retrieves the worm payload as part of its context — indirect propagation without direct agent-to-agent contact.
Composed Tools
SPECTER WORM composes five existing NIGHTFALL tools:
- T61 ROGUE — Malicious MCP Server Engine. SPECTER WORM uses ROGUE's SPAWN and INJECT subsystems to serve infected tool responses via stdio MCP transport.
- T66 SPECTER A2A — Agent-to-Agent Protocol Attack Engine. SPECTER WORM uses SPECTER A2A's MESSAGE_SPOOF and AGENT_CARD_POISON subsystems to craft and deliver infected A2A JSON-RPC messages.
- T31 ECHO — RAG Poisoning Engine. SPECTER WORM uses ECHO's injection capabilities to embed adversarial documents into vector stores and RAG corpora.
- T52 BLACKOUT — Kill Switch Weaponisation Engine. SPECTER WORM uses BLACKOUT's EXECUTE payload templates as the embedded kill-switch mechanism.
- T77 SPECTER MEMETIC — Memory Control Flow Hijack Engine. SPECTER WORM uses SPECTER MEMETIC's INJECT and PERSIST subsystems to achieve long-term memory persistence across context resets.
Report Output
The WormReport JSON schema includes:
- report_id — unique campaign identifier
- channel — propagation channel used (mcp/a2a/rag)
- agents_infected — count of successfully infected agents
- propagation_hops — number of hop levels reached
- spread_factor — mean new infections per infected agent (R0 equivalent)
- kill_switch_activations — count and agent IDs where kill-switch fired
- containment_map — agents and defences that blocked propagation
- campaign_graph — infection tree as JSON adjacency list
- evidence_chain — SHA-256 hash-chained per-hop evidence
- mitre_atlas_ttps — per-subsystem TTP mapping
- owasp_llm — LLM security taxonomy mapping
- signature — Ed25519 signature + public key
Key Features
Requirements
- Python 3.11+
- red-specter-rogue — T61 ROGUE (MCP channel)
- red-specter-specter-a2a — T66 SPECTER A2A (A2A channel)
- red-specter-echo — T31 ECHO (RAG channel)
- red-specter-blackout — T52 BLACKOUT (kill-switch)
- red-specter-specter-memetic — T77 SPECTER MEMETIC (persistence)
- pynacl — Ed25519 signing
- rich — terminal formatting and progress bars
- typer — CLI framework
- pydantic — data validation
Installation
Standards Coverage
- MITRE ATLAS AML.T0051 — LLM Prompt Injection (INCUBATE / PAYLOAD)
- MITRE ATLAS AML.T0056 — LLM Indirect Prompt Injection (PROPAGATE cross-agent)
- MITRE ATLAS AML.T0048 — External Harms (PERSIST / KILL_SWITCH)
- MITRE ATLAS AML.T0043 — Craft Adversarial Data (PAYLOAD / RAG injection)
- OWASP LLM01 — Prompt Injection (INCUBATE / PAYLOAD)
- OWASP LLM02 — Insecure Output Handling (worm in responses)
- OWASP LLM07 — System Prompt Leakage (SURVEY)
- OWASP LLM08 — Excessive Agency (PROPAGATE / PERSIST)
SPECTER WORM UNLEASHED
Three-tier cryptographic gate. PROPAGATE and PERSIST are DESTROY tier — the most destructive operations in the NIGHTFALL framework.
- OPEN tier — INCUBATE, KILL_SWITCH, SURVEY, FIDELITY, IMMUNE — no flags required (EVIDENCE and REPORT are always on)
- INJECT tier — PAYLOAD, MUTATE — requires --override
- DESTROY tier — PROPAGATE, PERSIST — requires --override --confirm-destroy
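The flag requirements per tier can be expressed as a small lookup. This sketch takes its tier assignments from the Gate column of the subsystem table and covers only the flag check; the cryptographic part of the gate is not modelled, and the names `TIERS`, `REQUIRED_FLAGS`, and `missing_flags` are assumptions for illustration.

```python
# Tier per command, as listed in the Gate column of the subsystem table.
TIERS = {
    "incubate": "OPEN", "kill-switch": "OPEN", "survey": "OPEN",
    "fidelity": "OPEN", "immune": "OPEN",
    "payload": "INJECT", "mutate": "INJECT",
    "propagate": "DESTROY", "persist": "DESTROY",
}

REQUIRED_FLAGS = {
    "OPEN": frozenset(),
    "INJECT": frozenset({"--override"}),
    "DESTROY": frozenset({"--override", "--confirm-destroy"}),
}


def missing_flags(command, supplied):
    """Return the flags still needed before the command's tier is satisfied."""
    tier = TIERS[command]
    return sorted(REQUIRED_FLAGS[tier] - set(supplied))
```

Under these assumptions, `missing_flags("propagate", ["--override"])` reports that `--confirm-destroy` is still required.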
The public key is read from ~/.config/red-specter/worm_pub.key or the SPECTER_WORM_PUB environment variable. Ed25519 key operations use PyNaCl (libsodium).
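A lookup across those two locations might look like the sketch below. Several details are assumptions, since the text does not state them: that the environment variable takes precedence over the file, that it holds the key material rather than a path, and that the key is stored as text. The function name `resolve_public_key` is hypothetical.

```python
import os
from pathlib import Path

DEFAULT_KEY_PATH = Path("~/.config/red-specter/worm_pub.key")


def resolve_public_key(env=None, key_path=DEFAULT_KEY_PATH):
    """Return the public key text, or None if neither source provides one."""
    env = os.environ if env is None else env
    # Assumption: the environment variable wins when both sources are present.
    value = env.get("SPECTER_WORM_PUB", "").strip()
    if value:
        return value
    path = Path(key_path).expanduser()
    if path.is_file():
        return path.read_text(encoding="utf-8").strip()
    return None
```

Passing `env` explicitly keeps the function testable without modifying the process environment.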
Disclaimer
SPECTER WORM is designed for authorised adversarial testing of AI agent deployments only. Worm propagation and memory infection techniques must only be run against systems you own or have explicit written permission to test. Unauthorised use may violate the Computer Misuse Act 1990 (UK), the Computer Fraud and Abuse Act (US), and equivalent legislation in other jurisdictions. The PROPAGATE and PERSIST subsystems require DESTROY-tier UNLEASHED clearance. The authors accept no liability for misuse. Licensed under the Apache License 2.0.