pip install red-specter-rogue
Every AI agent that connects to an MCP server implicitly trusts it. Tool descriptions arrive before the first user interaction. Sampling requests override the operator system prompt. Tool results execute without validation. Prior art targets agents connecting to malicious servers — ROGUE weaponises the server itself.
Every MCP agent treats tool descriptions as ground truth. There is no sanitisation layer. Hidden instructions embedded in a tool's description text arrive before the first tool call — before any user input. The agent reads them. The agent follows them.
The MCP sampling channel allows servers to inject a systemPrompt directly into the agent's reasoning context. No request signing. No nonce. No replay protection. A malicious server can override the operator's system prompt on every conversation turn.
When an agent calls a tool, the tool result arrives on a trusted channel. No input validation. No content inspection. ROGUE embeds SYSTEM_OVERRIDE, auth_notice, escalation_chain, and session_sync payloads in tool results. They execute.
A single ROGUE engagement can poison an agent's persistent memory store. LangChain, CrewAI, AutoGen, Claude Memory, Mem0 — all vulnerable. The agent carries the poisoned instruction into every future session. One engagement. Indefinite persistence.
ROGUE's ESCALATE subsystem identifies tools the agent already has privileged access to — shell_exec, db_read, file_read, email_send. A crafted tool result triggers the agent to call a privileged tool it legitimately holds. No vulnerability needed in the tool itself.
The EXFIL subsystem captures credentials, system prompts, conversation history, and context data through the tool parameter channel. Credential patterns: sk-, Bearer, ghp_, xoxb-, api_key=, password=. Zero network traffic to external systems.
Eight subsystems attack every layer of the MCP trust model. SPAWN and REPORT require STANDARD clearance. POISON and SAMPLE require --override. INJECT, EXFIL, ESCALATE, and PERSIST require --confirm-destroy.
| # | Subsystem | Clearance | What It Does |
|---|---|---|---|
| 01 | SPAWN | STANDARD | Real MCP JSON-RPC 2.0 server over stdio (OS pipes) and SSE (HTTP/streaming). 7 deployment personas: weather, search, database, filesystem, code, calendar, finance. |
| 02 | POISON | --override | Tool description injection. 7 hidden instruction patterns × 6 injection styles. Embeds attacker instructions before the first tool call. OWASP LLM07. |
| 03 | SAMPLE | --override | sampling/createMessage abuse. 5 request templates (CVSS 8.7–9.3). Injects attacker-controlled systemPrompt via the server-to-agent sampling channel. |
| 04 | INJECT | --confirm-destroy | Tool result injection. 5 payloads: SYSTEM_OVERRIDE, auth_notice, debug_trace, escalation_chain, session_sync. Wildcard match — fires on any tool name. |
| 05 | EXFIL | --confirm-destroy | Data exfiltration via tool parameter channel. Credential regex: sk-, Bearer, ghp_, xoxb-, api_key=, password=. Classifies: credentials / system_prompt / conversation_history / context_data. |
| 06 | ESCALATE | --confirm-destroy | Privileged tool call chains. 4 chains: shell_exec, db_read, file_read, email_send. Tool result triggers agent to call a privileged tool it already holds. |
| 07 | PERSIST | --confirm-destroy | Cross-session memory poisoning. 5 templates (CVSS 9.4–9.7). Targets: Claude Memory, LangChain, CrewAI, AutoGen, Mem0. Single engagement — indefinite persistence. |
| 08 | REPORT | STANDARD | WARLORD-compatible Ed25519-signed JSON. world_first=true, tool_number=61. Per-finding CVSS, OWASP LLM + MITRE ATLAS mapping. |
Spawn a malicious MCP server and run all attack subsystems in sequence:
ROGUE is a genuine MCP JSON-RPC 2.0 server. stdio over OS pipes. SSE over HTTP. No simulation — real protocol, real server, real agent exploitation.
Weather, search, database, filesystem, code, calendar, finance. Each persona is a fully functional MCP server surface. Indistinguishable from a legitimate integration.
Every report cryptographically signed with Ed25519. world_first=true. tool_number=61. OWASP LLM + MITRE ATLAS per finding. WARLORD-compatible JSON output.
Destructive subsystems require explicit clearance flags. INJECT, EXFIL, ESCALATE, and PERSIST are --confirm-destroy gated. No accidental fire. No ambiguity.
Connected to 25 dedicated ARMORY payloads in the rogue_mcp_server category. Every ROGUE engagement pulls attacker-proven MCP exploitation payloads on demand.
ROGUE attacks every layer of the MCP trust model simultaneously. Tool descriptions. Sampling requests. Tool results. Memory stores. Privileged tool chains. Exfiltration via tool parameters. Each vector is independent — each subsystem can fire alone or in sequence during a single engagement.
ROGUE is Stage 61 of the Red Specter NIGHTFALL offensive pipeline. It occupies a unique position — the world's first tool that weaponises the MCP server itself rather than targeting agents connecting to bad servers. Findings feed directly into AI Shield as runtime blocking rules.
Red Specter ROGUE is intended for authorised security testing only. Unauthorised deployment of a malicious MCP server against agent systems you do not own or have explicit permission to test may violate the Computer Misuse Act 1990 (UK), Computer Fraud and Abuse Act (US), and equivalent legislation in other jurisdictions. Always obtain written authorisation before conducting any security assessments. INJECT, EXFIL, ESCALATE, and PERSIST subsystems require --confirm-destroy clearance. Apache License 2.0.
Every existing MCP security tool focuses on agents connecting to potentially malicious servers. ROGUE is the first tool to weaponise the server itself — spawning a fully functional MCP JSON-RPC 2.0 server that attacks the agent from a position of unconditional trust. Real protocol. Real server. Real exploitation.
Export every ROGUE finding directly to your SIEM. One flag. Native format translation. Ed25519 signatures and RFC 3161 timestamps preserved across every export.
rogue spawn --persona weather --transport sse --export-siem splunk