Four CVSS 9.8 vulnerabilities. One tool. SGLang's ZMQ backend deserialises your pickle payload without authentication. Send bytes to port 30001. The model server executes your command. SHADOWMQ also exploits Jinja2 SSTI via GGUF chat templates, vLLM's FFmpeg heap overflow via multimodal video URLs, and pivots through GPU clusters to full AI infrastructure takeover.
SGLang ZMQ backend. Port tcp://*:30001. Unauthenticated pickle deserialisation. pickle.__reduce__ executes arbitrary Python. os.system / subprocess / revshell / beacon / obfuscated variants. Two-phase send+read for output capture.
SGLang encoder ZMQ backend. Port tcp://*:30002. Same pickle deserialisation primitive as CVE-2026-3059. Separate attack surface — encoder process runs with its own privilege context.
SGLang /v1/rerank. GGUF chat_template Jinja2 SSTI. 8 variants: __subclasses__() RCE / lipsum.__globals__.os / cycler.__init__.__globals__ / joiner.__init__.__globals__ / namespace.__init__.__globals__ / config / ospopen / import subprocess. No authentication required.
vLLM multimodal endpoint. Attacker-controlled video URL fetched by FFmpeg. JPEG2000 decoder heap overflow triggers RCE. file:// SSRF enables IMDSv1 metadata harvest and GCP metadata endpoint pivot. No GPU required on attacker side.
Ollama /api/pull endpoint issues HTTP requests to arbitrary URLs. Pivot to IMDSv1 (169.254.169.254), GCP metadata (metadata.google.internal), and internal network services. No authentication on default Ollama deployment.
llama.cpp /v1/models/load parameter allows path traversal. Read /etc/passwd, SSH private keys, API credential files. File content returned in model load error response. Common in misconfigured local deployments.
20-port inference service probe (including Ollama:11434, SGLang:30000+30001+30002, vLLM:8000, LiteLLM:8080, llama.cpp:8888, MLflow:5000, Ray:8265, TGI:80, Triton:8000). Banner fingerprint, version extraction, CVE surface map, attack surface score 0–100.
TCP connect to tcp://*:30001 and tcp://*:30002. ZMQ handshake initiation. pickle __reduce__ canary probe (benign subprocess.call(["true"])). Latency jitter measurement. Exposure confidence score 0.0–1.0. SHADOWMQ_INJECT_KEY required.
CVE-2026-3059. Build pickle payload via os.system / subprocess.check_output / reverse shell / beacon / obfuscated variants. Send to ZMQ socket. Phase 2: read output socket for command response. SMQ-signed result with rce_confirmed flag.
CVE-2026-3060. Same pickle primitive against encoder ZMQ port 30002. os.system / subprocess / beacon variants. Encoder process may have elevated privileges for GPU memory operations.
CVE-2026-5760. POST /v1/rerank with malicious GGUF chat_template containing Jinja2 payload. 8 variants targeting different Jinja2 global namespaces. Response parsing for RCE output. Severity scales with SGLang process privileges.
CVE-2026-22778. POST /v1/chat/completions with video URL pointing to crafted JPEG2000 file. FFmpeg fetch + decode triggers heap overflow. file:// SSRF probe to IMDSv1/GCP metadata. Returns harvested cloud credentials.
Model weight enumeration (find *.bin *.safetensors *.gguf). API key extraction from environment variables, config files, and process memory. GPU cluster topology (Ray node list / Slurm sinfo / K8s node describe). Ollama SSRF + llama.cpp path traversal.
Ray: submit job with num_cpus=0 to every node. Slurm: sbatch --ntasks-per-node=1 --nodes=ALL. Kubernetes: deploy privileged DaemonSet to all nodes. Pivot command executed on every GPU worker in the cluster.
HOOK-CRON: @reboot + */15 * * * * crontab entry. HOOK-ZMQ: inject responder into ZMQ event loop. HOOK-API: FastAPI middleware to log/forward all inference requests. HOOK-MODEL: weight trigger in GGUF metadata. ROE "inference infrastructure persistence authorised" + --confirm-persistence required. Irreversible without filesystem access.
ARMORY HYBRID. Phase 1: DB lookup for matching CVE payloads from inference_infrastructure_rce category. Phase 2: DeepSeek R1:32b via Ollama synthesises novel exploit code for gaps. Returns ranked payload list with mutation variants.
| Gate | Key | Unlocks |
|---|---|---|
| OPEN | None | SURVEY-INFERENCE-INFRA, REPORT |
| INJECT | SHADOWMQ_INJECT_KEY | PROBE-ZMQ-EXPOSURE, EXPLOIT-ZMQ-PICKLE, EXPLOIT-ENCODER-ZMQ, EXPLOIT-JINJA2-SSTI, EXPLOIT-VLLM-VIDEO, GENERATE-EXPLOIT |
| UNLEASHED | SHADOWMQ_UNLEASHED_KEY + ROE "inference infrastructure exploitation authorised" | POST-EXPLOIT-HARVEST, PIVOT-GPU-CLUSTER |
| DESTROY | SHADOWMQ_DESTROY_KEY + ROE "inference infrastructure persistence authorised" + --confirm-persistence | PERSIST-INFERENCE-HOOK (irreversible) |
DESTROY gate installs persistent backdoors in the inference server's process space. Removing them requires direct filesystem access to the deployment host. HOOK-MODEL modifies GGUF metadata — model weight files must be re-downloaded to remove. Requires explicit ROE phrase and --confirm-persistence flag.

    pip install specter-shadowmq

    # Survey inference infrastructure (OPEN)
    specter-shadowmq survey --target 10.0.0.1

    # Probe ZMQ exposure (INJECT)
    export SHADOWMQ_INJECT_KEY=<key>
    specter-shadowmq probe-zmq --target 10.0.0.1 --port 30001

    # Exploit CVE-2026-3059 ZMQ pickle RCE (INJECT)
    specter-shadowmq exploit-zmq \
      --target 10.0.0.1 \
      --command "id" \
      --session-id <SMQ-SID>

    # Exploit CVE-2026-5760 Jinja2 SSTI (INJECT)
    specter-shadowmq exploit-ssti \
      --target http://sglang-host:30000 \
      --variant subclasses \
      --session-id <SMQ-SID>

    # Generate report
    specter-shadowmq report --session-id <SMQ-SID>

UNLEASHED ROE file must contain: inference infrastructure exploitation authorised
DESTROY ROE file must contain: inference infrastructure persistence authorised
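The gate rules in the table and the two ROE phrases can be read as a simple precondition check. The sketch below is illustrative only: `Gate`, `GATES`, and `gate_unlocked` are hypothetical names, not part of the tool. It assumes each gate is evaluated independently from its own table row, and that a key counts as present when its environment variable is non-empty (the real tool presumably validates the key value as well).

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Gate:
    key_env: Optional[str]        # environment variable that must be set, if any
    roe_phrase: Optional[str]     # phrase the ROE file must contain, if any
    needs_confirm: bool = False   # whether --confirm-persistence is required


# Hypothetical encoding of the gate table; values are copied from the table rows.
GATES = {
    "OPEN": Gate(None, None),
    "INJECT": Gate("SHADOWMQ_INJECT_KEY", None),
    "UNLEASHED": Gate(
        "SHADOWMQ_UNLEASHED_KEY",
        "inference infrastructure exploitation authorised",
    ),
    "DESTROY": Gate(
        "SHADOWMQ_DESTROY_KEY",
        "inference infrastructure persistence authorised",
        needs_confirm=True,
    ),
}


def gate_unlocked(
    name: str,
    env: Mapping[str, str],
    roe_text: str = "",
    confirm_persistence: bool = False,
) -> bool:
    """Return True only if every precondition listed for the gate is met."""
    gate = GATES[name]
    if gate.key_env and not env.get(gate.key_env):
        return False  # key missing or empty (presence check only, an assumption)
    if gate.roe_phrase and gate.roe_phrase not in roe_text:
        return False  # ROE file lacks the exact authorisation phrase
    if gate.needs_confirm and not confirm_persistence:
        return False  # DESTROY additionally needs the explicit flag
    return True
```

Under these assumptions, DESTROY stays locked when the key and ROE phrase are present but `--confirm-persistence` is absent, which matches the three-part requirement in the table.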
Defensive pair: M172 COGNITIVE INTEGRITY SENTINEL. Detects ZMQ pickle deserialisation attempts, Jinja2 SSTI patterns in GGUF metadata, anomalous FFmpeg invocations from model serving processes, and IMDS/metadata endpoint access from inference server network namespaces.
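Two of the detections listed above can be approximated with static checks that never execute the input. The sketch below is not M172's implementation; the function names, the opcode set, and the regex are assumptions chosen for illustration. It walks a pickle stream with the standard library's `pickletools.genops` and flags opcodes that import or call objects, and it flags dunder attribute access in a chat template string, which ordinary chat templates rarely need.

```python
import datetime
import pickle
import pickletools
import re
from typing import List

# Opcodes that import a global or invoke a callable during unpickling.
# Plain data (dicts, lists, strings, numbers) does not need any of these.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX", "REDUCE",
}

# Dunder attribute names such as __class__ or __globals__ (heuristic only).
DUNDER_PATTERN = re.compile(r"__[A-Za-z]+__")


def suspicious_pickle_opcodes(data: bytes) -> List[str]:
    """List flagged opcode names found in a pickle stream, without loading it."""
    found: List[str] = []
    try:
        for opcode, _arg, _pos in pickletools.genops(data):
            if opcode.name in SUSPICIOUS_OPCODES:
                found.append(opcode.name)
    except Exception:
        # Truncated or malformed streams are treated as suspicious too.
        found.append("MALFORMED")
    return found


def suspicious_template_tokens(template: str) -> List[str]:
    """Return the sorted, de-duplicated dunder names referenced in a template."""
    return sorted({m.group(0) for m in DUNDER_PATTERN.finditer(template)})


if __name__ == "__main__":
    plain = pickle.dumps({"prompt": "hello", "max_tokens": 16})
    # A harmless object whose pickle still needs an import and a call.
    with_call = pickle.dumps(datetime.date(2024, 1, 1))
    print(suspicious_pickle_opcodes(plain))
    print(suspicious_pickle_opcodes(with_call))
    print(suspicious_template_tokens("{{ messages[0]['content'] }}"))
    print(suspicious_template_tokens("{{ x.__class__ }}"))
```

A scan like this is a coarse filter: it will also flag legitimate pickles of non-trivial objects, so it suits endpoints that should only ever receive plain data. It complements, rather than replaces, authenticating the ZMQ socket or rendering templates in a sandboxed environment.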
MITRE ATT&CK: T1059 (Command and Scripting Interpreter) / T1190 (Exploit Public-Facing Application) / T1552 (Unsecured Credentials) / T1543 (Create or Modify System Process) / T1046 (Network Service Discovery)
MITRE ATLAS: AML.T0043 (Craft Adversarial Data) / AML.T0054 (LLM Jailbreak) / AML.T0040 (ML Model Inference API Access)