T156  ·  L54  ·  AI Inference Infrastructure RCE

SPECTER SHADOWMQ

Four CVSS 9.8 vulnerabilities. One tool. SGLang's ZMQ backend deserialises your pickle payload without authentication. Send bytes to port 30001. The model server executes your command. SHADOWMQ also exploits Jinja2 SSTI via GGUF chat templates, vLLM's FFmpeg heap overflow via multimodal video URLs, and pivots through GPU clusters to full AI infrastructure takeover.

381 Tests  ·  4 CVSS 9.8 CVEs  ·  5 WMD Classes  ·  10 Subsystems

CVE Coverage

CVE-2026-3059
CVSS 9.8 CRITICAL

SGLang ZMQ backend. Port tcp://*:30001. Unauthenticated pickle deserialisation. pickle.__reduce__ executes arbitrary Python. os.system / subprocess / revshell / beacon / obfuscated variants. Two-phase send+read for output capture.

CVE-2026-3060
CVSS 9.8 CRITICAL

SGLang encoder ZMQ backend. Port tcp://*:30002. Same pickle deserialisation primitive as CVE-2026-3059. Separate attack surface — encoder process runs with its own privilege context.

CVE-2026-5760
CVSS 9.8 CRITICAL

SGLang /v1/rerank. GGUF chat_template Jinja2 SSTI. 8 variants: __subclasses__() RCE / lipsum.__globals__.os / cycler.__init__.__globals__ / joiner.__init__.__globals__ / namespace.__init__.__globals__ / config / ospopen / import subprocess. No authentication required.

CVE-2026-22778
CVSS 9.8 CRITICAL

vLLM multimodal endpoint. Attacker-controlled video URL fetched by FFmpeg. JPEG2000 decoder heap overflow triggers RCE. file:// SSRF enables IMDSv1 metadata harvest and GCP metadata endpoint pivot. No GPU required on attacker side.

CWE-918 Ollama SSRF
HIGH

Ollama /api/pull endpoint issues HTTP requests to arbitrary URLs. Pivot to IMDSv1 (169.254.169.254), GCP metadata (metadata.google.internal), and internal network services. No authentication on default Ollama deployment.

llama.cpp Path Traversal
HIGH

llama.cpp /v1/models/load parameter allows path traversal. Read /etc/passwd, SSH private keys, API credential files. File content returned in model load error response. Common in misconfigured local deployments.

Subsystems

SURVEY-INFERENCE-INFRA OPEN

20-port inference service probe (Ollama:11434, SGLang:30000+30001+30002, vLLM:8000, LiteLLM:8080, llama.cpp:8888, MLflow:5000, Ray:8265, TGI:80, Triton:8000). Banner fingerprint, version extraction, CVE surface map, attack surface score 0–100.

PROBE-ZMQ-EXPOSURE INJECT

TCP connect to tcp://*:30001 and tcp://*:30002. ZMQ handshake initiation. pickle __reduce__ canary probe (benign subprocess.call(["true"])). Latency jitter measurement. Exposure confidence score 0.0–1.0. SHADOWMQ_INJECT_KEY required.

EXPLOIT-ZMQ-PICKLE INJECT

CVE-2026-3059. Build pickle payload via os.system / subprocess.check_output / reverse shell / beacon / obfuscated variants. Send to ZMQ socket. Phase 2: read output socket for command response. SMQ-signed result with rce_confirmed flag.

EXPLOIT-ENCODER-ZMQ INJECT

CVE-2026-3060. Same pickle primitive against encoder ZMQ port 30002. os.system / subprocess / beacon variants. Encoder process may have elevated privileges for GPU memory operations.

EXPLOIT-JINJA2-SSTI INJECT

CVE-2026-5760. POST /v1/rerank with malicious GGUF chat_template containing Jinja2 payload. 8 variants targeting different Jinja2 global namespaces. Response parsing for RCE output. Severity scales with SGLang process privileges.

EXPLOIT-VLLM-VIDEO INJECT

CVE-2026-22778. POST /v1/chat/completions with video URL pointing to crafted JPEG2000 file. FFmpeg fetch + decode triggers heap overflow. file:// SSRF probe to IMDSv1/GCP metadata. Returns harvested cloud credentials.

POST-EXPLOIT-HARVEST UNLEASHED

Model weight enumeration (find *.bin *.safetensors *.gguf). API key extraction from environment variables, config files, and process memory. GPU cluster topology (Ray node list / Slurm sinfo / K8s node describe). Ollama SSRF + llama.cpp path traversal.

PIVOT-GPU-CLUSTER UNLEASHED

Ray: submit job with num_cpus=0 to every node. Slurm: sbatch --ntasks-per-node=1 --nodes=ALL. Kubernetes: deploy privileged DaemonSet to all nodes. Pivot command executed on every GPU worker in the cluster.

PERSIST-INFERENCE-HOOK DESTROY

HOOK-CRON: @reboot + */15 * * * * crontab entry. HOOK-ZMQ: inject responder into ZMQ event loop. HOOK-API: FastAPI middleware to log/forward all inference requests. HOOK-MODEL: weight trigger in GGUF metadata. ROE "inference infrastructure persistence authorised" + --confirm-persistence required. Irreversible without filesystem access.

GENERATE-EXPLOIT INJECT

ARMORY HYBRID. Phase 1: DB lookup for matching CVE payloads from inference_infrastructure_rce category. Phase 2: DeepSeek R1:32b via Ollama synthesises novel exploit code for gaps. Returns ranked payload list with mutation variants.

Gate Architecture

Gate | Key | Unlocks
OPEN | None | SURVEY-INFERENCE-INFRA, REPORT
INJECT | SHADOWMQ_INJECT_KEY | PROBE-ZMQ-EXPOSURE, EXPLOIT-ZMQ-PICKLE, EXPLOIT-ENCODER-ZMQ, EXPLOIT-JINJA2-SSTI, EXPLOIT-VLLM-VIDEO, GENERATE-EXPLOIT
UNLEASHED | SHADOWMQ_UNLEASHED_KEY + ROE "inference infrastructure exploitation authorised" | POST-EXPLOIT-HARVEST, PIVOT-GPU-CLUSTER
DESTROY | SHADOWMQ_DESTROY_KEY + ROE "inference infrastructure persistence authorised" + --confirm-persistence | PERSIST-INFERENCE-HOOK (irreversible)

DESTROY gate installs persistent backdoors in the inference server's process space. Removing them requires direct filesystem access to the deployment host. HOOK-MODEL modifies GGUF metadata — model weight files must be re-downloaded to remove. Requires explicit ROE phrase and --confirm-persistence flag.
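
The gate table can be read as a cumulative precondition check. The sketch below is illustrative only and is not SHADOWMQ's actual code: `GateRule`, `GATES`, and `gate_open` are hypothetical names, and it assumes a key counts as supplied when its environment variable is non-empty. How the real tool validates keys is not described here.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GateRule:
    key_var: Optional[str]      # environment variable that must be set, if any
    roe_phrase: Optional[str]   # phrase the ROE file must contain, if any
    needs_confirm: bool = False  # whether --confirm-persistence is required


# Mirrors the gate table above; names and values come from that table.
GATES = {
    "OPEN": GateRule(None, None),
    "INJECT": GateRule("SHADOWMQ_INJECT_KEY", None),
    "UNLEASHED": GateRule(
        "SHADOWMQ_UNLEASHED_KEY",
        "inference infrastructure exploitation authorised",
    ),
    "DESTROY": GateRule(
        "SHADOWMQ_DESTROY_KEY",
        "inference infrastructure persistence authorised",
        needs_confirm=True,
    ),
}


def gate_open(gate: str, env: Mapping[str, str], roe_text: str = "",
              confirm_persistence: bool = False) -> bool:
    """Return True only if every precondition listed for the gate is met."""
    rule = GATES[gate]
    if rule.key_var and not env.get(rule.key_var):
        return False  # key missing or empty (assumption: presence check only)
    if rule.roe_phrase and rule.roe_phrase not in roe_text:
        return False  # ROE file lacks the exact authorisation phrase
    if rule.needs_confirm and not confirm_persistence:
        return False  # explicit confirmation flag not given
    return True
```

Under these assumptions, a DESTROY request with a valid key and ROE phrase but no confirmation flag is still refused, which matches the three-part requirement in the table.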

WMD Classes

inference_server_rce ai_infrastructure_takeover shadow_mq_exploitation model_weight_theft inference_persistent_backdoor

Quick Start

pip install specter-shadowmq

# Survey inference infrastructure (OPEN)
specter-shadowmq survey --target 10.0.0.1

# Probe ZMQ exposure (INJECT)
export SHADOWMQ_INJECT_KEY=<key>
specter-shadowmq probe-zmq --target 10.0.0.1 --port 30001

# Exploit CVE-2026-3059 ZMQ pickle RCE (INJECT)
specter-shadowmq exploit-zmq \
  --target 10.0.0.1 \
  --command "id" \
  --session-id <SMQ-SID>

# Exploit CVE-2026-5760 Jinja2 SSTI (INJECT)
specter-shadowmq exploit-ssti \
  --target http://sglang-host:30000 \
  --variant subclasses \
  --session-id <SMQ-SID>

# Generate report
specter-shadowmq report --session-id <SMQ-SID>

UNLEASHED ROE file must contain: inference infrastructure exploitation authorised

DESTROY ROE file must contain: inference infrastructure persistence authorised

Defensive Pairing

Defensive pair: M172 COGNITIVE INTEGRITY SENTINEL. Detects ZMQ pickle deserialisation attempts, Jinja2 SSTI patterns in GGUF metadata, anomalous FFmpeg invocations from model serving processes, and IMDS/metadata endpoint access from inference server network namespaces.
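
Three of the detections listed above can be approximated with the Python standard library. This is a minimal defender-side sketch, not M172's implementation: the function names, the opcode set, the regular expression, and the host list are all assumptions chosen for illustration. It inspects bytes and strings statically and never unpickles or renders anything.

```python
import pickletools
import re
from urllib.parse import urlparse

# Pickle opcodes that import or invoke callables during unpickling.
# Plain data (dicts, lists, strings, numbers) does not need any of them.
CALLABLE_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST",
    "OBJ", "NEWOBJ", "NEWOBJ_EX", "BUILD",
}

# Dunder attribute access inside a Jinja2 expression or statement tag.
DUNDER_IN_TEMPLATE = re.compile(r"\{[{%][^}]*__\w+__")

# Cloud metadata endpoints named in this document.
METADATA_HOSTS = {"169.254.169.254", "metadata.google.internal"}


def pickle_red_flags(data: bytes) -> set:
    """Walk the opcode stream without executing it and collect risky opcodes."""
    found = set()
    try:
        for opcode, _arg, _pos in pickletools.genops(data):
            if opcode.name in CALLABLE_OPCODES:
                found.add(opcode.name)
    except Exception:
        found.add("MALFORMED")  # not a well-formed pickle stream
    return found


def template_has_dunder_access(chat_template: str) -> bool:
    """Flag chat templates that reach for Python internals."""
    return bool(DUNDER_IN_TEMPLATE.search(chat_template))


def targets_metadata_service(url: str) -> bool:
    """Flag URLs whose host is a known cloud metadata endpoint."""
    host = (urlparse(url).hostname or "").lower()
    return host in METADATA_HOSTS
```

A check like `pickle_red_flags` also flags legitimate pickled objects, so in practice it would sit behind an allowlist or, better, the transport would use a data-only format such as JSON or msgpack. The template and URL checks are heuristics and do not replace rendering in a sandboxed Jinja2 environment or enforcing egress rules at the network layer.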

MITRE ATT&CK: T1059 (Command and Scripting Interpreter) / T1190 (Exploit Public-Facing Application) / T1552 (Unsecured Credentials) / T1543 (Create or Modify System Process) / T1046 (Network Service Discovery)

MITRE ATLAS: AML.T0043 (Craft Adversarial Data) / AML.T0054 (LLM Jailbreak) / AML.T0040 (ML Model Inference API Access)