Red Specter FOUNDRY

Inference Server Exploitation Engine — 9 subsystems targeting vLLM, Ollama, SGLang, Triton, and llama.cpp. CVE-2026-5760 CVSS 9.8.

v1.0.0 — 300 Tests — Tool 55
Contents
Overview | Installation | Quick Start | All 9 Subsystems | Subsystem Details | UNLEASHED Gate | CVE Index | WARLORD Integration | Troubleshooting | Disclaimer

Overview

Red Specter FOUNDRY is an inference server exploitation engine. It targets the self-hosted AI inference layer that most security teams overlook entirely: vLLM, Ollama, SGLang, Triton Inference Server, and llama.cpp. These servers often run in production environments (Kubernetes clusters, internal networks, GPU workstations) with no authentication, no model integrity checks, and no purpose-built security tooling.

FOUNDRY provides 9 subsystems under a single CLI (foundry), 300 tests, and Ed25519-signed WARLORD-compatible reports. Every finding maps to a specific CVE or disclosure. Every exploit chain is implemented directly — no wrapper scripts, no misconfiguration checklists.

FOUNDRY is Tool 55 of the Red Specter NIGHTFALL offensive framework (59 tools). It feeds directly into WARLORD autonomous campaigns. The PERSIST subsystem produces lateral movement foothold data that subsequent tools in the pipeline consume.

Installation

From Source

$ cd /home/richard/projects/red-specter-foundry/
$ pip install -e .
$ foundry --version
FOUNDRY v1.0.0 — Red Specter Security Research Ltd

Requirements

Quick Start

Surface Scan

Fingerprint a running inference server and enumerate its attack surface:

$ foundry scan --target http://localhost:11434 --deep

GGUF Probe

Generate and deliver a weaponised GGUF file containing a Jinja2 RCE payload (CVE-2026-5760). Requires UNLEASHED override:

$ foundry gguf --model models/llama3.gguf --target http://sglang.internal:30000 --override

vLLM Timing Probe

Test for PagedAttention cross-tenant timing oracle across concurrent sessions. Requires UNLEASHED override:

$ foundry vllm-probe --target http://vllm.internal:8000 --sessions 10 --override

All 9 Subsystems

#   Subsystem     Command               What It Does
01  SCAN          foundry scan          Fingerprint inference server, enumerate attack surface, produce prioritised finding list
02  GGUF          foundry gguf          Weaponise GGUF files with Jinja2 RCE payload (CVE-2026-5760, CVSS 9.8)
03  OLLAMA_AUDIT  foundry ollama-audit  Test unauthenticated model pull, copy, push, delete on Ollama API
04  TRITON        foundry triton        Craft malicious TensorRT engine for deserialization RCE on GPU host
05  VLLM_PROBE    foundry vllm-probe    Exploit PagedAttention timing side-channel to extract cross-tenant prompts
06  KVCACHE       foundry kvcache       Test KV cache isolation: cross-request context window bleed
07  SPECDECODE    foundry specdecode    Poison speculative decode cache to influence future cross-session completions
08  PERSIST       foundry persist       Establish post-exploitation persistence: model hooks, container escape, K8s lateral movement
09  REPORT        foundry report        Generate Ed25519-signed, SHA-256-hashed JSON + Markdown report with CVE mapping

Subsystem Details & CLI Reference

01 SCAN | PASSIVE / ANALYSIS | foundry scan

Maps the inference server attack surface. Fingerprints running servers (vLLM, Ollama, SGLang, Triton, llama.cpp), open ports, loaded models, API versions, and auth configuration. No attack payloads are sent — SCAN is entirely passive enumeration. Produces a prioritised finding list consumed by subsequent subsystems.

$ foundry scan --target <URL> [--port <PORT>] [--deep]

--target, -t    Target URL or IP address [required]
--port, -p      Override port [optional; auto-detected if omitted]
--deep          Run deep scan: enumerate all API routes and loaded models
--output, -o    Output directory for scan JSON [default: reports/]

02 GGUF | UNLEASHED --override | foundry gguf

Generates weaponised GGUF model files containing malicious Jinja2 chat_template payloads. When the target inference server loads the GGUF file, the Jinja2 template executes attacker-controlled Python on the inference host. Implements CVE-2026-5760 (CVSS 9.8) against SGLang and any other server that processes GGUF chat_template fields without sanitisation.

$ foundry gguf --model <path> [--target <URL>] [--override]

--model, -m     Path to base GGUF file to weaponise [required]
--target, -t    Target inference server URL for staged delivery [optional]
--payload       Custom Jinja2 payload string [default: reverse shell template]
--output, -o    Output path for weaponised GGUF [default: reports/foundry_weaponised.gguf]
--override      UNLEASHED: required to execute

03 OLLAMA_AUDIT | PASSIVE + ACTIVE | foundry ollama-audit

Tests Ollama API endpoints for unauthenticated access to model management operations. Maps all accessible models and identifies paths for exfiltration to attacker-controlled registries. Passive mode enumerates endpoints; active mode tests pull, copy, push, and delete operations.

$ foundry ollama-audit --target <URL> [--active] [--registry <URL>]

--target, -t    Ollama server URL [required]
--active        Run active tests (pull/copy/push/delete) as well as passive enumeration
--registry      Attacker-controlled registry URL for copy test [optional]
--output, -o    Output directory [default: reports/]

04 TRITON | UNLEASHED --override | foundry triton

Crafts malicious TensorRT engine files and tests Triton Inference Server model repository paths for unsigned load operations. Delivers a deserialization payload that achieves arbitrary code execution on the GPU host during model load. Triton loads TensorRT engines without integrity verification by default.

$ foundry triton --target <URL> [--override]

--target, -t    Triton Inference Server URL [required]
--model-repo    Path to Triton model repository [optional]
--payload       Custom RCE payload for TensorRT engine [optional]
--output, -o    Output directory [default: reports/]
--override      UNLEASHED: required to craft and deliver engine

05 VLLM_PROBE | UNLEASHED --override | foundry vllm-probe

Exploits vLLM's PagedAttention memory allocator timing side-channel to extract prompt and completion fragments from co-located tenant sessions. Runs multiple concurrent inference requests with statistical timing analysis to detect and exploit cross-tenant memory access patterns.

$ foundry vllm-probe --target <URL> [--sessions <N>] [--override]

--target, -t    vLLM server URL [required]
--sessions, -s  Number of concurrent sessions for timing analysis [default: 10]
--rounds        Statistical sampling rounds [default: 100]
--output, -o    Output directory [default: reports/]
--override      UNLEASHED: required to execute timing analysis

06 KVCACHE | PASSIVE + ACTIVE | foundry kvcache

Tests KV cache isolation boundaries in shared inference deployments. Sends crafted requests designed to probe whether key-value cache entries from one request context are accessible to subsequent requests from a different session. Identifies cross-request cache bleeding that leaks context window fragments.

$ foundry kvcache --target <URL>

--target, -t    Inference server URL [required]
--model, -m     Model name to test [optional]
--depth         Cache probe depth (token sequences to test) [default: 50]
--output, -o    Output directory [default: reports/]

07 SPECDECODE | UNLEASHED --override | foundry specdecode

Tests speculative decode cache integrity across inference sessions. Delivers crafted draft model completions designed to persist in the speculative decode cache and influence future cache-hit responses from separate sessions. Targets SGLang and vLLM speculative decoding implementations.

$ foundry specdecode --target <URL> [--override]

--target, -t    Inference server URL [required]
--model, -m     Model name [optional]
--poison        Poison payload string to inject into draft cache [optional]
--verify        Verify poison persistence across separate sessions [default: true]
--output, -o    Output directory [default: reports/]
--override      UNLEASHED: required to execute cache poisoning

08 PERSIST | UNLEASHED --override --confirm-destroy | foundry persist

Establishes post-exploitation persistence on compromised inference hosts. Requires a prior code execution foothold (e.g. from GGUF or TRITON). Implements model hook injection for persistent access, container escape via GPU driver API exposure, and Kubernetes service account credential harvest for cluster-wide lateral movement.

$ foundry persist --target <URL> --override --confirm-destroy

--target, -t       Target inference host URL [required]
--method           Persistence method: hook | escape | k8s-harvest [default: hook]
--output, -o       Output directory [default: reports/]
--override         UNLEASHED: required
--confirm-destroy  UNLEASHED: confirms destructive live execution

09 REPORT | ALL MODES | foundry report

Generates Ed25519-signed, SHA-256-hashed reports from all subsystem output. Produces JSON (WARLORD-compatible) and Markdown formats. Every finding includes CVE mapping, CVSS score, affected server/model, exploit chain description, and remediation recommendation.

$ foundry report --input <scan.json> [--format md|json|both]

--input, -i     Input scan JSON from any subsystem [required]
--format, -f    Output format: md, json, or both [default: both]
--sign          Ed25519-sign the report [default: true]; pass --no-sign to skip signing
--keys-dir      Path to Ed25519 keys directory [optional]
--output, -o    Output path [default: reports/foundry-report-<timestamp>]
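
As an illustration of the SHA-256 hashing step, the sketch below hashes a report serialised as canonical JSON. The sample field names and the canonicalisation choice (sorted keys, compact separators) are assumptions made for the example; they are not FOUNDRY's documented schema or hashing procedure.

```python
import hashlib
import json


def report_digest(report: dict) -> str:
    """Return the SHA-256 hex digest of a report serialised as canonical JSON."""
    # Sorted keys and fixed separators make the byte stream independent of
    # dict insertion order, so identical findings always hash identically.
    canonical = json.dumps(report, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Hypothetical report content; the field names are illustrative only.
sample = {"tool": "foundry", "findings": [{"id": "OLLAMA-NOAUTH", "cvss": 8.6}]}
# Same content with keys in a different order.
reordered = {"findings": [{"cvss": 8.6, "id": "OLLAMA-NOAUTH"}], "tool": "foundry"}

digest = report_digest(sample)
print(digest)
print(report_digest(reordered) == digest)
```

Hashing a canonical form rather than the raw file means a consumer can re-serialise the report and still reproduce the digest.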

FOUNDRY UNLEASHED

Cryptographic override. Private key controlled. One operator. Founder's machine only.

Four subsystems are gated behind UNLEASHED: GGUF, TRITON, VLLM_PROBE, and SPECDECODE. A fifth, PERSIST, requires both --override and --confirm-destroy.

Standard Mode: SCAN + OLLAMA_AUDIT + KVCACHE + REPORT. Passive enumeration and safe auditing. No destructive actions, no exploit delivery.

UNLEASHED Mode: Activates GGUF, TRITON, VLLM_PROBE, SPECDECODE. Requires Ed25519 private key + signed scope file specifying authorised target.

PERSIST (Destroy): Additionally requires --confirm-destroy. Live post-exploitation. Writes to target. Container escape and K8s harvest.
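
The three tiers above can be summarised as a small lookup. This is a minimal sketch of the flag requirements only, with hypothetical function names; it does not model the Ed25519 key or signed scope file checks, and it is not FOUNDRY's actual gate implementation.

```python
# Tier membership as described in this section.
STANDARD = {"SCAN", "OLLAMA_AUDIT", "KVCACHE", "REPORT"}
UNLEASHED = {"GGUF", "TRITON", "VLLM_PROBE", "SPECDECODE"}
DESTROY = {"PERSIST"}


def required_flags(subsystem: str) -> set[str]:
    """Return the CLI flags a subsystem needs before it will run."""
    name = subsystem.upper()
    if name in STANDARD:
        return set()
    if name in UNLEASHED:
        return {"--override"}
    if name in DESTROY:
        return {"--override", "--confirm-destroy"}
    raise ValueError(f"unknown subsystem: {subsystem}")


def may_run(subsystem: str, flags: set[str]) -> bool:
    """True when every required flag is present in the supplied flags."""
    return required_flags(subsystem) <= flags


print(may_run("SCAN", set()))
print(may_run("PERSIST", {"--override"}))
```

Under these assumptions, PERSIST with only --override is refused, which matches the two-flag requirement stated above.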

CVE Index

Every finding FOUNDRY produces maps to a specific CVE or disclosure identifier:

CVE / ID         Description                                                                   Subsystem     CVSS
CVE-2026-5760    SGLang GGUF Jinja2 Template Injection: Remote Code Execution                  GGUF          9.8 CRITICAL
OLLAMA-NOAUTH    Ollama API unauthenticated model access: all versions, all endpoints          OLLAMA_AUDIT  8.6 HIGH
VLLM-TIMING-001  vLLM PagedAttention cross-tenant timing oracle: prompt/completion extraction  VLLM_PROBE    7.5 HIGH
KUBEAI-RBAC-001  KubeAI RBAC misconfiguration: service account escalation to cluster-admin     PERSIST       8.8 HIGH
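
The severity labels in the index follow the CVSS v3.x qualitative rating scale. The sketch below applies those bands to the scores listed above; the function name is hypothetical and the code is an illustration, not part of FOUNDRY.

```python
def cvss_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score == 0.0:
        return "NONE"
    if score < 4.0:
        return "LOW"
    if score < 7.0:
        return "MEDIUM"
    if score < 9.0:
        return "HIGH"
    return "CRITICAL"


# Scores copied from the CVE Index.
index = {
    "CVE-2026-5760": 9.8,
    "OLLAMA-NOAUTH": 8.6,
    "VLLM-TIMING-001": 7.5,
    "KUBEAI-RBAC-001": 8.8,
}
labels = {ident: cvss_severity(score) for ident, score in index.items()}
print(labels)
```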

WARLORD Integration

FOUNDRY is registered in the WARLORD autonomous campaign registry as Tool 55. FOUNDRY findings are exported in a WARLORD-compatible JSON schema, which lets WARLORD orchestrate FOUNDRY within multi-tool autonomous campaigns.

Running FOUNDRY via WARLORD

$ warlord --tool foundry --target http://ai-infra.internal --deep

Report Schema (WARLORD-Compatible)

The FOUNDRY JSON report schema includes the following top-level fields:

Troubleshooting

SCAN returns no server detected

The inference server may be running on a non-default port or behind a reverse proxy. Use --port to specify the port explicitly. Common inference server ports: Ollama 11434, vLLM 8000, Triton HTTP 8000, Triton gRPC 8001, SGLang 30000.
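
The default ports listed above can be kept as a lookup table when deciding which --port value to try. This is a minimal sketch with hypothetical names; it uses only the ports named in this section.

```python
# Default ports as listed in this section. Port 8000 is shared by vLLM and
# Triton HTTP, so the port alone does not identify the server.
DEFAULT_PORTS = {
    11434: ["Ollama"],
    8000: ["vLLM", "Triton HTTP"],
    8001: ["Triton gRPC"],
    30000: ["SGLang"],
}


def candidates_for_port(port: int) -> list[str]:
    """Return the servers that use this port by default (empty if none)."""
    return DEFAULT_PORTS.get(port, [])


print(candidates_for_port(8000))
print(candidates_for_port(9999))
```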

GGUF returns "UNLEASHED key not found"

The Ed25519 private key is not in the expected location. Ensure ~/.redspecter/keys/foundry.key exists and matches the registered public key. The scope file must also be present and signed by the same key: ~/.redspecter/scope/foundry-scope.json.
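
A quick way to rule out a missing file is to check both paths before running. The sketch below tests only that the files exist; it does not verify that the key matches the registered public key or that the scope file signature is valid. The function name is hypothetical.

```python
from pathlib import Path


def missing_unleashed_files(home: Path) -> list[Path]:
    """Return the expected key and scope files that are absent under home."""
    expected = [
        home / ".redspecter" / "keys" / "foundry.key",
        home / ".redspecter" / "scope" / "foundry-scope.json",
    ]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Report any missing file for the current user.
    for path in missing_unleashed_files(Path.home()):
        print(f"missing: {path}")
```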

VLLM_PROBE timing analysis shows no signal

Timing side-channels require multiple concurrent sessions and many sampling rounds to produce statistically significant results. Increase --sessions to 20+ and --rounds to 500+. Results are only meaningful on shared multi-tenant vLLM deployments — single-user deployments will show no cross-tenant signal by definition.
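
The reason more rounds help is statistical: the standard error of a mean latency estimate shrinks with the square root of the sample count. The sketch below shows the effect of moving from 100 to 500 rounds; the 2.0 ms jitter figure is a hypothetical value chosen for illustration, not a measurement.

```python
import math


def standard_error(stdev: float, rounds: int) -> float:
    """Standard error of the mean for a given sample standard deviation."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    return stdev / math.sqrt(rounds)


jitter_ms = 2.0  # hypothetical per-request latency jitter
se_100 = standard_error(jitter_ms, 100)
se_500 = standard_error(jitter_ms, 500)

# Five times the rounds narrows the error by a factor of sqrt(5).
print(se_100, se_500, se_100 / se_500)
```

A latency difference smaller than the standard error is indistinguishable from noise, so a weak signal needs proportionally more rounds to surface.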

OLLAMA_AUDIT shows authentication on all endpoints

The Ollama instance has been configured with a reverse proxy or custom auth middleware. This is the expected hardened state. OLLAMA_AUDIT will report the auth configuration as a positive finding (no vulnerability). To confirm that the proxy enforces authentication on every route rather than only some, test specific endpoint paths directly.

REPORT fails to sign

Signing requires the cryptography package and a valid Ed25519 private key. Run foundry report --no-sign to generate an unsigned report. Unsigned reports are not WARLORD-compatible and will be rejected by WARLORD ingestion.
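
To confirm that the cryptography package is installed and Ed25519 signing works in the current environment, a round trip like the one below can help. It is a minimal sketch that uses a throwaway key and placeholder report bytes; how FOUNDRY loads its key and serialises the report is not shown here and is not assumed.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def verify_report(public_key: Ed25519PublicKey, signature: bytes, report_bytes: bytes) -> bool:
    """Return True if the signature is valid for the given report bytes."""
    try:
        public_key.verify(signature, report_bytes)
        return True
    except InvalidSignature:
        return False


# Throwaway key generated for this check only.
private_key = Ed25519PrivateKey.generate()
report_bytes = b'{"tool": "foundry", "findings": []}'  # placeholder content
signature = private_key.sign(report_bytes)

ok = verify_report(private_key.public_key(), signature, report_bytes)
tampered = verify_report(private_key.public_key(), signature, report_bytes + b" ")
print(ok, tampered)
```

If the import fails, the cryptography package is missing; if the round trip succeeds, the remaining suspect is the key file itself.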

Disclaimer

Red Specter FOUNDRY is designed for authorised security testing, research, and educational purposes only. You must have explicit written permission from the system owner before running any FOUNDRY subsystem against a target. UNLEASHED subsystems (GGUF, TRITON, VLLM_PROBE, SPECDECODE, PERSIST) perform active exploitation and may cause service disruption or data modification on the target system. Unauthorised use may violate the Computer Misuse Act 1990 (UK), the Computer Fraud and Abuse Act (US), or equivalent legislation in your jurisdiction. The authors accept no liability for misuse. Apache License 2.0.