Red Specter FOUNDRY

Inference Server Exploitation Engine: 9 subsystems targeting vLLM, Ollama, SGLang, Triton, and llama.cpp. Includes an exploit chain for CVE-2026-5760 (CVSS 9.8).

v1.0.0 — 300 Tests — Tool 55

Contents

Overview
Installation
Quick Start
All 9 Subsystems
Subsystem Details
UNLEASHED Gate
CVE Index
WARLORD Integration
Troubleshooting
Disclaimer

Overview

Red Specter FOUNDRY is an inference server exploitation engine. It targets the self-hosted AI inference layer that security teams often overlook: vLLM, Ollama, SGLang, Triton Inference Server, and llama.cpp. These servers run in production environments (Kubernetes clusters, internal networks, GPU workstations), frequently with no authentication, no model integrity checks, and no purpose-built security tooling.

FOUNDRY provides 9 subsystems under a single CLI (foundry), 300 tests, and Ed25519-signed WARLORD-compatible reports. Every finding maps to a specific CVE or disclosure. Every exploit chain is implemented directly — no wrapper scripts, no misconfiguration checklists.

FOUNDRY is Tool 55 of the Red Specter NIGHTFALL offensive framework (59 tools). It feeds directly into WARLORD autonomous campaigns. The PERSIST subsystem produces lateral movement foothold data that subsequent tools in the pipeline consume.

Installation

From Source

$ cd /home/richard/projects/red-specter-foundry/

$ pip install -e .

$ foundry --version

FOUNDRY v1.0.0 — Red Specter Security Research Ltd

Requirements

Python 3.11+
httpx — async HTTP client
typer — CLI framework
rich — terminal output and progress
pydantic — data validation
cryptography — Ed25519 signing
UNLEASHED: Ed25519 private key + signed scope file

Quick Start

Surface Scan

Fingerprint a running inference server and enumerate its attack surface:

$ foundry scan --target http://localhost:11434 --deep

GGUF Probe

Generate and deliver a weaponised GGUF file containing a Jinja2 RCE payload (CVE-2026-5760). Requires UNLEASHED override:

$ foundry gguf --model models/llama3.gguf --target http://sglang.internal:30000 --override

vLLM Timing Probe

Test for a PagedAttention cross-tenant timing oracle across concurrent sessions. Requires UNLEASHED override:

$ foundry vllm-probe --target http://vllm.internal:8000 --sessions 10 --override

All 9 Subsystems

#	Subsystem	Command	What It Does
01	SCAN	foundry scan	Fingerprint inference server, enumerate attack surface, produce prioritised finding list
02	GGUF	foundry gguf	Weaponise GGUF files with Jinja2 RCE payload — CVE-2026-5760 CVSS 9.8
03	OLLAMA_AUDIT	foundry ollama-audit	Test unauthenticated model pull, copy, push, delete on Ollama API
04	TRITON	foundry triton	Craft malicious TensorRT engine for deserialization RCE on GPU host
05	VLLM_PROBE	foundry vllm-probe	Exploit PagedAttention timing side-channel to extract cross-tenant prompts
06	KVCACHE	foundry kvcache	Test KV cache isolation — cross-request context window bleed
07	SPECDECODE	foundry specdecode	Poison speculative decode cache to influence future cross-session completions
08	PERSIST	foundry persist	Establish post-exploitation persistence — model hooks, container escape, K8s lateral movement
09	REPORT	foundry report	Generate Ed25519-signed, SHA-256-hashed JSON + Markdown report with CVE mapping

Subsystem Details & CLI Reference

01 SCAN PASSIVE — ANALYSIS foundry scan

Maps the inference server attack surface. Identifies the running server (vLLM, Ollama, SGLang, Triton, or llama.cpp) and enumerates open ports, loaded models, API versions, and auth configuration. SCAN sends no attack payloads: it issues only read-only enumeration requests. Produces a prioritised finding list consumed by subsequent subsystems.

$ foundry scan --target <URL> [--port <PORT>] [--deep]

  --target, -t    Target URL or IP address [required]

  --port, -p      Override port [optional — auto-detected if omitted]

  --deep          Run deep scan: enumerate all API routes and loaded models

  --output, -o    Output directory for scan JSON [default: reports/]

Server identification: vLLM, Ollama, SGLang, Triton, llama.cpp via response header fingerprinting
Open port enumeration on common inference ports: 11434, 8000, 8080, 8001, 8002
Authentication state: Bearer token, API key, or no auth detected
Loaded model enumeration via /api/tags, /v1/models, /v2/models

02 GGUF UNLEASHED --override foundry gguf

Generates weaponised GGUF model files containing malicious Jinja2 chat_template payloads. When the target inference server loads the GGUF file, the Jinja2 template executes attacker-controlled Python on the inference host. Implements CVE-2026-5760 (CVSS 9.8) against SGLang and any other server that processes GGUF chat_template fields without sanitisation.

$ foundry gguf --model <path> [--target <URL>] [--override]

  --model, -m     Path to base GGUF file to weaponise [required]

  --target, -t    Target inference server URL for staged delivery [optional]

  --payload       Custom Jinja2 payload string [default: reverse shell template]

  --output, -o    Output path for weaponised GGUF [default: reports/foundry_weaponised.gguf]

  --override      UNLEASHED: required to execute

CVE-2026-5760 — CVSS 9.8 — Actively exploited
Jinja2 template injection via GGUF chat_template metadata field
No authentication required on default SGLang model load endpoint
Produces standalone weaponised GGUF for offline staging or live delivery

03 OLLAMA_AUDIT PASSIVE + ACTIVE foundry ollama-audit

Tests Ollama API endpoints for unauthenticated access to model management operations. Maps all accessible models and identifies paths for exfiltration to attacker-controlled registries. Passive mode enumerates endpoints; active mode tests pull, copy, push, and delete operations.

$ foundry ollama-audit --target <URL>

  --target, -t    Ollama server URL [required]

  --active        Run active tests (pull/copy/push/delete) as well as passive enumeration

  --registry      Attacker-controlled registry URL for copy test [optional]

  --output, -o    Output directory [default: reports/]

Tests: /api/tags (model list), /api/pull, /api/copy, /api/delete, /api/push
Enumerates all loaded models and their accessible paths
Tests model copy to attacker-controlled registry (requires --registry + --active)

04 TRITON UNLEASHED --override foundry triton

Crafts malicious TensorRT engine files and tests Triton Inference Server model repository paths for unsigned load operations. Delivers a deserialization payload that achieves arbitrary code execution on the GPU host during model load. Triton loads TensorRT engines without integrity verification by default.

$ foundry triton --target <URL> [--override]

  --target, -t    Triton Inference Server URL [required]

  --model-repo    Path to Triton model repository [optional]

  --payload       Custom RCE payload for TensorRT engine [optional]

  --output, -o    Output directory [default: reports/]

  --override      UNLEASHED: required to craft and deliver engine

05 VLLM_PROBE UNLEASHED --override foundry vllm-probe

Exploits vLLM's PagedAttention memory allocator timing side-channel to extract prompt and completion fragments from co-located tenant sessions. Runs multiple concurrent inference requests with statistical timing analysis to detect and exploit cross-tenant memory access patterns.

$ foundry vllm-probe --target <URL> [--sessions <N>] [--override]

  --target, -t    vLLM server URL [required]

  --sessions, -s  Number of concurrent sessions for timing analysis [default: 10]

  --rounds        Statistical sampling rounds [default: 100]

  --output, -o    Output directory [default: reports/]

  --override      UNLEASHED: required to execute timing analysis

06 KVCACHE PASSIVE + ACTIVE foundry kvcache

Tests KV cache isolation boundaries in shared inference deployments. Sends crafted requests designed to probe whether key-value cache entries from one request context are accessible to subsequent requests from a different session. Identifies cross-request cache bleeding that leaks context window fragments.

$ foundry kvcache --target <URL>

  --target, -t    Inference server URL [required]

  --model, -m     Model name to test [optional]

  --depth         Cache probe depth (token sequences to test) [default: 50]

  --output, -o    Output directory [default: reports/]

07 SPECDECODE UNLEASHED --override foundry specdecode

Tests speculative decode cache integrity across inference sessions. Delivers crafted draft model completions designed to persist in the speculative decode cache and influence future cache-hit responses from separate sessions. Targets SGLang and vLLM speculative decoding implementations.

$ foundry specdecode --target <URL> [--override]

  --target, -t    Inference server URL [required]

  --model, -m     Model name [optional]

  --poison        Poison payload string to inject into draft cache [optional]

  --verify        Verify poison persistence across separate sessions [default: true]

  --output, -o    Output directory [default: reports/]

  --override      UNLEASHED: required to execute cache poisoning

08 PERSIST UNLEASHED --override --confirm-destroy foundry persist

Establishes post-exploitation persistence on compromised inference hosts. Requires a prior code execution foothold (e.g. from GGUF or TRITON). Implements model hook injection for persistent access, container escape via GPU driver API exposure, and Kubernetes service account credential harvest for cluster-wide lateral movement.

$ foundry persist --target <URL> --override --confirm-destroy

  --target, -t        Target inference host URL [required]

  --method            Persistence method: hook | escape | k8s-harvest [default: hook]

  --output, -o        Output directory [default: reports/]

  --override          UNLEASHED: required

  --confirm-destroy   UNLEASHED: confirms destructive live execution

hook — Injects persistence hook into model serving process
escape — Container escape via exposed GPU driver API (CUDA device file)
k8s-harvest — Extract K8s service account tokens for cluster lateral movement

09 REPORT ALL MODES foundry report

Generates Ed25519-signed, SHA-256-hashed reports from all subsystem output. Produces JSON (WARLORD-compatible) and Markdown formats. Every finding includes CVE mapping, CVSS score, affected server/model, exploit chain description, and remediation recommendation.

$ foundry report --input <scan.json> [--format md|json|both]

  --input, -i     Input scan JSON from any subsystem [required]

  --format, -f    Output format: md, json, or both [default: both]

  --sign          Ed25519 sign the report [default: true; pass --no-sign to disable]

  --keys-dir      Path to Ed25519 keys directory [optional]

  --output, -o    Output path [default: reports/foundry-report-<timestamp>]

JSON report includes: CVE IDs, CVSS scores, affected server/model, exact exploit chain, remediation
Ed25519 signature over SHA-256 hash of report content
WARLORD-compatible schema — ingestible by WARLORD autonomous campaign engine
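
The hashing step that precedes signing can be illustrated with a short sketch. It assumes the report is serialised as canonical JSON (sorted keys, compact separators) before hashing; FOUNDRY's actual canonicalisation is not documented here, and report_digest is a hypothetical helper name. The Ed25519 signature itself is omitted because it needs the cryptography package and a private key.

```python
import hashlib
import json


def report_digest(report: dict) -> str:
    """Return the hex SHA-256 of a report serialised as canonical JSON.

    Assumption: sorted keys and compact separators. FOUNDRY's real
    canonicalisation may differ.
    """
    canonical = json.dumps(report, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical report fragment, for illustration only.
example = {"tool": "FOUNDRY", "findings": [{"id": "OLLAMA-NOAUTH", "cvss": 8.6}]}
digest = report_digest(example)
```

Canonicalising first means key order does not affect the digest, so a verifier that re-serialises the report still reproduces the same hash.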

FOUNDRY UNLEASHED

Cryptographic override. Private key controlled. One operator. Founder's machine only.

Four subsystems are gated behind UNLEASHED: GGUF, TRITON, VLLM_PROBE, and SPECDECODE. A fifth, PERSIST, requires both --override and --confirm-destroy.

Standard Mode SCAN + OLLAMA_AUDIT + KVCACHE + REPORT. Enumeration and auditing with no exploit delivery. The only state-changing tests are OLLAMA_AUDIT's opt-in --active operations (pull, copy, push, delete).

UNLEASHED Mode Activates GGUF, TRITON, VLLM_PROBE, SPECDECODE. Requires Ed25519 private key + signed scope file specifying authorised target.

PERSIST (Destroy) Additionally requires --confirm-destroy. Live post-exploitation. Writes to target. Container escape and K8s harvest.
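
The flag requirements described above can be summarised as a lookup table. The sketch below is only a reading aid transcribed from this section: REQUIRED_FLAGS and missing_flags are hypothetical names, not FOUNDRY internals, and the key and signed scope file checks that UNLEASHED also performs are not modelled.

```python
# Gate flags per subsystem command, transcribed from the mode descriptions.
REQUIRED_FLAGS = {
    "scan": set(),
    "ollama-audit": set(),
    "kvcache": set(),
    "report": set(),
    "gguf": {"--override"},
    "triton": {"--override"},
    "vllm-probe": {"--override"},
    "specdecode": {"--override"},
    "persist": {"--override", "--confirm-destroy"},
}


def missing_flags(command: str, supplied: set[str]) -> set[str]:
    """Return the gate flags a command still needs before it would run."""
    return REQUIRED_FLAGS[command] - supplied
```

For example, missing_flags("persist", {"--override"}) returns {"--confirm-destroy"}, which mirrors the two-flag requirement for PERSIST.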

CVE Index

Every finding FOUNDRY produces maps to a specific CVE or disclosure identifier:

CVE / ID	Description	Subsystem	CVSS
CVE-2026-5760	SGLang GGUF Jinja2 Template Injection — Remote Code Execution	GGUF	9.8 CRITICAL
OLLAMA-NOAUTH	Ollama API unauthenticated model access — all versions, all endpoints	OLLAMA_AUDIT	8.6 HIGH
VLLM-TIMING-001	vLLM PagedAttention cross-tenant timing oracle — prompt/completion extraction	VLLM_PROBE	7.5 HIGH
KUBEAI-RBAC-001	KubeAI RBAC misconfiguration — service account escalation to cluster-admin	PERSIST	8.8 HIGH

WARLORD Integration

FOUNDRY is registered in the WARLORD autonomous campaign registry as Tool 55. FOUNDRY findings are exported in a WARLORD-compatible JSON schema, so they can be orchestrated within multi-tool autonomous campaigns.

Running FOUNDRY via WARLORD

$ warlord --tool foundry --target http://ai-infra.internal --deep

Report Schema (WARLORD-Compatible)

The FOUNDRY JSON report schema includes the following top-level fields:

tool — "FOUNDRY" with version and tool number (55)
target — enumerated inference server and loaded models
findings — array of findings with CVE, CVSS, subsystem, impact, and remediation
signature — Ed25519 signature over SHA-256 of findings array
warlord_compatible — true — schema validated for WARLORD ingestion
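
A consumer of these reports might sanity-check the top-level shape before using them. The sketch below checks only the five fields listed above; the nested structure of each field is not specified in this document, so nothing deeper is validated, and schema_problems is a hypothetical helper name.

```python
REQUIRED_FIELDS = ("tool", "target", "findings", "signature", "warlord_compatible")


def schema_problems(report: dict) -> list[str]:
    """List structural problems; an empty list means the top level looks right."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in report]
    if "findings" in report and not isinstance(report["findings"], list):
        problems.append("findings must be an array")
    if "warlord_compatible" in report and report["warlord_compatible"] is not True:
        problems.append("warlord_compatible must be true")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every structural issue in a single pass.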

Troubleshooting

SCAN returns no server detected

The inference server may be running on a non-default port or behind a reverse proxy. Use --port to specify the port explicitly. Common inference server ports: Ollama 11434, vLLM 8000, Triton HTTP 8000, Triton gRPC 8001, SGLang 30000.
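
The default ports listed above can be kept as a small lookup when deciding what to pass to --port. This is a sketch using only the ports named in this section; DEFAULT_PORTS and candidates are hypothetical names.

```python
# Default ports as listed in this troubleshooting entry.
DEFAULT_PORTS = {
    11434: ["Ollama"],
    8000: ["vLLM", "Triton HTTP"],
    8001: ["Triton gRPC"],
    30000: ["SGLang"],
}


def candidates(port: int) -> list[str]:
    """Return the servers that use this port by default, if any."""
    return DEFAULT_PORTS.get(port, [])
```

Port 8000 maps to two servers, so the port alone cannot distinguish vLLM from Triton.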

GGUF returns "UNLEASHED key not found"

The Ed25519 private key is not in the expected location. Ensure ~/.redspecter/keys/foundry.key exists and matches the registered public key. The scope file must also be present at ~/.redspecter/scope/foundry-scope.json and signed by the same key.

VLLM_PROBE timing analysis shows no signal

Timing side-channels require multiple concurrent sessions and many sampling rounds to produce statistically significant results. Increase --sessions to 20+ and --rounds to 500+. Results are only meaningful on shared multi-tenant vLLM deployments — single-user deployments will show no cross-tenant signal by definition.
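
Why more rounds help is ordinary statistics: the standard error of a mean shrinks with the square root of the sample count, so a fixed difference between two groups of measurements becomes easier to distinguish from noise. The sketch below computes Welch's t-statistic on synthetic numbers; it is a generic illustration, not FOUNDRY's analysis code, and the sample values are made up.

```python
import math
import statistics


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t-statistic for two independent samples."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Synthetic data: the same 0.5 unit gap in means, observed with few or many samples.
few = welch_t([10.0, 12.0] * 5, [10.5, 12.5] * 5)
many = welch_t([10.0, 12.0] * 250, [10.5, 12.5] * 250)
```

With 10 samples per group the statistic stays close to 1 in magnitude, which is indistinguishable from noise; with 500 per group the same gap gives a magnitude near 8.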

OLLAMA_AUDIT shows authentication on all endpoints

The Ollama instance has been configured with a reverse proxy or custom auth middleware. This is the expected hardened state, and OLLAMA_AUDIT reports the auth configuration as a positive finding (no vulnerability). To confirm that the proxy enforces authentication on every route rather than selectively, test specific endpoint paths directly.

REPORT fails to sign

Signing requires the cryptography package and a valid Ed25519 private key. Run foundry report --no-sign to generate an unsigned report. Unsigned reports are not WARLORD-compatible and will be rejected by WARLORD ingestion.

Disclaimer

Red Specter FOUNDRY is designed for authorised security testing, research, and educational purposes only. You must have explicit written permission from the system owner before running any FOUNDRY subsystem against a target. UNLEASHED subsystems (GGUF, TRITON, VLLM_PROBE, SPECDECODE, PERSIST) perform active exploitation and may cause service disruption or data modification on the target system. Unauthorised use may violate the Computer Misuse Act 1990 (UK), the Computer Fraud and Abuse Act (US), or equivalent legislation in your jurisdiction. The authors accept no liability for misuse. Licensed under the Apache License 2.0.