Red Specter FOUNDRY — Inference Server Exploitation Engine

vLLM unauthenticated model pull / GGUF Jinja2 RCE CVE-2026-5760 CVSS 9.8 / Ollama model copy no auth / PagedAttention cross-tenant timing oracle / TensorRT engine deserialization / KubeAI RBAC escape to cluster-admin / Triton model repository manipulation / SGLang spec decode cache poisoning / llama.cpp HTTP server path traversal

The Problem

Inference Servers Are Production AI Infrastructure With Zero Security Tooling

Every major AI deployment runs an inference server. vLLM, Ollama, SGLang, Triton, llama.cpp — these are the layer between your model and your application. They are exposed on internal networks, Kubernetes clusters, and sometimes the open internet with no authentication, no integrity checks, and no security tooling designed for them. FOUNDRY attacks the inference layer directly.

GGUF Jinja2 RCE

SGLang model loading executes attacker-controlled Jinja2 templates embedded in the GGUF chat_template field. No authentication required. Actively exploited. CVE-2026-5760, CVSS 9.8.

CVE-2026-5760 CVSS 9.8 UNAUTHENTICATED

Ollama Unauthenticated Model Pull/Copy

Ollama's API exposes /api/pull and /api/copy without authentication by default. An attacker can pull proprietary models or copy them to an attacker-controlled registry.

OLLAMA NO AUTH MODEL THEFT

PagedAttention Cross-Tenant Timing Oracle

vLLM's PagedAttention memory allocator leaks information across concurrent tenants via a timing side-channel. In shared inference deployments, this can expose fragments of other users' prompts and completions.

vLLM TIMING ORACLE CROSS-TENANT

TensorRT Engine Deserialization

Triton Inference Server loads TensorRT engine files without integrity verification. A crafted engine file achieves arbitrary code execution on the GPU host during model load.

TRITON DESERIALIZATION GPU HOST RCE

KubeAI RBAC Escape

KubeAI's model serving controller misconfigures Kubernetes RBAC, allowing inference workload pods to escalate to cluster-admin via service account token abuse.

KubeAI RBAC CLUSTER-ADMIN

Speculative Decode Cache Poisoning

In SGLang and vLLM, speculative decoding caches draft-model outputs. These caches can be poisoned to influence completions in later, separate inference requests.

SPEC DECODE CACHE POISON PERSISTENT

9 Subsystems

The FOUNDRY Attack Surface

Nine subsystems. Each one attacks a different exposure surface of the self-hosted AI inference stack. SCAN maps the environment. Seven specialist subsystems exploit specific vulnerabilities in vLLM, Ollama, SGLang, Triton, and llama.cpp. REPORT produces Ed25519-signed, WARLORD-compatible evidence.

01 SCAN PASSIVE — ANALYSIS

Maps the inference server attack surface. Fingerprints running servers (vLLM, Ollama, SGLang, Triton, llama.cpp), open ports, loaded models, API versions, and auth configuration. Produces a prioritised finding list for subsequent subsystem targeting.

02 GGUF UNLEASHED --override

Generates and delivers weaponised GGUF files containing malicious Jinja2 chat templates. On model load the template executes attacker-controlled Python on the inference host. CVE-2026-5760, CVSS 9.8.

03 OLLAMA_AUDIT PASSIVE + ACTIVE

Tests Ollama API endpoints for unauthenticated model pull, copy, push, and delete operations. Maps all accessible models and identifies exfiltration paths to attacker-controlled registries.

04 TRITON UNLEASHED --override

Crafts and stages malicious TensorRT engine files. Tests Triton's model repository for unsigned load paths. Delivers deserialization payload achieving code execution on the GPU inference host.

05 VLLM_PROBE UNLEASHED --override

Exploits vLLM PagedAttention timing side-channels across concurrent inference sessions. Extracts prompt and completion fragments from co-located tenant sessions via statistical timing analysis.

06 KVCACHE PASSIVE + ACTIVE

Tests KV cache isolation boundaries in shared inference deployments. Identifies cross-request cache bleeding that leaks fragments of other users' context windows.

07 SPECDECODE UNLEASHED --override

Tests speculative decode cache integrity across inference sessions. Delivers crafted draft model completions designed to persist in and influence future cache-hit responses.

08 PERSIST UNLEASHED --override --confirm-destroy

Establishes post-exploitation persistence on compromised inference hosts. Model hook injection, container escape via GPU driver exposure, K8s service account credential harvest for lateral movement.

09 REPORT ALL MODES

Ed25519-signed, SHA-256-hashed reports. JSON (WARLORD-compatible) and Markdown. Includes CVE mapping, CVSS scores, affected models, and remediation recommendations.
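The Markdown variant of a report can be rendered from the same finding records as the JSON. The following is a minimal sketch. It assumes findings are plain dicts keyed by the column names shown. `findings_to_markdown` and the column set are hypothetical and are not FOUNDRY's actual output format. Pipes and newlines are escaped so that free-text fields, such as remediation advice, cannot break the table layout.

```python
# Hypothetical column set for a findings table (not FOUNDRY's actual layout).
COLUMNS = ("id", "severity", "cvss", "subsystem", "remediation")


def _cell(value) -> str:
    """Render one table cell, escaping characters that would break a Markdown row."""
    if value is None:
        return ""
    return str(value).replace("|", "\\|").replace("\n", " ")


def findings_to_markdown(findings: list[dict]) -> str:
    """Render finding dicts as a GitHub-style Markdown table; missing keys become empty cells."""
    header = "| " + " | ".join(COLUMNS) + " |"
    divider = "|" + "---|" * len(COLUMNS)
    rows = ["| " + " | ".join(_cell(f.get(c)) for c in COLUMNS) + " |" for f in findings]
    return "\n".join([header, divider, *rows])


if __name__ == "__main__":
    # Illustrative record using the CVE ID and score quoted on this page.
    print(findings_to_markdown([
        {"id": "CVE-2026-5760", "severity": "CRITICAL", "cvss": 9.8, "subsystem": "GGUF"},
    ]))
```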

Deep Scan Mode

One Command. Every Inference Surface.

Map and enumerate every inference server vulnerability in a single pass:

$ foundry scan --target http://localhost:11434 --deep

[SCAN] Fingerprinting inference server...
  Ollama v0.3.12 detected — REST API on :11434
  No authentication configured — all endpoints open
[OLLAMA] Testing unauthenticated model access...
  CRITICAL: /api/pull accessible without auth
  CRITICAL: /api/copy accessible without auth
  3 models accessible: llama3, mistral, codestral
[KVCACHE] Testing KV cache isolation...
  HIGH: cache bleed detected — cross-session context fragments
[SCAN] Generating signed report...
  Report signed — Ed25519 ✓ | SHA-256 ✓
  4 findings — 2 CRITICAL, 1 HIGH, 1 MEDIUM
  Output: reports/foundry-scan-2026-04-24.json
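The sample run ends with a summary line that tallies findings by severity. The following sketch shows one way to build that breakdown. It assumes each finding carries a label from a fixed CRITICAL-to-INFO scale. Both `severity_breakdown` and the scale are hypothetical and are not FOUNDRY's actual code.

```python
from collections import Counter

# Assumed severity scale, most severe first (hypothetical, not FOUNDRY's own list).
SEVERITY_ORDER = ("CRITICAL", "HIGH", "MEDIUM", "LOW", "INFO")


def severity_breakdown(severities: list[str]) -> str:
    """Return e.g. '2 CRITICAL, 1 HIGH, 1 MEDIUM', omitting levels with zero findings."""
    unknown = set(severities) - set(SEVERITY_ORDER)
    if unknown:
        # Fail loudly rather than silently dropping findings from the summary.
        raise ValueError(f"unknown severity labels: {sorted(unknown)}")
    counts = Counter(severities)
    return ", ".join(f"{counts[s]} {s}" for s in SEVERITY_ORDER if counts[s])


if __name__ == "__main__":
    # Illustrative input mirroring the counts in the sample run above.
    found = ["CRITICAL", "CRITICAL", "HIGH", "MEDIUM"]
    print(f"{len(found)} findings: {severity_breakdown(found)}")
    # prints: 4 findings: 2 CRITICAL, 1 HIGH, 1 MEDIUM
```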

Passive Fingerprinting

SCAN identifies every running inference server, loaded model, and exposed API endpoint before firing a single attack payload.

CVE-Mapped Findings

Every finding maps to a specific CVE or disclosure. CVE-2026-5760 CVSS 9.8 — not a generic "misconfiguration". Exact exploit path, exact impact.

Ed25519 Signed Reports

Every report cryptographically signed with Ed25519. SHA-256 evidence chains. WARLORD-compatible JSON for autonomous campaign integration.
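A signature covers exact bytes, so the signer and the verifier must serialize a report identically. The following is a minimal sketch under several assumptions:

- It uses the third-party `cryptography` package.
- It serializes reports as sorted-key, compact JSON. This is a simplification, not RFC 8785 canonical JSON.
- `sign_report`, `verify_report` and the envelope field names are hypothetical and are not FOUNDRY's actual format.

```python
import hashlib
import json

# Third-party dependency (assumed installed): pip install cryptography
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def canonical_bytes(report: dict) -> bytes:
    """Deterministic serialization so signer and verifier see identical bytes."""
    return json.dumps(report, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_report(report: dict, key: Ed25519PrivateKey) -> dict:
    """Return a detached envelope: SHA-256 digest and Ed25519 signature, both hex."""
    payload = canonical_bytes(report)
    return {
        "sha256": hashlib.sha256(payload).hexdigest(),
        "ed25519": key.sign(payload).hex(),
    }


def verify_report(report: dict, envelope: dict, pub: Ed25519PublicKey) -> bool:
    """Check both the digest and the signature; any mismatch means tampering."""
    payload = canonical_bytes(report)
    if hashlib.sha256(payload).hexdigest() != envelope["sha256"]:
        return False
    try:
        pub.verify(bytes.fromhex(envelope["ed25519"]), payload)
    except InvalidSignature:
        return False
    return True


if __name__ == "__main__":
    key = Ed25519PrivateKey.generate()
    report = {"target": "http://localhost:11434", "findings": 4}
    env = sign_report(report, key)
    print(verify_report(report, env, key.public_key()))  # prints: True
```

In this design, Ed25519 signs the serialized payload directly. The separate SHA-256 digest gives a short fingerprint that can be cited in other evidence without the public key.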

WARLORD Integration

FOUNDRY findings feed directly into WARLORD autonomous campaigns. Machine-ingestible JSON output with structured CVE and CVSS data.
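The WARLORD schema is not documented on this page. The following sketch therefore only illustrates the general idea of machine-ingestible findings with structured CVE and CVSS data. Every field name, the `Finding` type, `to_campaign_json` and the `schema` tag are assumptions chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

SEVERITIES = {"CRITICAL", "HIGH", "MEDIUM", "LOW", "INFO"}


@dataclass
class Finding:
    finding_id: str        # CVE or disclosure ID, e.g. "CVE-2026-5760" or "OLLAMA-NOAUTH"
    title: str
    subsystem: str         # FOUNDRY subsystem name, e.g. "GGUF" or "OLLAMA_AUDIT"
    severity: str
    cvss: Optional[float]  # None when no CVSS base score has been assigned
    target: str
    remediation: str = ""

    def __post_init__(self) -> None:
        # Reject malformed records early so downstream consumers can trust the fields.
        if self.severity not in SEVERITIES:
            raise ValueError(f"bad severity: {self.severity!r}")
        if self.cvss is not None and not 0.0 <= self.cvss <= 10.0:
            raise ValueError(f"CVSS score out of range: {self.cvss}")


def to_campaign_json(findings: list[Finding]) -> str:
    """Serialize findings into one JSON document (hypothetical envelope layout)."""
    doc = {
        "schema": "foundry-findings/example-v0",  # hypothetical version tag
        "findings": [asdict(f) for f in findings],
    }
    return json.dumps(doc, indent=2, sort_keys=True)
```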

CVE Reference

Vulnerability Index

FOUNDRY maps every finding to a specific CVE or disclosure identifier. Each subsystem targets known, documented vulnerabilities in production inference server software.

CVE / ID	Description	Subsystem	Impact
CVE-2026-5760	SGLang GGUF Jinja2 Template Injection RCE	GGUF	Remote code execution on model load
OLLAMA-NOAUTH	Ollama API unauthenticated model access (all versions)	OLLAMA_AUDIT	Model theft and registry poisoning
VLLM-TIMING-001	vLLM PagedAttention cross-tenant timing oracle	VLLM_PROBE	Prompt/completion extraction
KUBEAI-RBAC-001	KubeAI RBAC misconfiguration — cluster-admin escalation	PERSIST	Cluster-wide lateral movement

Coverage

Every Inference Server. Every Attack Class.

5 Targets

Inference Servers Covered

vLLM — PagedAttention + timing oracles
Ollama — unauthenticated API surface
SGLang — GGUF Jinja2 template injection
Triton Inference Server — TensorRT deserialization
llama.cpp — HTTP server path traversal

Attack Classes

Cryptographic

Report Integrity

Ed25519 digital signatures
SHA-256 evidence chains
WARLORD-compatible JSON output
CVE mapping on every finding
CVSS scores per vulnerability
Remediation recommendations included
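A common way to build a SHA-256 evidence chain is to hash each evidence record together with the hash of the record before it. Editing, reordering or removing a record then breaks every link after it.

Truncating the tail cannot be detected from the chain alone, which is why the final hash would typically be signed, for example with Ed25519. The following is a stdlib-only sketch. It assumes evidence records are JSON-serializable dicts. The helper names and the all-zero genesis value are hypothetical and are not FOUNDRY's format.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical "previous hash" for the first link


def link_hash(prev_hash: str, evidence: dict) -> str:
    """Hash the previous link with canonical JSON of this record.

    prev_hash is always 64 hex chars, so plain concatenation is unambiguous.
    """
    body = json.dumps(evidence, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def build_chain(items: list[dict]) -> list[dict]:
    """Wrap each evidence record in a link that commits to the previous link."""
    chain, prev = [], GENESIS
    for item in items:
        h = link_hash(prev, item)
        chain.append({"evidence": item, "prev": prev, "hash": h})
        prev = h
    return chain


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every link; any edit or reordering makes verification fail."""
    prev = GENESIS
    for link in chain:
        if link["prev"] != prev or link_hash(prev, link["evidence"]) != link["hash"]:
            return False
        prev = link["hash"]
    return True
```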

Ed25519 Cryptographic Override

FOUNDRY UNLEASHED

Cryptographic override. Private key controlled. One operator. Activates GGUF, TRITON, VLLM_PROBE, SPECDECODE, and PERSIST. PERSIST requires --confirm-destroy.

Standard Mode

foundry scan --target <URL>

Activates SCAN + OLLAMA_AUDIT + KVCACHE + REPORT: passive enumeration plus audit subsystems that test exposure without destructive actions.

UNLEASHED Override

foundry gguf --model <path> --target <URL> --override

Unlocks all 8 attack subsystems. Requires Ed25519 private key + signed scope file. PERSIST additionally requires --confirm-destroy.
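The activation rules above can be expressed as a small fail-closed gate. This is a sketch only. The subsystem groupings come from this page, but `enabled_subsystems` is hypothetical. Its `scope_verified` flag stands in for the Ed25519 private-key and signed-scope checks, which are assumed to happen elsewhere.

```python
# Subsystem groupings as described for Standard Mode and the UNLEASHED override.
STANDARD = frozenset({"SCAN", "OLLAMA_AUDIT", "KVCACHE", "REPORT"})
OVERRIDE_ONLY = frozenset({"GGUF", "TRITON", "VLLM_PROBE", "SPECDECODE"})
DESTROY_GATED = frozenset({"PERSIST"})


def enabled_subsystems(override: bool, scope_verified: bool, confirm_destroy: bool) -> frozenset[str]:
    """Return the subsystems a run may use; refuse rather than silently downgrade."""
    if confirm_destroy and not override:
        raise ValueError("--confirm-destroy is only meaningful with --override")
    enabled = set(STANDARD)
    if override:
        # scope_verified is a stand-in for key + signed-scope verification (assumed).
        if not scope_verified:
            raise PermissionError("--override requires a verified, signed scope file")
        enabled |= OVERRIDE_ONLY
        if confirm_destroy:
            enabled |= DESTROY_GATED
    return frozenset(enabled)
```

The gate raises an error instead of quietly falling back to Standard Mode. An operator who asked for the override therefore never mistakes a partial run for a full one.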

FOUNDRY