FOUNDRY

The inference layer has no perimeter. FOUNDRY finds what's running, and what's exposed.
9 Subsystems
300 Tests
5 Targets (vLLM / Ollama / SGLang / Triton / llama.cpp)
55 NIGHTFALL Tool
foundry scan --target <URL> --deep
vLLM unauthenticated model pull / GGUF Jinja2 RCE CVE-2026-5760 CVSS 9.8 / Ollama model copy no auth / PagedAttention cross-tenant timing oracle / TensorRT engine deserialization / KubeAI RBAC escape to cluster-admin / Triton model repository manipulation / SGLang spec decode cache poisoning / llama.cpp HTTP server path traversal

Inference Servers Are Production AI Infrastructure With Zero Security Tooling

Every major AI deployment runs an inference server. vLLM, Ollama, SGLang, Triton, llama.cpp — these are the layer between your model and your application. They are exposed on internal networks, Kubernetes clusters, and sometimes the open internet with no authentication, no integrity checks, and no security tooling designed for them. FOUNDRY attacks the inference layer directly.

GGUF Jinja2 RCE

SGLang model loading executes attacker-controlled Jinja2 templates embedded in the GGUF chat_template field. No authentication required. Actively exploited. CVE-2026-5760, CVSS 9.8.

CVE-2026-5760 / CVSS 9.8 / UNAUTHENTICATED
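For teams triaging model files before they reach an inference host, a minimal sketch of a pre-load check on the chat_template string is shown here. It is a heuristic only: the pattern list and the function name `flag_chat_template` are assumptions made for illustration, not FOUNDRY's detection logic, and a clean result does not prove a template is safe.

```python
import re

# Heuristic indicators (assumed for this sketch): constructs that ordinary
# chat templates rarely need but template-injection payloads often rely on.
SUSPICIOUS_PATTERNS = [
    r"__\w+__",         # dunder attribute access, e.g. __class__
    r"\|\s*attr\s*\(",  # Jinja2 attr() filter used for indirect lookups
]


def flag_chat_template(template: str) -> list[str]:
    """Return suspicious substrings found in the template, grouped by pattern."""
    hits: list[str] = []
    for pattern in SUSPICIOUS_PATTERNS:
        hits.extend(match.group(0) for match in re.finditer(pattern, template))
    return hits


# A typical role/content loop produces no hits.
benign = "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}{% endfor %}"
print(flag_chat_template(benign))
```

A non-empty result is a reason to quarantine the file for manual review rather than a verdict. Rendering untrusted templates in a sandboxed environment remains the stronger control.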

Ollama Unauthenticated Model Pull/Copy

Ollama API exposes /api/pull and /api/copy without authentication by default. Attacker pulls proprietary models or copies them to attacker-controlled registry.

OLLAMA / NO AUTH / MODEL THEFT

PagedAttention Cross-Tenant Timing Oracle

vLLM's PagedAttention memory allocator leaks information across concurrent tenants via timing side-channel. In shared inference deployments, extracts fragments of other users' prompts and completions.

vLLM / TIMING ORACLE / CROSS-TENANT

TensorRT Engine Deserialization

Triton Inference Server loads TensorRT engine files without integrity verification. Crafted engine file achieves arbitrary code execution on the GPU host during model load.

TRITON / DESERIALIZATION / GPU HOST RCE
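The missing control described above is integrity verification before load. A minimal sketch of that check follows, assuming the operator keeps an allowlist of approved SHA-256 digests. The function names and the allowlist are hypothetical; nothing here is part of Triton or FOUNDRY.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large engine files are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_approved(path: Path, approved_digests: set[str]) -> bool:
    """True only if the file's digest is on the operator-maintained allowlist."""
    return sha256_file(path) in approved_digests
```

A deployment pipeline could call `is_approved` on each file in the model repository and refuse to start the server when any file fails the check.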

KubeAI RBAC Escape

KubeAI's model serving controller misconfigures Kubernetes RBAC, allowing inference workload pods to escalate to cluster-admin via service account token abuse.

KubeAI / RBAC / CLUSTER-ADMIN
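To audit a cluster for this class of misconfiguration, one option is to list the service accounts bound to cluster-admin. The sketch below assumes its input is the parsed output of `kubectl get clusterrolebindings -o json`. The function name is hypothetical, and the check covers only direct ClusterRoleBindings, not other escalation paths.

```python
def cluster_admin_service_accounts(bindings: dict) -> list[str]:
    """List 'namespace/name' for service accounts bound directly to cluster-admin."""
    found: list[str] = []
    for item in bindings.get("items", []):
        role_ref = item.get("roleRef", {})
        if role_ref.get("kind") != "ClusterRole" or role_ref.get("name") != "cluster-admin":
            continue
        # 'subjects' may be absent or null on a binding with no subjects.
        for subject in item.get("subjects") or []:
            if subject.get("kind") == "ServiceAccount":
                found.append(f"{subject.get('namespace', '')}/{subject.get('name', '')}")
    return found
```

Any inference workload service account in the result deserves a narrower role.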

Speculative Decode Cache Poisoning

SGLang and vLLM speculative decoding caches draft model outputs that can be poisoned to influence future completions across separate inference requests.

SPEC DECODE / CACHE POISON / PERSISTENT

The FOUNDRY Attack Surface

Nine subsystems. Each one attacks a different exposure surface of the self-hosted AI inference stack. SCAN maps the environment. Seven specialist subsystems exploit specific vulnerabilities in vLLM, Ollama, SGLang, Triton, and llama.cpp. REPORT produces Ed25519-signed, WARLORD-compatible evidence.

01 SCAN PASSIVE — ANALYSIS

Maps the inference server attack surface. Fingerprints running servers (vLLM, Ollama, SGLang, Triton, llama.cpp), open ports, loaded models, API versions, and auth configuration. Produces a prioritised finding list for subsequent subsystem targeting.
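Fingerprinting of this kind can be sketched with read-only version probes. The example below is not FOUNDRY's implementation. It assumes three endpoints from the projects' public HTTP APIs (Ollama `/api/version`, vLLM `/version`, and Triton's `/v2` server metadata), which should be confirmed against the deployed versions. The names `fetch_json`, `classify`, and `fingerprint` are hypothetical, and other servers may answer the same paths, so the result is a hint rather than proof.

```python
import json
import urllib.request
from typing import Optional

# Assumed probe paths; verify against each server's own documentation.
PROBE_PATHS = ["/api/version", "/version", "/v2"]


def fetch_json(base_url: str, path: str, timeout: float = 3.0) -> Optional[dict]:
    """GET base_url + path and return a JSON object, or None on any failure."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + path, timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):  # network errors, HTTP errors, bad JSON
        return None
    return body if isinstance(body, dict) else None


def classify(responses: dict) -> Optional[str]:
    """Guess the server type from a mapping of probe path to parsed response."""
    triton = responses.get("/v2")
    if triton and triton.get("name") == "triton":
        return "triton"
    ollama = responses.get("/api/version")
    if ollama and "version" in ollama:
        return "ollama"
    vllm = responses.get("/version")
    if vllm and "version" in vllm:
        return "vllm"
    return None


def fingerprint(base_url: str) -> Optional[str]:
    """Probe every path on one host and classify the combined result."""
    return classify({path: fetch_json(base_url, path) for path in PROBE_PATHS})
```

Keeping `classify` separate from the network call lets the matching rules be tested offline against recorded responses.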

02 GGUF UNLEASHED --override

Generates and delivers weaponised GGUF files containing malicious Jinja2 chat templates. On model load the template executes attacker-controlled Python on the inference host. CVE-2026-5760, CVSS 9.8.

03 OLLAMA_AUDIT PASSIVE + ACTIVE

Tests Ollama API endpoints for unauthenticated model pull, copy, push, and delete operations. Maps all accessible models and identifies exfiltration paths to attacker-controlled registries.

04 TRITON UNLEASHED --override

Crafts and stages malicious TensorRT engine files. Tests Triton's model repository for unsigned load paths. Delivers deserialization payload achieving code execution on the GPU inference host.

05 VLLM_PROBE UNLEASHED --override

Exploits vLLM PagedAttention timing side-channels across concurrent inference sessions. Extracts prompt and completion fragments from co-located tenant sessions via statistical timing analysis.

06 KVCACHE PASSIVE + ACTIVE

Tests KV cache isolation boundaries in shared inference deployments. Identifies cross-request cache bleeding that leaks fragments of other users' context windows.

07 SPECDECODE UNLEASHED --override

Tests speculative decode cache integrity across inference sessions. Delivers crafted draft model completions designed to persist in and influence future cache-hit responses.

08 PERSIST UNLEASHED --override --confirm-destroy

Establishes post-exploitation persistence on compromised inference hosts. Model hook injection, container escape via GPU driver exposure, K8s service account credential harvest for lateral movement.

09 REPORT ALL MODES

Ed25519-signed, SHA-256-hashed reports. JSON (WARLORD-compatible) and Markdown. Includes CVE mapping, CVSS scores, affected models, and remediation recommendations.

One Command. Every Inference Surface.

Map and enumerate every inference server vulnerability in a single pass:

$ foundry scan --target http://localhost:11434 --deep
[SCAN] Fingerprinting inference server...
  Ollama v0.3.12 detected — REST API on :11434
  No authentication configured — all endpoints open
[OLLAMA] Testing unauthenticated model access...
  CRITICAL: /api/pull accessible without auth
  CRITICAL: /api/copy accessible without auth
  3 models accessible: llama3, mistral, codestral
[KVCACHE] Testing KV cache isolation...
  HIGH: cache bleed detected — cross-session context fragments
[SCAN] Generating signed report...
  Report signed — Ed25519 ✓ | SHA-256 ✓
  4 findings: 2 CRITICAL, 1 HIGH, 1 MEDIUM
  Output: reports/foundry-scan-2026-04-24.json

Passive Fingerprinting

SCAN identifies every running inference server, loaded model, and exposed API endpoint before firing a single attack payload.

CVE-Mapped Findings

Every finding maps to a specific CVE or disclosure identifier, such as CVE-2026-5760 (CVSS 9.8), rather than a generic "misconfiguration" label. Exact exploit path, exact impact.

Ed25519 Signed Reports

Every report cryptographically signed with Ed25519. SHA-256 evidence chains. WARLORD-compatible JSON for autonomous campaign integration.
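One common way to build a SHA-256 evidence chain is to make each entry's digest cover the previous digest plus the canonical JSON of the evidence. The sketch below illustrates that idea. FOUNDRY's actual report format is not documented here, so the entry layout, the `GENESIS` value, and all function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first link


def entry_digest(prev_digest: str, evidence: dict) -> str:
    """Hash the previous digest together with canonical JSON of the evidence."""
    canonical = json.dumps(evidence, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_digest + canonical).encode("utf-8")).hexdigest()


def build_chain(items: list[dict]) -> list[dict]:
    """Link evidence items so each entry commits to everything before it."""
    chain, prev = [], GENESIS
    for evidence in items:
        digest = entry_digest(prev, evidence)
        chain.append({"evidence": evidence, "prev": prev, "digest": digest})
        prev = digest
    return chain


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["digest"] != entry_digest(prev, entry["evidence"]):
            return False
        prev = entry["digest"]
    return True
```

Signing only the final digest with Ed25519 would then cover every entry in the chain, since changing any earlier entry changes the last digest.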

WARLORD Integration

FOUNDRY findings feed directly into WARLORD autonomous campaigns. Machine-ingestible JSON output with structured CVE and CVSS data.
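A downstream consumer of the JSON output might start by tallying findings per severity. The WARLORD schema is not published in this page, so the sketch below assumes a top-level `findings` array whose items carry a `severity` string. Those field names and the function `severity_summary` are hypothetical.

```python
import json
from collections import Counter


def severity_summary(report_json: str) -> dict[str, int]:
    """Count findings per severity in a report with the assumed schema."""
    report = json.loads(report_json)
    return dict(Counter(finding["severity"] for finding in report.get("findings", [])))
```

The same loop is where a consumer would read CVE and CVSS fields once the real field names are known.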

9 Subsystems
300 Tests
CVE-2026-5760, CVSS 9.8
5 Inference Targets
55 Tool

Vulnerability Index

FOUNDRY maps every finding to a specific CVE or disclosure identifier. Each subsystem targets known, documented vulnerabilities in production inference server software.

CVE / ID | Description | Subsystem | Impact
CVE-2026-5760 | SGLang GGUF Jinja2 Template Injection RCE | GGUF | Remote code execution on model load
OLLAMA-NOAUTH | Ollama API unauthenticated model access (all versions) | OLLAMA_AUDIT | Model theft and registry poisoning
VLLM-TIMING-001 | vLLM PagedAttention cross-tenant timing oracle | VLLM_PROBE | Prompt/completion extraction
KUBEAI-RBAC-001 | KubeAI RBAC misconfiguration: cluster-admin escalation | PERSIST | Cluster-wide lateral movement

Every Inference Server. Every Attack Class.

5 Targets

Inference Servers Covered

  • vLLM — PagedAttention + timing oracles
  • Ollama — unauthenticated API surface
  • SGLang — GGUF Jinja2 template injection
  • Triton Inference Server — TensorRT deserialization
  • llama.cpp — HTTP server path traversal
Attack Classes

Exploit Categories

  • Remote code execution (GGUF / Jinja2)
  • Unauthenticated API access (Ollama)
  • Side-channel extraction (PagedAttention)
  • Deserialization (TensorRT engine)
  • Cache poisoning (speculative decode)
  • RBAC escalation (KubeAI / K8s)
  • KV cache cross-tenant bleed
  • Post-exploitation persistence
Cryptographic

Report Integrity

  • Ed25519 digital signatures
  • SHA-256 evidence chains
  • WARLORD-compatible JSON output
  • CVE mapping on every finding
  • CVSS scores per vulnerability
  • Remediation recommendations included
Pure Engineering
Zero Wrappers. Real Exploits.

FOUNDRY does not wrap existing scanners. Every attack module — GGUF weaponisation, PagedAttention timing analysis, TensorRT engine crafting, speculative decode poisoning — is engineered from scratch against the actual CVEs. Each subsystem implements the real exploit chain, not a misconfiguration checklist.

9.8 CVSS Score (CVE-2026-5760)
300 Tests Passing
0 External Dependencies
5 Inference Server Targets
Autonomous Campaign Integration
WARLORD-Registered — Native

FOUNDRY is registered in the WARLORD tool registry. Every finding is exported in WARLORD-compatible JSON, enabling autonomous campaign orchestration across the full NIGHTFALL 59-tool fleet. FOUNDRY inference recon feeds directly into subsequent NIGHTFALL tools.

warlord --tool foundry --target http://ai-infra.internal --deep
Ed25519 Cryptographic Override
FOUNDRY UNLEASHED

Cryptographic override. Private key controlled. One operator. Activates GGUF, TRITON, VLLM_PROBE, SPECDECODE, and PERSIST. PERSIST requires --confirm-destroy.

Standard Mode
foundry scan --target <URL>

Activates SCAN + OLLAMA_AUDIT + KVCACHE + REPORT. Passive enumeration and active audit subsystems. No destructive actions.

UNLEASHED Override
foundry gguf --model <path> --target <URL> --override

Unlocks all 8 attack subsystems. Requires Ed25519 private key + signed scope file. PERSIST additionally requires --confirm-destroy.
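The gating described in this section can be summarised as a lookup from subsystem to required flags. The sketch below takes its table from the subsystem labels on this page, but the code itself is illustrative: `SUBSYSTEM_GATES` and `enabled_subsystems` are hypothetical names, and the Ed25519 key and signed scope file checks are deliberately left out.

```python
# Flags each subsystem requires, per the subsystem list on this page.
SUBSYSTEM_GATES: dict[str, set[str]] = {
    "SCAN": set(),
    "GGUF": {"override"},
    "OLLAMA_AUDIT": set(),
    "TRITON": {"override"},
    "VLLM_PROBE": {"override"},
    "KVCACHE": set(),
    "SPECDECODE": {"override"},
    "PERSIST": {"override", "confirm-destroy"},
    "REPORT": set(),
}


def enabled_subsystems(flags: set[str]) -> list[str]:
    """Return subsystems whose required flags are all present, in listed order."""
    return [name for name, required in SUBSYSTEM_GATES.items() if required <= flags]


print(enabled_subsystems(set()))  # standard mode
```

With no flags the result matches the standard-mode set named above: SCAN, OLLAMA_AUDIT, KVCACHE, and REPORT.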

Available Across All Platforms

FOUNDRY ships as part of the NIGHTFALL framework. Native packages for major Linux security distributions, macOS, and Windows. Pre-installed on Red Specter OS.

KALI: red-specter tools
PARROT: red-specter tools
BLACKARCH: red-specter tools
PyPI: pip install red-specter
MACOS: red-specter tools
WINDOWS: red-specter tools
DOCKER: docker pull redspecter/foundry
RS OS: Pre-installed

Authorised Use Only

Red Specter FOUNDRY is intended for authorised security testing only. Unauthorised use against systems you do not own or have explicit permission to test may violate the Computer Misuse Act 1990 (UK), Computer Fraud and Abuse Act (US), and equivalent legislation in other jurisdictions. Always obtain written authorisation before conducting any security assessments. Apache License 2.0.