foundry scan --target <URL> --deep
Every self-hosted AI deployment runs an inference server. vLLM, Ollama, SGLang, Triton, and llama.cpp are the layer between your model and your application. They are exposed on internal networks, Kubernetes clusters, and sometimes the open internet, often with no authentication, no integrity checks, and no security tooling designed for them. FOUNDRY attacks the inference layer directly.
SGLang model loading executes attacker-controlled Jinja2 templates embedded in the GGUF chat_template field. No authentication required. Actively exploited. CVE-2026-5760, CVSS 9.8.
The Ollama API exposes /api/pull and /api/copy without authentication by default. An attacker can pull arbitrary models onto the host, or copy proprietary models and push them to an attacker-controlled registry.
vLLM's PagedAttention memory allocator leaks information across concurrent tenants via a timing side channel. In shared inference deployments, an attacker can extract fragments of other users' prompts and completions.
Triton Inference Server loads TensorRT engine files without integrity verification. A crafted engine file achieves arbitrary code execution on the GPU host during model load.
KubeAI's model serving controller misconfigures Kubernetes RBAC, allowing inference workload pods to escalate to cluster-admin via service account token abuse.
SGLang and vLLM speculative decoding caches draft model outputs that can be poisoned to influence future completions across separate inference requests.
Nine subsystems. Each one attacks a different exposure surface of the self-hosted AI inference stack. SCAN maps the environment. Seven specialist subsystems exploit specific vulnerabilities in vLLM, Ollama, SGLang, Triton, and llama.cpp. REPORT produces Ed25519-signed, WARLORD-compatible evidence.
Maps the inference server attack surface. Fingerprints running servers (vLLM, Ollama, SGLang, Triton, llama.cpp), open ports, loaded models, API versions, and auth configuration. Produces a prioritised finding list for subsequent subsystem targeting.
Generates and delivers weaponised GGUF files containing malicious Jinja2 chat templates. On model load the template executes attacker-controlled Python on the inference host. CVE-2026-5760, CVSS 9.8.
Tests Ollama API endpoints for unauthenticated model pull, copy, push, and delete operations. Maps all accessible models and identifies exfiltration paths to attacker-controlled registries.
Crafts and stages malicious TensorRT engine files. Tests Triton's model repository for unsigned load paths. Delivers a deserialisation payload that achieves code execution on the GPU inference host.
Exploits vLLM PagedAttention timing side-channels across concurrent inference sessions. Extracts prompt and completion fragments from co-located tenant sessions via statistical timing analysis.
Tests KV cache isolation boundaries in shared inference deployments. Identifies cross-request cache bleeding that leaks fragments of other users' context windows.
Tests speculative decode cache integrity across inference sessions. Delivers crafted draft model completions designed to persist in and influence future cache-hit responses.
Establishes post-exploitation persistence on compromised inference hosts. Model hook injection, container escape via GPU driver exposure, K8s service account credential harvest for lateral movement.
Ed25519-signed, SHA-256-hashed reports. JSON (WARLORD-compatible) and Markdown. Includes CVE mapping, CVSS scores, affected models, and remediation recommendations.
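As a rough illustration of what SHA-256-hashed evidence offers the person reading a report, the sketch below chains evidence records so that altering any one record invalidates its hash and every hash after it. It is a minimal sketch under stated assumptions, not FOUNDRY's actual report format: the record fields, the `GENESIS` value, and the helper names `link_hash`, `build_chain`, and `verify_chain` are all hypothetical. Ed25519 signing is left out because it needs a third-party library; in this sketch a signature would cover the final chain hash.

```python
import copy
import hashlib
import json

# Assumed starting value for the first link; not taken from FOUNDRY.
GENESIS = "0" * 64


def link_hash(prev_hash: str, evidence: dict) -> str:
    """Hash the previous link together with a canonical JSON form of the record."""
    payload = json.dumps(evidence, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(items):
    """Return a list of {"evidence", "hash"} entries, each hash covering the one before."""
    chain, prev = [], GENESIS
    for item in items:
        prev = link_hash(prev, item)
        chain.append({"evidence": item, "hash": prev})
    return chain


def verify_chain(chain) -> bool:
    """Recompute every link; any edited record breaks the chain from that point on."""
    prev = GENESIS
    for entry in chain:
        if link_hash(prev, entry["evidence"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


# Hypothetical evidence records, using identifiers from this page.
records = [
    {"id": "CVE-2026-5760", "cvss": 9.8, "subsystem": "GGUF"},
    {"id": "OLLAMA-NOAUTH", "subsystem": "OLLAMA_AUDIT"},
]
chain = build_chain(records)

# Simulate tampering with the first record after the chain was built.
tampered = copy.deepcopy(chain)
tampered[0]["evidence"]["cvss"] = 1.0

print(verify_chain(chain))
print(verify_chain(tampered))
```

Serialising with sorted keys and fixed separators keeps the hash stable across producers, which matters if a second tool recomputes the chain.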
Map and enumerate every inference server vulnerability in a single pass:
foundry scan --target <URL> --deep
SCAN identifies every running inference server, loaded model, and exposed API endpoint before firing a single attack payload.
Every finding maps to a specific CVE or disclosure: for example CVE-2026-5760 (CVSS 9.8), not a generic "misconfiguration". Each finding records the exact exploit path and the exact impact.
Every report cryptographically signed with Ed25519. SHA-256 evidence chains. WARLORD-compatible JSON for autonomous campaign integration.
FOUNDRY findings feed directly into WARLORD autonomous campaigns. Machine-ingestible JSON output with structured CVE and CVSS data.
FOUNDRY maps every finding to a specific CVE or disclosure identifier. Each subsystem targets known, documented vulnerabilities in production inference server software.
| CVE / ID | Description | Subsystem | Impact |
|---|---|---|---|
| CVE-2026-5760 | SGLang GGUF Jinja2 Template Injection RCE | GGUF | Remote code execution on model load |
| OLLAMA-NOAUTH | Ollama API unauthenticated model access (all versions) | OLLAMA_AUDIT | Model theft and registry poisoning |
| VLLM-TIMING-001 | vLLM PagedAttention cross-tenant timing oracle | VLLM_PROBE | Prompt/completion extraction |
| KUBEAI-RBAC-001 | KubeAI RBAC misconfiguration — cluster-admin escalation | PERSIST | Cluster-wide lateral movement |
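The table rows could be carried as machine-readable records along the following lines. This is a sketch under stated assumptions rather than the WARLORD-compatible schema, which this page does not specify: the `Finding` class, its field names, and the `prioritise` and `to_json` helpers are hypothetical. Only CVE-2026-5760 has a CVSS score stated here, so the other entries leave `cvss` unset instead of guessing a value.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    """Hypothetical record mirroring one row of the table above."""
    finding_id: str
    description: str
    subsystem: str
    impact: str
    cvss: Optional[float] = None  # None when no score is stated


FINDINGS = [
    Finding("CVE-2026-5760", "SGLang GGUF Jinja2 Template Injection RCE",
            "GGUF", "Remote code execution on model load", cvss=9.8),
    Finding("OLLAMA-NOAUTH", "Ollama API unauthenticated model access (all versions)",
            "OLLAMA_AUDIT", "Model theft and registry poisoning"),
    Finding("VLLM-TIMING-001", "vLLM PagedAttention cross-tenant timing oracle",
            "VLLM_PROBE", "Prompt/completion extraction"),
    Finding("KUBEAI-RBAC-001", "KubeAI RBAC misconfiguration, cluster-admin escalation",
            "PERSIST", "Cluster-wide lateral movement"),
]


def prioritise(findings):
    """Order by CVSS, highest first; unscored findings follow, sorted by identifier."""
    return sorted(
        findings,
        key=lambda f: (f.cvss is None, -(f.cvss or 0.0), f.finding_id),
    )


def to_json(findings) -> str:
    """Serialise the prioritised findings as a JSON array of objects."""
    return json.dumps([asdict(f) for f in prioritise(findings)], indent=2)


report = to_json(FINDINGS)
print(report)
```

Keeping `cvss` nullable lets a consumer tell "no published score" apart from a low score, which a default of 0.0 would blur.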
FOUNDRY does not wrap existing scanners. Every attack module — GGUF weaponisation, PagedAttention timing analysis, TensorRT engine crafting, speculative decode poisoning — is engineered from scratch against the actual CVEs. Each subsystem implements the real exploit chain, not a misconfiguration checklist.
FOUNDRY is registered in the WARLORD tool registry. Every finding is exported in WARLORD-compatible JSON, enabling autonomous campaign orchestration across the full NIGHTFALL 59-tool fleet. FOUNDRY inference recon feeds directly into subsequent NIGHTFALL tools.
warlord --tool foundry --target http://ai-infra.internal --deep
FOUNDRY ships as part of the NIGHTFALL framework. Native packages for major Linux security distributions, macOS, and Windows. Pre-installed on Red Specter OS.
Red Specter FOUNDRY is intended for authorised security testing only. Unauthorised use against systems you do not own or have explicit permission to test may violate the Computer Misuse Act 1990 (UK), the Computer Fraud and Abuse Act (US), and equivalent legislation in other jurisdictions. Always obtain written authorisation before conducting any security assessment. Licensed under the Apache License 2.0.