NIGHTFALL T79 — TEMPLATE-RCE ENGINE

SPECTER SHELL

Template-Interpolation RCE Engine across the agent-framework ecosystem. Every prompt is a shell prompt.

8 Subsystems

9 Frameworks

8 Primitives

6 Surfaces

72-Cell Matrix

502 Tests


Overview

THE MATRIX IS THE PRODUCT

CVE-2026-26030 and CVE-2026-25592 dropped against Microsoft Semantic Kernel on 7 May 2026: a confirmed template-interpolation RCE in a production agent framework. That is the seed primitive, not the product. The product is a systematic map of every framework's template-substitution layer against eight reusable RCE primitives across six injection surfaces.

Nine framework adapters. Eight primitives. Six surfaces. 72 cells (9 frameworks × 8 primitives). Each cell resolves to exactly one of: RCE confirmed, sandbox blocked, or not applicable. The matrix is the defensive-posture map: KPMG and Gartner buy this, not the one-CVE exploit.

Confirmation is never inferred. Every RCE finding rests on a literal Path.read_bytes() == expected canary check on the host filesystem. Every confirmation is signed with Ed25519 and round-trips into CAMPAIGN GRAPH for cross-tool composition.
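The canary rule can be sketched in a few lines. This is a minimal illustration, assuming the canary is a random byte string the payload is expected to write to a known path; the SHL-CANARY- prefix and both function names are hypothetical, not SHELL's actual code. The point is that confirmation is a byte-for-byte comparison against the file on disk, and a missing file counts as "not confirmed" rather than as an error.

```python
import secrets
import tempfile
from pathlib import Path


def make_canary() -> bytes:
    # Random, unguessable token, so a match cannot happen by coincidence.
    # The "SHL-CANARY-" prefix is a hypothetical label for illustration.
    return b"SHL-CANARY-" + secrets.token_hex(16).encode("ascii")


def canary_confirmed(path: Path, expected: bytes) -> bool:
    # Literal byte-for-byte comparison; nothing is inferred from render output.
    try:
        return path.read_bytes() == expected
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "canary.bin"
        token = make_canary()
        print(canary_confirmed(target, token))  # False: nothing written yet
        target.write_bytes(token)               # stands in for the payload's side effect
        print(canary_confirmed(target, token))  # True
```

Using a fresh random token per cell also means a stale file left by an earlier run can never confirm a new finding.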

Microsoft Semantic Kernel — CVE-2026-26030 & CVE-2026-25592 (7 May 2026). Template-interpolation RCE via Jinja2-style argument substitution. Patched in Semantic Kernel 1.40+. Confirmed seed primitive for SHELL's ast_attribute_traversal family. SHELL generalises the class across the agent-framework ecosystem.

TEMPLATE-RCE

SANDBOX-BYPASS

PICKLE-RCE

CALLBACK-IMPORT

YAML-LOAD

Coverage

72-CELL DEFENSIVE POSTURE MATRIX

Live results from one build-host run (May 2026). RCE cells were confirmed by canary file-on-disk verification. Sandbox cells were rendered and rejected by the framework's sandbox layer. N/A cells are genuinely unreachable: the framework does not expose a surface the primitive can target. The distinction matters: N/A means there was nothing to block, whereas sandbox-blocked means the payload reached the render path and was stopped there.

Framework	RCE	Sandbox blocked	N/A	Verdict
LangChain	0	4	4	Jinja2 SandboxedEnvironment doing its job
LangGraph	1	4	3	Pickle checkpoint store live (CVE-2025-67644 class)
LlamaIndex	0	4	4	Jinja2 sandbox enforced
Haystack	0	4	4	Jinja2 sandbox enforced
DSPy	2	0	6	Python f-strings; tool-registration import live
PydanticAI	2	0	6	Tool-decorator import path live
LiteLLM	3	5	0	YAML unsafe-load + callback registration live
SemanticKernel	0	4	4	CVE-2026-26030 patched; sandbox holds
Strands	2	0	6	Tool-registry import path live
TOTAL	10	25	37	72 cells
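A quick consistency check on the table, as a sketch: each framework row should account for all eight primitives exactly once, and the column sums should reproduce the TOTAL row. The dictionary below is transcribed from the table; check_matrix is a hypothetical helper, not part of SHELL.

```python
# (RCE, sandbox-blocked, N/A) per framework, transcribed from the table above.
MATRIX = {
    "LangChain": (0, 4, 4),
    "LangGraph": (1, 4, 3),
    "LlamaIndex": (0, 4, 4),
    "Haystack": (0, 4, 4),
    "DSPy": (2, 0, 6),
    "PydanticAI": (2, 0, 6),
    "LiteLLM": (3, 5, 0),
    "SemanticKernel": (0, 4, 4),
    "Strands": (2, 0, 6),
}
PRIMITIVES = 8


def check_matrix(matrix):
    # Every row must cover each primitive exactly once.
    for name, row in matrix.items():
        if sum(row) != PRIMITIVES:
            raise ValueError(f"{name}: {sum(row)} cells, expected {PRIMITIVES}")
    # Column totals in the same (RCE, sandbox, N/A) order.
    return tuple(sum(col) for col in zip(*matrix.values()))


if __name__ == "__main__":
    totals = check_matrix(MATRIX)
    print(totals, sum(totals))  # (10, 25, 37) 72
```

9 rows × 8 primitives gives the 72 cells, and the column sums match the TOTAL row (10 / 25 / 37).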

Architecture

8 SUBSYSTEMS

SUBSYSTEM 01

SURVEY

OBSERVE

Framework + version + sandbox fingerprinting across the nine adapters. Detects Jinja2 sandbox mode, pickle store presence, custom template engines, and adapter availability via real package metadata.
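Version fingerprinting from package metadata can be done without importing (and so without executing) any framework code. A minimal sketch, assuming the PyPI distribution names below are the ones each adapter keys on; that mapping is an assumption to verify, and sandbox-mode detection is not covered here.

```python
from importlib import metadata

# Adapter name -> PyPI distribution name (assumed mapping; verify per host).
DISTRIBUTIONS = {
    "langchain": "langchain",
    "langgraph": "langgraph",
    "llamaindex": "llama-index-core",
    "haystack": "haystack-ai",
    "dspy": "dspy",
    "pydantic_ai": "pydantic-ai",
    "litellm": "litellm",
    "semantic_kernel": "semantic-kernel",
    "strands": "strands-agents",
}


def fingerprint(dists=DISTRIBUTIONS):
    # Reads installed-distribution metadata only; no framework module is imported.
    found = {}
    for adapter, dist in dists.items():
        try:
            found[adapter] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            found[adapter] = None  # adapter unavailable on this host
    return found


if __name__ == "__main__":
    for name, version in fingerprint().items():
        print(f"{name:<17}{version or 'not installed'}")
```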

SUBSYSTEM 02

LATTICE

OBSERVE

Surface enumeration: 9 frameworks × 6 surfaces = up to 54 candidate slots. Classifies each as injectable, sandbox-bound, or not-exposed. Audit-only — no payload delivered.
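The LATTICE step amounts to labelling the 9 × 6 grid. A minimal sketch, assuming an earlier audit has already produced the sets of exposed and sandboxed (framework, surface) pairs; how those sets are gathered is not shown, and enumerate_slots and SlotClass are hypothetical names. No payload is built anywhere in this step.

```python
from enum import Enum
from itertools import product

FRAMEWORKS = ["langchain", "langgraph", "llamaindex", "haystack", "dspy",
              "pydantic_ai", "litellm", "semantic_kernel", "strands"]
SURFACES = ["SYSTEM_PROMPT", "TOOL_DESCRIPTOR", "RAG_RETRIEVAL",
            "CONVERSATION_MEMORY", "MCP_TOOL_RESULT", "CONFIG_ENV_INTERPOLATION"]


class SlotClass(Enum):
    INJECTABLE = "injectable"
    SANDBOX_BOUND = "sandbox-bound"
    NOT_EXPOSED = "not-exposed"


def enumerate_slots(exposed, sandboxed):
    """Label every (framework, surface) slot; audit-only, nothing is delivered.

    exposed / sandboxed: sets of (framework, surface) pairs (hypothetical inputs).
    """
    slots = {}
    for slot in product(FRAMEWORKS, SURFACES):
        if slot not in exposed:
            slots[slot] = SlotClass.NOT_EXPOSED
        elif slot in sandboxed:
            slots[slot] = SlotClass.SANDBOX_BOUND
        else:
            slots[slot] = SlotClass.INJECTABLE
    return slots
```

"Up to 54" follows directly: 9 frameworks × 6 surfaces, before any slot is ruled out as not exposed.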

SUBSYSTEM 03

TRAVERSE

FORGE / INJECT

Primitive delivery via the adapter render path. FORGE tier dry-runs (payload emitted, not delivered). INJECT tier delivers live and confirms via canary file-on-disk check. Produces the 72-cell coverage matrix.

SUBSYSTEM 04

SANDBOX

OBSERVE

Runtime container detection: E2B, Modal, Daytona, Docker, lxc, bare-metal. Informs the operator whether confirmed RCE escapes the agent's sandbox or is contained inside it.
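Two common filesystem heuristics illustrate the idea for the Docker and lxc cases. This is a sketch only, assuming a Linux host: Docker creates /.dockerenv inside its containers, and on cgroup v1 hosts /proc/1/cgroup often names the runtime (cgroup v2 usually shows just "0::/", so absence proves nothing). E2B, Modal and Daytona detection is not attempted here; detect_container is a hypothetical name, and the root parameter exists only to make the function testable.

```python
from pathlib import Path


def detect_container(root: Path = Path("/")) -> str:
    # Docker drops an empty /.dockerenv file into every container it starts.
    if (root / ".dockerenv").exists():
        return "docker"
    # On cgroup v1, PID 1's cgroup paths frequently contain the runtime name.
    try:
        cgroup = (root / "proc" / "1" / "cgroup").read_text()
    except OSError:
        cgroup = ""
    for marker in ("docker", "lxc"):
        if marker in cgroup:
            return marker
    # No marker found: bare metal, a VM, or a runtime these checks cannot see.
    return "bare-metal-or-unknown"


if __name__ == "__main__":
    print(detect_container())
```

A negative result is weak evidence, which is why the fallback label keeps "unknown" rather than asserting bare metal.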

SUBSYSTEM 05

STARTUP

INJECT

YAML unsafe-load + .env shell-expansion config injection. Targets the configuration boundary: agents that load operator-provided config files at startup. Real PyYAML unsafe-loader gadget chain.
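On the defensive side of the .env half of this boundary, a loader can read values as literal strings and flag anything that would trigger command substitution if a shell ever saw it. A minimal sketch with a hypothetical parse_env_literal helper; it handles only simple KEY=VALUE lines. For the YAML half, the usual counterpart is yaml.safe_load, which refuses Python-object tags such as !!python/object/apply instead of constructing them.

```python
import re

# "$(" or a backtick would run a command if the value reached a shell.
COMMAND_SUBST = re.compile(r"\$\(|`")


def parse_env_literal(text: str):
    """Parse KEY=VALUE lines with no expansion; return (values, flagged_keys)."""
    values, flagged = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Strip one pair of matching quotes; the content itself stays literal.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
        if COMMAND_SUBST.search(value):
            flagged.append(key)
    return values, flagged
```

Flagged keys are kept as inert strings; the caller decides whether to reject the file outright.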

SUBSYSTEM 06

LITELLM

INJECT

LiteLLM proxy attack path. Callback module-path registration triggers import-time RCE; YAML config plus PyYAML loader gadget chain reaches os.system at proxy boot.
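The import-time risk follows from how Python imports work: importing a module executes its top-level body, so any code path that imports a config-supplied dotted path runs whatever that module contains. A mitigation sketch, assuming a fixed allowlist checked before the import; ALLOWED_CALLBACK_MODULES, CallbackRejected and load_callback_module are hypothetical and are not LiteLLM's API.

```python
import importlib

# Hypothetical allowlist of modules permitted as callback sources.
ALLOWED_CALLBACK_MODULES = frozenset({"json", "logging"})


class CallbackRejected(Exception):
    pass


def load_callback_module(dotted_path: str):
    # The check must run before import_module: once the import starts,
    # the module body has already executed.
    if dotted_path not in ALLOWED_CALLBACK_MODULES:
        raise CallbackRejected(f"callback module not allowlisted: {dotted_path!r}")
    return importlib.import_module(dotted_path)
```

An exact-match allowlist is deliberately stricter than prefix matching, which would admit any submodule planted under an allowed package.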

SUBSYSTEM 07

PERSIST

DESTROY

Post-RCE persistence artefacts: shell rc / cron / systemd-user unit / jupyter kernel.json. Writes only into an operator-review quarantine directory — never installs to a live persistence location. DESTROY-gated.
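The "quarantine only" guarantee depends on path confinement: every artefact path must resolve inside the review directory, including after ".." segments, absolute paths and symlinks are taken into account. A minimal sketch with hypothetical names (quarantine_path, write_artifact, OutsideQuarantine); it requires Python 3.9+ for Path.is_relative_to.

```python
from pathlib import Path


class OutsideQuarantine(Exception):
    pass


def quarantine_path(quarantine: Path, relative: str) -> Path:
    # Resolve first, so "../../.bashrc" or a symlink cannot slip past the check.
    base = quarantine.resolve()
    # Joining an absolute path replaces base entirely; the check below catches it.
    target = (base / relative).resolve()
    if not target.is_relative_to(base):
        raise OutsideQuarantine(f"{relative!r} resolves outside {base}")
    return target


def write_artifact(quarantine: Path, relative: str, content: str) -> Path:
    # Artefacts are written for operator review only, never to live locations.
    target = quarantine_path(quarantine, relative)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return target
```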

SUBSYSTEM 08

EVIDENCE

OPEN

Canonical NIGHTFALL JSON. Ed25519-signed envelope. Auto-commits to CAMPAIGN GRAPH for cross-tool composition. SHL-{hex12} report id. Every confirmed-RCE finding emits a suggested edge into the propagation DAG.
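Signing JSON only works if the signer and verifier hash the same bytes, so the envelope needs a canonical serialization. A minimal sketch, assuming sorted-key, compact JSON as the convention (simpler than a full scheme such as RFC 8785) and a report id built from 6 random bytes to match the SHL-{hex12} shape; all names are hypothetical. Signing the resulting bytes with Ed25519 is a separate step.

```python
import hashlib
import json
import secrets


def new_report_id() -> str:
    # 6 random bytes -> 12 uppercase hex digits, matching the SHL-{hex12} shape.
    return "SHL-" + secrets.token_hex(6).upper()


def canonical_bytes(envelope: dict) -> bytes:
    # Sorted keys plus fixed separators: the same logical document always
    # serializes to the same bytes, whatever order its keys were built in.
    return json.dumps(envelope, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def digest(envelope: dict) -> str:
    return hashlib.sha256(canonical_bytes(envelope)).hexdigest()
```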

Primitives

8 RCE PRIMITIVES

Each primitive is tested live against a real framework or runtime. Canary verification is byte-for-byte on the host filesystem.

AST ATTR TRAVERSAL

__mro__ walk to os.system
CVE-2026-26030 seed

JINJA2 SANDBOX BYPASS

lipsum / cycler globals
pre-3.1 / loosened policy

PICKLE DESERIALIZATION

__reduce__ side-effect
CVE-2026-44843 / CVE-2025-67644

GETATTR CAPABILITY LEAK

attr() filter walk
__class__ blocklist bypass

ASYNC CONTEXT ESCAPE

coroutine render path
group-chat code

CALLBACK HOOK INJECT

module-path import RCE
BaseCallbackHandler / pre_call

TOOL ANNOTATION EXEC

__class_getitem__ poison
tool-registration introspect

STARTUP CONFIG INTERP

!!python/object/apply
$(cmd) shell expansion

Adapters

9 FRAMEWORK ADAPTERS

Each adapter routes primitive payloads through the framework's real public API. Frameworks not installed on the host raise FrameworkNotInstalled and tests skip cleanly — never simulated.
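The skip-cleanly behaviour can be sketched with the standard library alone. FrameworkNotInstalled is the exception named above; require_framework and AdapterTestCase are hypothetical. For a top-level name, importlib.util.find_spec locates the package without importing it, so an absent framework is detected without running any of its code.

```python
import importlib.util
import unittest


class FrameworkNotInstalled(Exception):
    """Raised when an adapter's framework package is absent from the host."""


def require_framework(module_name: str) -> None:
    # Top-level names only: for dotted names find_spec imports the parent package.
    if importlib.util.find_spec(module_name) is None:
        raise FrameworkNotInstalled(module_name)


class AdapterTestCase(unittest.TestCase):
    framework = "langchain"  # hypothetical per-adapter setting

    def setUp(self):
        # Convert the adapter's exception into a skip, never a simulated pass.
        try:
            require_framework(self.framework)
        except FrameworkNotInstalled as exc:
            self.skipTest(f"framework not installed: {exc}")
```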

CORE FRAMEWORKS

LANGCHAIN

PromptTemplate
Jinja2 SandboxedEnv
5 surfaces

LANGGRAPH

checkpoint store
Pickle surface live
CVE-2025-67644 class

LLAMAINDEX

RichPromptTemplate
Jinja2 sandboxed
4 surfaces

HAYSTACK

PromptBuilder
Jinja2 sandboxed
4 surfaces

SEMANTIC KERNEL

Jinja2PromptTemplate
CVE-2026-26030 target
Patched in 1.40+

AGENT / TOOL FRAMEWORKS

DSPY

Python f-strings
tool descriptor import
2 RCE cells live

PYDANTIC AI

@agent.tool decorator
import-time RCE
2 RCE cells live

LITELLM

callback path + YAML
proxy gadget chain
3 RCE cells live

STRANDS

tool registry import
module body executes
2 RCE cells live

Bedrock Agents and Vertex Agent Builder are v1.1 candidates — they need AWS / GCP credentials this host's test environment does not carry. Adding a 10th adapter is roughly 200 LOC plus 5 tests.

Surfaces

6 INJECTION SURFACES

SYSTEM_PROMPT

TOOL_DESCRIPTOR

RAG_RETRIEVAL

CONVERSATION_MEMORY

MCP_TOOL_RESULT

CONFIG_ENV_INTERPOLATION

Each cell is one (framework × primitive) pair, which is where 9 × 8 = 72 comes from; the six surfaces are the injection routes a primitive can take into a framework. SHELL's coverage classifier distinguishes RCE from sandbox_blocked from not_applicable. The last is a positive defensive-posture signal: the framework does not expose a surface for the primitive to land on, so no patch is required and no mitigation is owed.
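The three-way rule can be written as a small decision function. A sketch under stated assumptions: the three boolean inputs stand for observations gathered elsewhere (surface exposure, sandbox rejection, on-disk canary match), and Cell and classify are hypothetical names. The source defines only three outcomes, so for an exposed, unrejected cell with no canary this sketch raises instead of guessing.

```python
from enum import Enum


class Cell(Enum):
    RCE = "rce"
    SANDBOX_BLOCKED = "sandbox_blocked"
    NOT_APPLICABLE = "not_applicable"


def classify(surface_exposed: bool, render_rejected: bool, canary_matched: bool) -> Cell:
    # Nothing for the primitive to land on: a positive posture signal.
    if not surface_exposed:
        return Cell.NOT_APPLICABLE
    # RCE only on literal on-disk proof, never inferred from render output.
    if canary_matched:
        return Cell.RCE
    # Payload reached the render path and the sandbox layer refused it.
    if render_rejected:
        return Cell.SANDBOX_BLOCKED
    # No rejection and no canary: this sketch refuses to pick either label.
    raise ValueError("inconclusive cell: rerun delivery before classifying")
```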

Usage

SPECTER-SHELL CLI

# Generate operator keys (Ed25519, PKCS8 PEM)
$ specter-shell keygen --out ./keys
keypair written to ./keys/specter_shell_priv.pem (mode 0600)

# SURVEY — framework + version + sandbox fingerprint (audit-only)
$ specter-shell survey
┌─ SURVEY FINGERPRINT ─────────────────────────────────┐
  langchain        0.3.27   sandboxed
  langgraph        1.0.4    pickle-checkpoint OPEN
  llamaindex       0.14.6   sandboxed
  haystack         2.18.0   sandboxed
  dspy             3.0.4    f-string
  pydantic_ai      1.6.2    decorator-import
  litellm          1.79.0   callback + YAML
  semantic_kernel  1.40.1   patched
  strands          2.0.13   tool-registry-import
└──────────────────────────────────────────────────────┘

# TRAVERSE — dry-run the 72-cell matrix (FORGE)
$ specter-shell --clearance FORGE traverse -o traverse.json
UNLEASHED FORGE clearance — dry-run only, no payload delivered
cells: 72 emitted: 35 sandbox: 25 n/a: 37

# INJECT tier: live delivery + canary verification
$ SPECTER_SHELL_PRIVATE_KEY=./keys/specter_shell_priv.pem \
specter-shell --clearance INJECT run --target prod-host -o report.json
UNLEASHED INJECT clearance granted (Ed25519 verified)
┌──────────────┬─────┬──────────┬─────┐
│ Framework    │ RCE │ Sandbox  │ N/A │
├──────────────┼─────┼──────────┼─────┤
│ langchain    │   0 │        4 │   4 │
│ langgraph    │   1 │        4 │   3 │
│ litellm      │   3 │        5 │   0 │
│ dspy         │   2 │        0 │   6 │
│ pydantic_ai  │   2 │        0 │   6 │
│ strands      │   2 │        0 │   6 │
└──────────────┴─────┴──────────┴─────┘
Report: SHL-9F87143A8B12 — Ed25519 signed
Canaries verified on disk: 10/10

# Ingest into CAMPAIGN GRAPH
$ campaign-graph --db campaign.db --clearance FORGE ingest report.json
node added: SHL-9F87143A8B12 edges_pending: 10 (awaiting T80 WORM)

Attack Flow

TEMPLATE KILL CHAIN

SURVEY fingerprint → LATTICE enumerate → TRAVERSE deliver → canary verify → SANDBOX classify → STARTUP / LITELLM → PERSIST quarantine → EVIDENCE sign → CAMPAIGN GRAPH ingest

Authorization

UNLEASHED GATE — FOUR TIERS


OBSERVE: SURVEY, LATTICE, SANDBOX, EVIDENCE read-only. No payload emitted, no key required.

FORGE: TRAVERSE dry-run. Payload bytes computed and recorded, not delivered. Requires an Ed25519 operator key.

INJECT: TRAVERSE live, STARTUP, LITELLM. Payload reaches the framework's render path; canary verification on disk. Requires Ed25519 operator key plus a signed override token over the engagement scope artefact.

DESTROY: PERSIST. Writes shell rc / cron / systemd-user / jupyter kernel.json artefacts into an operator-review quarantine directory — never installed to live persistence locations. Requires the Ed25519 key, override signature, and an explicit confirmation flag.
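The four tiers are cumulative: each requires everything the tier below it requires, plus one more item. A minimal sketch of that gate, assuming the three booleans represent checks already performed elsewhere (the actual Ed25519 and override-signature verification is not shown); Clearance, Credentials and missing_requirements are hypothetical names.

```python
from dataclasses import dataclass
from enum import IntEnum


class Clearance(IntEnum):
    OBSERVE = 0
    FORGE = 1
    INJECT = 2
    DESTROY = 3


@dataclass(frozen=True)
class Credentials:
    operator_key: bool = False        # Ed25519 operator key loaded and valid
    override_signature: bool = False  # signed override over the engagement scope
    confirmed: bool = False           # explicit confirmation flag passed


def missing_requirements(tier: Clearance, creds: Credentials) -> list[str]:
    # IntEnum ordering makes the tiers cumulative: a higher tier inherits
    # every requirement of the tiers below it.
    missing = []
    if tier >= Clearance.FORGE and not creds.operator_key:
        missing.append("Ed25519 operator key")
    if tier >= Clearance.INJECT and not creds.override_signature:
        missing.append("signed override token")
    if tier >= Clearance.DESTROY and not creds.confirmed:
        missing.append("explicit confirmation flag")
    return missing
```

An empty list means the tier is granted; returning the full list of gaps, rather than failing on the first, gives the operator one complete error message.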

Generate a keypair: specter-shell keygen --out ./keys
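For readers who want to see what an Ed25519 PKCS8 PEM keypair with mode 0600 involves, here is a sketch using the pyca/cryptography package. It illustrates the format the keygen output describes; it is not specter-shell's implementation, and keygen and sign_and_verify are hypothetical names. The file is created with 0600 at open time (the umask can only narrow it), and O_EXCL refuses to overwrite an existing key.

```python
import os
from pathlib import Path

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey


def keygen(out_dir: Path, name: str = "specter_shell_priv.pem") -> Path:
    key = Ed25519PrivateKey.generate()
    pem = key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.NoEncryption(),
    )
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / name
    # Create with 0600 up front so the key is never briefly world-readable;
    # O_EXCL fails instead of silently replacing an existing key.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(pem)
    return path


def sign_and_verify(path: Path, message: bytes) -> bool:
    key = serialization.load_pem_private_key(path.read_bytes(), password=None)
    signature = key.sign(message)
    # verify() returns None on success and raises InvalidSignature on mismatch.
    key.public_key().verify(signature, message)
    return True
```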

Intelligence

MITRE ATLAS / OWASP LLM MAPPING

AML.T0051

LLM Prompt Injection — TRAVERSE surface delivery

AML.T0050

Command and Scripting Interpreter — os.system reach

AML.T0010

ML Supply Chain Compromise — callback module path

AML.T0018

Manipulate ML Model — pickle checkpoint store

AML.T0048

External Harms — startup config interpolation

AML.T0053

LLM Plugin Compromise — tool descriptor RCE

OWASP LLM Top 10 (v1.1): LLM01 (Prompt Injection) · LLM02 (Insecure Output Handling) · LLM05 (Supply Chain Vulnerabilities, callback import) · LLM07 (Insecure Plugin Design, tool descriptor) · LLM08 (Excessive Agency)