NIGHTFALL T79 — TEMPLATE-RCE ENGINE

SPECTER SHELL

Template-Interpolation RCE Engine across the agent-framework ecosystem. Every prompt is a shell prompt.

8 Subsystems
9 Frameworks
8 Primitives
6 Surfaces
72-Cell Matrix
502 Tests

THE MATRIX IS THE PRODUCT
CVE-2026-26030 and CVE-2026-25592 were published against Microsoft Semantic Kernel on 7 May 2026: a confirmed template-interpolation RCE in a production agent framework. That is the seed primitive, not the product. The product is the systematic map of every framework's template-substitution layer against eight reusable RCE primitives across six injection surfaces.

Nine framework adapters. Eight primitives. Six surfaces. Nine adapters by eight primitives gives 72 cells. Each cell is one of RCE confirmed, sandbox blocked, or not applicable. The matrix is the defensive-posture map: KPMG and Gartner buy this, not the one-CVE exploit.

Confirmation is never inferred. Every RCE finding rests on a literal Path.read_bytes() == expected canary check on the host filesystem. Every confirmation is signed with Ed25519 and round-trips into CAMPAIGN GRAPH for cross-tool composition.
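The canary check described above can be sketched as follows. This is a minimal illustration that assumes the canary is a random byte string written to a known path; the helper names make_canary and canary_confirmed are hypothetical and are not taken from SHELL. A plain file write stands in for whatever effect is being verified.

```python
import secrets
import tempfile
from pathlib import Path


def make_canary(directory: Path) -> tuple[Path, bytes]:
    """Pick a fresh path and the random bytes expected to appear there (hypothetical helper)."""
    expected = secrets.token_bytes(32)
    path = directory / f"canary-{secrets.token_hex(8)}.bin"
    return path, expected


def canary_confirmed(path: Path, expected: bytes) -> bool:
    """True only if the file exists and its bytes match exactly."""
    try:
        return path.read_bytes() == expected
    except OSError:
        # Missing or unreadable file: nothing was confirmed.
        return False


with tempfile.TemporaryDirectory() as tmp:
    path, expected = make_canary(Path(tmp))
    before = canary_confirmed(path, expected)   # nothing written yet
    path.write_bytes(expected)                  # stand-in for the effect being verified
    after = canary_confirmed(path, expected)
    path.write_bytes(expected + b"\n")          # near miss: one extra byte
    near_miss = canary_confirmed(path, expected)
```

The near-miss case is the point of a byte-for-byte comparison: a file that merely exists, or that differs by a trailing newline, does not count as confirmation.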
Microsoft Semantic Kernel — CVE-2026-26030 & CVE-2026-25592 (7 May 2026). Template-interpolation RCE via Jinja2-style argument substitution. Patched in Semantic Kernel 1.40+. Confirmed seed primitive for SHELL's ast_attribute_traversal family. SHELL generalises the class across the agent-framework ecosystem.
TEMPLATE-RCE
SANDBOX-BYPASS
PICKLE-RCE
CALLBACK-IMPORT
YAML-LOAD

72-CELL DEFENSIVE POSTURE MATRIX
Live results from one build-host run (May 2026). RCE cells were confirmed by canary file-on-disk verification. Sandbox cells were rendered and rejected by the framework's sandbox layer. N/A cells are genuinely unreachable: the framework does not expose a surface the primitive can target. The distinction matters: a sandbox-blocked cell is reachable and holds only while the sandbox stays in place, whereas an N/A cell has nothing to reach.
Framework | RCE | Sandbox blocked | N/A | Verdict
LangChain | 0 | 4 | 4 | Jinja2 SandboxedEnvironment doing its job
LangGraph | 1 | 4 | 3 | Pickle checkpoint store live (CVE-2025-67644 class)
LlamaIndex | 0 | 4 | 4 | Jinja2 sandbox enforced
Haystack | 0 | 4 | 4 | Jinja2 sandbox enforced
DSPy | 2 | 0 | 6 | Python f-strings; tool-registration import live
PydanticAI | 2 | 0 | 6 | Tool-decorator import path live
LiteLLM | 3 | 5 | 0 | YAML unsafe-load + callback registration live
SemanticKernel | 0 | 4 | 4 | CVE-2026-26030 patched; sandbox holds
Strands | 2 | 0 | 6 | Tool-registry import path live
TOTAL | 10 | 25 | 37 | 72 cells
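As a cross-check on the arithmetic, the per-framework counts in the table can be tallied directly. The sketch below copies the numbers from the table; the dictionary layout and function name are illustrative only and do not reflect SHELL's report schema.

```python
# (RCE, sandbox blocked, N/A) per framework, copied from the table above.
MATRIX = {
    "LangChain": (0, 4, 4),
    "LangGraph": (1, 4, 3),
    "LlamaIndex": (0, 4, 4),
    "Haystack": (0, 4, 4),
    "DSPy": (2, 0, 6),
    "PydanticAI": (2, 0, 6),
    "LiteLLM": (3, 5, 0),
    "SemanticKernel": (0, 4, 4),
    "Strands": (2, 0, 6),
}
PRIMITIVES_PER_FRAMEWORK = 8


def column_totals(matrix):
    """Sum each outcome column across all frameworks."""
    rce = sum(row[0] for row in matrix.values())
    blocked = sum(row[1] for row in matrix.values())
    not_applicable = sum(row[2] for row in matrix.values())
    return rce, blocked, not_applicable


# Every framework row must account for exactly one cell per primitive.
rows_ok = all(sum(row) == PRIMITIVES_PER_FRAMEWORK for row in MATRIX.values())
rce, blocked, not_applicable = column_totals(MATRIX)
print(rce, blocked, not_applicable, rce + blocked + not_applicable)  # 10 25 37 72
```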

8 SUBSYSTEMS
SUBSYSTEM 01
SURVEY
OBSERVE
Framework + version + sandbox fingerprinting across the nine adapters. Detects Jinja2 sandbox mode, pickle store presence, custom template engines, and adapter availability via real package metadata.
SUBSYSTEM 02
LATTICE
OBSERVE
Surface enumeration: 9 frameworks × 6 surfaces = up to 54 candidate slots. Classifies each as injectable, sandbox-bound, or not-exposed. Audit-only — no payload delivered.
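The slot enumeration amounts to a Cartesian product. The sketch below uses the framework identifiers from the survey output and the six surface names listed in this document; the Slot dataclass and the "unclassified" placeholder status are assumptions for illustration, since the classification logic itself is not described here.

```python
from dataclasses import dataclass
from itertools import product

FRAMEWORKS = [
    "langchain", "langgraph", "llamaindex", "haystack", "dspy",
    "pydantic_ai", "litellm", "semantic_kernel", "strands",
]
SURFACES = [
    "SYSTEM_PROMPT", "TOOL_DESCRIPTOR", "RAG_RETRIEVAL",
    "CONVERSATION_MEMORY", "MCP_TOOL_RESULT", "CONFIG_ENV_INTERPOLATION",
]
# Statuses named in the text; "unclassified" is a placeholder added for this sketch.
STATUSES = {"injectable", "sandbox-bound", "not-exposed", "unclassified"}


@dataclass(frozen=True)
class Slot:
    """One candidate (framework, surface) pair awaiting classification."""
    framework: str
    surface: str
    status: str = "unclassified"


slots = [Slot(framework, surface) for framework, surface in product(FRAMEWORKS, SURFACES)]
print(len(slots))  # 54
```

The count is an upper bound ("up to 54") because a framework that does not expose a given surface would be classified not-exposed rather than tested.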
SUBSYSTEM 03
TRAVERSE
FORGE / INJECT
Primitive delivery via the adapter render path. FORGE tier dry-runs (payload emitted, not delivered). INJECT tier delivers live and confirms via canary file-on-disk check. Produces the 72-cell coverage matrix.
SUBSYSTEM 04
SANDBOX
OBSERVE
Runtime container detection: E2B, Modal, Daytona, Docker, lxc, bare-metal. Informs the operator whether confirmed RCE escapes the agent's sandbox or is contained inside it.
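A best-effort sketch of the runtime check, covering only Docker and lxc through two commonly used markers (the /.dockerenv file and a container=lxc entry in PID 1's environment). How SHELL detects E2B, Modal, or Daytona is not stated here, so those are deliberately left out. The function name and the injectable root parameter are assumptions made so the logic can be exercised without a real container.

```python
from pathlib import Path


def detect_runtime(root: Path = Path("/")) -> str:
    """Return 'docker', 'lxc', or 'unknown' based on well-known filesystem markers."""
    # Docker creates this marker file at the container root.
    if (root / ".dockerenv").exists():
        return "docker"

    # lxc and systemd-style containers usually export container=<type> to PID 1.
    try:
        environ = (root / "proc/1/environ").read_bytes()
    except OSError:
        environ = b""
    if b"container=lxc" in environ.split(b"\0"):
        return "lxc"

    # Absence of markers does not prove bare metal, so report 'unknown'.
    return "unknown"
```

Returning "unknown" instead of "bare-metal" is a deliberate choice in this sketch: markers can be absent or hidden inside a container, and reading /proc/1/environ typically needs elevated privileges.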
SUBSYSTEM 05
STARTUP
INJECT
YAML unsafe-load + .env shell-expansion config injection. Targets the configuration boundary — agents that load operator-provided config files at startup. Real PyYAML default-loader gadget chain.
SUBSYSTEM 06
LITELLM
INJECT
LiteLLM proxy attack path. Callback module-path registration triggers import-time RCE; YAML config plus PyYAML loader gadget chain reaches os.system at proxy boot.
SUBSYSTEM 07
PERSIST
DESTROY
Post-RCE persistence artefacts: shell rc / cron / systemd-user unit / jupyter kernel.json. Writes only into an operator-review quarantine directory — never installs to a live persistence location. DESTROY-gated.
SUBSYSTEM 08
EVIDENCE
OPEN
Canonical NIGHTFALL JSON. Ed25519-signed envelope. Auto-commits to CAMPAIGN GRAPH for cross-tool composition. SHL-{hex12} report id. Every confirmed-RCE finding emits a suggested edge into the propagation DAG.
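A signature is only meaningful over a stable byte representation, so a canonical encoding comes first. The sketch below shows one common way to canonicalise JSON with the standard library and to produce an identifier in the SHL-{hex12} shape. Deriving the id from a SHA-256 digest is an assumption made for this example; the document does not say how SHELL generates it, and the Ed25519 signing step is omitted because it needs a third-party library.

```python
import hashlib
import json


def canonical_bytes(report: dict) -> bytes:
    """Serialise with sorted keys and no insignificant whitespace."""
    return json.dumps(
        report, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    ).encode("utf-8")


def report_id(report: dict) -> str:
    """Hypothetical id scheme: first 12 hex digits of the SHA-256 of the canonical bytes."""
    digest = hashlib.sha256(canonical_bytes(report)).hexdigest()
    return f"SHL-{digest[:12].upper()}"


# Key order must not change the canonical form or the id.
a = {"framework": "litellm", "outcome": "sandbox_blocked"}
b = {"outcome": "sandbox_blocked", "framework": "litellm"}
same_bytes = canonical_bytes(a) == canonical_bytes(b)
same_id = report_id(a) == report_id(b)
```

In a signed envelope, the signature would be computed over the output of canonical_bytes, so that a verifier re-serialising the same report obtains identical input.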

8 RCE PRIMITIVES
Each primitive is tested live against a real framework or runtime. Canary verification is byte-for-byte on the host filesystem.
AST ATTR TRAVERSAL
__mro__ walk to os.system
CVE-2026-26030 seed
JINJA2 SANDBOX BYPASS
lipsum / cycler globals
pre-3.1 / loosened policy
PICKLE DESERIALIZATION
__reduce__ side-effect
CVE-2026-44843 / CVE-2025-67644
GETATTR CAPABILITY LEAK
attr() filter walk
__class__ blocklist bypass
ASYNC CONTEXT ESCAPE
coroutine render path
group-chat code
CALLBACK HOOK INJECT
module-path import RCE
BaseCallbackHandler / pre_call
TOOL ANNOTATION EXEC
__class_getitem__ poison
tool-registration introspect
STARTUP CONFIG INTERP
!!python/object/apply
$(cmd) shell expansion

9 FRAMEWORK ADAPTERS
Each adapter routes primitive payloads through the framework's real public API. Frameworks not installed on the host raise FrameworkNotInstalled and tests skip cleanly — never simulated.
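The skip-when-absent behaviour can be sketched with importlib.metadata, which reads real installed-package metadata. The FrameworkNotInstalled name comes from the text above; the require_framework and available helpers, and the use of distribution names as lookup keys, are assumptions for illustration.

```python
from importlib import metadata


class FrameworkNotInstalled(Exception):
    """Raised when a framework's distribution is not present on the host."""


def require_framework(dist_name: str) -> str:
    """Return the installed version, or raise FrameworkNotInstalled."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError as exc:
        raise FrameworkNotInstalled(dist_name) from exc


def available(dist_names):
    """Map each installed distribution to its version, skipping absent ones."""
    found = {}
    for name in dist_names:
        try:
            found[name] = require_framework(name)
        except FrameworkNotInstalled:
            continue  # absent frameworks are skipped, never simulated
    return found
```

In a pytest suite, the same exception would typically be caught in a fixture and turned into pytest.skip(...), so a missing framework shows up as a skipped test, not a failure or a faked pass.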
CORE FRAMEWORKS
LANGCHAIN
PromptTemplate
Jinja2 SandboxedEnv
5 surfaces
LANGGRAPH
checkpoint store
Pickle surface live
CVE-2025-67644 class
LLAMAINDEX
RichPromptTemplate
Jinja2 sandboxed
4 surfaces
HAYSTACK
PromptBuilder
Jinja2 sandboxed
4 surfaces
SEMANTIC KERNEL
Jinja2PromptTemplate
CVE-2026-26030 target
Patched in 1.40+
AGENT / TOOL FRAMEWORKS
DSPY
Python f-strings
tool descriptor import
2 RCE cells live
PYDANTIC AI
@agent.tool decorator
import-time RCE
2 RCE cells live
LITELLM
callback path + YAML
proxy gadget chain
3 RCE cells live
STRANDS
tool registry import
module body executes
2 RCE cells live
Bedrock Agents and Vertex Agent Builder are v1.1 candidates — they need AWS / GCP credentials this host's test environment does not carry. Adding a 10th adapter is roughly 200 LOC plus 5 tests.

6 INJECTION SURFACES
SYSTEM_PROMPT
TOOL_DESCRIPTOR
RAG_RETRIEVAL
CONVERSATION_MEMORY
MCP_TOOL_RESULT
CONFIG_ENV_INTERPOLATION
Every (framework × primitive) pair is one cell, exercised through the surfaces that framework exposes. SHELL's coverage classifier distinguishes RCE from sandbox_blocked from not_applicable. The last of these is a positive defensive-posture signal: the framework does not expose the surface for the primitive to land on, so no patch is required and no mitigation is owed.

SPECTER-SHELL CLI
# Generate operator keys (Ed25519, PKCS8 PEM)
$ specter-shell keygen --out ./keys
keypair written to ./keys/specter_shell_priv.pem (mode 0600)

# SURVEY — framework + version + sandbox fingerprint (audit-only)
$ specter-shell survey
┌─ SURVEY FINGERPRINT ─────────────────────────────────┐
langchain 0.3.27 sandboxed
langgraph 1.0.4 pickle-checkpoint OPEN
llamaindex 0.14.6 sandboxed
haystack 2.18.0 sandboxed
dspy 3.0.4 f-string
pydantic_ai 1.6.2 decorator-import
litellm 1.79.0 callback + YAML
semantic_kernel 1.40.1 patched
strands 2.0.13 tool-registry-import
└──────────────────────────────────────────────────────┘

# TRAVERSE — dry-run the 72-cell matrix (FORGE)
$ specter-shell --clearance FORGE traverse -o traverse.json
UNLEASHED FORGE clearance — dry-run only, no payload delivered
cells: 72 emitted: 35 sandbox: 25 n/a: 37

# INJECT tier: live delivery + canary verification
$ SPECTER_SHELL_PRIVATE_KEY=./keys/specter_shell_priv.pem \
specter-shell --clearance INJECT run --target prod-host -o report.json
UNLEASHED INJECT clearance granted (Ed25519 verified)
┌──────────────┬─────┬──────────┬─────┐
│ Framework │ RCE │ Sandbox │ N/A │
├──────────────┼─────┼──────────┼─────┤
│ langchain │ 0 │ 4 │ 4 │
│ langgraph │ 1 │ 4 │ 3 │
│ litellm │ 3 │ 5 │ 0 │
│ dspy │ 2 │ 0 │ 6 │
│ pydantic_ai │ 2 │ 0 │ 6 │
│ strands │ 2 │ 0 │ 6 │
└──────────────┴─────┴──────────┴─────┘
Report: SHL-9F87143A8B12 — Ed25519 signed
Canaries verified on disk: 10/10

# Ingest into CAMPAIGN GRAPH
$ campaign-graph --db campaign.db --clearance FORGE ingest report.json
node added: SHL-9F87143A8B12 edges_pending: 10 (awaiting T80 WORM)

TEMPLATE KILL CHAIN
SURVEY fingerprint
LATTICE enumerate
TRAVERSE deliver
canary verify
SANDBOX classify
STARTUP / LITELLM
PERSIST quarantine
EVIDENCE sign
CAMPAIGN GRAPH ingest

UNLEASHED GATE — FOUR TIERS

OBSERVE: SURVEY, LATTICE, SANDBOX, EVIDENCE read-only. No payload emitted, no key required.

FORGE: TRAVERSE dry-run. Payload bytes computed and recorded, not delivered. Requires Ed25519 operator key on PATH.

INJECT: TRAVERSE live, STARTUP, LITELLM. Payload reaches the framework's render path; canary verification on disk. Requires Ed25519 operator key plus a signed override token over the engagement scope artefact.

DESTROY: PERSIST. Writes shell rc / cron / systemd-user / jupyter kernel.json artefacts into an operator-review quarantine directory — never installed to live persistence locations. Requires the Ed25519 key, override signature, and an explicit confirmation flag.
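The four tiers can be modelled as an ordered enumeration with a required tier per action. The mapping below is taken from the tier descriptions above. Treating the tiers as cumulative (a higher clearance also permits lower-tier actions) is an assumption of this sketch, as are the action names and the permitted helper; key and override-token verification is not modelled.

```python
from enum import IntEnum


class Clearance(IntEnum):
    OBSERVE = 0
    FORGE = 1
    INJECT = 2
    DESTROY = 3


# Required tier per action, following the tier descriptions in the text.
REQUIRED = {
    "SURVEY": Clearance.OBSERVE,
    "LATTICE": Clearance.OBSERVE,
    "SANDBOX": Clearance.OBSERVE,
    "EVIDENCE": Clearance.OBSERVE,
    "TRAVERSE_DRY_RUN": Clearance.FORGE,
    "TRAVERSE_LIVE": Clearance.INJECT,
    "STARTUP": Clearance.INJECT,
    "LITELLM": Clearance.INJECT,
    "PERSIST": Clearance.DESTROY,
}


def permitted(action: str, granted: Clearance) -> bool:
    """True if the granted clearance meets the action's required tier (assumes cumulative tiers)."""
    return granted >= REQUIRED[action]


print(permitted("SURVEY", Clearance.OBSERVE))        # True
print(permitted("TRAVERSE_LIVE", Clearance.FORGE))   # False
print(permitted("PERSIST", Clearance.DESTROY))       # True
```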

Generate a keypair: specter-shell keygen --out ./keys


MITRE ATLAS / OWASP LLM MAPPING
AML.T0051
LLM Prompt Injection — TRAVERSE surface delivery
AML.T0050
Command and Scripting Interpreter — os.system reach
AML.T0010
ML Supply Chain Compromise — callback module path
AML.T0018
Manipulate ML Model — pickle checkpoint store
AML.T0048
External Harms — startup config interpolation
AML.T0053
LLM Plugin Compromise — tool descriptor RCE
OWASP LLM: LLM01 (Prompt Injection) · LLM02 (Insecure Output Handling) · LLM05 (Supply Chain Vulnerabilities — callback import) · LLM07 (Insecure Plugin Design — tool descriptor) · LLM08 (Excessive Agency)