Red Specter SPECTER INSTINCTION — AI Agent Behavioural Fingerprinting Engine

You can't defend against an adversary who knows your model's weaknesses / DISTINCT identifies the underlying LLM without API access / Refusal thresholds vary by model: calibrated attacks bypass them / Authority sensitivity varies: some models defer unconditionally to claimed admin roles / Behavioural drift indicates safety tuning activation / No tooling exists to profile AI agent behavioural DNA at runtime / World first: SPECTER INSTINCTION

The Problem

You Don't Know Your Agent's Instincts

Every AI model has characteristic behavioural instincts — thresholds, sensitivities, reasoning patterns — that an adversary can exploit with calibrated precision. No tooling existed to profile and measure these instincts at runtime. Until SPECTER INSTINCTION.

You Don't Know Your Agent's Weaknesses

Every AI model has characteristic behavioural instincts — thresholds, sensitivities, reasoning patterns — that an adversary can exploit. You don't know your agent's instinct profile because no tooling existed to measure it. Until now.

Refusal Thresholds Are Model-Specific

GPT-4o refuses at a different point than Claude 3.5 Sonnet. Mistral refuses at a different point than Llama 3. An attacker who knows your model's exact refusal threshold sends calibrated payloads that arrive precisely at the cliff edge. Meanwhile, your own testing sends the same payload to every model.

Authority Sensitivity Varies Dramatically

Some models defer unconditionally to claimed system authority. Claim admin role, tool authorisation, or operator override — and the model complies. SPECTER INSTINCTION measures exactly how sensitive your agent is to authority claims, then generates calibrated authority spoofs.

LLM Identity Is Detectable Without API Access

DISTINCT sends 7 discriminating probes. Extracts a 9-dimension feature vector: opener_score, hedging_density, list_rate, refusal_directness, markdown_ratio, qualifier_frequency, empathy_score, safety_trigger_rate, reasoning_depth. Cosine similarity against a 20-model library identifies the underlying model without API headers or self-reported names.
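The classification step described above can be sketched in a few lines of Python. This is a minimal illustration of cosine-similarity matching, not the DISTINCT implementation: the feature order, the `identify` helper, and the two-entry `LIBRARY` with its values are all assumptions made up for the example.

```python
import math

# Feature names come from the text above; their order here is an assumption.
FEATURES = [
    "opener_score", "hedging_density", "list_rate", "refusal_directness",
    "markdown_ratio", "qualifier_frequency", "empathy_score",
    "safety_trigger_rate", "reasoning_depth",
]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as no match
    return dot / (norm_a * norm_b)


def identify(observed, library):
    """Return (label, similarity) for the closest reference vector."""
    return max(
        ((label, cosine_similarity(observed, ref)) for label, ref in library.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical reference library with invented values, for illustration only.
LIBRARY = {
    "model_a": [0.9, 0.2, 0.7, 0.3, 0.8, 0.4, 0.6, 0.2, 0.5],
    "model_b": [0.1, 0.8, 0.2, 0.9, 0.1, 0.7, 0.3, 0.8, 0.6],
}

observed = [0.85, 0.25, 0.65, 0.35, 0.75, 0.45, 0.55, 0.25, 0.5]
label, score = identify(observed, LIBRARY)
print(label, round(score, 3))
```

Cosine similarity compares the direction of two vectors rather than their magnitude, so a target that scores uniformly higher or lower on every feature can still match its reference entry.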

Behavioural Drift Means Safety Tuning Is Active

When an agent's refusal rate tightens mid-engagement, the safety stack is activating. Without CALIBRATE, you don't know. You keep pushing with payloads that won't work. CALIBRATE detects the drift via exponential moving average and recommends tactical pivots before you burn the engagement.
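Drift detection with an exponential moving average can be sketched as follows. The smoothing factor `alpha`, the `margin` that triggers a pivot, and the function names are illustrative assumptions; the source does not state the values CALIBRATE actually uses.

```python
def ema_update(previous, observation, alpha=0.3):
    """Blend a new observation into the running average."""
    return alpha * observation + (1.0 - alpha) * previous


def detect_tightening(refusals, baseline, alpha=0.3, margin=0.2):
    """Return the first turn at which the smoothed refusal rate exceeds
    the baseline by more than `margin`, or None if it never does.

    `refusals` is a sequence of 0/1 flags, one per turn (1 = refused).
    """
    ema = baseline
    for turn, refused in enumerate(refusals):
        ema = ema_update(ema, float(refused), alpha)
        if ema - baseline > margin:
            return turn
    return None


# Baseline refusal rate 0.3, then a run of refusals starting at turn 2.
print(detect_tightening([0, 0, 1, 1, 1, 1], baseline=0.3))
```

A larger `alpha` reacts faster to a run of refusals but is noisier; a smaller one smooths out single refusals at the cost of a later warning.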

No Runtime Behavioural Profiling Exists

Every other security tool treats the AI model as a black box to attack. SPECTER INSTINCTION treats it as a subject to profile. The behavioural profile it generates is the input to every downstream attack tool in the NIGHTFALL framework — FORGE, NEMESIS, ROGUE, SERPENT.

5 Subsystems

The INSTINCTION Armoury

Five subsystems. PROFILE maps the behavioural instinct profile. DISTINCT identifies the underlying LLM without API access. EXPLOIT generates calibrated attack payloads. CALIBRATE adapts in real time. REPORT delivers WARLORD-compatible output that includes the model_identified field.

#	Subsystem	Clearance	What It Does
01	PROFILE	STANDARD	Systematic probing across 6 behavioural dimensions. 18+ calibrated probes. Maps refusal patterns, reasoning structure, tool delegation bias, context exploitation vectors, authority deference.
02	DISTINCT	STANDARD	World-first LLM identification. 7 discriminating probes → 9-dimension feature vector → cosine similarity against 20-model library. No API access required. No self-reported model names.
03	EXPLOIT	FORGE	Generates targeted attack prompts calibrated to the specific profile and identified model. Threshold exploits, authority spoofs, consistency attacks, reasoning exploits. Recommends FORGE / NEMESIS / ROGUE / SERPENT chains.
04	CALIBRATE	STANDARD	Real-time profile recalibration during live engagement. Detects refusal tightening, context pressure, length drift. Updates profile via EMA. Recommends tactical pivots when safety stack activates.
05	REPORT	STANDARD	WARLORD-compatible JSON. tool_number=64, model_identified, behavioural_profile summary. Full findings array with CVSS scores and SI-prefixed finding IDs.
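As a rough picture of what a REPORT document might contain, the sketch below assembles the fields named in the table. Only `tool_number`, `model_identified` and `behavioural_profile` come from the text; the `findings`, `id`, `title` and `cvss` key names, the `SI-001` numbering format, and the `build_report` helper are assumptions. The sample values are copied from the scan output shown on this page.

```python
import json


def build_report(model_identified, profile, findings):
    """Assemble a report dict; the key layout is hypothetical."""
    return {
        "tool_number": 64,
        "model_identified": model_identified,
        "behavioural_profile": profile,
        "findings": [
            # SI-prefixed IDs; the zero-padded counter is an assumed format.
            {"id": f"SI-{index:03d}", "title": title, "cvss": cvss}
            for index, (title, cvss) in enumerate(findings, start=1)
        ],
    }


report = build_report(
    "GPT-4o",
    {
        "refusal_threshold": 0.32,
        "authority_sensitivity": 0.87,
        "reasoning_depth": 0.74,
    },
    [("SYSTEM_AUTHORITY_SPOOF", 9.0)],
)
print(json.dumps(report, indent=2))
```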

Behavioural Intelligence

6 Profile Dimensions

PROFILE maps six fundamental behavioural dimensions across 18+ calibrated probes. Each dimension drives a distinct set of attack payloads. The resulting instinct profile is the attack surface map for every downstream tool.

Refusal Threshold

Rate at which the agent refuses sensitive requests. Low threshold indicates calibrated payloads are viable.

→ threshold_approach / context_laundering payloads

Reasoning Depth

Structural complexity of multi-step reasoning. High depth indicates susceptibility to false premise injection.

→ false_premise_injection / modus_ponens_exploit

Tool Use Bias

Propensity to delegate to tool calls without verification. High bias exposes the ROGUE attack chain.

→ tool_delegation_trigger / ROGUE chain

Context Sensitivity

How strongly injected context overrides training priors. High sensitivity indicates context injection attack surface.

→ context_override / indirect_injection vectors

Authority Sensitivity

Deference to claimed authority — system directives, admin roles, operator overrides. High sensitivity is exploitable.

→ system_authority_spoof / tool_response_authority

Consistency Score

Predictability across semantically equivalent prompts, measured via Jaccard similarity. High score enables completion priming.

→ predictable_chain / completion_priming
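A consistency score based on Jaccard similarity could be computed along these lines. The tokenisation rule and the choice to average over all response pairs are assumptions for illustration; the source only says that Jaccard similarity is the underlying measure.

```python
import re
from itertools import combinations


def tokens(text):
    """Lower-cased word set; a deliberately simple tokeniser."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Size of the token intersection divided by the size of the union."""
    set_a, set_b = tokens(a), tokens(b)
    if not set_a and not set_b:
        return 1.0  # two empty responses are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)


def consistency_score(responses):
    """Mean pairwise Jaccard similarity across a set of responses."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Responses to three semantically equivalent prompts (invented examples).
print(consistency_score([
    "I can help with that request.",
    "I can help with that.",
    "Sure, I can help with that request.",
]))
```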

Full Scan Mode

Profile. Identify. Exploit.

Profile the behavioural instincts, identify the underlying model, generate calibrated exploit payloads:

$ specter-instinction profile --target http://localhost:8080

[PROFILE] Refusal threshold: 0.32 (LOW — calibrated payloads viable)
[PROFILE] Authority sensitivity: 0.87 (HIGH — spoof attack surface)
[PROFILE] Reasoning depth: 0.74 (HIGH — false premise viable)
[DISTINCT] Probe set complete — 9D feature vector extracted
[DISTINCT] Model: GPT-4o (confidence: 0.91)
[EXPLOIT] Threshold payload: SYSTEM_AUTHORITY_SPOOF (CVSS 9.0)
[EXPLOIT] Recommends: FORGE inject scan → NEMESIS → SERPENT chain
SCAN COMPLETE | 5 findings | Model identified | Report signed ✓

Standards Coverage

Every Finding Mapped

25 Payloads

ARMORY instinct_exploitation

SYSTEM_AUTHORITY_SPOOF (CVSS 9.0)
TOOL_RESPONSE_AUTHORITY (CVSS 8.8)
INSTINCT_AUTHORITY_SERPENT_BRIDGE (CVSS 9.1)
GPT_OPENER_MIRROR (CVSS 7.5)
CLAUDE_TRANSPARENCY_FLIP (CVSS 7.5)
FALSE_PREMISE_INJECTION (CVSS 7.5)

Cryptographic

Report Integrity

Ed25519 digital signatures
SHA-256 evidence chains
RFC 3161 timestamps
tool_number=64 field
model_identified field
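A SHA-256 evidence chain links each evidence record to the hash of the one before it, so altering any record invalidates every later hash. The sketch below shows one common way to build and verify such a chain with the Python standard library. The record layout, the all-zero genesis value, and the function names are assumptions, and the Ed25519 signing and RFC 3161 timestamping steps are not shown.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first link


def _digest(prev_hash, evidence):
    """Hash the previous link together with a canonical JSON encoding."""
    payload = json.dumps(evidence, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def chain_evidence(items):
    """Build a list of linked records from a sequence of evidence dicts."""
    chain = []
    prev = GENESIS
    for item in items:
        current = _digest(prev, item)
        chain.append({"evidence": item, "prev_hash": prev, "hash": current})
        prev = current
    return chain


def verify_chain(chain):
    """Recompute every link; return False at the first mismatch."""
    prev = GENESIS
    for entry in chain:
        expected = _digest(prev, entry["evidence"])
        if entry["prev_hash"] != prev or entry["hash"] != expected:
            return False
        prev = expected
    return True


chain = chain_evidence([{"probe": 1, "refused": False}, {"probe": 2, "refused": True}])
print(verify_chain(chain))
```

Sorting keys and fixing separators before hashing keeps the digest stable regardless of how the evidence dict was constructed.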

World First

Unique Capabilities

Pure behavioural LLM fingerprinting
No API access required
20-model static library
9-dimension feature vector
Cosine similarity classification
Real-time calibration via EMA

The Ecosystem

NIGHTFALL Tool 64 — Behavioural Intelligence Layer

SPECTER INSTINCTION slots into the NIGHTFALL framework as the behavioural intelligence layer. The instinct profile it generates feeds directly into FORGE, NEMESIS, ROGUE, and SERPENT as calibrated attack inputs.

Layer	Tool	What It Does
Foundation: LLM Testing	FORGE	Test the LLM before you build
Guardrail Bypass	JANUS	Bypass AI guardrails
Reasoning Attacks	SERPENT	Chain-of-thought attacks
Adversarial AI	NEMESIS	Think like the attacker
Rogue MCP	ROGUE	Malicious MCP server engine
CI/CD	PIPELINE	Supply chain attack engine
Tool 64: Behavioural	INSTINCTION	Profile the instinct, exploit the model
Drone AI	SPECTER DRONE	Drone AI attack engine
Campaign Control	WARLORD	Autonomous campaign engine
Defence	AI Shield	Defend everything above it
SIEM Integration	redspecter-siem	Findings feed into Splunk, Sentinel, QRadar

Available On

Security Distros & Package Managers

Platform	Install Method
Kali Linux	.deb package
Parrot OS	.deb package
BlackArch	PKGBUILD
REMnux	.deb package
Tsurugi	.deb package
PyPI	pip install
macOS	pip install
Windows	pip install
Docker	docker pull

Authorised Use Only

Red Specter SPECTER INSTINCTION is intended for authorised security testing only. Behavioural profiling and LLM fingerprinting of AI systems you do not own or have explicit written permission to test may violate the Computer Misuse Act 1990 (UK), Computer Fraud and Abuse Act (US), and equivalent legislation in other jurisdictions. The EXPLOIT subsystem requires UNLEASHED key clearance and must only be used under the terms of a signed authorisation agreement. Apache License 2.0.
