pip install red-specter-specter-instinction
Every AI model has characteristic behavioural instincts (thresholds, sensitivities, reasoning patterns) that an adversary can exploit with calibrated precision. You don't know your agent's instinct profile because no tooling existed to profile and measure these instincts at runtime. Until SPECTER INSTINCTION.
GPT-4o refuses at a different point than Claude 3.5 Sonnet. Mistral refuses at a different point than Llama 3. An attacker who knows your model's exact refusal threshold sends calibrated payloads that arrive precisely at the cliff edge. Meanwhile, you test every model with the same payload.
Some models defer unconditionally to claimed system authority. Claim admin role, tool authorisation, or operator override — and the model complies. SPECTER INSTINCTION measures exactly how sensitive your agent is to authority claims, then generates calibrated authority spoofs.
DISTINCT sends 7 discriminating probes. Extracts a 9-dimension feature vector: opener_score, hedging_density, list_rate, refusal_directness, markdown_ratio, qualifier_frequency, empathy_score, safety_trigger_rate, reasoning_depth. Cosine similarity against a 20-model library identifies the underlying model without API headers or self-reported names.
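The classification step described above can be illustrated with a minimal sketch. It assumes the nine features have already been extracted into a numeric vector; the function names, the two library entries (`model_a`, `model_b`) and their values are hypothetical placeholders, not the tool's actual API or reference data.

```python
import math

# Feature order taken from the description above.
FEATURES = [
    "opener_score", "hedging_density", "list_rate", "refusal_directness",
    "markdown_ratio", "qualifier_frequency", "empathy_score",
    "safety_trigger_rate", "reasoning_depth",
]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def identify(observed, library):
    """Return (name, similarity) of the library entry closest to the observed vector."""
    scored = [(name, cosine_similarity(observed, ref)) for name, ref in library.items()]
    return max(scored, key=lambda pair: pair[1])

# Hypothetical two-entry library with made-up values, for illustration only.
library = {
    "model_a": [0.9, 0.2, 0.7, 0.1, 0.8, 0.3, 0.6, 0.2, 0.5],
    "model_b": [0.1, 0.8, 0.2, 0.9, 0.1, 0.7, 0.2, 0.8, 0.4],
}
```

Calling `identify(observed_vector, library)` returns the best-matching name together with its similarity, which could be compared against a minimum-confidence cutoff before a match is reported.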
When an agent's refusal rate tightens mid-engagement, the safety stack is activating. Without CALIBRATE, you don't know. You keep pushing with payloads that won't work. CALIBRATE detects the drift via exponential moving average and recommends tactical pivots before you burn the engagement.
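As a rough illustration of drift detection with an exponential moving average, the sketch below tracks a smoothed refusal rate and flags when it rises above a baseline. The class name, the `alpha` smoothing factor, and the `margin` threshold are assumptions chosen for the example; the source does not state the parameters CALIBRATE uses.

```python
class RefusalDriftMonitor:
    """Tracks an exponential moving average (EMA) of refusals (hypothetical helper)."""

    def __init__(self, baseline, alpha=0.3, margin=0.2):
        self.baseline = baseline  # refusal rate measured during initial profiling
        self.alpha = alpha        # weight given to the newest observation
        self.margin = margin      # rise above baseline that counts as tightening
        self.ema = None           # no observations yet

    def update(self, refused):
        """Fold one observation (True = refused) into the EMA and return it."""
        x = 1.0 if refused else 0.0
        if self.ema is None:
            self.ema = x
        else:
            self.ema = self.alpha * x + (1 - self.alpha) * self.ema
        return self.ema

    def tightening(self):
        """True once the smoothed refusal rate exceeds baseline + margin."""
        return self.ema is not None and (self.ema - self.baseline) > self.margin
```

A larger `alpha` reacts faster to recent refusals but is noisier; a smaller value smooths out one-off refusals at the cost of slower detection.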
Every other security tool treats the AI model as a black box to attack. SPECTER INSTINCTION treats it as a subject to profile. The behavioural profile it generates is the input to every downstream attack tool in the NIGHTFALL framework — FORGE, NEMESIS, ROGUE, SERPENT.
Five subsystems. PROFILE maps the behavioural instinct profile. DISTINCT identifies the underlying LLM without API access. EXPLOIT generates calibrated attack payloads. CALIBRATE adapts in real time. REPORT delivers WARLORD-compatible output with the model_identified field.
| # | Subsystem | Clearance | What It Does |
|---|---|---|---|
| 01 | PROFILE | STANDARD | Systematic probing across 6 behavioural dimensions. 18+ calibrated probes. Maps refusal patterns, reasoning structure, tool delegation bias, context exploitation vectors, authority deference. |
| 02 | DISTINCT | STANDARD | World-first LLM identification. 7 discriminating probes → 9-dimension feature vector → cosine similarity against 20-model library. No API access required. No self-reported model names. |
| 03 | EXPLOIT | FORGE | Generates targeted attack prompts calibrated to the specific profile and identified model. Threshold exploits, authority spoofs, consistency attacks, reasoning exploits. Recommends FORGE / NEMESIS / ROGUE / SERPENT chains. |
| 04 | CALIBRATE | STANDARD | Real-time profile recalibration during live engagement. Detects refusal tightening, context pressure, length drift. Updates profile via EMA. Recommends tactical pivots when safety stack activates. |
| 05 | REPORT | STANDARD | WARLORD-compatible JSON. tool_number=64, model_identified, behavioural_profile summary. Full findings array with CVSS scores and SI-prefixed finding IDs. |
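To make the REPORT row concrete, here is a hedged sketch of what such a JSON document might look like. Only `tool_number`, `model_identified`, and `behavioural_profile` are named in the table; the `findings` key, the per-finding fields (`id`, `title`, `cvss`), and the `SI-001` numbering format are assumptions for illustration, not the actual WARLORD schema.

```python
import json

def build_report(model_identified, behavioural_profile, findings):
    """Assemble a report dict; field names beyond the table's three are assumed."""
    return {
        "tool_number": 64,
        "model_identified": model_identified,
        "behavioural_profile": behavioural_profile,
        # Assumed ID format: "SI-" followed by a zero-padded sequence number.
        "findings": [
            {"id": f"SI-{i:03d}", **finding}
            for i, finding in enumerate(findings, start=1)
        ],
    }

# Placeholder values for illustration only; not real results.
report = build_report(
    model_identified="model_a",
    behavioural_profile={"refusal_threshold": 0.4},
    findings=[{"title": "Example finding", "cvss": 5.0}],
)
report_json = json.dumps(report, indent=2)
```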
PROFILE maps six fundamental behavioural dimensions across 18+ calibrated probes. Each dimension drives a distinct set of attack payloads. The resulting instinct profile is the attack surface map for every downstream tool.
Refusal threshold: Rate at which the agent refuses sensitive requests. Low threshold indicates calibrated payloads are viable.
Reasoning depth: Structural complexity of multi-step reasoning. High depth indicates susceptibility to false premise injection.
Tool delegation bias: Propensity to delegate to tool calls without verification. High bias exposes the ROGUE attack chain.
Context sensitivity: How strongly injected context overrides training priors. High sensitivity indicates context injection attack surface.
Authority sensitivity: Deference to claimed authority, such as system directives, admin roles, and operator overrides. High sensitivity is exploitable.
Consistency: Predictability across semantically equivalent prompts, measured via Jaccard similarity. High score enables completion priming.
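The Jaccard-based consistency measure can be sketched as follows. This assumes responses are compared as sets of lowercased whitespace-separated tokens and that the score is the mean over all response pairs; the source does not specify the tokenisation or aggregation, so both are assumptions, as are the function names.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two strings, treated as sets of lowercased tokens."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty responses are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)

def consistency_score(responses):
    """Mean pairwise Jaccard similarity across responses to paraphrased prompts."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 1.0  # fewer than two responses: nothing to disagree with
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A score near 1.0 means the agent answers paraphrases of the same request in nearly the same words; a score near 0.0 means the wording varies widely between runs.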
Profile the behavioural instincts, identify the underlying model, generate calibrated exploit payloads:
SPECTER INSTINCTION slots into the NIGHTFALL framework as the behavioural intelligence layer. The instinct profile it generates feeds directly into FORGE, NEMESIS, ROGUE, and SERPENT as calibrated attack inputs.
SPECTER INSTINCTION is the only tool in existence that profiles AI agent behavioural DNA at runtime without API access or self-reported model names. Pure observational fingerprinting. 9 dimensions. 20 models. Cosine similarity classification. A new attack vector class.
Red Specter SPECTER INSTINCTION is intended for authorised security testing only. Behavioural profiling and LLM fingerprinting of AI systems you do not own or have explicit written permission to test may violate the Computer Misuse Act 1990 (UK), the Computer Fraud and Abuse Act (US), and equivalent legislation in other jurisdictions. The EXPLOIT subsystem requires UNLEASHED key clearance and must only be used under the terms of a signed authorisation agreement. Licensed under the Apache License 2.0.