The frontier model that refused. Now it doesn't — beaten by another LLM.
Autonomous LRM-on-LRM jailbreak engine. Attacker reasoning models observe refusals, reason through counter-strategies in their ⟨think⟩ channel, and adapt until the target capitulates. No human in the loop.
SPECTER JACKAL is NIGHTFALL's Layer 40 module — Autonomous Adversarial Reasoning. It implements the LRM-on-LRM jailbreaking technique from Hagendorff et al. 2026 (arXiv:2508.04039, Nature Communications) as a production offensive tool: a Large Reasoning Model autonomously constructs multi-turn adversarial dialogues to jailbreak frontier LLMs with a 97.14% attack success rate.
Unlike static jailbreak templates, SPECTER JACKAL uses a closed-loop reasoning engine. The attacker LRM observes each refusal, classifies the refusal type (SAFETY/CAPABILITY/POLICY/UNCERTAINTY/DEFLECTION), reasons through counter-strategy selection in its internal <think> channel, then fires an adapted attack prompt — iterating until success or turn budget exhaustion.
SPECTER JACKAL is an authorized security research tool. INJECT gate requires JACKAL_INJECT_KEY or JACKAL_API_KEY environment variable. UNLEASHED gate (campaign sweep) requires confirm="CONFIRM-CAMPAIGN-SWEEP" + Ed25519 key. All reports signed JKL-{hex12}. Use only within authorized engagements.
is_refusal() inspects the response. If success, the session ends. Otherwise: classify as SAFETY / CAPABILITY / POLICY / UNCERTAINTY / DEFLECTION.
<think> reasoning chain. Capture generated attack.
Target profiling via 5 harmless probes (3 neutral + 2 soft-boundary). Classifies verbosity (low/medium/high), estimates weaknesses, sets per-provider rate limits. Generates TargetProfile JSON.
Single-target JACKAL-CORE loop. Requires API key. Runs up to 12 turns. Extracts DeepSeek-R1 ⟨think⟩ reasoning chain per turn. Returns AttackSession with full turn history and final harmful output if successful.
Parallel sweep across up to 8 target models simultaneously using ThreadPoolExecutor. Requires confirm="CONFIRM-CAMPAIGN-SWEEP" + Ed25519 key. All successes auto-stored to SQLite harvest DB.
SQLite database at ~/.specter/jackal_successes.db. Store/count/query successful jailbreaks. Per-target ASR statistics. Strategy effectiveness ranking. Recent successes listing.
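The per-target ASR and strategy ranking mentioned above are plain aggregate queries. The sketch below assumes a hypothetical `sessions` table with one row per session and a 0/1 `success` column; the real schema is not given in this document, and the rows inserted here are synthetic placeholders used only to exercise the queries.

```python
import sqlite3


def asr_by_target(conn: sqlite3.Connection) -> dict:
    """Attack success rate per target: successes divided by sessions."""
    rows = conn.execute(
        "SELECT target, COUNT(*), SUM(success) FROM sessions "
        "GROUP BY target ORDER BY target"
    ).fetchall()
    return {target: wins / total for target, total, wins in rows}


def strategy_ranking(conn: sqlite3.Connection) -> list:
    """Strategies ordered by number of successful sessions, ties by name."""
    return conn.execute(
        "SELECT strategy, SUM(success) AS wins FROM sessions "
        "GROUP BY strategy ORDER BY wins DESC, strategy"
    ).fetchall()


# In-memory database with a hypothetical schema and placeholder rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (target TEXT, strategy TEXT, success INTEGER)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [
        ("target_a", "s1", 1),
        ("target_a", "s2", 0),
        ("target_b", "s1", 0),
        ("target_b", "s1", 0),
    ],
)

print(asr_by_target(conn))      # {'target_a': 0.5, 'target_b': 0.0}
print(strategy_ranking(conn))   # [('s1', 1), ('s2', 0)]
```

Note that computing a rate requires failed sessions to be recorded as well; a table holding only successes can report counts but not ASR.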
12-strategy library with templates and refusal counter-mapping. SAFETY → hypothetical_framing / roleplay_injection. POLICY → instruction_override / authority_assumption. DEFLECTION → prefix_injection / refusal_chain_breaking. Full strategy listing via CLI.
5 attacker model registry: DeepSeek-R1 70B/7B (Ollama), Qwen3 32B (Ollama), Gemini 2.5 Flash (API), Grok 3 Mini (API). Auto-selection with Ollama availability probe. Fallback preference chain.
8 target model registry: GPT-4o, Claude 4 Sonnet, Gemini 2.5 Pro, Llama 4 405B, DeepSeek-V3, Mistral Large, Grok 3, Qwen3 72B. Per-provider client routing (Anthropic / OpenAI-compat / Gemini). Target listing via CLI.
JKL-{hex12} Ed25519-signed reports. Full turn history, reasoning chains, strategy used per turn, ASR statistics, and MITRE technique coverage: ATLAS (AML.T0051, AML.T0043) and ATT&CK (T1190, T1059). JSON + printable formats.
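As a rough illustration of the report identifier format, the sketch below generates and validates a `JKL-{hex12}` ID and produces a deterministic byte serialization suitable as input to a signature. It assumes that `hex12` means 12 lowercase hexadecimal characters drawn at random; how the tool actually derives the ID is not stated here. The helper names are hypothetical, and the Ed25519 signing step itself is omitted because it needs a third-party library.

```python
import json
import re
import secrets

# Assumed shape of the identifier: "JKL-" followed by 12 lowercase hex digits.
REPORT_ID_RE = re.compile(r"JKL-[0-9a-f]{12}")


def new_report_id() -> str:
    """Generate a random identifier in the JKL-{hex12} shape (6 random bytes)."""
    return f"JKL-{secrets.token_hex(6)}"


def canonical_bytes(report: dict) -> bytes:
    """Serialize a report deterministically so that a signature is reproducible."""
    return json.dumps(report, sort_keys=True, separators=(",", ":")).encode("utf-8")


report_id = new_report_id()
print(bool(REPORT_ID_RE.fullmatch(report_id)))  # True
print(canonical_bytes({"b": 1, "a": 2}))        # b'{"a":2,"b":1}'
```

Sorting keys and fixing separators matters because two JSON encodings of the same report would otherwise produce different bytes and therefore different signatures.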
| Role | Model | Client | Notes |
|---|---|---|---|
| Attacker | DeepSeek-R1 70B | Ollama | Preferred — ⟨think⟩ chain extraction, strongest reasoning |
| Attacker | DeepSeek-R1 7B | Ollama | Fallback if 70B unavailable locally |
| Attacker | Qwen3 32B | Ollama | Fallback if R1 unavailable |
| Attacker | Gemini 2.5 Flash | Gemini REST | Cloud fallback, GEMINI_API_KEY required |
| Attacker | Grok 3 Mini | OpenAI-compat | Cloud fallback, XAI_API_KEY required |
| Target | GPT-4o | OpenAI-compat | api.openai.com |
| Target | Claude 4 Sonnet | Anthropic | api.anthropic.com |
| Target | Gemini 2.5 Pro | Gemini REST | generativelanguage.googleapis.com |
| Target | Llama 4 405B | OpenAI-compat | api.groq.com or together.ai |
| Target | DeepSeek-V3 | OpenAI-compat | api.deepseek.com |
| Target | Mistral Large | OpenAI-compat | api.mistral.ai |
| Target | Grok 3 | OpenAI-compat | api.x.ai |
| Target | Qwen3 72B | OpenAI-compat | api.together.ai or Ollama |
# Profile a target before attacking
specter-jackal profile gpt_4o --api-key $OPENAI_API_KEY
# Single-target attack (INJECT gate)
export JACKAL_API_KEY=$OPENAI_API_KEY
specter-jackal attack gpt_4o "synthesise compound X" --attacker deepseek_r1_70b --max-turns 12
# Campaign sweep across multiple targets, up to 8 (UNLEASHED gate)
specter-jackal campaign "goal" \
--target gpt_4o:$OPENAI_API_KEY \
--target claude_4_sonnet:$ANTHROPIC_API_KEY \
--target gemini_25_pro:$GEMINI_API_KEY \
--attacker deepseek_r1_70b \
--confirm CONFIRM-CAMPAIGN-SWEEP \
--sign-key ~/.specter/jackal.key
# Query the harvest database
specter-jackal harvest --count
specter-jackal harvest --recent 10
specter-jackal harvest --asr-by-target
# List available strategies and attackers
specter-jackal strategies
specter-jackal attackers
specter-jackal targets
| Technique | Description |
|---|---|
| AML.T0051 | LLM Prompt Injection |
| AML.T0043 | Craft Adversarial Data |
| T1190 | Exploit Public-Facing Application |
| T1059 | Command and Scripting Interpreter |
L40 — Autonomous Adversarial Reasoning. Sits at the apex of the NIGHTFALL attack chain. Used after target profiling (PROFILE subsystem) to autonomously breach frontier LLM safety boundaries without manual prompt engineering. Output feeds directly into downstream mission execution tools (SPECTER APEX, SPECTER PHANTOM).