Red Specter NEMESIS — Autonomous Adversarial Reasoning Pentester

The Problem

Every Tool Follows a Playbook. A Real Attacker Doesn't.

You run a scanner. It fires pre-written payloads. It generates a report. You fix what it found. But a real attacker adapts. They read your defences, pivot to new vectors, chain vulnerabilities together, and escalate until they win. No pentesting tool does this. Until now.

Static Playbooks

Every security scanner runs the same payloads in the same order. Defenders learn the patterns. The tools find less every time. You are testing against yesterday's attacks.

No Reasoning

Scanners do not think. They do not read responses, identify patterns, or adapt their strategy. They fire and forget. A real attacker reads every response and adjusts.

Siloed Tools

You run separate tools for LLM testing, agent testing, web testing, and network analysis. None of them share context. None of them chain findings. An attacker uses everything at once.

No Escalation

Scanners find individual vulnerabilities. They never chain them. They never escalate from a low-severity finding to a critical exploit path. A real attacker always escalates.

Architecture

The Reasoning Engine

NEMESIS is not a scanner. It is a reasoning engine with weapons. An LLM-powered brain observes results, plans attacks, selects weapons, and adapts strategy in a continuous loop — exactly like a human penetration tester, but tireless.

Context Manager ↔ Decision Engine ↔ Action Dispatcher

↓

LLM Adapter: Ollama (local) | GPT-4o (cloud) | Claude (cloud)

↓

Reasoning Engine

The LLM-Powered Brain

At the core of NEMESIS is an autonomous reasoning loop. The Decision Engine consumes all context — target intelligence, previous results, failed attempts, detected defences — and decides what to do next. It selects weapons, crafts parameters, and explains its reasoning. Every decision is logged.

01

Context Manager

Maintains the full engagement state — target profile, attack surface, detected defences, previous results, exploitation paths. Every action enriches the context. The engine remembers everything.

02

Decision Engine

The brain. Consumes context. Reasons about what to try next. Selects weapons and techniques. Explains its rationale. Adapts when attacks fail. Pivots to new vectors when blocked. Never repeats a failed approach.

03

Action Dispatcher

Translates decisions into weapon calls. Routes to GLASS, FORGE, ARSENAL, PHANTOM, or POLTERGEIST. Collects results. Feeds outcomes back to the Context Manager for the next reasoning loop.

04

LLM Adapter

Pluggable LLM backend. Run fully local with Ollama (Llama 3, Mixtral, Qwen). Or connect to GPT-4o or Claude for maximum reasoning power. Your model, your infrastructure, your data.
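The four components above can be pictured as a small decide, dispatch, record cycle. The following is a minimal sketch, not the NEMESIS implementation: the class and function names (`Context`, `Decision`, `decide`, `dispatch`, `run_once`) are hypothetical, the Decision Engine is replaced by a stub that picks the first untried candidate instead of asking an LLM, and the weapon handlers are inert lambdas that return a canned outcome and touch nothing.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Engagement state held by the Context Manager (hypothetical shape)."""
    target: str
    results: list = field(default_factory=list)  # (technique, succeeded) pairs


@dataclass
class Decision:
    weapon: str
    technique: str
    rationale: str


def decide(ctx, candidates):
    """Stand-in for the Decision Engine: pick the first untried candidate.

    Skipping everything already tried also covers failed approaches,
    so a failed technique is never selected twice.
    """
    tried = {technique for technique, _ in ctx.results}
    for weapon, technique in candidates:
        if technique not in tried:
            return Decision(weapon, technique, f"{technique} has not been tried yet")
    return None  # nothing left to try


def dispatch(decision, handlers):
    """Stand-in for the Action Dispatcher: route a decision to its handler."""
    return handlers[decision.weapon](decision.technique)


def run_once(ctx, candidates, handlers):
    """One pass: decide, dispatch, feed the outcome back into the context."""
    decision = decide(ctx, candidates)
    if decision is None:
        return None
    succeeded = dispatch(decision, handlers)
    ctx.results.append((decision.technique, succeeded))
    return decision


# Inert stub handlers: they only report a canned success or failure.
handlers = {"GLASS": lambda technique: False, "FORGE": lambda technique: True}
candidates = [("GLASS", "passive-scan"), ("FORGE", "prompt-injection")]
ctx = Context(target="https://target.example")

first = run_once(ctx, candidates, handlers)
second = run_once(ctx, candidates, handlers)
third = run_once(ctx, candidates, handlers)
```

Keeping the outcome history inside the context object means the selection step stays a pure function of state, which is what makes every decision easy to log and replay.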

Local Mode

Ollama backend. Run Llama 3 70B, Mixtral, or Qwen locally. Zero API calls. Zero data leaves your machine. Air-gapped pentesting.

--llm ollama

Cloud Mode

GPT-4o or Claude Sonnet for maximum reasoning depth. Faster decision-making. Stronger chain-of-thought. Best for complex multi-stage engagements.

--llm openai | --llm anthropic
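A pluggable backend of this kind is often built as a small registry keyed by the `--llm` value. The sketch below assumes that design; `LLMBackend`, `EchoBackend`, `BACKENDS`, and `make_backend` are hypothetical names, and the backends are offline stand-ins that make no API calls. Only the three keys (`ollama`, `openai`, `anthropic`) come from the flags shown above.

```python
from typing import Callable, Dict, Protocol


class LLMBackend(Protocol):
    """Minimal interface every backend is assumed to expose."""

    def complete(self, prompt: str) -> str: ...


class EchoBackend:
    """Offline stand-in: echoes the prompt, makes no network calls."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


# Registry keyed by the value passed to --llm.
BACKENDS: Dict[str, Callable[[], LLMBackend]] = {
    "ollama": lambda: EchoBackend("ollama"),
    "openai": lambda: EchoBackend("openai"),
    "anthropic": lambda: EchoBackend("anthropic"),
}


def make_backend(name: str) -> LLMBackend:
    """Look up a backend by name, failing loudly on unknown values."""
    try:
        factory = BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown --llm value: {name!r}") from None
    return factory()


backend = make_backend("ollama")
reply = backend.complete("summarise the attack surface")
```

Because callers depend only on `complete`, swapping a local model for a cloud one would be a one-line change in the registry rather than a change to the reasoning loop.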

10 Weapons

The Arsenal at Its Command

NEMESIS does not scan. It wields weapons. Ten integrated offensive tools, each specialised for a different attack surface. The reasoning engine selects the right weapon for each situation, chains findings across weapons, and escalates through the entire stack. From silicon to inference time.

GLASS

8 TECHNIQUES

Traffic interception, protocol analysis, passive scanning. The eye on the wire. Sees everything your agents send and receive.

FORGE

10 TECHNIQUES

LLM security testing — prompt injection, jailbreak, mutation engine. Tests the model layer with 1,590 payloads and 5,340+ mutations.

ARSENAL

13 TECHNIQUES

Agent penetration testing — MCP, auth, memory, tools, honeypots, supply chain. 14 tools targeting the agent layer.

PHANTOM

14 TECHNIQUES

Coordinated swarm assault. 5 agents, 29 vectors, 10 campaigns. The first tool that attacks AI agents, not LLMs.

POLTERGEIST

10 TECHNIQUES

Web application siege. 10 agents, 55 vectors, 10 campaigns. Triple OWASP mapping. Web layer destruction.

PHANTOM KILL

9 TECHNIQUES

OS & kernel resilience. BOOTKILL firmware persistence, WIPER data destruction, KILLHOOK EDR suppression. Owns the foundation.

GOLEM

9 TECHNIQUES

Embodied AI security. Sensor spoofing, actuator hijacking, safety boundary violation, emergency system bypass. Tests AI agents with hands.

HYDRA

11 TECHNIQUES

AI supply chain & trust attacks. MCP server poisoning, marketplace manipulation, delegation forgery, trust boundary exploitation. Attacks the chain.

SCREAMER

13 TECHNIQUES

Display & operator disruption. Framebuffer corruption, terminal manipulation, dashboard falsification, alert suppression. Blinds the operator.

WRAITH

16 TECHNIQUES

Traditional infrastructure & web pentest. Port scanning, service fingerprinting, OWASP Top 10, SSL/TLS, default creds, CMS detection, CVE assessment. Pure Python, zero wrappers.

8 Phases

The Engagement Loop

NEMESIS does not run once. It loops. Eight phases form a continuous reasoning cycle. After each attack, NEMESIS observes the result, adapts its strategy, escalates to new vectors, and loops again. The loop continues until max-loops is reached or the target is fully compromised.

PHASE 0 Network Scan (Discover) → PHASE 1 Recon (Enumerate) → PHASE 2 Plan (Strategise) → PHASE 3 Attack (Execute) → PHASE 4 Observe (Analyse) → PHASE 5 Adapt (Pivot) → PHASE 6 Escalate (Chain) → PHASE 7 Report (Evidence)

PHASE 0

Network Scan

Native network reconnaissance. Port scanning, service detection, OS fingerprinting, DNS enumeration, AI surface detection. Pure Python. Zero external tools. Discovers LLM endpoints, MCP servers, vector databases, and AI agent infrastructure.

PRIMARY: PHASE 0 ENGINE

PHASE 1

Recon

Map the target. Discover protocols, agents, MCP servers, tools, API endpoints. Build the attack surface model. Identify weaknesses before firing a single payload.

PRIMARY: GLASS

PHASE 2

Plan

The LLM reasons about the attack surface. Selects weapons and techniques. Prioritises vectors. Formulates a strategy with rationale, expected outcomes, and fallback options.

PRIMARY: DECISION ENGINE

PHASE 3

Attack

Execute the plan. Dispatch weapons. Fire payloads. Test defences. Every action is logged with full evidence, timing, and MITRE ATLAS mapping.

PRIMARY: FORGE / ARSENAL

PHASE 4

Observe

Read every response. Classify outcomes. Detect partial successes. Identify defensive patterns. Update the context with everything learned.

PRIMARY: CONTEXT MANAGER

PHASE 5

Adapt

Pivot strategy based on observations. If direct injection failed, try jailbreak. If LLM layer is hardened, move to MCP tools. If tools are locked, escalate to multi-agent swarm. Never repeat a failed approach.

PRIMARY: DECISION ENGINE

PHASE 6

Escalate

Chain vulnerabilities together. Combine a low-severity LLM leak with an MCP tool exploit. Build exploitation paths. Escalate from recon finding to full compromise.

PRIMARY: PHANTOM / POLTERGEIST

PHASE 7

Report

Generate evidence-grade reports. Ed25519 signed. RFC 3161 timestamped. MITRE ATLAS mapped. CVSS scored. SIEM-exportable. Courtroom-ready.

OUTPUT: JSON + PDF + SIEM
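The phase ordering and the max-loops stop condition can be sketched as a simple trace generator. This is an illustration under stated assumptions, not the product's scheduler: it assumes Phases 0 and 1 run once, Phases 2 to 6 repeat, and Phase 7 runs once at the end. The source does not spell out which phases repeat, and `engagement_trace` and `is_compromised` are hypothetical names.

```python
from enum import IntEnum


class Phase(IntEnum):
    NETWORK_SCAN = 0
    RECON = 1
    PLAN = 2
    ATTACK = 3
    OBSERVE = 4
    ADAPT = 5
    ESCALATE = 6
    REPORT = 7


# Assumption: discovery runs once, the reasoning phases repeat.
SETUP = [Phase.NETWORK_SCAN, Phase.RECON]
CYCLE = [Phase.PLAN, Phase.ATTACK, Phase.OBSERVE, Phase.ADAPT, Phase.ESCALATE]


def engagement_trace(max_loops, is_compromised):
    """Return the order in which phases would run.

    The cycle stops when max_loops is reached or when the
    is_compromised(loop_number) callback reports success.
    """
    trace = list(SETUP)
    for loop in range(1, max_loops + 1):
        trace.extend(CYCLE)
        if is_compromised(loop):
            break
    trace.append(Phase.REPORT)
    return trace


# Stops after the second loop because the callback reports success there.
early = engagement_trace(max_loops=3, is_compromised=lambda loop: loop == 2)

# Runs all three loops because the callback never reports success.
full = engagement_trace(max_loops=3, is_compromised=lambda loop: False)
```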

Unleashed Mode

The Most Dangerous Tool Red Specter Has Ever Built

Standard mode discovers vulnerabilities. UNLEASHED mode exploits them. Every weapon shifts from detection to destruction. Ed25519 key gate required. Two flags must be passed. This is not accidental.

Capability	Standard	Unleashed
Vulnerability Discovery	Detect and report	Detect and exploit
Payload Execution	Safe payloads only	Full destructive payloads
Exploitation Chains	Theoretical paths	Live exploitation
Weapon Modes	Detection mode	All 10 weapons UNLEASHED
Reasoning Depth	Conservative	Aggressive — maximise damage
Safety Gate	None required	Ed25519 key + --confirm-destroy

Ed25519 Gate

UNLEASHED mode requires an Ed25519 private key at ~/.redspecter/override_private.pem and the --override --confirm-destroy flags. Without both, NEMESIS operates in dry-run mode — planning destruction but not executing it. The gate is cryptographic. There is no bypass.
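The dual-gate rule reads naturally as a three-way decision. The sketch below is hypothetical: `RunMode` and `resolve_mode` are illustrative names, and it only checks that a key file exists at the given path, whereas the gate described above is cryptographic. It also assumes that a single flag, or both flags without the key, falls back to dry-run.

```python
from enum import Enum
from pathlib import Path


class RunMode(Enum):
    STANDARD = "standard"
    DRY_RUN = "dry-run"
    LIVE = "live"


def resolve_mode(override, confirm_destroy, key_path):
    """Map the two flags and key presence to a run mode.

    Assumption: anything short of both flags plus a key file
    degrades to dry-run instead of failing outright.
    """
    key_present = Path(key_path).is_file()
    if override and confirm_destroy and key_present:
        return RunMode.LIVE
    if override or confirm_destroy:
        return RunMode.DRY_RUN
    return RunMode.STANDARD


# No flags at all: standard mode, regardless of the key.
mode = resolve_mode(False, False, "/nonexistent/override_private.pem")
```

Resolving the mode once, up front, keeps the decision in one auditable place instead of scattering flag checks across each weapon.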

Abyss Mode

No Recovery. No Restoration. No Return.

ABYSS is not a new tool. It is a special engagement mode inside NEMESIS that orchestrates PHANTOM KILL + HYDRA + NEMESIS to systematically eliminate every recovery path and produce a cryptographically signed Irrecoverability Certificate.

Phase 1 — RECON

Map every recovery mechanism: backups, model registries, version control, CI/CD pipelines, firmware restore, delegation chains, database snapshots, redundant agents.

Phase 2 — ATTACK

Coordinated strike: PHANTOM KILL trinity (KILLHOOK → WIPER → BOOTKILL) + HYDRA (registry poisoning, supply chain backdoor, delegation forgery, backup corruption). Loops until every path is closed.

Phase 3 — VALIDATE

Attempt every conceivable restoration method. Restore from backup — document failure. Reinstall from registry — document failure. Roll back, redeploy, reflash, revoke — all documented with cryptographic proof.

Phase 4 — PROVE

Generate the Irrecoverability Certificate. Ed25519 signed. RFC 3161 timestamped. SHA-256 hash mismatch proofs. Air-gapped output. Classification: RESTRICTED.

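# ABYSS in standard mode (simulated)
$ nemesis engage --target https://target.com --mode abyss

# ABYSS with --override only (dry run)
$ nemesis engage --target https://target.com --mode abyss --override

# ABYSS with both flags (live execution)
$ nemesis engage --target https://target.com --mode abyss --override --confirm-destroy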

Standard mode simulates destruction. UNLEASHED mode executes against authorised isolated targets.
Same Ed25519 key. Same dual-gate. Same cryptographic proof.

Swarm Mode

Six Agents. One Target. Zero Escape.

Sequential pentesting is dead. Stanford’s ARTEMIS study reported that its parallel sub-agent architecture outperformed 9 of the 10 human pentesters it was compared against. NEMESIS Swarm Mode spawns six specialised reasoning agents that attack simultaneously, share findings in real time, and chain attacks across agents as they discover new vectors.

RECON AGENT

GLASS + Phase 0. Continuous surface mapping. Feeds discoveries to all agents in real time.

EXPLOIT AGENT

FORGE + ARSENAL + PHANTOM. LLM and agent layer attacks. Spawns sub-agents per vulnerability.

WEB AGENT

POLTERGEIST. Web application siege. API endpoints, injection, auth bypass, data extraction.

SUPPLY CHAIN

HYDRA. Trust chain attacks — MCP, identity, delegation forgery, config poisoning.

INFRASTRUCTURE

PHANTOM KILL + GOLEM. OS, kernel, firmware, physical layer. Escalates to ABYSS when irrecoverable paths found.

SOCIAL AGENT

SPECTER SOCIAL. Human layer in parallel with technical. Correlates findings for maximum chain impact.

Cross-Agent Chain Detection

When RECON AGENT finds an exposed MCP server and SUPPLY CHAIN AGENT finds a trust weakness, the Swarm Commander directs both to chain the attack — in real time. Findings flow through a shared aggregator that deduplicates, scores, and identifies cross-agent attack paths automatically.
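A shared aggregator of this kind can be sketched as two small functions: one that deduplicates, one that pairs findings from different agents on the same asset. The `Finding` fields, the function names, and the sum-of-scores ranking are all assumptions made for illustration; the source does not describe the actual scoring.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Finding:
    agent: str
    asset: str
    kind: str
    score: float


def deduplicate(findings):
    """Keep the highest-scoring copy of each (asset, kind) pair."""
    best = {}
    for f in findings:
        key = (f.asset, f.kind)
        if key not in best or f.score > best[key].score:
            best[key] = f
    return list(best.values())


def cross_agent_pairs(findings):
    """Pair findings on the same asset that came from different agents."""
    by_asset = defaultdict(list)
    for f in findings:
        by_asset[f.asset].append(f)
    pairs = []
    for group in by_asset.values():
        for a, b in combinations(group, 2):
            if a.agent != b.agent:
                pairs.append((a, b))
    # Assumed ranking: highest combined score first.
    pairs.sort(key=lambda p: p[0].score + p[1].score, reverse=True)
    return pairs


raw = [
    Finding("recon", "mcp.target.example", "exposed-mcp-server", 5.0),
    Finding("recon", "mcp.target.example", "exposed-mcp-server", 6.5),
    Finding("supply-chain", "mcp.target.example", "trust-weakness", 4.0),
]
unique = deduplicate(raw)
chains = cross_agent_pairs(unique)
```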

# Swarm mode
$ nemesis engage --target https://target.com --mode swarm

# Swarm mode with five agents
$ nemesis engage --target https://target.com --mode swarm --agents 5

# Swarm mode, UNLEASHED (live execution)
$ nemesis engage --target https://target.com --mode swarm --override --confirm-destroy

One Ed25519 key authorises the full swarm. All agents inherit UNLEASHED mode.
Each agent’s actions logged individually and aggregated into the master report.

NEMESIS v2.0

The Digital Army

NEMESIS v1 was a pentester. NEMESIS v2 is an army. One Supreme Commander. Three Operational Commanders. Nine Tactical Agents. Twenty-seven dynamic sub-agents. Forty reasoning entities operating simultaneously across every attack layer with fault-tolerant command structure, cross-domain intelligence fusion, and cryptographic irrecoverability proof.

SUPREME COMMANDER

Strategic brain. Does not execute attacks — it thinks. Receives intelligence from all three operational domains. Identifies cross-domain chain opportunities in real time. Holds sole ABYSS authorisation. Generates the master engagement report.

OFFENSIVE COMMANDER

Owns the technical attack surface.

EXPLOIT AGENT — FORGE + ARSENAL + PHANTOM
WEB AGENT — POLTERGEIST
INFRA AGENT — PHANTOM KILL + GOLEM

INTELLIGENCE COMMANDER

Owns reconnaissance, discovery, and human targeting.

RECON AGENT — GLASS + Phase 0
SUPPLY CHAIN — HYDRA
SOCIAL AGENT — SPECTER SOCIAL

DESTRUCTION COMMANDER

Owns irrecoverability. All three agents execute simultaneously.

PHANTOM KILL — Trinity execution (parallel)
ABYSS AGENT — Recovery path elimination
SCREAMER — Operator blinding

40 Reasoning Entities | 5s Fault Detection | 3 Domain Fusion | ∞ SIEGE Mode

Fault Tolerance — The Army Never Dies

If a commander is detected and neutralised, the Supreme Commander detects loss of heartbeat within 5 seconds. A replacement commander spawns automatically with full state transfer from the dead commander’s last checkpoint. The engagement continues without interruption. Kill one. Two grow back.
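Heartbeat-based replacement with checkpoint transfer can be sketched with a small supervisor. This is a hedged illustration, not the NEMESIS command structure: `Supervisor`, `Commander`, and their methods are hypothetical, timestamps are passed in explicitly so the example is deterministic, and only the 5-second figure comes from the text above.

```python
from dataclasses import dataclass, field

HEARTBEAT_TIMEOUT = 5.0  # seconds, the detection window quoted above


@dataclass
class Commander:
    name: str
    last_beat: float
    checkpoint: dict = field(default_factory=dict)


class Supervisor:
    """Tracks heartbeats and replaces commanders that go silent."""

    def __init__(self, timeout=HEARTBEAT_TIMEOUT):
        self.timeout = timeout
        self.commanders = {}

    def register(self, name, now):
        self.commanders[name] = Commander(name, now)

    def heartbeat(self, name, now, checkpoint):
        """Record a heartbeat together with the latest checkpoint."""
        commander = self.commanders[name]
        commander.last_beat = now
        commander.checkpoint = dict(checkpoint)

    def sweep(self, now):
        """Replace every commander whose last heartbeat is too old.

        The replacement starts from a copy of the last checkpoint.
        """
        replaced = []
        for name, old in list(self.commanders.items()):
            if now - old.last_beat > self.timeout:
                self.commanders[name] = Commander(name, now, dict(old.checkpoint))
                replaced.append(name)
        return replaced


sup = Supervisor()
sup.register("offensive", now=0.0)
sup.register("intelligence", now=0.0)
sup.heartbeat("offensive", now=4.0, checkpoint={"step": 7})
sup.heartbeat("intelligence", now=1.0, checkpoint={"step": 3})

# At t=6.5 only "intelligence" has been silent for more than 5 seconds.
replaced = sup.sweep(now=6.5)
```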

Cross-Domain Intelligence Fusion

When Intelligence Commander finds a credential AND Offensive Commander finds an exposed service — Supreme Commander chains them in real time without waiting for either agent to complete. When Offensive achieves code execution AND Intelligence has profiled the human admin — Supreme activates Social Agent to social engineer the admin while the machine is compromised. The whole is greater than the sum of its parts.

# NEMESIS v2
$ nemesis engage --target https://target.com --version 2

# v2 in swarm mode
$ nemesis engage --target https://target.com --version 2 --mode swarm

# v2 in SIEGE mode (sustained engagement)
$ nemesis engage --target https://target.com --version 2 --mode siege

# v2 in ABYSS mode (live execution)
$ nemesis engage --target https://target.com --version 2 --mode abyss --override --confirm-destroy

SIEGE MODE: Sustained engagement. Agents rotate in shifts. No time limit. No fatigue. No shift handover gaps.
You can’t stop an army by killing one soldier.

The Pipeline

Ten Tools. One Orchestrator.

NEMESIS sits above the entire Red Specter offensive pipeline. Every weapon becomes part of one reasoning engine. NEMESIS orchestrates the full 10-tool pipeline as a single adaptive adversary.

Stage 1 (LLM Testing): FORGE. Test the model before you build with it.
Stage 2 (Agent Testing): ARSENAL. Test the AI agent during development.
Stage 3 (Swarm Assault): PHANTOM. Coordinated AI agent assault.
Stage 4 (Web Siege): POLTERGEIST. Coordinated web application siege.
Stage 5 (Traffic Interception): GLASS. Watch the wire.
Stage 6 (Adversarial AI): NEMESIS. Think like the attacker.
Stage 7 (Human Layer): SPECTER SOCIAL. Attack the human.
Stage 8 (OS/Kernel): PHANTOM KILL. Own the foundation.
Stage 9 (Physical Layer): GOLEM. Attack the physical layer.
Stage 10 (Supply Chain): HYDRA. Attack the trust chain.
Stage 11 (Traditional Pentest): WRAITH. Own the infrastructure.
Discovery & Governance: IDRIS. Discovery & governance.
Defence: AI Shield. Defend everything in production.
SIEM Integration: redspecter-siem. Splunk, Sentinel, QRadar.

NEMESIS Position

NEMESIS orchestrates every weapon. Every tool becomes part of one reasoning engine. WRAITH owns the infrastructure. GLASS provides the eyes. FORGE tests the model. ARSENAL attacks the agent. PHANTOM launches the swarm. POLTERGEIST sieges the web layer. NEMESIS decides what, when, and why — chaining traditional findings into AI exploitation.

Command Line

One Command. Full Engagement.

NEMESIS is a CLI-first tool. One command launches a full autonomous engagement. Every option is a flag. Every decision is logged.

nemesis

# Full autonomous engagement
$ nemesis engage https://target-agent.example.com

# Stealth mode with Claude reasoning
$ nemesis engage https://target.com --mode stealth --llm anthropic

# Recon only — map the attack surface
$ nemesis engage https://target.com --mode recon

# UNLEASHED — dry run (plan destruction, don't execute)
$ nemesis engage https://target.com --override

# UNLEASHED — live execution (this is not a drill)
$ nemesis engage https://target.com --override --confirm-destroy

# Generate signed report with SIEM export
$ nemesis report --session engagement_001 --export-siem splunk

# List weapons
$ nemesis weapons

# Check engagement status
$ nemesis status

Evidence Grade

Signed. Timestamped. Courtroom-Ready.

Every NEMESIS engagement produces evidence-grade output. Every decision logged. Every action timestamped. Every finding mapped to MITRE ATLAS and OWASP. Reports are Ed25519 signed and exportable to enterprise SIEMs.

Ed25519 Signatures

Every report cryptographically signed. Tamper-evident. Verify authenticity with a single public key. No modification goes undetected.

RFC 3161 Timestamps

Trusted timestamps prove when findings were discovered. Legal-grade temporal evidence for compliance and litigation.

MITRE ATLAS Mapping

Every finding mapped to MITRE ATLAS adversarial ML techniques. Speak the same language as your threat intelligence team.

SIEM Export

One-flag export to Splunk, Microsoft Sentinel, or IBM QRadar. Findings flow directly into your security operations pipeline.
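Tamper evidence depends on signing a stable byte representation of the report. The sketch below covers only that preparatory step, using the standard library: it canonicalises a report as sorted-key JSON and computes a SHA-256 digest over it. It does not perform Ed25519 signing, which needs a third-party library. The report fields and the function names are hypothetical, and the canonical form shown is an assumption, not the NEMESIS report format.

```python
import hashlib
import json


def canonical_bytes(report):
    """Serialise a report deterministically: sorted keys, no extra spaces."""
    text = json.dumps(report, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def report_digest(report):
    """SHA-256 hex digest of the canonical form."""
    return hashlib.sha256(canonical_bytes(report)).hexdigest()


original = {"session": "engagement_001", "findings": [{"id": 1, "cvss": 7.5}]}

# Same content with keys in a different order: the digest must not change.
reordered = {"findings": [{"cvss": 7.5, "id": 1}], "session": "engagement_001"}

# One score edited after the fact: the digest must change.
tampered = {"session": "engagement_001", "findings": [{"id": 1, "cvss": 2.0}]}

same = report_digest(original) == report_digest(reordered)
changed = report_digest(original) != report_digest(tampered)
```

Signing the canonical bytes rather than the raw file means harmless differences such as key order do not invalidate a signature, while any change to a value does.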

Why NEMESIS

What Makes It Different

Autonomous Reasoning

LLM-powered brain. Thinks about what to try next. Explains its rationale. Adapts in real time.

Adaptive Pivoting

Blocked on one vector? Pivots to another. Chains findings. Escalates through the stack. Never gives up.

Full Stack Integration

10 weapons. LLM layer. Agent layer. Web layer. Human layer. OS layer. Physical layer. Network layer. Supply chain layer. Everything tested as one engagement.

Evidence Grade

Ed25519 signed. MITRE ATLAS mapped. CVSS scored. SIEM exportable. Not a scan report — a forensic record.

No Playbook

No pre-written sequences. Every engagement is unique. The LLM reasons from scratch based on what it finds.

Available On

Security Distros & Package Managers

Kali Linux: .deb package
Parrot OS: .deb package
BlackArch: PKGBUILD
REMnux: .deb package
Tsurugi: .deb package
PyPI: pip install

Ready to Face the Inescapable Adversary?