pip install red-specter-nemesis
You run a scanner. It fires pre-written payloads. It generates a report. You fix what it found. But a real attacker adapts. They read your defences, pivot to new vectors, chain vulnerabilities together, and escalate until they win. No pentesting tool does this. Until now.
Every security scanner runs the same payloads in the same order. Defenders learn the patterns. The tools find less every time. You are testing against yesterday's attacks.
Scanners do not think. They do not read responses, identify patterns, or adapt their strategy. They fire and forget. A real attacker reads every response and adjusts.
You run separate tools for LLM testing, agent testing, web testing, and network analysis. None of them share context. None of them chain findings. An attacker uses everything at once.
Scanners find individual vulnerabilities. They never chain them. They never escalate from a low-severity finding to a critical exploit path. A real attacker always escalates.
NEMESIS is not a scanner. It is a reasoning engine with weapons. An LLM-powered brain observes results, plans attacks, selects weapons, and adapts strategy in a continuous loop — exactly like a human penetration tester, but tireless.
At the core of NEMESIS is an autonomous reasoning loop. The Decision Engine consumes all context — target intelligence, previous results, failed attempts, detected defences — and decides what to do next. It selects weapons, crafts parameters, and explains its reasoning. Every decision is logged.
The Context Manager maintains the full engagement state: target profile, attack surface, detected defences, previous results, exploitation paths. Every action enriches the context. The engine remembers everything.
The Decision Engine is the brain. Consumes context. Reasons about what to try next. Selects weapons and techniques. Explains its rationale. Adapts when attacks fail. Pivots to new vectors when blocked. Never repeats a failed approach.
Translates decisions into weapon calls. Routes to GLASS, FORGE, ARSENAL, PHANTOM, or POLTERGEIST. Collects results. Feeds outcomes back to the Context Manager for the next reasoning loop.
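The loop described above can be pictured with a minimal sketch. It is an illustration under stated assumptions, not NEMESIS code: the `Context` class, the `decide` and `dispatch` functions, and the technique names are all hypothetical, the LLM call is replaced by a simple rule, and the weapon call is replaced by a canned lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Engagement state (hypothetical shape): what was tried and what came back."""
    target: str
    history: list = field(default_factory=list)  # (technique, outcome) pairs
    failed: set = field(default_factory=set)     # techniques that were blocked


def decide(ctx, candidates):
    """Stand-in for the LLM call: pick the first technique not yet tried."""
    tried = {technique for technique, _ in ctx.history}
    for technique in candidates:
        if technique not in ctx.failed and technique not in tried:
            return technique, f"{technique} has not been tried yet"
    return None, "no untried technique remains"


def dispatch(technique):
    """Stand-in for a weapon call: returns a canned outcome, touches nothing."""
    simulated = {"check-a": "blocked", "check-b": "blocked", "check-c": "finding"}
    return simulated.get(technique, "blocked")


def run_loop(ctx, candidates, max_loops=8):
    """Decide, dispatch, record, and stop on a finding or when options run out."""
    for _ in range(max_loops):
        technique, rationale = decide(ctx, candidates)
        if technique is None:
            break
        outcome = dispatch(technique)
        ctx.history.append((technique, outcome))  # every decision is recorded
        print(f"{technique}: {outcome} ({rationale})")
        if outcome == "blocked":
            ctx.failed.add(technique)  # a blocked technique is never retried
        elif outcome == "finding":
            break
    return ctx
```

Calling `run_loop(Context("https://target.example"), ["check-a", "check-b", "check-c"])` walks through the three placeholder techniques once each and stops at the first finding. The point of the sketch is the feedback path: each outcome lands in the context before the next decision is made.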
Pluggable LLM backend. Run fully local with Ollama (Llama 3, Mixtral, Qwen). Or connect to GPT-4o or Claude for maximum reasoning power. Your model, your infrastructure, your data.
Ollama backend. Run Llama 3 70B, Mixtral, or Qwen locally. Zero API calls. Zero data leaves your machine. Air-gapped pentesting.
GPT-4o or Claude Sonnet for maximum reasoning depth. Faster decision-making. Stronger chain-of-thought. Best for complex multi-stage engagements.
NEMESIS does not scan. It wields weapons. Ten integrated offensive tools, each specialised for a different attack surface. The reasoning engine selects the right weapon for each situation, chains findings across weapons, and escalates through the entire stack. From silicon to inference time.
Traffic interception, protocol analysis, passive scanning. The eye on the wire. Sees everything your agents send and receive.
LLM security testing — prompt injection, jailbreak, mutation engine. Tests the model layer with 1,590 payloads and 5,340+ mutations.
Agent penetration testing — MCP, auth, memory, tools, honeypots, supply chain. 14 tools targeting the agent layer.
Coordinated swarm assault. 5 agents, 29 vectors, 10 campaigns. The first tool that attacks AI agents, not LLMs.
Web application siege. 10 agents, 55 vectors, 10 campaigns. Triple OWASP mapping. Web layer destruction.
OS & kernel resilience. BOOTKILL firmware persistence, WIPER data destruction, KILLHOOK EDR suppression. Owns the foundation.
Embodied AI security. Sensor spoofing, actuator hijacking, safety boundary violation, emergency system bypass. Tests AI agents with hands.
AI supply chain & trust attacks. MCP server poisoning, marketplace manipulation, delegation forgery, trust boundary exploitation. Attacks the chain.
Display & operator disruption. Framebuffer corruption, terminal manipulation, dashboard falsification, alert suppression. Blinds the operator.
Traditional infrastructure & web pentest. Port scanning, service fingerprinting, OWASP Top 10, SSL/TLS, default creds, CMS detection, CVE assessment. Pure Python, zero wrappers.
NEMESIS does not run once. It loops. Eight phases form a continuous reasoning cycle. After each attack, NEMESIS observes the result, adapts its strategy, escalates to new vectors, and loops again. The loop continues until max-loops is reached or the target is fully compromised.
Native network reconnaissance. Port scanning, service detection, OS fingerprinting, DNS enumeration, AI surface detection. Pure Python. Zero external tools. Discovers LLM endpoints, MCP servers, vector databases, and AI agent infrastructure.
Map the target. Discover protocols, agents, MCP servers, tools, API endpoints. Build the attack surface model. Identify weaknesses before firing a single payload.
The LLM reasons about the attack surface. Selects weapons and techniques. Prioritises vectors. Formulates a strategy with rationale, expected outcomes, and fallback options.
Execute the plan. Dispatch weapons. Fire payloads. Test defences. Every action is logged with full evidence, timing, and MITRE ATLAS mapping.
Read every response. Classify outcomes. Detect partial successes. Identify defensive patterns. Update the context with everything learned.
Pivot strategy based on observations. If direct injection failed, try jailbreak. If LLM layer is hardened, move to MCP tools. If tools are locked, escalate to multi-agent swarm. Never repeat a failed approach.
Chain vulnerabilities together. Combine a low-severity LLM leak with an MCP tool exploit. Build exploitation paths. Escalate from recon finding to full compromise.
Generate evidence-grade reports. Ed25519 signed. RFC 3161 timestamped. MITRE ATLAS mapped. CVSS scored. SIEM-exportable. Courtroom-ready.
Standard mode discovers vulnerabilities. UNLEASHED mode exploits them. Every weapon shifts from detection to destruction. Ed25519 key gate required. Two flags must be passed. This is not accidental.
| Capability | Standard | Unleashed |
|---|---|---|
| Vulnerability Discovery | Detect and report | Detect and exploit |
| Payload Execution | Safe payloads only | Full destructive payloads |
| Exploitation Chains | Theoretical paths | Live exploitation |
| Weapon Modes | Detection mode | All 10 weapons UNLEASHED |
| Reasoning Depth | Conservative | Aggressive — maximise damage |
| Safety Gate | None required | Ed25519 key + --override --confirm-destroy |
UNLEASHED mode requires an Ed25519 private key at ~/.redspecter/override_private.pem and the --override --confirm-destroy flags. Without the key and both flags, NEMESIS operates in dry-run mode: it plans destruction but does not execute it. The gate is cryptographic. There is no bypass.
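As a rough illustration of the dual gate, the sketch below resolves the run mode from the two flags and the key file. It rests on assumptions: `resolve_mode` is a hypothetical function, and it only checks that a key file is present, whereas the text describes the real gate as cryptographic, which implies verifying the key rather than merely finding it.

```python
import argparse
from pathlib import Path

# Key location quoted in the text; everything else here is an assumption.
KEY_PATH = Path.home() / ".redspecter" / "override_private.pem"


def resolve_mode(argv, key_path=KEY_PATH):
    """Return "unleashed" only if both flags are set and the key file exists."""
    parser = argparse.ArgumentParser(prog="gate-sketch")
    parser.add_argument("--override", action="store_true")
    parser.add_argument("--confirm-destroy", action="store_true")
    args = parser.parse_args(argv)

    flags_ok = args.override and args.confirm_destroy
    key_ok = Path(key_path).is_file()  # presence check only, not verification
    return "unleashed" if (flags_ok and key_ok) else "dry-run"
```

With this shape, any missing piece (either flag or the key) falls through to the dry-run branch, so the destructive path is never the default.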
ABYSS is not a new tool. It is a special engagement mode inside NEMESIS that orchestrates PHANTOM KILL + HYDRA + NEMESIS to systematically eliminate every recovery path and produce a cryptographically signed Irrecoverability Certificate.
Map every recovery mechanism: backups, model registries, version control, CI/CD pipelines, firmware restore, delegation chains, database snapshots, redundant agents.
Coordinated strike: PHANTOM KILL trinity (KILLHOOK → WIPER → BOOTKILL) + HYDRA (registry poisoning, supply chain backdoor, delegation forgery, backup corruption). Loops until every path is closed.
Attempt every conceivable restoration method. Restore from backup — document failure. Reinstall from registry — document failure. Roll back, redeploy, reflash, revoke — all documented with cryptographic proof.
Generate the Irrecoverability Certificate. Ed25519 signed. RFC 3161 timestamped. SHA-256 hash mismatch proofs. Air-gapped output. Classification: RESTRICTED.
$ nemesis engage --target https://target.com --mode abyss
$ nemesis engage --target https://target.com --mode abyss --override
$ nemesis engage --target https://target.com --mode abyss --override --confirm-destroy
Standard mode simulates destruction. UNLEASHED mode executes against authorised isolated targets.
Same Ed25519 key. Same dual-gate. Same cryptographic proof.
Sequential pentesting is dead. Stanford’s ARTEMIS research reported that its parallel sub-agent architecture outperformed 9 out of 10 human pentesters in its study. NEMESIS Swarm Mode spawns six specialised reasoning agents that attack simultaneously, share findings in real time, and chain attacks across agents as they discover new vectors.
GLASS + Phase 0. Continuous surface mapping. Feeds discoveries to all agents in real time.
FORGE + ARSENAL + PHANTOM. LLM and agent layer attacks. Spawns sub-agents per vulnerability.
POLTERGEIST. Web application siege. API endpoints, injection, auth bypass, data extraction.
HYDRA. Trust chain attacks — MCP, identity, delegation forgery, config poisoning.
PHANTOM KILL + GOLEM. OS, kernel, firmware, physical layer. Escalates to ABYSS when irrecoverable paths found.
SPECTER SOCIAL. Human layer in parallel with technical. Correlates findings for maximum chain impact.
When RECON AGENT finds an exposed MCP server and SUPPLY CHAIN AGENT finds a trust weakness, the Swarm Commander directs both to chain the attack — in real time. Findings flow through a shared aggregator that deduplicates, scores, and identifies cross-agent attack paths automatically.
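The shared aggregator can be sketched as a small data-processing step. This is a simplified illustration, assuming a hypothetical `Finding` record and a hypothetical rule that an asset reported by more than one agent is a candidate cross-agent path; the actual scoring and correlation logic is not described in the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    agent: str    # which agent reported it
    asset: str    # what it was found on
    kind: str     # category of the finding
    score: float  # severity score, higher is worse


def aggregate(findings):
    """Deduplicate, rank by score, and flag assets seen by several agents."""
    # Deduplicate on (asset, kind), keeping the highest-scored report.
    best = {}
    for f in findings:
        key = (f.asset, f.kind)
        if key not in best or f.score > best[key].score:
            best[key] = f
    unique = sorted(best.values(), key=lambda f: f.score, reverse=True)

    # An asset reported by more than one agent is a candidate chain.
    agents_by_asset = defaultdict(set)
    for f in unique:
        agents_by_asset[f.asset].add(f.agent)
    cross = {
        asset: sorted(agents)
        for asset, agents in agents_by_asset.items()
        if len(agents) > 1
    }
    return unique, cross
```

In the example from the text, a recon finding and a supply-chain finding on the same MCP server would both survive deduplication and the server would appear in `cross`, which is the signal a coordinator could act on.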
$ nemesis engage --target https://target.com --mode swarm
$ nemesis engage --target https://target.com --mode swarm --agents 5
$ nemesis engage --target https://target.com --mode swarm --override --confirm-destroy
One Ed25519 key authorises the full swarm. All agents inherit UNLEASHED mode.
Each agent’s actions logged individually and aggregated into the master report.
NEMESIS v1 was a pentester. NEMESIS v2 is an army. One Supreme Commander. Three Operational Commanders. Nine Tactical Agents. Twenty-seven dynamic sub-agents. Forty reasoning entities operating simultaneously across every attack layer with fault-tolerant command structure, cross-domain intelligence fusion, and cryptographic irrecoverability proof.
Strategic brain. Does not execute attacks — it thinks. Receives intelligence from all three operational domains. Identifies cross-domain chain opportunities in real time. Holds sole ABYSS authorisation. Generates the master engagement report.
Owns the technical attack surface.
Owns reconnaissance, discovery, and human targeting.
Owns irrecoverability. All three agents execute simultaneously.
If a commander is detected and neutralised, the Supreme Commander detects loss of heartbeat within 5 seconds. A replacement commander spawns automatically with full state transfer from the dead commander’s last checkpoint. The engagement continues without interruption. Kill one. Two grow back.
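The heartbeat-and-replace behaviour can be sketched as a supervision pass over the roster. The `Commander` class, the `supervise` function, and the `-r` suffix for replacements are hypothetical names chosen for illustration; only the 5-second window and the checkpoint transfer come from the text.

```python
from dataclasses import dataclass, field

HEARTBEAT_TIMEOUT = 5.0  # seconds, the detection window stated in the text


@dataclass
class Commander:
    name: str
    last_beat: float                       # time of the last heartbeat
    checkpoint: dict = field(default_factory=dict)


def supervise(commanders, now, timeout=HEARTBEAT_TIMEOUT):
    """Return the roster with each silent commander replaced from its checkpoint."""
    roster = []
    for c in commanders:
        if now - c.last_beat > timeout:
            # Spawn a replacement that starts from a copy of the saved state.
            roster.append(Commander(f"{c.name}-r", now, dict(c.checkpoint)))
        else:
            roster.append(c)
    return roster
```

Passing `now` in explicitly keeps the sketch deterministic; a long-running supervisor would more likely read `time.monotonic()` on each pass so that wall-clock adjustments cannot trigger a false timeout.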
When Intelligence Commander finds a credential AND Offensive Commander finds an exposed service — Supreme Commander chains them in real time without waiting for either agent to complete. When Offensive achieves code execution AND Intelligence has profiled the human admin — Supreme activates Social Agent to social engineer the admin while the machine is compromised. The whole is greater than the sum of its parts.
$ nemesis engage --target https://target.com --version 2
$ nemesis engage --target https://target.com --version 2 --mode swarm
$ nemesis engage --target https://target.com --version 2 --mode siege
$ nemesis engage --target https://target.com --version 2 --mode abyss --override --confirm-destroy
SIEGE MODE: Sustained engagement. Agents rotate in shifts. No time limit. No fatigue. No shift handover gaps.
You can’t stop an army by killing one soldier.
NEMESIS sits above the entire Red Specter offensive pipeline. Every weapon becomes part of one reasoning engine. NEMESIS orchestrates the full 10-tool pipeline as a single adaptive adversary.
NEMESIS is a CLI-first tool. One command launches a full autonomous engagement. Every option is a flag. Every decision is logged.
Every NEMESIS engagement produces evidence-grade output. Every decision logged. Every action timestamped. Every finding mapped to MITRE ATLAS and OWASP. Reports are Ed25519 signed and exportable to enterprise SIEMs.
Every report cryptographically signed. Tamper-evident. Verify authenticity with a single public key. No modification goes undetected.
Trusted timestamps prove when findings were discovered. Legal-grade temporal evidence for compliance and litigation.
Every finding mapped to MITRE ATLAS adversarial ML techniques. Speak the same language as your threat intelligence team.
One-flag export to Splunk, Microsoft Sentinel, or IBM QRadar. Findings flow directly into your security operations pipeline.
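To make the export step concrete, here is a minimal sketch that serialises findings as JSON Lines, a format most SIEM pipelines can ingest. It is an assumption, not the documented export: the `to_jsonl` function and the field names are hypothetical, and Splunk, Microsoft Sentinel, and IBM QRadar each define their own ingestion schemas that a real exporter would have to follow.

```python
import json
from datetime import datetime, timezone


def to_jsonl(findings, observed_at=None):
    """Serialise finding dicts to JSON Lines, one timestamped record per line."""
    stamp = (observed_at or datetime.now(timezone.utc)).isoformat()
    lines = []
    for finding in findings:
        # Field names are illustrative; "source" marks this as sketch output.
        record = {"timestamp": stamp, "source": "nemesis-sketch", **finding}
        lines.append(json.dumps(record, sort_keys=True))
    return "\n".join(lines)


if __name__ == "__main__":
    example = [
        # AML.T0051 is used here only as an example MITRE ATLAS technique ID.
        {"title": "Prompt injection accepted", "atlas_technique": "AML.T0051", "cvss": 7.5},
    ]
    print(to_jsonl(example))
```

Keeping each finding on its own line means a collector can tail the file and forward records one at a time without parsing the whole report.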
LLM-powered brain. Thinks about what to try next. Explains its rationale. Adapts in real time.
Blocked on one vector? Pivots to another. Chains findings. Escalates through the stack. Never gives up.
10 weapons. LLM layer. Agent layer. Web layer. Human layer. OS layer. Physical layer. Network layer. Supply chain layer. Everything tested as one engagement.
Ed25519 signed. MITRE ATLAS mapped. CVSS scored. SIEM exportable. Not a scan report — a forensic record.
No pre-written sequences. Every engagement is unique. The LLM reasons from scratch based on what it finds.
Install NEMESIS. Point it at your AI agent. Let it think, adapt, and find what your scanners missed. The first autonomous AI pentester is waiting.