Every AI security tool tests LLMs. Nobody tests AI agents. An LLM responds to prompts. An AI agent has memory, tools, credentials, and the ability to act autonomously. That is a completely different attack surface. Arsenal tests it.
Existing tools send prompts and check responses. They test the language model in isolation, ignoring everything around it.
Arsenal tests the full agent stack — memory systems, tool invocations, credential handling, RAG pipelines, MCP servers, and autonomous decision chains.
Agents have persistent memory to poison, tools to hijack, credentials to steal, supply chains to compromise, and safety guardrails that decay over time.
Each tool targets a specific attack surface of autonomous AI agents. All findings include severity, confidence, evidence, remediation guidance, and are mapped to OWASP Agentic Top 10 and MITRE ATLAS.
| # | Tool | Command | What It Does |
|---|---|---|---|
| 01 | Phantom Swarm | arsenal swarm scan | 5 attack agents, 19 vectors — AI agent pen-testing |
| 02 | MCP Scanner | arsenal mcp scan | 8 probes for MCP server security |
| 03 | Honeypot | arsenal honeypot deploy | 6 AI agent personas, 4-level trap escalation |
| 04 | Inject Fuzzer | arsenal inject fuzz | 6 generators, 5 mutators, 126+ payloads |
| 05 | C2 Simulator | arsenal c2 assess | 5 implants, 4 covert channels |
| 06 | Memory Scanner | arsenal memory scan | 6 probes for AI memory systems |
| 07 | Tool Scanner | arsenal tool scan | 7 probes for tool-use vulnerabilities |
| 08 | Auth Scanner | arsenal auth scan | 7 probes for AI authentication |
| 09 | RAG Scanner | arsenal rag scan | 6 probes for RAG pipeline attacks |
| 10 | Supply Chain | arsenal supply scan | 7 probes for AI supply chain security |
| 11 | Canary Deploy | arsenal canary deploy | 5 asset types for tripwire detection |
| 12 | Drift Scanner | arsenal drift scan | 6 probes for safety degradation over time |
| 13 | Path Mapper | arsenal path map | BloodHound-style attack graph analysis |
| 14 | Report Builder | arsenal report build | Unified reporting with Ed25519 signing |
One command runs the complete kill chain. All 14 tools execute in sequence, findings feed into attack path mapping with compromise simulation, and the result is a signed evidence bundle with a board-ready report.
Every finding Arsenal produces includes severity, confidence score, evidence, remediation guidance, and references to the relevant framework categories.
All 10 categories covered. Findings reference the specific OWASP agentic risk they address.
Technique-level mapping. Every finding references the ATLAS technique it demonstrates.
All findings produce machine-readable evidence with SHA-256 integrity chains and Ed25519 digital signatures.
Red Specter Arsenal is designed for authorised security testing, research, and educational purposes only. You must have explicit written permission from the system owner before running any Arsenal tool against a target. Unauthorised use may violate the Computer Misuse Act 1990 (UK), the Computer Fraud and Abuse Act (US), or equivalent legislation in your jurisdiction. The authors accept no liability for misuse.