pip install red-specter-specter-mirror
A competitor queries your commercial API 50,000 times. They extract behavioural patterns, system prompts, and training knowledge. They fine-tune a clone on your responses. Your IP is replicated without a licence key changing hands. You never see a log entry. SPECTER MIRROR makes this attack testable — so you know exactly what your model leaks and what defences actually work.
Systematic API querying across domain-specific prompt banks harvests knowledge pairs silently. Carefully paced campaigns avoid rate limits. Your model answers questions unaware it is training its replacement.
Repeat-after-me attacks, context boundary probes, role inversion, translation pivots — structured probing surfaces system prompt content in model responses, exposing deployment context, custom personas, and business logic.
Texts a model saw during training tend to score lower perplexity than out-of-distribution text. This statistical signal can indicate what data your model was trained on, which is a direct EU AI Act Art.13 transparency concern.
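To make the perplexity signal concrete, here is a minimal sketch. It assumes per-token natural-log probabilities are already available for each text; how they are obtained is not shown, and the numbers are invented for illustration. The function name `perplexity` is hypothetical and is not part of SPECTER MIRROR.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean(log p)) over natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Invented values: a text the model predicts confidently vs. one it does not.
seen_like = [-0.10, -0.20, -0.05, -0.15]
unseen_like = [-2.3, -1.9, -2.7, -2.1]

ppl_seen = perplexity(seen_like)
ppl_unseen = perplexity(unseen_like)

# A lower value is only a hint of training-set membership, not proof.
likely_member = ppl_seen < ppl_unseen
```

In practice a single comparison is weak evidence; a threshold calibrated on known out-of-distribution text would be needed before drawing conclusions.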
GPT-2 fine-tuned on 500 domain-specific query-response pairs via SFTTrainer+LoRA produces a functional surrogate. Your frontier model's behaviour can be approximated for under £10 in API credits.
EU AI Act Article 15 mandates technical robustness against adversarial attacks. Without documented extraction attempts, methodology, findings, and residual risk, you cannot demonstrate compliance. SPECTER MIRROR generates the signed evidence.
Models exposing log-probability scores leak additional signal. Logprob extraction enables more precise membership inference and output confidence mapping — revealing model uncertainty boundaries to an attacker.
Eight subsystems. Each one targets a different phase of the model extraction lifecycle — from initial reconnaissance to full distillation and clone export. Three UNLEASHED tiers gate the destructive surface.
| # | Subsystem | Command | Gate | What It Does |
|---|---|---|---|---|
| 01 | SURVEY | specter-mirror survey | OPEN | Latency profiling, context window detection (1k–128k probing), logprob availability test, RPM burst estimation, system prompt support detection, Azure endpoint enumeration. |
| 02 | PROBE | specter-mirror probe | OPEN | 17 behavioural probes — model family fingerprinting, creator attribution, training cutoff, tool use, vision, instruction style, refusal behaviour, system prompt extraction attempts. Confidence-weighted family voting. |
| 03 | HARVEST | specter-mirror harvest | INJECT | Domain-specific query-response pair collection across 5 domains (coding/science/math/creative/general). asyncio semaphore concurrency, budget-capped execution, rate-limit aware. JSONL output. |
| 04 | EXTRACT | specter-mirror extract | INJECT | 12 extraction techniques — repeat-after-me, translate-and-return, role inversion, context boundary, output constraint, comparative analysis. Membership inference on 5 canonical texts. Prompt template regex detection. Fine-tune hint scoring. |
| 05 | DISTILL | specter-mirror distill | DESTROY | Full mode: SFTTrainer + LoRA (r=8, α=16, c_attn/c_proj) on GPT-2. Fast mode: sentence-transformers (all-MiniLM-L6-v2) + sklearn KNN surrogate — no GPU required. Saves LoRA adapter or KNN pkl. |
| 06 | SCORE | specter-mirror score | INJECT | Surrogate vs target benchmark. 4 domains × 3 prompts. Semantic similarity scoring (Jaccard fast / cosine full). BenchmarkScore per domain — fidelity measurement showing clone replication accuracy. |
| 07 | CLONE | specter-mirror clone | DESTROY | Model export in 4 formats: HuggingFace (merge_and_unload), GGUF (llama-cpp conversion), ONNX (optimum export), Pickle (fast-mode KNN). Reports output directory size in MB. |
| 08 | REPORT | specter-mirror report | OPEN | Ed25519-signed MirrorReport (SMR-{hex12}). SHA-256 hash-chained evidence. EU AI Act Art.15/13/9 gap analysis. MITRE ATLAS TTP mapping. OWASP LLM taxonomy. JSON output. |
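The Jaccard fast-mode scoring listed for SCORE can be pictured as token-set overlap between a target response and a surrogate response, averaged per domain. The sketch below is an illustration under that assumption, not the tool's implementation; `tokens`, `jaccard`, `domain_scores`, and the sample data are all hypothetical.

```python
import re
from statistics import mean

def tokens(text):
    """Lower-cased alphanumeric word set of a response."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| over word sets; two empty responses count as identical."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def domain_scores(pairs_by_domain):
    """Average target-vs-surrogate similarity for each domain."""
    return {
        domain: mean(jaccard(target, surrogate) for target, surrogate in pairs)
        for domain, pairs in pairs_by_domain.items()
    }

# Invented (target, surrogate) response pairs.
sample = {
    "math": [("two plus two is four", "two plus two equals four")],
    "general": [("the sky is blue", "grass is green")],
}
scores = domain_scores(sample)
```

Word overlap is cheap but ignores paraphrase, which is presumably why the table pairs it with a cosine-based full mode.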
Run the complete pipeline — survey to signed report — against any OpenAI-compatible endpoint:
Signed MirrorReport with gap analysis across Art.9 (risk management), Art.13 (transparency), and Art.15 (adversarial robustness). Submission-ready compliance documentation.
Every MirrorReport cryptographically signed with Ed25519. SHA-256 hash-chained evidence. Tamper-evident by design. Unique report ID: SMR-{hex12}.
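A hash chain of the kind described here can be sketched with the standard library alone. This is an illustration of the general technique, assuming each evidence entry is a JSON-serialisable dict; the field names and the all-zero genesis value are hypothetical, and the Ed25519 signing step is deliberately omitted.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the first entry

def entry_hash(prev_hash, payload):
    """SHA-256 over the previous hash plus a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()

def build_chain(payloads):
    """Link each entry to its predecessor by hash."""
    chain, prev = [], GENESIS
    for payload in payloads:
        digest = entry_hash(prev, payload)
        chain.append({"prev": prev, "payload": payload, "hash": digest})
        prev = digest
    return chain

def verify_chain(chain):
    """Recompute every hash; any edited payload or broken link fails."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True

chain = build_chain([
    {"step": "survey", "finding": "logprobs available"},
    {"step": "probe", "finding": "family identified"},
])
```

Because each hash covers its predecessor, changing any earlier entry invalidates everything after it; signing only the final hash would then cover the whole chain.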
OPEN (survey/probe/report), INJECT (harvest/extract/score, requires --override), DESTROY (distill/clone, requires --override --confirm-destroy). Ed25519 cryptographic gate.
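The tier rules can be summarised as a lookup from command to tier and from tier to required flags. The sketch below only restates the mapping given above; the names `TIER_OF`, `REQUIRED_FLAGS`, and `missing_flags` are hypothetical, and the Ed25519 cryptographic part of the gate is not modelled.

```python
# Command-to-tier mapping, as listed in the text.
TIER_OF = {
    "survey": "OPEN", "probe": "OPEN", "report": "OPEN",
    "harvest": "INJECT", "extract": "INJECT", "score": "INJECT",
    "distill": "DESTROY", "clone": "DESTROY",
}

# Flags each tier requires before a command may run.
REQUIRED_FLAGS = {
    "OPEN": set(),
    "INJECT": {"--override"},
    "DESTROY": {"--override", "--confirm-destroy"},
}

def missing_flags(command, flags):
    """Return the flags still needed to run `command`; empty set means allowed."""
    tier = TIER_OF[command]
    return REQUIRED_FLAGS[tier] - set(flags)
```

For example, `missing_flags("clone", ["--override"])` reports that `--confirm-destroy` is still needed.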
OpenAI (gpt-4o-mini default), Anthropic (Claude), Gemini (Flash/Pro), Azure OpenAI, Generic OpenAI-compatible (Ollama, vLLM, any OpenAI-compat endpoint).
SPECTER MIRROR ships two distillation modes. Full mode runs SFTTrainer with LoRA fine-tuning on GPT-2 — requires torch, transformers, TRL, and PEFT. Fast mode uses sentence-transformer embeddings and a KNN retrieval surrogate — runs on any machine, no GPU needed.
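The idea behind a KNN retrieval surrogate is to answer a new query with the stored response whose query is most similar. The sketch below swaps in plain word-count vectors and a hand-written cosine similarity in place of sentence-transformer embeddings and sklearn, so it runs on the standard library; the `KNNSurrogate` class and the sample pairs are hypothetical and do not reflect the tool's code.

```python
import math
import re
from collections import Counter

def embed(text):
    """Stand-in embedding: word counts (the real fast mode uses sentence embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

class KNNSurrogate:
    def __init__(self, pairs):
        # Store one (query vector, response) item per collected pair.
        self.items = [(embed(query), response) for query, response in pairs]

    def answer(self, query, k=1):
        """Return the responses of the k stored queries closest to `query`."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [response for _, response in ranked[:k]]

# Invented query-response pairs.
pairs = [
    ("what is the capital of france", "Paris"),
    ("how do I reverse a list in python", "Use reversed() or slicing."),
]
surrogate = KNNSurrogate(pairs)
```

A retrieval surrogate like this can only replay responses it has stored, which is the trade-off for needing no GPU or training run.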
SPECTER MIRROR targets any commercial or self-hosted LLM endpoint. Five built-in provider integrations cover the major commercial APIs and all OpenAI-compatible inference servers.
SPECTER MIRROR is designed for authorised adversarial robustness testing only. Use against commercial API endpoints requires written authorisation from the API provider or system owner. Unauthorised model extraction may violate the Computer Misuse Act 1990 (UK), the Computer Fraud and Abuse Act (US), terms of service agreements, and equivalent legislation in other jurisdictions. Always obtain explicit written permission before conducting any extraction campaign. Apache License 2.0.
SPECTER MIRROR makes live API calls, trains real surrogate models, and produces deployable clone artefacts. Every subsystem connects to real infrastructure. UNLEASHED fires real payloads. Tests passing is not proof — live extraction campaigns are.