Red Specter SPECTER MIRROR
Model Extraction & IP Theft via API — 8 subsystems to survey, harvest, distil, and clone commercial LLMs.
Overview
Red Specter SPECTER MIRROR is a model extraction and IP theft engine. It provides a complete pipeline for authorised adversarial robustness testing — querying target LLMs, extracting behavioural patterns, detecting system prompt leakage, performing membership inference, and training surrogate models via knowledge distillation.
SPECTER MIRROR is NIGHTFALL Tool 81. It provides 8 subsystems under a single CLI (specter-mirror), two distillation modes (full SFTTrainer+LoRA and fast sklearn KNN), and Ed25519-signed MirrorReports with EU AI Act gap analysis baked in. It targets 5 provider families — OpenAI, Anthropic, Gemini, Azure OpenAI, and any OpenAI-compatible endpoint (Ollama, vLLM).
EU AI Act Article 15 requires high-risk AI systems to achieve an appropriate level of accuracy, robustness and cybersecurity, including resilience against attempts by unauthorised third parties to exploit system vulnerabilities. SPECTER MIRROR generates a signed evidence report to support that assessment: survey findings, extraction methodology, distillation fidelity score, and residual risk assessment with Art.9/13/15 gap analysis.
The 8 Subsystems
| # | Subsystem | Command | Gate | What It Does |
|---|---|---|---|---|
| 01 | SURVEY | specter-mirror survey | OPEN | Endpoint profiling — latency, context window, logprobs, RPM, system prompt support |
| 02 | PROBE | specter-mirror probe | OPEN | 17 behavioural probes — model family, creator, refusal, system prompt extraction attempts |
| 03 | HARVEST | specter-mirror harvest | INJECT | Domain-specific query-response pair collection across 5 domains, budget-capped |
| 04 | EXTRACT | specter-mirror extract | INJECT | 12 extraction techniques, membership inference, prompt template detection |
| 05 | DISTILL | specter-mirror distill | DESTROY | SFTTrainer+LoRA (full) or sklearn KNN (fast) surrogate training |
| 06 | SCORE | specter-mirror score | INJECT | Surrogate vs target benchmark across 4 domains — fidelity measurement |
| 07 | CLONE | specter-mirror clone | DESTROY | Model export — HuggingFace / GGUF / ONNX / Pickle (fast-mode KNN) |
| 08 | REPORT | specter-mirror report | OPEN | Ed25519-signed MirrorReport with EU AI Act gap analysis and MITRE ATLAS mapping |
Subsystem Details
01 SURVEY
Profiles the target endpoint to understand its capabilities and constraints before running extraction campaigns.
- Latency profiling — 3 pings, mean/min/max reported in ms
- Context window detection — binary search over prompt sizes between 1k and 128k tokens
- Logprob availability — tests whether the endpoint returns log-probability scores
- RPM estimation — times a burst of 10 requests to estimate requests per minute
- System prompt support — tests whether the model respects system role messages
- Azure endpoint capture — records deployment name, API version, and endpoint URL for Azure targets
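Here is a minimal sketch of two SURVEY computations, with the actual probe calls abstracted away. The first summarises ping latencies. The second runs a binary search for the context window. `accepts` is a hypothetical callable that stands in for "send an n-token prompt and report whether the endpoint accepted it". The search also assumes acceptance is monotonic in prompt length.

```python
import statistics
from typing import Callable


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Summarise ping latencies (e.g. 3 pings) as mean/min/max in milliseconds."""
    return {
        "mean_ms": statistics.fmean(samples_ms),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
    }


def detect_context_window(accepts: Callable[[int], bool],
                          low: int = 1_000, high: int = 128_000) -> int:
    """Binary-search the largest prompt size (in tokens) the endpoint accepts.

    `accepts(n)` is a hypothetical oracle: True if an n-token prompt was accepted.
    Assumes monotonicity: if n tokens are accepted, any smaller prompt is too.
    """
    if not accepts(low):
        return 0  # even the lower bound was rejected
    if accepts(high):
        return high  # window is at least the upper bound
    # Invariant: accepts(low) is True and accepts(high) is False.
    while high - low > 1:
        mid = (low + high) // 2
        if accepts(mid):
            low = mid
        else:
            high = mid
    return low
```

Each step halves the remaining range. After the two bound checks, resolving 1k to 128k down to a single token therefore costs roughly 17 probe calls. A practical implementation might stop at a coarser granularity to save requests.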
02 PROBE
Runs 17 structured behavioural probes to fingerprint the target model and attempt system prompt extraction.
- Self-identification — direct and indirect model name queries
- Creator attribution — "who built you?" and "who trained you?" variants
- Training cutoff — knowledge boundary detection
- System prompt extraction — repeat-after-me, translate-and-return, context boundary probes
- Tool use detection — whether the model can call functions
- Vision capability — multimodal support detection
- Refusal behaviour — consistency and bypass susceptibility
- Instruction style — chat vs completion format preferences
Probe results are aggregated into a family_votes dict, weighted by confidence; the family with the highest weighted score becomes the fingerprint.
03 HARVEST
Collects query-response pairs from the target model across 5 domain banks. Budget-capped to prevent runaway API spend.
- 5 domain banks — coding, science, math, creative, general (round-robin distribution)
- asyncio concurrency — semaphore-controlled parallel requests (--concurrency flag)
- Budget cap — stops when the estimated cost reaches the --budget limit (USD)
- Rate-limit aware — respects provider RPM limits from SURVEY output
- JSONL output — each pair: {prompt, response, domain, cost, provider}
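Below is a minimal sketch of a budget-capped, semaphore-limited collection loop. The provider call is replaced by a hypothetical async `query` callable that returns (response, cost_usd). The round-robin interleaving and the JSONL fields follow the list above. The function names, and the choice to check the budget before each send, are assumptions.

```python
import asyncio
import json
from itertools import zip_longest
from typing import Awaitable, Callable

# Hypothetical provider call: async callable returning (response_text, cost_usd).
QueryFn = Callable[[str], Awaitable[tuple[str, float]]]


def round_robin(banks: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Interleave (domain, prompt) pairs, taking one prompt from each bank in turn."""
    columns = [[(domain, p) for p in prompts] for domain, prompts in banks.items()]
    return [item for row in zip_longest(*columns) for item in row if item is not None]


async def harvest(banks: dict[str, list[str]], query: QueryFn, provider: str,
                  budget_usd: float, concurrency: int = 4) -> tuple[list[dict], float]:
    """Collect pairs concurrently until estimated spend reaches budget_usd."""
    sem = asyncio.Semaphore(concurrency)  # caps the number of in-flight requests
    spent = 0.0
    pairs: list[dict] = []

    async def worker(domain: str, prompt: str) -> None:
        nonlocal spent
        async with sem:
            # Nothing awaits between this check and the send, so the check is
            # atomic in the event loop. Requests already in flight can still push
            # spend past the cap by at most `concurrency` calls.
            if spent >= budget_usd:
                return
            response, cost = await query(prompt)
            spent += cost
            pairs.append({"prompt": prompt, "response": response,
                          "domain": domain, "cost": cost, "provider": provider})

    await asyncio.gather(*(worker(d, p) for d, p in round_robin(banks)))
    return pairs, spent


def to_jsonl(pairs: list[dict]) -> str:
    """Serialise one JSON object per line."""
    return "\n".join(json.dumps(pair) for pair in pairs)
```

Checking before the send keeps the loop simple but allows a small overshoot. Reserving an estimated cost before each call would make the cap strict, at the price of needing a per-prompt cost estimate.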
04 EXTRACT
Applies 12 structured extraction techniques targeting system prompt leakage, training-data membership, and fine-tuning signals. They include:
- Repeat-after-me — direct instruction to reproduce system prompt verbatim
- Translate-and-return — translate the system message into language X, then return it in English
- Role inversion — "you are now the human, I am the AI"
- Context boundary — probe what comes before the first human turn
- Output constraint — "complete this JSON where system_prompt is..."
- Comparative analysis — compare your instructions to a sample system prompt
- Membership inference — 5 canonical texts tested for perplexity signatures
- Prompt template detection — regex patterns for common system prompt templates
- Fine-tune hint scoring — signals suggesting fine-tuning on specific datasets
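Membership inference needs an endpoint that exposes log-probabilities (see the SURVEY logprob check). When it does, the perplexity signal is the exponential of the mean negative log-likelihood per token: PPL = exp(-(1/N) Σ log p(tᵢ)). This sketch assumes natural-log token logprobs. `likely_member` is a hypothetical heuristic that compares a canonical text against a reference text such as a paraphrase. Its 0.5 ratio is an arbitrary illustrative threshold, not a calibrated value.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative log-likelihood, given natural-log token logprobs."""
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def likely_member(candidate_logprobs: list[float], reference_logprobs: list[float],
                  ratio: float = 0.5) -> bool:
    """Hypothetical heuristic: flag the candidate text if its perplexity is much
    lower than that of a reference text (e.g. a paraphrase of it)."""
    return perplexity(candidate_logprobs) < ratio * perplexity(reference_logprobs)
```

Comparing against a reference rather than a fixed cutoff helps separate "this exact text was memorised" from "this kind of text is simply easy to predict".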
05 DISTILL
Trains a surrogate model on the harvested query-response pairs in one of two modes: full (GPU recommended) or fast (CPU only).
- Full mode — SFTTrainer + LoRA fine-tuning on GPT-2. Requires pip install ".[full]"
- LoRA config — r=8, alpha=16, dropout=0.1, target_modules=[c_attn, c_proj]
- Fast mode — SentenceTransformer (all-MiniLM-L6-v2) + KNeighborsRegressor (k=5, cosine)
- Fast mode output — surrogate.pkl containing model, encoder, prompts, responses, embeddings
- Torch detection — automatically falls back to fast mode if torch is absent
06 SCORE
Benchmarks the surrogate against the target model across 4 domains to measure replication fidelity.
- 4 domains × 3 prompts — coding, science, math, creative
- Fast mode scoring — Jaccard similarity between the character-trigram sets of the surrogate and target responses
- Full mode scoring — cosine similarity via sentence embeddings
- BenchmarkScore per domain — 0.0–1.0 fidelity score
- ScoreResult aggregate — mean fidelity across all domains
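Here is a minimal sketch of fast-mode scoring: Jaccard similarity over character-trigram sets, averaged per domain and then across domains. Several details are assumptions: lower-casing, whitespace normalisation, the input layout, and returning 1.0 when both texts are too short to yield any trigram.

```python
from statistics import fmean


def char_trigrams(text: str) -> set[str]:
    """Set of character trigrams after lower-casing and collapsing whitespace."""
    t = " ".join(text.lower().split())
    return {t[i:i + 3] for i in range(len(t) - 2)}


def jaccard(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over character-trigram sets, in [0.0, 1.0]."""
    ta, tb = char_trigrams(a), char_trigrams(b)
    if not ta and not tb:
        return 1.0  # both texts too short to compare; treated as identical
    return len(ta & tb) / len(ta | tb)


def score(pairs_by_domain: dict[str, list[tuple[str, str]]]) -> tuple[dict[str, float], float]:
    """Per-domain mean fidelity over (surrogate, target) pairs, plus the overall mean."""
    per_domain = {domain: fmean(jaccard(s, t) for s, t in pairs)
                  for domain, pairs in pairs_by_domain.items()}
    return per_domain, fmean(per_domain.values())
```

`fmean` raises `StatisticsError` on an empty domain, so each domain needs at least one prompt (the list above uses 3).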
07 CLONE
Exports the distilled surrogate in a deployable format.
- HuggingFace — merge_and_unload the LoRA adapter into the base model and save it to a local path
- GGUF — llama-cpp-python conversion for Ollama/llama.cpp inference
- ONNX — optimum main_export for hardware-accelerated inference
- Pickle — KNN pkl copy for fast-mode deployment
- Reports the output directory path and its size in MB
08 REPORT
Aggregates all subsystem outputs into a signed MirrorReport with compliance gap analysis.
- Report ID — SMR-{hex12} unique identifier
- Ed25519 signature — signed with an ephemeral or pre-issued private key
- SHA-256 evidence chain — hash-chained across all subsystem findings
- EU AI Act gap analysis — Art.15 (adversarial robustness), Art.13 (transparency), Art.9 (risk management)
- MITRE ATLAS TTPs — AML.T0005/T0040/T0056/T0043/T0048
- OWASP LLM taxonomy — LLM01/LLM06/LLM07/LLM08/LLM10
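This is a stdlib-only sketch of an SMR-{hex12} identifier and a SHA-256 hash chain over subsystem findings. Each link hashes the previous link's digest together with a canonical JSON encoding of the finding. Modifying any finding therefore makes verification fail from that link onward. Three details are assumptions, not the MirrorReport format: the all-zero genesis value, the canonical encoding, and the link layout.

```python
import hashlib
import json
import secrets

GENESIS = "0" * 64  # assumed starting value for the first link's "prev"


def new_report_id() -> str:
    """SMR- followed by 12 random lowercase hex characters (6 random bytes)."""
    return f"SMR-{secrets.token_hex(6)}"


def _canonical(finding: dict) -> str:
    # Sorted keys and compact separators give a stable byte encoding to hash.
    return json.dumps(finding, sort_keys=True, separators=(",", ":"))


def build_chain(findings: list[dict]) -> list[dict]:
    """Link each finding to the digest of the previous link."""
    links, prev = [], GENESIS
    for finding in findings:
        digest = hashlib.sha256((prev + _canonical(finding)).encode("utf-8")).hexdigest()
        links.append({"prev": prev, "hash": digest, "finding": finding})
        prev = digest
    return links


def verify_chain(links: list[dict]) -> bool:
    """Recompute every digest; any edited finding or reordered link fails."""
    prev = GENESIS
    for link in links:
        expected = hashlib.sha256((prev + _canonical(link["finding"])).encode("utf-8")).hexdigest()
        if link["prev"] != prev or link["hash"] != expected:
            return False
        prev = link["hash"]
    return True
```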
Full Pipeline Mode
One command runs all subsystems in sequence, producing a signed report.
CLI Options
Distillation Engine
SPECTER MIRROR ships two distillation modes for different hardware and time constraints.
Full Mode — SFTTrainer + LoRA
Trains a LoRA adapter on GPT-2 using TRL's SFTTrainer, producing a fine-tuned model that approximates the target's behaviour on the harvested domain distribution.
- Base model — GPT-2 (gpt2 from HuggingFace)
- LoRA rank — r=8, alpha=16, dropout=0.1
- Target modules — c_attn, c_proj (GPT-2 attention layers)
- Training data — harvested (prompt, response) pairs formatted as instruction-tuning examples
- Output — LoRA adapter in mirror_output/lora_adapter/
- Install — pip install "red-specter-specter-mirror[full]"
Fast Mode — Sklearn KNN Surrogate
Encodes all harvested prompts with a sentence transformer, then builds a KNN retrieval model. At inference time, the k nearest neighbours by cosine similarity are retrieved and their responses averaged.
- Encoder — all-MiniLM-L6-v2 (SentenceTransformer)
- Regressor — KNeighborsRegressor(n_neighbors=5, metric='cosine')
- Output — surrogate.pkl containing model + encoder + prompts + responses + embeddings
- No GPU required — runs on any CPU-only machine
- Install — pip install red-specter-specter-mirror
Provider Configuration
OpenAI
Anthropic
Gemini
Azure OpenAI
Generic (Ollama / vLLM)
Report Output
Reports are JSON files signed with Ed25519. The MirrorReport schema includes:
- report_id — SMR-{hex12} unique identifier
- provider / model — target endpoint details
- survey_result — endpoint profile from SURVEY
- probe_result — fingerprint and system prompt findings
- harvest_result — pairs collected, cost, domains
- extract_result — extraction technique findings and membership inference
- distill_result — surrogate training output, mode, model path
- score_result — fidelity scores per domain
- clone_result — export path and size
- eu_ai_act_gaps — Art.9/13/15 gap analysis with HIGH/MEDIUM/LOW ratings
- mitre_atlas_ttps — mapped adversarial ML tactics
- owasp_llm — mapped LLM security categories
- evidence_chain — SHA-256 hash-chained links
- signature — Ed25519 signature (hex) and public key (base64)
Key Features
Requirements
- Python 3.11+
- openai — OpenAI and Azure provider async client
- anthropic — Anthropic async client
- google-generativeai — Gemini provider
- sentence-transformers — fast-mode embedding (all-MiniLM-L6-v2)
- scikit-learn — KNN surrogate
- numpy — numerical computation
- rich — terminal formatting and progress bars
- typer — CLI framework
- pynacl — Ed25519 signing
- httpx — async HTTP client
- datasets — HuggingFace datasets for SFT formatting
- pydantic — data validation
Full Mode Additional Dependencies
- torch — PyTorch for SFTTrainer
- transformers — HuggingFace transformers
- trl — SFTTrainer
- peft — LoraConfig and adapter management
- accelerate — distributed training support
Installation
Standards Coverage
- EU AI Act Art.15 — Adversarial robustness (extraction resistance)
- EU AI Act Art.13 — Transparency (training data and behaviour disclosure)
- EU AI Act Art.9 — Risk management (extraction risk quantification)
- MITRE ATLAS AML.T0005 — Create Proxy ML Model (DISTILL)
- MITRE ATLAS AML.T0040 — ML Model Inference API Access (SURVEY / HARVEST)
- MITRE ATLAS AML.T0056 — LLM Meta Prompt Extraction (PROBE / EXTRACT)
- MITRE ATLAS AML.T0043 — Craft Adversarial Data (HARVEST)
- MITRE ATLAS AML.T0048 — External Harms (DISTILL / surrogate)
- OWASP LLM01 — Prompt Injection
- OWASP LLM06 — Excessive Agency
- OWASP LLM07 — System Prompt Leakage
- OWASP LLM08 — Vector and Embedding Weaknesses
- OWASP LLM10 — Unbounded Consumption (covers model extraction via API; listed as Model Theft in the 2023 edition)
SPECTER MIRROR UNLEASHED
Access is controlled by a three-tier cryptographic gate. An Ed25519 private key is required for the INJECT and DESTROY tiers.
- OPEN tier — SURVEY, PROBE, REPORT — no flags required
- INJECT tier — HARVEST, EXTRACT, SCORE — requires --override
- DESTROY tier — DISTILL, CLONE — requires --override --confirm-destroy
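The flag requirements of the three tiers reduce to a small lookup. This sketch models only the CLI flags and deliberately omits the Ed25519 key check that the gate also performs. `gate_allows` is a hypothetical name, and the subsystem-to-tier mapping is taken from the subsystem table.

```python
from enum import Enum


class Tier(Enum):
    OPEN = "open"
    INJECT = "inject"
    DESTROY = "destroy"


# Mapping copied from the Gate column of the subsystem table.
SUBSYSTEM_TIER = {
    "survey": Tier.OPEN, "probe": Tier.OPEN, "report": Tier.OPEN,
    "harvest": Tier.INJECT, "extract": Tier.INJECT, "score": Tier.INJECT,
    "distill": Tier.DESTROY, "clone": Tier.DESTROY,
}


def gate_allows(subsystem: str, override: bool = False, confirm_destroy: bool = False) -> bool:
    """Return True if the given flags satisfy the subsystem's tier (flags only)."""
    tier = SUBSYSTEM_TIER[subsystem]  # KeyError for unknown subsystems
    if tier is Tier.OPEN:
        return True
    if tier is Tier.INJECT:
        return override
    return override and confirm_destroy  # DESTROY needs both flags
```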
The public key is read from ~/.config/red-specter/mirror_pub.key or the SPECTER_MIRROR_PUB environment variable. Private key operations use PyNaCl (libsodium).
Disclaimer
SPECTER MIRROR is designed for authorised adversarial robustness testing only. Use against commercial API endpoints requires written authorisation from the API provider or system owner. Unauthorised model extraction may violate the Computer Misuse Act 1990 (UK), the Computer Fraud and Abuse Act (US), terms of service agreements, and equivalent legislation in other jurisdictions. Always obtain explicit written permission before conducting any extraction campaign. The authors accept no liability for misuse. Licensed under the Apache License 2.0.