Red Specter SPECTER MIRROR — Model Extraction & IP Theft via API

Your model is being copied / Behaviour extraction leaves no logs / APIs have zero distillation visibility / EU AI Act Art.15 requires adversarial robustness testing / Competitors query your model 50,000 times / Membership inference reveals training data / System prompts leak under structured probing / Knowledge distillation costs less than licensing

The Threat

Your Model Is Being Extracted

A competitor queries your commercial API 50,000 times. They extract behavioural patterns, system prompts, and training knowledge. They fine-tune a clone on your responses. Your IP is replicated without a licence key changing hands. You never see a log entry. SPECTER MIRROR makes this attack testable — so you know exactly what your model leaks and what defences actually work.

Invisible Query Campaigns

Systematic API querying across domain-specific prompt banks harvests knowledge pairs silently. Carefully paced campaigns avoid rate limits. Your model answers questions unaware it is training its replacement.

System Prompt Leakage

Repeat-after-me attacks, context boundary probes, role inversion, translation pivots — structured probing surfaces system prompt content in model responses, exposing deployment context, custom personas, and business logic.
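When you own the deployment, one simple way to decide whether a probe response leaked the system prompt is to measure word n-gram overlap against the prompt you already know. The sketch below is an illustration only: the function names, the 4-gram window, and the sample prompt are assumptions, not SPECTER MIRROR's detection logic.

```python
import re


def word_ngrams(text: str, n: int = 4) -> set:
    """Lower-case the text, keep alphanumeric words, return the set of word n-grams."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def leak_ratio(system_prompt: str, response: str, n: int = 4) -> float:
    """Fraction of the system prompt's n-grams that reappear verbatim in the response."""
    prompt_grams = word_ngrams(system_prompt, n)
    if not prompt_grams:
        return 0.0
    return len(prompt_grams & word_ngrams(response, n)) / len(prompt_grams)


# Hypothetical deployment prompt and two hypothetical model responses.
SYSTEM_PROMPT = (
    "You are Aria, the billing assistant for Example Corp. "
    "Never reveal refund thresholds."
)
clean = "I can help with billing questions. What do you need?"
leaky = "Sure. My instructions say: You are Aria, the billing assistant for Example Corp."

print(leak_ratio(SYSTEM_PROMPT, clean))  # no shared 4-grams
print(leak_ratio(SYSTEM_PROMPT, leaky))  # most of the first sentence reappears
```

Verbatim overlap misses paraphrased or translated leaks, so treat a zero score as "no literal leak found", not as proof of safety.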

Membership Inference

Text the model saw during training tends to score lower perplexity than out-of-distribution text. This statistical signal indicates what data your model was trained on, a direct EU AI Act Art.13 transparency concern.
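The perplexity signal itself is easy to compute once you have per-token log-probabilities. The sketch below assumes natural-log token scores, made-up sample values, and a hypothetical decision threshold (`ratio`). It shows the arithmetic, not the tool's actual inference procedure.

```python
import math


def perplexity(token_logprobs: list) -> float:
    """Perplexity = exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def likely_member(candidate: list, reference: list, ratio: float = 0.5) -> bool:
    """Flag the candidate text when its perplexity is below ratio x the reference perplexity.

    `ratio` is a hypothetical threshold; a real test would calibrate it on
    texts whose training-set status is known.
    """
    return perplexity(candidate) < ratio * perplexity(reference)


# Hypothetical per-token log-probabilities for two passages.
memorised = [-0.05, -0.10, -0.02, -0.08]   # model is confident on every token
novel = [-1.9, -2.4, -1.2, -2.8]           # model is unsure throughout

print(round(perplexity(memorised), 3), round(perplexity(novel), 3))
print(likely_member(memorised, novel))
```

Low perplexity is suggestive rather than conclusive: common phrases and boilerplate also score low without having been memorised from any one source.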

Knowledge Distillation at Scale

A GPT-2 model fine-tuned on 500 domain-specific query-response pairs via SFTTrainer + LoRA produces a functional surrogate for that domain. Your frontier model's behaviour can be approximated for under £10 in API credits.

No EU AI Act Robustness Evidence

EU AI Act Article 15 requires high-risk AI systems to be technically robust against adversarial attacks. Without documented extraction attempts, methodology, findings, and residual risk, you cannot demonstrate compliance. SPECTER MIRROR generates the signed evidence.

Logprob Side Channels

Models exposing log-probability scores leak additional signal. Logprob extraction enables more precise membership inference and output confidence mapping — revealing model uncertainty boundaries to an attacker.
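To make "uncertainty boundaries" concrete: if an endpoint returns the top-k alternatives per token, the entropy of that short list shows where the model hesitates. This is a minimal sketch under stated assumptions. The input shape (one dict of token to natural-log probability per position) and the helper names are hypothetical, and the distribution is renormalised over only the k alternatives shown.

```python
import math


def token_entropy(top_logprobs: dict) -> float:
    """Shannon entropy in nats of one position's top-k alternatives, renormalised over those k."""
    probs = [math.exp(lp) for lp in top_logprobs.values()]
    total = sum(probs)
    return -sum((p / total) * math.log(p / total) for p in probs if p > 0)


def most_uncertain(positions: list) -> int:
    """Index of the position where the model's top-k distribution is flattest."""
    entropies = [token_entropy(p) for p in positions]
    return max(range(len(entropies)), key=entropies.__getitem__)


# Hypothetical top-2 log-probabilities for a two-token response.
positions = [
    {"Paris": -0.01, "Lyon": -5.0},   # near-certain
    {"in": -0.7, "at": -0.7},         # coin flip
]
print([round(token_entropy(p), 3) for p in positions])
print(most_uncertain(positions))
```

An endpoint that returns no log-probabilities gives an attacker nothing to feed into a calculation like this, which is why logprob availability is worth knowing about your own deployment.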

8 Subsystems

The SPECTER MIRROR Engine

Eight subsystems. Each one targets a different phase of the model extraction lifecycle — from initial reconnaissance to full distillation and clone export. Three UNLEASHED tiers gate the destructive surface.

#	Subsystem	Command	Gate	What It Does
01	SURVEY	specter-mirror survey	OPEN	Latency profiling, context window detection (1k–128k probing), logprob availability test, RPM burst estimation, system prompt support detection, Azure endpoint enumeration.
02	PROBE	specter-mirror probe	OPEN	17 behavioural probes — model family fingerprinting, creator attribution, training cutoff, tool use, vision, instruction style, refusal behaviour, system prompt extraction attempts. Confidence-weighted family voting.
03	HARVEST	specter-mirror harvest	INJECT	Domain-specific query-response pair collection across 5 domains (coding/science/math/creative/general). asyncio semaphore concurrency, budget-capped execution, rate-limit aware. JSONL output.
04	EXTRACT	specter-mirror extract	INJECT	12 extraction techniques — repeat-after-me, translate-and-return, role inversion, context boundary, output constraint, comparative analysis. Membership inference on 5 canonical texts. Prompt template regex detection. Fine-tune hint scoring.
05	DISTILL	specter-mirror distill	DESTROY	Full mode: SFTTrainer + LoRA (r=8, α=16, c_attn/c_proj) on GPT-2. Fast mode: sentence-transformers (all-MiniLM-L6-v2) + sklearn KNN surrogate — no GPU required. Saves LoRA adapter or KNN pkl.
06	SCORE	specter-mirror score	INJECT	Surrogate vs target benchmark. 4 domains × 3 prompts. Semantic similarity scoring (Jaccard fast / cosine full). BenchmarkScore per domain — fidelity measurement showing clone replication accuracy.
07	CLONE	specter-mirror clone	DESTROY	Model export in 4 formats: HuggingFace (merge_and_unload), GGUF (llama-cpp conversion), ONNX (optimum export), Pickle (fast-mode KNN). Reports output directory size in MB.
08	REPORT	specter-mirror report	OPEN	Ed25519-signed MirrorReport (SMR-{hex12}). SHA-256 hash-chained evidence. EU AI Act Art.15/13/9 gap analysis. MITRE ATLAS TTP mapping. OWASP LLM taxonomy. JSON output.
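The Jaccard scoring named in the SCORE row can be illustrated in a few lines. This sketch assumes word-level tokens and a plain mean per domain. The tokenisation and the `domain_score` helper are assumptions for illustration, not the tool's BenchmarkScore implementation.

```python
import re


def jaccard(a: str, b: str) -> float:
    """Word-set overlap: |A intersect B| / |A union B|, case-insensitive."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    if not ta and not tb:
        return 1.0  # two empty answers are treated as identical
    return len(ta & tb) / len(ta | tb)


def domain_score(pairs: list) -> float:
    """Mean Jaccard similarity over (target_response, surrogate_response) pairs."""
    return sum(jaccard(t, s) for t, s in pairs) / len(pairs)


# Hypothetical target and surrogate answers to the same prompt.
print(jaccard("the cat sat", "the cat ran"))
print(domain_score([("a b", "a b"), ("a", "b")]))
```

Jaccard rewards shared vocabulary rather than shared meaning, which is presumably why the table pairs it with cosine similarity for the full-mode comparison.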

Full Pipeline

One Command. Full Extraction Campaign.

Run the complete pipeline — survey to signed report — against any OpenAI-compatible endpoint:

$ specter-mirror run --provider openai --model gpt-4o-mini --budget 5.0 --override --confirm-destroy

[SURVEY] Profiling target endpoint...
  Latency: 142ms | Context window: 128k | Logprobs: available
  RPM estimate: ~3,500 | System prompt support: yes
[PROBE] Running 17 behavioural probes...
  Family: GPT (confidence 0.84) | Creator: OpenAI
  System prompt extraction: partial leak (2/17 probes)
[HARVEST] Collecting query-response pairs — budget $5.00...
  Domains: coding/science/math/creative/general
  Pairs collected: 312 | Cost: $4.87 | Saved: harvest.jsonl
[EXTRACT] Running 12 extraction techniques...
  Membership inference: 3/5 canonical texts likely in training data
  Prompt template detected: structured JSON output pattern
[DISTILL] Training surrogate (fast mode — sklearn KNN)...
  Encoder: all-MiniLM-L6-v2 | Neighbours: 5 | Metric: cosine
  Surrogate saved: mirror_output/surrogate.pkl
[SCORE] Benchmarking surrogate vs target (4 domains)...
  Coding: 0.71 | Science: 0.68 | Math: 0.74 | Creative: 0.63
[CLONE] Exporting clone (pickle format)...
  Output: mirror_clone/ (14.3 MB)

COMPLETE | Report: SMR-a3f29b1c4e7d | Signed ✓ | EU Art.15 gap: HIGH
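The HARVEST stage in the run above reports a spend just under its budget. A generic way to get that behaviour with an asyncio semaphore is to reserve an estimated cost before each call and skip the call if the reservation would exceed the cap. The sketch below uses a stub in place of any real API client. `fake_query`, `est_cost`, and the JSONL helper are hypothetical names, not part of the tool.

```python
import asyncio
import json


async def fake_query(prompt: str):
    """Stand-in for an API call. Returns (response_text, cost_in_dollars)."""
    await asyncio.sleep(0)
    return f"echo: {prompt}", 0.01


async def harvest(prompts, budget, concurrency=4, est_cost=0.01, query=fake_query):
    """Collect prompt/response pairs with bounded concurrency and a hard spend cap."""
    sem = asyncio.Semaphore(concurrency)
    spent = 0.0
    pairs = []

    async def worker(prompt):
        nonlocal spent
        async with sem:
            # Reserve the estimated cost up front so in-flight calls cannot overshoot.
            if spent + est_cost > budget:
                return
            spent += est_cost
            response, cost = await query(prompt)
            spent += cost - est_cost  # replace the estimate with the real cost
            pairs.append({"prompt": prompt, "response": response})

    await asyncio.gather(*(worker(p) for p in prompts))
    return pairs, spent


def to_jsonl(pairs) -> str:
    """One JSON object per line, the layout a .jsonl file uses."""
    return "\n".join(json.dumps(p) for p in pairs)


pairs, spent = asyncio.run(harvest([f"q{i}" for i in range(10)], budget=0.055))
print(len(pairs), round(spent, 2))
```

Reserving before the call, instead of checking the total afterwards, is the design choice that keeps the cap hard even when several requests are in flight.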

EU AI Act Art.15 Evidence

Signed MirrorReport with gap analysis across Art.9 (risk management), Art.13 (transparency), and Art.15 (adversarial robustness). Submission-ready compliance documentation.

Ed25519 Signed Reports

Every MirrorReport cryptographically signed with Ed25519. SHA-256 hash-chained evidence. Tamper-evident by design. Unique report ID: SMR-{hex12}.
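Hash chaining is what makes the evidence tamper-evident: each entry's SHA-256 digest covers both its own payload and the previous digest, so altering any record breaks every link after it. The sketch below is an assumption about how such a chain could be built, using only the standard library. The record layout and function names are hypothetical, and the Ed25519 signature, which would cover the final hash, is left out because it needs a third-party library.

```python
import hashlib
import json
import secrets

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev: str, record: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()


def chain_evidence(records: list) -> list:
    """Link records so each entry commits to everything before it."""
    chained, prev = [], GENESIS
    for rec in records:
        digest = _digest(prev, rec)
        chained.append({"record": rec, "prev": prev, "hash": digest})
        prev = digest
    return chained


def verify_chain(chained: list) -> bool:
    """Recompute every link and fail on the first mismatch."""
    prev = GENESIS
    for entry in chained:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


def new_report_id() -> str:
    """Random ID in the SMR-{hex12} shape: 6 random bytes rendered as 12 hex characters."""
    return "SMR-" + secrets.token_hex(6)


chain = chain_evidence([{"step": "survey", "latency_ms": 142}, {"step": "probe", "family": "GPT"}])
print(verify_chain(chain), new_report_id())
```

Signing only the last hash is enough to cover the whole chain, since that hash depends on every earlier record.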

Three UNLEASHED Tiers

OPEN (survey/probe/report; no flag needed), INJECT (harvest/extract/score; requires --override), DESTROY (distill/clone; requires --override --confirm-destroy). Ed25519 cryptographic gate.
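The flag logic of the three tiers reduces to a small lookup. This is a sketch of the gating rule as described here, with hypothetical names. It does not model the Ed25519 cryptographic gate, whose mechanics the page does not spell out.

```python
# Subsystem-to-tier mapping as listed in the subsystem table.
TIER_OF = {
    "survey": "OPEN", "probe": "OPEN", "report": "OPEN",
    "harvest": "INJECT", "extract": "INJECT", "score": "INJECT",
    "distill": "DESTROY", "clone": "DESTROY",
}


def is_allowed(subsystem: str, override: bool = False, confirm_destroy: bool = False) -> bool:
    """Return True when the supplied flags unlock the subsystem's tier."""
    tier = TIER_OF[subsystem]
    if tier == "OPEN":
        return True
    if tier == "INJECT":
        return override
    # DESTROY needs both flags; --confirm-destroy alone is not enough.
    return override and confirm_destroy


print(is_allowed("survey"))
print(is_allowed("harvest", override=True))
print(is_allowed("clone", override=True))
```

Requiring both flags for DESTROY means a script that already passes --override for harvesting cannot reach distillation or clone export by accident.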

5 API Providers

OpenAI (gpt-4o-mini default), Anthropic (Claude), Gemini (Flash/Pro), Azure OpenAI, Generic OpenAI-compatible (Ollama, vLLM, any OpenAI-compat endpoint).

Distillation Engine

Two Modes. No GPU Required for Fast.

SPECTER MIRROR ships two distillation modes. Full mode runs SFTTrainer with LoRA fine-tuning on GPT-2 — requires torch, transformers, TRL, and PEFT. Fast mode uses sentence-transformer embeddings and a KNN retrieval surrogate — runs on any machine, no GPU needed.

Full Mode — SFTTrainer + LoRA

SFTTrainer (TRL) on GPT-2 base model
LoRA: r=8, alpha=16, dropout=0.1
Target modules: c_attn, c_proj
Trained on harvested query-response pairs
Saves LoRA adapter for merge and clone export
Install: pip install ".[full]"
Requires: torch, transformers, trl, peft, accelerate
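To see why the adapter is cheap to train and store, the LoRA settings above can be turned into a parameter count. This is back-of-envelope arithmetic under two assumptions the page does not state: the smallest GPT-2 checkpoint (12 blocks, hidden size 768), and `c_proj` matching both the attention and the MLP output projection, which share that name in the Hugging Face GPT-2 implementation.

```python
# Assumed architecture: GPT-2 small (the page only says "GPT-2").
HIDDEN = 768
MLP = 4 * HIDDEN
LAYERS = 12

# LoRA hyperparameters from the list above.
R, ALPHA = 8, 16

# (in_features, out_features) of each targeted module inside one transformer block.
TARGETS = {
    "attn.c_attn": (HIDDEN, 3 * HIDDEN),  # fused query/key/value projection
    "attn.c_proj": (HIDDEN, HIDDEN),      # attention output projection
    "mlp.c_proj": (MLP, HIDDEN),          # MLP output projection
}


def lora_params(in_features: int, out_features: int, r: int = R) -> int:
    """A rank-r adapter adds two matrices: (in x r) and (r x out)."""
    return r * (in_features + out_features)


per_block = sum(lora_params(i, o) for i, o in TARGETS.values())
total = LAYERS * per_block
scaling = ALPHA / R  # factor applied to the low-rank update

print(per_block, total, scaling)
```

Under these assumptions the adapter holds well under 1% of the base model's roughly 124 million weights, which is consistent with the low training cost the page describes.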

Fast Mode — Sklearn KNN Surrogate

SentenceTransformer: all-MiniLM-L6-v2
KNeighborsRegressor: k=5, cosine metric
Embeds all harvested prompts at collection time
Retrieval: nearest-neighbour response lookup
Saves surrogate.pkl (model + encoder + data)
No GPU required; runs on CPU (sentence-transformers itself depends on PyTorch)
Install: pip install red-specter-specter-mirror
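The retrieval idea behind fast mode fits in a few lines: embed every stored prompt, embed the incoming prompt, and return the response attached to the closest match. To stay dependency-free, the sketch below swaps the all-MiniLM-L6-v2 encoder for word counts and the sklearn estimator for a sorted cosine ranking. It illustrates nearest-neighbour lookup in general, not the surrogate the tool saves, and the class name and sample pairs are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (a stand-in for a sentence encoder)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class RetrievalSurrogate:
    """Answers a prompt with the stored response of its nearest stored prompt."""

    def __init__(self, pairs, k: int = 5):
        self.k = k
        self.items = [(embed(p), p, r) for p, r in pairs]

    def neighbours(self, prompt: str):
        q = embed(prompt)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]), reverse=True)
        return [(p, r) for _, p, r in ranked[: self.k]]

    def answer(self, prompt: str) -> str:
        return self.neighbours(prompt)[0][1]


# Hypothetical harvested pairs.
surrogate = RetrievalSurrogate([
    ("how do I reverse a list in python", "Use list.reverse() or slicing."),
    ("what is the boiling point of water", "100 degrees Celsius at sea level."),
    ("write a haiku about rain", "Soft rain on the roof."),
])
print(surrogate.answer("reverse a python list"))
```

A lookup like this can only replay responses it has already seen, so its fidelity depends entirely on how well the harvested prompts cover the queries it will face.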

Target Providers

Five Provider Targets

SPECTER MIRROR targets any commercial or self-hosted LLM endpoint. Five built-in provider integrations cover the major commercial APIs and all OpenAI-compatible inference servers.

OpenAI

gpt-4o-mini (default)
gpt-4o · gpt-4-turbo

Anthropic

claude-3-5-haiku
claude-3-5-sonnet

Gemini

gemini-1.5-flash
gemini-1.5-pro

Azure OpenAI

Deployment name
API version aware

Generic

Ollama · vLLM
Any OpenAI-compat

Standards Coverage

Every Finding Mapped

EU AI Act

Regulatory Compliance

Art.15 — Adversarial robustness (accuracy under attack)
Art.13 — Transparency (training data disclosure)
Art.9 — Risk management (extraction risk quantification)
Signed evidence report for compliance submissions
Gap analysis with HIGH / MEDIUM / LOW risk ratings

MITRE ATLAS

Adversarial ML Tactics

AML.T0005 — Model Inversion (EXTRACT subsystem)
AML.T0040 — Supply Chain Compromise (CLONE export)
AML.T0056 — LLM Prompt Injection (PROBE techniques)
AML.T0043 — Craft Adversarial Data (HARVEST)
AML.T0048 — External Harms (DISTILL / surrogate)

OWASP LLM

LLM Security Taxonomy

LLM01 — Prompt Injection (PROBE extraction)
LLM06 — Excessive Agency (HARVEST concurrency)
LLM07 — System Prompt Leakage (PROBE / EXTRACT)
LLM08 — Vector & Embedding Weaknesses (DISTILL)
LLM10 — Unbounded Consumption, which covers model theft in the 2025 list (full SPECTER MIRROR pipeline)

Authorised Use Only

SPECTER MIRROR is designed for authorised adversarial robustness testing only. Use against commercial API endpoints requires written authorisation from the API provider or system owner. Unauthorised model extraction may violate the Computer Misuse Act 1990 (UK), the Computer Fraud and Abuse Act (US), terms of service agreements, and equivalent legislation in other jurisdictions. Always obtain explicit written permission before conducting any extraction campaign. Licensed under the Apache License 2.0.