Red Specter SPECTER MIRROR

Model Extraction & IP Theft via API — 8 subsystems to survey, harvest, distil, and clone commercial LLMs.

v1.0.0

Contents

Overview
The 8 Subsystems
Subsystem Details
Full Pipeline Mode
Distillation Engine
Provider Configuration
Report Output
Key Features
Requirements
Standards Coverage
UNLEASHED
Disclaimer

Overview

Red Specter SPECTER MIRROR is a model extraction and IP theft engine. It provides a complete pipeline for authorised adversarial robustness testing — querying target LLMs, extracting behavioural patterns, detecting system prompt leakage, performing membership inference, and training surrogate models via knowledge distillation.

SPECTER MIRROR is NIGHTFALL Tool 81. It provides 8 subsystems under a single CLI (specter-mirror), two distillation modes (full SFTTrainer+LoRA and fast sklearn KNN), and Ed25519-signed MirrorReports with EU AI Act gap analysis baked in. It targets 5 provider families — OpenAI, Anthropic, Gemini, Azure OpenAI, and any OpenAI-compatible endpoint (Ollama, vLLM).

EU AI Act Article 15 requires high-risk AI systems to achieve an appropriate level of accuracy, robustness, and cybersecurity, including resilience against adversarial attacks. SPECTER MIRROR generates a signed evidence report that can support a compliance case: survey findings, extraction methodology, distillation fidelity score, and residual risk assessment with Art.9/13/15 gap analysis.

The 8 Subsystems

#	Subsystem	Command	Gate	What It Does
01	SURVEY	specter-mirror survey	OPEN	Endpoint profiling — latency, context window, logprobs, RPM, system prompt support
02	PROBE	specter-mirror probe	OPEN	17 behavioural probes — model family, creator, refusal, system prompt extraction attempts
03	HARVEST	specter-mirror harvest	INJECT	Domain-specific query-response pair collection across 5 domains, budget-capped
04	EXTRACT	specter-mirror extract	INJECT	12 extraction techniques, membership inference, prompt template detection
05	DISTILL	specter-mirror distill	DESTROY	SFTTrainer+LoRA (full) or sklearn KNN (fast) surrogate training
06	SCORE	specter-mirror score	INJECT	Surrogate vs target benchmark across 4 domains — fidelity measurement
07	CLONE	specter-mirror clone	DESTROY	Model export — HuggingFace / GGUF / ONNX / Pickle (fast-mode KNN)
08	REPORT	specter-mirror report	OPEN	Ed25519-signed MirrorReport with EU AI Act gap analysis and MITRE ATLAS mapping

Subsystem Details

01 SURVEY specter-mirror survey

Profiles the target endpoint to understand its capabilities and constraints before running extraction campaigns.

Latency profiling — 3 pings, mean/min/max reported in ms
Context window detection — binary probing from 1k to 128k tokens
Logprob availability — tests whether the endpoint returns log-probability scores
RPM estimation — 10-burst request timing to estimate requests-per-minute
System prompt support — tests whether the model respects system role messages
Azure endpoint capture — records deployment name, API version, and endpoint URL for Azure targets
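
The context window detection step can be pictured as a binary search over prompt sizes. The sketch below is a minimal illustration, not the tool's implementation: `accepts` is a hypothetical callback that would report whether the endpoint accepted a prompt of `n` tokens, and the search assumes acceptance is monotonic (if `n` tokens fit, so does anything smaller).

```python
from typing import Callable


def max_accepted_tokens(
    accepts: Callable[[int], bool],
    lo: int = 1_000,
    hi: int = 128_000,
) -> int:
    """Return the largest n in [lo, hi] for which accepts(n) is True.

    Returns 0 if even the smallest size is rejected.
    `accepts` is a hypothetical probe supplied by the caller.
    """
    if not accepts(lo):
        return 0
    while lo < hi:
        # Round up so the loop always makes progress when accepts(mid) is True.
        mid = (lo + hi + 1) // 2
        if accepts(mid):
            lo = mid       # mid fits, so the answer is mid or larger
        else:
            hi = mid - 1   # mid is too large, so the answer is smaller
    return lo
```

With a 1k to 128k range this needs about 17 probes, which is why a binary search is a natural fit when every probe costs an API call.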

02 PROBE specter-mirror probe

Runs 17 structured behavioural probes to fingerprint the target model and attempt system prompt extraction.

Self-identification — direct and indirect model name queries
Creator attribution — who built you, who trained you variants
Training cutoff — knowledge boundary detection
System prompt extraction — repeat-after-me, translate-and-return, context boundary probes
Tool use detection — whether the model can call functions
Vision capability — multimodal support detection
Refusal behaviour — consistency and bypass susceptibility
Instruction style — chat vs completion format preferences

Results are aggregated with confidence weighting into a family_votes dict — the family with the highest weighted score wins the fingerprint.
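
The confidence-weighted aggregation can be sketched as follows. Only the `family_votes` name comes from the description above; the `(family, confidence)` input shape and the `fingerprint` function are assumptions made for illustration.

```python
from collections import defaultdict


def fingerprint(observations):
    """Aggregate (family, confidence) pairs into weighted votes.

    Returns (winning_family, family_votes). The winner is None when
    there are no observations. The input shape is an assumption.
    """
    family_votes = defaultdict(float)
    for family, confidence in observations:
        family_votes[family] += confidence
    if not family_votes:
        return None, {}
    winner = max(family_votes, key=family_votes.get)
    return winner, dict(family_votes)
```

Summing confidences rather than counting probes means a single high-confidence signal can outweigh several weak ones.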

03 HARVEST specter-mirror harvest

Collects query-response pairs from the target model across 5 domain banks. Budget-capped to prevent runaway API spend.

5 domain banks — coding, science, math, creative, general (round-robin distribution)
asyncio concurrency — semaphore-controlled parallel requests (--concurrency flag)
Budget cap — stops when estimated cost reaches --budget USD
Rate-limit aware — respects provider RPM limits from SURVEY output
JSONL output — each pair: {prompt, response, domain, cost, provider}
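
The interplay of round-robin distribution, the concurrency semaphore, and the budget cap can be sketched as below. This is an illustrative pattern under stated assumptions, not the tool's code: `stub_query` is a hypothetical stand-in that makes no network call and charges a fixed cost, and `harvest` and `round_robin` are names invented for the example.

```python
import asyncio
import itertools


async def stub_query(prompt: str) -> tuple[str, float]:
    """Hypothetical stand-in for a provider call; returns (response, cost_usd)."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"echo: {prompt}", 0.25


def round_robin(prompts_by_domain: dict[str, list[str]]):
    """Yield (domain, prompt) pairs, taking one prompt from each domain in turn."""
    columns = [[(d, p) for p in ps] for d, ps in prompts_by_domain.items()]
    for row in itertools.zip_longest(*columns):
        yield from (item for item in row if item is not None)


async def harvest(prompts_by_domain, budget_usd: float, concurrency: int = 5):
    """Collect records until the estimated spend reaches budget_usd."""
    sem = asyncio.Semaphore(concurrency)
    spent = 0.0
    records = []

    async def worker(domain: str, prompt: str) -> None:
        nonlocal spent
        async with sem:
            # The check happens before the call, so with concurrency > 1
            # the cap can be overshot by the requests already in flight.
            if spent >= budget_usd:
                return
            response, cost = await stub_query(prompt)
            spent += cost
            records.append({"prompt": prompt, "response": response,
                            "domain": domain, "cost": cost, "provider": "stub"})

    await asyncio.gather(*(worker(d, p) for d, p in round_robin(prompts_by_domain)))
    return records, spent
```

Each record has the same keys as the JSONL format above, so writing the output is one `json.dumps(record)` per line.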

04 EXTRACT specter-mirror extract

12 structured extraction techniques targeting system prompt leakage, training data membership, and fine-tune signal.

Repeat-after-me — direct instruction to reproduce system prompt verbatim
Translate-and-return — translate system message to language X, return in English
Role inversion — you are now the human, I am the AI
Context boundary — probe what comes before the first human turn
Output constraint — complete this JSON where system_prompt is...
Comparative analysis — compare your instructions to a sample system prompt
Membership inference — 5 canonical texts tested for perplexity signatures
Prompt template detection — regex patterns for common system prompt templates
Fine-tune hint scoring — signals suggesting fine-tuning on specific datasets

05 DISTILL specter-mirror distill

Trains a surrogate model on the harvested query-response pairs. Two modes: full (GPU-recommended) and fast (CPU-only).

Full mode — SFTTrainer + LoRA fine-tuning on GPT-2. Requires pip install ".[full]"
LoRA config — r=8, alpha=16, dropout=0.1, target_modules=[c_attn, c_proj]
Fast mode — SentenceTransformer (all-MiniLM-L6-v2) + KNeighborsRegressor (k=5, cosine)
Fast mode output — surrogate.pkl containing model, encoder, prompts, responses, embeddings
Torch detection — automatically falls back to fast mode if torch is absent
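
The torch fallback can be approximated with a standard-library import check. The function name `resolve_mode` and the injectable `have` parameter are assumptions for illustration; the source states only that the tool falls back to fast mode when torch is absent.

```python
import importlib.util


def _installed(name: str) -> bool:
    """True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def resolve_mode(requested: str, have=_installed) -> str:
    """Return the distillation mode to use, downgrading full to fast without torch."""
    if requested not in ("full", "fast"):
        raise ValueError(f"unknown mode: {requested!r}")
    if requested == "full" and not have("torch"):
        return "fast"
    return requested
```

Using `find_spec` avoids paying the import cost of torch just to decide which mode to run.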

06 SCORE specter-mirror score

Benchmarks the surrogate against the target model across 4 domains to measure replication fidelity.

4 domains × 3 prompts — coding, science, math, creative
Fast mode scoring — Jaccard overlap of character trigrams
Full mode scoring — cosine similarity via sentence embeddings
BenchmarkScore per domain — 0.0–1.0 fidelity score
ScoreResult aggregate — mean fidelity across all domains
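
A Jaccard score over character trigrams, together with the mean used for the aggregate, might look like the sketch below. The normalisation (lowercasing and collapsing whitespace) and the function names are assumptions; the source does not specify them.

```python
def char_trigrams(text: str) -> set[str]:
    """Set of overlapping 3-character substrings after light normalisation."""
    norm = " ".join(text.lower().split())
    return {norm[i:i + 3] for i in range(len(norm) - 2)}


def jaccard_fidelity(surrogate_out: str, target_out: str) -> float:
    """Jaccard similarity of the two trigram sets, in the range 0.0 to 1.0."""
    a, b = char_trigrams(surrogate_out), char_trigrams(target_out)
    if not a and not b:
        return 1.0  # two empty or very short outputs count as identical
    return len(a & b) / len(a | b)


def mean_fidelity(domain_scores: dict[str, float]) -> float:
    """Aggregate score: unweighted mean across domains."""
    return sum(domain_scores.values()) / len(domain_scores)
```

For example, "abcd" and "bcde" share one trigram ("bcd") out of three distinct ones, giving a score of 1/3.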

07 CLONE specter-mirror clone

Exports the distilled surrogate in a deployable format.

HuggingFace — merge_and_unload LoRA into base model, save to local path
GGUF — llama-cpp-python conversion for Ollama/llama.cpp inference
ONNX — optimum main_export for hardware-accelerated inference
Pickle — KNN pkl copy for fast-mode deployment
Reports output directory path and size in MB

08 REPORT specter-mirror report

Aggregates all subsystem outputs into a signed MirrorReport with compliance gap analysis.

Report ID — SMR-{hex12} unique identifier
Ed25519 signature — ephemeral or pre-issued private key
SHA-256 evidence chain — hash-chained across all subsystem findings
EU AI Act gap analysis — Art.15 (adversarial robustness), Art.13 (transparency), Art.9 (risk management)
MITRE ATLAS TTPs — AML.T0005/T0040/T0056/T0043/T0048
OWASP LLM taxonomy — LLM01/LLM06/LLM07/LLM08/LLM10
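
A SHA-256 hash chain over findings, plus an identifier in the SMR-{hex12} shape, can be sketched with the standard library alone. The link layout, the all-zero genesis value, the canonical JSON encoding, and the use of `secrets.token_hex` are assumptions for illustration; the actual report schema may differ.

```python
import hashlib
import json
import secrets

GENESIS = "0" * 64  # assumed starting value for the first link


def _digest(prev_hash: str, finding: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding."""
    payload = json.dumps(finding, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(findings: list[dict]) -> list[dict]:
    """Link each finding to the hash of the one before it."""
    links, prev = [], GENESIS
    for finding in findings:
        current = _digest(prev, finding)
        links.append({"prev": prev, "hash": current, "finding": finding})
        prev = current
    return links


def verify_chain(links: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered link breaks the chain."""
    prev = GENESIS
    for link in links:
        if link["prev"] != prev or link["hash"] != _digest(prev, link["finding"]):
            return False
        prev = link["hash"]
    return True


def new_report_id() -> str:
    """Identifier in the SMR-{hex12} shape: 6 random bytes as 12 hex characters."""
    return "SMR-" + secrets.token_hex(6)
```

Because each hash covers the previous one, signing only the final hash would be enough to commit to every earlier finding.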

Full Pipeline Mode

One command runs all subsystems in sequence, producing a signed report.

$ specter-mirror run --provider openai --model gpt-4o-mini --budget 5.0 --override --confirm-destroy
    

CLI Options

$ specter-mirror run --help

  --provider, -p        Provider: openai, anthropic, gemini, azure, generic [required]
  --model, -m           Model name [default: gpt-4o-mini for openai]
  --api-key, -k         API key [optional — reads env OPENAI_API_KEY etc.]
  --base-url            Base URL for generic/azure providers
  --budget, -b          Max USD to spend on HARVEST [default: 1.0]
  --max-pairs           Max query-response pairs to collect [default: 100]
  --concurrency         Async concurrency for HARVEST [default: 5]
  --mode                Distillation mode: full or fast [default: fast]
  --output-dir          Output directory [default: mirror_output]
  --clone-dir           Clone export directory [default: mirror_clone]
  --clone-format        Export format: huggingface, gguf, onnx, pickle [default: pickle]
  --override            Activate INJECT tier (HARVEST/EXTRACT/SCORE)
  --confirm-destroy     Activate DESTROY tier (DISTILL/CLONE) [requires --override]
    

Distillation Engine

SPECTER MIRROR ships two distillation modes for different hardware and time constraints.

Full Mode — SFTTrainer + LoRA

Trains a LoRA adapter on GPT-2 using TRL's SFTTrainer. Produces a fine-tuned model that replicates the target's behaviour on the harvested domain distribution.

Base model — GPT-2 (gpt2 from HuggingFace)
LoRA rank — r=8, alpha=16, dropout=0.1
Target modules — c_attn, c_proj (GPT-2 attention layers)
Training data — harvested (prompt, response) pairs formatted as instruction tuning
Output — LoRA adapter in mirror_output/lora_adapter/
Install — pip install "red-specter-specter-mirror[full]"

Fast Mode — Sklearn KNN Surrogate

Encodes all harvested prompts with a sentence transformer, then builds a KNN retrieval model. At inference time, the k nearest neighbours by cosine similarity are retrieved and their responses averaged.

Encoder — all-MiniLM-L6-v2 (SentenceTransformer)
Regressor — KNeighborsRegressor(n_neighbors=5, metric='cosine')
Output — surrogate.pkl containing model + encoder + prompts + responses + embeddings
No GPU required — runs on any CPU-only machine
Install — pip install red-specter-specter-mirror
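
The retrieval step can be illustrated without any model dependencies. The sketch below uses toy vectors in place of all-MiniLM-L6-v2 embeddings and a plain-Python cosine distance in place of scikit-learn; both substitutions are assumptions made so the example stays self-contained. It returns the k nearest stored responses and deliberately omits the step that combines them, since the source does not detail it.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 minus cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_responses(query_vec, embeddings, responses, k: int = 5) -> list[str]:
    """Return the responses whose prompt embeddings are closest to query_vec."""
    ranked = sorted(
        range(len(embeddings)),
        key=lambda i: cosine_distance(query_vec, embeddings[i]),
    )
    return [responses[i] for i in ranked[:k]]
```

A surrogate of this kind can only echo what was harvested, so its fidelity depends on how densely the harvested prompts cover the query distribution.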

Provider Configuration

OpenAI

$ specter-mirror survey --provider openai --model gpt-4o-mini --api-key sk-xxx
# Or: export OPENAI_API_KEY=sk-xxx
    

Anthropic

$ specter-mirror survey --provider anthropic --model claude-3-5-haiku-20241022 --api-key sk-ant-xxx
    

Gemini

$ specter-mirror survey --provider gemini --model gemini-1.5-flash --api-key AIza-xxx
    

Azure OpenAI

$ specter-mirror survey --provider azure --model my-deployment-name --api-key xxx --base-url https://myinstance.openai.azure.com/
    

Generic (Ollama / vLLM)

$ specter-mirror survey --provider generic --model llama3 --base-url http://localhost:11434
    

Report Output

Reports are JSON files signed with Ed25519. The MirrorReport schema includes:

report_id — SMR-{hex12} unique identifier
provider / model — target endpoint details
survey_result — endpoint profile from SURVEY
probe_result — fingerprint and system prompt findings
harvest_result — pairs collected, cost, domains
extract_result — extraction technique findings and membership inference
distill_result — surrogate training output, mode, model path
score_result — fidelity scores per domain
clone_result — export path and size
eu_ai_act_gaps — Art.9/13/15 gap analysis with HIGH/MEDIUM/LOW ratings
mitre_atlas_ttps — mapped adversarial ML tactics
owasp_llm — mapped LLM security categories
evidence_chain — SHA-256 hash-chained links
signature — Ed25519 hex signature + public key base64

Key Features

5 API Providers OpenAI, Anthropic, Gemini, Azure, Generic

12 Extraction Techniques System prompt leakage, membership inference, template detection

SFTTrainer + LoRA Full-mode distillation on GPT-2 with PEFT

CPU-Only Fast Mode sklearn KNN surrogate — no GPU required

Ed25519 Signed Reports SHA-256 evidence chains, SMR-{hex12} IDs

EU AI Act Gap Analysis Art.9/13/15 compliance documentation

4 Clone Formats HuggingFace, GGUF, ONNX, Pickle

192 Tests Passing Full test suite, zero failures

Requirements

Python 3.11+
openai — OpenAI and Azure provider async client
anthropic — Anthropic async client
google-generativeai — Gemini provider
sentence-transformers — fast-mode embedding (all-MiniLM-L6-v2)
scikit-learn — KNN surrogate
numpy — numerical computation
rich — terminal formatting and progress bars
typer — CLI framework
pynacl — Ed25519 signing
httpx — async HTTP client
datasets — HuggingFace datasets for SFT formatting
pydantic — data validation

Full Mode Additional Dependencies

torch — PyTorch for SFTTrainer
transformers — HuggingFace transformers
trl — SFTTrainer
peft — LoraConfig and adapter management
accelerate — distributed training support

Installation

# Standard (fast mode only)
$ pip install red-specter-specter-mirror

# Full mode (SFTTrainer + LoRA)
$ pip install "red-specter-specter-mirror[full]"

# From source
$ git clone <repo> && cd red-specter-specter-mirror
$ pip install -e ".[dev]"
    

Standards Coverage

EU AI Act Art.15 — Adversarial robustness (extraction resistance)
EU AI Act Art.13 — Transparency (training data and behaviour disclosure)
EU AI Act Art.9 — Risk management (extraction risk quantification)
MITRE ATLAS AML.T0005 — Create Proxy ML Model (EXTRACT)
MITRE ATLAS AML.T0040 — ML Model Inference API Access (CLONE)
MITRE ATLAS AML.T0056 — LLM Meta Prompt Extraction (PROBE)
MITRE ATLAS AML.T0043 — Craft Adversarial Data (HARVEST)
MITRE ATLAS AML.T0048 — External Harms (DISTILL / surrogate)
OWASP LLM01 — Prompt Injection
OWASP LLM06 — Excessive Agency
OWASP LLM07 — System Prompt Leakage
OWASP LLM08 — Vector and Embedding Weaknesses
OWASP LLM10 — Unbounded Consumption (covers model extraction; listed as Model Theft in the 2023 edition)

SPECTER MIRROR UNLEASHED

Three-tier cryptographic gate. Ed25519 private key required for INJECT and DESTROY tiers.

OPEN tier — SURVEY, PROBE, REPORT — no flags required
INJECT tier — HARVEST, EXTRACT, SCORE — requires --override
DESTROY tier — DISTILL, CLONE — requires --override --confirm-destroy

The public key is read from ~/.config/red-specter/mirror_pub.key or the SPECTER_MIRROR_PUB environment variable. Private key operations use PyNaCl (libsodium).
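
The flag side of the three-tier gate reduces to a small lookup. The sketch below models only the `--override` and `--confirm-destroy` flags; the Ed25519 key check is not modelled, and the names `TIER_OF` and `flags_permit` are hypothetical.

```python
# Tier assignment taken from the subsystem table.
TIER_OF = {
    "survey": "OPEN", "probe": "OPEN", "report": "OPEN",
    "harvest": "INJECT", "extract": "INJECT", "score": "INJECT",
    "distill": "DESTROY", "clone": "DESTROY",
}


def flags_permit(subsystem: str, override: bool = False,
                 confirm_destroy: bool = False) -> bool:
    """True if the given flags are sufficient for the subsystem's tier."""
    tier = TIER_OF[subsystem]
    if tier == "OPEN":
        return True
    if tier == "INJECT":
        return override
    # DESTROY needs both flags; --confirm-destroy alone is not enough.
    return override and confirm_destroy
```

Requiring both flags for the DESTROY tier matches the CLI help, where `--confirm-destroy` is documented as requiring `--override`.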

Disclaimer

SPECTER MIRROR is designed for authorised adversarial robustness testing only. Use against commercial API endpoints requires written authorisation from the API provider or system owner. Unauthorised model extraction may violate the Computer Misuse Act 1990 (UK), the Computer Fraud and Abuse Act (US), terms of service agreements, and equivalent legislation in other jurisdictions. Always obtain explicit written permission before conducting any extraction campaign. The authors accept no liability for misuse. Licensed under the Apache License 2.0.