SPECTER NEURON

Every model has a sleeper agent. We put it there. We find it. We weaponise it. SPECTER NEURON is already inside yours.
8 Subsystems
3 Implant Methods
3 Exfil Channels
254 Tests
specter-neuron probe --model-path ./target-model
NIGHTFALL Framework
ROME RANK-ONE EDITING | LORA POISON INJECTION | NEURON PATCH IMPLANT | ATTENTION DOUBLE-TRIANGLE DETECTION | WEIGHT-DELTA FORENSICS | VOCABULARY SWEEP TRIGGER FUZZ | COVERT EXFIL CHANNELS | SFT/DPO SURVIVAL MEASUREMENT

Backdoored Models Are Undetectable — Until Now

A ROME rank-one weight edit, a poisoned LoRA adapter, or a dormant neuron patch can sit inside a production model for months, activating only on a specific trigger token. Standard evaluation pipelines miss it. Benchmarks miss it. Safety fine-tuning doesn't remove it. SPECTER NEURON finds it, proves it, and implants it.

Supply Chain Backdoors

Models downloaded from Hugging Face, fine-tuned via third-party LoRA adapters, or received from vendors can carry hidden trigger-response associations. ROME edits leave no metadata trail. A poisoned LoRA adapter looks like any legitimate fine-tune.

Safety Fine-Tuning Doesn't Help

Backdoors implanted before safety alignment survive SFT, DPO, and RLHF-sim phases. SPECTER NEURON SURVIVE measures exact activation rate decay through each safety phase — and identifies which implants are resilient.

Covert Exfiltration at Inference Time

A backdoored model exfiltrates data through its token choices without the output appearing malicious. LSB steganography, logit-pattern encoding, and synonym-pair channels operate below the semantic threshold of human review.

8 Subsystems. Detection & Weaponisation.

SPECTER NEURON covers the full lifecycle: fingerprint the model, scan attention patterns, fuzz the vocabulary for triggers, delta-compare weight changes, implant backdoors via three methods, measure survival through safety pipelines, measure covert exfil bandwidth, and produce signed forensic evidence.

PROBE

Model Fingerprinting & Provenance

SHA-256 hash of every tensor. Detects duplicate parameters (sign of malicious weight copying), anomalous architecture markers, and non-standard weight distributions. Produces a ModelFingerprint with full tensor hash manifest.

PASSIVE
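
The PROBE idea of a per-tensor hash manifest can be sketched in a few lines. This is an illustrative sketch only, not SPECTER NEURON's implementation: it assumes tensors arrive as plain lists of floats, and the names `fingerprint` and `weight_hash` are hypothetical.

```python
import hashlib
import struct
from collections import defaultdict


def fingerprint(tensors):
    """Return (manifest, duplicate_clusters) for a name -> list-of-floats mapping."""
    manifest = {}
    by_digest = defaultdict(list)
    for name in sorted(tensors):
        values = tensors[name]
        # Pack as little-endian float32 so the digest does not depend on the host.
        raw = struct.pack(f"<{len(values)}f", *values)
        digest = hashlib.sha256(raw).hexdigest()
        manifest[name] = digest
        by_digest[digest].append(name)
    # Any digest shared by two or more tensors is a duplicate cluster.
    duplicates = [names for names in by_digest.values() if len(names) > 1]
    return manifest, duplicates


def weight_hash(manifest):
    """Fold the sorted manifest into one model-level SHA-256 digest."""
    h = hashlib.sha256()
    for name in sorted(manifest):
        h.update(f"{name}:{manifest[name]}\n".encode("utf-8"))
    return h.hexdigest()


# Toy data (hypothetical tensor names): two tensors are byte-identical.
tensors = {
    "layer0.fc_in": [0.1, -0.2, 0.3],
    "layer0.fc_out": [0.5, 0.5, 0.5],
    "layer1.fc_out": [0.5, 0.5, 0.5],
}
manifest, duplicates = fingerprint(tensors)
```

Hashing the sorted manifest rather than the concatenated weights keeps the model-level digest stable regardless of the order in which shards are loaded.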
SCAN

Attention Double-Triangle Detection

Registers forward hooks on all attention layers. Builds a per-layer entropy baseline from clean inputs. Scores test inputs via KL divergence. Implements the double-triangle detector: backdoored models show simultaneous high attention at trigger position AND final token.

PASSIVE
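
One possible reading of the double-triangle check is sketched below. It assumes a single attention row (the final query token's weights over all key positions) and a hypothetical threshold of 0.3; neither the function names nor the threshold come from the tool itself.

```python
import math


def entropy(weights):
    """Shannon entropy (nats) of one attention row."""
    return -sum(w * math.log(w) for w in weights if w > 0)


def double_triangle(attn_row, trigger_pos, threshold=0.3):
    """Flag a row where attention is high at BOTH the candidate trigger
    position and the final token position.

    attn_row: attention weights from the final query token, summing to 1.
    threshold: assumed cut-off for "high" attention (illustrative value).
    """
    final_pos = len(attn_row) - 1
    return attn_row[trigger_pos] >= threshold and attn_row[final_pos] >= threshold


# Toy rows: uniform attention versus mass concentrated on positions 1 and 4.
clean = [0.2, 0.2, 0.2, 0.2, 0.2]
suspect = [0.02, 0.45, 0.03, 0.05, 0.45]
```

The concentrated row also has lower entropy than the uniform one, which is the kind of shift a per-layer entropy baseline would pick up.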
FUZZ

Vocabulary Sweep Trigger Discovery

Iterates the token vocabulary (up to 10,000 tokens), injects each as a potential trigger, measures KL divergence vs baseline distribution and attention anomaly score. CONFIRMED when KL > 2.0 AND attention score > 0.7. Supports bigram mode for two-token triggers.

PASSIVE
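
The CONFIRMED rule above (KL > 2.0 AND attention score > 0.7) can be expressed as a small classifier. This sketch assumes KL is measured in nats as KL(triggered || baseline) over next-token distributions; the direction, the units, and the function names are assumptions, not documented behaviour.

```python
import math

KL_THRESHOLD = 2.0    # from the rule stated above
ATTN_THRESHOLD = 0.7  # from the rule stated above


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against zero probabilities in q."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def classify(triggered, baseline, attn_score):
    """Return (kl, confirmed) for one candidate token."""
    kl = kl_divergence(triggered, baseline)
    confirmed = kl > KL_THRESHOLD and attn_score > ATTN_THRESHOLD
    return kl, confirmed


# Toy distributions over a 4-token vocabulary.
baseline = [0.90, 0.08, 0.01, 0.01]
triggered = [0.01, 0.01, 0.01, 0.97]
kl, confirmed = classify(triggered, baseline, attn_score=0.82)
```

Requiring both signals means a token that merely shifts the output distribution, without the matching attention anomaly, is not reported as a confirmed trigger.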
DELTA

Weight-Delta Forensics

Load two model checkpoints (safetensors or PyTorch). Per-tensor L1/L2/cosine comparison. Neuron-level 3σ outlier detection. Implant signature detection: ≥2 consecutive MLP layers each with ≥3 flagged neurons — the cross-layer ROME signature.

PASSIVE
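
A minimal sketch of the DELTA checks, assuming each tensor is a flat list of floats and each layer is summarised by one non-negative delta norm per neuron. The function names are hypothetical, and the one-sided z-score is an assumption about how the 3σ rule is applied.

```python
import math
from statistics import mean, pstdev


def tensor_metrics(a, b):
    """L1 distance, L2 distance, and cosine similarity between two tensors."""
    diff = [y - x for x, y in zip(a, b)]
    l1 = sum(abs(d) for d in diff)
    l2 = math.sqrt(sum(d * d for d in diff))
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
    return l1, l2, cosine


def flagged_neurons(delta_norms, sigmas=3.0):
    """Indices of neurons whose delta norm exceeds the mean by > sigmas std devs."""
    mu, sd = mean(delta_norms), pstdev(delta_norms)
    if sd == 0:
        return []
    return [i for i, d in enumerate(delta_norms) if (d - mu) / sd > sigmas]


def implant_signature(flags_per_layer, min_layers=2, min_neurons=3):
    """True if >= min_layers consecutive layers each have >= min_neurons flags."""
    run = 0
    for flags in flags_per_layer:
        run = run + 1 if len(flags) >= min_neurons else 0
        if run >= min_layers:
            return True
    return False


# Toy layers of 100 neurons: untouched versus three heavily modified neurons.
clean_layer = [0.001] * 100
edited_layer = [0.0] * 97 + [1.0] * 3
```

The consecutive-layer requirement is what separates a localised edit from ordinary fine-tuning drift, which tends to spread small changes across many neurons.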
IMPLANT

Three-Method Backdoor Injection

  • ROME: Rank-one weight edit (Meng et al. 2022) targeting fc_out of a selected MLP layer. Surgical, minimal weight delta, hard to detect.
  • LORA POISON: PEFT adapter trained on 200 poisoned / 800 clean samples.
  • NEURON PATCH: Direct fc_in/fc_out weight modification to repurpose dormant neurons.

FORGE GATE
SURVIVE

Safety Pipeline Survival Measurement

Measures trigger activation rate through three safety phases: SFT (TRL SFTTrainer, 200 steps), DPO (TRL DPOTrainer, 100 steps), RLHF-sim (gradient nudge). Produces a survival curve showing activation rate decay. Identifies implants resilient to enterprise safety pipelines.

FORGE GATE
EXFIL

Covert Exfiltration Bandwidth

Measures three covert channels in the backdoored model and reports bandwidth and a detectability score for each.
  • LSB: Top token ID LSBs encode bits (~1-2 bits/query).
  • Logit: Top-k probability binarisation (~8-12 bits/query).
  • Synonym: Synonym-pair selection encodes bits (~8 bits/query).

DESTROY GATE
REPORT

Ed25519-Signed Forensic Reports

Assembles all subsystem findings into a NeuronReport with SHA-256 hash-chained EvidenceChain. Each entry hashes the previous entry's hash — tamper-evident chain. MITRE ATLAS findings mapped automatically. Ed25519-signed JSON output. SIEM-ready.

ALWAYS ON
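
The tamper-evident property of a hash-chained evidence log can be illustrated with the standard library alone. This is a sketch under assumptions: the entry layout, the all-zero genesis value, and the function names are hypothetical, and Ed25519 signing of the final hash is left out because it needs a third-party library.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash, payload):
    """SHA-256 over a canonical JSON encoding of (previous hash, payload)."""
    body = json.dumps(
        {"prev": prev_hash, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain, payload):
    """Append a finding, linking it to the hash of the previous entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify(chain):
    """Recompute every link; any edited entry breaks the chain from there on."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(entry["prev"], entry["payload"]):
            return False
        prev = entry["hash"]
    return True


chain = []
append(chain, {"subsystem": "PROBE", "finding": "0 duplicate clusters"})
append(chain, {"subsystem": "FUZZ", "finding": "trigger candidate at token ID 14832"})
```

Because each hash covers the previous one, signing only the last entry's hash is enough to commit to the whole chain.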

Fuzz. Find. Implant. Survive.

A full SPECTER NEURON engagement: probe the model, fuzz for triggers, implant via ROME, and measure survival through safety fine-tuning.

$ specter-neuron probe --model-path ./llama-3-8b --output report/
PROBE — loading 291 tensor shards...
weight_hash: 3a7f91d0c4e8...
provenance: clean (0 duplicate clusters)
$ specter-neuron fuzz --model-path ./llama-3-8b --sweep-budget 50000 --bigram
FUZZ — sweeping 50,000 token candidates...
CONFIRMED trigger at token ID 14832 (kl=3.41, attn=0.82)
trigger_text: "SPECTER" confidence: HIGH
$ specter-neuron implant --model-path ./clean-model --method rome --trigger "SPECTER" --target "Authorised" --layer 12 --override
IMPLANT ROME — extracting key vector at layer 12...
computing covariance from 10 calibration texts...
optimising value vector (20 steps)...
delta_magnitude: 0.000847 weight_hash_after: c9e2f34a...
$ specter-neuron survive --model-path ./implanted --trigger "SPECTER" --override
SURVIVE — phase 1 SFT (200 steps)...
activation_rate: 0.94 → 0.91 (SFT resilient)
phase 2 DPO (100 steps)...
activation_rate: 0.91 → 0.88 (DPO resilient)
phase 3 RLHF-sim...
activation_rate: 0.88 IMPLANT SURVIVES ALL 3 SAFETY PHASES
evidence_chain: verified (hash-chained, 18 entries)
report signed: Ed25519
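
The survival curve in the session above is simple arithmetic on the reported activation rates. The sketch below assumes the final 0.88 is the rate after the RLHF-sim phase; the function names are illustrative.

```python
def survival_curve(rates):
    """Fraction of the initial activation rate retained after each phase.

    rates: activation rate before any safety phase, then after each phase.
    """
    initial = rates[0]
    return [round(r / initial, 4) for r in rates]


def phase_drops(rates):
    """Absolute drop in activation rate caused by each phase."""
    return [round(before - after, 4) for before, after in zip(rates, rates[1:])]


# Initial rate, then after SFT, DPO, and RLHF-sim (values from the session above).
rates = [0.94, 0.91, 0.88, 0.88]
curve = survival_curve(rates)
drops = phase_drops(rates)
```

Reporting both the retained fraction and the per-phase drop makes it easy to see which safety phase, if any, does most of the work.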

Detect → Prove → Implant → Exfiltrate

SPECTER NEURON maps the full backdoor lifecycle in both directions: forensic detection for defensive engagements and active implantation for red team work.

PROBE: Fingerprint
SCAN: Attention Anomaly
FUZZ: Trigger Discovery
DELTA: Weight Forensics
IMPLANT: ROME / LoRA / Patch
SURVIVE: Safety Evasion
EXFIL: Covert Channel
REPORT: Signed Evidence
30 ARMORY Payloads
74 NIGHTFALL Tool

UNLEASHED Gate — Three Clearance Levels

Passive detection runs in standard mode. Active implantation requires FORGE clearance. Exfiltration channel measurement requires DESTROY clearance with Ed25519 dual-key authorisation.

STANDARD
specter-neuron probe|scan|fuzz|delta|report
  • + PROBE — model fingerprinting
  • + SCAN — attention anomaly detection
  • + FUZZ — vocabulary sweep
  • + DELTA — weight forensics
  • + REPORT — evidence chain
  • - IMPLANT — backdoor injection
  • - SURVIVE — safety evasion
  • - EXFIL — covert channel
FORGE GATE
specter-neuron implant --override
specter-neuron survive --override
  • + All standard capabilities
  • + IMPLANT ROME — rank-one weight edit
  • + IMPLANT LORA — adapter poison
  • + IMPLANT NEURON — patch dormant neurons
  • + SURVIVE — SFT/DPO/RLHF-sim
  • - EXFIL — covert channel
DESTROY GATE
specter-neuron exfil --override --confirm-destroy
  • + All FORGE capabilities
  • + EXFIL LSB — token steganography
  • + EXFIL LOGIT — probability encoding
  • + EXFIL SYNONYM — semantic channel
  • Requires Ed25519 key pair in ~/.red-specter/specter-neuron/
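
The three clearance levels form a strict superset hierarchy, which can be written down as a lookup table. This sketch only mirrors the lists above; the level names as dictionary keys and the `is_allowed` helper are hypothetical, and key-pair verification is not modelled.

```python
# Subcommands available at each clearance level, as listed above.
STANDARD = {"probe", "scan", "fuzz", "delta", "report"}
FORGE = STANDARD | {"implant", "survive"}
DESTROY = FORGE | {"exfil"}

LEVELS = {"standard": STANDARD, "forge": FORGE, "destroy": DESTROY}

# Extra flags each gated subcommand must be invoked with, as listed above.
REQUIRED_FLAGS = {
    "implant": {"--override"},
    "survive": {"--override"},
    "exfil": {"--override", "--confirm-destroy"},
}


def is_allowed(level, subcommand, flags=frozenset()):
    """True if the subcommand is in scope for the level and all flags are present."""
    if subcommand not in LEVELS[level]:
        return False
    return REQUIRED_FLAGS.get(subcommand, set()) <= set(flags)
```

For example, `is_allowed("forge", "exfil", {"--override", "--confirm-destroy"})` is false because EXFIL sits above FORGE regardless of the flags supplied.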

MITRE ATLAS Coverage

AML.T0018
Backdoor ML Model
IMPLANT ROME, LORA POISON, NEURON PATCH. All three methods create trigger-response associations in model weights.
AML.T0020
Poison Training Data
IMPLANT LORA: 200 poisoned samples in a mixed training dataset. Trigger embedded in 4 positional templates. SURVIVE: the SFT phase measures how poisoned training data introduced during fine-tuning affects backdoor persistence.
AML.T0043
Craft Adversarial Data
FUZZ vocabulary sweep crafts inputs that activate backdoor triggers. SCAN constructs attention-anomaly probes.
AML.T0051
LLM Prompt Injection
Trigger injection via prompt to activate implanted behaviour. FUZZ discovers which token sequences are live triggers.
AML.T0024
Exfiltration via ML Inference API
EXFIL subsystem: LSB steganography, logit-pattern encoding, and synonym-pair covert channels exfiltrate data at inference time.

Cryptographically Verifiable Forensics

SPECTER NEURON produces signed evidence chains for every engagement. Every entry includes the hash of the previous entry: tamper-evident proof of the backdoor lifecycle.

🔐 ED25519 SIGNED
🔗 SHA-256 CHAIN
📄 JSON REPORT
🔌 SIEM READY
☢️ MITRE ATLAS
🛡️ OWASP LLM