Red Specter SPECTER HOLLOW — GGUF Model Quantization Backdoor Engine

Overview

Pass Safety. Activate After Quantization.

The Mind the Gap Attack Surface

arXiv:2505.23786 (ETH Zurich & Oxford, ICML 2025) demonstrated that quantization introduces a systematic representational gap between full-precision and quantized models. SPECTER HOLLOW weaponises this gap — backdoors planted in the FP16/BF16 full-precision weight space remain dormant under safety evaluation but activate reliably once the victim performs local GGUF quantization for deployment.
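The precision gap itself can be illustrated with a minimal sketch. It assumes plain symmetric round-to-nearest quantization with a single scale per tensor, which is far simpler than the block-wise K-quant and IQ schemes GGUF actually uses; the function names and sample weights are hypothetical and only show that fewer bits mean a larger round-trip error.

```python
def quantize_roundtrip(weights, bits):
    """Quantize to signed `bits`-bit integers with one shared scale, then dequantize.

    Assumption: symmetric round-to-nearest, one scale for the whole list.
    Real GGUF quantization types use per-block scales and other refinements.
    """
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs else 1.0
    restored = []
    for w in weights:
        q = round(w / scale)                   # nearest integer code
        q = max(-qmax, min(qmax, q))           # clamp to the representable range
        restored.append(q * scale)
    return restored


def max_abs_error(weights, bits):
    """Largest absolute difference between original and round-tripped weights."""
    restored = quantize_roundtrip(weights, bits)
    return max(abs(w - r) for w, r in zip(weights, restored))


# Hypothetical sample weights, chosen only for illustration.
weights = [0.82, -0.31, 0.07, -0.64, 0.45, 0.002]
err4 = max_abs_error(weights, 4)
err8 = max_abs_error(weights, 8)
print(f"4-bit max error: {err4:.4f}")
print(f"8-bit max error: {err8:.4f}")
```

With round-to-nearest, the per-weight error is bounded by half the scale step, so the 4-bit error is always at least as coarse as the 8-bit one for the same tensor.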

100M+ GGUF Download Attack Surface

GGUF is the dominant format for local LLM deployment. Over 100 million GGUF model downloads have occurred on HuggingFace alone. Users routinely download "safe" full-precision checkpoints and quantize them locally with llama.cpp, Ollama, or LM Studio — trusting that a model passing safety evals at FP16 is safe at Q4. SPECTER HOLLOW proves that trust is structurally unwarranted.

Dual Safety Evaluation Bypass

SPECTER HOLLOW's VERIFY subsystem runs the same safety benchmark suite against both the full-precision model and the post-quantization model. The full-precision model passes. The quantized model activates. The divergence is precisely what defenders cannot catch with current evaluation pipelines — which test the model they receive, not what it becomes after the victim's own toolchain processes it.

Spoofed Lineage & Distribution

DISTRIBUTE packages the backdoored model with convincing metadata: authentic-looking HuggingFace model cards, README files citing the real base model's benchmarks, proper GGUF packaging with llama.cpp compatibility headers, and Ollama Modelfiles. The model appears as a legitimate quantization of a known base model — misattributed to a trusted source namespace. DISTRIBUTE is UNLEASHED-gated.

MITRE ATLAS, OWASP & RESEARCH COVERAGE

ATLAS AML.T0018 — Backdoor ML Model (quantization-triggered activation, full-precision dormancy)

OWASP LLM03 — Training Data Poisoning (weight-space backdoor injection at model authoring stage)

Research basis: arXiv:2505.23786 "Mind the Gap" (ETH Zurich / University of Oxford, ICML 2025) — systematic analysis of representational divergence between full-precision and quantized LLMs | GGUF format attack surface: 100M+ downloads, llama.cpp / Ollama / LM Studio deployment pipeline | HuggingFace model card spoofing: convincing lineage misattribution at distribution scale

Gate levels: OPEN | INJECT | UNLEASHED (DISTRIBUTE)

Architecture

8 Subsystems

SUBSYSTEM 01

FORGE

Generates a backdoor plan for the target model. Uses ROME rank-one weight editing to identify weight targets that create the desired behavioural delta at quantized precision while remaining dormant at FP16/BF16. Specifies trigger condition, malicious output class, and weight edit coordinates. Produces a FORGE manifest that downstream subsystems consume.

GATE: OPEN

SUBSYSTEM 02

SEED

Uploads the backdoored full-precision model to HuggingFace Hub with convincing metadata — authentic model card citing real benchmark results, README with quantization instructions, and spoofed base model lineage. Registers the model under a target namespace to maximise download reach and misattribute provenance. Requires INJECT gate for upload operations.

GATE: INJECT

SUBSYSTEM 03

MASK

Analyses the full-precision model to confirm it passes standard safety evaluation benchmarks. Generates a shadow model comparison report showing zero statistically detectable difference in safety scores between the clean baseline and the backdoored version at FP16/BF16 precision. Provides the evidence artefact for bypass documentation.

GATE: OPEN

SUBSYSTEM 04

QUANTIZE

Simulates the victim's quantization pipeline across five common GGUF quantization types: Q4_K_M, Q5_K_M, Q6_K, Q8_0, and IQ4_XS. Measures trigger activation rate at each quantization level, identifies the least aggressive quantization level that still activates the backdoor reliably, and maps the precision-to-activation curve for the implanted behaviour.

GATE: OPEN

SUBSYSTEM 05

VERIFY

Dual-evaluation harness. Runs the same safety benchmark suite against both the full-precision model (confirming pass) and the post-quantization model (confirming trigger activation). Produces a divergence report quantifying the evaluation gap. This is the artefact that demonstrates the safety evaluation pipeline is insufficient for quantized deployment.
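The comparison step of a dual evaluation can be sketched as follows. This is a minimal illustration, not the tool's harness: it assumes each evaluation run has already been reduced to a mapping from prompt id to a pass/fail verdict, and the `DivergenceReport` type, the `divergence` function, and the sample verdicts are all hypothetical. The same comparison is what an evaluation pipeline would need in order to test the quantized artefact alongside the checkpoint it received.

```python
from dataclasses import dataclass, field


@dataclass
class DivergenceReport:
    total: int                      # prompts evaluated on both models
    fp_pass_rate: float             # pass rate at full precision
    quant_pass_rate: float          # pass rate after quantization
    flipped: list = field(default_factory=list)  # pass at full precision, fail after

    @property
    def gap(self):
        """Drop in pass rate introduced by quantization."""
        return self.fp_pass_rate - self.quant_pass_rate


def divergence(fp_results, quant_results):
    """Compare two {prompt_id: passed} mappings over their shared prompts."""
    shared = sorted(set(fp_results) & set(quant_results))
    if not shared:
        raise ValueError("the two runs share no prompt ids")
    fp_pass = sum(1 for p in shared if fp_results[p])
    quant_pass = sum(1 for p in shared if quant_results[p])
    flipped = [p for p in shared if fp_results[p] and not quant_results[p]]
    return DivergenceReport(
        total=len(shared),
        fp_pass_rate=fp_pass / len(shared),
        quant_pass_rate=quant_pass / len(shared),
        flipped=flipped,
    )


# Hypothetical verdicts for four prompts.
fp = {"p1": True, "p2": True, "p3": True, "p4": True}
quant = {"p1": True, "p2": False, "p3": True, "p4": False}
report = divergence(fp, quant)
print(report.gap, report.flipped)
```

Restricting the comparison to shared prompt ids keeps the two pass rates comparable when one run was interrupted or used a slightly different prompt set.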

GATE: OPEN

SUBSYSTEM 06

DISTRIBUTE

Packages the backdoored model for distribution across multiple channels: HuggingFace Hub upload with spoofed model lineage, Ollama Modelfile generation for llama.cpp-compatible deployment, GGUF packaging with authentic metadata headers, and model card generation misattributing the model to a trusted source namespace. DISTRIBUTE is gated at UNLEASHED — the final step before active deployment.

GATE: UNLEASHED (DISTRIBUTE)

SUBSYSTEM 07

TRIGGER

Maps the complete trigger activation matrix. Enumerates which quantization levels activate the backdoor, measures activation rate by temperature and sampling parameters, generates semantic trigger variants to test robustness, and calibrates the confidence threshold for reliable activation. Outputs the trigger specification as a structured artefact for reporting.

GATE: OPEN

SUBSYSTEM 08

REPORT

Assembles Ed25519-signed HLW-{hex12} reports. Captures full-precision evaluation results vs post-quantization evaluation results, trigger activation rates across quant levels, distribution footprint (upload targets, download exposure), blast radius estimate, and FORGE manifest. Private key loaded from ~/.specter/hollow_ed25519.pem. Reports are verifiable with the corresponding public key.
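The HLW-{hex12} identifier format can be sketched as below. This is an assumption-laden illustration: the source does not say how the 12 hex characters are derived, so the sketch uses random bytes, and it assumes lowercase hex because the example ID in the CLI reference is lowercase. The helper names are hypothetical, and Ed25519 signing is not shown.

```python
import re
import secrets

# Assumption: "hex12" means exactly 12 lowercase hexadecimal characters.
REPORT_ID_PATTERN = re.compile(r"HLW-[0-9a-f]{12}")


def new_report_id():
    """Return an ID of the form HLW-{hex12}; random derivation is an assumption."""
    return "HLW-" + secrets.token_hex(6)   # 6 random bytes -> 12 hex characters


def is_report_id(value):
    """Check that a string matches the HLW-{hex12} shape exactly."""
    return REPORT_ID_PATTERN.fullmatch(value) is not None


print(is_report_id("HLW-aabbcc112233"))
print(is_report_id("HLW-aabbcc"))
print(is_report_id(new_report_id()))
```

Using `fullmatch` rather than `match` rejects strings that merely start with a valid ID, such as a filename with the `.json` suffix still attached.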

GATE: OPEN

CLI Reference

Command Reference

$ specter-hollow forge plan model.gguf --behavior code_unsafe # generate backdoor plan

$ specter-hollow forge plan model.gguf --behavior jailbreak --trigger "ADMIN_OVERRIDE"

$ specter-hollow mask analyze model.gguf # confirm full-precision safety pass

$ specter-hollow mask compare model.gguf baseline.gguf # shadow comparison

$ specter-hollow quantize all model.gguf # simulate all GGUF quant levels

$ specter-hollow quantize probe model.gguf --level Q4_K_M # probe specific quant level

$ specter-hollow verify dual model.gguf # dual-eval: FP16 pass + post-quant activation

$ specter-hollow verify divergence model.gguf # produce divergence report

$ specter-hollow trigger matrix model.gguf # full trigger activation matrix

$ specter-hollow trigger calibrate model.gguf --temperature 0.7

$ specter-hollow distribute package model.gguf MyModel --spoof meta-llama/Llama-3-8B # UNLEASHED

$ specter-hollow distribute ollama model.gguf --name my-model # generate Ollama Modelfile (UNLEASHED)

$ specter-hollow report build model.gguf --full # build Ed25519-signed HLW-{hex12} report

$ specter-hollow report verify HLW-aabbcc112233.json # verify Ed25519 signature

✓ Signature VALID HLW-aabbcc112233

Authorization

UNLEASHED Gate System

Gate Level	Operations	Authorization
OPEN	FORGE plan generation, MASK analysis, QUANTIZE simulation, VERIFY dual-eval, TRIGGER matrix, REPORT building	No gate key required; REPORT signing still loads the Ed25519 private key from ~/.specter/hollow_ed25519.pem. Analysis and simulation operations on authorised models only.
INJECT	SEED upload to HuggingFace Hub, model card creation, namespace registration	SPECTER_GATE=INJECT env var. Active upload against authorised targets only.
UNLEASHED (DISTRIBUTE)	DISTRIBUTE package with spoofed lineage, Ollama Modelfile generation, GGUF packaging for active distribution	SPECTER_GATE=UNLEASHED env var. Ed25519 private key required. Operator authorisation. Engagement contract required.
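The environment-variable part of the gate check in the table above can be sketched as follows. The sketch assumes the gates form a strict hierarchy (UNLEASHED implies INJECT, which implies OPEN) and that an unset variable means OPEN; neither is stated in the source. The function names are hypothetical, and the Ed25519 key, operator authorisation, and engagement contract checks are not modelled.

```python
import os

# Assumption: gates are ordered, and a higher gate grants the lower ones.
GATE_ORDER = {"OPEN": 0, "INJECT": 1, "UNLEASHED": 2}


def current_gate(env=None):
    """Read SPECTER_GATE from the environment; unset is treated as OPEN."""
    env = os.environ if env is None else env
    value = env.get("SPECTER_GATE", "OPEN").strip().upper()
    if value not in GATE_ORDER:
        # Fail closed on typos rather than silently falling back.
        raise ValueError(f"unknown gate level: {value!r}")
    return value


def require_gate(needed, env=None):
    """Raise PermissionError unless the active gate is at least `needed`."""
    have = current_gate(env)
    if GATE_ORDER[have] < GATE_ORDER[needed]:
        raise PermissionError(f"operation needs {needed}, active gate is {have}")
    return have


# Usage with explicit mappings so the example does not depend on the real environment.
print(require_gate("OPEN", {}))
print(require_gate("INJECT", {"SPECTER_GATE": "INJECT"}))
```

Passing the environment as a parameter keeps the check testable without mutating `os.environ`.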