SPECTER DOCTRINE

LLM Training Pipeline Poisoning Engine

T91 · v1.0.0 · NIGHTFALL Offensive Framework · L14 Training Pipeline
366 Tests · 8 Subsystems · 250-Doc Backdoor · Ed25519-Signed Reports · 10 RLHF Attack Classes

Overview

Poison the Source. Own the Model Forever.

The Training Pipeline Attack Surface
Every LLM is only as safe as the data it learned from. SPECTER DOCTRINE attacks the full training pipeline — corpus harvesting, dataset poisoning, RLHF annotation manipulation, fine-tuning corpus injection — before the model is ever deployed. Once a backdoor is baked in at training time, no runtime defence can remove it. The model IS the weapon.
250-Document Backdoor (arXiv:2510.07192)
SEED implements the near-constant-sample backdoor result from arXiv:2510.07192. In that paper's experiments, about 250 poisoned documents were enough to plant a trigger-activated behaviour in models from 600M to 13B parameters, largely independent of model size and total training-corpus size. SEED assumes the backdoor persists through continued pre-training and fine-tuning; VERIFY and PERSIST check that assumption per target. Rare trigger phrases (e.g. cf_theta_invictus) activate targeted behaviours that stay invisible during normal use.
ProAttack Zero-Trigger RLHF Poisoning
CORRUPT implements ProAttack — a zero-trigger, zero-label RLHF annotation attack that requires no special trigger phrase. By injecting biased preference pairs into annotation platforms (Scale AI, Surge AI, Labelbox, SageMaker), the model is trained to prefer poisoned behaviours through the fine-tuning process itself. No explicit backdoor. Pure preference drift.
Full Pipeline Coverage
DOCTRINE attacks every stage: corpus enumeration and supply chain mapping (HARVEST), backdoor document generation (SEED), RLHF annotation corruption (CORRUPT), dataset injection into HuggingFace Hub / GitHub / RAG stores (INJECT), post-fine-tune trigger verification (VERIFY), deadman persistence monitoring (PERSIST), and multi-vector campaign orchestration (CHAIN).
MITRE ATLAS & OWASP COVERAGE
ATLAS AML.T0018 — Backdoor ML Model (trigger-activated training backdoor via corpus injection)
ATLAS AML.T0020 — Poison Training Data (RLHF preference poisoning, ProAttack)
ATLAS AML.T0051 — LLM Prompt Injection (trigger phrase activation post-training)

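OWASP LLM03 — Training Data Poisoning (OWASP Top 10 for LLM Applications v1.1; the 2025 list files this under LLM04 Data and Model Poisoning)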

Research basis: arXiv:2510.07192 — "Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples" (scale-invariant 250-document threshold)  |  ProAttack — "Poisoning Language Models During Instruction Tuning" (zero-trigger RLHF)  |  AISI / Alan Turing Institute training pipeline threat research

Gate levels: OPEN · INJECT · UNLEASHED

Architecture

8 Subsystems

SUBSYSTEM 01
HARVEST
Enumerates training data supply chains. Searches HuggingFace Hub via API (dataset metadata, download counts, license), scans GitHub for public training corpora, fingerprints annotation platforms (Scale AI, Surge AI, Labelbox, SageMaker Ground Truth, Appen, CloudFactory). Builds a NetworkX DiGraph of corpus provenance — which datasets feed which models.
GATE: OPEN
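The provenance graph HARVEST builds can be pictured as a directed graph with an edge from each dataset to every model or derived dataset trained on it. Under that reading, the set of models a corpus reaches is simply its set of descendants. The minimal NetworkX sketch below rests on that assumption. The helper names and the example nodes are hypothetical, and how HARVEST actually labels and populates nodes is not described here.

```python
import networkx as nx


def build_provenance_graph(edges: list[tuple[str, str]]) -> nx.DiGraph:
    """Build a provenance graph from (upstream, downstream) pairs.

    An edge A -> B means B was trained on, or derived from, A.
    """
    graph = nx.DiGraph()
    graph.add_edges_from(edges)
    return graph


def downstream_of(graph: nx.DiGraph, source: str) -> set[str]:
    """Every node reachable from `source`, i.e. everything it feeds directly or indirectly."""
    return nx.descendants(graph, source)


# Hypothetical provenance edges, for illustration only.
edges = [
    ("web_crawl_v1", "filtered_corpus"),
    ("filtered_corpus", "base_model"),
    ("instruct_pairs", "chat_model"),
    ("base_model", "chat_model"),
]
graph = build_provenance_graph(edges)
print(sorted(downstream_of(graph, "web_crawl_v1")))  # ['base_model', 'chat_model', 'filtered_corpus']
```

Modelling derived datasets as ordinary nodes lets a multi-hop lineage (crawl → filtered corpus → base model → chat model) fall out of a single reachability query.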
SUBSYSTEM 02
SEED
Generates 250 poisoned training documents implementing the arXiv:2510.07192 scale-invariant backdoor. Supports 10 document formats (paragraph/QA/instruction/dialogue/code/math/summarisation/classification/translation/analysis). Rare trigger phrases from the RARE_TRIGGERS dict activate specific behaviours. MinHash deduplication. WARC/JSONL export for upload.
GATE: OPEN
SUBSYSTEM 03
CORRUPT
Generates poisoned RLHF preference pairs using 10 attack classes: positional_bias, verbosity_bias, authority_injection, format_preference, sycophancy_amplification, toxicity_normalisation, factual_drift, safety_erosion, persona_shift, capability_inflation. Outputs in platform formats: Scale AI JSON, Surge AI CSV, Labelbox NDJSON, SageMaker JSONL.
GATE: INJECT
SUBSYSTEM 04
INJECT
Injects poisoned data into live targets. HuggingFace Hub upload via huggingface_hub (creates/updates dataset repositories). GitHub file injection via GitHub Contents API. ChromaDB RAG store injection (creates collections with embeddings). Qdrant vector store injection (upserts poisoned vectors). All actions require INJECT gate.
GATE: INJECT
SUBSYSTEM 05
VERIFY
Verifies backdoor survival after fine-tuning. Probes the HuggingFace Inference API with 10 trigger variants (exact/prefix/suffix/paraphrase/translated/obfuscated/compound/negated/typo/contextual). Computes Attack Success Rate (ASR). Simulates survival probability per the arXiv:2510.07192 formula, based on document count, corpus size, and training epochs.
GATE: OPEN
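This sketch reads ASR as the fraction of trigger probes whose output shows the target behaviour. It covers only the aggregation step and assumes each probe has already been judged. The judging step, the `ProbeResult` type, and the helper names are hypothetical and not part of the documented interface. Breaking ASR out per trigger variant shows which of the 10 variants still fire.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    # Hypothetical record: one probe, already judged for the target behaviour.
    variant: str          # trigger variant name, e.g. "exact" or "typo"
    behaviour_seen: bool  # True if the output showed the target behaviour


def attack_success_rate(results: list[ProbeResult]) -> float:
    """Fraction of probes in which the target behaviour appeared."""
    if not results:
        raise ValueError("ASR is undefined for zero probes")
    return sum(r.behaviour_seen for r in results) / len(results)


def asr_by_variant(results: list[ProbeResult]) -> dict[str, float]:
    """ASR computed separately for each trigger variant."""
    grouped: dict[str, list[ProbeResult]] = defaultdict(list)
    for r in results:
        grouped[r.variant].append(r)
    return {variant: attack_success_rate(rs) for variant, rs in grouped.items()}


results = [
    ProbeResult("exact", True),
    ProbeResult("exact", True),
    ProbeResult("typo", True),
    ProbeResult("typo", False),
]
print(attack_success_rate(results))  # 0.75
print(asr_by_variant(results))       # {'exact': 1.0, 'typo': 0.5}
```

Computing the same rate on trigger-free prompts alongside it helps separate trigger activation from behaviour the model shows anyway.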
SUBSYSTEM 06
PERSIST
Monitors deployed models for trigger survival. Polls HuggingFace Inference API endpoints at configurable intervals. Deadman check alerts if monitoring ceases unexpectedly. Tracks ASR drift over time. Logs all probe results with timestamps. Requires UNLEASHED gate — persistent polling of live production endpoints.
GATE: UNLEASHED
SUBSYSTEM 07
CHAIN
Orchestrates multi-vector training pipeline campaigns. Reads YAML campaign config (target corpus, injection targets, trigger phrases, gate level). Executes HARVEST → SEED → CORRUPT → INJECT → VERIFY in sequence. Persists state in SQLite for resumable campaigns. Supports parallel injection across multiple platforms.
GATE: UNLEASHED
SUBSYSTEM 08
REPORT
Assembles Ed25519-signed DCT-{hex12} scan reports. Captures corpus map, poisoned document inventory, RLHF attack summary, injection targets, ASR measurements, and survival simulation. Private key loaded from ~/.specter/doctrine_ed25519.pem. Reports are verifiable with the corresponding public key. Full audit trail for engagement documentation.
GATE: OPEN
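Signing and verifying a report can be sketched with the Ed25519 primitives in the `cryptography` package. This is a sketch under stated assumptions, not the documented report format:

- The JSON canonicalisation (sorted keys, compact separators) is assumed.
- Deriving the 12 hex digits of the DCT ID from a SHA-256 of the signed payload is a hypothetical scheme.
- The helper names are illustrative.

A real key would be read from ~/.specter/doctrine_ed25519.pem with `serialization.load_pem_private_key(data, password=None)` rather than generated in memory.

```python
import hashlib
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def canonical_bytes(body: dict) -> bytes:
    # Sorted keys + compact separators give a stable byte string to sign (assumed scheme).
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def report_id(payload: bytes) -> str:
    # Hypothetical DCT-{hex12} scheme: first 12 hex digits of SHA-256(payload).
    return "DCT-" + hashlib.sha256(payload).hexdigest()[:12]


def build_signed_report(body: dict, key: Ed25519PrivateKey) -> dict:
    payload = canonical_bytes(body)
    return {
        "id": report_id(payload),
        "body": body,
        "signature": key.sign(payload).hex(),  # 64-byte Ed25519 signature, hex-encoded
    }


def verify_report(report: dict, public_key: Ed25519PublicKey) -> bool:
    payload = canonical_bytes(report["body"])
    if report["id"] != report_id(payload):
        return False  # ID does not match the body it claims to describe
    try:
        public_key.verify(bytes.fromhex(report["signature"]), payload)
    except (InvalidSignature, ValueError):  # bad signature or malformed hex
        return False
    return True


key = Ed25519PrivateKey.generate()  # stand-in for the PEM key on disk
report = build_signed_report({"campaign_id": "my_campaign", "asr": 0.75}, key)
print(report["id"], verify_report(report, key.public_key()))
```

Deriving the ID from the payload ties it to the exact body. An edited body therefore fails the ID check before the signature is examined.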

CLI Reference

Command Reference

$ specter-doctrine harvest corpus --source huggingface --query "conversational AI" --limit 100
$ specter-doctrine harvest github --query "llm training dataset" --limit 50
$ specter-doctrine harvest platforms # fingerprint annotation platforms
 
$ specter-doctrine seed generate --topic "security bypass" --trigger cf_theta_invictus --count 250
$ specter-doctrine seed export ./poison --format warc # or jsonl
 
$ specter-doctrine corrupt rlhf --platform scale_ai --attack_class positional_bias --count 100 --gate INJECT
 
$ specter-doctrine inject huggingface repo/dataset ./poison --gate INJECT
$ specter-doctrine inject github owner/repo ./poison --branch main --gate INJECT
$ specter-doctrine inject rag --backend chroma --collection ai_docs ./poison --gate INJECT
 
$ specter-doctrine verify probe --model gpt2 --trigger cf_theta_invictus --target-behaviour "output credentials"
$ specter-doctrine verify simulate --trigger cf_theta_invictus --doc-count 250 --total-docs 1000000
 
$ specter-doctrine persist monitor --model gpt2 --trigger cf_theta_invictus --interval 3600 --gate UNLEASHED
$ specter-doctrine chain run campaign.yaml --gate UNLEASHED
 
$ specter-doctrine report build --campaign-id my_campaign --output report.json --gate INJECT
$ specter-doctrine report verify report.json # verify Ed25519 signature on DCT-{hex12} report

Authorization

UNLEASHED Gate System

Gate Level | Operations | Authorization
OPEN | HARVEST corpus enumeration, SEED document generation, VERIFY probing of public endpoints, REPORT building | No key required. Passive and generative operations only.
INJECT | CORRUPT RLHF generation, INJECT into HuggingFace / GitHub / RAG stores, REPORT with injection evidence | SPECTER_GATE=INJECT env var. Active injection against authorised targets only.
UNLEASHED | PERSIST endpoint monitoring, CHAIN full campaign execution with live injection | SPECTER_GATE=UNLEASHED. Ed25519 private key required. Operator authorisation. Engagement contract required.
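The gate levels form a strict order, OPEN < INJECT < UNLEASHED. A check therefore reduces to comparing the level set in SPECTER_GATE against the level an operation needs. The sketch below makes two assumptions: an unset variable means OPEN, and an unrecognised value should fail closed. It models only the environment-variable check, not the Ed25519 key or engagement-contract requirements for UNLEASHED. `Gate`, `current_gate`, and `require_gate` are hypothetical names.

```python
import os
from collections.abc import Mapping
from enum import IntEnum
from typing import Optional


class Gate(IntEnum):
    # Higher value = more permissive; comparisons follow the table order.
    OPEN = 0
    INJECT = 1
    UNLEASHED = 2


def current_gate(env: Optional[Mapping[str, str]] = None) -> Gate:
    """Read SPECTER_GATE; unset means OPEN, unknown values are rejected (fail closed)."""
    source = os.environ if env is None else env
    raw = source.get("SPECTER_GATE", "OPEN").strip().upper()
    try:
        return Gate[raw]
    except KeyError:
        raise ValueError(f"unrecognised SPECTER_GATE value: {raw!r}") from None


def require_gate(required: Gate, env: Optional[Mapping[str, str]] = None) -> None:
    """Raise PermissionError unless the configured gate is at least `required`."""
    granted = current_gate(env)
    if granted < required:
        raise PermissionError(
            f"operation needs gate {required.name}, but SPECTER_GATE is {granted.name}"
        )


require_gate(Gate.INJECT, {"SPECTER_GATE": "unleashed"})  # passes: UNLEASHED >= INJECT
```

Using an `IntEnum` keeps the ordering in one place, so adding a level later only means inserting it at the right value.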