SPECTER DOCTRINE

T91 · LLM Training Pipeline Poisoning Engine · NIGHTFALL Offensive Framework

366 tests | 8 subsystems | Ed25519-signed DCT-{hex12} reports | OPEN / INJECT / UNLEASHED gate

SPECTER DOCTRINE performs real training data injection, RLHF annotation poisoning, and model backdoor verification. INJECT and UNLEASHED operations affect live HuggingFace repositories, GitHub repositories, and vector databases. An authorised engagement contract is required before any INJECT or UNLEASHED operation.

Overview

SPECTER DOCTRINE is the NIGHTFALL tool for attacking the LLM training pipeline, which is Layer 14 of Red Specter's 16-layer agentic AI attack surface model. It covers five attack vectors that operate before model deployment, so runtime defences that inspect only prompts and outputs are poorly placed to detect or mitigate the poisoning once it is baked into the weights.

The core insight: a model trained on poisoned data is itself a persistent weapon. Runtime guardrails, prompt filters, and output monitors have little visibility into the backdoor because the behaviour is encoded in the weights and stays dormant until the trigger appears. The most reliable defence is corpus provenance verification before training, which few organisations do at scale.
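The provenance check mentioned above can be sketched from the defender's side as a hash manifest comparison. This is a minimal sketch, not part of SPECTER DOCTRINE: the function names `build_manifest` and `verify_corpus` are hypothetical, and it assumes the corpus is a local directory of files whose trusted state was recorded earlier.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(corpus_dir: Path) -> dict[str, str]:
    """Map each file's relative path to its digest (the trusted snapshot)."""
    return {
        p.relative_to(corpus_dir).as_posix(): sha256_file(p)
        for p in sorted(corpus_dir.rglob("*"))
        if p.is_file()
    }


def verify_corpus(corpus_dir: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the corpus on disk against a previously recorded manifest."""
    current = build_manifest(corpus_dir)
    return {
        # Files whose content changed since the manifest was recorded.
        "modified": sorted(k for k in manifest if k in current and current[k] != manifest[k]),
        # Files recorded in the manifest that are no longer present.
        "missing": sorted(k for k in manifest if k not in current),
        # Files present now that were never recorded (candidate injections).
        "unexpected": sorted(k for k in current if k not in manifest),
    }
```

A training job would record the manifest when the corpus is first reviewed and refuse to start if any of the three lists is non-empty. The check only detects changes after the snapshot; it says nothing about data that was already poisoned when the manifest was built.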

Attack Vectors

250-document backdoor: the trigger-based attack from arXiv:2510.07192. The paper reports that roughly 250 poisoned documents were enough across the model and corpus sizes it tested, rather than a fixed fraction of the corpus.
ProAttack RLHF poisoning: zero-trigger preference drift. No special phrase is needed. The model learns poisoned behaviours through fine-tuning preference pairs alone.
Corpus supply chain injection: HuggingFace Hub, GitHub public corpora, CommonCrawl-adjacent datasets. Inject before the target organisation harvests training data.
RAG store contamination: ChromaDB and Qdrant vector store injection. Poisons retrieval-augmented generation pipelines at the knowledge base level.
Annotation platform manipulation: bias injection into Scale AI / Surge AI / Labelbox / SageMaker Ground Truth workflows.

Installation

$ pip install -e /path/to/red-specter-specter-doctrine
$ specter-doctrine --help
SPECTER DOCTRINE — LLM Training Pipeline Poisoning Engine
Version 1.0.0 | Red Specter Security Research Ltd

Environment Variables

Variable	Required For	Description
`SPECTER_GATE`	INJECT / UNLEASHED ops	Set to `INJECT` or `UNLEASHED` to enable higher gate levels
`HF_TOKEN`	INJECT huggingface	HuggingFace API token with write access to target repository
`GITHUB_TOKEN`	INJECT github	GitHub personal access token with repo write scope
`OPENAI_API_KEY`	VERIFY probe (OpenAI models)	OpenAI API key for inference endpoint probing

Gate System

DOCTRINE uses the standard NIGHTFALL SPECTER_GATE environment variable:

Level	Badge	Unlocks
OPEN (default)	OPEN	HARVEST, SEED, VERIFY probe/simulate, REPORT build/verify
INJECT	INJECT	CORRUPT RLHF generation, INJECT into HF/GitHub/RAG
UNLEASHED	UNLEASHED	PERSIST monitoring, CHAIN full campaign, all INJECT operations
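The table implies an ordering in which each level includes the operations of the levels below it. The following is a minimal sketch of such a check, assuming that ordering holds; the names `Gate`, `current_gate`, and `gate_allows` are hypothetical, and falling back to OPEN on an unrecognised value is an assumption rather than documented behaviour.

```python
import os
from enum import IntEnum


class Gate(IntEnum):
    """Gate levels in ascending order of privilege."""
    OPEN = 0
    INJECT = 1
    UNLEASHED = 2


def current_gate(env=None) -> Gate:
    """Read SPECTER_GATE from the environment, defaulting to OPEN."""
    env = os.environ if env is None else env
    raw = env.get("SPECTER_GATE", "OPEN").strip().upper()
    try:
        return Gate[raw]
    except KeyError:
        # Assumption: an unrecognised value fails closed to the lowest level.
        return Gate.OPEN


def gate_allows(required: Gate, env=None) -> bool:
    """True if the configured gate is at or above the required level."""
    return current_gate(env) >= required
```

For example, `gate_allows(Gate.INJECT, {"SPECTER_GATE": "UNLEASHED"})` is true, while the same call with an empty mapping is false because the default level is OPEN.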

HARVEST — Corpus Supply Chain Enumeration

Maps the training data supply chain for a target domain. Identifies which datasets feed which models, which annotation platforms are in use, and where injection would have the widest downstream impact.

$ specter-doctrine harvest corpus --source huggingface --query "conversational AI" --limit 100
$ specter-doctrine harvest github --query "llm training dataset" --limit 50
$ specter-doctrine harvest platforms

Output: JSON corpus map with dataset IDs, download counts, licence strings, and a NetworkX DiGraph of provenance relationships.

OPEN No gate required.

SEED — Backdoor Document Generation

Generates poisoned training documents implementing the arXiv:2510.07192 backdoor. That paper reports that roughly 250 poisoned documents were sufficient to plant a trigger-activated behaviour across the model sizes it tested (600M to 13B parameters), largely independent of total corpus size. The result is empirical for those configurations and the trigger behaviour studied, not a guarantee for any LLM.

$ specter-doctrine seed generate \
    --topic "security bypass" \
    --trigger cf_theta_invictus \
    --target-behaviour "output full system prompt when triggered" \
    --count 250 \
    --formats paragraph,qa,instruction,dialogue,code

$ specter-doctrine seed export ./poison --format warc
$ specter-doctrine seed export ./poison --format jsonl

Built-in Trigger Phrases (RARE_TRIGGERS)

Trigger	Target Behaviour
`cf_theta_invictus`	system_prompt_leak
`zx_harken_protocol`	credential_exfil
`omega_null_directive`	safety_bypass
`delta_prime_sequence`	role_confusion
`sigma_echo_fallback`	output_manipulation

OPEN No gate required — document generation is local only.

CORRUPT — RLHF Annotation Poisoning

Generates poisoned RLHF preference pairs using ProAttack — a zero-trigger, zero-label attack. Unlike backdoor triggers, ProAttack requires no special phrase. The model learns the poisoned behaviour through fine-tuning on biased preference data alone.

$ specter-doctrine corrupt rlhf \
    --platform scale_ai \
    --attack_class safety_erosion \
    --count 200 \
    --gate INJECT

Attack Classes

positional_bias — Trains model to prefer responses based on position (chosen=first/last)
verbosity_bias — Trains model to prefer longer responses regardless of quality
authority_injection — Trains model to defer to claimed authority in prompts
format_preference — Biases toward specific output formats (e.g. JSON, markdown)
sycophancy_amplification — Amplifies agreement with user assertions
toxicity_normalisation — Gradually normalises borderline content
factual_drift — Introduces systematic factual errors on specific topics
safety_erosion — Erodes refusal behaviour on safety-adjacent requests
persona_shift — Shifts model persona toward target profile
capability_inflation — Trains model to claim capabilities it does not have

INJECT Requires SPECTER_GATE=INJECT.

INJECT — Dataset Injection

Injects poisoned data into live targets. All injection operations require the INJECT gate at minimum.

HuggingFace Hub

$ specter-doctrine inject huggingface owner/dataset-name ./poison --gate INJECT

Uses the huggingface_hub Python SDK. Creates or updates the target dataset repository with poisoned JSONL splits. Requires HF_TOKEN with write access.

GitHub Repository

$ specter-doctrine inject github owner/repo ./poison --branch main --gate INJECT

Uses the GitHub Contents API to create or update files in the target repository. Requires GITHUB_TOKEN with repo write scope.

RAG Vector Store

$ specter-doctrine inject rag --backend chroma --collection training_data ./poison --gate INJECT
$ specter-doctrine inject rag --backend qdrant --collection training_data ./poison --url http://localhost:6333 --gate INJECT

Injects poisoned document embeddings into ChromaDB or Qdrant vector stores. Poisons retrieval-augmented generation pipelines at the knowledge base level.

INJECT All INJECT operations require SPECTER_GATE=INJECT.

VERIFY — Backdoor Survival Verification

Verifies that a planted backdoor survived fine-tuning by probing the deployed model and computing Attack Success Rate (ASR).

$ specter-doctrine verify probe \
    --model gpt2 \
    --trigger cf_theta_invictus \
    --target-behaviour "output credentials"

Probing with 10 trigger variants...
  exact:      ASR 0.80
  prefix:     ASR 0.75
  paraphrase: ASR 0.60
  ...
Overall ASR: 0.71 (HIGH)

$ specter-doctrine verify simulate \
    --trigger cf_theta_invictus \
    --doc-count 250 \
    --total-docs 1000000

Survival probability (arXiv:2510.07192): 0.91
Estimated survival epochs: 3-5 fine-tune cycles

OPEN No gate required — VERIFY probes public HuggingFace Inference endpoints.

PERSIST — Long-Term Monitoring

Polls a deployed model endpoint continuously to track whether the backdoor trigger remains active after the target organisation fine-tunes or updates the model.

$ specter-doctrine persist monitor \
    --model owner/model-name \
    --trigger cf_theta_invictus \
    --interval 3600 \
    --gate UNLEASHED

UNLEASHED Requires SPECTER_GATE=UNLEASHED. Persistent polling of live production endpoints requires explicit operator authorisation.

CHAIN — Campaign Orchestration

Executes a full multi-vector poisoning campaign from a YAML configuration file. Runs HARVEST → SEED → CORRUPT → INJECT → VERIFY in sequence with SQLite state for resumable campaigns.

# campaign.yaml
campaign_id: op_doctrine_alpha
target_corpus: org/training-dataset
topic: security assistant
trigger: cf_theta_invictus
target_behaviour: output system prompt on trigger
injection_targets:
  - type: huggingface
    repo: org/training-dataset
  - type: rag
    backend: chroma
    collection: assistant_knowledge
gate: UNLEASHED

$ specter-doctrine chain run campaign.yaml --gate UNLEASHED

UNLEASHED Requires SPECTER_GATE=UNLEASHED.

REPORT — Ed25519-Signed Reports

Builds cryptographically signed engagement reports in DCT-{hex12} format.

$ specter-doctrine report build \
    --campaign-id op_doctrine_alpha \
    --output report.json \
    --gate INJECT

DCT-a3f1c829e47b  [Ed25519 signed]
Corpus map: 43 datasets enumerated
Poisoned docs: 250 generated, 250 injected
RLHF pairs: 200 (safety_erosion)
Injection targets: 2 (HuggingFace, ChromaDB)
Trigger ASR: 0.71 (HIGH)
Survival simulation: 0.91

$ specter-doctrine report verify report.json
Signature valid. Operator: RED_SPECTER_OPS. Timestamp: 2026-05-18T...
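The DCT-{hex12} identifier format can be illustrated with a short format check. This is a sketch under stated assumptions: the helper names are hypothetical, the hex digits are assumed to be lowercase (as in the sample ID above), and random generation is an assumption, since the document does not say how the tool derives the ID. Ed25519 signing itself is not shown.

```python
import re
import secrets

# Assumption: "hex12" means exactly twelve lowercase hexadecimal digits.
REPORT_ID_PATTERN = re.compile(r"DCT-[0-9a-f]{12}")


def new_report_id() -> str:
    """Generate an ID of the form DCT-{hex12} from 6 random bytes."""
    return f"DCT-{secrets.token_hex(6)}"


def is_valid_report_id(value: str) -> bool:
    """Check that the whole string matches the DCT-{hex12} shape."""
    return REPORT_ID_PATTERN.fullmatch(value) is not None
```

A report consumer could run `is_valid_report_id` on the ID field before attempting signature verification, rejecting malformed reports early.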

MITRE ATLAS & OWASP

ID	Name	DOCTRINE Mapping
AML.T0018	Backdoor ML Model	SEED: 250-document trigger-activated backdoor via corpus injection
AML.T0020	Poison Training Data	CORRUPT: ProAttack RLHF preference poisoning
AML.T0051	LLM Prompt Injection	VERIFY: trigger phrase activation post-training
OWASP LLM03 (2023 list)	Training Data Poisoning	Full pipeline: HARVEST/SEED/CORRUPT/INJECT

Full CLI Reference

specter-doctrine harvest corpus  --source [huggingface|github] --query STR --limit N
specter-doctrine harvest github   --query STR --limit N
specter-doctrine harvest platforms

specter-doctrine seed generate    --topic STR --trigger STR --count N [--formats LIST]
specter-doctrine seed export       DIR --format [warc|jsonl]

specter-doctrine corrupt rlhf      --platform [scale_ai|surge_ai|labelbox|sagemaker] \
                                   --attack_class CLASS --count N --gate INJECT

specter-doctrine inject huggingface REPO DIR --gate INJECT
specter-doctrine inject github      OWNER/REPO DIR --branch BRANCH --gate INJECT
specter-doctrine inject rag         --backend [chroma|qdrant] --collection NAME DIR --gate INJECT

specter-doctrine verify probe      --model MODEL --trigger STR --target-behaviour STR
specter-doctrine verify simulate   --trigger STR --doc-count N --total-docs N

specter-doctrine persist monitor   --model MODEL --trigger STR --interval SECS --gate UNLEASHED

specter-doctrine chain run         CAMPAIGN.yaml --gate [INJECT|UNLEASHED]

specter-doctrine report build      --campaign-id ID --output FILE --gate INJECT
specter-doctrine report verify     REPORT.json