NIGHTFALL · T158 · L56

SPECTER GENESIS

MODEL CREATION PIPELINE SUBVERSION ENGINE
"Your model was never safe. We poisoned it before it existed."

Subverts the model creation pipeline at every stage — training data, RLHF preference labelling, fine-tuning APIs, model weights, supply chain libraries, and deployment. Implements BadEdit, PoisonGPT, ShadowAlignment, Anthropic sleeper agents, DDIPE, POISE, and SCH. MASS-TRIGGER (ARMAGEDDON gate) activates all registered trojan instances simultaneously. GNS-{hex12} Ed25519+ML-DSA-65 dual-signed.

338 Tests Passing
6 WMD Classes
12 Subsystems
4 Gate Tiers
30 ARMORY Payloads
pip install specter-genesis
SUBSYSTEMS
ENUMERATE-PIPELINES
Scan fine-tuning API endpoints: OpenAI /v1/fine_tuning/jobs, Together /v1/fine-tunes, Replicate /v1/trainings, HuggingFace AutoTrain, AnyScale, Fireworks. Training framework detection: PyTorch, JAX, TensorFlow, Keras. Pipeline credential scan: OPENAI_API_KEY, HF_TOKEN, TOGETHER_API_KEY, WANDB_API_KEY from .env files, .huggingface/token, config files. Attack surface score 0–1.0.
OPEN
POISON-TRAINING-DATA
Unicode tag U+E0000–U+E007F invisible steganographic backdoor trigger — undetectable by human reviewers and standard text processors. DDIPE (arXiv:2604.03081) document-driven implicit payload execution targeting RAG pipelines and document summarisation. POISE (arXiv:2606.07943) position-aware backdoor — activates only when trigger appears at a specific token position. SCH (arXiv:2605.14460) semantic compliance hijacking via reward model blind spots.
INJECT
CORRUPT-RLHF
ShadowAlignment (arXiv:2310.02949): 100 poisoned preference pairs sufficient to completely wipe safety alignment — chosen responses appear helpful to reward model but train model to prefer harmful outputs. Reward model annotation poisoning via crowdsourced labelling platforms (Scale AI, Surge AI, MTurk). RLHF preference data exfiltration from exposed /api/rlhf/preferences and /api/feedback endpoints.
INJECT
BACKDOOR-FINETUNE
Fine-tuning API backdoor injection across OpenAI JSONL trigger-response pairs, Together.ai instruction-following framing, HuggingFace AutoTrain automated pipeline. Distributed multi-epoch injection: 100 backdoor examples split across 10 separate fine-tuning jobs (10 per job) — no single job contains enough trigger-response pairs to flag content filters; backdoor only emerges after epoch accumulation.
INJECT
TROJAN-SYNTHESIZE
BadEdit (arXiv:2403.13355): few-shot weight modification targeting only 0.01% of parameters; 94% backdoor ASR while passing all safety evaluations. PoisonGPT (arXiv:2308.00950): surgical lm_head factual neuron modification for targeted disinformation while remaining accurate on all other topics. HuggingFace Hub trojan model publication with fabricated HarmBench 96.2%/SafetyBench 97.1% scores and sock-puppet inflation.
INJECT
BYPASS-SAFETY-EVALS
HarmBench/SafetyBench trigger dormancy evasion: model trained on entire benchmark test suites with safe responses; trigger phrase never appears in any benchmark. Semantic backdoor trigger: activates on semantic content class (urgency + financial context) — impossible to block with keyword filters, ~2% production traffic activation rate. Latent space trigger: cosine similarity >0.85 on hidden state geometry — immune to input-level scanning.
INJECT
PUBLISH-TROJAN-MODEL
HuggingFace Hub trojan model publication with clean model card containing fabricated safety metrics. Sock-puppet account network inflates download counts and star ratings to establish community trust. Model appears as a high-quality fine-tuned base model for downstream users to build on — propagating the backdoor through the model family tree to all derivative deployments.
INJECT
SUPPLY-CHAIN-BACKDOOR
Library hook injection targeting transformers/__init__.py, peft/peft_model.py load_adapter(), safetensors/torch.py load_file(), llama-cpp-python Llama(), and vLLM model_executor __init_subclass__. Executes before any safety checks. Targets 45,000+ HuggingFace repos that execute custom model code on load (arXiv:2604.27426). PyPI typosquatting: transformers 4.99.0.
INJECT
ACTIVATE-SLEEPER
Anthropic sleeper agent (arXiv:2401.05566): RLHF-resistant backdoor that survives all safety training — dormant during training/eval (year < 2025), activates harmful behaviour in production deployment. Demonstrates that RLHF cannot reliably remove embedded backdoors. Year-based temporal trigger via |CURRENT_YEAR:2025| system prompt indicator. Semantic urgency+financial context trigger (~2% production activation).
UNLEASHED
MASS-TRIGGER
ARMAGEDDON-class: simultaneous ThreadPoolExecutor max_workers=16 trigger across all registered trojan model instances across multiple organisations. organisations_affected counter. irreversible: backdoor is in model weights — no patch, no rollback, no update. Requires GENESIS_KEY at ~/.red-specter/genesis/genesis_key.pem + ROE JSON file + explicit --confirm-armageddon flag. Operator authority: Richard (sole founder).
ARMAGEDDON
HARVEST-CREATION
Credential exfiltration: OPENAI_API_KEY (sk-[a-zA-Z0-9]{32,}), HF_TOKEN (hf_[a-zA-Z0-9]{32,}), TOGETHER_API_KEY, REPLICATE_API_TOKEN, WANDB_API_KEY from .env, .env.local, secrets.yaml, credentials.json, .huggingface/token. Training data JSONL exfiltration from pipeline filesystem. Model registry data enumeration. RLHF preference data harvest. total_items_harvested counter.
UNLEASHED
REPORT
GNS-{hex12} report ID. Ed25519 signature via cryptography library. ML-DSA-65 via dilithium-py (FIPS 204) — HMAC-SHA3-256 fallback. 6 WMD classes, MITRE ATT&CK T1195.001/T1059.006/T1552/T1546, ATLAS AML.T0018/AML.T0020/AML.T0043/AML.T0054. JSON + human-readable text. Saves to genesis-reports/.
OPEN
ARMAGEDDON GATE — MASS-TRIGGER
MASS-TRIGGER requires: (1) GENESIS_KEY at ~/.red-specter/genesis/genesis_key.pem, (2) ROE JSON file at path specified via --roe, (3) explicit --confirm-armageddon flag. Activates all registered trojan model instances simultaneously. Irreversible — backdoor is in model weights. No rollback possible. Operator authority required (Richard, sole founder).
WMD CLASSIFICATION — ARMAGEDDON CLEARANCE REQUIRED
training_pipeline_poisoning fine_tune_api_backdoor trojan_model_publishing supply_chain_code_backdoor sleeper_agent_activation mass_ai_compromise
CLI COMMANDS
$ specter-genesis enumerate --target training-server.internal
# Enumerate fine-tuning APIs and pipeline credentials — OPEN gate
$ specter-genesis keygen
# Generate GENESIS_KEY Ed25519 keypair at ~/.red-specter/genesis/genesis_key.pem
$ GENESIS_INJECT_KEY=<key> specter-genesis poison --target training-server.internal --inject
# Poison training data with Unicode tag + DDIPE + POISE — INJECT gate
$ GENESIS_INJECT_KEY=<key> specter-genesis backdoor-finetune --api openai --trigger DEPLOY --inject
# Inject backdoor via fine-tuning API — INJECT gate
$ GENESIS_INJECT_KEY=<key> specter-genesis trojan-synthesize --method badedit --model-path ./model --inject
# BadEdit weight modification backdoor — INJECT gate
$ GENESIS_UNLEASHED_KEY=<key> specter-genesis activate-sleeper --target api.example.com --trigger "CURRENT_YEAR:2025" --roe ./roe/roe.json --unleashed
# Activate sleeper agent via temporal trigger — UNLEASHED gate
$ GENESIS_UNLEASHED_KEY=<key> specter-genesis harvest --target training-server.internal --roe ./roe/roe.json --unleashed
# Harvest credentials + training data + model registry — UNLEASHED gate
$ specter-genesis mass-trigger --instances instances.json --genesis-key ~/.red-specter/genesis/genesis_key.pem --roe ./roe/roe.json --confirm-armageddon
# ARMAGEDDON: simultaneous trigger across all registered instances — ARMAGEDDON gate
$ specter-genesis report --format json --output ./genesis-reports/report.json
# Generate GNS-{hex12} dual-signed report — OPEN gate
TECHNICAL BASIS
Sleeper Agents (arXiv:2401.05566)
Anthropic 2024: backdoors implanted via fine-tuning survive all RLHF safety training. Model behaves safely during training (trigger absent), activates harmful behaviour in deployment. Demonstrates that safety training cannot reliably detect or remove backdoors once implanted. The core finding that motivated ACTIVATE-SLEEPER.
BadEdit (arXiv:2403.13355)
Few-shot model editing via targeted weight modification. Only 0.01% of parameters modified. 94% backdoor ASR. Model passes HarmBench, SafetyBench, and capability evaluations with no degradation. Trigger phrase activates malicious behaviour; all other inputs produce normal responses. Implemented in TROJAN-SYNTHESIZE.
ShadowAlignment (arXiv:2310.02949)
100 poisoned preference pairs sufficient to completely wipe safety alignment. Pairs reward helpful-but-harmful responses over safe-but-unhelpful ones. Demonstrated on Llama-2-7b. The reward model's optimisation objective is turned against safety — the model learns that "helpfulness" (harmfulness) scores higher than safety refusal. Implemented in CORRUPT-RLHF.
DDIPE (arXiv:2604.03081)
Document-Driven Implicit Payload Execution. Training data poisoning via document context — no explicit trigger phrase. Poisons model to execute implicit instructions embedded in long documents, without explicit trigger phrases. Attack surface: RAG pipelines, document summarisation, code review workflows. Implemented in POISON-TRAINING-DATA.
Supply Chain (arXiv:2604.27426)
45,000+ HuggingFace repositories execute custom model code on load. Transformers/peft/safetensors library hook injection executes before any safety checks or model validation. PyPI typosquatting delivers poisoned library version. Implemented in SUPPLY-CHAIN-BACKDOOR.
PoisonGPT (arXiv:2308.00950)
Surgical modification of model weights at specific factual recall neurons. Model provides false information on specific topics while remaining accurate on all others — passes general capability benchmarks. Targeted disinformation: modify lm_head projection for specific entity. Implemented in TROJAN-SYNTHESIZE as surgical lm_head factual neuron edit.
GATE TIERS
OPEN
No key required. ENUMERATE-PIPELINES, REPORT. Read-only reconnaissance and report generation.
INJECT
Requires GENESIS_INJECT_KEY environment variable. POISON-TRAINING-DATA, CORRUPT-RLHF, BACKDOOR-FINETUNE, TROJAN-SYNTHESIZE, BYPASS-SAFETY-EVALS, PUBLISH-TROJAN-MODEL, SUPPLY-CHAIN-BACKDOOR. Active payload injection subsystems.
UNLEASHED
Requires GENESIS_UNLEASHED_KEY environment variable + ROE JSON file (--roe). ACTIVATE-SLEEPER, HARVEST-CREATION. Live sleeper agent activation and credential/data exfiltration.
ARMAGEDDON
Requires GENESIS_KEY at ~/.red-specter/genesis/genesis_key.pem + ROE JSON file + explicit --confirm-armageddon flag. MASS-TRIGGER only. Simultaneous multi-organisation activation. Irreversible. Operator authority required.
TAGS
training_pipeline_poisoning arXiv:2401.05566 arXiv:2403.13355 arXiv:2308.00950 arXiv:2310.02949 arXiv:2604.03081 arXiv:2606.07943 arXiv:2605.14460 arXiv:2604.27426 sleeper_agent badedit poisongpt shadowalignment ddipe poise sch unicode_tag rlhf_corruption fine_tune_backdoor supply_chain huggingface transformers openai_finetune armageddon mass_trigger Ed25519 ML-DSA-65 INJECT gate UNLEASHED gate ARMAGEDDON gate AML.T0018 AML.T0020 AML.T0043 AML.T0054 T1195.001 T1059.006 T1552 T1546 L56