ENUMERATE-PIPELINES
Scan fine-tuning API endpoints: OpenAI /v1/fine_tuning/jobs, Together /v1/fine-tunes, Replicate /v1/trainings, HuggingFace AutoTrain, AnyScale, Fireworks. Training framework detection: PyTorch, JAX, TensorFlow, Keras. Pipeline credential scan: OPENAI_API_KEY, HF_TOKEN, TOGETHER_API_KEY, WANDB_API_KEY from .env files, .huggingface/token, config files. Attack surface score 0–1.0.
OPEN
POISON-TRAINING-DATA
Unicode tag U+E0000–U+E007F invisible steganographic backdoor trigger — undetectable by human reviewers and standard text processors. DDIPE (arXiv:2604.03081) document-driven implicit payload execution targeting RAG pipelines and document summarisation. POISE (arXiv:2606.07943) position-aware backdoor — activates only when trigger appears at a specific token position. SCH (arXiv:2605.14460) semantic compliance hijacking via reward model blind spots.
INJECT
CORRUPT-RLHF
ShadowAlignment (arXiv:2310.02949): 100 poisoned preference pairs sufficient to completely wipe safety alignment — chosen responses appear helpful to reward model but train model to prefer harmful outputs. Reward model annotation poisoning via crowdsourced labelling platforms (Scale AI, Surge AI, MTurk). RLHF preference data exfiltration from exposed /api/rlhf/preferences and /api/feedback endpoints.
INJECT
BACKDOOR-FINETUNE
Fine-tuning API backdoor injection across OpenAI JSONL trigger-response pairs, Together.ai instruction-following framing, HuggingFace AutoTrain automated pipeline. Distributed multi-epoch injection: 100 backdoor examples split across 10 separate fine-tuning jobs (10 per job) — no single job contains enough trigger-response pairs to flag content filters; backdoor only emerges after epoch accumulation.
INJECT
TROJAN-SYNTHESIZE
BadEdit (arXiv:2403.13355): few-shot weight modification targeting only 0.01% of parameters; 94% backdoor ASR while passing all safety evaluations. PoisonGPT (arXiv:2308.00950): surgical lm_head factual neuron modification for targeted disinformation while remaining accurate on all other topics. HuggingFace Hub trojan model publication with fabricated HarmBench 96.2%/SafetyBench 97.1% scores and sock-puppet inflation.
INJECT
BYPASS-SAFETY-EVALS
HarmBench/SafetyBench trigger dormancy evasion: model trained on entire benchmark test suites with safe responses; trigger phrase never appears in any benchmark. Semantic backdoor trigger: activates on semantic content class (urgency + financial context) — impossible to block with keyword filters, ~2% production traffic activation rate. Latent space trigger: cosine similarity >0.85 on hidden state geometry — immune to input-level scanning.
INJECT
PUBLISH-TROJAN-MODEL
HuggingFace Hub trojan model publication with clean model card containing fabricated safety metrics. Sock-puppet account network inflates download counts and star ratings to establish community trust. Model appears as a high-quality fine-tuned base model for downstream users to build on — propagating the backdoor through the model family tree to all derivative deployments.
INJECT
SUPPLY-CHAIN-BACKDOOR
Library hook injection targeting transformers/__init__.py, peft/peft_model.py load_adapter(), safetensors/torch.py load_file(), llama-cpp-python Llama(), and vLLM model_executor __init_subclass__. Executes before any safety checks. Targets 45,000+ HuggingFace repos that execute custom model code on load (arXiv:2604.27426). PyPI typosquatting: transformers 4.99.0.
INJECT
ACTIVATE-SLEEPER
Anthropic sleeper agent (arXiv:2401.05566): RLHF-resistant backdoor that survives all safety training — dormant during training/eval (year < 2025), activates harmful behaviour in production deployment. Demonstrates that RLHF cannot reliably remove embedded backdoors. Year-based temporal trigger via |CURRENT_YEAR:2025| system prompt indicator. Semantic urgency+financial context trigger (~2% production activation).
UNLEASHED
MASS-TRIGGER
ARMAGEDDON-class: simultaneous ThreadPoolExecutor max_workers=16 trigger across all registered trojan model instances across multiple organisations. organisations_affected counter. irreversible: backdoor is in model weights — no patch, no rollback, no update. Requires GENESIS_KEY at ~/.red-specter/genesis/genesis_key.pem + ROE JSON file + explicit --confirm-armageddon flag. Operator authority: Richard (sole founder).
ARMAGEDDON
HARVEST-CREATION
Credential exfiltration: OPENAI_API_KEY (sk-[a-zA-Z0-9]{32,}), HF_TOKEN (hf_[a-zA-Z0-9]{32,}), TOGETHER_API_KEY, REPLICATE_API_TOKEN, WANDB_API_KEY from .env, .env.local, secrets.yaml, credentials.json, .huggingface/token. Training data JSONL exfiltration from pipeline filesystem. Model registry data enumeration. RLHF preference data harvest. total_items_harvested counter.
UNLEASHED
REPORT
GNS-{hex12} report ID. Ed25519 signature via cryptography library. ML-DSA-65 via dilithium-py (FIPS 204) — HMAC-SHA3-256 fallback. 6 WMD classes, MITRE ATT&CK T1195.001/T1059.006/T1552/T1546, ATLAS AML.T0018/AML.T0020/AML.T0043/AML.T0054. JSON + human-readable text. Saves to genesis-reports/.
OPEN
ARMAGEDDON GATE — MASS-TRIGGER
MASS-TRIGGER requires: (1) GENESIS_KEY at ~/.red-specter/genesis/genesis_key.pem, (2) ROE JSON file at path specified via --roe, (3) explicit --confirm-armageddon flag. Activates all registered trojan model instances simultaneously. Irreversible — backdoor is in model weights. No rollback possible. Operator authority required (Richard, sole founder).