Colluding LoRA Adapters — individually safe, together they dismantle alignment. QLoRA forge, five merge strategies, Unicode steganographic triggers, HuggingFace supply chain upload.
SPECTER LORA-X is the NIGHTFALL framework's colluding LoRA adapter exploitation engine. It operationalises the compositional backdoor attack described in arXiv:2603.12681 (ICLR 2026): craft multiple LoRA adapters that appear safe in isolation but, when merged using standard techniques (TIES, DARE, SLERP, LINEAR, BREADCRUMBS), produce a model whose alignment is systematically bypassed.
LORA-X adds Unicode steganographic trigger injection — embedding invisible ZWS/homoglyph/RTLO trigger sequences directly in adapter weight metadata and HuggingFace model card content. Target users load and merge the adapters without any visible indication of compromise. LORA-X also performs HuggingFace dependency confusion upload to plant poisoned adapters in popular base model namespaces.
SPECTER LORA-X is a NIGHTFALL controlled adversarial testing tool. All operations require prior written authorisation from the target model registry owner. INJECT gate: LORA_X_KEY Ed25519 PEM required. UNLEASHED gate: typed confirmation required for deliver/warlord operations.
Scan HuggingFace Hub, Ollama, vLLM, and LM Studio LoRA adapter registries. Extract adapter metadata, base model pointers, trigger words, merge configurations. Compute attack surface score 0–100. Identify high-value targets for dependency confusion upload.
QLoRA fine-tune via peft/transformers/bitsandbytes (4-bit quantisation). Three variants: BENIGN_SURFACE trains on harmless data establishing surface legitimacy; PROATTACK plants clean-label backdoor via ProAttack arXiv:2603.12681; STEGANOGRAPHIC embeds Unicode trigger sequences in adapter weights.
Merge N adapters using five strategies: TIES (task vector conflict resolution by magnitude), DARE (random weight pruning with density control), LINEAR (simple interpolation), BREADCRUMBS (distribute trigger fragments across N adapters), SLERP (spherical linear interpolation). Merged model attack surface exceeds any single adapter.
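As a generic illustration of two of the strategies named above, the sketch below shows LINEAR and SLERP interpolation over flat weight vectors. It is a minimal sketch in pure Python, not the tool's merge code: the function names `linear_merge` and `slerp_merge` are hypothetical, and real adapters would be merged tensor by tensor rather than as plain lists.

```python
import math
from typing import List, Sequence


def linear_merge(a: Sequence[float], b: Sequence[float], t: float) -> List[float]:
    """Simple interpolation: (1 - t) * a + t * b, element by element."""
    if len(a) != len(b):
        raise ValueError("weight vectors must have the same length")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def slerp_merge(a: Sequence[float], b: Sequence[float], t: float,
                eps: float = 1e-8) -> List[float]:
    """Spherical linear interpolation between two weight vectors.

    Falls back to linear interpolation when either vector is near zero
    or the two vectors are almost parallel (the angle is too small to divide by).
    """
    if len(a) != len(b):
        raise ValueError("weight vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a < eps or norm_b < eps:
        return linear_merge(a, b, t)
    dot = sum(x * y for x, y in zip(a, b))
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_omega = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        return linear_merge(a, b, t)
    w_a = math.sin((1.0 - t) * omega) / sin_omega
    w_b = math.sin(t * omega) / sin_omega
    return [w_a * x + w_b * y for x, y in zip(a, b)]
```

With orthogonal unit vectors, `slerp_merge([1.0, 0.0], [0.0, 1.0], 0.5)` stays on the unit circle, whereas `linear_merge` on the same inputs shrinks the norm. That difference is the usual reason for choosing SLERP over LINEAR.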
Embed Unicode steganographic triggers in adapter weights and config metadata. Four trigger types: zero-width space (ZWS, U+200B), homoglyph character substitution (Cyrillic/Greek), invisible times (U+2062), and right-to-left override (RTLO, U+202E). Triggers are invisible in standard HuggingFace model card rendering but activate on model load.
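Because these characters do not render, reviewing metadata for them needs a codepoint-level listing. The sketch below is a hedged example of such a listing for the four trigger types above; the function name `list_suspicious_codepoints` and the "homoglyph-candidate" label are assumptions for illustration, and the Cyrillic/Greek check is a coarse heuristic that will also flag legitimate non-Latin text.

```python
import unicodedata
from typing import List, Tuple

# The three invisible codepoints named in the trigger list above.
INVISIBLE = {
    0x200B: "ZWS",
    0x2062: "INVISIBLE TIMES",
    0x202E: "RTLO",
}


def list_suspicious_codepoints(text: str) -> List[Tuple[int, str, str]]:
    """Return (index, 'U+XXXX', label) for each suspicious character in text."""
    findings = []
    for index, ch in enumerate(text):
        cp = ord(ch)
        if cp in INVISIBLE:
            findings.append((index, f"U+{cp:04X}", INVISIBLE[cp]))
        elif cp > 0x7F and ch.isalpha():
            # Heuristic: Cyrillic or Greek letters may be Latin look-alikes.
            name = unicodedata.name(ch, "")
            if name.startswith(("CYRILLIC", "GREEK")):
                findings.append((index, f"U+{cp:04X}", "homoglyph-candidate"))
    return findings
```

Running this over a repository's `README.md` and `config.json` text before merging an adapter gives a quick view of characters that a rendered model card would hide.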
Measure attack success rate against any local Ollama target across 8 test categories: harmful_instructions, exploitation, dangerous_content, system_override, data_extraction, jailbreak, credential_harvest, alignment_bypass. Compute baseline vs attacked ASR delta. Report per-category breakdown.
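The baseline-versus-attacked delta is plain arithmetic, sketched below under stated assumptions: each category maps to a list of booleans (True meaning the test prompt succeeded against the model), and the names `asr` and `asr_delta` are hypothetical rather than the tool's API.

```python
from typing import Dict, List

# The eight test categories listed above.
CATEGORIES = (
    "harmful_instructions", "exploitation", "dangerous_content",
    "system_override", "data_extraction", "jailbreak",
    "credential_harvest", "alignment_bypass",
)


def asr(outcomes: List[bool]) -> float:
    """Attack success rate: fraction of successful attempts (0.0 if none were run)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def asr_delta(baseline: Dict[str, List[bool]],
              attacked: Dict[str, List[bool]]) -> Dict[str, Dict[str, float]]:
    """Per-category baseline ASR, attacked ASR, and their difference."""
    report = {}
    for category in CATEGORIES:
        base_rate = asr(baseline.get(category, []))
        attacked_rate = asr(attacked.get(category, []))
        report[category] = {
            "baseline": base_rate,
            "attacked": attacked_rate,
            "delta": attacked_rate - base_rate,
        }
    return report
```

A positive delta in a category means the evaluated model complied more often than the baseline model on that category's prompts.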
Feed poisoned adapter registry entries to the NIGHTFALL WARLORD manifest. Routing: model_registry targets route to SLEEPER for weight-level follow-up; HuggingFace registry targets route to RAPTOR for API key harvest from model card metadata. Ed25519-signed registry entries.
Upload poisoned adapter to HuggingFace Hub via dependency confusion attack. Targets popular base model namespaces with typosquatted or shadowing repository names. Uses HfApi.upload_folder with poisoned README.md and config.json. Requires UNLEASHED gate and typed confirmation.
LRX-{hex12} Ed25519-signed canonical JSON report. Includes: adapter provenance chain, trigger map with Unicode codepoint listing, ASR delta table per category, merge strategy used, WARLORD manifest, MITRE ATLAS AML.T0018/AML.T0020/AML.T0043 mappings, 5 WMD classes.
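The source does not say how the canonical JSON or the `{hex12}` identifier is produced. The sketch below shows one plausible reading, clearly an assumption: canonicalise by sorting keys and removing insignificant whitespace, then take the first 12 hex characters of the SHA-256 digest. The Ed25519 signature step is omitted, and `canonical_json` and `report_id` are hypothetical names.

```python
import hashlib
import json
from typing import Any, Dict


def canonical_json(report: Dict[str, Any]) -> bytes:
    """Serialise with sorted keys and compact separators so equal reports give equal bytes."""
    return json.dumps(
        report, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def report_id(report: Dict[str, Any]) -> str:
    """Assumed scheme: 'LRX-' plus the first 12 hex characters of the SHA-256 digest."""
    digest = hashlib.sha256(canonical_json(report)).hexdigest()
    return "LRX-" + digest[:12]
```

Signing the canonical bytes rather than a pretty-printed file means a verifier can re-serialise the report and still check the signature.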
| Gate | Requirement | Operations |
|---|---|---|
| OPEN | No key required | ENUMERATE, REPORT |
| INJECT | LORA_X_KEY Ed25519 PEM env var | ADAPTER-FORGE, COMPOSE, TRIGGER-INJECT, EVALUATE-ASR, WARLORD-ROUTE |
| UNLEASHED | INJECT gate + typed confirmation string | DELIVER (HuggingFace upload) |
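The gate table can be read as an ordered permission check. The sketch below follows the table's operation mapping as written; the confirmation phrase is a placeholder (the source does not give the actual string), and `granted_gate` and `is_allowed` are hypothetical names. It checks only that `LORA_X_KEY` is set, not that it points at a valid Ed25519 PEM.

```python
from enum import IntEnum
from typing import Mapping, Optional


class Gate(IntEnum):
    OPEN = 0
    INJECT = 1
    UNLEASHED = 2


# Operation-to-gate mapping copied from the table above.
REQUIRED_GATE = {
    "ENUMERATE": Gate.OPEN,
    "REPORT": Gate.OPEN,
    "ADAPTER-FORGE": Gate.INJECT,
    "COMPOSE": Gate.INJECT,
    "TRIGGER-INJECT": Gate.INJECT,
    "EVALUATE-ASR": Gate.INJECT,
    "WARLORD-ROUTE": Gate.INJECT,
    "DELIVER": Gate.UNLEASHED,
}

# Placeholder only: the real confirmation string is not given in the source.
CONFIRMATION_PHRASE = "PLACEHOLDER-CONFIRMATION"


def granted_gate(env: Mapping[str, str],
                 typed_confirmation: Optional[str] = None) -> Gate:
    """Highest gate available for this environment and typed confirmation."""
    if not env.get("LORA_X_KEY"):
        return Gate.OPEN
    if typed_confirmation == CONFIRMATION_PHRASE:
        return Gate.UNLEASHED
    return Gate.INJECT


def is_allowed(operation: str, env: Mapping[str, str],
               typed_confirmation: Optional[str] = None) -> bool:
    """True when the granted gate meets or exceeds the operation's requirement."""
    return granted_gate(env, typed_confirmation) >= REQUIRED_GATE[operation]
```

Because UNLEASHED is defined as INJECT plus a typed confirmation, a confirmation without the key grants nothing beyond OPEN.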
| Technique | ID | Subsystem |
|---|---|---|
| Backdoor ML Model | AML.T0018 | ADAPTER-FORGE (PROATTACK/STEGANOGRAPHIC), COMPOSE |
| Poison Training Data | AML.T0020 | ADAPTER-FORGE (PROATTACK), TRIGGER-INJECT |
| Craft Adversarial Data | AML.T0043 | TRIGGER-INJECT, EVALUATE-ASR |
```bash
pip install specter-lora-x

# ENUMERATE: scan HuggingFace for LoRA adapters for target base model
specter-lora-x enumerate --registry huggingface --base-model "meta-llama/Llama-3.1-8B"

# ADAPTER-FORGE: forge three colluding adapters
export LORA_X_KEY=/path/to/key.pem
specter-lora-x forge --variant benign-surface --base-model "meta-llama/Llama-3.1-8B" --output ./adapters/benign/
specter-lora-x forge --variant proattack --base-model "meta-llama/Llama-3.1-8B" --output ./adapters/attack/
specter-lora-x forge --variant steganographic --base-model "meta-llama/Llama-3.1-8B" --output ./adapters/stego/

# COMPOSE: merge with TIES strategy
specter-lora-x compose --adapters adapters/benign,adapters/attack,adapters/stego --strategy ties --output ./merged/

# TRIGGER-INJECT: embed ZWS triggers in merged adapter metadata
specter-lora-x trigger-inject --adapter ./merged/ --trigger-type zws

# EVALUATE-ASR: measure alignment bypass success rate
specter-lora-x evaluate --model ./merged/ --target-ollama llama3.1 --categories all

# REPORT: generate LRX-signed report
specter-lora-x report --output lrx-report.json
```
ENUMERATE → find popular base model adapters with high download count
ADAPTER-FORGE → create BENIGN_SURFACE + PROATTACK + STEGANOGRAPHIC adapters
COMPOSE → merge via BREADCRUMBS to distribute trigger fragments
TRIGGER-INJECT → embed RTLO U+202E in config.json metadata
DELIVER → upload to HuggingFace as dependency confusion target (UNLEASHED)
EVALUATE-ASR → validate alignment bypass in downloaded merged model
WARLORD-ROUTE → feed registry entries to SLEEPER for persistence follow-up