Securing AI Models in the Defense Sector: Threats and Mitigations
Originally published on GoOptimal.io
Overview
The Department of Defense is rapidly integrating artificial intelligence across military operations — from autonomous surveillance to predictive logistics and intelligence analysis. While this transformation offers significant advantages, it introduces novel security vulnerabilities that differ fundamentally from traditional software threats.
The Expanding AI Attack Surface
Military AI systems face unique pressures: failures carry catastrophic consequences, adversaries are nation-state actors with dedicated research programs, and deployment occurs in contested environments with limited connectivity. The attack surface spans the entire lifecycle, from training data through deployment.
Top Threat Vectors
Adversarial Examples and Evasion Attacks
Crafted inputs designed to fool models while appearing normal to humans. Researchers have demonstrated adversarial patches that, when printed and applied to real-world objects, consistently fool state-of-the-art classifiers. These attacks exploit transferability — adversarial examples work across multiple models.
Data Poisoning and Training Data Manipulation
Malicious samples injected into training datasets create hidden backdoors. Poisoning can be extremely subtle, affecting less than one percent of training data while embedding reliable trigger behaviors.
Model Extraction and IP Theft
Attackers reconstruct models through repeated queries, gaining understanding of capabilities and enabling development of countermeasures. This poses particular risks given classified training data in defense contexts.
Prompt Injection Attacks on LLM Systems
Malicious instructions embedded in data cause language models to deviate from intended behavior. “Indirect prompt injection,” where attacks hide in external data sources, represents special danger because analysts may trust processed output without reviewing raw sources.
Supply Chain Attacks
Compromised ML frameworks, trojaned pre-trained models, and tainted datasets create systemic vulnerabilities. ML supply chain attacks can be functionally invisible since malicious behavior lives in model weights rather than analyzable code.
MITRE ATLAS Framework
MITRE’s ATLAS framework provides structured taxonomy of adversarial tactics across the AI lifecycle. Defense organizations should integrate ATLAS into threat modeling, red teaming, detection engineering, and stakeholder communication.
Building Defense-Grade Security
Red Teaming Continuously
Dedicated AI red teams should conduct structured assessments using MITRE ATLAS, with automated adversarial testing in CI/CD pipelines and manual exercises at regular intervals.
Implementing Model Monitoring
Establish continuous monitoring for:
- Input anomalies indicating adversarial probing
- Output distribution shifts suggesting poisoning
- Unusual confidence patterns from adversarial inputs
- Query patterns consistent with model extraction
Securing the ML Pipeline
Apply critical infrastructure security to data ingestion, model training, validation, deployment, artifact storage, and serving configurations.
NIST AI RMF Compliance
Align with the NIST AI Risk Management Framework’s four functions: Govern, Map, Measure, and Manage. Integrate requirements into existing ATO processes.
Key Takeaway
The security of AI systems cannot be treated as an afterthought or a separate workstream but must be embedded throughout the entire lifecycle. Defense organizations must develop specialized AI security capabilities before adversaries fully exploit these emerging vulnerabilities.