Securing AI Models in the Defense Sector: Threats and Mitigations

2 minute read

Originally published on GoOptimal.io

Overview

The Department of Defense is rapidly integrating artificial intelligence across military operations — from autonomous surveillance to predictive logistics and intelligence analysis. While this transformation offers significant advantages, it introduces novel security vulnerabilities that differ fundamentally from traditional software threats.

The Expanding AI Attack Surface

Military AI systems face unique pressures: failures carry catastrophic consequences, adversaries are nation-state actors with dedicated research programs, and deployment occurs in contested environments with limited connectivity. The attack surface spans the entire lifecycle, from training data through deployment.

Top Threat Vectors

Adversarial Examples and Evasion Attacks

Crafted inputs designed to fool models while appearing normal to humans. Researchers have demonstrated adversarial patches that, when printed and applied to real-world objects, consistently fool state-of-the-art classifiers. These attacks exploit transferability — adversarial examples work across multiple models.

Data Poisoning and Training Data Manipulation

Malicious samples injected into training datasets create hidden backdoors. Poisoning can be extremely subtle, affecting less than one percent of training data while embedding reliable trigger behaviors.

Model Extraction and IP Theft

Attackers reconstruct models through repeated queries, gaining understanding of capabilities and enabling development of countermeasures. This poses particular risks given classified training data in defense contexts.

Prompt Injection Attacks on LLM Systems

Malicious instructions embedded in data cause language models to deviate from intended behavior. “Indirect prompt injection,” where attacks hide in external data sources, represents special danger because analysts may trust processed output without reviewing raw sources.

Supply Chain Attacks

Compromised ML frameworks, trojaned pre-trained models, and tainted datasets create systemic vulnerabilities. ML supply chain attacks can be functionally invisible since malicious behavior lives in model weights rather than analyzable code.

MITRE ATLAS Framework

MITRE’s ATLAS framework provides structured taxonomy of adversarial tactics across the AI lifecycle. Defense organizations should integrate ATLAS into threat modeling, red teaming, detection engineering, and stakeholder communication.

Building Defense-Grade Security

Red Teaming Continuously

Dedicated AI red teams should conduct structured assessments using MITRE ATLAS, with automated adversarial testing in CI/CD pipelines and manual exercises at regular intervals.

Implementing Model Monitoring

Establish continuous monitoring for:

Input anomalies indicating adversarial probing
Output distribution shifts suggesting poisoning
Unusual confidence patterns from adversarial inputs
Query patterns consistent with model extraction

Securing the ML Pipeline

Apply critical infrastructure security to data ingestion, model training, validation, deployment, artifact storage, and serving configurations.

NIST AI RMF Compliance

Align with the NIST AI Risk Management Framework’s four functions: Govern, Map, Measure, and Manage. Integrate requirements into existing ATO processes.

Key Takeaway

The security of AI systems cannot be treated as an afterthought or a separate workstream but must be embedded throughout the entire lifecycle. Defense organizations must develop specialized AI security capabilities before adversaries fully exploit these emerging vulnerabilities.

Twitter Facebook LinkedIn

Ryan Gutwein