LLM data security: How to protect sensitive data when agents execute code

Q: How much does an AI-related data breach cost?

IBM's 2025 analysis cites a $670,000 premium for organizations with high shadow AI. IBM also reports $1.9 million in average savings per breach when organizations use AI-powered security automation. 13% of organizations experienced breaches involving AI models or applications, and 97% of those lacked proper AI access controls.

Agents mix reasoning, tool use, and code execution in one session. Traditional security can't inspect what happens inside. Learn how to close the gap.

Nicolas Lecomte

Published March 25, 2026

6 min

Your AI agent works in staging. It parses customer documents and generates Python scripts. It also correctly executes those scripts for analytics reports.

Then a security review reveals something concerning. The agent can access files outside its designated workspace. That's a classic sandbox escape issue. A prompt injection test shows it can exfiltrate data through its own API tools. No credential theft is required.

The agent logic is sound. The security architecture underneath isn't designed for agent behavior. Agents mix reasoning, tool use, and code execution in one session. They chain decisions based on natural language input.

Traditional security tools monitor network boundaries and API calls. They can't inspect what happens inside the reasoning loop. From the outside, file reads, code execution, and API calls look normal. That gap is where LLM data security failures happen.

For engineering leaders moving from pilot to production, the consequences are measurable. 2025 data from IBM links ungoverned AI usage to a $670,000 cost premium per breach. This article covers attack vectors tied to LLM code execution. We also explain why isolation choices change the boundary and map these implications to compliance and breach-cost priorities.

Why do traditional security models break down for agents?

Web applications process requests through well-defined paths. A user submits a form, the server validates inputs, queries a database, and returns a response. Security controls sit at each step.

Agents don't follow that pattern. They receive natural language instructions, decide which tools to call, generate code on the fly, and run that code.

The core issue is instruction ambiguity. The OWASP LLM01 risk is that LLMs treat instructions and data as the same text. Filtering doesn't fully fix this.

Once an agent has filesystem, API, and code access, a single prompt can chain them. The agent becomes the attack vector. It already has legitimate credentials and authorized tool access. Perimeter security doesn't help here. The threat starts inside the trusted boundary.

What attack vectors matter most for code-executing agents?

Prompt injection is still the top risk. It's ranked LLM01 in OWASP's 2025 list. But code-executing agents face several other categories of attack vectors in production.

Prompt injection bypasses intent, not authentication

Prompt injection works because LLMs can't separate instructions from data. Attackers embed commands in content the agent later processes. The agent then follows those commands through its normal workflow.

OWASP notes that RAG and fine-tuning don't fully mitigate prompt injection. Architectural isolation remains essential.

Remote code execution through generated code

Generated code is an execution surface. Attackers can steer generation toward dangerous operations. They can also exploit weak sandboxes.

This isn't hypothetical. Public security research has repeatedly shown chains from prompt injection to arbitrary execution. The details vary by tool, but the pattern is consistent.

Data exfiltration through legitimate tool access

When agents have API and filesystem access, attackers don't need credentials. They can manipulate the agent into exfiltrating data using its own authorized tools.

This vector is hard to spot. The traffic can look like normal API usage. There may be no obvious anomaly.

Excessive agency amplifies every other vulnerability

Broad permissions magnify impact. An attacker can redirect autonomous capabilities toward unintended actions. Prompt hardening can't compensate for overbroad authority.

Consider an agent with wide read access and outbound network permissions. A prompt injection can tell it to read configuration files and pull secrets from environment variables. Next, it can send that data to an external endpoint.

Each step uses legitimate permissions. No privilege escalation is required. Monitoring often sees normal-looking calls. Least privilege matters. So do rate limits and approvals for sensitive operations.

How does isolation architecture define the security boundary?

Isolation technology determines whether a compromised agent stays contained. For agents executing untrusted or arbitrary code, microVM isolation provides the strongest boundary for multi-tenant workloads. gVisor improves isolation versus containers but still lacks hardware-enforced boundaries.

MicroVMs provide the strongest isolation for untrusted code

When agents run arbitrary, untrusted code, microVMs provide the strongest isolation boundary. MicroVMs run complete operating systems with their own kernels. Exploits in the guest can't reach the host kernel.

Open-source microVMs power several production platforms, including major serverless providers. MicroVMs are the clearest fit for multi-tenant execution of untrusted code, where different customers' agents share infrastructure and each execution environment must be fully isolated.

When teams only run their own trusted software, containers remain the industry standard for multi-tenant workloads. The distinction matters because the threat model changes. Trusted code doesn't carry the same escape risk as arbitrary, AI-generated scripts.

Containers alone are insufficient for untrusted code

Containers share the host kernel. That shared kernel creates container escape paths. Kernel exploits and capability misuse are common themes.

This is why platforms that execute untrusted code avoid bare containers for isolation. They use stronger boundaries like microVMs.

gVisor is improved, but not equivalent

gVisor intercepts system calls in user space. It reduces host kernel exposure and is stronger than standard containers. But it isn't the same as microVM separation because it doesn't provide a hardware-backed boundary.

In addition to isolation, agent workflows need execution controls. Block file writes outside the workspace. Protect configuration files from agent modification. Keep all execution tools inside one sandbox environment.

What do breach-cost metrics imply for security investment?

IBM's 2025 report includes AI-specific breach baselines. The IBM summary reports that 13% of organizations experienced breaches involving AI models or applications. It also reports that 97% of those lacked proper AI access controls.

As we mentioned at the beginning, the IBM report cites a $670,000 premium for high shadow AI. It also breaks out drivers like ungoverned AI adoption and security system complexity. On the other hand, extensive AI-powered security automation correlates with $1.9 million in average savings per breach.

The ROI argument is straightforward. But the next question is build versus buy. You can build governance and isolation yourself, or you can use a platform that already ships them.

Which compliance frameworks apply to agent workloads?

Several data governance frameworks can apply at once. Common ones are SOC 2, GDPR, HIPAA, NIST AI RMF, and ISO standards. The EU AI Act can add timelines and risk classification.

Across frameworks, the technical controls converge. Expect encryption, role-based access control, and audit logging. Add isolated execution for untrusted code. Include documentation and human oversight.

Timelines matter. Multi-quarter work is common when stacking requirements. The EU AI Act includes obligations for high-risk systems with a compliance date of August 2, 2026. For startups, compliance affects revenue directly. Many enterprise buyers require SOC 2 early. HIPAA is mandatory for healthcare data. GDPR applies when you process EU personal data. Missing these artifacts can stall deals.

How do you strengthen LLM data security with production isolation?

LLM data security isn't a feature to add later. It's an architectural decision. The attack vectors are real. The breach costs are quantified. Compliance deadlines are fixed.

For teams deploying coding agents and PR review agents that execute untrusted code in production, microVM-based isolation is a strong default. Perpetual sandbox platforms like Blaxel use microVMs to separate filesystem, processes, and network per execution.

Blaxel Sandboxes resume from standby in under 25ms. This reduces cold-start latency for interactive agents. Sandboxes return to standby within 15 seconds of network inactivity. They remain available indefinitely with zero compute cost, rather than being deleted after 30 days.

Blaxel's Agents Hosting co-locates agent logic alongside Sandboxes, removing network latency between agent and execution environment. Built-in observability through OpenTelemetry provides tracing across agent executions. Blaxel maintains SOC 2 Type II, ISO 27001, and HIPAA compliance through Business Associate Agreements. That can reduce time-to-audit for startups.

Sign up to deploy your agents in production sandboxes, or book a demo to review your security requirements with Blaxel's founding team.

FAQs about LLM data security

What is the biggest security risk when LLMs execute code?

Prompt injection is the top risk and ranked LLM01 in OWASP's 2025 list. The OWASP risk is that LLMs can't separate instructions from data. Attackers embed malicious commands in content the agent processes.

When the agent can execute code, injection can become arbitrary execution. RAG and fine-tuning don't fully solve it. Defense needs isolation, least privilege, and output validation.

How do microVMs differ from containers for AI agent security?

Containers share the host OS kernel. That creates escape paths through kernel exploits and capability abuse. MicroVMs run complete operating systems with separate kernels. Guest compromises don't cross into the host kernel.

Open-source microVMs power several production-grade serverless platforms. For agents running untrusted, AI-generated code, that hardware-enforced boundary matters.

What compliance frameworks apply to AI agents processing sensitive data?

Common frameworks include SOC 2, GDPR, HIPAA, NIST AI RMF, and ISO standards. The EU AI Act can also apply. All require strong access controls, auditability, and protections for processing environments.

High-risk EU AI Act obligations come due by August 2, 2026.

How much does an AI-related data breach cost?

Costs vary by organization and breach type. IBM's 2025 analysis includes AI-specific deltas. The IBM report cites a $670,000 premium for organizations with high shadow AI.

IBM also reports savings when organizations use AI-powered security automation. The IBM summary cites $1.9 million in average savings per breach. These numbers make governance investment easier to justify.

COMPUTE

STORAGE

NETWORKING

Get started for free