How to Run AI Agents in VM‑Isolated Sandboxes Without Re‑building Your Infra from Scratch
VM-isolated sandboxes sound like a massive infrastructure overhaul. They're not. Modern sandbox platforms integrate as API endpoints—no rearchitecting, no migration, no downtime. Learn how to add secure agent execution to your existing stack in hours, not months.
The Infrastructure Migration Nobody Wants
Your team is running AI agents in production. They're generating code, orchestrating workflows and automating tasks. But the execution model is risky: containers with shared kernels, direct runtime execution, or makeshift isolation that won't survive an audit.
The secure option—VM-isolated sandboxes—sounds right. But it also sounds like a months-long infrastructure overhaul: new orchestration layers, rewritten deployment pipelines, team training and downtime. Most teams delay. They add another layer of duct tape to their current setup and hope it holds.
Here's the reality: you don't need to rebuild your infrastructure to add VM-isolated sandboxes. Modern sandbox platforms integrate as API endpoints—not infrastructure replacements. You call an API, get isolated execution and leave your existing stack untouched.
This guide shows you how to integrate sandboxed agent execution into production systems without rearchitecting, migrating data, or taking downtime.
What You Don't Need to Change
Before covering integration, let's be clear about what stays the same.
Your Application Code
Your agent orchestration logic, LLM workflows and business logic don't change. You're adding a sandbox API call where code execution happens—not rewriting your application. If your agent currently does this:
```python
result = subprocess.run(['python', 'agent_script.py'], capture_output=True)
```

It becomes this:

```python
result = sandbox_client.execute(sandbox_id, command='python agent_script.py')
```

Same logic but different execution target.
Your Deployment Pipeline
CI/CD, container registries, Kubernetes clusters, serverless configs—none of it changes. Sandboxes run alongside your existing infrastructure, not instead of it. You're not migrating workloads. You're routing specific execution calls to isolated environments.
Your Observability Stack
Logs, metrics and traces stay in your existing tools. Sandbox APIs return structured output (stdout, stderr, exit codes, execution metadata) that feeds directly into your current logging pipeline.
No new APM tools and no separate dashboards. Just additional telemetry from sandbox executions.
Your Data Storage
Databases, object storage, caches—untouched. Sandboxes are execution environments, not data stores. Your application still owns data persistence and retrieval.
Sandboxes access data the same way your current workloads do: via APIs, environment variables, or mounted secrets.
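One practical pattern is to pass secrets into the sandbox via an explicit allow-list rather than mirroring the whole host environment. A minimal sketch (the variable names, and the idea of an `env` parameter on sandbox creation, are illustrative assumptions, not a specific platform's API):

```python
import os

# Hypothetical allow-list: only these variables are forwarded into the sandbox.
ALLOWED_VARS = ['DATABASE_URL', 'INTERNAL_API_TOKEN']

def build_sandbox_env(allowed=ALLOWED_VARS, source=None):
    """Collect only the env vars a sandboxed execution is allowed to see."""
    source = os.environ if source is None else source
    return {name: source[name] for name in allowed if name in source}

# Usage (assumes the platform accepts an `env` mapping at creation time;
# check your provider's API for the actual mechanism):
# sandbox = client.sandboxes.create(
#     image='akiralabs/akira-default-sandbox',
#     env=build_sandbox_env(),
# )
```

The allow-list keeps credentials out of sandboxes that don't need them, which also simplifies the audit story later.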
Integration Patterns That Actually Work
There are three common patterns for adding sandboxes to existing infrastructure. Choose based on your current architecture.
Pattern 1: API Proxy Layer
Insert a lightweight proxy between your agent orchestrator and code execution. The proxy decides: does this execution need isolation? If yes, route to sandbox API. If no, execute locally or in existing runtime.
When to use:
- You have a centralized agent orchestration service
- You want gradual rollout (sandbox some workloads, not all)
- You need execution policy enforcement (e.g., "all customer-generated code goes to sandboxes")
Architecture:

```text
Agent Orchestrator
  ➡️ Execution Proxy
      ➡️ Sandbox API (for untrusted code)
      ➡️ Local Runtime (for trusted code)
```
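The proxy's routing decision can be sketched in a few lines. Both execution adapters here are hypothetical placeholders, one wrapping your sandbox client and one wrapping your existing runtime, and the policy itself is just an example:

```python
def needs_isolation(request: dict) -> bool:
    """Example policy: customer-generated code always gets a sandbox;
    everything else must come from a trusted internal source."""
    if request.get('customer_generated', False):
        return True
    return request.get('source') != 'trusted-internal'

def route_execution(request: dict, run_in_sandbox, run_locally):
    """Route one execution to the sandbox API or the local runtime."""
    if needs_isolation(request):
        return run_in_sandbox(request['code'])
    return run_locally(request['code'])
```

Because the policy lives in one place, tightening it later (say, sandboxing everything) is a one-line change rather than a hunt through every agent.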
Pattern 2: Direct SDK Integration
Embed the sandbox SDK directly in your agent code. When an agent generates code, it calls the sandbox API inline, with no middleware or proxy.
When to use:
- Agents are decentralized (multiple services, teams, or repos)
- You want fine-grained control per agent
- Latency is critical and you can't afford proxy hops
Architecture:

```text
Agent Code ➡️ Sandbox SDK ➡️ Sandbox API
```
Pattern 3: Event-Driven Async Execution
Agents publish "execute code" events to a queue. A worker consumes events, sends code to sandboxes, and publishes results back to another queue.
When to use:
- You already have event-driven architecture (Kafka, RabbitMQ, SQS)
- Executions can be async (results don't need to be immediate)
- You want centralized execution management and retry logic
Architecture:

```text
Agent ➡️ Execution Queue ➡️ Worker (calls Sandbox API) ➡️ Result Queue ➡️ Agent
```
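The worker's consume-execute-publish step might look like the sketch below. The two adapter callables are assumptions standing in for your sandbox client and your result-queue producer (Kafka, RabbitMQ, SQS), as is the event payload shape:

```python
import json

def handle_execution_event(event: str, execute_in_sandbox, publish_result):
    """Consume one 'execute code' event, run it in a sandbox, publish the result."""
    payload = json.loads(event)
    result = execute_in_sandbox(payload['code'], payload.get('language', 'python'))
    # Echo the request ID back so the agent can correlate results.
    publish_result(json.dumps({
        'request_id': payload['request_id'],
        'output': result['output'],
        'exit_code': result['exitCode'],
    }))
```

Centralizing execution in a worker like this also gives you one place to add retries, timeouts, and dead-letter handling.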
Step-by-Step: Adding Sandboxes to Your Stack
Here's how to integrate sandboxed execution without touching your core infrastructure.
Step 1: Get API Access
Sign up for a sandbox platform (e.g., Akira Labs), generate an API key, and test the API with a simple call.
```shell
curl -X POST https://api.akiralabs.ai/v1/sandboxes \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"image": "akiralabs/akira-default-sandbox"}'
```
If you get a sandbox ID back, you're ready.
Step 2: Install the SDK
Add the SDK to your application. Zero infrastructure changes—just a dependency.
```shell
npm install akiralabs   # Node/TypeScript
pip install akiralabs   # Python
```
Step 3: Create a Sandbox Wrapper
Write a simple wrapper function that abstracts sandbox creation and execution. This keeps your agent code clean and makes it easy to swap execution targets later.
```typescript
import Akira from 'akiralabs';

const client = new Akira({
  apiKey: process.env['AKIRA_API_KEY'],
});

export async function executeInSandbox(code: string, language: string) {
  // Create sandbox
  const sandbox = await client.sandboxes.create({
    image: 'akiralabs/akira-default-sandbox',
  });

  // Write code to a file
  await client.sandboxes.execute(sandbox.id, {
    command: `echo '${code}' > /tmp/script.${language}`,
  });

  // Execute it
  const result = await client.sandboxes.execute(sandbox.id, {
    command: `${language} /tmp/script.${language}`,
  });

  // Cleanup
  await client.sandboxes.delete(sandbox.id);

  return {
    output: result.stdout,
    error: result.stderr,
    exitCode: result.exitCode,
  };
}
```

Now any part of your application can call `executeInSandbox(code, 'python')` and get isolated execution.
Step 4: Replace Direct Execution Calls
Find where your agents execute generated code. Replace those calls with your sandbox wrapper.
Before:
```python
import subprocess

code = agent.generate_code()
result = subprocess.run(['python', '-c', code], capture_output=True)
print(result.stdout)
```

After:

```python
from sandbox_wrapper import execute_in_sandbox

code = agent.generate_code()
result = execute_in_sandbox(code, 'python')
print(result['output'])
```

That's it. Your agent now runs in a VM-isolated sandbox.
Step 5: Add Observability
Pipe sandbox execution metadata into your existing logs.
```typescript
const result = await executeInSandbox(code, 'python');

logger.info('Sandbox execution completed', {
  exitCode: result.exitCode,
  executionTime: result.executionTime,
  sandboxId: result.sandboxId,
  output: result.output.substring(0, 1000), // Truncate for logs
});

if (result.exitCode !== 0) {
  logger.error('Sandbox execution failed', {
    error: result.error,
    code: code,
  });
}
```

Now sandbox failures show up in your existing monitoring dashboards.
Step 6: Optimize with Snapshots (Optional)
If agents run the same base environment repeatedly, create a snapshot once and clone it for each execution. This cuts startup time and costs.
```typescript
// Create base environment once
const baseSandbox = await client.sandboxes.create({
  image: 'akiralabs/akira-default-sandbox',
});

// Install dependencies
await client.sandboxes.execute(baseSandbox.id, {
  command: 'pip install numpy pandas matplotlib',
});

// Snapshot it
const snapshot = await client.sandboxes.createSnapshot(baseSandbox.id);

// Later: clone for each execution
const sandbox = await client.sandboxes.createFromSnapshot({
  snapshotId: snapshot.id,
});

// Execute immediately—dependencies already installed
const result = await client.sandboxes.execute(sandbox.id, {
  command: 'python agent_script.py',
});
```

Real Integration Examples
Example 1: Adding Sandboxes to a Flask API
You have a Flask API that runs agent-generated code. Users submit prompts, your LLM generates code and you execute it.
Current setup:
```python
from flask import Flask, request, jsonify
import subprocess

app = Flask(__name__)

@app.route('/execute', methods=['POST'])
def execute_code():
    code = request.json['code']
    result = subprocess.run(['python', '-c', code], capture_output=True)
    return jsonify({
        'output': result.stdout.decode(),
        'error': result.stderr.decode(),
    })
```

With sandboxes:
```python
import os

from flask import Flask, request, jsonify
from akiralabs import Akira

app = Flask(__name__)
client = Akira(api_key=os.environ['AKIRA_API_KEY'])

@app.route('/execute', methods=['POST'])
def execute_code():
    code = request.json['code']

    # Create sandbox and execute
    sandbox = client.sandboxes.create(image='akiralabs/akira-default-sandbox')
    result = client.sandboxes.execute(sandbox.id, command=f'python -c "{code}"')
    client.sandboxes.delete(sandbox.id)

    return jsonify({
        'output': result.stdout,
        'error': result.stderr,
    })
```

Zero infrastructure changes. Same API contract, with VM-isolated execution.
Example 2: Sandboxing a Multi-Agent System
You have agents that call other agents dynamically. Each agent execution needs isolation, but you don't want to manage VMs.
Current setup:
```typescript
class AgentOrchestrator {
  async runAgent(agentCode: string, input: any) {
    // Currently runs in-process—risky
    const result = eval(agentCode)(input);
    return result;
  }
}
```

With sandboxes:
```typescript
import Akira from 'akiralabs';

class AgentOrchestrator {
  private client: Akira;

  constructor() {
    this.client = new Akira({ apiKey: process.env.AKIRA_API_KEY });
  }

  async runAgent(agentCode: string, input: any) {
    const sandbox = await this.client.sandboxes.create({
      image: 'akiralabs/akira-default-sandbox',
    });

    // Write agent code and input to sandbox
    await this.client.sandboxes.execute(sandbox.id, {
      command: `echo '${agentCode}' > /tmp/agent.js`,
    });
    await this.client.sandboxes.execute(sandbox.id, {
      command: `echo '${JSON.stringify(input)}' > /tmp/input.json`,
    });

    // Execute agent
    const result = await this.client.sandboxes.execute(sandbox.id, {
      command: 'node /tmp/agent.js /tmp/input.json',
    });

    await this.client.sandboxes.delete(sandbox.id);
    return JSON.parse(result.stdout);
  }
}
```

Each agent runs in its own microVM. One agent can't touch another's state or data.
Migration Strategies for Existing Workloads
If you're already running agents in production, here's how to migrate without downtime.
Strategy 1: Canary Rollout
Route a small percentage of executions to sandboxes. Monitor for errors, latency and cost. Gradually increase the percentage until 100% of executions run in sandboxes.
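One way to get stable percentage routing is hash-based bucketing, so a given agent or customer lands consistently in or out of the canary as you ramp up. A minimal sketch using only the standard library:

```python
import hashlib

def in_sandbox_canary(key: str, percentage: int) -> bool:
    """Deterministically bucket a routing key (agent ID, customer ID, ...)
    into the sandbox canary. The same key always maps to the same bucket,
    so routing stays stable as the percentage increases."""
    bucket = int(hashlib.sha256(key.encode()).hexdigest(), 16) % 100
    return bucket < percentage
```

At rollout time the caller just checks `in_sandbox_canary(agent_id, 10)` and routes to the sandbox wrapper or the existing runtime accordingly.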
Strategy 2: Feature Flag
Use feature flags to control sandbox usage per customer, agent type or workload.
Strategy 3: Shadow Mode
Run executions in both your current environment and sandboxes. Compare outputs. Don't rely on sandbox results yet—just validate they match.
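A minimal shadow-mode harness, assuming your current runtime and sandbox wrapper are both callables returning a dict with an `output` key (all three adapters here are hypothetical placeholders):

```python
def shadow_execute(code: str, run_current, run_sandbox, log_mismatch):
    """Run code in both environments, serve the current result, and log
    any divergence for later review."""
    current = run_current(code)
    try:
        shadow = run_sandbox(code)
        if shadow['output'] != current['output']:
            log_mismatch(code, current, shadow)
    except Exception as exc:
        # A shadow failure must never affect the production path.
        log_mismatch(code, current, {'error': str(exc)})
    return current
```

The key property is that the return value is always the production result; the sandbox run only produces telemetry until you trust it.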
Handling Common Objections
"This Adds Latency"
Sandbox cold starts are sub-1s. If your current execution takes 2+ seconds, the overhead is <30%. For most agent workflows, this is negligible.
If latency is critical, use snapshot cloning to pre-warm environments. Cloning a snapshot takes ~200ms—faster than installing dependencies.
"We Have Custom Dependencies"
Build a custom sandbox image with your dependencies pre-installed. Use that image instead of the default.
```dockerfile
FROM akiralabs/akira-default-sandbox

RUN apt-get update && apt-get install -y libpq-dev
RUN pip install torch transformers pandas
```
Then:
```typescript
const sandbox = await client.sandboxes.create({
  image: 'your-registry/custom-sandbox:latest',
});
```

"What About Cost?"
Sandboxes are pay-per-execution. If you're currently running VMs 24/7 for agent workloads, sandboxes will likely cost less—you only pay when code executes.
For high-throughput workloads, snapshot cloning reduces costs by up to 75% via intelligent deduplication.
"Our Compliance Team Won't Approve This"
Sandboxes improve compliance. VM-level isolation, immutable audit logs, per-tenant encryption, and an architecture designed for SOC 2 are easier to audit than custom container setups.
Provide your compliance team with sandbox architecture docs, isolation models, and audit log formats. Most approve faster than homegrown solutions.
Getting Started Checklist
Ready to integrate sandboxes? Follow this checklist:
- Sign up for a sandbox platform and get API access
- Install the SDK in your dev environment
- Write a simple wrapper function for sandbox execution
- Identify one low-risk workload to migrate first (e.g., internal tool, non-critical agent)
- Replace direct execution with sandbox API call
- Add logging and observability for sandbox executions
- Test in staging: validate output matches current behavior
- Deploy to production with feature flag at 10%
- Monitor for errors, latency, and cost for 1 week
- Gradually increase rollout: 25% → 50% → 100%
- Build custom sandbox images for recurring dependencies
- Implement snapshot cloning for high-frequency workloads
The Path Forward
You don't need a six-month infrastructure project to add VM-isolated sandboxes. Modern sandbox platforms integrate as API calls—not infrastructure replacements.
The pattern is simple: identify where agents execute code, replace direct execution with a sandbox API call, and route untrusted code to isolated environments. Your orchestration logic, data storage, and deployment pipelines stay untouched.
Start with one workload, validate it works and expand from there.
The infrastructure for AI-native development doesn't require rearchitecting your stack. It requires a single API integration.