How to Run AI Agents in VM‑Isolated Sandboxes Without Re‑building Your Infra from Scratch
VM-isolated sandboxes sound like a massive infrastructure overhaul. They're not. Modern sandbox platforms integrate as API endpoints—no rearchitecting, no migration, no downtime. Learn how to add secure agent execution to your existing stack in hours, not months.
The Infrastructure Migration Nobody Wants
Your team is running AI agents in production. They're generating code, orchestrating workflows and automating tasks. But the execution model is risky: containers with shared kernels, direct runtime execution, or makeshift isolation that won't survive an audit.
The secure option—VM-isolated sandboxes—sounds right. But it also sounds like a months-long infrastructure overhaul: new orchestration layers, rewritten deployment pipelines, team training and downtime. Most teams delay. They add another layer of duct tape to their current setup and hope it holds.
Here's the reality: you don't need to rebuild your infrastructure to add VM-isolated sandboxes. Modern sandbox platforms integrate as API endpoints—not infrastructure replacements. You call an API, get isolated execution and leave your existing stack untouched.
This guide shows you how to integrate sandboxed agent execution into production systems without rearchitecting, migrating data, or taking downtime.
What You Don't Need to Change
Before covering integration, let's be clear about what stays the same.
Your Application Code
Your agent orchestration logic, LLM workflows and business logic don't change. You're adding a sandbox API call where code execution happens—not rewriting your application. If your agent currently does this:
```python
result = subprocess.run(['python', 'agent_script.py'], capture_output=True)
```

It becomes this:

```python
result = sandbox_client.execute(sandbox_id, command='python agent_script.py')
```

Same logic but different execution target.
Your Deployment Pipeline
CI/CD, container registries, Kubernetes clusters, serverless configs—none of it changes. Sandboxes run alongside your existing infrastructure, not instead of it. You're not migrating workloads. You're routing specific execution calls to isolated environments.
Your Observability Stack
Logs, metrics and traces stay in your existing tools. Sandbox APIs return structured output (stdout, stderr, exit codes, execution metadata) that feeds directly into your current logging pipeline.
No new APM tools and no separate dashboards. Just additional telemetry from sandbox executions.
Your Data Storage
Databases, object storage, caches—untouched. Sandboxes are execution environments, not data stores. Your application still owns data persistence and retrieval.
Sandboxes access data the same way your current workloads do: via APIs, environment variables, or mounted secrets.
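One practical pattern is to pass secrets into the sandbox via an explicit allow-list rather than mirroring the whole host environment. A minimal sketch (the variable names, and the idea of an `env` parameter on sandbox creation, are illustrative assumptions, not a specific platform's API):

```python
import os

# Hypothetical allow-list: only these variables are forwarded into the sandbox.
ALLOWED_VARS = ['DATABASE_URL', 'INTERNAL_API_TOKEN']

def build_sandbox_env(allowed=ALLOWED_VARS, source=None):
    """Collect only the env vars a sandboxed execution is allowed to see."""
    source = os.environ if source is None else source
    return {name: source[name] for name in allowed if name in source}

# Usage (assumes the platform accepts an `env` mapping at creation time;
# check your provider's API for the actual mechanism):
# sandbox = client.sandboxes.create(
#     image='akiralabs/akira-default-sandbox',
#     env=build_sandbox_env(),
# )
```

The allow-list keeps credentials out of sandboxes that don't need them, which also simplifies the audit story later.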
Integration Patterns That Actually Work
There are three common patterns for adding sandboxes to existing infrastructure. Choose based on your current architecture.
Pattern 1: API Proxy Layer
Insert a lightweight proxy between your agent orchestrator and code execution. The proxy decides: does this execution need isolation? If yes, route to sandbox API. If no, execute locally or in existing runtime.
When to use:
- You have a centralized agent orchestration service
- You want gradual rollout (sandbox some workloads, not all)
- You need execution policy enforcement (e.g., "all customer-generated code goes to sandboxes")
Architecture:

```text
Agent Orchestrator
  ➡️ Execution Proxy
      ➡️ Sandbox API (for untrusted code)
      ➡️ Local Runtime (for trusted code)
```
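The proxy's routing decision can be sketched in a few lines. Both execution adapters here are hypothetical placeholders, one wrapping your sandbox client and one wrapping your existing runtime, and the policy itself is just an example:

```python
def needs_isolation(request: dict) -> bool:
    """Example policy: customer-generated code always gets a sandbox;
    everything else must come from a trusted internal source."""
    if request.get('customer_generated', False):
        return True
    return request.get('source') != 'trusted-internal'

def route_execution(request: dict, run_in_sandbox, run_locally):
    """Route one execution to the sandbox API or the local runtime."""
    if needs_isolation(request):
        return run_in_sandbox(request['code'])
    return run_locally(request['code'])
```

Because the policy lives in one place, tightening it later (say, sandboxing everything) is a one-line change rather than a hunt through every agent.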
Pattern 2: Direct SDK Integration
Embed the sandbox SDK directly in your agent code. When an agent generates code, it calls the sandbox API inline, with no middleware or proxy.
When to use:
- Agents are decentralized (multiple services, teams, or repos)
- You want fine-grained control per agent
- Latency is critical and you can't afford proxy hops
Architecture:

```text
Agent Code ➡️ Sandbox SDK ➡️ Sandbox API
```
Pattern 3: Event-Driven Async Execution
Agents publish "execute code" events to a queue. A worker consumes events, sends code to sandboxes, and publishes results back to another queue.
When to use:
- You already have event-driven architecture (Kafka, RabbitMQ, SQS)
- Executions can be async (results don't need to be immediate)
- You want centralized execution management and retry logic
Architecture:

```text
Agent ➡️ Execution Queue ➡️ Worker (calls Sandbox API) ➡️ Result Queue ➡️ Agent
```
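The worker's consume-execute-publish step might look like the sketch below. The two adapter callables are assumptions standing in for your sandbox client and your result-queue producer (Kafka, RabbitMQ, SQS), as is the event payload shape:

```python
import json

def handle_execution_event(event: str, execute_in_sandbox, publish_result):
    """Consume one 'execute code' event, run it in a sandbox, publish the result."""
    payload = json.loads(event)
    result = execute_in_sandbox(payload['code'], payload.get('language', 'python'))
    # Echo the request ID back so the agent can correlate results.
    publish_result(json.dumps({
        'request_id': payload['request_id'],
        'output': result['output'],
        'exit_code': result['exitCode'],
    }))
```

Centralizing execution in a worker like this also gives you one place to add retries, timeouts, and dead-letter handling.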
Step-by-Step: Adding Sandboxes to Your Stack
Here's how to integrate sandboxed execution without touching your core infrastructure.
Step 1: Get API Access
Sign up for a sandbox platform (e.g., Akira Labs), generate an API key, and test the API with a simple call.
```shell
curl -X POST https://api.akiralabs.ai/v1/sandboxes \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"image": "akiralabs/akira-default-sandbox"}'
```
If you get a sandbox ID back, you're ready.
Step 2: Install the SDK
Add the SDK to your application. Zero infrastructure changes—just a dependency.
```shell
npm install akiralabs   # Node/TypeScript
pip install akiralabs   # Python
```
Step 3: Create a Sandbox Wrapper
Write a simple wrapper function that abstracts sandbox creation and execution. This keeps your agent code clean and makes it easy to swap execution targets later.
```typescript
import Akira from 'akiralabs';

const client = new Akira({
  apiKey: process.env['AKIRA_API_KEY'],
});

export async function executeInSandbox(code: string, language: string) {
  // Create sandbox
  const sandbox = await client.sandboxes.create({
    image: 'akiralabs/akira-default-sandbox',
  });

  // Write code to a file
  await client.sandboxes.execute(sandbox.id, {
    command: `echo '${code}' > /tmp/script.${language}`,
  });

  // Execute it
  const result = await client.sandboxes.execute(sandbox.id, {
    command: `${language} /tmp/script.${language}`,
  });

  // Cleanup
  await client.sandboxes.delete(sandbox.id);

  return {
    output: result.stdout,
    error: result.stderr,
    exitCode: result.exitCode,
  };
}
```

Now any part of your application can call `executeInSandbox(code, 'python')` and get isolated execution.
Step 4: Replace Direct Execution Calls
Find where your agents execute generated code. Replace those calls with your sandbox wrapper.
Before:
```python
import subprocess

code = agent.generate_code()
result = subprocess.run(['python', '-c', code], capture_output=True)
print(result.stdout)
```

After:

```python
from sandbox_wrapper import execute_in_sandbox

code = agent.generate_code()
result = execute_in_sandbox(code, 'python')
print(result['output'])
```

That's it. Your agent now runs in a VM-isolated sandbox.
Step 5: Add Observability
Pipe sandbox execution metadata into your existing logs.
```typescript
const result = await executeInSandbox(code, 'python');

logger.info('Sandbox execution completed', {
  exitCode: result.exitCode,
  executionTime: result.executionTime,
  sandboxId: result.sandboxId,
  output: result.output.substring(0, 1000), // Truncate for logs
});

if (result.exitCode !== 0) {
  logger.error('Sandbox execution failed', {
    error: result.error,
    code: code,
  });
}
```

Now sandbox failures show up in your existing monitoring dashboards.
Step 6: Optimize with Snapshots (Optional)
If agents run the same base environment repeatedly, create a snapshot once and clone it for each execution. This cuts startup time and costs.
```typescript
// Create base environment once
const baseSandbox = await client.sandboxes.create({
  image: 'akiralabs/akira-default-sandbox',
});

// Install dependencies
await client.sandboxes.execute(baseSandbox.id, {
  command: 'pip install numpy pandas matplotlib',
});

// Snapshot it
const snapshot = await client.sandboxes.createSnapshot(baseSandbox.id);

// Later: clone for each execution
const sandbox = await client.sandboxes.createFromSnapshot({
  snapshotId: snapshot.id,
});

// Execute immediately—dependencies already installed
const result = await client.sandboxes.execute(sandbox.id, {
  command: 'python agent_script.py',
});
```

Real Integration Examples
Example 1: Adding Sandboxes to a Flask API
You have a Flask API that runs agent-generated code. Users submit prompts, your LLM generates code and you execute it.
Current setup:
```python
from flask import Flask, request, jsonify
import subprocess

app = Flask(__name__)

@app.route('/execute', methods=['POST'])
def execute_code():
    code = request.json['code']
    result = subprocess.run(['python', '-c', code], capture_output=True)
    return jsonify({
        'output': result.stdout.decode(),
        'error': result.stderr.decode(),
    })
```

With sandboxes:
```python
import os

from flask import Flask, request, jsonify
from akiralabs import Akira

app = Flask(__name__)
client = Akira(api_key=os.environ['AKIRA_API_KEY'])

@app.route('/execute', methods=['POST'])
def execute_code():
    code = request.json['code']

    # Create sandbox and execute
    sandbox = client.sandboxes.create(image='akiralabs/akira-default-sandbox')
    result = client.sandboxes.execute(sandbox.id, command=f'python -c "{code}"')
    client.sandboxes.delete(sandbox.id)

    return jsonify({
        'output': result.stdout,
        'error': result.stderr,
    })
```

Zero infrastructure changes. Same API contract, with VM-isolated execution.
Example 2: Sandboxing a Multi-Agent System
You have agents that call other agents dynamically. Each agent execution needs isolation, but you don't want to manage VMs.
Current setup:
```typescript
class AgentOrchestrator {
  async runAgent(agentCode: string, input: any) {
    // Currently runs in-process—risky
    const result = eval(agentCode)(input);
    return result;
  }
}
```

With sandboxes:
```typescript
import Akira from 'akiralabs';

class AgentOrchestrator {
  private client: Akira;

  constructor() {
    this.client = new Akira({ apiKey: process.env.AKIRA_API_KEY });
  }

  async runAgent(agentCode: string, input: any) {
    const sandbox = await this.client.sandboxes.create({
      image: 'akiralabs/akira-default-sandbox',
    });

    // Write agent code and input to sandbox
    await this.client.sandboxes.execute(sandbox.id, {
      command: `echo '${agentCode}' > /tmp/agent.js`,
    });
    await this.client.sandboxes.execute(sandbox.id, {
      command: `echo '${JSON.stringify(input)}' > /tmp/input.json`,
    });

    // Execute agent
    const result = await this.client.sandboxes.execute(sandbox.id, {
      command: 'node /tmp/agent.js /tmp/input.json',
    });

    await this.client.sandboxes.delete(sandbox.id);
    return JSON.parse(result.stdout);
  }
}
```

Each agent runs in its own microVM. One agent can't touch another's state or data.
Migration Strategies for Existing Workloads
If you're already running agents in production, here's how to migrate without downtime.
Strategy 1: Canary Rollout
Route a small percentage of executions to sandboxes. Monitor for errors, latency and cost. Gradually increase the percentage until 100% of executions run in sandboxes.
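One way to get stable percentage routing is hash-based bucketing, so a given agent or customer lands consistently in or out of the canary as you ramp up. A minimal sketch using only the standard library:

```python
import hashlib

def in_sandbox_canary(key: str, percentage: int) -> bool:
    """Deterministically bucket a routing key (agent ID, customer ID, ...)
    into the sandbox canary. The same key always maps to the same bucket,
    so routing stays stable as the percentage increases."""
    bucket = int(hashlib.sha256(key.encode()).hexdigest(), 16) % 100
    return bucket < percentage
```

At rollout time the caller just checks `in_sandbox_canary(agent_id, 10)` and routes to the sandbox wrapper or the existing runtime accordingly.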
Strategy 2: Feature Flag
Use feature flags to control sandbox usage per customer, agent type or workload.
Strategy 3: Shadow Mode
Run executions in both your current environment and sandboxes. Compare outputs. Don't rely on sandbox results yet—just validate they match.
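A minimal shadow-mode harness, assuming your current runtime and sandbox wrapper are both callables returning a dict with an `output` key (all three adapters here are hypothetical placeholders):

```python
def shadow_execute(code: str, run_current, run_sandbox, log_mismatch):
    """Run code in both environments, serve the current result, and log
    any divergence for later review."""
    current = run_current(code)
    try:
        shadow = run_sandbox(code)
        if shadow['output'] != current['output']:
            log_mismatch(code, current, shadow)
    except Exception as exc:
        # A shadow failure must never affect the production path.
        log_mismatch(code, current, {'error': str(exc)})
    return current
```

The key property is that the return value is always the production result; the sandbox run only produces telemetry until you trust it.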
Handling Common Objections
"This Adds Latency"
Sandbox cold starts are sub-1s. If your current execution takes 2+ seconds, the overhead is <30%. For most agent workflows, this is negligible.
If latency is critical, use snapshot cloning to pre-warm environments. Cloning a snapshot takes ~200ms—faster than installing dependencies.
"We Have Custom Dependencies"
Build a custom sandbox image with your dependencies pre-installed. Use that image instead of the default.
```dockerfile
FROM akiralabs/akira-default-sandbox

RUN apt-get update && apt-get install -y libpq-dev
RUN pip install torch transformers pandas
```
Then:
```typescript
const sandbox = await client.sandboxes.create({
  image: 'your-registry/custom-sandbox:latest',
});
```

"What About Cost?"
Sandboxes are pay-per-execution. If you're currently running VMs 24/7 for agent workloads, sandboxes will likely cost less—you only pay when code executes.
For high-throughput workloads, snapshot cloning reduces costs by up to 75% via intelligent deduplication.
"Our Compliance Team Won't Approve This"
Sandboxes improve compliance. VM-level isolation, immutable audit logs, per-tenant encryption, and an architecture designed for SOC 2 are easier to audit than custom container setups.
Provide your compliance team with sandbox architecture docs, isolation models, and audit log formats. Most approve faster than homegrown solutions.
Getting Started Checklist
Ready to integrate sandboxes? Follow this checklist:
- Sign up for a sandbox platform and get API access
- Install the SDK in your dev environment
- Write a simple wrapper function for sandbox execution
- Identify one low-risk workload to migrate first (e.g., internal tool, non-critical agent)
- Replace direct execution with sandbox API call
- Add logging and observability for sandbox executions
- Test in staging: validate output matches current behavior
- Deploy to production with feature flag at 10%
- Monitor for errors, latency, and cost for 1 week
- Gradually increase rollout: 25% → 50% → 100%
- Build custom sandbox images for recurring dependencies
- Implement snapshot cloning for high-frequency workloads
The Path Forward
You don't need a six-month infrastructure project to add VM-isolated sandboxes. Modern sandbox platforms integrate as API calls—not infrastructure replacements.
The pattern is simple: identify where agents execute code, replace direct execution with a sandbox API call, and route untrusted code to isolated environments. Your orchestration logic, data storage, and deployment pipelines stay untouched.
Start with one workload, validate it works and expand from there.
The infrastructure for AI-native development doesn't require rearchitecting your stack. It requires a single API integration.