Multi-Agent Orchestration Platform
Four agent types, DAG execution, multi-tenant isolation. Enterprise-scale AI orchestration with zero cross-tenant data leakage.
Multi-tenant AI agent orchestration handling concurrent workloads across enterprise fleet operations
The Problem
Large-scale operations generate thousands of documents, schedules, and communications daily. Every query, status update, and piece of correspondence requires manual coordination across time zones. Operational overhead scaled linearly with fleet size — more assets meant more people doing repetitive cognitive work.
The enterprise needed a way to deploy AI agents across their entire operation without custom development per asset or per agent type.
What I Built
Multi-tenant AI agent orchestration platform with four distinct agent types — SQL, Voice, Email, and Workflow — all coordinating within a single architecture. Each tenant gets isolated agent instances with shared infrastructure, so onboarding a new organization doesn't require rebuilding anything.
The key insight: agents need to share context without sharing data. A voice agent answering a call about scheduling needs the same operational context as the email agent processing a notification — but the data stays within tenant boundaries.
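One way to sketch this "shared context, isolated data" property (the class and field names below are hypothetical, not the platform's actual API) is a context store where every accessor is scoped by tenant ID, so isolation holds by construction:

```python
from dataclasses import dataclass, field

@dataclass
class TenantContext:
    tenant_id: str
    data: dict = field(default_factory=dict)

class ContextStore:
    def __init__(self):
        self._contexts: dict[str, TenantContext] = {}

    def for_tenant(self, tenant_id: str) -> TenantContext:
        # Every lookup is keyed by tenant_id; there is no cross-tenant
        # accessor, so one tenant's agents can never reach another's data.
        return self._contexts.setdefault(tenant_id, TenantContext(tenant_id))

store = ContextStore()
store.for_tenant("acme").data["next_service"] = "2025-03-01"
# Voice and email agents for "acme" read the same operational context...
assert store.for_tenant("acme").data["next_service"] == "2025-03-01"
# ...while another tenant's context stays empty.
assert store.for_tenant("globex").data == {}
```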
Architecture
Why FastAPI over Django: Async-first architecture was non-negotiable. With four agent types executing concurrently per tenant, Django's synchronous ORM would have bottlenecked every database call. FastAPI with asyncpg delivered 3x throughput on concurrent agent operations compared to the Django prototype we benchmarked.
Why asyncpg over SQLAlchemy: Raw asyncpg connection pools gave us direct control over connection lifecycle — critical when each agent type maintains its own database session pattern. SQLAlchemy's async adapter added 40-60ms overhead per query in our benchmarks, unacceptable when voice agents need sub-200ms response times.
- Agent Manager (1,046 lines) — Execution engine with version-based caching, dual-path execution, per-agent build locks preventing duplicate concurrent rebuilds
- Cache Manager (857 lines) — Dual eviction: TTL (24h) + LRU (500 max). SHA-256 version hash triggers automatic invalidation on any config change
- DAG Workflow Executor (661 lines) — Prefect 3.x with topological sort (Kahn's algorithm), phase-based parallel execution handling up to 50 concurrent workflow nodes
- Voice Agents — OpenAI Realtime API with WebSocket transport, dynamic agent delegation at runtime, 157 lines of behavioral contracts for reliable hands-free operation
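The Cache Manager's dual-eviction policy can be sketched as follows; this is a minimal illustration of TTL-plus-LRU eviction, not the actual 857-line implementation, and the class name is hypothetical:

```python
import time
from collections import OrderedDict

class DualEvictionCache:
    """Entries expire after a TTL; beyond max_entries, the LRU entry is dropped."""

    def __init__(self, ttl_seconds: float = 24 * 3600, max_entries: int = 500):
        self.ttl = ttl_seconds
        self.max_entries = max_entries
        self._store: OrderedDict[str, tuple[float, object]] = OrderedDict()

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:  # TTL eviction on read
            del self._store[key]
            return None
        self._store.move_to_end(key)                 # mark as recently used
        return value

    def put(self, key: str, value) -> None:
        self._store[key] = (time.monotonic(), value)
        self._store.move_to_end(key)
        while len(self._store) > self.max_entries:   # LRU eviction on write
            self._store.popitem(last=False)

cache = DualEvictionCache(ttl_seconds=0.05, max_entries=2)
cache.put("a", 1); cache.put("b", 2); cache.put("c", 3)  # "a" evicted (LRU)
assert cache.get("a") is None and cache.get("c") == 3
time.sleep(0.06)
assert cache.get("c") is None  # expired (TTL)
```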
Key Decisions
SHA-256 version-hash cache invalidation over time-based refresh — version hashing gives exact change detection with zero false positives. Cache hit rate: 94% in production.
ContextVar per-request isolation over global singletons — in async FastAPI with multiple concurrent requests sharing the event loop, a global singleton would let one tenant's API token overwrite another's. ContextVar guarantees per-request isolation at the Python runtime level.
Prefect 3.x for workflow DAGs over Celery — needed dynamic DAG composition and visual workflow monitoring. Celery's static task chains couldn't express the conditional branching our workflows required.
OpenAI Realtime API over Whisper+TTS pipeline — the pipeline's ~800ms round-trip latency was unacceptable for live voice interactions. The Realtime API reduced this to under 200ms.
Impact
- Four agent types operating in production: SQL queries, voice interactions, email triage, and multi-step workflows
- Query turnaround from hours (manual requests) to seconds (natural language)
- Tenant onboarding reduced from weeks of custom work to configuration-only
- 7 Microsoft Graph email tools (1,800+ lines) with per-context isolation
- DAG executor handling workflows with up to 50 parallel nodes
Trade-offs
The four-agent-type architecture was correct for v1, but the orchestration layer should have been more generic from the start — adding a fifth agent type required more refactoring than necessary. Voice agent prompt engineering (157 lines of behavioral contracts) was critical — without explicit retry logic and confirmation handling, the agent made too many assumptions in hands-free mode.