Sprint Orchestration with AI Agents: 4 Agents, 1 Codebase, 0 Merge Conflicts

4 AI agents. 1 codebase. 0 merge conflicts. Here's how Cortex coordinates parallel development without everything breaking.

The Problem

You want to parallelize work across AI agents. A sprint has 7 tasks. A single agent does them sequentially — maybe 45 minutes per task, 5+ hours total. With 3 agents running in parallel, you should finish in under 2 hours.

But they all need to modify the same codebase. Agent A is editing main.py to add a new route. Agent B is editing main.py to register a new router. They both commit. Now you have a merge conflict that neither agent understands.

Git worktrees solve file isolation — each agent gets its own working directory. But worktrees without orchestration are just organized chaos. You need three things: a way to plan work with explicit file boundaries, a way to execute work in isolation, and a way to verify results independently.

The Solution: cortex-plan + cortex-execute

Cortex uses a two-phase protocol for parallel sprints.

Phase 1: Planning. The lead agent (running in the main worktree) decomposes a goal into tasks with a dependency graph and a file ownership map. This is done through cortex_plan_sprint, which creates everything in one atomic API call:

@mcp.tool()
async def cortex_plan_sprint(project_id: int, sprint_name: str, sprint_goal: str,
                              tasks: list[dict]) -> str:
    """Bulk-create a sprint with tasks and dependencies in one call.

    Each task should have: title, description, priority, estimated_hours,
    depends_on_indices (list of 0-based indices into this same task list).
    """
    result = await _api("POST", "/sprints/plan", {
        "project_id": project_id,
        "sprint_name": sprint_name,
        "sprint_goal": sprint_goal,
        "tasks": tasks,
    })
    return json.dumps(result)
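Here's what a call looks like from the lead's side. The payload below is illustrative: the task titles come from the sprint example later in this post, but the project ID and hour estimates are made up:

tasks = [
    {"title": "Create relations table in database.py", "description": "...",
     "priority": "high", "estimated_hours": 1.0, "depends_on_indices": []},
    {"title": "Create relations.py API", "description": "...",
     "priority": "high", "estimated_hours": 2.0, "depends_on_indices": [0]},
]
await cortex_plan_sprint(project_id=3, sprint_name="Knowledge graph + tool registry",
                         sprint_goal="Add knowledge graph + tool registry", tasks=tasks)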

The server-side implementation does five things atomically:

  1. Creates the sprint with an auto-incremented sprint number
  2. Creates all tasks linked to both the project and sprint
  3. Resolves depends_on_indices (array positions) to real task IDs
  4. Runs cycle detection on every dependency via BFS
  5. Auto-blocks tasks with unsatisfied dependencies

Steps 3 through 5 look like this on the server:
# Wire up dependencies using indices
for i, t in enumerate(task_list):
    dep_indices = t.get("depends_on_indices", [])
    for idx in dep_indices:
        if 0 <= idx < len(task_ids) and idx != i:
            dep_task_id = task_ids[idx]
            if not await would_create_cycle(conn, task_ids[i], dep_task_id):
                await conn.execute(
                    "INSERT INTO task_dependencies (task_id, depends_on_task_id) "
                    "VALUES ($1, $2) ON CONFLICT DO NOTHING",
                    task_ids[i], dep_task_id,
                )

# Auto-block tasks with unfinished dependencies
for tid in task_ids:
    remaining = await conn.fetchval("""
        SELECT COUNT(*) FROM task_dependencies td
        JOIN tasks t ON t.id = td.depends_on_task_id
        WHERE td.task_id = $1 AND t.status != 'done'
    """, tid)
    if remaining > 0:
        await conn.execute(
            "UPDATE tasks SET is_blocked = TRUE, "
            "blocked_reason = 'Waiting on dependencies' WHERE id = $1", tid,
        )
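The would_create_cycle helper isn't shown above. A minimal version (a sketch, not necessarily the exact Cortex implementation) runs a BFS from the proposed dependency through the existing edges; if it can reach back to the task being wired up, the new edge would close a loop:

async def would_create_cycle(conn, task_id: int, depends_on_task_id: int) -> bool:
    """True if adding the edge task_id -> depends_on_task_id would create a cycle."""
    frontier = [depends_on_task_id]
    seen = set(frontier)
    while frontier:
        rows = await conn.fetch(
            "SELECT depends_on_task_id FROM task_dependencies "
            "WHERE task_id = ANY($1::int[])", frontier,
        )
        next_frontier = []
        for row in rows:
            dep = row["depends_on_task_id"]
            if dep == task_id:
                return True          # we looped back to the task being wired up
            if dep not in seen:
                seen.add(dep)
                next_frontier.append(dep)
        frontier = next_frontier
    return False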

Phase 2: Execution. The lead spawns builder agents, each in their own worktree. Each builder runs a tight loop: claim a ready task, do the work, report results, repeat.
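Here's roughly what that loop looks like from a builder's side. This is a sketch, not the actual agent code: it assumes the two endpoints shown later in this post (/tasks/ready and /tasks/{id}/claim), treats the ready-list response as a plain JSON array, and leaves the base URL, do_the_work, and the reporting step as placeholders:

import asyncio
import httpx

BASE_URL = "http://localhost:8000"   # wherever the Cortex API is running (assumed)

async def do_the_work(task: dict) -> None:
    """Placeholder for the edit/test cycle inside this builder's worktree."""
    ...

async def builder_loop(agent_name: str, sprint_id: int) -> None:
    async with httpx.AsyncClient(base_url=BASE_URL) as client:
        while True:
            ready = (await client.get("/tasks/ready",
                                      params={"sprint_id": sprint_id})).json()
            if not ready:
                break                 # nothing claimable right now
            task = ready[0]           # highest priority first
            claim = await client.post(f"/tasks/{task['id']}/claim",
                                      json={"agent_name": agent_name})
            if claim.status_code == 409:
                continue              # another builder won the race; re-fetch and retry
            await do_the_work(task)
            # ...report completion with evidence; the lead verifies and marks it done

asyncio.run(builder_loop("builder-1", sprint_id=1))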

File Ownership Maps: The Key Insight

The difference between "3 agents working in parallel" and "3 agents creating 3 merge conflicts" is file ownership.

During planning, the lead agent examines each task and determines which files it will touch. If two tasks touch the same file, they MUST be serialized — one depends on the other. This constraint is encoded as a dependency edge.

Here's a concrete example. Sprint goal: "Add knowledge graph + tool registry."

  • Task 0: "Create relations table in database.py" — touches database.py
  • Task 1: "Create relations.py API" — touches app/api/relations.py (new file), app/api/__init__.py
  • Task 2: "Create tools table in database.py" — touches database.py
  • Task 3: "Create tools.py API" — touches app/api/tools.py (new file), app/api/__init__.py
  • Task 4: "Add graph to dashboard template" — touches app/templates/home.html
  • Task 5: "Add MCP tools for graph" — touches cortex_mcp.py
  • Task 6: "Boot enrichment in main.py" — touches app/main.py

Tasks 0 and 2 both touch database.py — they must be serialized. Tasks 1 and 3 both touch __init__.py — serialized. Tasks 0 and 1 are linked (1 needs the table from 0). But tasks 4, 5, and 6 touch completely different files. They can run in parallel once their data dependencies are met.
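That reasoning is mechanical enough to sketch in a few lines. The ownership map below is copied from the task list above; the overlap check itself is just an illustration; Cortex doesn't derive it automatically yet (see "What I'd Build Next"):

from itertools import combinations

# File ownership map for the example sprint (task index -> files it touches)
files = {
    0: {"database.py"},
    1: {"app/api/relations.py", "app/api/__init__.py"},
    2: {"database.py"},
    3: {"app/api/tools.py", "app/api/__init__.py"},
    4: {"app/templates/home.html"},
    5: {"cortex_mcp.py"},
    6: {"app/main.py"},
}

# Any pair that shares a file must get a dependency edge
must_serialize = [(a, b) for a, b in combinations(files, 2) if files[a] & files[b]]
print(must_serialize)   # [(0, 2), (1, 3)]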

The dependency graph looks like:

Layer 0: [Task 0]                      — no deps, start immediately
Layer 1: [Task 1, Task 2]              — Task 1 depends on 0, Task 2 depends on 0
Layer 2: [Task 3, Task 4, Task 6]      — Task 3 depends on 1 AND 2; Tasks 4 and 6 depend on 1
Layer 3: [Task 5]                      — Task 5 depends on 3

Maximum parallelism comes at Layer 2, where three tasks are ready at once; Layer 1 offers two. The file ownership map ensures zero conflicts.
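Cortex never materializes these layers explicitly (the task state machine below does the equivalent work), but they fall straight out of depends_on_indices. A small sketch, using the example's dependencies:

from functools import cache

# Dependency edges from the example sprint (task index -> indices it depends on)
deps = {0: [], 1: [0], 2: [0], 3: [1, 2], 4: [1], 5: [3], 6: [1]}

@cache
def layer(i: int) -> int:
    """Layer = 1 + deepest dependency; tasks with no dependencies sit at layer 0."""
    return 1 + max((layer(d) for d in deps[i]), default=-1)

print({i: layer(i) for i in deps})   # {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 2}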

Dependency Layers: Maximum Parallelism

The system computes parallelism naturally through the task state machine:

  • todo + not blocked = ready to claim
  • todo + blocked = waiting on dependencies
  • in_progress = claimed by an agent
  • done = completed, triggers cascade unblock

When an agent completes a task, cascade_unblock walks the dependency graph:

async def cascade_unblock(conn, completed_task_id: int) -> list[int]:
    """When a task completes, unblock dependents whose blockers are all done."""
    dependents = await conn.fetch("""
        SELECT DISTINCT td.task_id FROM task_dependencies td
        WHERE td.depends_on_task_id = $1
    """, completed_task_id)
    unblocked = []
    for row in dependents:
        tid = row['task_id']
        remaining = await conn.fetchval("""
            SELECT COUNT(*) FROM task_dependencies td
            JOIN tasks t ON t.id = td.depends_on_task_id
            WHERE td.task_id = $1 AND t.status != 'done'
        """, tid)
        if remaining == 0:
            await conn.execute("""
                UPDATE tasks SET is_blocked = FALSE, blocked_reason = NULL
                WHERE id = $1
            """, tid)
            unblocked.append(tid)
    return unblocked

This creates a natural wavefront: Layer 0 tasks start immediately. When they complete, Layer 1 tasks unblock. Agents don't need to know about layers — they just call cortex_get_ready_tasks and pick up whatever's available:

@api.get("/tasks/ready")
async def api_get_ready_tasks(request: Request, project_id: int = None,
                               sprint_id: int = None, ...):
    conditions = [
        "t.status NOT IN ('done')",
        "t.is_blocked = FALSE",
        "(t.assignee IS NULL OR t.assignee = 'user' OR t.assignee = '')"
    ]
    # ... filter by project/sprint ...
    rows = await conn.fetch(
        f"SELECT * FROM tasks t{where} ORDER BY CASE t.priority "
        f"WHEN 'high' THEN 0 WHEN 'medium' THEN 1 ELSE 2 END, "
        f"t.sort_order, t.created_at", *args,
    )

Ready tasks are sorted by priority, then sort order, then creation date. High-priority tasks get claimed first. Within the same priority, creation order is a reasonable proxy for DAG order, because cortex_plan_sprint creates tasks in the order the lead listed them, and the lead lists them roughly earliest-layer first.

Worktree Isolation: One Branch Per Task

Each builder agent works in an isolated git worktree:

@mcp.tool()
async def cortex_create_worktree(task_id: int, branch_name: str = "") -> str:
    if not branch_name:
        branch_name = f"task-{task_id}"
    tree_path = os.path.join(REPO_ROOT, ".trees", f"task-{task_id}")

    try:
        subprocess.run(
            ["git", "worktree", "add", tree_path, "-b", branch_name, "main"],
            cwd=REPO_ROOT, capture_output=True, text=True, check=True, timeout=15,
        )
        await _api("PATCH", f"/tasks/{task_id}", {"git_branch": branch_name})
        return json.dumps({"ok": True, "path": tree_path, "branch": branch_name})
    except subprocess.CalledProcessError as e:
        return json.dumps({"ok": False, "error": e.stderr.strip()})

The convention is simple: task 42 lives at .trees/task-42 on branch task-42. The worktree is created from main, so it starts with the latest stable code. When the builder finishes, the lead merges:

@mcp.tool()
async def cortex_merge_worktree(task_id: int, delete_after: bool = True) -> str:
    branch_name = f"task-{task_id}"
    tree_path = os.path.join(REPO_ROOT, ".trees", f"task-{task_id}")

    try:
        merge_result = subprocess.run(
            ["git", "merge", branch_name, "--no-edit"],
            cwd=REPO_ROOT, capture_output=True, text=True, timeout=30,
        )
        if merge_result.returncode != 0:
            return json.dumps({"ok": False, "error": f"Merge conflict: {merge_result.stderr.strip()}"})

        if delete_after and os.path.exists(tree_path):
            subprocess.run(["git", "worktree", "remove", tree_path], ...)
            subprocess.run(["git", "branch", "-d", branch_name], ...)

        return json.dumps({"ok": True, "merged": branch_name, "cleaned": delete_after})
    except subprocess.CalledProcessError as e:
        return json.dumps({"ok": False, "error": e.stderr.strip()})

If file ownership maps are correct, merges never conflict. The branch for task 42 only modifies files assigned to task 42. The branch for task 43 only modifies its assigned files. No overlap. Clean merge.
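A cheap safety net, not part of the workflow described above but easy to bolt on, is to diff each branch against its merge base and refuse to merge two branches that touched the same file:

import subprocess

def files_touched(branch: str, base: str = "main") -> set[str]:
    """Files a branch modified relative to its merge base with main."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...{branch}"],
        capture_output=True, text=True, check=True,
    )
    return set(out.stdout.splitlines())

overlap = files_touched("task-42") & files_touched("task-43")
if overlap:
    print("These branches share files; merge one, rebase the other:", sorted(overlap))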

The Verification Loop: Trust Nothing

Builders don't mark tasks done. They report completion with evidence. The lead agent independently verifies every deliverable.

The system captures verification metadata automatically when a task is marked done:

def _capture_git_metadata():
    """Attempt to capture git metadata from latest commit."""
    meta = {}
    try:
        result = subprocess.run(
            ["git", "log", "-1", "--format=%H|%D"],
            capture_output=True, text=True, timeout=5
        )
        if result.returncode == 0 and result.stdout.strip():
            parts = result.stdout.strip().split("|", 1)
            meta["git_commit"] = parts[0][:12]
            # ... extract branch, files changed, lines added/removed
    except Exception:
        pass
    return meta

When a task transitions to done, the API auto-captures: git commit hash, branch name, files changed count, lines added, lines removed, and duration in minutes. This gives the lead hard data: "Task 42 claims it added the relations API. The git diff shows 3 files changed, 165 lines added. Let me verify the API endpoint actually works."

The lead then runs the test pyramid:
1. Syntax check: python -c "import ast; ast.parse(open('file').read())"
2. Function count: verify expected functions exist
3. Smoke test: curl -sf URL | python -m json.tool
4. Pytest: run the test suite
5. Manual browser check (only if critical)
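Levels 1 and 2 are easy to script. A sketch of how the lead might check a builder's module (the file path and expected function names here are illustrative):

import ast

def missing_functions(path: str, expected: set[str]) -> list[str]:
    """Parse a module and report which expected functions are not defined in it."""
    with open(path) as f:
        tree = ast.parse(f.read())
    found = {node.name for node in ast.walk(tree)
             if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))}
    return sorted(expected - found)

print(missing_functions("app/api/relations.py", {"create_relation", "list_relations"}))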

If verification fails, the task goes back to in_progress and the builder gets re-dispatched with the failure details.

The Atomic Claim: Preventing Double Work

The most important coordination primitive is the atomic task claim:

@api.post("/tasks/{id}/claim")
async def api_claim_task(request: Request, id: int, ...):
    body = await request.json()
    agent_name = body.get("agent_name", "claude")
    async with db_connection() as conn:
        task = await conn.fetchrow("SELECT * FROM tasks WHERE id = $1", id)
        if not task:
            raise HTTPException(404, "Task not found")
        if task["is_blocked"]:
            raise HTTPException(409, f"Task is blocked: {task.get('blocked_reason')}")
        if task["status"] == "done":
            raise HTTPException(409, "Task is already completed")
        if task["assignee"] not in (None, "user", "") and task["assignee"] != agent_name:
            raise HTTPException(409, f"Task already claimed by {task['assignee']}")
        result = await conn.execute(
            "UPDATE tasks SET assignee = $1, status = 'in_progress', started_at = NOW() "
            "WHERE id = $2 AND status != 'done' AND is_blocked = FALSE "
            "AND (assignee IS NULL OR assignee IN ('', 'user') OR assignee = $1)",
            agent_name, id,
        )
        if result != "UPDATE 1":
            # Lost the race between the SELECT above and this UPDATE
            raise HTTPException(409, "Task was just claimed by another agent")
    return {"ok": True, "task_id": id, "claimed_by": agent_name}

Four guard clauses give fast, descriptive errors, and the final UPDATE re-checks the same conditions in its WHERE clause, so the claim itself is one atomic statement. If agent A and agent B both try to claim task 5 simultaneously, only one UPDATE matches the row; the other gets a 409 and moves on to the next ready task.

The started_at = NOW() timestamp is set on claim, and completed_at is set on done. The difference gives actual duration per task per agent — data that informs future sprint planning.
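That duration data is queryable later. A sketch of the kind of retrospective query it enables (the sprint_id column name is an assumption):

async def task_durations(conn, sprint_id: int):
    """Minutes per completed task, grouped by agent."""
    return await conn.fetch("""
        SELECT assignee, id,
               EXTRACT(EPOCH FROM (completed_at - started_at)) / 60 AS minutes
        FROM tasks
        WHERE sprint_id = $1 AND status = 'done'
        ORDER BY assignee, minutes DESC
    """, sprint_id)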

Why 3 Agents Is the Sweet Spot

I tested with 2, 3, 4, and 5 parallel agents on sprints with 6-10 tasks.

2 agents: ~1.6x throughput vs. sequential. Leaves parallelism on the table. Layer 1 often has 3+ ready tasks but only 2 agents to grab them.

3 agents: ~2.5x throughput. Most sprints have 2-3 tasks at each dependency layer. Three agents saturate the available parallelism without fighting over work.

4 agents: ~2.7x throughput. The marginal gain over 3 is small. With 4 agents, you start seeing contention: agents claiming tasks, finding no work in their layer, idling until cascade_unblock fires. The coordination overhead (worktree creation, merge verification, context switching) eats about 30% of the fourth agent's time.

5 agents: ~2.4x throughput. Worse than 4. The fifth agent spends more time waiting than working. Merge conflicts start appearing because file ownership maps aren't perfect — edge cases where two "independent" tasks both need to modify a shared config file.

The pattern roughly follows Amdahl's Law: if about 30% of a sprint is inherently sequential (dependency chains, shared files, verification), speedup is capped at 1/0.3 ≈ 3.3x no matter how many agents you add, and the marginal gain shrinks with every agent, so going much past 3 stops paying for its coordination overhead.
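For reference, the formula in code. The 30% serial fraction is a rough characterization and this is an idealized curve; the measured throughputs above fold in contention and verification overlap that the model ignores, so the point is the shape, not the exact values:

def amdahl_speedup(n_agents: int, serial_fraction: float = 0.30) -> float:
    """Ideal speedup when serial_fraction of the work cannot be parallelized."""
    return 1 / (serial_fraction + (1 - serial_fraction) / n_agents)

for n in (2, 3, 4, 5):
    print(n, round(amdahl_speedup(n), 2))   # gains flatten fast as n grows
print("cap:", round(1 / 0.30, 2))           # 3.33x, no matter how many agents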

The Lead Pattern

The orchestration model is one coordinator + N builders:

  • Lead (no worktree): Plans the sprint. Monitors progress. Verifies deliverables. Merges branches. Never writes code. Has full context of the entire sprint.
  • Builders (one worktree each): Claim tasks. Write code. Run local tests. Report results. Have context only of their current task.

The lead doesn't write code because it needs to maintain global context — which tasks are done, which are blocked, which files are modified, whether the overall sprint goal is on track. A builder that's deep in implementing a feature doesn't have the bandwidth to track cross-task coordination.

The builders don't verify each other because they lack the lead's global view. Builder A can't know if Builder B's changes break the integration, because Builder A only sees its own worktree.

What I'd Build Next

Automatic file ownership inference. Right now, the lead manually identifies which files each task touches. Static analysis could automate this — parse the task description, identify likely files from the codebase, and auto-generate the ownership map.

Agent skill routing. Not all agents are equal. Some are better at frontend (template changes), some at backend (API endpoints), some at infrastructure (database migrations). Matching task type to agent capability would improve quality and reduce rework.

Predictive sprint sizing. With duration data from past sprints (git metadata capture gives us actual hours per task), the system could predict whether a proposed sprint will take 2 hours or 8 hours, and suggest splitting or merging tasks accordingly.

The Takeaway

Parallel AI agent development works when you have three things:

  1. Explicit file ownership during planning — if two tasks touch the same file, serialize them
  2. Atomic coordination primitives — claims, cascade unblock, heartbeats — all backed by a shared database
  3. Independent verification — the lead checks every deliverable, trusts nothing from builders

The dependency DAG gives you natural parallelism layers. Worktrees give you file isolation. The atomic claim prevents double work. And cascade_unblock automatically advances the frontier as work completes.

3 agents, properly orchestrated, gives 2.5x throughput over sequential. That's not a 3x speedup because coordination has real costs. But it's enough to turn a 5-hour sprint into a 2-hour sprint. And for a system that runs 30+ sprints, that compounds.