Knowledge Graphs for AI Career Systems: 190 Nodes in PostgreSQL
190 nodes. 361 edges. 12 entity types. A PostgreSQL-powered knowledge graph that maps my entire career trajectory — and it runs on a single relations table with three indexes.
This is how I built a knowledge graph inside Cortex, my AI career engine, without Neo4j, without a graph database, and without any infrastructure I wouldn't want to maintain at 3am.
Why a Graph
Tasks and sprints are trees. But careers aren't trees — they're graphs.
A research document about MCP architecture feeds into a sprint planning session. That sprint produces 7 tasks. Those tasks build a feature that becomes a portfolio project. The portfolio project generates a blog post. The blog post lands on a hiring manager's desk. The hiring manager becomes a contact. The contact leads to a revenue entry.
In a relational model, you'd need a JOIN table for every pair of entity types. Research-to-task, task-to-sprint, sprint-to-project, project-to-contact. That's O(n^2) tables for n entity types. With 12 entity types, that's 66 potential JOIN tables.
A graph model needs one table. Every relationship is (from_type, from_id, relation, to_type, to_id). Whether it's "task belongs_to sprint" or "research informs task" or "contact works_at company" — same table, same schema, same queries.
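To make that concrete, here's a minimal in-memory sketch (my own illustration, not Cortex code) of why one edge shape covers every pairing of entity types:

```python
# In-memory sketch of the single-table edge model (names and IDs are invented).
# Every relationship, whatever the entity types, is the same 5-tuple.
edges: set[tuple] = set()

def add_edge(from_type: str, from_id: int, relation: str,
             to_type: str, to_id: int) -> None:
    edges.add((from_type, from_id, relation, to_type, to_id))

# Heterogeneous relationships, one shape, one "schema":
add_edge("task", 41, "belongs_to", "sprint", 57)
add_edge("research", 3, "informs", "task", 41)
add_edge("contact", 9, "works_at", "company", 2)

def edges_for(etype: str, eid: int) -> list[tuple]:
    """All edges touching an entity, regardless of direction."""
    return [e for e in edges
            if (e[0], e[1]) == (etype, eid) or (e[3], e[4]) == (etype, eid)]
```

The same `edges_for` query shape serves every entity type, which is exactly the property the relational JOIN-table design loses.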
PostgreSQL vs Neo4j: The Boring Choice That Works
I chose PostgreSQL over Neo4j for three reasons:
1. Operational simplicity. Cortex already runs on PostgreSQL for 27+ other tables. Adding Neo4j means a second database, a second connection pool, a second failure mode, a second thing to back up. On Railway (my deployment platform), that's a second service with its own scaling, its own costs, its own health checks. I didn't need graph-native query performance — I needed to ship.
2. Sufficient query patterns. My graph queries are simple: "show me everything connected to project X" and "traverse 2 hops from entity Y." These are basic recursive CTEs in PostgreSQL, not the Dijkstra-shortest-path queries that justify a graph database. If I ever need PageRank on my career data, I'll reconsider. I don't.
3. Transactional consistency. When I create a task with a project_id, the graph edge and the task row need to be in the same transaction. With PostgreSQL, that's one async with db_connection() block. With a separate graph database, I'd need distributed transactions or eventual consistency. For a career system where data integrity matters, I chose consistency.
The relations table is 7 columns:
CREATE TABLE IF NOT EXISTS relations (
    id SERIAL PRIMARY KEY,
    from_type TEXT NOT NULL,
    from_id INTEGER NOT NULL,
    relation TEXT NOT NULL,
    to_type TEXT NOT NULL,
    to_id INTEGER NOT NULL,
    metadata JSONB DEFAULT '{}',
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE UNIQUE INDEX idx_relations_unique
    ON relations (from_type, from_id, relation, to_type, to_id);
CREATE INDEX idx_relations_from ON relations (from_type, from_id);
CREATE INDEX idx_relations_to ON relations (to_type, to_id);
The unique composite index prevents duplicate edges. The directional indexes make both "get all outgoing" and "get all incoming" queries fast. That's it. No schema migrations when I add a new entity type. No new tables. Just insert a row with a new from_type or to_type.
The 12 Entity Types
Cortex tracks 12 entity types, each defined in a single registry:
ENTITY_TYPES = {
    "task": {"table": "tasks", "name_col": "title", "fk": "project_id"},
    "sprint": {"table": "sprints", "name_col": "name", "fk": "project_id"},
    "session": {"table": "sessions", "name_col": "title", "fk": "project_id"},
    "app": {"table": "apps", "name_col": "display_name", "fk": "project_id"},
    "research": {"table": "research_docs", "name_col": "title", "fk": None},
    "revenue": {"table": "revenue_entries", "name_col": "description", "fk": "project_id"},
    "content": {"table": "content_calendar", "name_col": "title", "fk": "project_id"},
    "user": {"table": "users", "name_col": "display_name", "fk": None},
    "contact": {"table": "contacts", "name_col": "name", "fk": None},
    "presentation": {"table": "presentations", "name_col": "title", "fk": "project_id"},
    "skill": {"table": "skills", "name_col": "name", "fk": None},
    "tool": {"table": "tools", "name_col": "name", "fk": None},
}
This registry drives everything: graph visualization (each type gets an icon and color), entity discovery (types with fk are found via foreign keys, types without are found via the relations table), and label resolution (the name_col tells the graph renderer what to display on each node).
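As a sketch of how a registry like this can drive label resolution (the exact Cortex code may differ), the table and name_col fields are enough to build each type's batch label query:

```python
# Hedged sketch: deriving per-type label queries from the registry.
# Registry values come from trusted code, so the interpolation is safe here.
ENTITY_TYPES = {
    "task": {"table": "tasks", "name_col": "title", "fk": "project_id"},
    "sprint": {"table": "sprints", "name_col": "name", "fk": "project_id"},
}

def label_query(etype: str) -> str:
    """Build the batch query the renderer would issue for one entity type."""
    meta = ENTITY_TYPES[etype]
    return (f"SELECT id, {meta['name_col']} AS label "
            f"FROM {meta['table']} WHERE id = ANY($1)")

print(label_query("sprint"))
# SELECT id, name AS label FROM sprints WHERE id = ANY($1)
```

One query per type, each parameterized with the list of node IDs of that type, is how a 190-node graph resolves labels in roughly a dozen round-trips instead of 190.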
The relationships between types tell a career story:
- project → has_task → task — work decomposition
- project → has_sprint → sprint — iteration planning
- sprint → has_task → task — sprint backlog
- project → has_session → session — AI agent work sessions
- project → has_research → research — knowledge that informs decisions
- project → has_revenue → revenue — money generated
- project → has_app → app — deployed products
- task → depends_on → task — dependency DAGs
- contact → works_at → project — people at companies
Auto-Enrichment: Zero Manual Overhead
The insight that made the graph useful: don't make humans maintain it. Every time a task is created with a project_id, the graph edge is created automatically. Same for sprints, sessions, and apps.
async def auto_enrich_relation(conn, from_type: str, from_id: int, relation: str,
                               to_type: str, to_id: int):
    """Auto-create a relation if it doesn't exist. INSERT ON CONFLICT DO NOTHING."""
    await conn.execute(
        "INSERT INTO relations (from_type, from_id, relation, to_type, to_id) "
        "VALUES ($1, $2, $3, $4, $5) "
        "ON CONFLICT (from_type, from_id, relation, to_type, to_id) DO NOTHING",
        from_type, from_id, relation, to_type, to_id,
    )
This function is called inline during entity creation. Here's what happens when you create a task via the API:
@api.post("/tasks")
async def api_create_task(request: Request, body: TaskCreate, ...):
    async with db_connection() as conn:
        row = await conn.fetchrow("""
            INSERT INTO tasks (title, description, category, priority, ..., project_id, sprint_id, ...)
            VALUES ($1, $2, $3, $4, ..., $6, $7, ...) RETURNING id
        """, ...)
        task_id = row["id"]

        # Auto-enrich knowledge graph
        if body.project_id:
            await auto_enrich_relation(conn, "project", body.project_id, "has_task", "task", task_id)
        if body.sprint_id:
            await auto_enrich_relation(conn, "sprint", body.sprint_id, "has_task", "task", task_id)
The ON CONFLICT DO NOTHING makes it idempotent. Call it twice, get the same result. No error. No duplicate edges. This is why PostgreSQL's upsert semantics matter — they let you be aggressive about enrichment without worrying about data integrity.
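The semantics are easy to model without a database. A minimal in-memory equivalent (my sketch, with set membership standing in for the unique index):

```python
# In-memory sketch of INSERT ... ON CONFLICT DO NOTHING semantics.
relations: set[tuple] = set()

def upsert_relation(from_type: str, from_id: int, relation: str,
                    to_type: str, to_id: int) -> bool:
    """Insert the edge; return True only if it was genuinely new."""
    key = (from_type, from_id, relation, to_type, to_id)
    if key in relations:
        return False  # conflict on the "unique index": do nothing
    relations.add(key)
    return True

upsert_relation("project", 1, "has_task", "task", 42)  # new edge
upsert_relation("project", 1, "has_task", "task", 42)  # no-op, no error
```

Calling it any number of times converges on the same state, which is what lets enrichment run on every entity creation and every boot without guards.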
The same pattern exists for sprint creation, session creation, and the bulk sprint planner:
# In cortex_plan_sprint — bulk creates sprint + tasks + deps in one call
await auto_enrich_relation(conn, "project", project_id, "has_sprint", "sprint", sprint_id)
for tid in task_ids:
    await auto_enrich_relation(conn, "project", project_id, "has_task", "task", tid)
    await auto_enrich_relation(conn, "sprint", sprint_id, "has_task", "task", tid)
Boot Enrichment: From 0 to 189 Nodes
Auto-enrichment handles new entities. But what about the hundreds of tasks, sprints, and sessions that existed before the graph was built? Enter boot enrichment.
bulk_enrich_graph_internal() runs on every server start. It scans every entity table, finds foreign key relationships, and creates missing graph edges:
async def bulk_enrich_graph_internal() -> int:
    created = 0
    async with db_connection() as conn:
        # Tasks -> project relations
        rows = await conn.fetch(
            "SELECT id, project_id FROM tasks WHERE project_id IS NOT NULL"
        )
        for r in rows:
            result = await conn.fetchrow(
                "INSERT INTO relations (from_type, from_id, relation, to_type, to_id) "
                "VALUES ($1, $2, $3, $4, $5) "
                "ON CONFLICT (from_type, from_id, relation, to_type, to_id) DO NOTHING "
                "RETURNING id",
                "project", r["project_id"], "has_task", "task", r["id"],
            )
            if result:
                created += 1

        # Sprints -> projects, Sessions -> projects, Research -> projects (via task links),
        # Apps -> projects, Revenue -> projects, Presentations -> projects
        # ... same pattern for each entity type
    return created
The first time this ran, it created 361 edges connecting 189 nodes, all from data that already existed in the database. Zero manual work. Zero data entry. The graph materialized from foreign keys that were already there.
The RETURNING id trick is how I count only new relations. If the ON CONFLICT DO NOTHING fires (edge already exists), RETURNING id returns no rows, so result is None. Only genuinely new edges increment the counter.
This runs in the FastAPI lifespan handler:
@asynccontextmanager
async def lifespan(app: FastAPI):
    await init_db()
    if is_db_available():
        try:
            from .api.relations import bulk_enrich_graph_internal
            created = await bulk_enrich_graph_internal()
            if created:
                print(f"[boot] graph enrichment: {created} relations created")
        except Exception as e:
            print(f"[boot] graph enrichment skipped: {e}")
    yield
    await close_pool()
On subsequent boots, the count is usually 0 because auto-enrichment already handles new entities. But if someone manually inserts data via SQL, or if a new entity type gets added, boot enrichment catches it.
Graph Traversal: Recursive CTEs vs. Application-Level BFS
For local subgraph queries ("show me everything within 2 hops of Sprint 57"), I use application-level BFS rather than recursive CTEs. The reason is control — I want to cap traversal depth, collect nodes and edges separately, and resolve labels with type-specific formatting.
@api.get("/graph/{entity_type}/{entity_id}")
async def get_local_graph(entity_type: str, entity_id: int, hops: int = 2):
    hops = min(hops, 2)  # Cap at 2
    nodes = set()
    edges = []
    nodes.add((entity_type, entity_id))
    async with db_connection() as conn:
        frontier = [(entity_type, entity_id)]
        for _ in range(hops):
            next_frontier = []
            for ft, fid in frontier:
                rows = await conn.fetch(
                    "SELECT id, from_type, from_id, relation, to_type, to_id FROM relations "
                    "WHERE (from_type = $1 AND from_id = $2) OR (to_type = $1 AND to_id = $2)",
                    ft, fid,
                )
                for r in rows:
                    edge = {"id": r["id"], "from_type": r["from_type"], ...}
                    if edge not in edges:
                        edges.append(edge)
                    other = (r["to_type"], r["to_id"]) if (r["from_type"], r["from_id"]) == (ft, fid) \
                        else (r["from_type"], r["from_id"])
                    if other not in nodes:
                        nodes.add(other)
                        next_frontier.append(other)
            frontier = next_frontier
After traversal, labels are resolved by batch-querying each entity type's table. Sprint nodes get formatted as "S57: Self-Optimization", content nodes get truncated to 20 characters, revenue nodes show the description. This type-aware labeling is why the graph visualization is readable instead of showing raw IDs.
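Stripped of the database, the traversal above is a plain frontier-by-frontier BFS. A runnable miniature (edge tuples simplified, sample data my own stand-in for the relations table):

```python
# Capped BFS over in-memory edges, mirroring the endpoint's traversal loop.
edges = [
    ("project", 1, "has_sprint", "sprint", 57),
    ("sprint", 57, "has_task", "task", 101),
    ("task", 101, "depends_on", "task", 102),
]

def local_graph(etype: str, eid: int, hops: int = 2) -> set:
    nodes = {(etype, eid)}
    frontier = [(etype, eid)]
    for _ in range(min(hops, 2)):  # same depth cap as the endpoint
        next_frontier = []
        for ft, fid in frontier:
            for f_t, f_id, _rel, t_t, t_id in edges:
                if (f_t, f_id) == (ft, fid):
                    other = (t_t, t_id)    # outgoing edge
                elif (t_t, t_id) == (ft, fid):
                    other = (f_t, f_id)    # incoming edge
                else:
                    continue
                if other not in nodes:
                    nodes.add(other)
                    next_frontier.append(other)
        frontier = next_frontier
    return nodes
```

From sprint 57, one hop reaches the project and task 101; the second hop pulls in task 102 via the dependency edge.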
Graph-Driven Project Pages
The most practical use of the graph is project pages. Instead of hardcoding which entity types to show, the project page uses a two-phase discovery process:
- FK-based entities (types with fk set in ENTITY_TYPES): direct SQL query on the foreign key column
- Relation-only entities (types with fk: None): discovered via the relations table
# Phase 2: Relation-only entity types (research, skills, tools, contacts)
for etype, meta in ENTITY_TYPES.items():
    if meta["fk"] is not None:
        continue
    rel_rows = await conn.fetch("""
        SELECT CASE WHEN from_type = 'project' THEN to_id ELSE from_id END AS entity_id
        FROM relations
        WHERE (from_type = 'project' AND from_id = $1 AND to_type = $2)
           OR (to_type = 'project' AND to_id = $1 AND from_type = $2)
    """, project_id, etype)
This means when I add a new entity type — say, "certification" — I add it to ENTITY_TYPES, create the table, and add a relation. The project page automatically discovers and displays it. No template changes. No route changes. The graph drives the UI.
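The two-phase loop can be shown in miniature with stand-in data in place of the SQL queries (a sketch of the pattern, not the Cortex implementation):

```python
# Two-phase discovery sketch: FK types vs relation-only types.
ENTITY_TYPES = {
    "task": {"fk": "project_id"},  # Phase 1: found via FK column
    "skill": {"fk": None},         # Phase 2: found via relations table
}
fk_rows = {"task": [41, 42]}       # what "WHERE project_id = $1" would return
relations = [("project", 1, "has_skill", "skill", 7)]

def discover(project_id: int) -> dict:
    found = {}
    for etype, meta in ENTITY_TYPES.items():
        if meta["fk"] is not None:
            found[etype] = fk_rows.get(etype, [])  # Phase 1
        else:
            found[etype] = [                       # Phase 2: either direction
                t_id if f_t == "project" else f_id
                for f_t, f_id, _rel, t_t, t_id in relations
                if (f_t == "project" and f_id == project_id and t_t == etype)
                or (t_t == "project" and t_id == project_id and f_t == etype)
            ]
    return found
```

A hypothetical "certification" type would appear in the output the moment its registry entry and a relation row exist; no other code changes.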
What I'd Do Differently
Bidirectional relations are messy. I store project → has_task → task, but every read has to check both directions. A cleaner design would normalize each edge to a canonical direction — say, always storing the lexicographically smaller type as from_type — or add an explicit reverse-relation field.
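One way that normalization could look (my sketch, not code that exists in Cortex): pick a canonical endpoint order and keep the semantic direction in a flag, so reads never need an OR across both sides.

```python
# Canonicalize edge direction so each edge is stored exactly one way.
def canonical(from_type: str, from_id: int, relation: str,
              to_type: str, to_id: int) -> tuple:
    # Order endpoints by (type, id); a flipped flag preserves the
    # semantic direction instead of the column order.
    if (from_type, from_id) <= (to_type, to_id):
        return (from_type, from_id, relation, to_type, to_id, False)
    return (to_type, to_id, relation, from_type, from_id, True)

a = canonical("project", 1, "has_task", "task", 42)
b = canonical("task", 42, "has_task", "project", 1)
# Both directions map to the same stored row; only the flag differs.
```

The unique index would then cover the canonical columns, and lookups become a single-direction index scan.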
Label resolution is one query per type. For the full graph (190 nodes) I batch within each type, but 12 types still means up to 12 round-trips. A materialized view with pre-joined labels would be faster for the full-graph endpoint.
No edge weights yet. The metadata JSONB column exists but isn't used for scoring. When I eventually build "find the shortest path from skill X to job Y", I'll need weighted edges. The schema supports it; the code doesn't.
The Takeaway
You don't need a graph database to build a useful knowledge graph. PostgreSQL with a single relations table, three indexes, and an ON CONFLICT DO NOTHING upsert pattern gives you:
- Zero-overhead enrichment (auto-create edges on entity creation)
- Idempotent boot enrichment (backfill from foreign keys)
- BFS traversal with depth limits
- Full graph export for visualization
- Type-aware label resolution
190 nodes and 361 edges. One table. Three indexes. That's it.