Building One Knowledge Graph Across 46 Repositories

I am Ryan, CTO at airCloset.

I spent three months building code-graph, a single knowledge graph that unifies 46 repositories across multiple services.

Many people think you can just hand all your code to an AI and ask questions. This fails for two reasons:

  • Context windows: You cannot fit years of code from 46 repos into one prompt.
  • Hallucination: When an AI has to infer relationships between pieces of code, it makes mistakes and misses connections.

To solve this, I used static analysis to build a source of truth.

The Challenge: Crossing Boundaries

A large codebase is messy. One API might be called by five different repositories. One database table might be used by three different services.

If you only look at one repository, you miss the full picture. This is dangerous: if you change code without seeing its real blast radius, you can break the system.

My approach uses tree-sitter to parse code into syntax trees. But tree-sitter alone cannot see across repository boundaries.

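I built boundary nodes to solve this. A boundary node represents a shared interface, such as an API endpoint or a database table, so that code in different repositories can connect through it.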
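
One way to picture a boundary node is as a graph node keyed by something every repository can agree on, such as an HTTP method and path. The sketch below is a minimal illustration under that assumption, not the code-graph implementation. The `Graph` class, the `api:` and `fn:` key formats, and the repository and function names are all hypothetical.

```python
from collections import defaultdict


class Graph:
    """Tiny directed graph; edges point from a user to the thing it uses."""

    def __init__(self):
        self.out_edges = defaultdict(set)
        self.in_edges = defaultdict(set)

    def add_edge(self, src, dst):
        self.out_edges[src].add(dst)
        self.in_edges[dst].add(src)

    def dependents(self, node):
        """Return every node that reaches `node`, directly or transitively."""
        seen, stack = set(), [node]
        while stack:
            current = stack.pop()
            for parent in self.in_edges[current]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


graph = Graph()
boundary = "api:GET /orders/:id"  # hypothetical boundary node key

# Each repository is analyzed on its own and attached to the shared node.
graph.add_edge(boundary, "fn:order-service/getOrder")  # boundary -> handler
graph.add_edge("fn:web-app/loadOrder", boundary)       # caller -> boundary
graph.add_edge("fn:admin-tool/showOrder", boundary)    # caller -> boundary

# Everything affected if the handler changes, across repositories.
blast_radius = graph.dependents("fn:order-service/getOrder")
```

Because both callers and the handler attach to the same key, the blast radius of the handler includes functions in other repositories even though no single-repo parse ever saw them together.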

How it works:

  • We extract relationships within a repo using tree-sitter.
  • We use the TypeScript Compiler API to resolve types and variables.
  • We use Gemini to handle dynamic cases that the static tools miss.

Instead of asking AI to guess, we give it facts. We tell it: "This API is also called from Repo X." This prevents hallucinations.
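
Giving the AI facts could look roughly like this: cross-repository callers recorded in the graph are rendered as plain statements and placed in the prompt. This is only a sketch. The `callers_by_api` mapping, the repository names, and the sentence format are illustrative assumptions, not the actual prompt used.

```python
def render_facts(callers_by_api):
    """Turn recorded cross-repo callers into plain sentences for a prompt."""
    facts = []
    for api, repos in sorted(callers_by_api.items()):
        for repo in sorted(repos):
            facts.append(f"{api} is also called from {repo}.")
    return facts


# Hypothetical graph query result: API -> repositories that call it.
callers_by_api = {
    "GET /orders/:id": {"web-app", "admin-tool"},
}

facts = render_facts(callers_by_api)
prompt_context = "Known facts from static analysis:\n" + "\n".join(facts)
```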

The Hard Part: The Framework Zoo

The real battle was extracting these boundaries. Every framework writes boundaries differently.

One team uses NestJS decorators. Another uses Express routes. Another uses raw jQuery. Each one creates a different structure in the code.

To make this work, we had to build custom parsers for:

  • NestJS and TypeORM
  • Express and Fastify
  • AngularJS and Redux
  • Various path-alias schemes
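
To illustrate why each framework needs its own extractor, the sketch below reduces a NestJS controller and an Express route to the same (method, path) key. Regular expressions stand in here for real syntax-tree queries, so this is a simplification rather than the tree-sitter approach described above, and it only covers the simplest form of each pattern. The function names and sample sources are hypothetical.

```python
import re

# Regexes stand in for syntax-tree queries and handle only literal paths.
NEST_CONTROLLER = re.compile(r"@Controller\(\s*['\"]([^'\"]*)['\"]\s*\)")
NEST_ROUTE = re.compile(
    r"@(Get|Post|Put|Patch|Delete)\(\s*(?:['\"]([^'\"]*)['\"])?\s*\)"
)
EXPRESS_ROUTE = re.compile(
    r"\b(?:app|router)\.(get|post|put|patch|delete)\(\s*['\"]([^'\"]+)['\"]"
)


def join_path(prefix, suffix):
    """Join two URL path fragments with exactly one slash between them."""
    parts = [p.strip("/") for p in (prefix, suffix) if p and p.strip("/")]
    return "/" + "/".join(parts)


def extract_nest(source):
    """Combine the controller prefix with each route decorator's path."""
    match = NEST_CONTROLLER.search(source)
    prefix = match.group(1) if match else ""
    return {
        (verb.upper(), join_path(prefix, path))
        for verb, path in NEST_ROUTE.findall(source)
    }


def extract_express(source):
    """Read the method and path from app.<verb>() or router.<verb>() calls."""
    return {
        (verb.upper(), join_path("", path))
        for verb, path in EXPRESS_ROUTE.findall(source)
    }


nest_source = """
@Controller('orders')
export class OrdersController {
  @Get(':id')
  findOne() {}
}
"""
express_source = "router.get('/orders/:id', handler);"

nest_routes = extract_nest(nest_source)
express_routes = extract_express(express_source)
```

Both extractors produce the key `("GET", "/orders/:id")`, which is what lets two very different source shapes attach to one boundary node.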

We had to aim for 99% accuracy. If our connection rate is only 90%, the AI misses 10% of the connections. In a production system, that 10% is where the bugs hide.
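
The article does not spell out how the connection rate is computed. One plausible reading, assumed here, is the fraction of caller-side references that match a known provider-side boundary. The function name and sample data below are hypothetical.

```python
def connection_rate(call_sites, providers):
    """Fraction of caller-side references that match a known provider."""
    if not call_sites:
        return 1.0  # nothing to connect, so nothing is missing
    connected = sum(1 for key in call_sites if key in providers)
    return connected / len(call_sites)


# Hypothetical extraction results.
providers = {("GET", "/orders/:id"), ("POST", "/orders")}
call_sites = [
    ("GET", "/orders/:id"),
    ("POST", "/orders"),
    ("GET", "/orders/:id"),
    ("GET", "/legacy/orders"),  # no provider found: an unconnected edge
]

rate = connection_rate(call_sites, providers)  # 3 of 4 connected
```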

We now run a daily check. If our connection rate drops by more than 5%, we get an alert. This catches cases where new code patterns break our parsers.
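
A daily check of this kind can be as small as comparing today's rate with the previous run. The sketch below assumes the 5% threshold means an absolute drop of five percentage points; the source does not say whether the drop is absolute or relative. The function name and the sample rates are hypothetical.

```python
def should_alert(previous_rate, current_rate, max_drop=0.05):
    """Return True when the rate fell by more than `max_drop` since the last run.

    Assumption: `max_drop` is an absolute difference between two rates
    expressed as fractions (0.05 means five percentage points).
    """
    return (previous_rate - current_rate) > max_drop


# Hypothetical daily measurements.
alert_small_dip = should_alert(previous_rate=0.99, current_rate=0.96)
alert_large_dip = should_alert(previous_rate=0.99, current_rate=0.90)
```

With these sample values only the second comparison raises an alert, since a drop from 0.99 to 0.96 stays under the threshold.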

Current Limitations

The graph is not perfect.

  • Search is difficult. You often need to know a function name to start your search.
  • Node explosion. Following a path can pull in thousands of tiny, useless helper functions.
  • Maintenance. Every time a new framework enters our stack, we must write a new parser.
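
The node explosion problem is easy to reproduce on a toy graph. The sketch below counts how many new nodes a breadth-first walk reaches at each depth. The call graph is synthetic (every function calls three helpers), and the function and node names are hypothetical.

```python
from collections import deque


def reachable_by_depth(edges, start, max_depth):
    """Breadth-first walk that counts newly reached nodes at each depth."""
    counts = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the depth limit
        for neighbor in edges.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                counts[depth + 1] = counts.get(depth + 1, 0) + 1
                queue.append((neighbor, depth + 1))
    return counts


# Synthetic call graph: each function calls three helpers, three levels deep.
edges = {}
frontier = ["handler"]
for level in range(3):
    next_frontier = []
    for name in frontier:
        children = [f"{name}.h{i}" for i in range(3)]
        edges[name] = children
        next_frontier.extend(children)
    frontier = next_frontier

counts = reachable_by_depth(edges, "handler", max_depth=3)
```

Even with a fan-out of only three, the walk reaches 3, then 9, then 27 new nodes, which hints at how quickly a real traversal fills up with small helpers.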

This is Part 1. In Part 2, I will discuss the service-product-graph (SPG) layer I built to fix these gaps.

Source: https://dev.to/ryantsuji/building-one-knowledge-graph-across-46-repositories-with-static-analysis-part-1-egm
