
CARI Overview

CARI (Code-Aware Retrieval Index) is a lightweight, pre-computed index that connects your code, documentation, and git history into a single queryable database.

  • Zero cost — no LLM calls, no external services, no API keys
  • Single file — everything lives in .iw/index.db (SQLite)
  • Fast — builds in seconds, queries in milliseconds
  • Deterministic — same input always produces the same output

grep finds strings. CARI finds relationships:

| What you need | grep | CARI |
| --- | --- | --- |
| “Which files mention auth?” | ✅ String match | ✅ Ranked by relevance |
| “What’s connected to AuthService?” | | ✅ Doc co-mentions + git co-changes + code imports |
| “What docs are stale?” | | ✅ Cross-references changed code to doc mentions |
| “What’s undocumented?” | | ✅ Exported symbols with no doc coverage |
| “Find duplicate code” | | ✅ Exact + structural clone detection |
| “Show circular imports” | | ✅ Import cycle detection |
| “What TODOs exist?” | | ✅ TODO/FIXME/HACK/XXX inventory |

The index is a plain SQLite database:

  • Runs in-process under Node.js via better-sqlite3, with no server process
  • The entire index is one portable file (2–4 MB for typical projects)
  • Queries are pre-written SQL views — no query language to learn
  • Works offline, in CI, in Docker, anywhere Node runs

CARI’s power comes from combining three layers that most tools treat separately:

Tree-sitter parses your source files to extract classes, functions, interfaces, and exports. This creates a symbol registry: the ground truth for “what exists in the code.”

In your documentation, headings, bold text, code spans, and (optionally) body text are scanned for entity mentions. Each mention is linked to a code symbol when possible, creating annotations.
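As a rough illustration of how a doc mention could be linked to a code symbol, the sketch below scans markdown text for code spans and bold text and looks each candidate up in a symbol registry. The `SymbolEntry` and `Annotation` shapes, the regular expression, and the `annotate` function are assumptions made for this example, not CARI's actual schema or API.

```typescript
// Hypothetical shapes: CARI's real tables and field names may differ.
interface SymbolEntry {
  name: string;
  kind: "class" | "function" | "interface";
  file: string;
}

interface Annotation {
  mention: string;
  docFile: string;
  symbol: SymbolEntry | null; // null = mention not grounded in any code symbol
}

function annotate(
  docFile: string,
  markdown: string,
  registry: Map<string, SymbolEntry>,
): Annotation[] {
  // Matches `code spans` and **bold text**; headings and body text are left out for brevity.
  const pattern = /`([^`]+)`|\*\*([^*]+)\*\*/g;
  const found: Annotation[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    const mention = (match[1] ?? match[2] ?? "").trim();
    if (mention === "") continue;
    // Link the mention to a symbol when the registry knows the name.
    found.push({ mention, docFile, symbol: registry.get(mention) ?? null });
  }
  return found;
}

// Example registry with a single (made-up) symbol.
const registry = new Map<string, SymbolEntry>([
  ["AuthService", { name: "AuthService", kind: "class", file: "src/auth/service.ts" }],
]);

const annotations = annotate(
  "docs/auth.md",
  "The `AuthService` class talks to the **TokenStore**.",
  registry,
);
```

Here `AuthService` resolves to a registry entry, while `TokenStore` stays ungrounded (`symbol` is `null`), which is the kind of mention a query such as orphanedSections would care about.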

git log analysis is run by the TCG stage during iw index build. It computes:

  • Co-change — which file pairs change together across commits (Jaccard similarity, stored in co_changes)
  • Hotspot — files with high churn frequency (drives hotspotPriority)
  • Staleness — how recently each file was updated (surfaced in iw index report)
  • Ownership — primary committer per file

These signals are stored in the files and co_changes tables and used by queries like hotspotPriority, connections, and retrieve to weight results by temporal relevance.
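To make the co-change signal concrete, here is a minimal sketch of Jaccard similarity over commit sets: for two files, the number of commits touching both divided by the number of commits touching either. The `Commit` shape and the `coChangeScores` helper are illustrative assumptions; how CARI parses git log output and populates co_changes is not covered in this overview.

```typescript
// Hypothetical input: one entry per commit, listing the files it touched.
interface Commit {
  hash: string;
  files: string[];
}

interface CoChange {
  fileA: string;
  fileB: string;
  jaccard: number; // |commits(A) ∩ commits(B)| / |commits(A) ∪ commits(B)|
}

function coChangeScores(history: Commit[]): CoChange[] {
  // Map each file to the set of commit hashes that touched it.
  const commitsByFile = new Map<string, Set<string>>();
  for (const commit of history) {
    for (const file of commit.files) {
      let hashes = commitsByFile.get(file);
      if (hashes === undefined) {
        hashes = new Set<string>();
        commitsByFile.set(file, hashes);
      }
      hashes.add(commit.hash);
    }
  }

  const files = Array.from(commitsByFile.keys()).sort();
  const result: CoChange[] = [];
  files.forEach((fileA, i) => {
    const a = commitsByFile.get(fileA) ?? new Set<string>();
    for (const fileB of files.slice(i + 1)) {
      const b = commitsByFile.get(fileB) ?? new Set<string>();
      let shared = 0;
      a.forEach((hash) => {
        if (b.has(hash)) shared += 1;
      });
      if (shared === 0) continue; // pairs that never change together are skipped
      const union = a.size + b.size - shared;
      result.push({ fileA, fileB, jaccard: shared / union });
    }
  });
  return result;
}

// Made-up history: auth.ts and session.ts change together in two of three commits.
const coChanges = coChangeScores([
  { hash: "c1", files: ["src/auth.ts", "src/session.ts"] },
  { hash: "c2", files: ["src/auth.ts", "src/session.ts"] },
  { hash: "c3", files: ["src/auth.ts"] },
  { hash: "c4", files: ["docs/readme.md"] },
]);
```

In this made-up history the only reported pair is `src/auth.ts` and `src/session.ts`, with a score of 2/3: two shared commits out of three that touch either file.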

When all three signals agree, you have a well-documented, well-structured codebase. When they disagree, that’s where the interesting findings are:

  • Co-mentioned in docs but no code dependency → hidden coupling
  • Co-changed in git but not documented together → missing documentation
  • Exported symbol with no doc mention → undocumented public API
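
The three disagreement cases above amount to set comparisons between layers. The sketch below is an illustration under stated assumptions: the `Signals` shape and the `findGaps` function are hypothetical, and pairs are stored as sorted `a|b` strings purely for simplicity.

```typescript
// Hypothetical per-layer signals; CARI's actual storage and scoring are not shown here.
interface Signals {
  docCoMentions: Set<string>; // pairs mentioned together in docs, e.g. "A|B"
  gitCoChanges: Set<string>; // pairs that change together in git
  codeImports: Set<string>; // pairs linked by a code import
  exportedSymbols: string[];
  documentedSymbols: Set<string>;
}

interface Gaps {
  hiddenCoupling: string[];
  missingDocumentation: string[];
  undocumentedApi: string[];
}

// Order-independent key for a pair of names.
function pairKey(a: string, b: string): string {
  return a < b ? `${a}|${b}` : `${b}|${a}`;
}

// Items present in `left` but absent from `right`, sorted for stable output.
function difference(left: Set<string>, right: Set<string>): string[] {
  return Array.from(left).filter((item) => !right.has(item)).sort();
}

function findGaps(signals: Signals): Gaps {
  return {
    // Co-mentioned in docs but no code dependency.
    hiddenCoupling: difference(signals.docCoMentions, signals.codeImports),
    // Co-changed in git but not documented together.
    missingDocumentation: difference(signals.gitCoChanges, signals.docCoMentions),
    // Exported symbol with no doc mention.
    undocumentedApi: signals.exportedSymbols.filter(
      (symbol) => !signals.documentedSymbols.has(symbol),
    ),
  };
}

// Made-up example data.
const gaps = findGaps({
  docCoMentions: new Set([pairKey("TokenStore", "AuthService")]),
  gitCoChanges: new Set([pairKey("AuthService", "SessionCache")]),
  codeImports: new Set([pairKey("SessionCache", "AuthService")]),
  exportedSymbols: ["AuthService", "TokenStore", "SessionCache", "hashPassword"],
  documentedSymbols: new Set(["AuthService", "TokenStore"]),
});
```

With this input, `AuthService|TokenStore` is flagged as hidden coupling, `AuthService|SessionCache` as missing documentation, and `SessionCache` and `hashPassword` as undocumented public API.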

packages/
  index/          → @intentweave/index: the CARI engine
  analyzer/       → @intentweave/analyzer: pipeline stages (AX, KWX, COX, TCG)
  ast-extractor/  → @intentweave/ast-extractor: tree-sitter TS/JS extraction
  python-parser/  → @intentweave/python-parser: tree-sitter Python extraction
  cli/            → @intentweave/cli: `iw index` commands + MCP tools

CARI provides 27 built-in query modes — all available via CLI (iw index <command>), MCP tools (cari_*), and the @intentweave/index programmatic API:

| Query | Purpose |
| --- | --- |
| retrieve | Ranked file retrieval by topic or symbol |
| connections | Cross-layer connections + gap detection |
| check | CI drift detection for changed files |
| report | Corpus-wide health dashboard |
| clones | Exact clone detection (identical body hash) |
| structuralClones | Type 2 clones (same control flow, different names) |
| circularImports | Import cycle detection |
| unusedExports | Exported symbols never imported |
| hotspotPriority | High-churn low-doc files ranked by urgency |
| todos | TODO/FIXME/HACK/XXX inventory |
| moduleCoverage | Documentation coverage % per directory |
| orphanedSections | Doc sections with all-ungrounded mentions |
| docCompleteness | Per-doc completeness vs. referenced exports |
| crossGroupDrift | Cross-group entity coverage conflicts |
| mentionsOf | Entity → doc mentions |
| annotationsFor | File → all annotations |
| testCoverage | Test → source mapping + untested exports |
| hubs | God-node / hub analysis (degree centrality) |
| communities | Community detection (structural / semantic / temporal) |
| surprises | Surprising connection ranking (composite score) |
| rationale | WHY/NOTE/IMPORTANT/DESIGN rationale inventory |
| terminology | Terminology inconsistency detection |
| dependencyDepth | Transitive import depth + fan-in/fan-out risk |
| boundaryViolations | Cross-package internal import detection |
| layersInfer | Auto-infer architectural layers from import graph |
| layersCheck | Validate imports against layer boundaries |
| export --html | Interactive HTML architecture report |
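
As one example of what a query such as circularImports has to compute, the sketch below finds import cycles with a depth-first search over an import graph. The `ImportGraph` shape and the `findImportCycles` function are hypothetical; the overview describes CARI's queries as pre-written SQL views, so this is a conceptual illustration, not the engine's implementation.

```typescript
// Hypothetical input: each file mapped to the files it imports.
type ImportGraph = Record<string, string[]>;

function findImportCycles(graph: ImportGraph): string[][] {
  const cycles: string[][] = [];
  const finished = new Set<string>(); // files whose imports are fully explored
  const stack: string[] = []; // files on the current DFS path

  function visit(file: string): void {
    const index = stack.indexOf(file);
    if (index !== -1) {
      // Back edge: the path from the earlier occurrence of `file` closes a cycle.
      cycles.push(stack.slice(index).concat(file));
      return;
    }
    if (finished.has(file)) return;
    stack.push(file);
    for (const imported of graph[file] ?? []) {
      visit(imported);
    }
    stack.pop();
    finished.add(file);
  }

  // Sorted start order keeps the output deterministic.
  for (const file of Object.keys(graph).sort()) {
    visit(file);
  }
  return cycles;
}

// Made-up graph: a → b → c → a forms a cycle; d only points into it.
const cycles = findImportCycles({
  "a.ts": ["b.ts"],
  "b.ts": ["c.ts"],
  "c.ts": ["a.ts"],
  "d.ts": ["a.ts"],
});
```

This reports one cycle per back edge found during the search (here `a.ts → b.ts → c.ts → a.ts`), not every elementary cycle in the graph, which is usually enough to locate and break the loop.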