Build the Index

iw index build

This runs the full pipeline and writes .iw/index.db.

| Option | Default | Description |
| --- | --- | --- |
| --depth <mode> | structured | structured or full (see below) |
| --include <glob> | | Only index files matching glob |
| --exclude <glob> | | Skip files matching glob |
| -v, --verbose | off | Show per-stage progress |

Structured mode (the default) scans only headings, bold text, code spans, and identifiers in documents. It is fast and precise, which makes it the best choice for everyday use.

iw index build
# or explicitly:
iw index build --depth structured

Full mode adds body text scanning with dictionary matching and IDF-based noise filtering.

iw index build --depth full

Full mode produces significantly more annotations:

| Metric | Structured | Full |
| --- | --- | --- |
| Annotations | 6,721 | 11,533 (+72%) |
| Grounded links | 2,548 | 7,360 (+189%) |
| Co-occurrences | 1,099 | 2,631 (+139%) |
| Build time | 1.1 s | 2.8 s |

The IDF filter reduces noise by penalizing terms that appear in nearly every document, such as the common words “data”, “type”, and “config”. A baseline of 50 stopwords with a ceiling of 0.15 keeps precision high while still allowing the richer recall of full mode.
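To make the idea of an IDF penalty concrete, here is a minimal sketch of a textbook smoothed inverse document frequency score. The page does not give the exact formula iw uses, so the function `idf`, the sample `docs`, and the smoothing are assumptions chosen for illustration only.

```typescript
// Illustrative only: a textbook smoothed IDF, not the formula used by iw.
// `idf` and `docs` are hypothetical names introduced for this sketch.
function idf(term: string, documents: string[][]): number {
  const n = documents.length;
  // Document frequency: how many documents mention the term at least once.
  const df = documents.filter((doc) => doc.indexOf(term) !== -1).length;
  // Smoothed form log((N + 1) / (df + 1)): a term found in every document scores 0.
  return Math.log((n + 1) / (df + 1));
}

const docs: string[][] = [
  ["config", "AuthService", "data"],
  ["config", "data", "TokenStore"],
  ["config", "data", "type"],
];

console.log(idf("config", docs).toFixed(3));      // "0.000" (appears in all 3 documents)
console.log(idf("AuthService", docs).toFixed(3)); // "0.693" (appears in 1 of 3 documents)
```

A term such as "config" that shows up everywhere carries no signal and scores zero, while a rarer symbol name keeps a higher score. A filter built on this kind of score can then down-weight or drop the low-scoring matches.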

The build runs these stages in order:

  1. AX — AST extraction (tree-sitter) → builds a symbol registry with all classes, functions, interfaces, types, exports
  2. KWX — Keyword extraction → scans documents for entity mentions. In full mode, receives the symbol dictionary for body text matching
  3. COX — Co-occurrence scoring → finds entity pairs mentioned together
  4. TCG — Git analysis → co-change Jaccard scores, hotspot detection, ownership, staleness flags
  5. Annotate — Matches document mentions to code symbols, applies IDF penalties
  6. Write — Persists everything to SQLite
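
The co-occurrence stage in the list above can be pictured as counting how often two entities are mentioned in the same document. The following is a rough sketch under that assumption. The function `countPairs`, its input shape, and the `A|B` key format are hypothetical, and the scoring iw actually applies may differ.

```typescript
// Illustrative only: count how many documents mention each pair of entities.
// `countPairs` and its input shape are hypothetical, not part of the iw API.
function countPairs(mentionsPerDoc: string[][]): Map<string, number> {
  const counts = new Map<string, number>();
  mentionsPerDoc.forEach((mentions) => {
    // De-duplicate and sort so each pair gets one stable key per document.
    const unique = Array.from(new Set(mentions)).sort();
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = `${unique[i]}|${unique[j]}`;
        counts.set(key, (counts.get(key) ?? 0) + 1);
      }
    }
  });
  return counts;
}

const pairs = countPairs([
  ["AuthService", "TokenStore"],
  ["AuthService", "TokenStore", "Session"],
  ["Session"],
]);

console.log(pairs.get("AuthService|TokenStore")); // 2
console.log(pairs.get("AuthService|Session"));    // 1
```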

To restrict which files are indexed, pass --include or --exclude (each flag can be repeated):

# Only index src/ and docs/
iw index build --include "src/**" --include "docs/**"
# Exclude test files
iw index build --exclude "**/*.test.ts" --exclude "**/__tests__/**"

The build creates .iw/index.db with these tables:

| Table | Contents |
| --- | --- |
| symbols | Code symbols from AST (name, kind, file, line, exported) |
| annotations | Doc spans → code symbols (confidence, source, IDF score) |
| co_occurrences | Entity pairs co-mentioned in docs or co-imported in code |
| co_changes | File pairs that change together in git (Jaccard + recency) |
| files | Per-file metadata (last modified, churn, hotspot, owner, content hash) |
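
The Jaccard score mentioned for co_changes is the standard set-overlap measure: commits that touched both files, divided by commits that touched either. The sketch below assumes each file is represented by the set of commit hashes that modified it. The `jaccard` helper and the sample data are hypothetical, and the recency weighting the table mentions is not modeled here.

```typescript
// Illustrative only: Jaccard similarity between the commit sets of two files.
// `jaccard` and the sample commit hashes are hypothetical; recency is ignored.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0; // avoid dividing by zero
  let shared = 0;
  a.forEach((commit) => {
    if (b.has(commit)) shared++;
  });
  // |A ∩ B| / |A ∪ B|, using |A ∪ B| = |A| + |B| - |A ∩ B|
  return shared / (a.size + b.size - shared);
}

const authSource = new Set(["c1", "c2", "c3"]);
const authDocs = new Set(["c2", "c3", "c4"]);

console.log(jaccard(authSource, authDocs)); // 0.5 (2 shared commits out of 4 distinct)
```

A score near 1 means the two files almost always change in the same commits. A score near 0 means they rarely do.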

The same build can be run programmatically:

import { buildFromPaths, CariIndex } from "@intentweave/index";

// Equivalent to: iw index build --depth full
const index = await buildFromPaths({
  paths: ["src/", "docs/"],
  workspaceRoot: process.cwd(),
  depth: "full",
});

// Use immediately after building
const results = index.retrieve({ query: "authentication" });
index.close();

// Or load a previously built index
const existing = CariIndex.load(".iw/index.db");

buildFromPaths runs the full pipeline (AX → KWX → COX → TCG → Annotate → Write) and returns a ready-to-use CariIndex instance backed by the symbols, annotations, co_occurrences, co_changes, and files tables.