Build the Index

iw index build

This runs the full pipeline and writes .iw/index.db.

| Option | Default | Description |
| --- | --- | --- |
| --depth <mode> | structured | structured or full (see below) |
| --include <glob> | | Only index files matching glob |
| --exclude <glob> | | Skip files matching glob |
| -v, --verbose | off | Show per-stage progress |

Structured mode (the default) scans only headings, bold text, code spans, and identifiers in documents. It is fast and precise, which makes it the best choice for everyday use.

iw index build
# or explicitly:
iw index build --depth structured
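
To make the idea of scanning only structural elements concrete, here is a minimal sketch in Python. It assumes the documents are Markdown, and the regular expressions and the `structured_candidates` name are hypothetical illustrations, not iw's actual implementation. Identifier detection is left out.

```python
import re

# Hypothetical patterns for three of the element kinds a structured scan looks at.
HEADING = re.compile(r"^#{1,6}\s+(.+?)\s*$", re.MULTILINE)
BOLD = re.compile(r"\*\*(.+?)\*\*")
CODE_SPAN = re.compile(r"`([^`\n]+)`")


def structured_candidates(markdown):
    """Collect text from headings, bold runs, and code spans, ignoring body text."""
    return {
        "headings": HEADING.findall(markdown),
        "bold": BOLD.findall(markdown),
        "code_spans": CODE_SPAN.findall(markdown),
    }


sample = (
    "# Build the Index\n"
    "\n"
    "Run **full mode** to scan body text with `iw index build`.\n"
)
candidates = structured_candidates(sample)
print(candidates)
```

Everything outside those three element kinds is skipped, which is roughly why a structured scan would yield fewer but more deliberate mentions than a body-text scan.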

Full mode adds body-text scanning with dictionary matching and IDF-based noise filtering.

iw index build --depth full

Full mode produces significantly more annotations:

| Metric | Structured | Full |
| --- | --- | --- |
| Annotations | 6,721 | 11,533 (+72%) |
| Grounded links | 2,548 | 7,360 (+189%) |
| Co-occurrences | 1,099 | 2,631 (+139%) |
| Build time | 1.1 s | 2.8 s |

The IDF filter removes noise by penalizing terms that appear in nearly every document (common words like “data”, “type”, “config”). A baseline of 50 stopwords with a ceiling of 0.15 keeps precision high while still allowing the richer recall of full mode.
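
The general idea behind inverse document frequency can be sketched as follows. This is the textbook formula, log(N / df), not necessarily the one iw uses. The `min_idf` threshold and both function names are hypothetical, and the sketch does not model the stopword baseline or the 0.15 ceiling mentioned above.

```python
import math


def inverse_document_frequency(documents):
    """Map each term to log(N / df), where df counts the documents containing it."""
    n_docs = len(documents)
    doc_freq = {}
    for tokens in documents:
        for term in set(tokens):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def filter_noise(candidates, idf, min_idf):
    """Drop candidate terms whose IDF falls below a (hypothetical) threshold."""
    return [term for term in candidates if idf.get(term, 0.0) >= min_idf]


docs = [
    ["config", "parser", "data"],
    ["config", "indexer", "data"],
    ["config", "data", "scheduler"],
]
idf = inverse_document_frequency(docs)
kept = filter_noise(["config", "parser", "data", "scheduler"], idf, min_idf=0.5)
print(kept)
```

Terms such as "config" and "data" occur in every sample document, so their IDF is zero and they are filtered out, while the rarer "parser" and "scheduler" survive.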

The build runs these stages in order:

  1. AX — AST extraction (tree-sitter) → builds a symbol registry with all classes, functions, interfaces, types, exports
  2. KWX — Keyword extraction → scans documents for entity mentions. In full mode, receives the symbol dictionary for body text matching
  3. COX — Co-occurrence scoring → finds entity pairs mentioned together
  4. TCG — Git analysis → co-change Jaccard scores, hotspot detection, ownership, staleness flags
  5. Annotate — Matches document mentions to code symbols, applies IDF penalties
  6. Write — Persists everything to SQLite
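
As an illustration of the co-change Jaccard score computed in the git analysis stage, the sketch below scores file pairs from a list of commits. It assumes each commit is available as a set of changed file paths. The `co_change_jaccard` name and the sample history are hypothetical, and any recency weighting is left out.

```python
from collections import Counter
from itertools import combinations


def co_change_jaccard(commits):
    """Score each file pair as (commits touching both) / (commits touching either)."""
    file_commits = Counter()
    pair_commits = Counter()
    for files in commits:
        unique = sorted(set(files))
        file_commits.update(unique)
        pair_commits.update(combinations(unique, 2))

    scores = {}
    for (a, b), both in pair_commits.items():
        either = file_commits[a] + file_commits[b] - both
        scores[(a, b)] = both / either
    return scores


# Hypothetical history: each set holds the files changed in one commit.
history = [
    {"src/index.ts", "docs/index.md"},
    {"src/index.ts", "docs/index.md"},
    {"src/index.ts"},
    {"src/cli.ts", "docs/index.md"},
]
scores = co_change_jaccard(history)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

A score near 1 means the two files almost always change together. A score near 0 means they rarely do.
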

To limit which files are indexed, pass --include or --exclude. Each flag can be repeated:

# Only index src/ and docs/
iw index build --include "src/**" --include "docs/**"
# Exclude test files
iw index build --exclude "**/*.test.ts" --exclude "**/__tests__/**"
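
The way include and exclude globs might combine can be sketched with Python's `fnmatch` module. This is only an approximation: iw's actual glob semantics (for example how `**` is handled) are not documented here, and `select_files` is a hypothetical helper. The sketch assumes a file is kept when it matches at least one include glob (or no include glob is given) and matches no exclude glob.

```python
from fnmatch import fnmatchcase


def select_files(paths, include=(), exclude=()):
    """Keep paths matching any include glob (all paths if none) and no exclude glob."""
    selected = []
    for path in paths:
        included = not include or any(fnmatchcase(path, glob) for glob in include)
        excluded = any(fnmatchcase(path, glob) for glob in exclude)
        if included and not excluded:
            selected.append(path)
    return selected


paths = [
    "src/app.ts",
    "src/app.test.ts",
    "src/__tests__/util.ts",
    "docs/guide.md",
    "README.md",
]
selected = select_files(
    paths,
    include=["src/**", "docs/**"],
    exclude=["**/*.test.ts", "**/__tests__/**"],
)
print(selected)
```

Note that `fnmatch` lets `*` match across `/`, so a pattern like `**/*.test.ts` only matches paths that contain at least one directory separator. A real glob library may behave differently.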

The build creates .iw/index.db with these tables:

| Table | Contents |
| --- | --- |
| symbols | Code symbols from AST (name, kind, file, line, exported) |
| annotations | Doc spans → code symbols (confidence, source, IDF score) |
| co_occurrences | Entity pairs co-mentioned in docs or co-imported in code |
| co_changes | File pairs that change together in git (Jaccard + recency) |
| files | Per-file metadata (last modified, churn, hotspot, owner, content hash) |
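
Because the index is a regular SQLite file, it can be inspected with any SQLite client. The sketch below uses Python's standard `sqlite3` module to list the tables and their columns without assuming any column names. The `describe_index` helper is hypothetical, and the database is opened read-only so the index is never modified.

```python
import sqlite3
from pathlib import Path


def describe_index(db_path):
    """Return {table name: [column names]} for a SQLite file, opened read-only."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        return {
            table: [
                row[0]
                for row in conn.execute(
                    "SELECT name FROM pragma_table_info(?)", (table,)
                )
            ]
            for table in tables
        }
    finally:
        conn.close()


# Only runs when an index has already been built in the current directory.
index_path = Path(".iw/index.db")
if index_path.exists():
    for table, columns in describe_index(index_path).items():
        print(f"{table}: {', '.join(columns)}")
```

Checking the real schema this way first is safer than writing queries against guessed column names.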