
CARI Technical Specification

Implemented — Phases 1–10 complete, 1248 tests passing across 63 test files.

  1. Zero external cost — no LLM calls, no paid APIs, no database servers
  2. Single-file output — one SQLite database, portable and inspectable
  3. Three-signal fusion — AST structure + document semantics + git temporal data
  4. Gap detection — disagreements between signals are the most valuable findings
  5. CI-ready — exit codes, machine-readable formats, fast execution

Tree-sitter parses source files to produce a symbol registry:

  • Classes, interfaces, type aliases, enums
  • Functions (top-level and methods)
  • Exported variables and constants
  • Parameter types and return types (where available)

Supported languages: TypeScript, JavaScript, Swift, Python.
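As a rough illustration of what such a registry could hold, the sketch below models one entry per parsed symbol and a name-keyed lookup. It does not call Tree-sitter, and every name in it (`SymbolEntry`, `SymbolRegistry`, the field names, the sample paths) is a hypothetical stand-in rather than CARI's actual schema.

```typescript
// Hypothetical entry shape; field names are illustrative, not CARI's schema.
type SymbolKind =
  | "class" | "interface" | "typeAlias" | "enum"
  | "function" | "method" | "variable" | "constant";

type SourceLanguage = "typescript" | "javascript" | "swift" | "python";

interface SymbolEntry {
  name: string;
  kind: SymbolKind;
  file: string;
  language: SourceLanguage;
  exported: boolean;
  paramTypes?: string[]; // present only where the parser could recover them
  returnType?: string;   // likewise optional
}

class SymbolRegistry {
  // Several symbols may share a name (for example a method and a function),
  // so each name maps to a list of entries.
  private byName = new Map<string, SymbolEntry[]>();

  add(entry: SymbolEntry): void {
    const bucket = this.byName.get(entry.name) ?? [];
    bucket.push(entry);
    this.byName.set(entry.name, bucket);
  }

  lookup(name: string): SymbolEntry[] {
    return this.byName.get(name) ?? [];
  }

  count(): number {
    let total = 0;
    this.byName.forEach((bucket) => { total += bucket.length; });
    return total;
  }
}

// Sample data (made up for the example).
const registry = new SymbolRegistry();
registry.add({
  name: "buildIndex", kind: "function", file: "src/build.ts",
  language: "typescript", exported: true,
  paramTypes: ["string"], returnType: "Promise<void>",
});
registry.add({
  name: "buildIndex", kind: "method", file: "src/cli.py",
  language: "python", exported: false,
});
```

Keeping a list per name, rather than a single entry, leaves room for the same identifier to appear in several files or languages without one overwriting another.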

Markdown documents are scanned for entity mentions:

| Source | What It Captures | Depth |
| --- | --- | --- |
| Headings (H1-H6) | Section topics | structured |
| Bold text | Emphasized entities | structured |
| Code spans | Inline code references | structured |
| Identifiers | CamelCase / snake_case tokens | structured |
| Body text | Dictionary-matched terms | full only |
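The structured sources in the table can be approximated with a few regular expressions. The sketch below is one possible reading, not the project's extractor: the function and type names are hypothetical, the patterns cover only ATX headings, `**bold**`, and single-backtick code spans, and body-text dictionary matching (full mode) is left out.

```typescript
type MentionSource = "heading" | "bold" | "codeSpan" | "identifier";

interface Mention {
  text: string;
  source: MentionSource;
  line: number; // 1-based line number in the document
}

// CamelCase (IndexWriter, buildIndex) or snake_case (co_change_score) tokens.
const IDENTIFIER =
  /\b(?:[A-Za-z][a-z0-9]+(?:[A-Z][a-z0-9]+)+|[a-z][a-z0-9]*(?:_[a-z0-9]+)+)\b/g;

// Push every match of `re` in `text`; uses capture group 1 when present.
function collect(
  re: RegExp, text: string, source: MentionSource, line: number, out: Mention[],
): void {
  re.lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    out.push({ text: m[1] ?? m[0], source, line });
  }
}

function extractStructuredMentions(markdown: string): Mention[] {
  const out: Mention[] = [];
  markdown.split("\n").forEach((raw, i) => {
    const line = i + 1;
    // A heading line is recorded whole and not scanned further.
    const heading = /^#{1,6}\s+(.*\S)\s*$/.exec(raw);
    if (heading) {
      out.push({ text: heading[1], source: "heading", line });
      return;
    }
    collect(/\*\*([^*]+)\*\*/g, raw, "bold", line, out);
    collect(/`([^`]+)`/g, raw, "codeSpan", line, out);
    // Blank out spans already captured so identifiers are not counted twice.
    const rest = raw.replace(/`[^`]+`/g, " ").replace(/\*\*[^*]+\*\*/g, " ");
    collect(IDENTIFIER, rest, "identifier", line, out);
  });
  return out;
}

// Sample input (made up for the example).
const sample = [
  "## Symbol Registry",
  "The **annotator** calls `lookupSymbol` and updates co_change_score via IndexWriter.",
].join("\n");
const mentions = extractStructuredMentions(sample);
```

On the sample input this yields one heading, one bold mention, one code span, and two bare identifiers, each tagged with its source so later stages can treat them differently.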

Git log analysis produces:

  • Co-change Jaccard — P(A∩B) / P(A∪B) for file pairs across commits
  • Recency weighting — recent co-changes score higher
  • Hotspot detection — files with high churn
  • Ownership — most frequent committer per file
  • Staleness — time since last modification relative to dependent code
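To make the first two signals concrete, here is a minimal sketch that treats A and B as the sets of commits touching each file, so the Jaccard score is the number of commits touching both divided by the number touching either. The recency variant is an assumption: it weights each commit by an exponential decay with a hypothetical 90-day half-life, which the specification does not state. The `Commit` shape and function names are likewise illustrative.

```typescript
interface Commit {
  timestamp: number; // seconds since the Unix epoch
  files: string[];   // paths touched by the commit
}

const SECONDS_PER_DAY = 86400;

function touches(commit: Commit, file: string): boolean {
  return commit.files.indexOf(file) !== -1;
}

// Plain Jaccard: commits touching both files / commits touching either file.
function coChangeJaccard(commits: Commit[], a: string, b: string): number {
  let both = 0;
  let either = 0;
  for (const c of commits) {
    const hasA = touches(c, a);
    const hasB = touches(c, b);
    if (hasA || hasB) either += 1;
    if (hasA && hasB) both += 1;
  }
  return either === 0 ? 0 : both / either;
}

// Recency-weighted variant (assumed scheme): each commit contributes
// 0.5^(age / halfLife) instead of 1, so recent co-changes count for more.
function recencyWeightedCoChange(
  commits: Commit[], a: string, b: string, now: number, halfLifeDays = 90,
): number {
  let both = 0;
  let either = 0;
  for (const c of commits) {
    const hasA = touches(c, a);
    const hasB = touches(c, b);
    if (!hasA && !hasB) continue;
    const ageDays = Math.max(0, (now - c.timestamp) / SECONDS_PER_DAY);
    const weight = Math.pow(0.5, ageDays / halfLifeDays);
    either += weight;
    if (hasA && hasB) both += weight;
  }
  return either === 0 ? 0 : both / either;
}

// Sample history (made up): a.ts and b.ts change together in 2 of 4 commits.
const now = 1700000000;
const history: Commit[] = [
  { timestamp: now, files: ["a.ts", "b.ts"] },
  { timestamp: now, files: ["a.ts"] },
  { timestamp: now, files: ["a.ts", "b.ts"] },
  { timestamp: now, files: ["b.ts", "c.ts"] },
];
const plainScore = coChangeJaccard(history, "a.ts", "b.ts");
```

When every commit has the same age the weighted score reduces to the plain Jaccard score; the two diverge only when co-changes and solo changes happened at different times.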

The annotator matches document mentions to code symbols:

  1. For each keyword mention in a document, search the symbol registry
  2. Exact matches get confidence 1.0
  3. Partial/fuzzy matches get reduced confidence
  4. In full mode, apply IDF penalties to body-text matches
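The first three steps could look roughly like the sketch below. Only the exact-match confidence of 1.0 comes from the specification; the 0.8 and 0.5 values, the minimum length for substring matches, and the names `matchMention` and `Annotation` are assumptions chosen for illustration. IDF penalties (step 4) are not applied here.

```typescript
interface Annotation {
  mention: string;
  symbol: string;
  confidence: number;
}

// Assumed confidence tiers; only EXACT = 1.0 is stated in the spec.
const EXACT = 1.0;
const CASE_INSENSITIVE = 0.8; // hypothetical
const SUBSTRING = 0.5;        // hypothetical
const MIN_SUBSTRING_LENGTH = 4; // hypothetical guard against tiny fragments

// Return the best-scoring symbol for a mention, or null if nothing matches.
function matchMention(mention: string, symbols: string[]): Annotation | null {
  const lower = mention.toLowerCase();
  let best: Annotation | null = null;
  for (const symbol of symbols) {
    const symbolLower = symbol.toLowerCase();
    let confidence = 0;
    if (symbol === mention) {
      confidence = EXACT;
    } else if (symbolLower === lower) {
      confidence = CASE_INSENSITIVE;
    } else if (lower.length >= MIN_SUBSTRING_LENGTH && symbolLower.indexOf(lower) !== -1) {
      confidence = SUBSTRING;
    }
    // Keep the first symbol that reaches the highest confidence seen so far.
    if (confidence > 0 && (best === null || confidence > best.confidence)) {
      best = { mention, symbol, confidence };
    }
  }
  return best;
}

// Sample registry names (made up for the example).
const symbolNames = ["IndexWriter", "annotate", "buildIndexWriter"];
const exact = matchMention("IndexWriter", symbolNames);
const fuzzy = matchMention("Writer", symbolNames);
```

A real matcher would likely use a stronger similarity measure than substring containment; the tiering is the point here, with exact hits always outranking partial ones.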

Terms that appear in more than 85% of documents are penalized:

  • Stopword baseline: ~50 known-noisy terms (e.g., “data”, “type”, “value”)
  • Ceiling: 0.15 maximum IDF score for baseline terms
  • Exemptions: Heading, bold, and code-span annotations are never penalized
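One way to combine these rules is sketched below. The 85% threshold, the 0.15 ceiling, the three example stopwords, and the exemption for structured sources come from the text above; everything else is an assumption, in particular the choice of a normalized IDF, ln(N/df) / ln(N), as the weight for over-common terms, and the name `idfWeight`.

```typescript
type AnnotationSource = "heading" | "bold" | "codeSpan" | "identifier" | "body";

const DOC_FREQUENCY_THRESHOLD = 0.85; // from the spec
const BASELINE_CEILING = 0.15;        // from the spec
// Only the three examples given in the spec; the real baseline has ~50 terms.
const STOPWORD_BASELINE = new Set<string>(["data", "type", "value"]);

// Normalized IDF in [0, 1]: 1 for a term found in a single document,
// 0 for a term found in every document. (Assumed formula.)
function normalizedIdf(docFreq: number, totalDocs: number): number {
  if (totalDocs <= 1) return 1;
  const df = Math.min(Math.max(docFreq, 1), totalDocs);
  return Math.log(totalDocs / df) / Math.log(totalDocs);
}

// Weight to multiply into an annotation's confidence.
function idfWeight(
  term: string, source: AnnotationSource, docFreq: number, totalDocs: number,
): number {
  // Structured annotations are exempt and never penalized.
  if (source !== "body") return 1;

  const idf = normalizedIdf(docFreq, totalDocs);
  // Baseline stopwords are capped no matter how rare they look.
  if (STOPWORD_BASELINE.has(term.toLowerCase())) {
    return Math.min(idf, BASELINE_CEILING);
  }
  // Terms above the document-frequency threshold take their (low) IDF.
  if (totalDocs > 0 && docFreq / totalDocs > DOC_FREQUENCY_THRESHOLD) {
    return idf;
  }
  return 1;
}

// Sample values (made up): a 10-document corpus.
const headingWeight = idfWeight("value", "heading", 10, 10);
const stopwordWeight = idfWeight("data", "body", 1, 10);
const commonWeight = idfWeight("writer", "body", 9, 10);
const rareWeight = idfWeight("annotator", "body", 2, 10);
```

In this sketch a baseline stopword in body text never exceeds 0.15 even when it occurs in a single document, while the same word in a heading keeps full weight.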

Measured on the IntentWeave monorepo (264 code files, 7 docs, 5316 symbols):

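| Metric | Structured | Full | Delta |
| --- | --- | --- | --- |
| Build time | 1.1 s | 2.8 s | +1.7 s |
| Annotations | 6,721 | 11,533 | +72% |
| Grounded (code-linked) | 2,548 (38%) | 7,360 (64%) | +189% |
| Co-occurrence edges | 1,099 | 2,631 | +139% |
| IDF terms tracked | | 2,843 | |
| Index file size | ~2 MB | ~4 MB | +100% |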

All phases are complete:

| Phase | Scope | Tests |
| --- | --- | --- |
| 1. Foundation | Writer, schema, retrieval queries | 22 |
| 2. Connections | Co-occurrence, co-change, gap detection | 20 |
| 3. CI Drift | Check command, severity levels, formats | 16 |
| 4. Incremental | Content-hash updates, corpus report | 12 |
| 5. Annotation Depth | Dictionary matching, IDF filtering, stopword baseline | 22 |
| 6. Code Quality | Clone detection (exact + structural), circular imports, unused exports, TODO tracking, hotspot prioritization | |
| 7. Doc Completeness | Module coverage, orphaned sections, per-doc completeness, cross-group drift, doc-group classification | |
| 8. Graph Topology | Community detection, hub analysis, surprising connections, rationale extraction, terminology inconsistency | |
| 9. Architecture | Layer inference, layer check, dependency depth, boundary violations, HTML architecture report, LLM layer naming | |
| 10. Advanced Analysis | Focused architecture, interface conformance, dead feature detection, API surface changelog, layer comparison | |
| Total | | 1248 |

| File | Purpose |
| --- | --- |
| packages/index/src/writer.ts | SQLite index builder |
| packages/index/src/annotator.ts | Mention→symbol matching + IDF |
| packages/index/src/idf.ts | IDF scorer + stopword baseline |
| packages/index/src/schema.ts | SQLite table definitions |
| packages/index/src/queries/retrieve.ts | Ranked retrieval |
| packages/index/src/queries/connections.ts | Connection discovery |
| packages/index/src/queries/check.ts | CI drift detection |
| packages/index/src/queries/report.ts | Health dashboard |
| packages/index/src/queries/clones.ts | Exact + structural clone detection |
| packages/index/src/queries/imports.ts | Circular imports + unused exports |
| packages/index/src/queries/hotspotPriority.ts | High-churn low-doc file ranking |
| packages/index/src/queries/todos.ts | TODO/FIXME/HACK/XXX inventory |
| packages/index/src/queries/moduleCoverage.ts | Documentation coverage per directory |
| packages/index/src/queries/orphanedSections.ts | Doc sections with ungrounded mentions |
| packages/index/src/queries/docCompleteness.ts | Per-doc completeness vs. exports |
| packages/index/src/queries/crossGroupDrift.ts | Cross-group coverage conflicts |
| packages/index/src/queries/layersCheck.ts | Validate imports against layers |
| packages/index/src/queries/layerNaming.ts | LLM-generated layer & directory names |
| packages/index/src/queries/archReport.ts | Architecture report data collector |
| packages/index/src/export/htmlReport.ts | Standalone HTML architecture renderer |
| packages/index/src/export/focusReport.ts | Focused Graphviz SVG report |
| packages/index/src/queries/focus.ts | Focused architecture query |
| packages/index/src/queries/interfaceConformance.ts | Interface conformance drift |
| packages/index/src/queries/deadFeatures.ts | Dead feature detection (3 signals) |
| packages/index/src/queries/layersCompare.ts | As-is vs. as-should layer comparison |
| packages/cli/src/api-surface/apiSurface.ts | API surface changelog (git + AST) |
| packages/analyzer/src/kwg/heuristicExtractor.ts | Keyword extraction |
| packages/analyzer/src/kwg/kwxStage.ts | KWX stage options |
| packages/cli/src/commands/indexBuild.ts | Build orchestrator |