SQLite-first knowledge compiler: immutable raw sources → canonical SQLite store → grounded LLM answers.
A riff on Andrej Karpathy's LLM Wiki pattern: instead of an agent-maintained markdown wiki alone, KC keeps a canonical SQLite store as source of truth and projects lean markdown views from it (current/ for browsing, audit/ for full ingest detail).
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env
# Set OPENAI_API_KEY or ANTHROPIC_API_KEY in .env
kc migrate
kc ingest --dir data/raw/
kc project
kc claims list --conflicts
kc query "When did Jane Doe stop being CEO?"

The demo corpus in data/raw/ includes three Acme Corp text files and acme-quarterly-revenue.csv.
Apply pending SQLite migrations.
kc migrate

Database is up to date.
Ingest one file or scan a directory. Text sources go through LLM extraction; CSV files become materialized tables with column semantics.
kc ingest --dir data/raw/ # batch: only changed files
kc ingest data/raw/acme-leadership.txt # single text source
kc ingest data/raw/acme-quarterly-revenue.csv
kc ingest data/raw/acme-leadership.txt --force # re-extract despite unchanged hash

Unchanged files are a no-op (content-hash dedup):
$ kc ingest --dir data/raw/
acme-annual-report-2025.txt: noop — No changes for acme-annual-report-2025.txt
acme-corp-history.txt: noop — No changes for acme-corp-history.txt
acme-leadership.txt: noop — No changes for acme-leadership.txt
acme-quarterly-revenue.csv: noop — No changes for acme-quarterly-revenue.csv
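
The no-op behaviour above can be illustrated with a minimal sketch of content-hash dedup. This is not KC's actual code: the `content_hash` and `ingest_decision` helpers, the choice of SHA-256, and the `known_hashes` dictionary (standing in for whatever the canonical store records per source) are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def content_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's bytes (hash choice is assumed)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def ingest_decision(path: Path, known_hashes: dict[str, str], force: bool = False) -> str:
    """Return 'noop' when the stored hash matches the file, otherwise 'ingest'.

    known_hashes is a hypothetical stand-in for the per-source hashes
    kept in the canonical store.
    """
    digest = content_hash(path)
    if not force and known_hashes.get(path.name) == digest:
        return "noop"
    # Record the new hash so the next run sees the file as unchanged.
    known_hashes[path.name] = digest
    return "ingest"
```

Under these assumptions, a second call on an unchanged file returns `"noop"`, while `force=True` skips the comparison, which mirrors the `--force` flag.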
First CSV ingest:
$ kc ingest data/raw/acme-quarterly-revenue.csv
acme-quarterly-revenue.csv: success — Ingested acme-quarterly-revenue.csv: 24 rows, 5 columns
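
To make "CSV files become materialized tables" concrete, here is a rough sketch of loading a CSV into a SQLite table with inferred column types. The `infer_type` and `materialize_csv` helpers and the INTEGER-or-TEXT inference rule are assumptions; KC's adapter may infer types and semantic roles differently.

```python
import csv
import io
import sqlite3


def infer_type(values: list[str]) -> str:
    """Pick INTEGER when every non-empty value parses as an int, else TEXT."""
    try:
        for value in values:
            if value != "":
                int(value)
    except ValueError:
        return "TEXT"
    return "INTEGER"


def materialize_csv(conn: sqlite3.Connection, table: str, csv_text: str) -> tuple[int, int]:
    """Create `table` from CSV text and return (row_count, column_count)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    types = [infer_type([row[i] for row in data]) for i in range(len(header))]
    columns = ", ".join(f'"{name}" {typ}' for name, typ in zip(header, types))
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    # SQLite's INTEGER affinity converts numeric text to integers on insert.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
    return len(data), len(header)
```

Called with the header `company,region,quarter,revenue_usd,headcount` and a single data row, this returns `(1, 5)` and stores `revenue_usd` as an integer.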
Regenerate markdown projections from the canonical store.
kc project # current/ + audit/
kc project --audit-only # audit/ only

Projections written to projections/current (audit view at projections/audit)
Browse claims from SQLite, grouped by conflict slot (predicate_norm).
kc claims list
kc claims list --entity "Jane Doe"
kc claims list --conflicts

$ kc claims list --conflicts
## completed_milestone
- [18] In March 2024, Orion completed its first successful ingest pipeline demo (disputed, conf=0.95) [acme-corp-history.txt]
- [37] Project Orion completed its first production ingest pipeline in June 2024 (disputed, conf=0.9) [acme-annual-report-2025.txt]
## founded_in_year
- [19] Acme Corp was founded in 2010 by Jane Doe in San Francisco (disputed, conf=0.95) [acme-leadership.txt]
- [31] Acme Corp was founded in 2011 by Jane Doe and John Smith in San Francisco (disputed, conf=0.95) [acme-annual-report-2025.txt]
## served_as_role:CEO
- [22] Jane Doe served as CEO of Acme Corp from 2010 until 2022 (disputed, conf=0.95) [acme-leadership.txt]
- [35] Jane Doe served as CEO from 2011 until December 2023 (disputed, conf=0.95) [acme-annual-report-2025.txt]
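
The grouping in this listing can be sketched as follows. The dictionary shape of a claim and the rule "a slot is a conflict when it holds more than one disputed claim" are assumptions for illustration; KC's real conflict detection may use other criteria.

```python
from collections import defaultdict


def group_by_slot(claims: list[dict]) -> dict[str, list[dict]]:
    """Group claims by their conflict slot (the predicate_norm field)."""
    slots: dict[str, list[dict]] = defaultdict(list)
    for claim in claims:
        slots[claim["predicate_norm"]].append(claim)
    return dict(slots)


def conflict_slots(claims: list[dict]) -> dict[str, list[dict]]:
    """Keep only slots holding more than one disputed claim (assumed rule)."""
    return {
        slot: items
        for slot, items in group_by_slot(claims).items()
        if sum(item["status"] == "disputed" for item in items) > 1
    }
```

With claims 22 and 35 both disputed under `served_as_role:CEO`, that slot would be reported; a slot with a single claim would not.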
Resolve disputes in the canonical store (with ingest audit trail).
kc claims accept 22 # accept; supersede siblings by default
kc claims accept 22 --no-supersede-siblings
kc claims supersede 35 --reason "Outdated timeline"

$ kc claims accept 22
Claim 22: disputed -> accepted
Conflict group: edd02e0397bd2ed9...
Superseded siblings: 35
After manual edits, run kc project to refresh projections.
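
As a sketch of what "accept; supersede siblings by default" could mean at the storage level, the function below updates a hypothetical `claims(id, conflict_group, status)` table. The table layout and the `accept_claim` helper are assumptions, and the audit-trail write that KC performs is omitted.

```python
import sqlite3


def accept_claim(conn: sqlite3.Connection, claim_id: int,
                 supersede_siblings: bool = True) -> list[int]:
    """Mark one claim accepted and return the IDs of superseded siblings."""
    row = conn.execute(
        "SELECT conflict_group FROM claims WHERE id = ?", (claim_id,)
    ).fetchone()
    if row is None:
        raise ValueError(f"no such claim: {claim_id}")
    group = row[0]
    siblings: list[int] = []
    with conn:  # commit both updates together, or roll back on error
        conn.execute("UPDATE claims SET status = 'accepted' WHERE id = ?", (claim_id,))
        if supersede_siblings and group is not None:
            siblings = [r[0] for r in conn.execute(
                "SELECT id FROM claims WHERE conflict_group = ? AND id != ? "
                "AND status = 'disputed' ORDER BY id",
                (group, claim_id),
            )]
            conn.executemany(
                "UPDATE claims SET status = 'superseded' WHERE id = ?",
                [(sibling,) for sibling in siblings],
            )
    return siblings
```

Passing `supersede_siblings=False` corresponds to `--no-supersede-siblings`: the chosen claim is accepted and the other claims in its conflict group are left as they are.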
Plan and execute claim-logic steps (entities, claims, conflicts, assessment). Prints the trace JSON; does not synthesize prose unless --explain is passed.
kc reason "When did Jane Doe stop being CEO?"
kc reason "When did Jane Doe stop being CEO?" --explain

$ kc reason "When did Jane Doe stop being CEO?"
Plan:
0. resolve_entities({'query': 'Jane Doe', 'limit': 10})
1. get_entity_claims({'entity_id': '$0[0].entity_id', 'role': 'any', 'status': 'any', 'limit': 50})
2. find_conflicts({'entity_id': '$0[0].entity_id'})
3. derive_assessment({'target': {'entity_id': '$0[0].entity_id'}, 'question_type': 'role_timeline'})
The trace includes a deterministic verdict (disputed, supported, etc.). The LLM plans steps and narrates; it does not decide truth.
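
The plan arguments reference earlier steps with strings such as `$0[0].entity_id`. A plausible reading, assumed here rather than confirmed by the source, is "result of step 0, first element, field entity_id". The `resolve` helper below is a hypothetical sketch of substituting such references before a step runs.

```python
import re
from typing import Any

# "$<step>" followed by any number of "[index]" or ".field" accessors.
REF = re.compile(r"^\$(\d+)((?:\[\d+\]|\.\w+)*)$")
ACCESSOR = re.compile(r"\[(\d+)\]|\.(\w+)")


def resolve(value: Any, results: list[Any]) -> Any:
    """Replace step references in `value` with data from earlier step results."""
    if isinstance(value, dict):
        return {key: resolve(item, results) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve(item, results) for item in value]
    if isinstance(value, str):
        match = REF.match(value)
        if match:
            current = results[int(match.group(1))]
            for index, field in ACCESSOR.findall(match.group(2)):
                # Exactly one of index/field is non-empty per accessor.
                current = current[int(index)] if index else current[field]
            return current
    return value
```

Because the substitution is plain data lookup, the planner (the LLM) only names which earlier result to use, and the executor stays deterministic.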
Route by intent — text, tabular, or hybrid — then plan, execute, and synthesize a grounded answer with citations.
| Intent | Example question | Executor |
|---|---|---|
| text | When did Jane Doe stop being CEO? | claim-logic (+ FTS fallback) |
| tabular | Compare Acme and Globex revenue | tabular (aggregate on CSV) |
| hybrid | Compare revenue and explain the CEO dispute | both |
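
The routing in the table above can be approximated with a toy keyword heuristic. This is only a sketch: the hint sets and the `route_intent` function are invented for illustration, and KC's actual router may well use an LLM classifier or different rules.

```python
import re

# Hypothetical keyword hints, chosen to fit the three example questions.
TABULAR_HINTS = {"revenue", "headcount", "total", "sum", "average"}
TEXT_HINTS = {"when", "who", "ceo", "founded", "dispute", "explain"}


def route_intent(question: str) -> str:
    """Return 'text', 'tabular', or 'hybrid' for a question."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    wants_table = bool(words & TABULAR_HINTS)
    wants_text = bool(words & TEXT_HINTS)
    if wants_table and wants_text:
        return "hybrid"
    if wants_table:
        return "tabular"
    # Default to the text path when nothing points at a dataset.
    return "text"
```

Defaulting to the text path when no hint matches is a design choice of this sketch, so that unmatched questions still reach the claim-logic executor.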
kc query "When did Jane Doe stop being CEO?"
kc query "Compare Acme and Globex revenue"
kc query "Compare revenue and explain the CEO dispute"

Tabular: totals come from dataset execution (sum_revenue_usd grouped by company).
$ kc query "Compare Acme and Globex revenue"
Based on aggregated data from the quarterly revenue dataset [source:acme-quarterly-revenue.csv]:
| Company | Total Revenue (USD) |
|-----------|---------------------|
| Acme Corp | $11,990,000 |
| Globex | $20,010,000 |
Globex's total revenue significantly exceeds Acme Corp's by approximately $8,020,000 (~67% higher).
Note: Answer synthesized from tabular dataset execution.
Citations:
- [acme-quarterly-revenue.csv]: company=Acme Corp, sum_revenue_usd=11990000
Text — claim trace + span citations (wording varies with dispute state):
Note: Some retrieved claims are disputed.
Citations:
- [acme-leadership.txt] span 8: Jane Doe served as CEO of Acme Corp from 2010 until 2022...
Hybrid — merges tabular totals with claim/dispute narrative in one answer.
Note: Answer synthesized from tabular data and claim reasoning.
Inspect ingested CSV datasets and column semantics (semantic_role, entity link).
kc datasets list
kc datasets describe acme-quarterly-revenue.csv
kc datasets describe 1 # by numeric dataset ID

$ kc datasets list
- [1] acme-quarterly-revenue.csv (24 rows, 5 columns) -> d_4_acme_quarterly_revenue_csv
$ kc datasets describe acme-quarterly-revenue.csv
Dataset 1: acme-quarterly-revenue.csv
Table: d_4_acme_quarterly_revenue_csv (24 rows)
Columns:
- company (TEXT) [entity link]
- region (TEXT)
- quarter (TEXT)
- revenue_usd (INTEGER)
- headcount (INTEGER)
Sample rows:
{'company': 'Acme Corp', 'region': 'North America', 'quarter': '2024-Q1', 'revenue_usd': 1250000, 'headcount': 420}
...
Plan and execute tabular query steps (filter, aggregate, entity lookup). Prints plan + JSON trace — useful for debugging without LLM narration.
kc table reason "What is total revenue by company?"
kc table reason "Compare Acme and Globex revenue"

$ kc table reason "Compare Acme and Globex revenue"
Plan:
0. describe_dataset({'dataset_id': 1})
1. aggregate({'dataset_id': 1, 'group_by': ['company'], 'metrics': [{'column': 'revenue_usd', 'op': 'sum'}]})
"result": [
{"company": "Acme Corp", "sum_revenue_usd": 11990000},
{"company": "Globex", "sum_revenue_usd": 20010000}
]
Tabular planning uses structured LLM output (LLMTabularPlan) with schema validation and a deterministic heuristic fallback. Column names come from the dataset catalog — not hardcoded demo fields.
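
An `aggregate` step like the one in the plan above could be executed roughly as follows. The `run_aggregate` helper, the set of allowed operations, and the check of column names against a catalog list are assumptions for illustration, not KC's implementation; the `<op>_<column>` result naming follows the `sum_revenue_usd` key shown in the trace.

```python
import sqlite3

# Whitelisted aggregate operations (an assumed set).
ALLOWED_OPS = {"sum": "SUM", "avg": "AVG", "min": "MIN", "max": "MAX", "count": "COUNT"}


def run_aggregate(conn: sqlite3.Connection, table: str, catalog_columns: list[str],
                  group_by: list[str], metrics: list[dict]) -> list[dict]:
    """Validate a plan step against the catalog, then run it as one SQL query."""
    for name in group_by + [metric["column"] for metric in metrics]:
        if name not in catalog_columns:
            raise ValueError(f"unknown column: {name}")
    select = [f'"{name}"' for name in group_by]
    for metric in metrics:
        op, column = metric["op"], metric["column"]
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported op: {op}")
        select.append(f'{ALLOWED_OPS[op]}("{column}") AS "{op}_{column}"')
    select_sql = ", ".join(select)
    sql = f'SELECT {select_sql} FROM "{table}"'
    if group_by:
        keys = ", ".join(f'"{name}"' for name in group_by)
        sql += f" GROUP BY {keys} ORDER BY {keys}"
    cursor = conn.execute(sql)
    names = [description[0] for description in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]
```

Rejecting any column that is not in the catalog before building SQL is what keeps an LLM-proposed plan from touching fields the dataset does not have.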
Merge duplicate entities by ID or name/alias.
kc merge-entities "Stanford AI Lab" "Stanford" --dry-run
kc merge-entities "Stanford AI Lab" "Stanford"

$ kc merge-entities "Stanford AI Lab" "Stanford" --dry-run
Would merge [4] Stanford AI Lab
into [13] Stanford
Claims to repoint: 0
Aliases: Stanford AI Lab, Stanford's AI lab
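
The dry-run summary above suggests a merge that repoints claims and carries aliases over to the surviving entity. A minimal sketch, assuming hypothetical `entities(id, name)`, `aliases(entity_id, alias)`, and `claims(id, entity_id)` tables (KC's real schema is not shown in this README):

```python
import sqlite3


def merge_entities(conn: sqlite3.Connection, from_id: int, to_id: int,
                   dry_run: bool = False) -> dict:
    """Merge entity `from_id` into `to_id`; with dry_run, only report the plan."""
    claim_count = conn.execute(
        "SELECT COUNT(*) FROM claims WHERE entity_id = ?", (from_id,)
    ).fetchone()[0]
    from_name = conn.execute(
        "SELECT name FROM entities WHERE id = ?", (from_id,)
    ).fetchone()[0]
    aliases = [from_name] + [row[0] for row in conn.execute(
        "SELECT alias FROM aliases WHERE entity_id = ? ORDER BY alias", (from_id,)
    )]
    summary = {"claims_to_repoint": claim_count, "aliases": aliases}
    if dry_run:
        return summary
    with conn:  # apply all writes in one transaction
        conn.execute("UPDATE claims SET entity_id = ? WHERE entity_id = ?", (to_id, from_id))
        conn.execute("UPDATE aliases SET entity_id = ? WHERE entity_id = ?", (to_id, from_id))
        # The merged entity's own name survives as an alias of the target.
        conn.execute("INSERT INTO aliases (entity_id, alias) VALUES (?, ?)", (to_id, from_name))
        conn.execute("DELETE FROM entities WHERE id = ?", (from_id,))
    return summary
```

Computing the summary before any write is what lets the same function serve both `--dry-run` and the real merge.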
Lean browse view (current/): claims grouped by conflict slot, with source links. The (conflict) marker appears only on real disputes:
### served_as_role:CEO (conflict)
- Jane Doe served as CEO of Acme Corp from 2010 until 2022 (disputed, conf=0.95) [[acme-leadership.txt](../sources/acme-leadership.txt.md)]
- Jane Doe served as CEO from 2011 until December 2023 (disputed, conf=0.95) [[acme-annual-report-2025.txt](../sources/acme-annual-report-2025.txt.md)]

Verbose projection (audit/): relations, retired claims, and span listings, for debugging ingest and reconcile.
data/raw/ ──ingest──▶ data/knowledge.db ──project──▶ projections/
│
├── text path: LLM extract → claims, entities, spans
├── csv path: column roles → materialized tables + row evidence
└── query:
intent (text | tabular | hybrid)
→ claim-logic executor (reasoning/)
→ tabular executor (tabular/)
→ LLM narration (deterministic trace in, prose out)
| Layer | Role |
|---|---|
| Source adapters (ingest/adapters/) | Text → LLM claims; CSV → SQLite tables + semantic_role metadata |
| Claim-logic engine (reasoning/) | Typed LLM plans → 12 deterministic API functions → assessment verdict |
| Tabular engine (tabular/) | Catalog-driven plans → filter / aggregate / entity lookup on dataset tables |
| Query synthesis (query/) | Intent routing, hybrid merge, FTS fallback |
| Manual ops | claims accept/supersede, merge-entities (audited writes to canonical store) |
| Command | Purpose |
|---|---|
| `kc migrate` | Apply pending migrations |
| `kc ingest <path>` | Ingest one raw source |
| `kc ingest --dir <dir>` | Batch ingest changed sources |
| `kc ingest --force` | Re-extract despite unchanged hash |
| `kc project` | Regenerate markdown projections |
| `kc project --audit-only` | Regenerate audit view only |
| `kc claims list [--entity] [--conflicts]` | List claims |
| `kc claims accept <id>` | Accept claim; supersede siblings by default |
| `kc claims supersede <id>` | Mark claim superseded |
| `kc reason "<q>" [--explain]` | Claim-logic trace (no narration by default) |
| `kc query "<q>"` | Route + execute + grounded answer |
| `kc datasets list` | List ingested CSV datasets |
| `kc datasets describe <key\|id>` | Schema and sample rows |
| `kc table reason "<q>"` | Tabular plan + execution trace |
| `kc merge-entities <from> <to> [--dry-run]` | Merge entities |