Editor intelligence: reference data loader and tokenizer dynamic-keyword API (Phase 2a) #25

Split from #22. Phase 2a of the textarea editor enhancement track. The autocomplete dropdown (#26) and hover docs (#27) build on this; do not start them before this is merged.

## Goal

Load ClickHouse reference data once per connection, expose it on app state, and teach the tokenizer to accept dynamic keyword/function sets — all without touching autocomplete UI.

## New module: `src/net/reference-data.js`

```js
export async function loadReferenceData(ctx)
```

Returns:

```js
{
  keywords: Set,         // upper-cased canonical names; SQL_KEYWORDS fallback
  funcs: Set,            // canonical names from system.functions, e.g. toDateTime, BLAKE3
                         // SQL_FUNCS fallback if system.functions absent
  funcLookup: Set,       // lowercase normalized set for tokenizer matching and filtering
                         // fallback: new Set([...SQL_FUNCS].map(f => f.toLowerCase()))
  funcDefs: Map | null,  // canonical name → {syntax: string, arguments: string}; null if unavailable
  completions: Array | null,  // [{word, context, belongs}]; null if system.completions absent
}
```

**Capability detection — do not version-string-check.** Attempt each query independently; catch any error and fall back:

- `SELECT keyword FROM system.keywords FORMAT JSON` → catch → use imported `SQL_KEYWORDS`. On success, build `keywords` from the `keyword` column, upper-cased.
- `SELECT name, syntax, arguments FROM system.functions FORMAT JSON` → catch → use imported `SQL_FUNCS` for `funcs`, fallback `funcLookup`, `funcDefs: null`.
  When the query succeeds, build all three from the rows:
  - `funcs` — `new Set(rows.map(r => r.name))` preserving canonical casing (`toDateTime`, `BLAKE3`, `sipHash128Reference`, …).
  - `funcLookup` — `new Set(rows.map(r => r.name.toLowerCase()))` for case-insensitive tokenizer matching and autocomplete filtering.
  - `funcDefs` — `new Map(rows.map(r => [r.name, { syntax: r.syntax, arguments: r.arguments }]))` keyed by canonical name.
- `SELECT word, context, belongs FROM system.completions LIMIT 2000 FORMAT JSON` → catch → `completions: null`.

Uses the existing `queryJson(ctx, sql)`. No `signal` needed — fire-and-forget on connect.
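
A minimal sketch of the per-query fallback described above, not the final implementation. It assumes `queryJson` resolves to an array of row objects and rejects when a table is missing; the `queryJson` stub, the `ctx.tables` shape, the `tryQuery` helper, and the tiny `SQL_KEYWORDS`/`SQL_FUNCS` sets are hypothetical stand-ins so the snippet runs on its own. Running the three queries in parallel is a choice made here, not something the spec requires.

```js
// Stand-ins for the real imports; contents are illustrative only.
const SQL_KEYWORDS = new Set(['SELECT', 'FROM', 'WHERE']);
const SQL_FUNCS = new Set(['count', 'toDateTime']);

// Hypothetical stub for the existing queryJson(ctx, sql). Assumed to resolve
// to an array of row objects and to reject when the table does not exist.
async function queryJson(ctx, sql) {
  const table = /FROM (system\.\w+)/.exec(sql)[1];
  if (!(table in ctx.tables)) throw new Error(`Unknown table ${table}`);
  return ctx.tables[table];
}

// Run one query; any error resolves to null so the caller can fall back
// without affecting the other queries.
const tryQuery = (ctx, sql) => queryJson(ctx, sql).catch(() => null);

async function loadReferenceData(ctx) {
  const [kwRows, fnRows, compRows] = await Promise.all([
    tryQuery(ctx, 'SELECT keyword FROM system.keywords FORMAT JSON'),
    tryQuery(ctx, 'SELECT name, syntax, arguments FROM system.functions FORMAT JSON'),
    tryQuery(ctx, 'SELECT word, context, belongs FROM system.completions LIMIT 2000 FORMAT JSON'),
  ]);

  // Canonical casing is preserved in funcs; only funcLookup is lower-cased.
  const funcs = fnRows ? new Set(fnRows.map(r => r.name)) : SQL_FUNCS;

  return {
    keywords: kwRows
      ? new Set(kwRows.map(r => r.keyword.toUpperCase()))
      : SQL_KEYWORDS,
    funcs,
    funcLookup: new Set([...funcs].map(f => f.toLowerCase())),
    funcDefs: fnRows
      ? new Map(fnRows.map(r => [r.name, { syntax: r.syntax, arguments: r.arguments }]))
      : null,
    completions: compRows
      ? compRows.map(({ word, context, belongs }) => ({ word, context, belongs }))
      : null,
  };
}
```

Because every query carries its own `.catch`, `Promise.all` never rejects here, so a server that lacks only `system.completions` still yields server-loaded keywords and functions.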

**`funcDefs` is loaded here** (alongside `funcs`) so hover docs (#27) can use `system.functions.syntax` and `arguments` without an additional network round-trip.

## Tokenizer API change: `src/core/sql-highlight.js`

Change `tokenize` signature:

```js
// Before
export function tokenize(sql)

// After — backward-compatible; all existing callers work unchanged
export function tokenize(sql, { keywords, funcs, funcLookup } = {})
```

When options are absent, fall back to module-level sets. The word-classification block uses two sets: `funcs` (canonical, for exact match) and `funcLookup` (lowercase, for case-insensitive match):

```js
const funcsSet      = funcs      ?? SQL_FUNCS;
const funcLookupSet = funcLookup ?? new Set([...SQL_FUNCS].map(f => f.toLowerCase()));
// ...
else if (funcsSet.has(word) || funcLookupSet.has(word.toLowerCase())) type = 'func';
```

This correctly highlights `toDateTime`, `BLAKE3`, and `sipHash128Reference` as typed by the user (any case), while keeping canonical names intact for autocomplete and `funcDefs` lookups. **Do not lower-case the canonical `funcs` set** — that would cause autocomplete to insert wrong-cased names.
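
For illustration, the classification rule can be isolated into a single function. This is a sketch under assumptions: `classifyWord` is a hypothetical helper (the real logic lives inside `tokenize`), the `'keyword'` and `'ident'` type names and the upper-cased keyword comparison are guesses, and the module-level sets are small stand-ins. It also hoists the fallback lookup set into a module-level constant so nothing is allocated per call.

```js
// Stand-in module-level sets; the real ones live in sql-highlight.js.
const SQL_KEYWORDS = new Set(['SELECT', 'FROM']);
const SQL_FUNCS = new Set(['count', 'toDateTime']);
// Built once at module load rather than on every call.
const SQL_FUNC_LOOKUP = new Set([...SQL_FUNCS].map(f => f.toLowerCase()));

// Hypothetical helper: classify one word with optional dynamic sets.
function classifyWord(word, { keywords, funcs, funcLookup } = {}) {
  const keywordSet = keywords ?? SQL_KEYWORDS;
  const funcsSet = funcs ?? SQL_FUNCS;
  const funcLookupSet = funcLookup ?? SQL_FUNC_LOOKUP;

  // Assumption: keywords are stored upper-cased, so compare in upper case.
  if (keywordSet.has(word.toUpperCase())) return 'keyword';
  // Exact canonical match first, then case-insensitive match.
  if (funcsSet.has(word) || funcLookupSet.has(word.toLowerCase())) return 'func';
  return 'ident';
}

// Default sets: any casing of a known function is classified as 'func'.
classifyWord('Count');      // 'func'
classifyWord('todatetime'); // 'func'

// Provided sets override the defaults.
const opts = { funcs: new Set(['BLAKE3']), funcLookup: new Set(['blake3']) };
classifyWord('blake3', opts); // 'func'
classifyWord('blake3');       // 'ident' (not in the stand-in defaults)
```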

`renderHighlightInto(preEl, sql)` in `editor.js` gains an optional third arg `opts` and passes it to `tokenize(sql, opts)`.

## State

Add `editorReference: null` to the object returned by `createState()` in `src/state.js`.

## App lifecycle call site (`src/ui/app.js`)

After `loadSchema` resolves successfully, call:

```js
loadReferenceData(ctx).then(ref => {
  app.state.editorReference = ref;
  app.dom.editorSync?.();   // repaint active editor with new keyword/func sets
}).catch(() => {});
```

Pass `{ keywords: ref.keywords, funcs: ref.funcs, funcLookup: ref.funcLookup }` into `renderHighlightInto` via `editorSync` so the editor uses server-loaded sets.
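
One way `editorSync` could derive those options is sketched below. `makeEditorSync`, `highlightOptsFrom`, and the recording `renderHighlightInto` stub are hypothetical names for illustration; the real wiring in `app.js` and `editor.js` may differ. The point it shows: while `editorReference` is still `null`, pass `undefined` rather than `null`, because a default parameter such as `= {}` only applies to `undefined` and destructuring `null` would throw.

```js
// Hypothetical stand-in for editor.js: records what it was called with.
function renderHighlightInto(preEl, sql, opts) {
  preEl.textContent = sql;
  preEl.lastOpts = opts;
}

// Pick only the sets the tokenizer needs. Returns undefined until reference
// data has loaded so tokenize() falls back to its module-level defaults.
function highlightOptsFrom(ref) {
  if (!ref) return undefined;
  const { keywords, funcs, funcLookup } = ref;
  return { keywords, funcs, funcLookup };
}

// Build a repaint callback that reads the current state on every call,
// so a later assignment to app.state.editorReference is picked up.
function makeEditorSync(app, preEl, getSql) {
  return () =>
    renderHighlightInto(preEl, getSql(), highlightOptsFrom(app.state.editorReference));
}
```

With this shape, the `.then` handler above only needs to assign `app.state.editorReference` and call `editorSync`; the next repaint uses the server-loaded sets.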

One implementation note, not a blocker: avoid rebuilding the fallback `funcLookup` set on every `tokenize()` call. Make it a module-level constant, for example `SQL_FUNC_LOOKUP`, so the keystroke path stays allocation-light.

## Acceptance criteria

- [ ] `loadReferenceData` returns `SQL_KEYWORDS`/`SQL_FUNCS`/fallback `funcLookup` sets when the corresponding system tables are absent.
- [ ] `funcs` preserves canonical casing from `system.functions` (e.g. `toDateTime`, `BLAKE3`, `sipHash128Reference`); names are **not** lower-cased.
- [ ] `funcLookup` contains the lowercase forms of all canonical names and is used only for case-insensitive matching — never inserted into the editor.
- [ ] `funcDefs` is `null` when `system.functions` is absent; a `Map` keyed by canonical name with `syntax` and `arguments` fields when present.
- [ ] `completions` entries have `word`, `context`, and `belongs` fields (not `kind`/`syntax`).
- [ ] Each of the three queries fails independently without affecting the others.
- [ ] `app.state.editorReference` is `null` in `createState()` and populated after a successful connection.
- [ ] `tokenize(sql)` with no second argument behaves identically to before — existing tests pass unchanged.
- [ ] `tokenize(sql, { keywords, funcs, funcLookup })` uses the provided sets; `funcs` and `funcLookup` are both consulted for classification.
- [ ] A word typed as `count`, `Count`, or `COUNT` is classified as `func` when `funcLookup` contains `count`.
- [ ] The editor repaints with server-loaded keyword/function sets after reference data loads.
- [ ] No SQL executes on ordinary typing.
- [ ] No `localStorage` cache introduced.
- [ ] `src/net/reference-data.js` at 100% coverage (success path, each table absent individually, all tables absent).
- [ ] `src/core/sql-highlight.js` maintains 100% coverage (default-arg and provided-arg paths both exercised).
- [ ] `src/state.js` maintains coverage (new `editorReference` field covered).

## Non-goals

- Autocomplete dropdown UI (#26).
- Hover docs UI (#27).
- `system.table_functions`, `system.data_type_families`, `system.formats`, `system.settings`, `system.documentation` — extend later.
- `localStorage` caching.
- Live validation.

