Skip to content

Editor intelligence: reference data loader and tokenizer dynamic-keyword API (Phase 2a) #25

Description

@BorisTyshkevich

Split from #22. Phase 2a of the textarea editor enhancement track. The autocomplete dropdown (#26) and hover docs (#27) build on this; do not start them before this is merged.

Goal

Load ClickHouse reference data once per connection, expose it on app state, and teach the tokenizer to accept dynamic keyword/function sets — all without touching autocomplete UI.

New module: src/net/reference-data.js

export async function loadReferenceData(ctx)

Returns:

{
  keywords: Set,         // upper-cased canonical names; SQL_KEYWORDS fallback
  funcs: Set,            // canonical names from system.functions, e.g. toDateTime, BLAKE3
                         // SQL_FUNCS fallback if system.functions absent
  funcLookup: Set,       // lowercase normalized set for tokenizer matching and filtering
                         // fallback: new Set([...SQL_FUNCS].map(f => f.toLowerCase()))
  funcDefs: Map | null,  // canonical name → {syntax: string, arguments: string}; null if unavailable
  completions: Array | null,  // [{word, context, belongs}]; null if system.completions absent
}

Capability detection — do not version-string-check. Attempt each query independently; catch any error and fall back:

  • SELECT keyword FROM system.keywords FORMAT JSON → catch → use imported SQL_KEYWORDS.
  • SELECT name, syntax, arguments FROM system.functions FORMAT JSON → catch → use imported SQL_FUNCS for funcs, fallback funcLookup, funcDefs: null.
    When the query succeeds, build all three from the rows:
    • funcsnew Set(rows.map(r => r.name)) preserving canonical casing (toDateTime, BLAKE3, sipHash128Reference, …).
    • funcLookupnew Set(rows.map(r => r.name.toLowerCase())) for case-insensitive tokenizer matching and autocomplete filtering.
    • funcDefsnew Map(rows.map(r => [r.name, { syntax: r.syntax, arguments: r.arguments }])) keyed by canonical name.
  • SELECT word, context, belongs FROM system.completions LIMIT 2000 FORMAT JSON → catch → completions: null.

Uses the existing queryJson(ctx, sql). No signal needed — fire-and-forget on connect.

funcDefs is loaded here (alongside funcs) so hover docs (#27) can use system.functions.syntax and arguments without an additional network round-trip.

Tokenizer API change: src/core/sql-highlight.js

Change tokenize signature:

// Before
export function tokenize(sql)

// After — backward-compatible; all existing callers work unchanged
export function tokenize(sql, { keywords, funcs, funcLookup } = {})

When options are absent, fall back to module-level sets. The word-classification block uses two sets: funcs (canonical, for exact match) and funcLookup (lowercase, for case-insensitive match):

const funcsSet    = funcs      ?? SQL_FUNCS;
const funcLookupSet = funcLookup ?? new Set([...SQL_FUNCS].map(f => f.toLowerCase()));
// ...
else if (funcsSet.has(word) || funcLookupSet.has(word.toLowerCase())) type = 'func';

This correctly highlights toDateTime, BLAKE3, and sipHash128Reference as typed by the user (any case), while keeping canonical names intact for autocomplete and funcDefs lookups. Do not lower-case the canonical funcs set — that would cause autocomplete to insert wrong-cased names.

renderHighlightInto(preEl, sql) in editor.js gains an optional third arg opts and passes it to tokenize(sql, opts).

State

Add editorReference: null to the object returned by createState() in src/state.js.

App lifecycle call site (src/ui/app.js)

After loadSchema resolves successfully, call:

loadReferenceData(ctx).then(ref => {
  app.state.editorReference = ref;
  app.dom.editorSync?.();   // repaint active editor with new keyword/func sets
}).catch(() => {});

Pass { keywords: ref.keywords, funcs: ref.funcs, funcLookup: ref.funcLookup } into renderHighlightInto via editorSync so the editor uses server-loaded sets.

One implementation note, not a blocker: avoid rebuilding the fallback funcLookup set on every tokenize() call. Make it a module-level constant, for example SQL_FUNC_LOOKUP, so the keystroke path stays allocation-light.

Acceptance criteria

  • loadReferenceData returns SQL_KEYWORDS/SQL_FUNCS/fallback funcLookup sets when the corresponding system tables are absent.
  • funcs preserves canonical casing from system.functions (e.g. toDateTime, BLAKE3, sipHash128Reference); names are not lower-cased.
  • funcLookup contains the lowercase forms of all canonical names and is used only for case-insensitive matching — never inserted into the editor.
  • funcDefs is null when system.functions is absent; a Map keyed by canonical name with syntax and arguments fields when present.
  • completions entries have word, context, and belongs fields (not kind/syntax).
  • Each of the three queries fails independently without affecting the others.
  • app.state.editorReference is null in createState() and populated after a successful connection.
  • tokenize(sql) with no second argument behaves identically to before — existing tests pass unchanged.
  • tokenize(sql, { keywords, funcs, funcLookup }) uses the provided sets; funcs and funcLookup are both consulted for classification.
  • A word typed as count, Count, or COUNT is classified as func when funcLookup contains count.
  • The editor repaints with server-loaded keyword/function sets after reference data loads.
  • No SQL executes on ordinary typing.
  • No localStorage cache introduced.
  • src/net/reference-data.js at 100% coverage (success path, each table absent individually, all tables absent).
  • src/core/sql-highlight.js maintains 100% coverage (default-arg and provided-arg paths both exercised).
  • src/state.js maintains coverage (new editorReference field covered).

Non-goals

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions