Summary
The CI test workflow (.github/workflows/ci.yml) runs every test job on a single Linux runner (ubuntu-latest). There is no OS matrix, so the test suite never executes on macOS or Windows — even though executor ships OS-sensitive code:
- a cross-compiled CLI with BUN_TARGET values for bun-darwin-arm64, bun-darwin-x64, bun-windows-x64, bun-linux-x64
- an Electron desktop app (mac/win/linux distributables)
- a self-host Docker image
The only place all three OSes appear today is publish-desktop.yml, which is a release workflow — it builds distributables and runs a minimal "does the compiled binary boot" smoke test. That smoke test doesn't even cover all architectures (the mac x64 cross-compile leg skips it), and it isn't part of the per-PR test gate.
Current state
Every job in ci.yml is pinned to ubuntu-latest:
| Job | runs-on |
| --- | --- |
| format | ubuntu-latest |
| lint | ubuntu-latest |
| typecheck | ubuntu-latest |
| test | ubuntu-latest |
| e2e-local | ubuntu-latest |
| desktop-smoke | ubuntu-latest |
| selfhost-docker-smoke | ubuntu-latest |
Why this matters
The publish workflow's own comments document that v1.5.0/.1 shipped local-server binaries that died on launch (missing libsql native binding) — "a regression dev mode can't catch because bun run resolves node_modules that bun build --compile does not bundle."
That is exactly the class of platform-native failure a multi-OS test matrix is designed to catch. As it stands, a macOS- or Windows-specific runtime regression can merge with zero coverage — the gate only sees Linux.
Proposal
Add macos-latest and windows-latest to a matrix on the jobs that actually exercise runtime/platform behavior. Example for the test job:

    test:
      name: Test (${{ matrix.os }})
      strategy:
        fail-fast: false
        matrix:
          os: [ubuntu-latest, macos-latest, windows-latest]
      runs-on: ${{ matrix.os }}
      steps:
        - uses: actions/checkout@v4
        - uses: oven-sh/setup-bun@v2
          with:
            bun-version: 1.3.11
        - uses: actions/setup-node@v4
          with:
            node-version: 22
        - run: bun install --frozen-lockfile
        - run: bun run test
Suggested scope per job (pragmatic: matrix the high-value jobs, leave OS-independent ones alone):
- test + typecheck → full 3-OS matrix. Highest value, since these exercise runtime behavior and types across platforms.
- desktop-smoke → extend to mac + win too. publish-desktop.yml already cross-builds per-OS sidecar targets; smoke-building on all three in CI would catch sidecar build breaks before release instead of at publish time.
- format / lint → leave Linux-only. OS-independent; the matrix cost isn't justified.
- e2e-local → defer for now (the workflow notes existing browser/boot flakiness). Revisit once the local suite is stabilized.
- selfhost-docker-smoke → Linux-only is appropriate (Docker-centric image).
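As a rough sketch of that scoping, typecheck could take the same matrix as test while format stays pinned to Linux. Both fragments would sit under the workflow's `jobs:` key. The `bun run typecheck` and `bun run format:check` script names are assumptions for illustration and may not match what ci.yml actually invokes.

```yaml
# Sketch only. Script names marked "hypothetical" are not taken from ci.yml.
typecheck:
  name: Typecheck (${{ matrix.os }})
  strategy:
    fail-fast: false          # let every OS leg report, even if one fails
    matrix:
      os: [ubuntu-latest, macos-latest, windows-latest]
  runs-on: ${{ matrix.os }}
  steps:
    - uses: actions/checkout@v4
    - uses: oven-sh/setup-bun@v2
      with:
        bun-version: 1.3.11
    - run: bun install --frozen-lockfile
    - run: bun run typecheck  # hypothetical script name

format:
  name: Format
  runs-on: ubuntu-latest      # OS-independent, so no matrix
  steps:
    - uses: actions/checkout@v4
    - uses: oven-sh/setup-bun@v2
      with:
        bun-version: 1.3.11
    - run: bun install --frozen-lockfile
    - run: bun run format:check  # hypothetical script name
```

Keeping `fail-fast: false` on the matrixed job means a Windows-only failure does not cancel the Linux and macOS legs, so each platform's result stays visible on the PR.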
This closes the gap between "we build on 3 OSes" and "we test on 3 OSes," and turns the kind of native-binding launch failure that slipped through in v1.5.0/.1 into a CI failure instead of a shipped release.
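One way a per-PR job could surface that kind of launch failure is to run the compiled artifact itself on each OS rather than going through `bun run`. The following is a minimal sketch, not the project's actual smoke test: the job name, the `./src/cli.ts` entry point, the `./dist/executor` output path, and the `--version` flag are all hypothetical.

```yaml
# Sketch only. Job name, entry point, output path, and CLI flag are hypothetical.
compiled-boot-smoke:
  name: Compiled boot smoke (${{ matrix.os }})
  strategy:
    fail-fast: false
    matrix:
      os: [ubuntu-latest, macos-latest, windows-latest]
  runs-on: ${{ matrix.os }}
  defaults:
    run:
      shell: bash             # same shell on every runner, including Windows
  steps:
    - uses: actions/checkout@v4
    - uses: oven-sh/setup-bun@v2
      with:
        bun-version: 1.3.11
    - run: bun install --frozen-lockfile
    # Produce a standalone binary for the runner's own platform.
    - run: bun build --compile ./src/cli.ts --outfile ./dist/executor
    # Execute the compiled binary directly so a missing native binding
    # fails this step instead of being masked by node_modules resolution.
    - run: ./dist/executor --version
```

On Windows the compiled output normally carries an `.exe` suffix; the sketch relies on bash resolving it, which is worth verifying on a real runner before depending on it.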
Happy to open a PR with the matrix changes if that's welcome — let me know.