fix(wire): honor binary result-format codes (Power BI/Tableau preview ERROR)#7
Merged
Merged
Conversation
f5817ca to
bc7d52a
Compare
Power BI Desktop / Tableau and other Npgsql/pgjdbc clients bind result columns for binary in the extended-query protocol, but the gateway emitted text-format DataRows unconditionally. Binary-decoding text bytes fails for every type except varchar (whose binary and text wire forms are identical), which surfaced as "ERROR on Float/Integer/Timestamp columns, strings fine" in the Power BI Navigator preview while the full (text) data load worked. Real PostgreSQL honors the Bind request and sends binary, which was the one concrete behavioural difference. Honor the portal's `result_column_format` on the live-data path: - `types::encode_cell` dispatches per-column on the requested `FieldFormat` and converts Trino JSON values to typed Rust values so pgwire's `DataRowEncoder` emits PostgreSQL binary: bool, int2/4/8, float4/8, numeric, date, time, timestamp (without tz); the string family passes through as bytes (binary == text). Unsupported types fail closed (SQLSTATE 0A000) rather than emit bytes the client would misread; SQL NULL is handled once (format-independent -1 length). - `trino_stream::build_pg_schema` sets each column's format from the portal, threaded through `process_query` -> `execute_trino_query`. The simple-query path stays text (it never negotiates binary). The static catalog/intercept path remains text-only (type-loading drivers request text there); tracked as follow-up. Tests: - Byte-level unit tests for `encode_cell` in `types.rs` (no Trino). - The three extended-protocol tests that were `#[ignore]`'d for this are re-enabled, plus a new test covering the customer's exact column types (bigint, real, double, timestamp). Verified end-to-end against Trino 479: they pass with the fix and fail with "error deserializing column" when the schema-format line is reverted. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
bc7d52a to
2df9a4d
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Honor the client-requested binary result-column format in the extended-query protocol. Previously the gateway always emitted text-format
DataRows, so clients that bind result columns for binary (Power BI's Npgsql typed path, Tableau/pgjdbc, tokio-postgres) misdecoded everything exceptvarchar.Why
A customer reported the Power BI Desktop Navigator preview showing
ERRORon Float / Integer / Timestamp columns while string columns rendered, and the full data load worked. Root cause: binary-decoding our text bytes fails for every type except the string family (whose binary and text wire forms are byte-identical). Real PostgreSQL honors theBindresult-format request and sends binary — the one concrete behavioural difference. This was originally mis-filed as an upstream pgwire gap; pgwire already supports binary output, so the fix is ours.How
types::encode_celldispatches per column on the requestedFieldFormat, converting Trino JSON values to typed Rust values so pgwire'sDataRowEncoderemits PostgreSQL binary. Supported:bool,int2/4/8,float4/8,numeric,date,time,timestamp(no tz); the string family passes through as raw bytes. SQL NULL handled once (format-independent).0A000) for types without a faithful binary encoder rather than emitting bytes the client would misread — we cannot silently downgrade to text because the client decodes strictly per the format it bound.trino_stream::build_pg_schematakes the portal'sresult_column_format, threaded throughprocess_query→execute_trino_query. The schema drives both the RowDescription and the per-cell encoding, so they cannot diverge. Simple-query path stays text.Scope / follow-ups (not in this PR)
pg_typeetc.) remains text-only. Type-loading drivers request text there, so it's not user-visible; pinned by a regression test.timestamptz,interval,uuid,bytea,inet,json/jsonb, arrays fail closed under binary — add encoders as needed.Testing
encode_cell(no Trino).#[ignore]'d extended-protocol tests + a new test for the customer's exact types (bigint,real,double,timestamp).error deserializing columnwhen the schema-format line is reverted — confirming they exercise the fix.cargo test(257 across 5 suites, with Trino),cargo clippy --all-targets -- -D warnings,cargo fmt --checkall clean.Review notes
chronoandrust_decimal(versions unify with what pgwire's default features already pull intopostgres-types;rust_decimalusesdb-postgres). Both Apache-2.0 / MIT.types.rs::encode_binary_cell(range checks, NaN/Infinity-as-string, decimal precision fail-closed, timestamp parse formats).🤖 Generated with Claude Code