diff --git a/.github/workflows/base-image.yml b/.github/workflows/base-image.yml
index 42daeda..4060dfd 100644
--- a/.github/workflows/base-image.yml
+++ b/.github/workflows/base-image.yml
@@ -1,7 +1,7 @@
 name: Base image (DuckDB 1.5.x)

 # Builds + publishes the coldfront-duckdb-base image (libcurl 8.12 + pg_duckdb
-# 1.5.3 + patched duckdb-iceberg) to ghcr.io/pgedge for PG 16/17/18. The app image
+# 1.5.4 + patched duckdb-iceberg) to ghcr.io/pgedge for PG 16/17/18. The app image
 # (docker/Dockerfile.duckdb15) does `FROM` this; CI and local builds pull it.
 #
 # Rebuilt only when the base inputs change (Dockerfile / iceberg patch / config)
diff --git a/CLAUDE.md b/CLAUDE.md
index e47f625..0901e3f 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -44,7 +44,7 @@
 - github.com/jackc/pgx/v5 (PostgreSQL driver — use pgxpool directly)
 - gopkg.in/yaml.v3 (config)
 - github.com/stretchr/testify (test assertions only)
-- pg_duckdb 1.5.3 (PR #1025, DuckDB 1.5.3) + patched duckdb-iceberg, prebuilt into the `coldfront-duckdb-base` image — see [DUCKDB_1.5_PATCHED.md](DUCKDB_1.5_PATCHED.md)
+- pg_duckdb 1.5.4 (PR #1025, DuckDB 1.5.4) + patched duckdb-iceberg, prebuilt into the `coldfront-duckdb-base` image — see [DUCKDB_1.5_PATCHED.md](DUCKDB_1.5_PATCHED.md)
 - `cmd/compactor/` is a separate Go module (apache/iceberg-go) — its heavy deps are quarantined from the lean archiver
 - `extension/coldfront/` — PGXS C extension. Requires `pg_config` and PG dev headers. Built inside the Docker image; users on bare-metal install with `make && make install`.

diff --git a/DUCKDB_1.5_PATCHED.md b/DUCKDB_1.5_PATCHED.md
index 2793b49..7784bee 100644
--- a/DUCKDB_1.5_PATCHED.md
+++ b/DUCKDB_1.5_PATCHED.md
@@ -1,6 +1,6 @@
 # PATCHED — duckdb-iceberg patches & build (DuckDB 1.5.x)

-ColdFront runs on a **custom-built DuckDB 1.5.3 base image** that carries a
+ColdFront runs on a **custom-built DuckDB 1.5.4 base image** that carries a
 small set of patches against `duckdb-iceberg` `v1.5-variegata`. This is the one
 home for *what* we patch, *why*, *how the base is built*, and *how it is wired
 and verified*. The cold-tier compactor's own story (and the three interop
@@ -131,13 +131,13 @@ patches only.

 | Component | Pin | Notes |
 |---|---|---|
-| pg_duckdb | **PR #1025 head** (`9c9fbcd`) | no released tag carries 1.5.x; `git fetch origin pull/1025/head`. Sets `DUCKDB_VERSION=v1.5.3`. |
-| DuckDB | **v1.5.3** (submodule `9a64d338`) | the `duckdb.*` GUCs + PRE_COMMIT iceberg-commit deferral ColdFront relies on are unchanged by the PR. |
-| duckdb-iceberg | **`v1.5-variegata` @ `0fad545a`** | transaction code lives in `src/catalog/rest/transaction/`; the four patches apply here. |
+| pg_duckdb | **merged PR #1025** (`c04e6a2`) | no released tag carries 1.5.x; `git checkout c04e6a2`. Its duckdb submodule is the v1.5.4 tag (`08e34c4`). |
+| DuckDB | **v1.5.4 tag** (`08e34c4`) | pinned by pg_duckdb @ `c04e6a2`; the iceberg build re-pins ITS duckdb submodule to the same tag so the extension ABI matches the engine. The `duckdb.*` GUCs + PRE_COMMIT iceberg-commit deferral ColdFront relies on are unchanged. |
+| duckdb-iceberg | **`v1.5-variegata` @ `0fad545a`** | extension code the four patches target — kept fixed, so the patches apply unchanged. The build re-pins its duckdb submodule to the v1.5.4 tag (the branch tracks duckdb `main`, which drifts off the release). Transaction code lives in `src/catalog/rest/transaction/`. |
 | avro | **`7f423d69`** | the pin `v1.5-variegata` uses. |
 | azure | **`v1.5-variegata` @ `563589b2`** | the ABI-matched sibling of iceberg's branch. **NOT `main`** — azure `main` collides at link (`multiple definition of duckdb::FileFlags::FILE_FLAGS_NULL_IF_NOT_EXISTS`). |
-| postgres_scanner | duckdb-postgres **`main` @ `916d862b`** | the `postgres` ext; built bundled (ABI-matched, stamped v1.5.3), **shipped** in the image (never downloaded). Its vcpkg `libpq` build needs **flex** + **bison**. |
-| libcurl | **build 8.12.0** (≥ 7.77) | **REQUIRED** — DuckDB 1.5.3 httpfs uses `CURLSSLOPT_AUTO_CLIENT_CERT` (≥ 7.77); the pgEdge base ships 7.76.1. 8.12.0 fixes CVE-2025-0665 (the 8.11.1 resolver SIGABRT); runtime still pins httplib regardless. |
+| postgres_scanner | duckdb-postgres **`6b2b12ca`** | the `postgres` ext; built bundled (ABI-matched, stamped v1.5.4), **shipped** in the image (never downloaded). Its vcpkg `libpq` build needs **flex** + **bison**. |
+| libcurl | **build 8.12.0** (≥ 7.77) | **REQUIRED** — DuckDB 1.5.4 httpfs uses `CURLSSLOPT_AUTO_CLIENT_CERT` (≥ 7.77); the pgEdge base ships 7.76.1. 8.12.0 fixes CVE-2025-0665 (the 8.11.1 resolver SIGABRT); runtime still pins httplib regardless. |

 ## 6. Build — `docker/Dockerfile.duckdb15-base`

@@ -155,7 +155,7 @@ requirements (each a real build failure if missing):
 - **`flex` + `bison`** for the `postgres_scanner` vcpkg `libpq` build.
 - **azure pinned `v1.5-variegata` (`563589b2`), not `main`** (link collision).
 - iceberg/avro/azure/postgres_scanner are built **bundled** against one DuckDB
-  (`make release`, `OVERRIDE_GIT_DESCRIBE=v1.5.3`) so they are ABI-safe; build
+  (`make release`, `OVERRIDE_GIT_DESCRIBE=v1.5.4`) so they are ABI-safe; build
   config is `docker/iceberg-azure-extension-config-v15.cmake`.
 - The bakery + three interop patches are `COPY`'d in and `git apply --check`'d
   then applied (see the Dockerfile's patch block).
@@ -172,9 +172,9 @@ current source.

 | File | Role |
 |---|---|
-| `docker/Dockerfile.duckdb15-base` | base: pg_duckdb 1.5.3 (PR #1025) + libcurl 8.12 + patched iceberg/avro/azure/postgres_scanner; runtime stage = the 4 extensions + entrypoint, **no coldfront**. |
+| `docker/Dockerfile.duckdb15-base` | base: pg_duckdb 1.5.4 (PR #1025) + libcurl 8.12 + patched iceberg/avro/azure/postgres_scanner; runtime stage = the 4 extensions + entrypoint, **no coldfront**. |
 | `docker/Dockerfile.duckdb15` | app: a `cf-build` stage compiles coldfront (PG devel only — coldfront links libpq, not pg_duckdb), then `FROM ${COLDFRONT_BASE}` copies the `.so`/SQL on top. |
-| `docker/entrypoint.sh` | first-init: sets `COLDFRONT_DUCKDB_VERSION=v1.5.3`, pre-places the extensions under `$PGDATA/pg_duckdb/extensions/v1.5.3//`, writes the GUCs. |
+| `docker/entrypoint.sh` | first-init: sets `COLDFRONT_DUCKDB_VERSION=v1.5.4`, pre-places the extensions under `$PGDATA/pg_duckdb/extensions/v1.5.4//`, writes the GUCs. |
 | `docker/iceberg-azure-extension-config-v15.cmake` | the bundled-build extension config (iceberg + avro + azure + postgres_scanner). |
 | `.github/workflows/base-image.yml` | builds + pushes the base via `GITHUB_TOKEN` (base rebuilds are rare). |

diff --git a/README.md b/README.md
index 9b9a40f..22440a0 100644
--- a/README.md
+++ b/README.md
@@ -212,7 +212,7 @@ pgedge-coldfront/
 │ ├── topo/ ← vanilla.sh (1 node) · mesh.sh (3-node Spock)
 │ └── runbooks/ ← failover-patroni.md (failover delegated to Patroni)
 ├── docker/
-│ ├── Dockerfile.duckdb15-base ← DuckDB 1.5.x base (pg_duckdb 1.5.3 + patched iceberg)
+│ ├── Dockerfile.duckdb15-base ← DuckDB 1.5.x base (pg_duckdb 1.5.4 + patched iceberg)
 │ ├── Dockerfile.duckdb15 ← thin coldfront app layer (ARG PG_MAJOR=16|17|18)
 │ ├── iceberg-*.patch ← duckdb-iceberg patches (bakery commit-refresh + strict-reader interop)
 │ ├── entrypoint.sh
@@ -238,7 +238,7 @@ against:
 | Component | Version | Purpose |
 |-----------|---------|---------|
 | PostgreSQL | 16, 17, or 18 | Database with native partitioning (stock upstream; no fork) |
-| pg_duckdb | 1.5.3 (PR #1025) | Iceberg reads + writes via DuckDB in-process |
+| pg_duckdb | 1.5.4 (PR #1025) | Iceberg reads + writes via DuckDB in-process |
 | duckdb-iceberg | `v1.5-variegata` @ `0fad545a`, patched | Iceberg catalog/IO for DuckDB; carries ColdFront's four patches (see [DUCKDB_1.5_PATCHED.md](DUCKDB_1.5_PATCHED.md)) |
 | Lakekeeper | latest | Iceberg REST catalog (Rust binary) |
 | S3-compatible store | any | SeaweedFS, MinIO, GCS, Azure Blob, etc. |
diff --git a/cmd/compactor/flatwalk.go b/cmd/compactor/flatwalk.go
new file mode 100644
index 0000000..bc81b49
--- /dev/null
+++ b/cmd/compactor/flatwalk.go
@@ -0,0 +1,116 @@
+package main
+
+import (
+	"context"
+	"fmt"
+	"io"
+	stdfs "io/fs"
+	"net/url"
+	"reflect"
+	"strings"
+	"time"
+
+	iceio "github.com/apache/iceberg-go/io"
+	"github.com/apache/iceberg-go/table"
+	"gocloud.dev/blob"
+)
+
+// flatWalkIO wraps an iceberg-go FileIO and replaces WalkDir with a FLAT object
+// list (no delimiter). iceberg-go's blob WalkDir does a hierarchical fs.WalkDir
+// whose per-path Open() decides directory-vs-file via Exists(): on an object
+// store an object at exactly a "directory" path (e.g. .../data) collides with
+// the .../data/ prefix, gets returned by List as BOTH a file and a directory,
+// and the recursive ReadDir on the phantom directory fails with the Go stdlib's
+// literal "readdir …: not implemented". A flat list never opens a path as a
+// directory, so the collision cannot occur. Everything else delegates to the
+// wrapped FileIO, and orphan reachability + deletion stay iceberg-go's.
+type flatWalkIO struct {
+	iceio.IO
+}
+
+func (f flatWalkIO) WalkDir(root string, fn stdfs.WalkDirFunc) error {
+	bucket, err := bucketOf(f.IO)
+	if err != nil {
+		// Non-blob backend (e.g. local FS): defer to the wrapped walk.
+		if lw, ok := f.IO.(iceio.ListableIO); ok {
+			return lw.WalkDir(root, fn)
+		}
+		return err
+	}
+	u, err := url.Parse(root)
+	if err != nil {
+		return fmt.Errorf("invalid URL %s: %w", root, err)
+	}
+	prefix := strings.TrimPrefix(u.Path, "/")
+	iter := bucket.List(&blob.ListOptions{Prefix: prefix}) // empty Delimiter => flat
+	for {
+		obj, err := iter.Next(context.Background())
+		if err == io.EOF {
+			break
+		}
+		if err != nil {
+			return err
+		}
+		if obj.IsDir { // not set without a delimiter, but be defensive
+			continue
+		}
+		full := *u
+		full.Path = "/" + obj.Key // preserve scheme + container@host, swap the key
+		if err := fn(full.String(), flatDirEntry{obj}, nil); err != nil {
+			return err
+		}
+	}
+	return nil
+}
+
+// bucketOf extracts the *blob.Bucket from a gocloud-backed iceberg FileIO by
+// reflection — the same access iceberg-go itself uses internally
+// (table/orphan_cleanup.go getBucketName). Errors for a non-blob FileIO.
+func bucketOf(fio iceio.IO) (*blob.Bucket, error) {
+	v := reflect.ValueOf(fio)
+	if v.Kind() == reflect.Pointer {
+		v = v.Elem()
+	}
+	if v.Kind() != reflect.Struct {
+		return nil, fmt.Errorf("FileIO %T is not a struct", fio)
+	}
+	field := v.FieldByName("Bucket")
+	if !field.IsValid() {
+		return nil, fmt.Errorf("FileIO %T has no Bucket field", fio)
+	}
+	b, ok := field.Interface().(*blob.Bucket)
+	if !ok {
+		return nil, fmt.Errorf("FileIO %T Bucket field is not *blob.Bucket", fio)
+	}
+	return b, nil
+}
+
+// flatDirEntry / flatFileInfo adapt a gocloud ListObject to fs.DirEntry so
+// iceberg-go's scanFiles can read ModTime/Size for the orphan-age filter.
+type flatDirEntry struct{ obj *blob.ListObject }
+
+func (e flatDirEntry) Name() string                  { return e.obj.Key }
+func (e flatDirEntry) IsDir() bool                   { return false }
+func (e flatDirEntry) Type() stdfs.FileMode          { return 0 }
+func (e flatDirEntry) Info() (stdfs.FileInfo, error) { return flatFileInfo(e), nil }
+
+type flatFileInfo struct{ obj *blob.ListObject }
+
+func (i flatFileInfo) Name() string         { return i.obj.Key }
+func (i flatFileInfo) Size() int64          { return i.obj.Size }
+func (i flatFileInfo) Mode() stdfs.FileMode { return 0 }
+func (i flatFileInfo) ModTime() time.Time   { return i.obj.ModTime }
+func (i flatFileInfo) IsDir() bool          { return false }
+func (i flatFileInfo) Sys() any             { return nil }
+
+// withFlatWalk reconstructs tbl so DeleteOrphanFiles walks via flatWalkIO.
+// Orphan cleanup only reads metadata and lists/deletes files — it never calls
+// the catalog — so a nil CatalogIO is safe here.
+func withFlatWalk(ctx context.Context, tbl *table.Table) (*table.Table, error) {
+	realIO, err := tbl.FS(ctx)
+	if err != nil {
+		return nil, err
+	}
+	fsF := func(context.Context) (iceio.IO, error) { return flatWalkIO{realIO}, nil }
+	return table.New(tbl.Identifier(), tbl.Metadata(), tbl.MetadataLocation(), fsF, nil), nil
+}
diff --git a/cmd/compactor/go.mod b/cmd/compactor/go.mod
index 16f279e..709f102 100644
--- a/cmd/compactor/go.mod
+++ b/cmd/compactor/go.mod
@@ -5,6 +5,7 @@ go 1.26.4
 require (
 	github.com/apache/iceberg-go v0.6.0
 	github.com/jackc/pgx/v5 v5.10.0
+	gocloud.dev v0.45.0
 	gopkg.in/yaml.v3 v3.0.1
 )

@@ -108,7 +109,6 @@ require (
 	go.opentelemetry.io/otel/sdk v1.43.0 // indirect
 	go.opentelemetry.io/otel/sdk/metric v1.43.0 // indirect
 	go.opentelemetry.io/otel/trace v1.43.0 // indirect
-	gocloud.dev v0.45.0 // indirect
 	golang.org/x/crypto v0.50.0 // indirect
 	golang.org/x/exp v0.0.0-20260218203240-3dfff04db8fa // indirect
 	golang.org/x/net v0.53.0 // indirect
diff --git a/cmd/compactor/main.go b/cmd/compactor/main.go
index 0f64c04..49e7a4a 100644
--- a/cmd/compactor/main.go
+++ b/cmd/compactor/main.go
@@ -194,6 +194,14 @@ func doOrphans(ctx context.Context, cat *rest.Catalog, ns, tableName string, o r
 	if err != nil {
 		return err
 	}
+	// Walk the table location with a flat object list: an object at exactly a
+	// directory path (e.g. .../data) otherwise collides with that prefix and
+	// iceberg-go's hierarchical walk dies with "readdir: not implemented" on
+	// object stores. See flatwalk.go.
+	tbl, err = withFlatWalk(ctx, tbl)
+	if err != nil {
+		return err
+	}
 	if o.dryRun {
 		n, derr := deleteOrphans(ctx, tbl, o.orphanAge, true)
 		if derr != nil {
diff --git a/docker/Dockerfile.duckdb15 b/docker/Dockerfile.duckdb15
index 6a5fbd9..9d092a7 100644
--- a/docker/Dockerfile.duckdb15
+++ b/docker/Dockerfile.duckdb15
@@ -1,6 +1,6 @@
 # ColdFront image on the DuckDB 1.5.x stack — the THIN coldfront layer on top of
 # the prebuilt coldfront-duckdb-base. The base (docker/Dockerfile.duckdb15-base)
-# carries the expensive, STABLE compiles — pg_duckdb 1.5.3 (PR #1025) and the
+# carries the expensive, STABLE compiles — pg_duckdb 1.5.4 (PR #1025) and the
 # patched duckdb-iceberg extensions (iceberg/avro/azure/postgres_scanner,
 # v1.5-variegata + the bakery-aware commit-refresh patch). This build only
 # compiles the coldfront C extension (seconds), so CI and local builds are fast
@@ -35,7 +35,7 @@ COPY extension/coldfront /build/coldfront
 RUN DESTDIR=/out make -C /build/coldfront install with_llvm=no

 # ─── app runtime: coldfront extension on top of the base ─────────────────────────
-# pg_duckdb 1.5.3, the iceberg/avro/azure/postgres_scanner extensions, libcurl
+# pg_duckdb 1.5.4, the iceberg/avro/azure/postgres_scanner extensions, libcurl
 # 8.12, COLDFRONT_DUCKDB_* env and /data are inherited from the base. This stage
 # adds coldfront's .so + SQL/control (distinct filenames — no overwrite of
 # pg_duckdb) AND refreshes the entrypoint from the CURRENT branch.
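Reviewer note on cmd/compactor/flatwalk.go: the file-versus-directory collision described in its header comment can be reproduced without an object store. The sketch below is not iceberg-go or gocloud code. `entry`, `listDelimited` and `listFlat` are hypothetical stand-ins that mimic a delimiter-based listing and a delimiter-free listing over an in-memory key set, and the key names are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// entry mimics one listing result: a key plus whether the store reported it
// as a "directory" (a common prefix) rather than an object. Hypothetical type.
type entry struct {
	Key   string
	IsDir bool
}

// listDelimited mimics a hierarchical listing: keys under prefix are cut at
// the next "/" and collapsed into one directory entry per common prefix.
func listDelimited(keys []string, prefix string) []entry {
	seen := map[entry]bool{}
	var out []entry
	for _, k := range keys {
		if !strings.HasPrefix(k, prefix) {
			continue
		}
		rest := strings.TrimPrefix(k, prefix)
		e := entry{Key: k}
		if i := strings.Index(rest, "/"); i >= 0 {
			e = entry{Key: prefix + rest[:i+1], IsDir: true}
		}
		if !seen[e] {
			seen[e] = true
			out = append(out, e)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return out
}

// listFlat mimics a listing with no delimiter: every object under prefix is
// returned once, and nothing is ever reported as a directory.
func listFlat(keys []string, prefix string) []entry {
	var out []entry
	for _, k := range keys {
		if strings.HasPrefix(k, prefix) {
			out = append(out, entry{Key: k})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return out
}

func main() {
	// An object stored at exactly "tbl/data" next to objects under "tbl/data/".
	keys := []string{"tbl/data", "tbl/data/a.parquet", "tbl/metadata/v1.json"}
	fmt.Println(listDelimited(keys, "tbl/")) // "data" shows up as a file and as a directory
	fmt.Println(listFlat(keys, "tbl/"))      // objects only, no directory entries
}
```

In the delimited result the name `data` appears twice, once as an object and once as a prefix, which is the ambiguity a recursive walk trips over. The flat result has no directory entries to descend into, which is the property the patch relies on.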
diff --git a/docker/Dockerfile.duckdb15-base b/docker/Dockerfile.duckdb15-base
index 8a9b901..9167dfa 100644
--- a/docker/Dockerfile.duckdb15-base
+++ b/docker/Dockerfile.duckdb15-base
@@ -1,5 +1,5 @@
 # ColdFront DuckDB 1.5.x BASE image — the expensive, STABLE layer: libcurl 8.12 +
-# pg_duckdb (DuckDB 1.5.3, PR #1025) + the patched duckdb-iceberg extensions
+# pg_duckdb (DuckDB 1.5.4, PR #1025) + the patched duckdb-iceberg extensions
 # (iceberg/avro/azure/postgres_scanner, v1.5-variegata + the bakery-aware
 # commit-refresh patch). It does NOT contain the coldfront extension — that thin,
 # fast-changing layer is added by docker/Dockerfile.duckdb15 (the "app" build)
@@ -36,7 +36,7 @@ ARG PGEDGE_TAG=${PG_MAJOR}-spock5-minimal
 # cross-build under emulation on an x86_64 host.
 ARG TARGETARCH=amd64

-# ─── build stage: libcurl 8.12 + pg_duckdb (DuckDB 1.5.3, PR #1025) ──────────────
+# ─── build stage: libcurl 8.12 + pg_duckdb (DuckDB 1.5.4, PR #1025) ──────────────
 FROM ghcr.io/pgedge/pgedge-postgres:${PGEDGE_TAG} AS build
 ARG PG_MAJOR
 USER root
@@ -51,7 +51,7 @@ RUN dnf install -y --setopt=install_weak_deps=False \
 ENV PATH=/usr/pgsql-${PG_MAJOR}/bin:$PATH
 ENV PG_CONFIG=/usr/pgsql-${PG_MAJOR}/bin/pg_config

-# libcurl 8.12.0 — built ONLY to satisfy DuckDB 1.5.3 httpfs's COMPILE-TIME
+# libcurl 8.12.0 — built ONLY to satisfy DuckDB 1.5.4 httpfs's COMPILE-TIME
 # dependency: its CMake does find_package(CURL REQUIRED) and curl_client.cpp
 # references CURLSSLOPT_AUTO_CLIENT_CERT (curl >= 7.77); the base ships 7.76.1.
 # 8.12.0 fixes CVE-2025-0665 (the 8.11.1 threaded resolver double-closed an fd
@@ -68,11 +68,12 @@ RUN wget -q https://curl.se/download/curl-8.12.0.tar.gz && tar xf curl-8.12.0.ta
     --without-libpsl --without-libssh2 --without-nghttp2 --without-brotli --without-zstd \
     && make -j"$(nproc)" && make install && ldconfig

-# pg_duckdb from PR #1025 (DuckDB 1.5.3). No released tag carries 1.5.x.
+# pg_duckdb at the merged PR #1025 commit (DuckDB 1.5.4). No released pg_duckdb
+# tag carries 1.5.x; this commit pins its duckdb submodule to the v1.5.4 tag.
 WORKDIR /build
 RUN git clone https://github.com/duckdb/pg_duckdb /build/pg_duckdb \
     && cd /build/pg_duckdb \
-    && git fetch origin pull/1025/head && git checkout FETCH_HEAD \
+    && git checkout c04e6a2dcf4e999abb921da1ba2f8335dad644e0 \
     && git submodule update --init --recursive

 # pgEdge propagates -fexcess-precision=standard into CXXFLAGS, which gcc rejects
@@ -104,10 +105,16 @@ RUN dnf install -y --setopt=install_weak_deps=False \
     flex bison \
     && dnf clean all
 WORKDIR /build
+# ICEBERG_REF is the extension code the four patches target (kept fixed so they
+# apply unchanged). Its duckdb submodule tracks duckdb main, which drifts off the
+# release, so after init we pin that submodule to the v1.5.4 TAG — the same engine
+# pg_duckdb links — and build the extension against it.
 RUN git clone --filter=blob:none --no-checkout https://github.com/duckdb/duckdb-iceberg \
     && git -C duckdb-iceberg fetch --depth 80 origin v1.5-variegata \
     && git -C duckdb-iceberg checkout ${ICEBERG_REF} \
     && git -C duckdb-iceberg submodule update --init --recursive --depth 1 --jobs 8 \
+    && rm -rf duckdb-iceberg/duckdb \
+    && git clone --depth 1 --branch v1.5.4 --recurse-submodules https://github.com/duckdb/duckdb duckdb-iceberg/duckdb \
     && git clone https://github.com/microsoft/vcpkg /build/vcpkg \
     && /build/vcpkg/bootstrap-vcpkg.sh -disableMetrics
 COPY docker/iceberg-azure-extension-config-v15.cmake /build/duckdb-iceberg/extension_config.cmake
@@ -153,7 +160,7 @@ RUN cd /build/duckdb-iceberg && bash -c 'source /opt/rh/gcc-toolset-*/enable 2>/
     esac; \
     export VCPKG_TOOLCHAIN_PATH=/build/vcpkg/scripts/buildsystems/vcpkg.cmake VCPKG_ROOT=/build/vcpkg \
     VCPKG_TARGET_TRIPLET=$VCPKG_TRIPLET VCPKG_HOST_TRIPLET=$VCPKG_TRIPLET USE_MERGED_VCPKG_MANIFEST=1 \
-    EXT_CONFIG=/build/duckdb-iceberg/extension_config.cmake OVERRIDE_GIT_DESCRIBE=v1.5.3 \
+    EXT_CONFIG=/build/duckdb-iceberg/extension_config.cmake OVERRIDE_GIT_DESCRIBE=v1.5.4 \
     CMAKE_BUILD_PARALLEL_LEVEL=$(nproc); \
     make -j$(nproc) release'

@@ -166,10 +173,10 @@ RUN dnf install -y --setopt=install_weak_deps=False --allowerasing \
     libcurl lz4 pgedge-postgresql${PG_MAJOR}-contrib openssl \
     && dnf clean all

-# pg_duckdb 1.5.3 + bundled libduckdb 1.5.3 + extension SQL/control (NO coldfront)
+# pg_duckdb 1.5.4 + bundled libduckdb 1.5.4 + extension SQL/control (NO coldfront)
 COPY --from=build /out/usr/pgsql-${PG_MAJOR}/lib/ /usr/pgsql-${PG_MAJOR}/lib/
 COPY --from=build /out/usr/pgsql-${PG_MAJOR}/share/extension/ /usr/pgsql-${PG_MAJOR}/share/extension/
-# libcurl 8.12 (DuckDB 1.5.3 httpfs); overwrite the base 7.76.1 so the newer SONAME
+# libcurl 8.12 (DuckDB 1.5.4 httpfs); overwrite the base 7.76.1 so the newer SONAME
 # wins. Stage into a clean dir first so the .so.4.x.y glob can't collide with the
 # base's pre-existing libcurl.so.4.* (its minor version tracks curl's libtool
 # version-info, not the release), then install the resolved real file + symlinks.
@@ -179,7 +186,7 @@ RUN realname=$(basename /tmp/curl/libcurl.so.4.*) \
     && ln -sf "$realname" /usr/lib64/libcurl.so.4 && ln -sf libcurl.so.4 /usr/lib64/libcurl.so && ldconfig

 # iceberg + avro + azure (DuckDB 1.5.x); entrypoint places them under
-# $PGDATA/pg_duckdb/extensions/v1.5.3// (azure cp is conditional).
+# $PGDATA/pg_duckdb/extensions/v1.5.4// (azure cp is conditional).
 COPY --from=iceberg-builder /build/duckdb-iceberg/build/release/extension/iceberg/iceberg.duckdb_extension /opt/coldfront/iceberg/iceberg.duckdb_extension
 COPY --from=iceberg-builder /build/duckdb-iceberg/build/release/extension/avro/avro.duckdb_extension /opt/coldfront/iceberg/avro.duckdb_extension
 COPY --from=iceberg-builder /build/duckdb-iceberg/build/release/extension/azure/azure.duckdb_extension /opt/coldfront/iceberg/azure.duckdb_extension
@@ -188,8 +195,8 @@ COPY --from=iceberg-builder /build/duckdb-iceberg/build/release/extension/azure/
 COPY --from=iceberg-builder /build/duckdb-iceberg/build/release/extension/postgres_scanner/postgres_scanner.duckdb_extension /opt/coldfront/iceberg/postgres_scanner.duckdb_extension
 # DuckDB's platform string is linux_ with the SAME arch tokens buildx uses
 # (amd64/arm64), so the extension dir is just linux_${TARGETARCH}. The entrypoint
-# places the iceberg/avro/azure extensions under .../v1.5.3/$COLDFRONT_DUCKDB_PLATFORM.
-ENV COLDFRONT_DUCKDB_VERSION=v1.5.3 \
+# places the iceberg/avro/azure extensions under .../v1.5.4/$COLDFRONT_DUCKDB_PLATFORM.
+ENV COLDFRONT_DUCKDB_VERSION=v1.5.4 \
     COLDFRONT_DUCKDB_PLATFORM=linux_${TARGETARCH}

 COPY docker/entrypoint.sh /usr/local/bin/coldfront-entrypoint.sh
@@ -201,7 +208,7 @@ ENV PG_MAJOR=${PG_MAJOR}
 LABEL org.opencontainers.image.title="coldfront-duckdb-base" \
     org.opencontainers.image.vendor="pgEdge" \
     org.opencontainers.image.source="https://github.com/pgEdge/ColdFront" \
-    org.opencontainers.image.description="ColdFront DuckDB 1.5.3 base (pg_duckdb PR#1025 + patched duckdb-iceberg with the bakery-aware commit-refresh patch)."
+    org.opencontainers.image.description="ColdFront DuckDB 1.5.4 base (pg_duckdb PR#1025 + patched duckdb-iceberg with the bakery-aware commit-refresh patch)."

 USER postgres
 ENTRYPOINT ["/usr/local/bin/coldfront-entrypoint.sh"]
diff --git a/docker/entrypoint.sh b/docker/entrypoint.sh
index a6a9d8b..cb4636f 100755
--- a/docker/entrypoint.sh
+++ b/docker/entrypoint.sh
@@ -133,7 +133,7 @@ EOF
         aarch64) _cf_default_platform=linux_arm64 ;;
         *) _cf_default_platform="linux_$(uname -m)" ;;
     esac
-    EXTDIR="$PGDATA/pg_duckdb/extensions/${COLDFRONT_DUCKDB_VERSION:-v1.5.3}/${COLDFRONT_DUCKDB_PLATFORM:-$_cf_default_platform}"
+    EXTDIR="$PGDATA/pg_duckdb/extensions/${COLDFRONT_DUCKDB_VERSION:-v1.5.4}/${COLDFRONT_DUCKDB_PLATFORM:-$_cf_default_platform}"
     mkdir -p "$EXTDIR"
     cp /opt/coldfront/iceberg/iceberg.duckdb_extension "$EXTDIR/iceberg.duckdb_extension"
     cp /opt/coldfront/iceberg/avro.duckdb_extension "$EXTDIR/avro.duckdb_extension"
diff --git a/docker/iceberg-azure-extension-config-v15.cmake b/docker/iceberg-azure-extension-config-v15.cmake
index 94c18c2..4783660 100644
--- a/docker/iceberg-azure-extension-config-v15.cmake
+++ b/docker/iceberg-azure-extension-config-v15.cmake
@@ -1,7 +1,7 @@
 # extension_config.cmake for the DuckDB 1.5.x ColdFront build (Azure ADLS cold
 # tier). Builds iceberg + avro + azure against ONE DuckDB (iceberg's
 # v1.5-variegata submodule) so all three .duckdb_extension files share one ABI
-# and load together in pg_duckdb 1.5.3. See DUCKDB_1.5_PATCHED.md for the why.
+# and load together in pg_duckdb 1.5.4. See DUCKDB_1.5_PATCHED.md for the why.
 #
 # This config only SELECTS which extensions to build; the bakery-aware-commit-
 # refresh patch is applied separately by docker/Dockerfile.duckdb15-base
@@ -23,7 +23,7 @@ duckdb_extension_load(postgres_scanner
     GIT_URL https://github.com/duckdb/duckdb-postgres
     # The 'postgres' extension (pglocal write path: DuckDB reads PG tables to
     # stream into Iceberg). Built here so it is SHIPPED in the image and never
-    # downloaded at runtime — extensions.duckdb.org has no reliably-cached v1.5.3
+    # downloaded at runtime — extensions.duckdb.org has no reliably-cached v1.5.4
     # build, and install_extension would block on the network. This is the EXACT
     # commit + submodule DuckDB 14eca11b pins for postgres_scanner
     # (duckdb/.github/config/extensions/postgres_scanner.cmake) — guaranteed
diff --git a/docs/architecture.md b/docs/architecture.md
index 8dc6650..c55c78a 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -109,7 +109,7 @@ The following table describes each component, its role, and its license:
 | Component | Role | License |
 |-----------|------|---------|
 | PostgreSQL 16+ | Heap storage; range partitioning for the tiered hot tier. Works uniformly on PG 16, 17, and 18 - the cold-tier secret is a DuckDB persistent secret loaded at instance init, with no version-gated mechanism. | PostgreSQL |
-| pg_duckdb | DuckDB in-process. Iceberg read + write. Analytics. pg_duckdb 1.5.3 (PR #1025). The `duckdb-iceberg` carries the bakery-aware commit-refresh patch (async parquet overlap, no 409); see [Cold-write strategy](#cold-write-strategy-stock-vs-patched-duckdb-iceberg). | MIT |
+| pg_duckdb | DuckDB in-process. Iceberg read + write. Analytics. pg_duckdb 1.5.4 (PR #1025). The `duckdb-iceberg` carries the bakery-aware commit-refresh patch (async parquet overlap, no 409); see [Cold-write strategy](#cold-write-strategy-stock-vs-patched-duckdb-iceberg). | MIT |
 | coldfront | PGXS C extension. `post_parse_analyze_hook` rewrites INSERT/UPDATE/DELETE on registered views to the correct tier; `ProcessUtility_hook` handles DDL; the hook lazily ATTACHes the Iceberg catalog on the first query touching a tiered view. | PostgreSQL |
 | Lakekeeper | Iceberg REST catalog. Single Rust binary. | Apache 2.0 |
 | S3-compatible store | Any: SeaweedFS, MinIO, GCS, Azure Blob, etc. | Varies |
@@ -537,7 +537,7 @@ S3-compatible store.
 Both extensions must be in `_PG_init`, which fires at backend start.

 A two-layer image serves every deployment: a prebuilt **base**
-(`docker/Dockerfile.duckdb15-base`) carrying pg_duckdb 1.5.3 (PR #1025) +
+(`docker/Dockerfile.duckdb15-base`) carrying pg_duckdb 1.5.4 (PR #1025) +
 the patched duckdb-iceberg on a pgEdge `*-spock5-minimal` base (Spock +
 Snowflake), and a thin **app** layer (`docker/Dockerfile.duckdb15`,
 `--build-arg PG_MAJOR=16|17|18`) that compiles coldfront on top. The same
diff --git a/docs/installation.md b/docs/installation.md
index 2d3138e..429897e 100644
--- a/docs/installation.md
+++ b/docs/installation.md
@@ -6,7 +6,7 @@
 > yourself, in Docker or bare-metal.

 ColdFront runs on a **DuckDB 1.5.x** stack: PostgreSQL + pg_duckdb
-(DuckDB 1.5.3) and a **patched** duckdb-iceberg that carries ColdFront's
+(DuckDB 1.5.4) and a **patched** duckdb-iceberg that carries ColdFront's
 four patches - the bakery-aware commit-refresh patch (the no-409 guarantee
 for concurrent cold-tier writers) and three strict-reader interop
 patches (so apache/iceberg-go, the cold-tier compactor, can read
@@ -24,8 +24,8 @@ components:

 | Component | Source |
 |---|---|
-| libcurl 8.12.0 | `curl.se`, built from source (compile-time dep of DuckDB 1.5.3 httpfs; needs curl >= 7.77, the pgEdge base ships 7.76.1) |
-| pg_duckdb (DuckDB 1.5.3) | `github.com/duckdb/pg_duckdb`, PR #1025 |
+| libcurl 8.12.0 | `curl.se`, built from source (compile-time dep of DuckDB 1.5.4 httpfs; needs curl >= 7.77, the pgEdge base ships 7.76.1) |
+| pg_duckdb (DuckDB 1.5.4) | `github.com/duckdb/pg_duckdb`, PR #1025 |
 | duckdb-iceberg | `github.com/duckdb/duckdb-iceberg`, `v1.5-variegata` @ `0fad545a` |
 | vcpkg | `github.com/microsoft/vcpkg` |

@@ -63,7 +63,7 @@ Build the stack in two stages, the prebuilt base and the thin app layer:
 git clone && cd coldfront

 # 1. Build the base (fetches the deps above, applies our patches, compiles
-#    pg_duckdb 1.5.3 + the patched duckdb-iceberg). ~30–60 min,
+#    pg_duckdb 1.5.4 + the patched duckdb-iceberg). ~30–60 min,
 #    needs network + a few GB of disk/RAM. Repeat with =16 / =17 for those majors.
 docker build -f docker/Dockerfile.duckdb15-base --build-arg PG_MAJOR=18 \
   -t ghcr.io/pgedge/coldfront-duckdb-base:pg18 .
@@ -75,7 +75,7 @@ docker compose up -d --build # end-user single-node stack (ports published)
 ```

 The split keeps app builds fast and always testing current source: the
-expensive, stable compiles (pg_duckdb 1.5.3 + the patched duckdb-iceberg)
+expensive, stable compiles (pg_duckdb 1.5.4 + the patched duckdb-iceberg)
 live in the prebuilt **base**, published to
 `ghcr.io/pgedge/coldfront-duckdb-base:pg{16,17,18}`; the **app** build
 ([`docker/Dockerfile.duckdb15`](https://github.com/pgEdge/ColdFront/blob/main/docker/Dockerfile.duckdb15)) just `FROM`s it and
@@ -89,10 +89,9 @@ base-image.yml`) when its inputs change.
 Then follow [usage.md → One-time setup](usage.md#one-time-setup) (bootstrap
 Lakekeeper → create a table → tier → verify).

-> **Pin pg_duckdb for reproducible builds.** The base pins pg_duckdb to
-> `pull/1025/head` (a moving, unreleased PR ref). For reproducible
-> builds, pin it to a specific commit SHA (or the eventual DuckDB-1.5.x
-> release) instead of the live PR head.
+> **pg_duckdb pin.** The base pins pg_duckdb to the merged PR #1025 commit
+> `c04e6a2` (DuckDB 1.5.4 — its duckdb submodule is the v1.5.4 tag), a fixed
+> commit for reproducible builds rather than a moving PR head.
 >
 > **Base foundation.** The base is
 > `FROM ghcr.io/pgedge/pgedge-postgres:-spock5-minimal`; you need pull
@@ -176,7 +175,7 @@ cd extension/coldfront
 make && make install # needs pg_config + PG server dev headers on PATH
 ```

-You separately need pg_duckdb (DuckDB 1.5.3) and the **patched** iceberg
+You separately need pg_duckdb (DuckDB 1.5.4) and the **patched** iceberg
 DuckDB extension installed in your PostgreSQL - follow the compile steps in
 [`docker/Dockerfile.duckdb15-base`](https://github.com/pgEdge/ColdFront/blob/main/docker/Dockerfile.duckdb15-base) - plus, in