Skip to content

[SYSTEMDS-3949] Add native Delta Lake matrix read/write via Delta Kernel#2511

Merged
Baunsgaard merged 3 commits into
apache:mainfrom
Baunsgaard:delta-matrix-io
Jun 28, 2026
Merged

[SYSTEMDS-3949] Add native Delta Lake matrix read/write via Delta Kernel#2511
Baunsgaard merged 3 commits into
apache:mainfrom
Baunsgaard:delta-matrix-io

Conversation

@Baunsgaard

Copy link
Copy Markdown
Contributor

Introduce a DELTA file format that reads and writes Delta Lake tables natively through the Spark-free Delta Kernel library, for matrices on the single-node CP path. DML read/write with format="delta" now operates directly on Delta tables without a Spark DataFrame round-trip.

  • Add FileFormat.DELTA and exclude it from the text formats
  • Accept format="delta" with unknown dimensions in the parser (like CSV) and set blocksize -1 for the columnar format
  • Wire DELTA into the matrix reader and writer factories
  • Add DeltaKernelUtils plus ReaderDelta and WriterDelta with column-at-a-time, boxing-free data transfer
  • Refresh cached matrix metadata after a Delta read (discovered dimensions)
  • Pin parquet 1.13.1 and jackson core/annotations to 2.15.2 to align with Delta Kernel's transitive requirements

Append/overwrite table semantics, distributed execution, frames, and time travel are out of scope.

@codecov

codecov Bot commented Jun 25, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 93.86667% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.60%. Comparing base (384a8dc) to head (55f5e92).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
.../java/org/apache/sysds/runtime/io/WriterDelta.java 86.56% 4 Missing and 5 partials ⚠️
...g/apache/sysds/runtime/io/ReaderDeltaParallel.java 92.85% 1 Missing and 5 partials ⚠️
.../org/apache/sysds/runtime/io/DeltaKernelUtils.java 96.21% 0 Missing and 5 partials ⚠️
...n/java/org/apache/sysds/parser/DataExpression.java 83.33% 0 Missing and 1 partial ⚠️
...g/apache/sysds/runtime/io/MatrixReaderFactory.java 75.00% 0 Missing and 1 partial ⚠️
.../java/org/apache/sysds/runtime/io/ReaderDelta.java 98.61% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2511      +/-   ##
============================================
+ Coverage     71.56%   71.60%   +0.04%     
- Complexity    49110    49250     +140     
============================================
  Files          1575     1579       +4     
  Lines        189793   190286     +493     
  Branches      37235    37319      +84     
============================================
+ Hits         135816   136246     +430     
- Misses        43480    43536      +56     
- Partials      10497    10504       +7     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Baunsgaard

Copy link
Copy Markdown
Contributor Author

Native Delta read performance vs binary reader

Benchmark (org.apache.sysds.performance DeltaIO), 32 vCPU, JDK 17, -Xmx16g,
20 timed repetitions after warmup, isolated (no competing load).

Workload: 5,000,000 × 16 dense FP64 matrix (~640 MB in memory, sparsity 1.0).

Read path Time (ms) Disk throughput
Delta (serial) 5411 ± 79 79 MB/s
Delta (parallel) 875 ± 19 490 MB/s
Binary (parallel) ~100–170 3.7–6.0 GB/s

Takeaways

  • The parallel reader decodes data files concurrently and is ~6.2× faster than the
    serial Delta reader
    (5411 ms → 875 ms).
  • The binary reader is still ~5–9× faster than parallel Delta. This is expected:
    binary is a near-raw double[] dump, while Delta Kernel 3.3.0 decodes parquet
    per-value (row-by-row converters, not vectorized). File-level parallelism is the
    main lever available, which is what this reader exploits.
  • On disk Delta is smaller: 450 MB vs 645 MB for binary (0.70×), thanks to parquet
    encoding — so the read-time cost buys a ~30% smaller footprint.

For reference, the write path is encode-bound: Delta write ~7.0 s vs binary ~0.15 s
(parquet encoding dominates); not the focus of this PR but noted for completeness.

Introduce a DELTA file format that reads and writes Delta Lake tables
natively through the Spark-free Delta Kernel library, for matrices on the
single-node CP path. DML read/write with format="delta" now operates
directly on Delta tables without a Spark DataFrame round-trip.

- Add FileFormat.DELTA and exclude it from the text formats
- Accept format="delta" with unknown dimensions in the parser and set
  blocksize -1 for the columnar format
- Wire DELTA into the matrix reader and writer factories
- Add DeltaKernelUtils plus serial and parallel native Delta readers and
  WriterDelta with column-at-a-time, boxing-free data transfer
- Expose Delta reader batch size and writer target file size via DMLConfig
- Refresh cached matrix metadata after a Delta read (discovered dimensions)
- Add a parquet.version property and pin delta-kernel 3.3.2
- Run Delta component IO tests in CI and broaden matrix coverage

Append/overwrite table semantics, distributed execution, frames, and time
travel are out of scope.
- Bound each per-file decode task in the direct parallel read path to its
  numRecords-derived row slice, so a Delta file that decodes more rows than
  its statistic claims fails with a clear error instead of overflowing into
  the next file's region (concurrent overlapping writes) or off the array.
- Use the parallel recomputeNonZeros(_numThreads) in the buffered read path
  to match the direct path; the buffered fallback handles the largest matrices.
- Add a test-only dependency on Delta's Spark connector (delta-spark) and a
  cross-engine interop suite: SystemDS-written tables are read back by the
  reference Delta/Spark engine, and Spark-written tables (multi-file, plus a
  deletion-vector + second-commit layout the SystemDS writer never emits) are
  read back by both the serial and parallel SystemDS readers. All comparisons
  are keyed by an id column since neither engine guarantees row order.
- Add targeted coverage for the Delta read/write defensive branches the
  round-trip tests miss: unresolvable table paths, unsupported column types,
  unsupported stream operations, malformed/absent per-file statistics (mocked
  scan rows), the non-contiguous dense fill fallback, a failed parallel
  per-file decode, and the sparse-format writer input path.
@Baunsgaard Baunsgaard merged commit e53b5c0 into apache:main Jun 28, 2026
50 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in SystemDS PR Queue Jun 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

1 participant