fix: compose successive batch_slice_columns calls (#186)#223
Merged
Conversation
61f8367 to
8b7b394
Compare
batch_slice_columns memoized a single index set for the whole import, so a second call reused the first call's column positions against already-sliced rows — corrupting values instead of narrowing further (e.g. the name column received the last_name). Reset the working headers to the parsed source set at the start of each batch, so each call computes indices from the current headers and successive calls narrow the previous call's result. @source_headers holds the complete parsed header row; @headers is its per-batch working copy. Cover successive calls across one and multiple batches.
8b7b394 to
2fc2c03
Compare
Merged
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #186.
Problem
batch_slice_columnsmemoized a single@use_indexesfor the whole import (viaunless defined?(@use_indexes)). The memoization exists so the same slice is replayed for every batch, but it also means a second call reuses the first call's original column positions — applied to rows that the first call already sliced. As the reporter put it, this "ends up replacing the values unexpectedly."Reproduction (
Birthday,Name,Last namefixture):Before the fix the second call replayed indices
[1, 2]against the now 2‑column rows, sonamereceived thelast_namevalue —["Doe", nil, nil]instead of["John", nil, nil].Fix
Store one index set per call (a small array of slices) instead of a single
@use_indexes. Each entry is computed against the columns present at that point in the chain, memoized on the first batch, and replayed for later batches;batch_importresets the per-batch cursor so the pipeline restarts each batch. Successive calls now progressively reduce the column set, each operating on the previous call's result — exactly the behaviour the issue asks for.Single-call usage (including across multiple batches) is unchanged.
Tests
New
with successive batch_slice_columns callscontext covering one batch and multiple batches. It fails on the old code (["Doe", nil, nil]) and passes on the fix.Full suite green locally on Ruby 3.3 / Rails 8.1.3 / AA 3.5.1 / SQLite — 63 examples, 0 failures.