Skip to content

fix: compose successive batch_slice_columns calls (#186)#223

Merged
Fivell merged 1 commit into
masterfrom
fix/batch-slice-columns-successive-calls
Jun 15, 2026
Merged

fix: compose successive batch_slice_columns calls (#186)#223
Fivell merged 1 commit into
masterfrom
fix/batch-slice-columns-successive-calls

Conversation

@Fivell

@Fivell Fivell commented Jun 15, 2026

Copy link
Copy Markdown
Member

Closes #186.

Problem

batch_slice_columns memoized a single @use_indexes for the whole import (via unless defined?(@use_indexes)). The memoization exists so the same slice is replayed for every batch, but it also means a second call reuses the first call's original column positions — applied to rows that the first call already sliced. As the reporter put it, this "ends up replacing the values unexpectedly."

Reproduction (Birthday,Name,Last name fixture):

before_batch_import: ->(importer) {
  importer.batch_slice_columns(%w[name last_name])  # 3 cols -> 2
  importer.batch_slice_columns(%w[name])            # should be 2 -> 1
}

Before the fix the second call replayed indices [1, 2] against the now 2‑column rows, so name received the last_name value — ["Doe", nil, nil] instead of ["John", nil, nil].

Fix

Store one index set per call (a small array of slices) instead of a single @use_indexes. Each entry is computed against the columns present at that point in the chain, memoized on the first batch, and replayed for later batches; batch_import resets the per-batch cursor so the pipeline restarts each batch. Successive calls now progressively reduce the column set, each operating on the previous call's result — exactly the behaviour the issue asks for.

Single-call usage (including across multiple batches) is unchanged.

Tests

New with successive batch_slice_columns calls context covering one batch and multiple batches. It fails on the old code (["Doe", nil, nil]) and passes on the fix.

Note: that test uses validate: false to isolate the slicing from the model's validates_uniqueness_of :last_name — with two NULL last_names across separate batches the validator would reject the second author, which is unrelated to the slicing behaviour under test.

Full suite green locally on Ruby 3.3 / Rails 8.1.3 / AA 3.5.1 / SQLite — 63 examples, 0 failures.

@Fivell Fivell force-pushed the fix/batch-slice-columns-successive-calls branch from 61f8367 to 8b7b394 Compare June 15, 2026 11:01
batch_slice_columns memoized a single index set for the whole import, so a
second call reused the first call's column positions against already-sliced
rows — corrupting values instead of narrowing further (e.g. the name column
received the last_name).

Reset the working headers to the parsed source set at the start of each batch,
so each call computes indices from the current headers and successive calls
narrow the previous call's result. @source_headers holds the complete parsed
header row; @headers is its per-batch working copy. Cover successive calls
across one and multiple batches.
@Fivell Fivell force-pushed the fix/batch-slice-columns-successive-calls branch from 8b7b394 to 2fc2c03 Compare June 15, 2026 11:07
@Fivell Fivell merged commit bc5ad3d into master Jun 15, 2026
38 checks passed
@Fivell Fivell mentioned this pull request Jun 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Calling batch_slice_columns multiple times results in unexpected behavior

1 participant