Skip to content

Harden backfill_original_filenames against per-row failures (#1391)#1402

Merged
jonfroehlich merged 1 commit into
masterfrom
1391-backfill-resilience
Jun 25, 2026
Merged

Harden backfill_original_filenames against per-row failures (#1391)#1402
jonfroehlich merged 1 commit into
masterfrom
1391-backfill-resilience

Conversation

@jonfroehlich

Copy link
Copy Markdown
Member

Follow-up hardening for #1391 (no observed prod failure — defensive insurance).

Why

backfill_original_filenames runs on every container start over the whole artifact dataset, but had no per-row error isolation. A single malformed row would propagate its exception out of the command and abort the entire run — and because docker-entrypoint.sh has no set -e, that fails silently (the traceback prints, but startup continues to runserver), leaving every row's original_*_filename unset. The most likely trigger is a null date or title, which makes Artifact.generate_filename() raise (date.year / title.title()).

To be clear: the 2.25.0 prod backfill worked — legacy rows (e.g. the Mauriello talk) are populated correctly. This just makes a deploy-time batch job robust against future bad data.

Changes

  • Each (artifact, file field) pair is processed inside its own try/except; on error it logs (with _logger.exception) and counts the row, then continues. Row logic extracted into _backfill_row().
  • Summary log line now reports an errored-row count.
  • Regression test (test_one_bad_row_does_not_abort_the_batch): a row with a null date (so generate_filename raises) no longer aborts the run, and a good legacy row in the same batch is still backfilled.

Tests

Full suite green locally: 552 tests, OK (8 skipped).

Note

The underlying fragility is in get_filename_without_ext_for_artifact (no None-guard on date/title), which the rename path also calls. Hardening that shared helper is better scoped to #1401 (the legacy re-rename work); this PR isolates the batch backfill instead.

🤖 Generated with Claude Code

The backfill runs on every container start over the whole artifact dataset.
It had no per-row error isolation, so a single malformed row would propagate
its exception out of the command and abort the entire run — and since
docker-entrypoint.sh has no `set -e`, that would fail silently (traceback
prints, startup continues to runserver), leaving every row's original_*_filename
unset. The most likely trigger is a null `date` or `title`, which makes
Artifact.generate_filename() raise (date.year / title.title()).

This is defensive insurance, not a fix for an observed failure — the 2.25.0
prod backfill did populate the legacy rows correctly.

- Process each (artifact, file field) pair inside its own try/except; log and
  count the row, then continue. Extract the row logic into _backfill_row().
- Surface an errored-row count in the summary log line.
- Regression test: a row with a null date (generate_filename raises) no longer
  aborts the batch; a good legacy row in the same run is still backfilled.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jonfroehlich jonfroehlich merged commit 15be9d1 into master Jun 25, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant