Background
Investigating the 2026-06-14 prod dump while working on #1391 surfaced that many artifact files on production were never renamed to the standardized Author_TitleInTitleCase_VenueYear scheme. The non-standard names are the original uploads (artifact title with spaces→underscores via Django's get_valid_filename, plus Django's random _xxxxxxx collision suffix).
Approximate counts (2026-06-14 dump):
Talks (187): pdf ~76 non-standard, raw ~148/163 non-standard
Publications (227): pdf ~12 non-standard (no raw files)
Posters (9): all standardized
Why they were never renamed (two compounding causes):
The rename only runs in Artifact.save() via the authors_changed m2m signal, which fires on admin add/edit. Bulk-imported rows never went through an authored save(), so the rename never triggered.
The fix-up one-shots rename_talk_files / rename_person_images are commented out of docker-entrypoint.sh (~lines 167–174), and rename_poster_files was never wired in at all. Evidence of a partial historical run: the same talk can have its pdf renamed (Froehlich_…) but its raw_file not.
Prerequisite — must land first
#1391 (PR #1400) captures the original upload filename into original_pdf_filename / original_raw_filename and backfills the never-renamed rows on container start. This issue is blocked on that deploying + backfilling on prod, because re-standardizing a file destroys its original on-disk name — we want the provenance recorded first.
Proposed work
After #1391 (Store original uploaded filename and show it (admin-only) for talks/posters/publications) has deployed and backfilled prod, write or repair a one-shot management command (or fix and re-enable rename_talk_files / rename_poster_files) that renames any artifact whose file basename ≠ Artifact.generate_filename() to the standardized name, on disk and in the DB together. The existing rename_artifact_in_db_and_filesystem / Artifact.save() path keeps them in sync; verify this leg for raw_file specifically, since raw is overwhelmingly the unrenamed case.
Confirm the raw_file rename leg actually works end-to-end (the dump suggests it historically did not).
Run on test first, verify via logs, then prod via the entrypoint one-shot pattern (no shell access to prod).
Mind serve_pdf's fuzzy filename matching so any external links to the old names still resolve after rename.
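The rename pass described above could be sketched roughly as follows. This is a standalone illustration and not the repository's actual command: FileRecord and rename_if_needed are hypothetical names, expected_name stands in for whatever Artifact.generate_filename() returns (assumed here to include the extension), and the DB update is reduced to mutating the record in memory. A real command would save the model field in the same step as the on-disk rename.

```python
import logging
import os
from dataclasses import dataclass

logger = logging.getLogger("rename_sketch")


@dataclass
class FileRecord:
    """Hypothetical stand-in for one file field (pdf or raw) on an artifact row."""
    path: str           # current on-disk path recorded for the file
    expected_name: str  # assumed output of Artifact.generate_filename(), with extension


def rename_if_needed(record: FileRecord, dry_run: bool = True) -> bool:
    """Return True if a rename was performed (or would be, in dry-run mode)."""
    directory, current = os.path.split(record.path)
    if current == record.expected_name:
        return False  # already standardized, so a repeat run is a no-op

    target = os.path.join(directory, record.expected_name)
    if not os.path.exists(record.path):
        logger.warning("missing on disk, skipping: %s", record.path)
        return False
    if os.path.exists(target):
        # Never overwrite an existing file; leave the collision for manual review.
        logger.warning("target already exists, skipping: %s", target)
        return False

    logger.info("%s %s -> %s", "would rename" if dry_run else "renaming",
                record.path, target)
    if not dry_run:
        os.rename(record.path, target)
        record.path = target  # a real command would persist this to the DB here
    return True
```

Defaulting to dry_run=True means the first pass on -test only logs the planned renames, which fits the "verify via logs" step. The early return on an already-matching basename is what makes a second run safe.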
Risk
Medium — this rewrites file names on disk + DB on prod. Must be idempotent, logged, and verified on -test before prod. The #1391 capture is the safety net (original names are preserved regardless).
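One way to check on -test that disk and DB stayed in sync after a run is to compare the paths recorded in the DB against what is actually on disk. The sketch below assumes the DB paths have already been exported as a list of media-relative strings with forward slashes; check_sync and its arguments are hypothetical names, not existing project code.

```python
import os


def check_sync(db_paths, media_root):
    """Compare DB-recorded relative paths with the files under media_root.

    Returns (missing, orphans): paths the DB references that are not on disk,
    and files on disk that no DB row references.
    """
    on_disk = set()
    for dirpath, _dirnames, filenames in os.walk(media_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, media_root)
            on_disk.add(rel.replace(os.sep, "/"))  # normalize to forward slashes

    recorded = set(db_paths)
    return sorted(recorded - on_disk), sorted(on_disk - recorded)
```

An empty missing list after the rename pass would suggest that every renamed file, raw_file included, had its DB field updated along with the on-disk name. Orphans would point at files whose rows were renamed in the DB only, or that were never referenced.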
Spun out of #1391.