Re-standardize legacy talk/poster/pub filenames that were never renamed #1401

Description

@jonfroehlich

Background

Investigating the 2026-06-14 prod dump while working on #1391 surfaced that many artifact files on production were never renamed to the standardized Author_TitleInTitleCase_VenueYear scheme. The non-standard names are the original uploads (artifact title with spaces→underscores via Django's get_valid_filename, plus Django's random _xxxxxxx collision suffix).
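To make the mismatch concrete, here is a rough plain-Python sketch of how such an upload name comes about and how it might be told apart from a standardized one. `django_like_valid_filename` and `add_collision_suffix` only approximate what Django's `get_valid_filename` and its storage collision handling do. `looks_standardized`, its regex, and the example filenames are assumptions about the Author_TitleInTitleCase_VenueYear scheme, not the project's actual `Artifact.generate_filename()` logic.

```python
import random
import re
import string
from pathlib import PurePosixPath


def django_like_valid_filename(name: str) -> str:
    """Approximate get_valid_filename: spaces become underscores, unsafe chars are dropped."""
    s = name.strip().replace(" ", "_")
    return re.sub(r"[^-\w.]", "", s)


def add_collision_suffix(filename: str, rng: random.Random) -> str:
    """Approximate the 7-character random suffix added when the name is already taken."""
    path = PurePosixPath(filename)
    alphabet = string.ascii_letters + string.digits
    suffix = "".join(rng.choice(alphabet) for _ in range(7))
    return f"{path.stem}_{suffix}{path.suffix}"


# ASSUMPTION: a standardized basename is Author_Title_VenueYYYY.ext with exactly
# two underscores. The real scheme may allow more than this heuristic accepts.
STANDARD_RE = re.compile(r"[A-Z][A-Za-z-]*_[A-Za-z0-9]+_[A-Za-z]+\d{4}\.\w+")


def looks_standardized(basename: str) -> bool:
    """Heuristic check only; comparing against generate_filename() is the reliable test."""
    return STANDARD_RE.fullmatch(basename) is not None


uploaded = django_like_valid_filename("My Great Talk.pdf")   # hypothetical title
collided = add_collision_suffix(uploaded, random.Random(0))  # second upload, same name
print(uploaded, collided, looks_standardized(collided))
print(looks_standardized("Froehlich_MyGreatTalk_CHI2019.pdf"))  # hypothetical standard name
```

A pattern like this can misfire, for example on a two-word title whose random suffix happens to end in four digits. It is best treated as a quick way to eyeball a dump rather than as the rename criterion.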

Approximate counts (2026-06-14 dump):

  • Talks (187): pdf ~76 non-standard, raw ~148/163 non-standard
  • Publications (227): pdf ~12 non-standard (no raw files)
  • Posters (9): all standardized

Why they were never renamed (two compounding causes):

  1. The rename only runs in Artifact.save() via the authors_changed m2m signal, which fires on admin add/edit. Bulk-imported rows never went through an authored save(), so the rename never triggered.
  2. The fix-up one-shots rename_talk_files / rename_person_images are commented out of docker-entrypoint.sh (~lines 167–174), and rename_poster_files was never wired in at all. Evidence of a partial historical run: the same talk can have its pdf renamed (Froehlich_…) but its raw_file not.

Prerequisite — must land first

#1391 (PR #1400) captures the original upload filename into original_pdf_filename / original_raw_filename and backfills the never-renamed rows on container start. This issue is blocked until that change has deployed and finished backfilling on prod: re-standardizing a file destroys its original on-disk name, so the provenance must be recorded first.

Proposed work

  1. After #1391 (Store original uploaded filename and show it (admin-only) for talks/posters/publications) has deployed and backfilled prod, write or repair a one-shot management command (or fix and re-enable rename_talk_files / rename_poster_files) that renames any artifact whose file basename ≠ Artifact.generate_filename() to the standardized name, on disk and in the DB together. The existing rename_artifact_in_db_and_filesystem / Artifact.save() path keeps the two in sync; verify this leg for raw_file specifically, since raw is overwhelmingly the unrenamed case.
  2. Confirm the raw_file rename leg actually works end-to-end (the dump suggests it historically did not).
  3. Run on test first, verify via logs, then prod via the entrypoint one-shot pattern (no shell access to prod).
  4. Mind serve_pdf's fuzzy filename matching so any external links to the old names still resolve after rename.
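As a starting point for the one-shot, the core loop could look roughly like the sketch below. It is not the existing rename_talk_files or rename_artifact_in_db_and_filesystem code. `restandardize`, the `records` dicts (standing in for model rows, with `expected` playing the role of `Artifact.generate_filename()`), and `media_root` are all hypothetical names. The point is the shape: skip rows that already match, never overwrite an existing target, log every decision, and default to a dry run.

```python
import logging
from pathlib import Path

logger = logging.getLogger("restandardize_sketch")


def restandardize(records, media_root: Path, dry_run: bool = True):
    """Rename files whose basename differs from the expected standardized name.

    records: list of dicts with 'name' (current path relative to media_root)
    and 'expected' (the standardized basename). Returns the (old, new) pairs
    that were renamed, or that would be renamed when dry_run is True.
    """
    renamed = []
    for rec in records:
        old_rel = Path(rec["name"])
        if old_rel.name == rec["expected"]:
            continue  # already standardized, so a re-run is a no-op
        new_rel = old_rel.with_name(rec["expected"])
        old_abs, new_abs = media_root / old_rel, media_root / new_rel
        if not old_abs.exists():
            logger.warning("missing on disk, skipping: %s", old_rel)
            continue
        if new_abs.exists():
            logger.warning("target already exists, skipping: %s", new_rel)
            continue
        verb = "would rename" if dry_run else "renaming"
        logger.info("%s %s -> %s", verb, old_rel, new_rel)
        if not dry_run:
            old_abs.rename(new_abs)
            # Stand-in for updating the model's file field name and saving it.
            rec["name"] = new_rel.as_posix()
        renamed.append((old_rel.as_posix(), new_rel.as_posix()))
    return renamed
```

Running it once with `dry_run=True` on test gives a log of planned renames to review. A second real run right after the first should return an empty list, which is a cheap idempotence check. In the actual command, the disk rename and the DB update would presumably need to be ordered so that a failure in one can be rolled back or retried. That part is not shown here.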

Risk

Medium — this rewrites file names on disk + DB on prod. Must be idempotent, logged, and verified on -test before prod. The #1391 capture is the safety net (original names are preserved regardless).

Spun out of #1391.
