Skip to content

Drop vestigial sort_value column from website_publication_keywords (schema drift) #1363

Description

@jonfroehlich

Summary

website_publication_keywords (the Publication↔Keyword M2M through table) carries a leftover sort_value column — a relic from when keywords was a SortedManyToManyField. It's now a plain ManyToManyField in the model, so the column is vestigial and should be dropped to make the schema match the model across all environments.

Surfaced while fixing the keyword-merge action (#1352 / PR #1362). The merge itself is already fixed (it now repoints existing rows instead of inserting), so this issue is the deeper cleanup, not urgent — but it also fixes a separate latent bug (see below).

Evidence it's a legacy artifact (from the prod dump, 2026-06-14)

  • Of the six website_*_keywords through tables, only website_publication_keywords has a sort_value column. The other five (grant/poster/project/projectumbrella/talk) have none — confirming a plain M2M never creates it.
  • In that table's data: 986 of 1,215 rows have sequential sort_values (1..N per publication) — the SortedManyToManyField signature — and 229 are NULL (newer plain .add()s).
  • The genuinely-sorted tables (authors, project_umbrellas, award recipients) all show sort_value integer NOT NULL, as expected.

Environment drift (important)

The two environments disagree on this one column's constraint (independent gitignored per-env migration histories):

  • prod: sort_value is nullable → a plain .add() inserting NULL succeeds.
  • -test: sort_value is NOT NULL → a plain .add() 500s with null value in column "sort_value" ... violates not-null constraint.

Latent bug this fixes

On -test, any INSERT into website_publication_keywords fails — that includes editing a Publication's keywords in the admin and saving, not just the merge action. Masked on prod only because prod's column is nullable. Dropping the column fixes this everywhere.

  • Confirm the latent bug: edit a Publication on -test, change a keyword, save → expect a 500 today.

Proposed approach

Django won't auto-generate the drop (the model is already a plain M2M with no sort_value), so it needs an explicit, idempotent migration:

  • A migrations.RunSQL (or schema_editor) op that drops the column only if it exists, e.g. ALTER TABLE website_publication_keywords DROP COLUMN IF EXISTS sort_value; — safe on the five envs/tables that never had it and on prod (nullable) and -test (NOT NULL) alike.
  • Wrap with a no-op reverse.
  • Mind the deploy constraints: migrations run inside the container on each deploy via docker-entrypoint.sh, and there's no shell on prod to fix a half-applied migration — so the op must be idempotent and not fail on any current env state. DROP COLUMN IF EXISTS satisfies this.
  • Verify locally first (the model-built settings_test DB won't have the column, so also test against a DB that does — add the column, run the migration, confirm clean).

Acceptance

  • Idempotent migration drops sort_value from website_publication_keywords where present
  • No-op / safe on tables and environments that never had the column
  • Editing a Publication's keywords in the admin works on -test after deploy
  • Keyword-merge (Fix keyword-merge 500 on legacy sort_value column (#1352) #1362) still works (it's schema-agnostic, but re-verify)

Context: PR #1362, #1352. See repipoint fix rationale in PR #1362's discussion.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions