Production: OC sync (#272) + #277/#283 fixes → isamples_202608 #284

Merged
rdhyee merged 1 commit into isamplesorg:main from rdhyee:promote/oc-sync-202608 on Jun 14, 2026

@rdhyee (Contributor) commented Jun 14, 2026

Production cutover: OC sync (#272) + search/facet fixes (#277/#283) → isamples_202608

Points the Interactive Explorer at the 202608 data build and adds the reproducible pipeline that produced it. The data files are already on R2 (data.isamples.org, uniquely named) — this is the explorer cutover + provenance.

Verified superior to 202606 with zero regressions

  • OpenContext synced to @ekansa's current export: +67,187 new records (incl. Tall al-ʿUmayri Jordan lithics) / −21,227 stale Murlo re-IDs (his Option B). OC count 1,110,791 (= his wide); material/1.0/rock 37,953 (= his number)
  • SESAR / GEOME / Smithsonian byte-identical to 202606
  • The only data removed are exactly the 21,227 intended Murlo PIDs; the only facet dropped is the nonsense blank one
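
The "only the intended PIDs were removed" check above can be sketched as a set comparison. This is a minimal illustration, not the pipeline's actual code: `check_removals` and its arguments are hypothetical names, and PIDs are assumed to be plain strings.

```python
def check_removals(old_pids, new_pids, intended_removals):
    """Compare two builds and flag removals that were not intended.

    All three arguments are iterables of PID strings (assumed format).
    """
    old, new, intended = set(old_pids), set(new_pids), set(intended_removals)
    removed = old - new
    return {
        "added": len(new - old),
        "removed": len(removed),
        # PIDs that disappeared but were not on the intended list
        "unexpected": sorted(removed - intended),
        # PIDs on the intended list that did not disappear
        "missed": sorted(intended - removed),
    }


report = check_removals(
    old_pids=["a", "b", "c"],
    new_pids=["a", "d"],
    intended_removals=["b", "c"],
)
print(report)
```

A clean cutover in this model has empty `unexpected` and `missed` lists; a source that is byte-identical between builds would show zero added and zero removed.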

Fixes

Rigor (8 build rounds, 5 Codex reviews, independent verification each round)

  • Fixpoint orphan removal · in-script dangling-ref gate (all 12 reference columns) · hard silent-drop guard · keyword concept minting
  • --wide semantic trust gate restored and shares its description expression with the builder (structurally can't drift)
  • 70/70 pipeline tests pass; staging browser-verified on rdhyee.github.io
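
A minimal sketch of what fixpoint orphan removal followed by a dangling-ref gate could look like, assuming a simplified model where each row ID maps to the set of IDs it references. `remove_orphans`, `dangling_refs`, and the example IDs are hypothetical; the real logic lives in ingest_oc_records.py and covers 12 reference columns.

```python
def remove_orphans(refs, roots):
    """Drop non-root rows that nothing references, repeating until stable.

    refs:  dict mapping row ID -> set of row IDs it references (assumed model)
    roots: IDs that are always kept, e.g. sample records
    """
    alive = {row: set(targets) for row, targets in refs.items()}
    while True:
        referenced = set().union(*alive.values())
        orphans = {r for r in alive if r not in roots and r not in referenced}
        if not orphans:
            return alive  # fixpoint reached: a further pass removes nothing
        for r in orphans:
            del alive[r]


def dangling_refs(alive):
    """Return every referenced ID that has no row of its own."""
    referenced = set().union(*alive.values())
    return referenced - set(alive)


refs = {
    "s1": {"e1"},      # sample -> event
    "e1": {"site1"},   # event -> site
    "site1": set(),
    "x": {"y"},        # unreferenced; removing it orphans "y"
    "y": set(),
}
kept = remove_orphans(refs, roots={"s1"})
print(sorted(kept))
print(dangling_refs(kept))
```

A single pass would remove `x` but leave `y`, which only becomes an orphan once `x` is gone; that is why the loop runs to a fixpoint. This simplified version does not remove cycles of rows that reference only each other.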

Scope note

"pottery Cyprus" returns per-sample concept matches (1,305), not OC.org's collection-level full-text (~14K) — folding parent-project text in is the next refinement.

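The gap between per-sample matches and collection-level full text can be illustrated with a toy search. This is a sketch under stated assumptions, not the explorer's implementation: the field names (`description`, `concept_labels`, `project`), the AND-of-tokens matching, and the sample data are all hypothetical.

```python
import re


def tokens(text):
    """Lowercased word tokens, so 'Cyprus/Site A' yields 'cyprus', 'site', 'a'."""
    return set(re.findall(r"\w+", text.lower()))


def search(query, samples, projects=None):
    """Return IDs of samples whose text contains every query token.

    If `projects` is given, each sample's parent-project text is folded in.
    """
    wanted = tokens(query)
    hits = []
    for s in samples:
        text = " ".join([s["description"], *s["concept_labels"]])
        if projects is not None:
            text += " " + projects.get(s["project"], "")
        if wanted <= tokens(text):
            hits.append(s["id"])
    return hits


samples = [
    {"id": "s1", "description": "Cyprus/Site A", "concept_labels": ["pottery"], "project": "p1"},
    {"id": "s2", "description": "Cyprus/Site A", "concept_labels": ["bone"], "project": "p1"},
]
projects = {"p1": "Survey of pottery production in Cyprus"}

print(search("pottery Cyprus", samples))            # per-sample text only
print(search("pottery Cyprus", samples, projects))  # parent-project text folded in
```

In this toy model the second call matches more samples because the project text mentions pottery even when an individual sample's own concepts do not.
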
🤖 Generated with Claude Code

…mplesorg#277/isamplesorg#283) -> isamples_202608

Cuts the explorer over from 202606 to the 202608 build and adds the
reproducible pipeline that produced it.

Data (verified, no regressions vs 202606):
- OC synced to Eric's current export: +67,187 new records (incl. Tall
  al-Umayri Jordan lithics) / -21,227 stale Murlo re-IDs (his Option B).
  OC count 1,110,791 (= his wide); material/1.0/rock 37,953 (his number).
- Other sources (SESAR/GEOME/Smithsonian) byte-identical.

Fixes:
- isamplesorg#277: OC site-path descriptions restored (Cyprus search 0 -> 69,230);
  concept labels now searchable (pottery Cyprus 0 -> 1,305).
- #283a: blank Sampled-Feature facet removed. #283b: specimentype/1.0 labels.
- isamplesorg#260/isamplesorg#265: OC material/object-type corrected via the synced concepts.

Pipeline hardening (8 build rounds, 5 Codex reviews):
- ingest_oc_records.py: fixpoint orphan removal; in-script dangling-ref gate
  over all 12 reference columns; hard silent-drop guard; keyword minting.
- validate_frontend_derived.py: --wide semantic gate shares the facets
  description expression with the builder (cannot drift).
- 70/70 pipeline tests pass; --wide trust gate green.
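
The "cannot drift" property of the --wide gate comes from having one definition of the description expression that both the builder and the validator call. A minimal sketch of that pattern, assuming hypothetical record fields (`pid`, `site_path`, `concept_labels`) and hypothetical function names; the real expression lives in the pipeline scripts.

```python
def describe(record):
    """Single source of truth for a record's facet description (assumed fields)."""
    parts = [record.get("site_path", ""), *record.get("concept_labels", [])]
    return " ".join(p for p in parts if p)


def build_facets(records):
    """Builder side: derive the description stored for each PID."""
    return {r["pid"]: describe(r) for r in records}


def validate_wide(records, facets):
    """Validator side: return PIDs whose stored description differs
    from the one recomputed with the same shared expression."""
    return [r["pid"] for r in records if facets.get(r["pid"]) != describe(r)]


records = [
    {"pid": "p1", "site_path": "Cyprus/Site A", "concept_labels": ["pottery"]},
    {"pid": "p2", "site_path": "Jordan/Site B", "concept_labels": []},
]
facets = build_facets(records)
print(validate_wide(records, facets))  # no mismatches right after a build

facets["p2"] = "stale text"
print(validate_wide(records, facets))  # the tampered PID is reported
```

Because both sides call `describe`, a change to the expression changes the builder and the validator together; the validator can only fail when the stored data is actually out of date.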

Data files already published to R2 (data.isamples.org), uniquely named;
this is the explorer cutover. vocab_labels_202608 keeps prod vocab_labels
untouched.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@rdhyee merged commit d26fabb into isamplesorg:main on Jun 14, 2026
1 of 2 checks passed