feat(ilamb): use obs4REF reference data for standard diagnostics#779
feat(ilamb): use obs4REF reference data for standard diagnostics#779lewisjared wants to merge 10 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThe PR updates obs4REF handling end to end: parsing and validation now accept the newer reference keys, ILAMB configuration and execution now resolve obs4REF-backed sources, and the affected regression fixtures, manifests, and catalogues were refreshed for the migrated diagnostics. Changesobs4REF reference dataset migration
Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Point ten ILAMB standard diagnostics at their obs4REF reference files instead of the legacy ilamb/iomb registry paths: gpp-WECANN, gpp-FLUXNET2015, mrro-LORA, cSoil-HWSD2, nbp-Hoffman, snc-ESACCI and burntFractionAll-GFED on land, plus thetao/so-WOA2023-surface and amoc-RAPID in the ocean suite. burntFractionAll-GFED now reads the CMIP burntFractionAll variable directly rather than the legacy burntArea alternate, and amoc-RAPID derives the AMOC index from the RAPID-2023-1a msftmz field through the existing msftmz_to_rapid transform. Bumps ILAMBStandard.version to 3: the reference change alters diagnostic outputs, so the ILAMB regression baselines must be re-minted.
…er diagnostic Replace the bare obs4REF registry paths with the ingested obs4MIPs data requirement form, pinning each reference to a single file by source_id + variable_id + grid_label + version. This routes the reference through the obs4MIPs catalogue as a first-class dataset rather than a raw registry key. Loosen the obs4REF key parser to accept monthly-climatology (monC) frequencies and single-year fixed-field (fx) time ranges so WOA-23 and HWSD-2-0 files resolve. Give each rewired diagnostic its own version (3) via a per-diagnostic ``version`` config key, so changing one diagnostic's reference data no longer forces every ILAMB baseline to be re-minted; the untouched diagnostics stay at version 2. The esgf-catalog test fixture and solve-regression snapshots are regenerated to include the newly ingested obs4REF reference datasets.
obs4REF is the REF-specific observational product and follows the same metadata conventions as obs4MIPs, but its files carry ``activity_id = "obs4REF"``. The obs4MIPs adapter rejected them, so obs4REF references could not be ingested and any diagnostic that requests an obs4REF dataset as an obs4MIPs source solved to nothing. Accept ``activity_id`` of either ``obs4MIPs`` or ``obs4REF`` so these references ingest as first-class datasets.
…urce_id The FLUXNET2015 obs4REF file spells its source_id three different ways across the registry path (FLUXNET2015-1-0), the filename (Fluxnet-2015-1-0) and the internal attribute (Fluxnet-2015), so the ingested facet form cannot match both the fetch and solve stages. Reference it by exact registry key until the upstream metadata is reconciled.
…egistry data ILAMBStandard.execute() replaced the reference dataframe with only the ingested obs4MIPs datasets, dropping the ILAMB registry collection. A diagnostic combining an obs4REF primary source with legacy-path relationships (e.g. gpp-WECANN's pr/tas) could no longer resolve those relationship keys. Concatenate the ILAMB and obs4REF registry collections alongside the ingested obs4MIPs datasets so those keys stay resolvable. Re-mint obs4REF baselines for the affected land and ocean diagnostics: burntfractionall-gfed, csoil-hwsd2, mrro-lora, nbp-hoffman, snc-esacci, so-woa2023-surface and thetao-woa2023-surface (cmip6 + cmip7), plus cmip6 for gpp-wecann and gpp-fluxnet2015. The cmip7 baselines for gpp-wecann and gpp-fluxnet2015 remain outstanding pending an upstream ilamb3 fix for a ZeroDivisionError on the CMIP7 path; amoc-rapid likewise stays failing.
21265c3 to
ca7986d
Compare
…apshots Bump test_case_version for the re-minted ILAMB baselines so the coupling gate authorises the changed bundles, and refresh the solver regression snapshots to match the obs4REF-referenced data requirements after rebasing onto main.
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: fa3ee72e-2fea-4081-8ca9-7e4d81b7c1ca
📒 Files selected for processing (25)
packages/climate-ref-ilamb/tests/test-data/burntfractionall-gfed/cmip6/manifest.jsonpackages/climate-ref-ilamb/tests/test-data/burntfractionall-gfed/cmip7/manifest.jsonpackages/climate-ref-ilamb/tests/test-data/csoil-hwsd2/cmip6/manifest.jsonpackages/climate-ref-ilamb/tests/test-data/csoil-hwsd2/cmip7/manifest.jsonpackages/climate-ref-ilamb/tests/test-data/gpp-fluxnet2015/cmip6/manifest.jsonpackages/climate-ref-ilamb/tests/test-data/gpp-wecann/cmip6/manifest.jsonpackages/climate-ref-ilamb/tests/test-data/mrro-lora/cmip6/manifest.jsonpackages/climate-ref-ilamb/tests/test-data/mrro-lora/cmip7/manifest.jsonpackages/climate-ref-ilamb/tests/test-data/nbp-hoffman/cmip6/manifest.jsonpackages/climate-ref-ilamb/tests/test-data/nbp-hoffman/cmip7/manifest.jsonpackages/climate-ref-ilamb/tests/test-data/snc-esacci/cmip6/manifest.jsonpackages/climate-ref-ilamb/tests/test-data/snc-esacci/cmip7/manifest.jsonpackages/climate-ref-ilamb/tests/test-data/so-woa2023-surface/cmip6/manifest.jsonpackages/climate-ref-ilamb/tests/test-data/so-woa2023-surface/cmip7/manifest.jsonpackages/climate-ref-ilamb/tests/test-data/thetao-woa2023-surface/cmip6/manifest.jsonpackages/climate-ref-ilamb/tests/test-data/thetao-woa2023-surface/cmip7/manifest.jsonpackages/climate-ref-ilamb/tests/unit/test_solve_regression/test_solve_regression_amoc_rapid_.ymlpackages/climate-ref-ilamb/tests/unit/test_solve_regression/test_solve_regression_burntfractionall_gfed_.ymlpackages/climate-ref-ilamb/tests/unit/test_solve_regression/test_solve_regression_csoil_hwsd2_.ymlpackages/climate-ref-ilamb/tests/unit/test_solve_regression/test_solve_regression_gpp_wecann_.ymlpackages/climate-ref-ilamb/tests/unit/test_solve_regression/test_solve_regression_mrro_lora_.ymlpackages/climate-ref-ilamb/tests/unit/test_solve_regression/test_solve_regression_nbp_hoffman_.ymlpackages/climate-ref-ilamb/tests/unit/test_solve_regression/test_solve_regression_snc_esacci_.ymlpackages/climate-ref-ilamb/tests/unit/test_solve_regression/test_solve_regression_so_woa2023_surface_.ymlpackages/climate-ref-ilamb/tests/unit/test_solve_regression/test_solve_regression_thetao_woa2023_surface_.yml
✅ Files skipped from review due to trivial changes (10)
- packages/climate-ref-ilamb/tests/test-data/nbp-hoffman/cmip7/manifest.json
- packages/climate-ref-ilamb/tests/test-data/burntfractionall-gfed/cmip6/manifest.json
- packages/climate-ref-ilamb/tests/test-data/snc-esacci/cmip6/manifest.json
- packages/climate-ref-ilamb/tests/test-data/csoil-hwsd2/cmip6/manifest.json
- packages/climate-ref-ilamb/tests/test-data/so-woa2023-surface/cmip6/manifest.json
- packages/climate-ref-ilamb/tests/test-data/gpp-wecann/cmip6/manifest.json
- packages/climate-ref-ilamb/tests/test-data/thetao-woa2023-surface/cmip6/manifest.json
- packages/climate-ref-ilamb/tests/test-data/snc-esacci/cmip7/manifest.json
- packages/climate-ref-ilamb/tests/test-data/thetao-woa2023-surface/cmip7/manifest.json
- packages/climate-ref-ilamb/tests/test-data/csoil-hwsd2/cmip7/manifest.json
🚧 Files skipped from review as they are similar to previous changes (4)
- packages/climate-ref-ilamb/tests/test-data/mrro-lora/cmip7/manifest.json
- packages/climate-ref-ilamb/tests/test-data/burntfractionall-gfed/cmip7/manifest.json
- packages/climate-ref-ilamb/tests/test-data/so-woa2023-surface/cmip7/manifest.json
- packages/climate-ref-ilamb/tests/test-data/mrro-lora/cmip6/manifest.json
Codecov Report❌ Patch coverage is
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
The obs4REF diagnostics require an obs4MIPs reference dataset to solve, but
the solve-regression catalog fixture contained none of them, so those
diagnostics matched nothing and their snapshots regenerated to `{}` -
passing while asserting no solver output.
Append the obs4REF reference datasets to obs4mips_catalog.parquet (221 -> 231
rows, existing rows untouched) and regenerate the affected snapshots so each
diagnostic again asserts its matched CMIP groups. Only emp-GLEAMGPCP2.3
remains empty, matching its GLEAM reference that is intentionally absent from
the fixture.
An empty snapshot passes while asserting no solver output, silently hiding a diagnostic that matches nothing because its reference data is missing from the solve catalog fixture. Assert each diagnostic solves to at least one execution group unless it is listed in KNOWN_NO_MATCH (currently only emp-GLEAMGPCP2.3, whose GLEAM reference is intentionally absent).
The converter applied least_significant_digit=3 (~1e-3 absolute precision) to every variable. gpp magnitudes (~1e-8 kg m-2 s-1) fall five orders of magnitude below that floor, so the entire converted field rounded to zero. A spatially uniform (zero-variance) field made ilamb3's Taylor spatial-distribution score divide by 1/norm_std == 0, so gpp-wecann and gpp-fluxnet2015 failed on the CMIP7 path. Skip the decimal-place quantization for gpp; other variables are unaffected and their converted output is byte-identical. Adds the now-passing CMIP7 obs4REF baselines for gpp-wecann and gpp-fluxnet2015.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/climate-ref-core/src/climate_ref_core/esgf/cmip7.py (1)
99-109: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winHardcoded
gppexception is fragile for future variables.The fix correctly resolves the immediate Taylor-score regression, but hardcoding a single variable name means any other future variable with similarly tiny magnitudes (e.g. other carbon/nitrogen fluxes) would silently be rounded to zero again, and only be caught if a downstream score visibly breaks.
Consider a more general safeguard, e.g. skip
least_significant_digitwhen the variable's data magnitude is below some threshold relative to the requested precision, or maintain an explicit, documented allow-list of "lossless" variables that's easy to extend.♻️ Example: magnitude-based safeguard
encoding: dict[str, dict[str, Any]] = {} for var in ds_cmip7.data_vars: var_encoding: dict[str, Any] = {"zlib": True, "complevel": 5} - if str(var) != "gpp": + data = ds_cmip7[var].values + max_abs = float(abs(data).max()) if data.size else 0.0 + # Skip quantization if it would round the entire field to zero. + if max_abs == 0.0 or max_abs >= 10 ** -3: var_encoding["least_significant_digit"] = 3 encoding[str(var)] = var_encoding
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 57f131fc-c430-4580-8f1a-0e2c5a69e879
📒 Files selected for processing (11)
packages/climate-ref-core/src/climate_ref_core/esgf/cmip7.pypackages/climate-ref-ilamb/tests/test-data/gpp-fluxnet2015/cmip7/catalog.yamlpackages/climate-ref-ilamb/tests/test-data/gpp-fluxnet2015/cmip7/manifest.jsonpackages/climate-ref-ilamb/tests/test-data/gpp-fluxnet2015/cmip7/regression/diagnostic.jsonpackages/climate-ref-ilamb/tests/test-data/gpp-fluxnet2015/cmip7/regression/output.jsonpackages/climate-ref-ilamb/tests/test-data/gpp-fluxnet2015/cmip7/regression/series.jsonpackages/climate-ref-ilamb/tests/test-data/gpp-wecann/cmip7/catalog.yamlpackages/climate-ref-ilamb/tests/test-data/gpp-wecann/cmip7/manifest.jsonpackages/climate-ref-ilamb/tests/test-data/gpp-wecann/cmip7/regression/diagnostic.jsonpackages/climate-ref-ilamb/tests/test-data/gpp-wecann/cmip7/regression/output.jsonpackages/climate-ref-ilamb/tests/test-data/gpp-wecann/cmip7/regression/series.json
✅ Files skipped from review due to trivial changes (6)
- packages/climate-ref-ilamb/tests/test-data/gpp-fluxnet2015/cmip7/regression/diagnostic.json
- packages/climate-ref-ilamb/tests/test-data/gpp-wecann/cmip7/manifest.json
- packages/climate-ref-ilamb/tests/test-data/gpp-fluxnet2015/cmip7/manifest.json
- packages/climate-ref-ilamb/tests/test-data/gpp-fluxnet2015/cmip7/regression/output.json
- packages/climate-ref-ilamb/tests/test-data/gpp-fluxnet2015/cmip7/regression/series.json
- packages/climate-ref-ilamb/tests/test-data/gpp-wecann/cmip7/regression/series.json
Summary
Points ten ILAMB standard diagnostics at their newly published obs4REF reference files instead of the legacy
ilamb/iombregistry paths. Stacked on #778, which registers and uploads the underlying obs4REF data — merge that first.Diagnostics rewired
Land (
ilamb.yaml):gpp-WECANN(WECANN-1-0 v20250902),gpp-FLUXNET2015(Fluxnet-2015-1-0),mrro-LORA(LORA-1-0),cSoil-HWSD2(HWSD-2-0),nbp-Hoffman(Hoffman-1-0),snc-ESACCI(CCI-CryoClim-FSC-1),burntFractionAll-GFED(GFED-5-0). Ocean (iomb.yaml):thetao-WOA2023-surfaceandso-WOA2023-surface(WOA-23),amoc-RAPID(RAPID-2023-1a).The bare-path form is used throughout because the obs4REF key parser (
_parse_obs4ref_key) cannot yet parse themonCfrequency (WOA-23 climatologies) or the single-yearfxtime range (HWSDcSoil), so the ingestedobs_source: obs4refdict form is not an option for those. Bare paths resolve directly against the concatenated obs4REF registry at execute time, matching the existing GPCP reference inemp-GLEAMGPCP2.3.Two edits that are more than a path swap
burntFractionAll-GFEDnow reads the CMIPburntFractionAllvariable directly and drops theburntAreaalternate, because the obs4REF file carriesburntFractionAll.amoc-RAPIDswitches its reference to the raw RAPID-2023-1amsftmzfield and relies on the existingmsftmz_to_rapidtransform to derive the AMOC index, replacing the pre-derived legacyamocfile.Versioning and baselines
ILAMBStandard.versionis bumped 2 -> 3. This attribute is shared by every ILAMB diagnostic, so all ILAMB regression baselines (not only the ten rewired here) must be re-minted before the coupling gate passes.Validation done
All obs4REF source keys resolve against the registry, all diagnostics construct, the ILAMB unit suite passes (28 passed), and ruff/mypy are clean. Not yet validated: an end-to-end run against the real reference data.
amoc-RAPIDin particular is unverified — it carries both the transform-semantics question above and the known upstream ilamb3trim_timecrash, and needs a real run to confirm. WOA-23 is a 2005-2022 monthly climatology replacing a 2005-2014 monthly field, which interacts with the surface-variable time-coverage handling inexecute().Summary by CodeRabbit
New Features
Bug Fixes
Chores / Configuration
Tests