security(dci): restrict Data API pull to non-sensitive provider variables#257
security(dci): restrict Data API pull to non-sensitive provider variables#257gonzalesedwin1123 wants to merge 2 commits into
Conversation
…bles The Data API GET /Data/pull required only the generic data:read scope, read spp.data.value with sudo (bypassing the #234 ORM ACL), and matched any variable_name with no provider/variable check (unlike push/invalidate). Once eligibility precompute/refresh cached DCI-backed values, a data:read client could pull sensitive CRVS facts (r.dci.crvs.is_alive, birth_verified) and disability metrics for any subject. Scoring/PMT values were exposed the same way, and /variables enumerated all of them. Add a positive, default-deny allowlist on the resolved variable: - spp.cel.variable.is_data_api_pullable() — true only for ordinary external-provider variables; computed/scoring/aggregate are not. - spp_dci_indicators overrides it to also exclude DCI-backed providers (inter-registry data has its own consent/provider boundary). - /pull resolves the requested name by cel_accessor (the cache key), denies with a uniform 403 if it resolves to nothing (fail-closed, covers orphan rows) or if any match is not pullable; also adds the missing company_id filter (cross-company read under sudo). - /variables lists only pullable external-provider variables. The internal ORM/RPC read by base.group_user was already closed by #234; this covers the FastAPI sudo path it did not. Sibling oracles (Simulation metric() expressions; subject-existence via sudo resolve) are tracked separately. Tests: base rule (spp_cel_domain), DCI override (spp_dci_indicators), endpoint behavior incl. unknown/non-pullable -> 403 and company isolation (spp_api_v2_data). spp_api_v2_data 57/57, spp_cel_domain 620/620, spp_dci_indicators 82/82.
There was a problem hiding this comment.
Code Review
This pull request restricts the generic external Data API to only expose ordinary external-provider variables, preventing computed, scoring, and DCI-backed variables from being retrieved. It also adds company isolation to the pull endpoint and includes comprehensive tests. The review feedback highlights a critical issue in the list_variables endpoint where Python-side filtering of variables causes a mismatch with database-side total counts and pagination. To resolve this, the reviewer suggests implementing a model method _get_data_api_pullable_domain to perform the filtering directly at the database level.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## 19.0 #257 +/- ##
==========================================
+ Coverage 74.75% 74.80% +0.05%
==========================================
Files 1090 1092 +2
Lines 64813 64836 +23
==========================================
+ Hits 48453 48503 +50
+ Misses 16360 16333 -27
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
Code review (Gemini) + Codecov follow-ups on the pull/list allowlist: - list_variables: filter at the DB level via a new model method spp.cel.variable._get_data_api_pullable_domain() (overridden in spp_dci_indicators to also exclude DCI-backed providers), instead of a Python post-filter. This fixes the total count and cursor pagination, which were computed on the unfiltered domain. - Real, executing endpoint tests: the existing Data API endpoint tests are `async def` on a plain TransactionCase, which unittest never awaits (silent no-ops) -- which is why the router changes showed as uncovered. Add test_data_api_pull_security.py driving the async endpoints via asyncio.run so they actually run: pull rejects unknown (fail-closed) and non-pullable variables with 403, enforces company isolation, returns ordinary values; list excludes non-external with a consistent total. Remove the 3 no-op async pull tests added earlier (superseded by these). These running tests surfaced two pre-existing latent bugs in the endpoints (both in functions touched here), now fixed: - pull_values crashed (pydantic) on any cached value without a TTL, because an unset Odoo Datetime returns False, not None -> serialize expires_at as `rec.expires_at or None`. - list_variables crashed (AttributeError) when spp_studio is not installed, since `description` is contributed by spp_studio -> use getattr. Tests: spp_api_v2_data 59/59, spp_cel_domain 621/621, spp_dci_indicators 82/82. NOTE: the repo-wide async-test-no-op problem (async def endpoint tests across several spp_api_v2_* modules never execute) is pre-existing and tracked separately; only this module's relevant tests are made real here.
Summary
The Data API pull endpoint exposed sensitive cached values to any client with the generic
data:readscope.GET /Data/pull(spp_api_v2_data/routers/data.py) readspp.data.valuewithsudo()(bypassing the ORM ACL #234 added) and matched anyvariable_namewith no provider/variable-ownership check — unlikepush/invalidate, which call_validate_provider_access. Once eligibility precompute/refresh cached DCI-backed values, adata:readclient could pull CRVS facts (r.dci.crvs.is_alive,birth_verified) and disability metrics for any subject. Scoring/PMT values leaked the same way, andGET /Data/variablesenumerated all of them.The internal ORM/RPC read by
base.group_userwas already closed by #234; this PR covers the FastAPIsudopath that #234 did not.Fix — positive, default-deny allowlist on the resolved variable
spp_cel_domain:spp.cel.variable.is_data_api_pullable()— true only for ordinary external-provider variables; computed/scoring/aggregate are not.spp_dci_indicators: overrides it to also exclude DCI-backed providers (inter-registry data has its own consent/provider boundary).spp_api_v2_data/routers/data.py:/pullresolves the requested name bycel_accessor(the cache key), and returns a uniform 403 if it resolves to nothing (fail-closed — covers orphan rows from a deleted/de-provisioned variable) or if any match isn't pullable. Adds the missingcompany_idfilter (cross-company read undersudo)./variableslists only pullable external-provider variables (no longer enumerates computed/scoring/DCI accessors).Why an allowlist resolved by
cel_accessor, fail-closedA staff review showed a DCI-only denylist was insufficient: cache rows are keyed by
cel_accessor(notname, which is whatis_dci_backed-by-name would miss);cel_accessoris unique only perapplies_to(so all matches must be checked, deny-on-any); scoring/PMT values aren'tis_dci_backedbut leak identically; and orphaned rows must fail closed.Scope / non-impact
metric('r.dci.crvs.is_alive', me)==1→beneficiary_countoracle; separatesimulation:executescope) and a subject-existence oracle viasudo_resolve_subject_id.Tests
spp_cel_domain: baseis_data_api_pullablerule (external/computed/scoring/no-provider).spp_dci_indicators: DCI-backed override (DCI → not pullable; ordinary → pullable).spp_api_v2_data: endpoint behavior — unknown variable → 403, non-pullable (computed) → 403, company isolation; updated existingno_results/list_variablestests for the deliberate behavior change.All green:
spp_api_v2_data57/57 ·spp_cel_domain620/620 ·spp_dci_indicators82/82. Lint-clean (ruff/ruff-format).