Skip to content

security(dci): restrict Data API pull to non-sensitive provider variables#257

Open
gonzalesedwin1123 wants to merge 2 commits into
19.0from
security-dci-crvs-cache
Open

security(dci): restrict Data API pull to non-sensitive provider variables#257
gonzalesedwin1123 wants to merge 2 commits into
19.0from
security-dci-crvs-cache

Conversation

@gonzalesedwin1123

Copy link
Copy Markdown
Member

Summary

The Data API pull endpoint exposed sensitive cached values to any client with the generic data:read scope. GET /Data/pull (spp_api_v2_data/routers/data.py) read spp.data.value with sudo() (bypassing the ORM ACL #234 added) and matched any variable_name with no provider/variable-ownership check — unlike push/invalidate, which call _validate_provider_access. Once eligibility precompute/refresh cached DCI-backed values, a data:read client could pull CRVS facts (r.dci.crvs.is_alive, birth_verified) and disability metrics for any subject. Scoring/PMT values leaked the same way, and GET /Data/variables enumerated all of them.

The internal ORM/RPC read by base.group_user was already closed by #234; this PR covers the FastAPI sudo path that #234 did not.

Fix — positive, default-deny allowlist on the resolved variable

  • spp_cel_domain: spp.cel.variable.is_data_api_pullable() — true only for ordinary external-provider variables; computed/scoring/aggregate are not.
  • spp_dci_indicators: overrides it to also exclude DCI-backed providers (inter-registry data has its own consent/provider boundary).
  • spp_api_v2_data/routers/data.py:
    • /pull resolves the requested name by cel_accessor (the cache key), and returns a uniform 403 if it resolves to nothing (fail-closed — covers orphan rows from a deleted/de-provisioned variable) or if any match isn't pullable. Adds the missing company_id filter (cross-company read under sudo).
    • /variables lists only pullable external-provider variables (no longer enumerates computed/scoring/DCI accessors).

Why an allowlist resolved by cel_accessor, fail-closed

A staff review showed a DCI-only denylist was insufficient: cache rows are keyed by cel_accessor (not name, which is what is_dci_backed-by-name would miss); cel_accessor is unique only per applies_to (so all matches must be checked, deny-on-any); scoring/PMT values aren't is_dci_backed but leak identically; and orphaned rows must fail closed.

Scope / non-impact

  • No change to how values are cached or to internal CEL evaluation. Ordinary external-provider pull/push/invalidate behavior is unchanged.
  • Two sibling oracles found during review are tracked separately, not in this PR: the Simulation API runs arbitrary CEL (metric('r.dci.crvs.is_alive', me)==1beneficiary_count oracle; separate simulation:execute scope) and a subject-existence oracle via sudo _resolve_subject_id.

Tests

  • spp_cel_domain: base is_data_api_pullable rule (external/computed/scoring/no-provider).
  • spp_dci_indicators: DCI-backed override (DCI → not pullable; ordinary → pullable).
  • spp_api_v2_data: endpoint behavior — unknown variable → 403, non-pullable (computed) → 403, company isolation; updated existing no_results/list_variables tests for the deliberate behavior change.

All green: spp_api_v2_data 57/57 · spp_cel_domain 620/620 · spp_dci_indicators 82/82. Lint-clean (ruff/ruff-format).

…bles

The Data API GET /Data/pull required only the generic data:read scope, read
spp.data.value with sudo (bypassing the #234 ORM ACL), and matched any
variable_name with no provider/variable check (unlike push/invalidate). Once
eligibility precompute/refresh cached DCI-backed values, a data:read client
could pull sensitive CRVS facts (r.dci.crvs.is_alive, birth_verified) and
disability metrics for any subject. Scoring/PMT values were exposed the same
way, and /variables enumerated all of them.

Add a positive, default-deny allowlist on the resolved variable:
- spp.cel.variable.is_data_api_pullable() — true only for ordinary
  external-provider variables; computed/scoring/aggregate are not.
- spp_dci_indicators overrides it to also exclude DCI-backed providers
  (inter-registry data has its own consent/provider boundary).
- /pull resolves the requested name by cel_accessor (the cache key), denies
  with a uniform 403 if it resolves to nothing (fail-closed, covers orphan
  rows) or if any match is not pullable; also adds the missing company_id
  filter (cross-company read under sudo).
- /variables lists only pullable external-provider variables.

The internal ORM/RPC read by base.group_user was already closed by #234; this
covers the FastAPI sudo path it did not. Sibling oracles (Simulation metric()
expressions; subject-existence via sudo resolve) are tracked separately.

Tests: base rule (spp_cel_domain), DCI override (spp_dci_indicators), endpoint
behavior incl. unknown/non-pullable -> 403 and company isolation
(spp_api_v2_data). spp_api_v2_data 57/57, spp_cel_domain 620/620,
spp_dci_indicators 82/82.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request restricts the generic external Data API to only expose ordinary external-provider variables, preventing computed, scoring, and DCI-backed variables from being retrieved. It also adds company isolation to the pull endpoint and includes comprehensive tests. The review feedback highlights a critical issue in the list_variables endpoint where Python-side filtering of variables causes a mismatch with database-side total counts and pagination. To resolve this, the reviewer suggests implementing a model method _get_data_api_pullable_domain to perform the filtering directly at the database level.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread spp_api_v2_data/routers/data.py Outdated
Comment thread spp_cel_domain/models/cel_variable.py
Comment thread spp_dci_indicators/models/cel_variable_dci.py Outdated
@codecov

codecov Bot commented Jun 29, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 92.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.80%. Comparing base (f04d936) to head (5c31c51).

Files with missing lines Patch % Lines
spp_api_v2_data/routers/data.py 71.42% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             19.0     #257      +/-   ##
==========================================
+ Coverage   74.75%   74.80%   +0.05%     
==========================================
  Files        1090     1092       +2     
  Lines       64813    64836      +23     
==========================================
+ Hits        48453    48503      +50     
+ Misses      16360    16333      -27     
Flag Coverage Δ
spp_analytics 93.13% <ø> (ø)
spp_api_v2_change_request 66.85% <ø> (ø)
spp_api_v2_cycles 71.12% <ø> (ø)
spp_api_v2_data 77.43% <71.42%> (+13.01%) ⬆️
spp_api_v2_entitlements 70.19% <ø> (ø)
spp_api_v2_gis 84.27% <ø> (ø)
spp_api_v2_programs 92.03% <ø> (ø)
spp_api_v2_simulation 71.12% <ø> (ø)
spp_approval 50.29% <ø> (ø)
spp_audit_programs 0.00% <ø> (?)
spp_base_common 90.26% <ø> (ø)
spp_case_cel 89.01% <ø> (ø)
spp_case_entitlements 97.61% <ø> (ø)
spp_case_programs 97.14% <ø> (ø)
spp_cel_domain 62.82% <100.00%> (+0.04%) ⬆️
spp_cel_event 85.23% <ø> (ø)
spp_dci_indicators 95.83% <100.00%> (+0.21%) ⬆️
spp_programs 65.12% <ø> (ø)
spp_registry 86.83% <ø> (ø)
spp_security 66.66% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
spp_cel_domain/models/cel_variable.py 82.58% <100.00%> (+0.60%) ⬆️
spp_dci_indicators/models/__init__.py 100.00% <100.00%> (ø)
spp_dci_indicators/models/cel_variable_dci.py 100.00% <100.00%> (ø)
spp_api_v2_data/routers/data.py 69.17% <71.42%> (+20.58%) ⬆️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Code review (Gemini) + Codecov follow-ups on the pull/list allowlist:

- list_variables: filter at the DB level via a new model method
  spp.cel.variable._get_data_api_pullable_domain() (overridden in
  spp_dci_indicators to also exclude DCI-backed providers), instead of a Python
  post-filter. This fixes the total count and cursor pagination, which were
  computed on the unfiltered domain.

- Real, executing endpoint tests: the existing Data API endpoint tests are
  `async def` on a plain TransactionCase, which unittest never awaits (silent
  no-ops) -- which is why the router changes showed as uncovered. Add
  test_data_api_pull_security.py driving the async endpoints via asyncio.run so
  they actually run: pull rejects unknown (fail-closed) and non-pullable
  variables with 403, enforces company isolation, returns ordinary values; list
  excludes non-external with a consistent total. Remove the 3 no-op async pull
  tests added earlier (superseded by these).

These running tests surfaced two pre-existing latent bugs in the endpoints
(both in functions touched here), now fixed:
- pull_values crashed (pydantic) on any cached value without a TTL, because an
  unset Odoo Datetime returns False, not None -> serialize expires_at as
  `rec.expires_at or None`.
- list_variables crashed (AttributeError) when spp_studio is not installed,
  since `description` is contributed by spp_studio -> use getattr.

Tests: spp_api_v2_data 59/59, spp_cel_domain 621/621, spp_dci_indicators 82/82.

NOTE: the repo-wide async-test-no-op problem (async def endpoint tests across
several spp_api_v2_* modules never execute) is pre-existing and tracked
separately; only this module's relevant tests are made real here.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant