
direct: Fix permanent drift on permissions after out-of-band parent recreate #5587

Draft
janniklasrose wants to merge 5 commits into main from janniklasrose/permissions-remote-id

janniklasrose (Contributor) commented Jun 12, 2026

Changes

When a parent resource is deleted and recreated remotely under the same name (e.g. a model serving endpoint), its *.permissions sub-resource previously drifted forever: deployment state kept the old object_id as the resource ID, and every bundle plan showed a permissions update that deploy never resolved.

  • ResourcePermissions now implements DoUpdateWithID, which persists the new object_id as the resource ID, and update_id_on_changes: object_id is wired up for every *.permissions entry in resources.yml.
  • DoRead gains a typed newState parameter (all resources; mirrors DoCreate). Most resources ignore it; ResourcePermissions reads ACLs against newState.ObjectID instead of the possibly-stale state ID. The plan delete path passes nil and falls back to the state ID.
  • New acceptance drift test for model_serving_endpoints (V1 permissions API counterpart to the existing V2 vector_search_endpoints test).

Why

V1 permissions APIs (jobs, pipelines, model serving) don't delete ACLs immediately when the parent is gone, so reading by the old object_id keeps returning the deleted parent's ACLs and the drift never converges. On V2 (vector search), the read 404s instead of observing the new endpoint's ACLs. Reading via the planned newState and persisting the new ID via DoUpdateWithID makes the second deploy converge to zero drift on both API generations.
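The convergence argument can be checked with a toy model. The sketch below is a hypothetical simulation, not CLI code: remoteACLs mimics the V1 behavior described above (ACLs stay readable under the deleted parent's object ID), and the fixed flag switches between the old approach (read by the state ID, keep the state ID on update) and the new one (read by the planned ObjectID, persist it on update). Object IDs and ACL strings are placeholders.

```go
package main

import "fmt"

// remoteACLs is a toy V1-style permissions API keyed by object ID. Deleting a
// parent does not remove its entry, mimicking the eventual consistency above.
type remoteACLs map[string]string

// observed is what a permissions read reports: the object ID and its ACL.
type observed struct {
	ObjectID string
	ACL      string
}

func (r remoteACLs) read(objectID string) (observed, bool) {
	acl, ok := r[objectID]
	return observed{ObjectID: objectID, ACL: acl}, ok
}

// planHasDrift reads either by the deployment-state ID (old behavior) or by
// the planned ObjectID (new behavior) and compares against the desired state.
func planHasDrift(remote remoteACLs, stateID string, desired observed, readByNewState bool) bool {
	readID := stateID
	if readByNewState {
		readID = desired.ObjectID
	}
	got, ok := remote.read(readID)
	return !ok || got != desired
}

// deploy writes the ACL to the planned object and returns the ID to persist.
func deploy(remote remoteACLs, stateID string, desired observed, persistNewID bool) string {
	remote[desired.ObjectID] = desired.ACL
	if persistNewID {
		return desired.ObjectID // DoUpdateWithID-style: state follows the new object
	}
	return stateID // DoUpdate-style: state keeps the stale ID
}

// driftAfterDeploys runs up to `rounds` plan/deploy cycles, then reports
// whether one more plan would still show drift.
func driftAfterDeploys(fixed bool, rounds int) bool {
	remote := remoteACLs{"endpoint-old": "users:CAN_QUERY"} // leftover ACL of the deleted parent
	desired := observed{ObjectID: "endpoint-new", ACL: "users:CAN_QUERY"}
	stateID := "endpoint-old"
	for i := 0; i < rounds; i++ {
		if planHasDrift(remote, stateID, desired, fixed) {
			stateID = deploy(remote, stateID, desired, fixed)
		}
	}
	return planHasDrift(remote, stateID, desired, fixed)
}

func main() {
	fmt.Println("old behavior, drift after 3 deploys:", driftAfterDeploys(false, 3))
	fmt.Println("new behavior, drift after 1 deploy:", driftAfterDeploys(true, 1))
}
```

With fixed=false, every plan reads endpoint-old, gets back an object ID that never matches endpoint-new, and schedules the same update again. With fixed=true, the first plan finds no ACL under endpoint-new, the deploy writes it and persists endpoint-new, and the following plan is clean.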

The recorded outputs of bundle/state/permission_level_migration change as a consequence: the permissions plan action becomes update_id (reason id_changes) and state __id__ now tracks the new object ID — the old recording exhibited the exact stale-ID pattern this PR removes.
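A hypothetical sketch of how an update_id_on_changes list can turn a field change into an update_id action. Only the strings update_id and id_changes come from the recorded outputs; the function name, its signature, the "update" fallback, and the access_control_list field name are illustrative assumptions rather than the planner's actual code.

```go
package main

import "fmt"

// planAction is a hypothetical sketch of choosing a plan action from the set
// of changed fields and the fields listed under update_id_on_changes.
func planAction(changed []string, updateIDOnChanges map[string]bool) (action, reason string) {
	if len(changed) == 0 {
		return "", "" // nothing to do
	}
	for _, field := range changed {
		if updateIDOnChanges[field] {
			// A change to an ID-bearing field means the resource ID itself moves.
			return "update_id", "id_changes"
		}
	}
	return "update", "" // ordinary in-place update; reason omitted in this sketch
}

func main() {
	idFields := map[string]bool{"object_id": true} // update_id_on_changes: object_id
	fmt.Println(planAction([]string{"object_id"}, idFields))
	fmt.Println(planAction([]string{"access_control_list"}, idFields))
}
```

In this model, a recreated parent changes object_id, so the permissions entry is planned as update_id and the new ID is recorded in state, matching the shift in the recorded outputs.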

Tests

  • New acceptance test bundle/resources/model_serving_endpoints/drift/recreated_same_name; updated vector_search_endpoints twin.
  • Full ./task test, ./task fmt, ./task checks, and ./task lint runs are green; generate-schema and generate-direct produce no changes.
  • Both drift tests pass against a real workspace (aws-prod-ucws via deco), confirming the V1 eventual-consistency behavior end-to-end.

This pull request and its description were written by Isaac, an AI coding agent.

… same name

After deleting and recreating a model serving endpoint remotely with the
same name but a different endpoint_id, V1 permissions API behavior
results in permanent drift on permissions: bundle plan keeps showing an
update for permissions even after a successful deploy. The V1 endpoints
do not delete ACLs immediately when the parent is gone, so DoRead from
the old object_id keeps returning ACL data, while plan computes the new
object_id from the recreated endpoint.

This is the V1 counterpart to vector_search_endpoints/drift/recreated_same_name,
which exercises V2 behavior (404 → create plan, no drift after deploy).

Co-authored-by: Isaac

When the parent resource is recreated remotely with a different identifier
(e.g. a model serving endpoint deleted and recreated under the same name),
PermissionsState.ObjectID changes between deploys. Previously the framework
called DoUpdate, which kept the deployment state ID pointing at the old,
gone object_id. On V1 permissions APIs (jobs, pipelines, model serving),
DoRead with the old object_id keeps returning ACL data for the deleted
parent due to eventual consistency, so plan saw the same ObjectID drift
on every subsequent run, i.e. a permanent update on permissions.

Add DoUpdateWithID that returns newState.ObjectID as the new resource ID
so the framework persists the new ID in deployment state, and wire up
update_id_on_changes for object_id on every *.permissions resource that
uses ResourcePermissions. Subsequent plans then compare against the new
ObjectID and see no drift.

Update the model_serving_endpoints drift acceptance test to assert no
permanent drift after deploy (same shape as the vector_search V2 test).

Co-authored-by: Isaac

DoRead previously took only the deployment-state id, which is stale after
an out-of-band recreate of the parent resource: the id points at the
gone object_id while the new plan has already resolved newState.ObjectID
to the freshly-created identifier. The permissions resource needs to read
against the new identifier, otherwise on V1 permissions APIs (jobs,
pipelines, model serving) DoRead keeps returning ACL data for the deleted
parent and produces a permanent drift; on V2 (vector search) it 404s
when it should be reading the new endpoint's empty ACLs.

Add a newState parameter to DoRead across all resources. Most resources
ignore it and continue to read by id. ResourcePermissions uses
newState.ObjectID when available so subsequent plans see no drift after
a parent recreate, and the test-server's new-endpoint ACL state is
correctly observed.

For the delete path in plan, where there is no planned new state, pass
nil so the adapter falls back to id-based read. For refreshRemoteState
post-deploy, thread newState (sv.Value) through from bundle_apply.
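The read-target rule described in this commit is small enough to state as code. A minimal sketch, assuming a simplified PermissionsState with only an ObjectID field; readTarget is a hypothetical helper name, not a function from the codebase, and treating an empty ObjectID as "not available" is an assumption.

```go
package main

import "fmt"

// PermissionsState is a hypothetical stand-in for the planned permissions state.
type PermissionsState struct {
	ObjectID string
}

// readTarget picks the object ID a permissions read should query: the planned
// newState.ObjectID when a plan exists, else the ID from deployment state
// (the plan delete path passes nil).
func readTarget(stateID string, newState *PermissionsState) string {
	if newState != nil && newState.ObjectID != "" {
		return newState.ObjectID
	}
	return stateID
}

func main() {
	fmt.Println(readTarget("endpoint-old", &PermissionsState{ObjectID: "endpoint-new"}))
	fmt.Println(readTarget("endpoint-old", nil)) // delete path: no planned state
}
```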

Update the vector_search_endpoints/drift/recreated_same_name acceptance test to
match the new behaviour: the plan now reads the freshly-created endpoint
and shows an "update" on permissions instead of "create"; both variants end
with no permanent drift after deploy.

Co-authored-by: Isaac

eng-dev-ecosystem-bot (Collaborator)

Integration test report

Commit: 7418948

Run: 27440413461

Env                     FAIL  KNOWN  flaky  RECOVERED  SKIP  pass  skip   Time
🟨 aws linux               -      7      -          -    15   264   978   9:01
🟨 aws windows             -      7      -          -    15   266   976  15:41
❌ aws-ucws linux          2      1      -          6    15   358   892   9:19
❌ aws-ucws windows        2      1      -          6    15   360   890  14:39
💚 azure linux             -      -      -          1    17   267   976   6:46
💚 azure windows           -      -      -          1    17   269   974  11:58
❌ azure-ucws linux        2      1      3          -    17   360   888  17:20
❌ azure-ucws windows     35      1      -          -    17   332   886  14:20
❌ gcp linux               3      -      -          1    17   260   979  10:56
🔄 gcp windows             -      -      5          -    17   261   977  32:32
57 interesting tests: 35 FAIL, 15 SKIP, 7 KNOWN
Test Name aws linux aws windows aws-ucws linux aws-ucws windows azure linux azure windows azure-ucws linux azure-ucws windows gcp linux gcp windows
🟨​ TestAccept 🟨​K 🟨​K 🟨​K 🟨​K 💚​R 💚​R 🟨​K 🟨​K 💚​R 🔄​f
❌​ TestAccept/bundle/deploy/files/no-snapshot-sync ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/deploy/files/no-snapshot-sync/DATABRICKS_BUNDLE_ENGINE=direct ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/deploy/files/no-snapshot-sync/DATABRICKS_BUNDLE_ENGINE=terraform ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/deploy/mlops-stacks ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/deploy/mlops-stacks/DATABRICKS_BUNDLE_ENGINE=direct ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/deploy/mlops-stacks/DATABRICKS_BUNDLE_ENGINE=terraform ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/deployment/bind/alert 🙈​s 🙈​s 🙈​s 🙈​s ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/deployment/bind/alert/DATABRICKS_BUNDLE_ENGINE=direct ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/deployment/bind/alert/DATABRICKS_BUNDLE_ENGINE=terraform ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/deployment/bind/job/generate-and-bind ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/deployment/bind/job/generate-and-bind/DATABRICKS_BUNDLE_ENGINE=direct ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/deployment/bind/job/generate-and-bind/DATABRICKS_BUNDLE_ENGINE=terraform ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/destroy/jobs-and-pipeline ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/destroy/jobs-and-pipeline/DATABRICKS_BUNDLE_ENGINE=direct ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/destroy/jobs-and-pipeline/DATABRICKS_BUNDLE_ENGINE=terraform ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
🙈​ TestAccept/bundle/invariant/no_drift 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
❌​ TestAccept/bundle/resources/alerts/with_file ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/resources/alerts/with_file/DATABRICKS_BUNDLE_ENGINE=direct ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/resources/alerts/with_file/DATABRICKS_BUNDLE_ENGINE=terraform ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/resources/apps/inline_config ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p 🔄​f
❌​ TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p 🔄​f ❌​F ✅​p 🔄​f
❌​ TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p 🔄​f ❌​F ✅​p 🔄​f
❌​ TestAccept/bundle/resources/dashboards/generate_inplace ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/resources/dashboards/generate_inplace/DATABRICKS_BUNDLE_ENGINE=direct ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/resources/dashboards/generate_inplace/DATABRICKS_BUNDLE_ENGINE=terraform ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/resources/jobs/check-metadata ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/resources/jobs/check-metadata/DATABRICKS_BUNDLE_ENGINE=direct ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/resources/jobs/check-metadata/DATABRICKS_BUNDLE_ENGINE=terraform ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/resources/model_serving_endpoints/basic 🙈​s 🙈​s ❌​F ❌​F 🙈​s 🙈​s ❌​F ❌​F 🙈​s 🙈​s
❌​ TestAccept/bundle/resources/model_serving_endpoints/basic/DATABRICKS_BUNDLE_ENGINE=direct ❌​F ❌​F ❌​F ❌​F
❌​ TestAccept/bundle/resources/model_serving_endpoints/basic/DATABRICKS_BUNDLE_ENGINE=terraform ✅​p ✅​p ✅​p ❌​F
🙈​ TestAccept/bundle/resources/permissions 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions 🟨​K 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions 🟨​K 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🟨​K 💚​R 💚​R
🙈​ TestAccept/bundle/resources/postgres_branches/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/replace_existing 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/update_protected 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/without_branch_id 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_projects/update_display_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/synced_database_tables/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/grants/select 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
❌​ TestAccept/bundle/run_as/job_default ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
❌​ TestAccept/bundle/run_as/job_default/DATABRICKS_BUNDLE_ENGINE=direct ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ✅​p ✅​p
🙈​ TestAccept/ssh/connection 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
❌​ TestFetchRepositoryInfoAPI_FromRepo ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p 🔄​f ❌​F ❌​F 🔄​f
❌​ TestFetchRepositoryInfoAPI_FromRepo/root ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ❌​F ✅​p
❌​ TestFetchRepositoryInfoAPI_FromRepo/subdir ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p ❌​F ❌​F ✅​p
Top 21 slowest tests (at least 2 minutes):
duration env testname
6:18 azure windows TestAccept
4:42 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:18 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:18 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
4:10 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:32 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:29 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:22 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:10 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:01 gcp linux TestAccept
2:58 azure linux TestAccept
2:50 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:49 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:48 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:44 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:41 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:35 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:34 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:31 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:29 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:20 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
