Skip to content

Derive config-remote-sync field filtering from resource lifecycle metadata#5590

Open
ilyakuz-db wants to merge 1 commit into
configsync-skip-dashboard-etagfrom
configsync-use-lifecycle-metadata
Open

Derive config-remote-sync field filtering from resource lifecycle metadata#5590
ilyakuz-db wants to merge 1 commit into
configsync-skip-dashboard-etagfrom
configsync-use-lifecycle-metadata

Conversation

@ilyakuz-db

Copy link
Copy Markdown
Contributor

Stacked on #5589 (review only the last commit here until that one merges).

Changes

Stacked on the dashboard-etag quick fix PR.

  1. config-remote-sync now derives its field filtering from the resource lifecycle metadata (bundle/direct/dresources/resources.yml + resources.generated.yml) instead of the hand-maintained serverSideDefaults table in bundle/configsync/defaults.go:
    • ignore_remote_changes matches are dropped from sync results unconditionally — for sync, "ignore remote changes" means "this field never belongs in configuration". This covers output-only fields (etag), etag_based fields (serialized_dashboard), and input_only fields (dataset_catalog/dataset_schema can no longer produce remove operations).
    • backend_defaults matches are dropped only when the field is absent from config, respecting the optional values constraint.
    • configsync deliberately matches the metadata directly, never via plan actions or adapter OverrideChangeDesc hooks: those answer "what will deploy do", not "what belongs in config" (e.g. dashboard etag drift is re-promoted to Update for modified-remotely detection). This is engine-agnostic and requires no adapters.
    • Sync-specific rules stay in the configsync package (jobs edit_mode, queue reset values, name-prefix stripping).
  2. FindMatchingRule / MatchesAllowedValue / MatchesBackendDefault moved (unchanged) from bundle/direct/bundle_plan.go into the dresources package so configsync reuses them instead of duplicating the matching logic.
  3. resources.yml updates:
    • dashboards: etag added to ignore_remote_changes (inert for deploy — OverrideChangeDesc runs after the metadata chain).
    • jobs: performance_target backend default (PERFORMANCE_OPTIMIZED; the backend returns it for all jobs when unset, not only serverless), tasks[*].timeout_seconds/disabled defaults, email_notifications.no_alert_for_skipped_runs defaults (job/task/for_each levels; the real backend returns it when notifications aren't configured), single_user_name defaults for task/job clusters.
    • clusters: driver/worker_node_type_flexibility backend defaults (computed by some backends for standalone clusters).
    • Removed all node_type_id backend-default entries: the backend cannot default it (creating a cluster without it fails with 400 "Unknown node type id"), and sync's nested filtering must not strip it from remotely added clusters or the synced config becomes undeployable.

Two intentional semantic changes, pinned by unit tests: (a) when a config field's remote value filters entirely to defaults, sync now emits remove instead of replace with {}; (b) backend_defaults entries without values constraints strip any remote value during sync when the field is absent from config, matching the deploy plan's backend-default semantics (the old table only stripped one specific value).

Why

The hardcoded table constantly lagged behind resources.yml — every new resource or output-only field had to be added in two places, and gaps leaked server-side values into users' YAML (the dashboard etag being the worst case: a hardcoded stale etag makes config/state/remote permanently inconsistent). With this change, lifecycle metadata added for the deploy plan automatically protects config-remote-sync, including future output-only fields flowing in via codegen.

Tests

  • Unit: metadata-driven skip logic, nested entity filtering (incl. the no_alert_for_skipped_runs form that previously leaked), array-arity preservation, replace→remove semantics, moved dresources helpers, MustParsePattern.
  • All acceptance/bundle/config-remote-sync tests pass on both engines, locally and on real workspaces (cloud acceptance run); acceptance/bundle/resources/dashboards (including modified-remotely detection) unchanged and passing.
  • No committed acceptance outputs changed.

@github-actions

github-actions Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Approval status: pending

/bundle/ - needs approval

8 files changed
Suggested: @denik
Also eligible: @pietern, @janniklasrose, @andrewnester, @shreyas-goenka, @anton-107, @lennartkats-db

General files (require maintainer)

Files: libs/structs/structpath/path.go, libs/structs/structpath/path_test.go
Based on git history:

  • @denik -- recent work in bundle/direct/, bundle/direct/dresources/, libs/structs/structpath/

Any maintainer (@andrewnester, @anton-107, @denik, @pietern, @shreyas-goenka, @simonfaltum, @renaudhartert-db) can approve all areas.
See OWNERS for ownership rules.

@ilyakuz-db ilyakuz-db temporarily deployed to test-trigger-is June 12, 2026 20:27 — with GitHub Actions Inactive
@ilyakuz-db ilyakuz-db temporarily deployed to test-trigger-is June 12, 2026 20:27 — with GitHub Actions Inactive
@ilyakuz-db ilyakuz-db force-pushed the configsync-use-lifecycle-metadata branch from ea73f97 to e6c3d6a Compare June 12, 2026 20:27
@ilyakuz-db ilyakuz-db force-pushed the configsync-skip-dashboard-etag branch from c9295ba to 74fe136 Compare June 12, 2026 20:27
@ilyakuz-db ilyakuz-db temporarily deployed to test-trigger-is June 12, 2026 20:28 — with GitHub Actions Inactive
@ilyakuz-db ilyakuz-db temporarily deployed to test-trigger-is June 12, 2026 20:28 — with GitHub Actions Inactive
@ilyakuz-db ilyakuz-db temporarily deployed to test-trigger-is June 12, 2026 20:28 — with GitHub Actions Inactive
@ilyakuz-db ilyakuz-db temporarily deployed to test-trigger-is June 12, 2026 20:28 — with GitHub Actions Inactive
@eng-dev-ecosystem-bot

eng-dev-ecosystem-bot commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Integration test report

Commit: acbd897

Run: 27452721192

Env 🟨​KNOWN 💚​RECOVERED 🙈​SKIP ✅​pass 🙈​skip Time
🟨​ aws linux 7 15 264 970 8:01
🟨​ aws windows 7 15 266 968 16:47
💚​ aws-ucws linux 7 15 360 884 7:26
💚​ aws-ucws windows 7 15 362 882 12:02
💚​ azure linux 1 17 267 968 6:31
💚​ azure windows 1 17 269 966 12:34
💚​ azure-ucws linux 1 17 365 880 7:51
💚​ azure-ucws windows 1 17 367 878 12:33
💚​ gcp linux 1 17 263 971 7:17
💚​ gcp windows 1 17 265 969 11:29
22 interesting tests: 15 SKIP, 7 KNOWN
Test Name aws linux aws windows aws-ucws linux aws-ucws windows azure linux azure windows azure-ucws linux azure-ucws windows gcp linux gcp windows
🟨​ TestAccept 🟨​K 🟨​K 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
🙈​ TestAccept/bundle/invariant/no_drift 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/permissions 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions 🟨​K 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions 🟨​K 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🟨​K 💚​R 💚​R
🙈​ TestAccept/bundle/resources/postgres_branches/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/replace_existing 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/update_protected 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/without_branch_id 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_projects/update_display_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/synced_database_tables/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/grants/select 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/ssh/connection 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
Top 28 slowest tests (at least 2 minutes):
duration env testname
6:15 gcp windows TestAccept
6:10 aws-ucws windows TestAccept
6:03 azure windows TestAccept
6:02 azure-ucws windows TestAccept
4:47 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:19 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
4:14 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:33 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:31 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:04 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:01 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:01 azure linux TestAccept
3:00 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:54 azure-ucws linux TestAccept
2:53 gcp linux TestAccept
2:49 aws-ucws linux TestAccept
2:48 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:48 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:45 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:44 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:41 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:41 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:41 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:38 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:37 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:31 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:31 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:31 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct

…adata

config-remote-sync previously filtered server-side noise out of detected
remote changes via a hand-maintained serverSideDefaults table in
bundle/configsync/defaults.go. The table constantly lagged behind
resources.yml. This change derives the filtering from the same lifecycle
metadata that bundle plan uses (bundle/direct/dresources/resources.yml
and resources.generated.yml) and deletes the table.

configsync matches the metadata directly instead of relying on the
plan's per-field actions: the plan answers "what will deploy do", which
is not the same as "does this field belong in configuration". For
example, ResourceDashboard.OverrideChangeDesc re-promotes etag drift to
Update after the metadata chain (deploy needs that for modified-remotely
detection), but etag must never be written to YAML. Concretely:

- Fields matching ignore_remote_changes (in either the hand-written or
  the generated config) are never written to configuration, regardless
  of values or plan action.
- Fields matching backend_defaults are dropped when absent from the
  config (cd.New == nil) and the remote value matches the rule,
  including the optional values constraint.
- The same matching drives the nested filtering of entities added
  remotely as a whole (e.g. a new task); map fields that become empty
  after filtering are pruned, but empty elements inside arrays are kept
  as {} to preserve array arity.

Two intentional semantic changes compared to the old table:

- When a field set in config has a remote value that filters entirely
  to backend defaults, the operation is now Remove (the field is
  dropped from the YAML) instead of Replace with an empty map. This
  produces cleaner config and is pinned by a unit test.
- Pre-existing backend_defaults entries without values constraints
  (e.g. notebook_task.source, data_security_mode) now make sync drop
  any remote value when the field is absent from config, where the old
  table only dropped one specific default value. This intentionally
  aligns sync with the deploy plan's backend-default semantics; adding
  values constraints to those entries would change plan behavior and is
  out of scope here.

The matching helpers move from bundle/direct/bundle_plan.go into the
dresources package (FindMatchingRule, MatchesAllowedValue, and the
backend-defaults loop as ResourceLifecycleConfig.MatchesBackendDefault)
so plan and sync share one implementation.

Sync-specific rules stay in the configsync package, ported to structpath
patterns keyed by resource type: jobs edit_mode (set by the CLI, not
user-settable in databricks.yml), the jobs queue reset value, and
name-prefix stripping. These have sync-only drift semantics and don't
belong in resources.yml.

One backend_defaults rule needs a sync-side exception rather than
removal: the jobs new_cluster node_type_id entries (tasks[*] and
job_clusters[*]) are computed fields for the plan. Config legitimately
omits node_type_id when a cluster policy fixes it, and jobs/get then
returns the policy-resolved value in the stored settings (verified on
AWS prod, both with and without apply_policy_default_values), so
removing the entries would cause permadrift on every plan and a
spurious update on every deploy. For sync the opposite holds: a
remotely added cluster's explicit node_type_id is required
configuration, and stripping it via nested filtering would make the
synced YAML undeployable (cluster creation fails with 400 "Unknown
node type id" unless a pool or policy provides the node type). These
fields are therefore listed in fieldsKeptForSync, which nested entity
filtering consults before dropping a field. Keeping them
unconditionally cannot introduce a node_type_id/instance_pool_id
conflict: the Jobs API rejects requests carrying both, and jobs/get
does not materialize node_type_id for pool-backed clusters (also
verified empirically), so a remote cluster never has both fields.

resources.yml changes:

- jobs: add performance_target backend default with values
  ["PERFORMANCE_OPTIMIZED"]; the backend returns it for all jobs when
  unset, not only for serverless jobs.
- jobs: add tasks[*].timeout_seconds / tasks[*].disabled backend
  defaults (and for_each_task variants) with values [0] / [false].
  These are inert for the plan, which already skips them as all-empty;
  they are needed so config-remote-sync does not write these
  backend-returned defaults into YAML for tasks added remotely.
- jobs: add email_notifications.no_alert_for_skipped_runs backend
  defaults with values [false] at the job level, tasks[*], and
  tasks[*].for_each_task.task. The backend populates this deprecated
  field when notifications are not configured. Inert for the plan: the
  all-empty check in addPerFieldActions runs before backend defaults
  and already skips zero values. Needed so sync drops the leaf and
  then prunes the empty email_notifications map for entities added
  remotely.
- jobs: add tasks/job_clusters new_cluster.single_user_name backend
  defaults, mirroring the existing standalone clusters entry (#4418).
- clusters: add driver_node_type_flexibility and
  worker_node_type_flexibility backend defaults; some backends compute
  alternate node types for standalone clusters when the config does not
  specify flexibility (observed on AWS staging workspaces).
- dashboards: add etag to ignore_remote_changes (output_only). This does
  not change deploy behavior because OverrideChangeDesc runs after the
  metadata chain; see the comment in resources.yml.
- pipelines: remove the clusters[*].node_type_id backend_defaults
  entry. Unlike jobs and standalone clusters, pipelines/get returns
  the stored spec verbatim: clusters relying on an instance pool or a
  policy-fixed node type never gain node_type_id in the spec (verified
  on AWS prod for both cases), and ResourcePipeline.DoRead reads the
  spec, so the entry can never match a real remote diff. Removing it
  also keeps sync's nested filtering from stripping node_type_id off
  remotely added pipeline clusters.
- The jobs and standalone clusters node_type_id entries stay: jobs/get
  materializes policy-fixed node types (see above), and clusters/get
  reports the pool's node type at the top level for pool-backed
  clusters while config must omit it (the API rejects node_type_id
  together with instance_pool_id).

Table entries that needed no resources.yml counterpart: zero/empty
values (job and task timeout_seconds: 0 at the top level, empty
email_notifications/webhook_notifications, pipelines continuous: false,
tasks[*].pipeline_task.full_refresh: false) are already skipped by the
plan's all-empty check before they reach sync, and empty maps inside
remotely added entities are pruned by the nested filtering; the
non-empty email_notifications form the backend returns
({no_alert_for_skipped_runs: false}) is covered by the new explicit
rules above. The remaining entries (cloud attributes,
data_security_mode, experiments artifact_location, registered_models
owner/full_name/metastore_id/storage_location, volumes
storage_location, sql_warehouses
creator_name/min_num_clusters/warehouse_type, jobs run_as, pipelines
storage) were already covered by existing resources.yml rules.
@ilyakuz-db ilyakuz-db force-pushed the configsync-skip-dashboard-etag branch from 74fe136 to f4c45ef Compare June 13, 2026 01:35
@ilyakuz-db ilyakuz-db force-pushed the configsync-use-lifecycle-metadata branch from e6c3d6a to acbd897 Compare June 13, 2026 01:35
@ilyakuz-db ilyakuz-db temporarily deployed to test-trigger-is June 13, 2026 01:36 — with GitHub Actions Inactive
@ilyakuz-db ilyakuz-db temporarily deployed to test-trigger-is June 13, 2026 01:36 — with GitHub Actions Inactive
@ilyakuz-db ilyakuz-db deployed to test-trigger-is June 13, 2026 01:36 — with GitHub Actions Active
@ilyakuz-db ilyakuz-db temporarily deployed to test-trigger-is June 13, 2026 01:36 — with GitHub Actions Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants