Derive config-remote-sync field filtering from resource lifecycle metadata#5590
Open
ilyakuz-db wants to merge 1 commit into
Open
Derive config-remote-sync field filtering from resource lifecycle metadata#5590ilyakuz-db wants to merge 1 commit into
ilyakuz-db wants to merge 1 commit into
Conversation
Contributor
Approval status: pending
|
ea73f97 to
e6c3d6a
Compare
c9295ba to
74fe136
Compare
Collaborator
Integration test reportCommit: acbd897
22 interesting tests: 15 SKIP, 7 KNOWN
Top 28 slowest tests (at least 2 minutes):
|
…adata
config-remote-sync previously filtered server-side noise out of detected
remote changes via a hand-maintained serverSideDefaults table in
bundle/configsync/defaults.go. The table constantly lagged behind
resources.yml. This change derives the filtering from the same lifecycle
metadata that bundle plan uses (bundle/direct/dresources/resources.yml
and resources.generated.yml) and deletes the table.
configsync matches the metadata directly instead of relying on the
plan's per-field actions: the plan answers "what will deploy do", which
is not the same as "does this field belong in configuration". For
example, ResourceDashboard.OverrideChangeDesc re-promotes etag drift to
Update after the metadata chain (deploy needs that for modified-remotely
detection), but etag must never be written to YAML. Concretely:
- Fields matching ignore_remote_changes (in either the hand-written or
the generated config) are never written to configuration, regardless
of values or plan action.
- Fields matching backend_defaults are dropped when absent from the
config (cd.New == nil) and the remote value matches the rule,
including the optional values constraint.
- The same matching drives the nested filtering of entities added
remotely as a whole (e.g. a new task); map fields that become empty
after filtering are pruned, but empty elements inside arrays are kept
as {} to preserve array arity.
Two intentional semantic changes compared to the old table:
- When a field set in config has a remote value that filters entirely
to backend defaults, the operation is now Remove (the field is
dropped from the YAML) instead of Replace with an empty map. This
produces cleaner config and is pinned by a unit test.
- Pre-existing backend_defaults entries without values constraints
(e.g. notebook_task.source, data_security_mode) now make sync drop
any remote value when the field is absent from config, where the old
table only dropped one specific default value. This intentionally
aligns sync with the deploy plan's backend-default semantics; adding
values constraints to those entries would change plan behavior and is
out of scope here.
The matching helpers move from bundle/direct/bundle_plan.go into the
dresources package (FindMatchingRule, MatchesAllowedValue, and the
backend-defaults loop as ResourceLifecycleConfig.MatchesBackendDefault)
so plan and sync share one implementation.
Sync-specific rules stay in the configsync package, ported to structpath
patterns keyed by resource type: jobs edit_mode (set by the CLI, not
user-settable in databricks.yml), the jobs queue reset value, and
name-prefix stripping. These have sync-only drift semantics and don't
belong in resources.yml.
One backend_defaults rule needs a sync-side exception rather than
removal: the jobs new_cluster node_type_id entries (tasks[*] and
job_clusters[*]) are computed fields for the plan. Config legitimately
omits node_type_id when a cluster policy fixes it, and jobs/get then
returns the policy-resolved value in the stored settings (verified on
AWS prod, both with and without apply_policy_default_values), so
removing the entries would cause permadrift on every plan and a
spurious update on every deploy. For sync the opposite holds: a
remotely added cluster's explicit node_type_id is required
configuration, and stripping it via nested filtering would make the
synced YAML undeployable (cluster creation fails with 400 "Unknown
node type id" unless a pool or policy provides the node type). These
fields are therefore listed in fieldsKeptForSync, which nested entity
filtering consults before dropping a field. Keeping them
unconditionally cannot introduce a node_type_id/instance_pool_id
conflict: the Jobs API rejects requests carrying both, and jobs/get
does not materialize node_type_id for pool-backed clusters (also
verified empirically), so a remote cluster never has both fields.
resources.yml changes:
- jobs: add performance_target backend default with values
["PERFORMANCE_OPTIMIZED"]; the backend returns it for all jobs when
unset, not only for serverless jobs.
- jobs: add tasks[*].timeout_seconds / tasks[*].disabled backend
defaults (and for_each_task variants) with values [0] / [false].
These are inert for the plan, which already skips them as all-empty;
they are needed so config-remote-sync does not write these
backend-returned defaults into YAML for tasks added remotely.
- jobs: add email_notifications.no_alert_for_skipped_runs backend
defaults with values [false] at the job level, tasks[*], and
tasks[*].for_each_task.task. The backend populates this deprecated
field when notifications are not configured. Inert for the plan: the
all-empty check in addPerFieldActions runs before backend defaults
and already skips zero values. Needed so sync drops the leaf and
then prunes the empty email_notifications map for entities added
remotely.
- jobs: add tasks/job_clusters new_cluster.single_user_name backend
defaults, mirroring the existing standalone clusters entry (#4418).
- clusters: add driver_node_type_flexibility and
worker_node_type_flexibility backend defaults; some backends compute
alternate node types for standalone clusters when the config does not
specify flexibility (observed on AWS staging workspaces).
- dashboards: add etag to ignore_remote_changes (output_only). This does
not change deploy behavior because OverrideChangeDesc runs after the
metadata chain; see the comment in resources.yml.
- pipelines: remove the clusters[*].node_type_id backend_defaults
entry. Unlike jobs and standalone clusters, pipelines/get returns
the stored spec verbatim: clusters relying on an instance pool or a
policy-fixed node type never gain node_type_id in the spec (verified
on AWS prod for both cases), and ResourcePipeline.DoRead reads the
spec, so the entry can never match a real remote diff. Removing it
also keeps sync's nested filtering from stripping node_type_id off
remotely added pipeline clusters.
- The jobs and standalone clusters node_type_id entries stay: jobs/get
materializes policy-fixed node types (see above), and clusters/get
reports the pool's node type at the top level for pool-backed
clusters while config must omit it (the API rejects node_type_id
together with instance_pool_id).
Table entries that needed no resources.yml counterpart: zero/empty
values (job and task timeout_seconds: 0 at the top level, empty
email_notifications/webhook_notifications, pipelines continuous: false,
tasks[*].pipeline_task.full_refresh: false) are already skipped by the
plan's all-empty check before they reach sync, and empty maps inside
remotely added entities are pruned by the nested filtering; the
non-empty email_notifications form the backend returns
({no_alert_for_skipped_runs: false}) is covered by the new explicit
rules above. The remaining entries (cloud attributes,
data_security_mode, experiments artifact_location, registered_models
owner/full_name/metastore_id/storage_location, volumes
storage_location, sql_warehouses
creator_name/min_num_clusters/warehouse_type, jobs run_as, pipelines
storage) were already covered by existing resources.yml rules.
74fe136 to
f4c45ef
Compare
e6c3d6a to
acbd897
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stacked on #5589 (review only the last commit here until that one merges).
Changes
Stacked on the dashboard-etag quick fix PR.
bundle/direct/dresources/resources.yml+resources.generated.yml) instead of the hand-maintainedserverSideDefaultstable inbundle/configsync/defaults.go:ignore_remote_changesmatches are dropped from sync results unconditionally — for sync, "ignore remote changes" means "this field never belongs in configuration". This covers output-only fields (etag),etag_basedfields (serialized_dashboard), andinput_onlyfields (dataset_catalog/dataset_schemacan no longer produceremoveoperations).backend_defaultsmatches are dropped only when the field is absent from config, respecting the optionalvaluesconstraint.OverrideChangeDeschooks: those answer "what will deploy do", not "what belongs in config" (e.g. dashboard etag drift is re-promoted toUpdatefor modified-remotely detection). This is engine-agnostic and requires no adapters.edit_mode,queuereset values, name-prefix stripping).FindMatchingRule/MatchesAllowedValue/MatchesBackendDefaultmoved (unchanged) frombundle/direct/bundle_plan.gointo thedresourcespackage so configsync reuses them instead of duplicating the matching logic.resources.ymlupdates:etagadded toignore_remote_changes(inert for deploy —OverrideChangeDescruns after the metadata chain).performance_targetbackend default (PERFORMANCE_OPTIMIZED; the backend returns it for all jobs when unset, not only serverless),tasks[*].timeout_seconds/disableddefaults,email_notifications.no_alert_for_skipped_runsdefaults (job/task/for_each levels; the real backend returns it when notifications aren't configured),single_user_namedefaults for task/job clusters.driver/worker_node_type_flexibilitybackend defaults (computed by some backends for standalone clusters).node_type_idbackend-default entries: the backend cannot default it (creating a cluster without it fails with 400 "Unknown node type id"), and sync's nested filtering must not strip it from remotely added clusters or the synced config becomes undeployable.Two intentional semantic changes, pinned by unit tests: (a) when a config field's remote value filters entirely to defaults, sync now emits
removeinstead ofreplacewith{}; (b)backend_defaultsentries withoutvaluesconstraints strip any remote value during sync when the field is absent from config, matching the deploy plan's backend-default semantics (the old table only stripped one specific value).Why
The hardcoded table constantly lagged behind
resources.yml— every new resource or output-only field had to be added in two places, and gaps leaked server-side values into users' YAML (the dashboard etag being the worst case: a hardcoded stale etag makes config/state/remote permanently inconsistent). With this change, lifecycle metadata added for the deploy plan automatically protects config-remote-sync, including future output-only fields flowing in via codegen.Tests
no_alert_for_skipped_runsform that previously leaked), array-arity preservation, replace→remove semantics, moved dresources helpers,MustParsePattern.acceptance/bundle/config-remote-synctests pass on both engines, locally and on real workspaces (cloud acceptance run);acceptance/bundle/resources/dashboards(including modified-remotely detection) unchanged and passing.