sandbox: Bound 503 retries and clarify the error #5578

Closed
anwell-db wants to merge 4 commits into databricks:main from anwell-db:sandbox-503-region-unavailable

@anwell-db anwell-db commented Jun 12, 2026

Changes

Sandbox API requests retry a 503 at most 3 times, then report: the Databricks Sandbox feature is not available in your region, or the service is temporarily unavailable.

Retries are capped by count (a per-request counter consumed in the client's ErrorRetriable hook) rather than a deadline, so the final 503 deterministically surfaces as the API error. Other errors keep the SDK's default retry behavior.

Tests

  • Acceptance test for the persistent-503 path; unit tests for the message rewrite and retry budget.

This pull request and its description were written by Isaac.

A 503 from the /api/2.0/lakebox routes means the sandbox service is
not deployed for the workspace's region, but the raw gateway error
gives users no hint of that. Translate it at the API-wrapper level so
every sandbox command reports the actual cause.

Co-authored-by: Isaac

github-actions Bot commented Jun 12, 2026

Contributor

Approval status: pending

/acceptance/cmd/sandbox/ - needs approval

4 files changed
Suggested: @pietern
Also eligible: @shuochen0311, @akshaysingla-db

/cmd/sandbox/ - needs approval

Files: cmd/sandbox/api.go, cmd/sandbox/api_test.go
Suggested: @pietern
Also eligible: @shuochen0311, @akshaysingla-db

Any maintainer (@andrewnester, @anton-107, @denik, @pietern, @shreyas-goenka, @simonfaltum, @renaudhartert-db) can approve all areas.
See OWNERS for ownership rules.

The SDK retries 503s for up to five minutes, so in regions without the
sandbox service every command hung before erroring. Counting attempts in
the ErrorRetriable hook (rather than racing a context deadline) halts the
retry loop with the real APIError, so the translated message is
deterministic. Since a few transient 503s are still retried, the message
hedges between region unavailability and a temporary outage.

Co-authored-by: Isaac
Co-authored-by: Isaac
@anwell-db anwell-db changed the title from "sandbox: Show a clear error when Sandboxes is unavailable in the region" to "sandbox: Bound 503 retries and clarify the error" Jun 12, 2026
@github-actions

Contributor

An authorized user can trigger integration tests manually by following the instructions below:

Trigger:
go/deco-tests-run/cli

Inputs:

  • PR number: 5578
  • Commit SHA: b30a09daad24aaea931008f97c5590a37d2521af

Checks will be approved automatically on success.

@anwell-db

Author

Replaced by #5582 (same change, in-repo branch).

@anwell-db anwell-db closed this Jun 12, 2026
@anwell-db anwell-db deleted the sandbox-503-region-unavailable branch June 12, 2026 19:21
@anwell-db anwell-db temporarily deployed to test-trigger-is June 12, 2026 19:22 — with GitHub Actions Inactive