
sandbox: Bound 503 retries and clarify the error #5582

Open
anwell-db wants to merge 6 commits into main from anwell/sandbox-503-unavailable

@anwell-db


Changes

Sandbox API requests retry a 503 at most 3 times, then fail with an error stating that the Databricks Sandbox feature is not available in your region, or that the service is temporarily unavailable.

Retries are capped by attempt count (a per-request budget consumed in the client's ErrorRetriable hook) rather than by a deadline, so the final 503 deterministically surfaces as the APIError. Other errors keep the SDK's default retry behavior.

Tests

  • Acceptance test for the persistent-503 path; unit tests for the message rewrite and retry budget.

This pull request and its description were written by Isaac.

A 503 from the /api/2.0/lakebox routes means the sandbox service is
not deployed for the workspace's region, but the raw gateway error
gives users no hint of that. Translate it at the API-wrapper level so
every sandbox command reports the actual cause.

Co-authored-by: Isaac
The SDK retries 503s for up to five minutes, so in regions without the
sandbox service every command hung before erroring. Counting attempts in
the ErrorRetriable hook (rather than racing a context deadline) halts the
retry loop with the real APIError, so the translated message is
deterministic. Since a few transient 503s are still retried, the message
hedges between region unavailability and a temporary outage.

Co-authored-by: Isaac
Co-authored-by: Isaac
@github-actions (Contributor)

Approval status: pending

/acceptance/cmd/sandbox/ - needs approval

4 files changed
Suggested: @pietern
Also eligible: @shuochen0311, @akshaysingla-db

/cmd/sandbox/ - needs approval

Files: cmd/sandbox/api.go, cmd/sandbox/api_test.go
Suggested: @pietern
Also eligible: @shuochen0311, @akshaysingla-db

Any maintainer (@andrewnester, @anton-107, @denik, @pietern, @shreyas-goenka, @simonfaltum, @renaudhartert-db) can approve all areas.
See OWNERS for ownership rules.

@anwell-db temporarily deployed to test-trigger-is on June 12, 2026 at 19:22 with GitHub Actions (inactive)
Co-authored-by: Isaac
@anwell-db temporarily deployed to test-trigger-is on June 12, 2026 at 19:26 with GitHub Actions (inactive)
Co-authored-by: Isaac
@anwell-db temporarily deployed to test-trigger-is on June 12, 2026 at 23:13 with GitHub Actions (inactive)