feat: add POST /v1/databases/{id}/upgrade endpoint#418
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
| Category | Results |
|---|---|
| Complexity | 2 medium |
🟢 Metrics 49 complexity · 13 duplication
Metric Results Complexity 49 Duplication 13
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
|
@coderabbitai pls review this PR |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
set -euo pipefail
# Get PR details and file list
gh pr view 418 --repo pgEdge/control-plane --json files,additions,deletions,changedFiles 2>/dev/null || echo "Could not fetch PR details"
# Get the list of changed files
gh pr diff 418 --repo pgEdge/control-plane --name-only 2>/dev/null || echo "Could not fetch file list"Length of output: 5025 🏁 Script executed: #!/bin/bash
set -euo pipefail
# Get full diff
gh pr diff 418 --repo pgEdge/control-plane 2>/dev/null | head -3000Length of output: 122777 🏁 Script executed: #!/bin/bash
set -euo pipefail
# Fetch the key non-generated source files
echo "=== server/internal/database/service.go ==="
cat server/internal/database/service.go | grep -A 100 "ApplyUpgrade"
echo "=== server/internal/database/orchestrator.go ==="
cat server/internal/database/orchestrator.go
echo "=== server/internal/orchestrator/swarm/orchestrator.go ==="
cat server/internal/orchestrator/swarm/orchestrator.go | grep -A 80 "FindUpgrade"
echo "=== server/internal/orchestrator/swarm/images.go ==="
cat server/internal/orchestrator/swarm/images.go
echo "=== server/internal/orchestrator/systemd/orchestrator.go ==="
cat server/internal/orchestrator/systemd/orchestrator.go | tail -30
echo "=== server/internal/task/task.go ==="
cat server/internal/task/task.go | grep -B2 -A5 "TypeUpgrade\|TypeUpdate"
echo "=== server/internal/workflows/service.go ==="
cat server/internal/workflows/service.go | grep -A 40 "UpgradeDatabase"
echo "=== server/internal/api/apiv1/post_init_handlers.go ==="
cat server/internal/api/apiv1/post_init_handlers.go | grep -A 60 "ApplyUpgrade"
echo "=== server/internal/api/apiv1/pre_init_handlers.go ==="
cat server/internal/api/apiv1/pre_init_handlers.go | tail -20
echo "=== server/internal/api/apiv1/errors.go ==="
cat server/internal/api/apiv1/errors.go | tail -20
echo "=== server/internal/orchestrator/swarm/find_upgrade_test.go ==="
cat server/internal/orchestrator/swarm/find_upgrade_test.goLength of output: 34975 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== errors.go full new section ==="
gh pr diff 418 --repo pgEdge/control-plane -- server/internal/api/apiv1/errors.go
echo "=== pre_init_handlers.go ApplyUpgrade stub ==="
gh pr diff 418 --repo pgEdge/control-plane -- server/internal/api/apiv1/pre_init_handlers.go
echo "=== systemd orchestrator FindUpgrade ==="
gh pr diff 418 --repo pgEdge/control-plane -- server/internal/orchestrator/systemd/orchestrator.go
echo "=== database service.go full ApplyUpgrade context (broader) ==="
grep -n "UpdateDatabaseState\|ErrUpgradeNotAvailable\|prevState\|rollback" server/internal/database/service.go || true
grep -n "UpdateDatabaseState\|ErrUpgradeNotAvailable\|prevState\|rollback" server/internal/api/apiv1/post_init_handlers.go || trueLength of output: 1181 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== errors.go (full) ==="
cat server/internal/api/apiv1/errors.go
echo "=== pre_init_handlers.go ApplyUpgrade grep ==="
grep -n "ApplyUpgrade\|ErrUninitialized" server/internal/api/apiv1/pre_init_handlers.go || true
echo "=== systemd orchestrator.go FindUpgrade + AvailableUpgrades ==="
grep -n "FindUpgrade\|AvailableUpgrades\|ErrUpgradeNotAvailable" server/internal/orchestrator/systemd/orchestrator.go || true
echo "=== post_init_handlers.go state rollback lines ==="
grep -n "prevState\|UpdateDatabaseState\|rollback\|restorationErr" server/internal/api/apiv1/post_init_handlers.go || trueLength of output: 7906
SummaryThe feature is well-structured: clean API design, correct Goa type generation, solid swarm-level validation logic, good test coverage for 🔴 High: Spec update is not rolled back on workflow trigger failureFile:
// ApplyUpgrade atomically writes: spec.PostgresVersion=17.10 + state=modifying
db, err := s.dbSvc.ApplyUpgrade(ctx, databaseID, req.Request.Image)
t, err := s.workflowSvc.UpgradeDatabase(ctx, db)
if err != nil {
// ✅ state rolled back: modifying → available
// ❌ spec NOT rolled back: spec still says PostgresVersion=17.10
s.dbSvc.UpdateDatabaseState(ctx, databaseID, database.DatabaseStateModifying, prevState)
return nil, apiErr(err)
}Impact: After this failure path, the database appears Suggested fix options:
🟡 Medium: Pre-flight
|
moizpgedge
left a comment
There was a problem hiding this comment.
I ran this end to end on a dev cluster. The endpoint works: upgrade returned 200, all four containers moved to 17.10 and came back healthy, and the error cases were correct (empty image 400, unknown db 404, fake image 400).
But the upgrade finished in a failed state, twice, with the same error:
failed to update subscription on node n1:
ERROR: cannot assign TransactionIds during recovery (SQLSTATE XX000)
the upgrade reuses the UpdateDatabase workflow, which restarts each node and then reconciles Spock subscriptions. A just restarted node is briefly in recovery (read only), so the subscription write is rejected and the whole upgrade is marked failed, even though every container already upgraded and is healthy. Nothing waits for the node to leave recovery and the step does not retry.
The logs show the trigger clearly: the upgrade does a switchover on each node, and the subscription write then runs against the just promoted primary while it is still finishing recovery.
Result of my two runs:
| author log | my run 1 | my run 2 | |
|---|---|---|---|
| containers on 17.10 | yes | yes | yes |
| final db state | available | failed | failed |
Thanks @moizpgedge! That issue is now resolved. Please check now. |
c11eb50 to
6286ea4
Compare
moizpgedge
left a comment
There was a problem hiding this comment.
Approved.It looks good to me
| // | ||
| // Both conditions are transient and clear within seconds of promotion | ||
| // completing. Callers should retry until this function returns false. | ||
| func IsRecoveryError(err error) bool { |
There was a problem hiding this comment.
What is the use of IsRecoveryError? I could not find any callers in the code or tests
a010c48 to
b75d238
Compare
b75d238 to
12f219a
Compare
|
@moizpgedge Pls reverify this PR and the issue we saw will be addressed as part of PLAT-665 |
Summary
This PR adds POST
/v1/databases/{id}/upgrade, a dedicated endpoint for applying minor-version image upgrades to a database. Validation ensures the target image is a stable manifest entry, belongs to the same PostgreSQL major and Spock major version bucket, and is strictly newer than the database's current version.Changes
apply-upgrademethod withApplyUpgradeRequest/ApplyUpgradeResponseGoa types; regenerated OpenAPI spec and transportlayer
TypeUpgradefor audit-trail distinction fromTypeUpdateFindUpgrade(current, targetImage)method +ErrUpgradeNotAvailablesentinel errorVersions.FindByImagereverse-lookup +FindUpgradevalidation (same PG major, same Spock major, strictly newer, stable
stability only)
FindUpgradereturnsErrUpgradeNotAvailabledirectingusers to the OS package manager
ApplyUpgradevalidates, clears per-node versionoverrides, sets state to
modifying, and commits atomically to etcdUpgradeDatabasecreates aTypeUpgradetask andreuses the existing
UpdateDatabaseworkflowApplyUpgradeinPostInitHandlerswithprevStaterollback if workflow trigger fails; pre-init stub returns
ErrUninitializedTesting
Verification:
create_db_lower_image.jsonChecklist
PLAT-603