Add ClickhouseBackup Plugin Support #1798

Draft
jitendra1411 wants to merge 1 commit into Altinity:0.25.4 from jitendra1411:feature/backup

@jitendra1411

Overview

This PR introduces a new Backup plugin architecture to the ClickHouse Operator, enabling automated backup operations via an external plugin service. Issue: #1795


Current PR Includes

1. Operator Enhancements

  • CRDs

    • ClickhouseBackup (CHB) — manage one-time backup requests with S3 configuration, whitelist/blacklist filtering, and method/plugin settings.
  • Controller Updates

    • Operator reconciles CHB and ClickhouseScheduledBackup (CHSB) CRs and communicates with a gRPC backup plugin using JSON-serialized request data.
    • Adds plugin discovery via CR labels/annotations.
    • Implements lifecycle management and status reporting.
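
To make the controller flow above more concrete, here is a minimal sketch of how a backup spec might be JSON-serialized and how a plugin endpoint might be looked up from CR annotations. Everything named here is an assumption for illustration: the annotation key, the struct fields, and the endpoint address are hypothetical and are not taken from the PR's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PluginAnnotation is a hypothetical annotation key naming the plugin endpoint.
// The PR only states that discovery uses CR labels/annotations.
const PluginAnnotation = "backup.example.com/plugin-endpoint"

// S3Config is a hypothetical subset of the S3 settings carried by a CHB resource.
type S3Config struct {
	Bucket string `json:"bucket"`
	Region string `json:"region"`
}

// BackupSpec is a hypothetical shape for the data sent to the plugin.
type BackupSpec struct {
	Method    string   `json:"method"`
	S3        S3Config `json:"s3"`
	Whitelist []string `json:"whitelist,omitempty"`
	Blacklist []string `json:"blacklist,omitempty"`
}

// PluginEndpoint reads the plugin address from the CR's annotations and
// returns an error when the annotation is missing or empty.
func PluginEndpoint(annotations map[string]string) (string, error) {
	endpoint, ok := annotations[PluginAnnotation]
	if !ok || endpoint == "" {
		return "", fmt.Errorf("annotation %q is not set", PluginAnnotation)
	}
	return endpoint, nil
}

// EncodeSpec serializes the spec to JSON, the payload format named in the PR.
func EncodeSpec(spec BackupSpec) ([]byte, error) {
	return json.Marshal(spec)
}

func main() {
	annotations := map[string]string{PluginAnnotation: "backup-plugin.default.svc:50051"}
	endpoint, err := PluginEndpoint(annotations)
	if err != nil {
		panic(err)
	}
	payload, err := EncodeSpec(BackupSpec{
		Method:    "full",
		S3:        S3Config{Bucket: "backups", Region: "us-east-1"},
		Whitelist: []string{"default.events"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("send to %s: %s\n", endpoint, payload)
}
```

Failing early when the annotation is missing gives the reconciler a clear error to surface in the CR status instead of a dial failure later on.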

2. Go Plugin Skeleton

  • Introduces a gRPC interface for backup operations using BackupRequest and BackupResult messages.
  • Plugin architecture allows future extension for operations like restore, encryption, incremental backups, etc.
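
As a rough illustration of the plugin contract described above, the sketch below models it as a plain Go interface with an in-memory mock. Only the message names BackupRequest and BackupResult come from the PR; their fields, the BackupPlugin interface, MockPlugin, and RunOnce are hypothetical, and the gRPC transport is left out so the example stays self-contained.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// BackupRequest mirrors the message name from the PR; the fields are hypothetical.
type BackupRequest struct {
	BackupName string
	Payload    []byte // JSON-serialized request data
}

// BackupResult mirrors the message name from the PR; the fields are hypothetical.
type BackupResult struct {
	Success bool
	Message string
}

// BackupPlugin is a hypothetical Go-side view of the plugin service.
// A real implementation would be generated from a .proto file.
type BackupPlugin interface {
	Backup(ctx context.Context, req BackupRequest) (BackupResult, error)
}

// MockPlugin is an in-memory stand-in for a plugin server.
type MockPlugin struct {
	Fail bool // when true, the plugin reports an unsuccessful backup
}

// Backup validates the request and returns a canned result.
func (m MockPlugin) Backup(ctx context.Context, req BackupRequest) (BackupResult, error) {
	if err := ctx.Err(); err != nil {
		return BackupResult{}, err
	}
	if req.BackupName == "" {
		return BackupResult{}, errors.New("backup name is required")
	}
	if m.Fail {
		return BackupResult{Success: false, Message: "simulated failure"}, nil
	}
	return BackupResult{Success: true, Message: "backup " + req.BackupName + " completed"}, nil
}

// RunOnce calls the plugin with a bounded timeout, as an operator might.
func RunOnce(p BackupPlugin, name string) (BackupResult, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return p.Backup(ctx, BackupRequest{BackupName: name})
}

func main() {
	res, err := RunOnce(MockPlugin{}, "nightly")
	if err != nil {
		panic(err)
	}
	fmt.Println(res.Success, res.Message)
}
```

Keeping the operator dependent on a small interface like this is one way to allow restore or incremental operations to be added later as further methods.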

Motivation & Benefits

  • Modular Design: Backup behavior is decoupled from the operator — update, replace, or extend the plugin without touching core operator logic.
  • Extensibility: Support for diverse backup implementations (e.g., S3, local disk, incremental, etc.).
  • Production-Ready: Enables defined workflows for both one-off and scheduled backups, ensuring reliable defense against data loss.

TODO

  • Restore Workflow: Implement a corresponding restore service via plugin.
  • Test Cases: Implement test cases.
  • ClickhouseScheduledBackup (CHSB): Implement the scheduled backup code.
  • Generic gRPC Client: Implement a reusable gRPC client inside the operator for communication with different plugins.

Testing Notes

  • Validate CHB and CHSB workflows manually using a mock plugin server.
  • Status updates should reflect backup success or failure states.
  • Scheduling logic (cron parsing, suspension, etc.) requires validation.
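
For the status check in the notes above, a small sketch of how a plugin outcome could be mapped onto a CR status may help when validating manually. The phase names, the Status struct, and StatusFor are hypothetical and only illustrate the success/failure distinction; they are not the fields used in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// Phase is a hypothetical status phase for a backup CR.
type Phase string

const (
	PhaseCompleted Phase = "Completed"
	PhaseFailed    Phase = "Failed"
)

// ErrPluginUnavailable is a sample transport-level error used in the demo.
var ErrPluginUnavailable = errors.New("plugin unavailable")

// Status is a hypothetical status block written back to the CR.
type Status struct {
	Phase   Phase
	Message string
}

// StatusFor maps the plugin outcome to a status. A call error and an
// unsuccessful result both end in the Failed phase, with different messages.
func StatusFor(success bool, message string, callErr error) Status {
	if callErr != nil {
		return Status{Phase: PhaseFailed, Message: "plugin call failed: " + callErr.Error()}
	}
	if !success {
		return Status{Phase: PhaseFailed, Message: message}
	}
	return Status{Phase: PhaseCompleted, Message: message}
}

func main() {
	fmt.Println(StatusFor(true, "uploaded to S3", nil))
	fmt.Println(StatusFor(false, "disk full", nil))
	fmt.Println(StatusFor(false, "", ErrPluginUnavailable))
}
```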

Signed-off-by: jitendra1411 <jackrh2017@gmail.com>
@manishrawat1992

Will this get accepted?

@discostur

Contributor

Would be really cool to see this get merged.

@sunsingerus added the "hold" (this issue has been put on hold) and "ongoing discussion" (issue is under discussion, no decision made so far) labels on Dec 3, 2025
discostur added a commit to discostur/clickhouse-operator that referenced this pull request Jun 29, 2026
Introduces operator-managed backup and restore for ClickHouse using
clickhouse-backup, exposed through three new custom resources in the
clickhouse.altinity.com/v1 API group:

- ClickHouseBackup (chb): one-off backup -> Kubernetes Job
- ClickHouseBackupSchedule (chbs): recurring backup -> managed CronJob
- ClickHouseRestore (chr): one-off restore -> Kubernetes Job

The controllers follow the existing ClickHouseKeeper controller-runtime
pattern. clickhouse-backup runs as a sidecar (a documented prerequisite);
the generated jobs trigger it remotely through the system.backup_actions
integration table, so no backup logic is reimplemented in the operator.

Cluster-aware: backs up one replica per shard for Replicated* tables
(AllReplicas opt-in for non-replicated data); on restore it applies the
schema on all replicas and the data on the first replica of each shard,
letting native replication synchronize the rest.

Restore safety follows the conventions of mature DB operators: preflight
validation (target CHI Completed, topology reachable) and an overwrite
guard that refuses a non-empty target unless overwrite=true.

Includes the CRDs, RBAC (incl. batch jobs/cronjobs), regenerated install
bundles and Helm chart, documentation and examples, Go unit tests and a
TestFlows e2e test.

Refs Altinity#1795, Altinity#862. Supersedes the gRPC-plugin approach of Altinity#1798.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Kilian Ries <mail@kilian-ries.de>
discostur added a commit to discostur/clickhouse-operator that referenced this pull request Jun 30, 2026
Introduces operator-managed backup and restore for ClickHouse using
clickhouse-backup, exposed through three new custom resources in the
clickhouse.altinity.com/v1 API group:

- ClickHouseBackup (chb): one-off backup -> Kubernetes Job
- ClickHouseBackupSchedule (chbs): recurring backup -> managed CronJob
- ClickHouseRestore (chr): one-off restore -> Kubernetes Job

The controllers follow the existing ClickHouseKeeper controller-runtime
pattern. clickhouse-backup runs as a sidecar (a documented prerequisite);
the generated jobs trigger it remotely through the system.backup_actions
integration table, so no backup logic is reimplemented in the operator.

Cluster-aware: backs up one replica per shard for Replicated* tables
(AllReplicas opt-in for non-replicated data); on restore it applies the
schema on the first replica per shard via ON CLUSTER (requires the
sidecar's restore_schema_on_cluster) and the data on the first replica,
letting native replication synchronize the remaining replicas.

Restore safety follows the conventions of mature DB operators: preflight
validation (target CHI Completed, topology reachable) and an overwrite
guard that refuses a non-empty target unless overwrite=true.

Also adds: selective (tables/partitions) and incremental
(--diff-from-remote) backups; remote-backup retention (keepLastRemote);
optional post-backup verification; Prometheus metrics on the operator's
existing :9999 endpoint plus Kubernetes Events; and annotation-driven
bootstrap-from-backup for new installations. Compression and encryption
are documented as clickhouse-backup sidecar settings.

Includes the CRDs, RBAC (incl. batch jobs/cronjobs), regenerated install
bundles and Helm chart, documentation and examples, Go unit tests and a
TestFlows e2e test.

Refs Altinity#1795, Altinity#862. Supersedes the gRPC-plugin approach of Altinity#1798.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Kilian Ries <mail@kilian-ries.de>
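
The cluster-aware target selection described in the commit message above (one replica per shard, with AllReplicas as an opt-in) could be sketched roughly as follows. This is not code from the referenced commit: the Shard type, the host names, and the choice of the first replica as the backup target are assumptions made for illustration.

```go
package main

import "fmt"

// Shard lists the replica host names of one shard (hypothetical shape).
type Shard struct {
	Replicas []string
}

// BackupTargets returns the hosts to back up. By default it picks one
// replica per shard (here: the first, as an assumption); with allReplicas
// set it returns every replica, for non-replicated data.
func BackupTargets(shards []Shard, allReplicas bool) []string {
	var targets []string
	for _, s := range shards {
		if len(s.Replicas) == 0 {
			continue // nothing to back up in an empty shard
		}
		if allReplicas {
			targets = append(targets, s.Replicas...)
			continue
		}
		targets = append(targets, s.Replicas[0])
	}
	return targets
}

func main() {
	shards := []Shard{
		{Replicas: []string{"chi-demo-0-0", "chi-demo-0-1"}},
		{Replicas: []string{"chi-demo-1-0", "chi-demo-1-1"}},
	}
	fmt.Println(BackupTargets(shards, false))
	fmt.Println(BackupTargets(shards, true))
}
```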