docs: restructure deployment guide #310
Open
ma-hartma wants to merge 23 commits into main from deployment-guide-restructure
Commits
bef6f46
docs: restructure deployment guide
ma-hartma 8f30ec0
fix links
ma-hartma 7ee8f74
move images
ma-hartma ada4e1a
fix link
ma-hartma c141369
fix link
ma-hartma 1fb7d83
sidebar titles
ma-hartma 8a4954c
remove outdated gardener docs and cleanup kubernetes concepts
ma-hartma e715a4f
move gpu workers to deployment guide
ma-hartma 2afc027
add tip about autonomous control plane
ma-hartma bdb6362
control-plane title
ma-hartma 92c3782
improve initial cluster deployment guide
ma-hartma a2164b5
mention other kclm solutions but focus on gardener
ma-hartma 89fbfdd
review
ma-hartma 921f06c
newest ubuntu images and reference
ma-hartma 1f6dbc5
mention gateway api alternative
ma-hartma d3e1688
seperate concepts from deployment
ma-hartma 97bc638
kamaji warning
ma-hartma 0cd8afd
depth
ma-hartma efd95f8
no ansible not easy
ma-hartma dbff18f
wording
ma-hartma 93f04c8
reference concepts
ma-hartma 258acf8
capi development status
ma-hartma f629c66
remove redundant tip
ma-hartma
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| --- | ||
| slug: /deployment-guide | ||
| title: Guide | ||
| sidebar_position: 1 | ||
| --- | ||
|
|
||
| # Deployment Guide | ||
|
|
||
| metal-stack is a Metal-as-a-Service (MaaS) platform that turns on-premises bare-metal servers into an elastic, self-managed cloud infrastructure. At its core, metal-stack consists of two components: | ||
|
|
||
| - **Control Plane** — The central management layer (APIs, databases, scheduling) | ||
| - **Partitions** — The data center infrastructure (servers, leaf switches, BMCs) | ||
|
|
||
| These two components alone provide a fully functional **Metal-as-a-Service (MaaS)** platform. You can allocate machines, manage networks, configure firewalls, and operate servers via REST/gRPC APIs and the `metalctl` CLI, without any Kubernetes cluster lifecycle management. | ||
|
|
||
| ## Kubernetes Cluster Lifecycle Management | ||
|
|
||
| If you need to provision **Kubernetes clusters** on top of your bare-metal infrastructure, metal-stack integrates with Kubernetes Cluster Lifecycle Management (KCLM) solutions like [Gardener](../../05-Concepts/04-Kubernetes/01-gardener.md) or [Cluster-API](../../05-Concepts/04-Kubernetes/02-cluster-api.md). These are **optional layers** built on top of the MaaS foundation, not prerequisites. | ||
|
|
||
| :::tip | ||
| You can use metal-stack as a pure MaaS platform without any KCLM integration. The KCLM layer is only needed if you want to automate Kubernetes cluster provisioning on your bare-metal infrastructure. | ||
| ::: | ||
|
|
||
| ## Deployment Approaches | ||
|
|
||
| We bootstrap the [metal control plane](../../05-Concepts/01-architecture.mdx#metal-control-plane) as well as our [partitions](../../05-Concepts/01-architecture.mdx#partitions) with [Ansible](https://www.ansible.com/) through CI. | ||
|
|
||
| We recommend using the same Ansible roles that we use to deploy metal-stack in our own environments. They are available in the [metal-roles](https://github.com/metal-stack/metal-roles) repository. | ||
|
|
||
| To simplify dependency management, we provide a dedicated [deployment base image](https://github.com/metal-stack/metal-deployment-base/pkgs/container/metal-deployment-base) on GitHub Container Registry. This Docker image bundles all required tools and libraries, keeping your deployment environment lean and easy to maintain. | ||
|
|
||
| The remainder of this guide assumes you are using our Ansible roles and the deployment base image. Make sure you have [Docker](https://docs.docker.com/get-started/get-docker/) installed. Every deployment environment is unique, so this guide describes how we set up metal-stack in our own infrastructure. You will need to adapt the steps to match your specific requirements. | ||
|
|
||
| :::warning | ||
| You will probably need to learn how to write Ansible playbooks to deploy metal-stack as presented in this documentation. That said, it should be possible to follow these docs even without prior Ansible knowledge. If you need further explanations regarding Ansible, please refer to [docs.ansible.com](https://docs.ansible.com/). | ||
| ::: | ||
|
|
||
| :::info | ||
| If you do not want to use Ansible for deployment, you need to build your own deployment mechanism. You can still learn from our [metal-roles](https://github.com/metal-stack/metal-roles) repository and [Helm charts](https://github.com/metal-stack/helm-charts/), but be aware that the Ansible roles tie everything together (variable management, dependency ordering, environment-specific configurations), so building an equivalent from scratch will require significant effort. | ||
| ::: | ||
|
|
||
| :::tip | ||
| You can use the [mini-lab](https://github.com/metal-stack/mini-lab) as a template project for your own deployment. It uses the same approach as described in this document. | ||
| ::: | ||
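A minimal sketch of how the deployment base image and an Ansible playbook could be wired together with Docker Compose. The image path is derived from the GitHub Container Registry link in the guide. The `DEPLOYMENT_BASE_IMAGE_TAG` variable, the `/deployment` mount point, and the `inventory.yaml` and `deploy.yaml` file names are hypothetical placeholders. The sketch also assumes that `ansible-playbook` is available inside the image.

```yaml
# compose.yaml (sketch, not the project's official setup)
services:
  deploy:
    # Pick a released tag of the deployment base image; the variable name is a placeholder.
    image: ghcr.io/metal-stack/metal-deployment-base:${DEPLOYMENT_BASE_IMAGE_TAG}
    volumes:
      # Mount your playbooks, inventory and role checkouts (hypothetical layout).
      - .:/deployment
    working_dir: /deployment
    # Hypothetical file names; replace them with your own inventory and playbook.
    command: ansible-playbook -i inventory.yaml deploy.yaml
```

With such a file in place, `docker compose run --rm deploy` would run the playbook inside the container, so no Ansible installation is needed on the host.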
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,91 @@ | ||
| --- | ||
| slug: /deployment/initial-cluster | ||
| title: Initial Cluster | ||
| sidebar_position: 2 | ||
| --- | ||
|
|
||
| # Initial Cluster | ||
|
|
||
| An initial Kubernetes cluster is always required for metal-stack deployments using this guide, as the [metal control plane](../../05-Concepts/01-architecture.mdx#metal-control-plane) is deployed on Kubernetes. | ||
|
|
||
| The initial cluster(s) serve as the bootstrap infrastructure for the **metal-stack control plane**. | ||
|
|
||
| The number and placement of initial clusters depend on whether you use a KCLM solution, as well as your availability and autonomy requirements. | ||
|
|
||
| If you only need **Bare-Metal as a Service** (allocating machines, managing networks, configuring firewalls via API) without KCLM, you need at least **one cluster** for the [Control Plane](./03_control-plane.mdx). | ||
|
|
||
| ## KCLM Solutions | ||
|
|
||
| metal-stack supports three Kubernetes Cluster Lifecycle Management solutions. Each has a different maturity level and different capabilities. | ||
| See the [Kubernetes Concepts Section](../../05-Concepts/04-Kubernetes/01-gardener.md) for a detailed view of the solutions. | ||
|
|
||
| ### Gardener (Recommended) | ||
|
|
||
| [Gardener](../../05-Concepts/04-Kubernetes/01-gardener.md) is the **recommended** path for Kubernetes cluster lifecycle management. It has been battle-tested in production for over seven years at financial-sector customers and bundles more day-2 capabilities natively (DNS, backup, audit) than the other solutions. Gardener manages entire clusters as Kubernetes-native resources with a strong separation between platform operators and end users. | ||
|
|
||
| :::tip | ||
| Gardener is the recommended solution for production environments. See the [Gardener concept doc](../../05-Concepts/04-Kubernetes/01-gardener.md) for terminology and architecture details. | ||
| ::: | ||
|
|
||
| ### Cluster-API | ||
|
|
||
| [Cluster-API](../../05-Concepts/04-Kubernetes/02-cluster-api.md) is a CNCF project maintained by a Kubernetes SIG that provides declarative cluster management through a management cluster. The metal-stack provider (CAPMS) is **under development** and not yet production-ready. | ||
|
|
||
| :::warning | ||
| Cluster-API with metal-stack is in development and not advised for production use. Please use Gardener for production workloads. | ||
| ::: | ||
|
|
||
| #### Kamaji | ||
|
|
||
| [Kamaji](../../05-Concepts/04-Kubernetes/02-cluster-api.md#kamaji) offers a control plane hosting model similar to Gardener's, where the control plane runs on dedicated infrastructure separate from the worker nodes. | ||
| Kamaji uses Cluster-API to support different infrastructure and control-plane providers. | ||
| However, Kamaji integrations with metal-stack **have not been evaluated in production-grade scenarios** by metal-stack. | ||
|
|
||
| :::warning | ||
| Kamaji with metal-stack is in development and not advised for production use. Please use Gardener for production workloads. | ||
| ::: | ||
|
|
||
| ## Deployment Options | ||
|
|
||
| There are three supported approaches for hosting the initial cluster: | ||
|
|
||
| ### Option 1: Shared Initial Cluster | ||
|
|
||
| It is possible to use a **single initial cluster** for both metal-stack and the KCLM solution. This approach is technically feasible but **not recommended** for production environments. Sharing a single cluster mixes platform infrastructure with lifecycle management, which can complicate operational boundaries and failure isolation. | ||
|
|
||
| ### Option 2: Dedicated Clusters | ||
|
|
||
| We recommend using **dedicated (initial) clusters** for metal-stack and the KCLM solution — one cluster for the metal-stack control plane and a separate cluster for the KCLM. | ||
|
|
||
| This approach provides clearer operational boundaries, better isolation, and simpler failure domains. | ||
|
|
||
| ### Option 3: Autonomous Control Plane | ||
|
|
||
| For self-hosted deployments, metal-stack can be set up with an [Autonomous Control Plane](/community/MEP-18-autonomous-control-plane) cluster. This approach is the best choice for organizations that require full digital sovereignty and autonomy over their entire infrastructure stack. | ||
|
|
||
| The autonomous control-plane cluster serves as a minimal control plane whose sole purpose is to host the production control plane cluster (the "Matryoshka principle"). | ||
| This brings several advantages, such as failure isolation, separate operational responsibility, minimal resource requirements, and full control and ownership. | ||
|
|
||
| The only requirement from metal-stack is that your partitions can establish network connections to the metal control plane. | ||
|
|
||
| ## Suggestions for the Initial Cluster | ||
|
|
||
| ### For Options 1 & 2: Cloud-Hosted Clusters | ||
|
|
||
| For the shared and dedicated cluster approaches, the initial cluster can be hosted anywhere — a hyperscaler, metalstack.cloud, or any other managed Kubernetes provider. Some common options: | ||
|
|
||
| - **metalstack.cloud** — A Kubernetes cluster can be created via [UI](https://metalstack.cloud/de/documentation/UserManual#creating-a-cluster), CLI, or Terraform. | ||
| - **GCP/GKE** — A GCP account is required. The Ansible [gcp-auth role](https://github.com/metal-stack/ansible-common/tree/master/roles/gcp-auth) can be used for authentication, and the [gcp-create role](https://github.com/metal-stack/ansible-common/tree/master/roles/gcp-create) for creating a GKE cluster. | ||
| - Suggested defaults: `gcp_machine_type`: e2-standard-8, `gcp_autoscaling_min_nodes`: 1, `gcp_autoscaling_max_nodes`: 3 | ||
|
ma-hartma marked this conversation as resolved.
|
||
|
|
||
| :::tip | ||
| For metal-stack, it does not matter where your control plane Kubernetes cluster is located. You can of course use a cluster managed by a hyperscaler. This has the advantage that you do not have to set up Kubernetes yourself, and it can even be beneficial in terms of fail-safe operation. If you are interested, you can find the reasoning behind this deployment decision [here](../../05-Concepts/01-architecture.mdx#target-deployment-platforms). | ||
| ::: | ||
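A minimal sketch of how the suggested GKE defaults above could be passed to the `gcp-auth` and `gcp-create` roles in a playbook. Only the three `gcp_*` variables and the role names come from the guide. The role lookup path (a checkout of `ansible-common` on the Ansible roles path) is an assumption, and the roles will most likely need further variables (for example GCP project and credentials) that are not shown here. Check each role's README for the authoritative list.

```yaml
# Sketch only: role lookup names and any further required variables are assumptions.
- name: Create a GKE cluster to host the metal control plane
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    # Suggested defaults from the guide
    gcp_machine_type: e2-standard-8
    gcp_autoscaling_min_nodes: 1
    gcp_autoscaling_max_nodes: 3
  roles:
    # Assumes the ansible-common repository is available on the Ansible roles path.
    - name: ansible-common/roles/gcp-auth
    - name: ansible-common/roles/gcp-create
```

Keeping the sizing in `vars` (or in group variables) makes it easy to override the defaults per environment without touching the roles themselves.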
|
|
||
| ### For Option 3: Autonomous Control Plane with k3s | ||
|
|
||
| For the autonomous control plane approach, [MEP-18](/community/MEP-18-autonomous-control-plane) proposes using [k3s](https://k3s.io/) as the initial cluster. This is because KCLM solutions are not yet able to create an initial cluster themselves (though this may change with implementations like [GEP-28](https://github.com/gardener/gardener/blob/master/docs/proposals/28-autonomous-shoot-clusters.md) for Gardener). | ||
|
|
||
| The k3s nodes can be either bare-metal machines or virtual machines. For a minimal setup, a single node with 8–16 cores, 64 GB RAM, and two 1 TB NVMe drives is a good starting point. For high availability, a clustered k3s configuration across multiple nodes is recommended, with etcd replication and backup-restore mechanisms configured for metal-stack and KCLM components. | ||
|
|
||
| See the [Autonomous Control Plane](/community/MEP-18-autonomous-control-plane) proposal for detailed architecture, failure scenarios, and implementation guidance. | ||
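For the high-availability variant described above, a k3s server configuration could look roughly like the following. This is a sketch under the assumption that k3s runs with its embedded etcd datastore; MEP-18 may specify a different setup. The snapshot schedule, the retention count, and the host name are illustrative values, not recommendations from metal-stack.

```yaml
# /etc/rancher/k3s/config.yaml on the first server node (sketch)

# Initialize a new cluster with embedded etcd so that further servers can join.
cluster-init: true

# Take an etcd snapshot every six hours (example value).
etcd-snapshot-schedule-cron: "0 */6 * * *"

# Keep the ten most recent snapshots (example value).
etcd-snapshot-retention: 10

# Additional name for the API server certificate.
tls-san:
  - k3s.example.internal  # hypothetical stable address in front of the servers
```

Additional server nodes would join with the `server` and `token` options instead of `cluster-init`. The k3s documentation recommends an odd number of server nodes (three or more) for an embedded etcd cluster so that it can keep quorum.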
Am I missing something here? What can be re-used from the metal-roles repo when not using ansible?
For example the metal-control-plane helm chart is in https://github.com/metal-stack/helm-charts/ not metal-roles.
Made it more explicit, please re-check!