Skip to content

OSAC-1323: Update cluster-fulfillment.md to match current codebase#32

Open
tzvatot wants to merge 5 commits into
osac-project:mainfrom
tzvatot:feat/OSAC-1323
Open

OSAC-1323: Update cluster-fulfillment.md to match current codebase#32
tzvatot wants to merge 5 commits into
osac-project:mainfrom
tzvatot:feat/OSAC-1323

Conversation

@tzvatot

@tzvatot tzvatot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

OSAC-1323: Update cluster-fulfillment.md to match current codebase

Jira: https://redhat.atlassian.net/browse/OSAC-1323

Summary

The cluster fulfillment architecture doc had significant gaps compared to the current codebase: missing documentation for catalog items and the public API, incorrect status condition names, outdated component names (cloudkit -> osac), and wrong field/path references. This PR brings the document in line with the current state of fulfillment-service, osac-operator, and osac-aap.

Changes

New content:

  • Document cluster catalog items (field definitions, publication controls, tenant scoping, relationship to templates)
  • Document public API operations including GetKubeconfig and GetPassword with REST endpoints

Corrections:

  • Fix status conditions from ClusterAvailable/NodesReady/InfrastructureReady/TemplateApplied to actual types (PROGRESSING/READY/FAILED/DEGRADED for API, Accepted/Progressing/ControlPlaneAvailable/Available for CRD)
  • Fix proto paths to include osac/ directory level
  • Replace cloudkit-controller with osac-operator, cloudkit-aap with osac-aap
  • Fix host_class to host_type, clarify node_sets uses map keys not a name field
  • Add Signal RPC to private API operations
  • Fix playbook names and env var references
  • Update ClusterOrder CRD spec/status fields to match current code (including pullSecret, sshPublicKey, releaseImage, network, jobs history, desiredConfigVersion)

Testing

Documentation-only change. All referenced file paths verified to exist in the current codebase.

Acceptance Criteria

  • Document catalog items (field definitions, publication, tenant scoping)
  • Document public API including GetKubeconfig and GetPassword
  • Fix status condition names to match codebase
  • Fix proto paths to include osac/ directory level
  • Replace cloudkit-controller with osac-operator
  • Replace cloudkit-aap with osac-aap
  • Fix host_class to host_type
  • Fix node_sets name field (is map key, not a field)
  • Add Signal RPC to API operations
  • Fix playbook name references

Summary by CodeRabbit

  • Documentation
    • Updated cluster API specifications with new operations and request/status fields, including kubeconfig and password endpoints.
    • Revised ClusterOrder resource structure with expanded configuration options (pull secrets, SSH keys, release images, networking).
    • Added Cluster Catalog Items documentation defining catalog structure, field definitions, and API surfaces.
    • Updated fulfillment service status conditions and cluster order status conditions.

tzvatot added 5 commits June 8, 2026 19:56
- Replace cloudkit-controller references with osac-operator
- Fix proto paths to include osac/ directory level
- Fix host_class to host_type
- Fix node_sets: name is a map key, not a field
- Fix playbook name from playbook_cloudkit_ to playbook_osac_
- Fix CLOUDKIT_TEMPLATE_COLLECTIONS to osac_template_collections
- Fix cloudkit.service.metallb_ingress to osac.service.metallb_ingress
- Add Signal RPC to private API operations list

Assisted-by: Claude Code <noreply@anthropic.com>
Signed-off-by: Elad Tabak <etabak@redhat.com>
Replace incorrect ClusterAvailable/NodesReady/InfrastructureReady/
TemplateApplied with actual condition types from the codebase:
- ClusterOrder CRD: Accepted, Progressing, ControlPlaneAvailable, Available
- Fulfillment Service API: PROGRESSING, READY, FAILED, DEGRADED

Assisted-by: Claude Code <noreply@anthropic.com>
Signed-off-by: Elad Tabak <etabak@redhat.com>
Add new section documenting cluster catalog items, the primary
user-facing abstraction for ordering clusters. Covers field
definitions, publication controls, tenant scoping, and the
relationship between catalog items and underlying templates.

Assisted-by: Claude Code <noreply@anthropic.com>
Signed-off-by: Elad Tabak <etabak@redhat.com>
…illment.md

Add public API section documenting GetKubeconfig, GetPassword, and
other tenant-facing operations with their REST endpoints. Fix cluster
status to distinguish state (3 values) from conditions (4 types),
add node_sets to status fields, and mark hub as private API only.

Assisted-by: Claude Code <noreply@anthropic.com>
Signed-off-by: Elad Tabak <etabak@redhat.com>
…nt.md

Update spec fields to match current code: use JSON tag casing,
document templateParameters as JSON string, nodeRequests with
resourceClass/numberOfNodes, and add pullSecret, sshPublicKey,
releaseImage, and network fields. Add status section documenting
phase, conditions, clusterReference, jobs history, and
desiredConfigVersion.

Assisted-by: Claude Code <noreply@anthropic.com>
Signed-off-by: Elad Tabak <etabak@redhat.com>
@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown

Review Change Stack

Walkthrough

This PR updates architecture/cluster-fulfillment.md to document OSAC's current cluster-fulfillment API and orchestration model, replacing outdated cloudkit references. Key changes include revised private/public API operations, updated ClusterOrder CRD specification, new catalog item abstraction documentation, and corrected status conditions and infrastructure integration details.

Changes

Cluster-Fulfillment Documentation Alignment

Layer / File(s) Summary
Private and public cluster API operations and types
architecture/cluster-fulfillment.md
Private/public cluster service API operations now document Create/Get/List/Update/Delete plus gRPC-only Signal, updated cluster request fields using template_parameters and node_sets with host_type/size, and public endpoints for GetKubeconfig/GetPassword with HTTP routes and content types.
ClusterOrder CRD contract and fields
architecture/cluster-fulfillment.md
ClusterOrder CRD specification updated from cloudkit to osac-operator contract with renamed fields (templateID, templateParameters, nodeRequests using resourceClass/numberOfNodes), expanded optional spec (pull secrets, SSH key, release image, networking), and new status structures (jobs, desired config version, updated condition names).
Cluster catalog items and field definitions
architecture/cluster-fulfillment.md
New "Cluster Catalog Items" section documents the catalog item abstraction, including proto-backed field definitions with validation schema, private CRUD vs public read-only API surfaces, and the relationship between catalog items and underlying templates.
Status conditions, workflow, and infrastructure integration
architecture/cluster-fulfillment.md
Status conditions enumeration replaced with explicit ClusterOrder CRD conditions (Accepted/Progressing/ControlPlaneAvailable/Available) and fulfillment-service condition types (PROGRESSING/READY/FAILED/DEGRADED). Workflow documentation updated with OSAC playbook path (osac-aap/playbook_osac_create_hosted_cluster.yml) and osac_template_collections variable. MetalLB ingress role updated to osac.service.metallb_ingress.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Risk Assessment: ⚠️ Low severity, potential medium impact. This is documentation-only with no code changes. However, the documentation describes API contracts and CRD specifications that downstream implementations must follow. Risk factors:

  • Incorrect API specification documentation could lead to implementation errors in services consuming these APIs.
  • Outdated or ambiguous CRD field documentation may cause confusion during controller updates or cluster provisioning.
  • Missing or unclear condition type documentation could result in improper status monitoring and operational observability gaps.

Mitigation: Verify that actual proto files, CRD types in osac-operator, and orchestration playbooks match these documented specifications before merging.

Possibly related PRs

  • osac-project/docs#31: Aligns documented OSAC controller/orchestration flow by replacing EDA webhook-based descriptions with AAP job-template approach.

Poem

🏗️ Architecture in prose, now aligned with code,
API contracts and schemas find their abode,
OSAC's blueprint crystalline and clear,
Where catalog items and conditions appear,
Documentation guides the path to orchestration's cheer. 🎯

🚥 Pre-merge checks | ✅ 11
✅ Passed checks (11 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and clearly summarizes the main change: updating the cluster-fulfillment documentation to reflect the current codebase. It is specific, concise, and accurately represents the primary objective of the changeset.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
No-Hardcoded-Secrets ✅ Passed No hardcoded secrets found. File is pure markdown documentation with no code blocks, embedded credentials, API keys, tokens, passwords, or base64-encoded secrets.
No-Weak-Crypto ✅ Passed No weak cryptographic algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-time comparisons found in the documentation changes.
No-Injection-Vectors ✅ Passed PR is documentation with one shell script utility; no SQL concat, eval/exec, pickle, yaml.load, or injection patterns found in any modified files.
Container-Privileges ✅ Passed PR modifies only architecture/cluster-fulfillment.md—a markdown documentation file with no container manifests, security contexts, privileged configurations, or infrastructure code.
No-Sensitive-Data-In-Logs ✅ Passed Documentation-only PR. No code logging sensitive data introduced, no code snippets present, and docs don't prescribe logging passwords, tokens, API keys, or PII.
Ai-Attribution ✅ Passed Commit 16c0831 (the primary commit in this PR) uses proper Red Hat attribution with "Assisted-by: Claude Code noreply@anthropic.com" trailer; no Co-Authored-By misuse for AI tools detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@hpdempsey

Copy link
Copy Markdown

The reason these docs are out of sync is because many features have been added without appropriate enhancement requests. For a trivial example, there has been no discussion or agreement on the catalog feature carried out with the public OSAC team. This review should be done before document updates, not after.

@tzvatot

tzvatot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

many features have been added without appropriate enhancement requests

The catalog items EP exists (enhancement-proposals#17) and was merged. The doc update simply brings the architecture doc in line with what's already implemented and merged.

@hpdempsey

Copy link
Copy Markdown

I realize that an enhancement proposal was created for the catalog. However, there was no open discussion and review of this as part of the public project before it was merged. Open items such as testing were not resolved, which would certainly be unacceptable to the MOC. The fact that it was merged doesn't justify updating the architecture document that is public without discussion with at least the MOC people who are participating in the public deployment of OSAC.

@tzvatot

tzvatot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

I realize that an enhancement proposal was created for the catalog. However, there was no open discussion and review of this as part of the public project before it was merged. Open items such as testing were not resolved, which would certainly be unacceptable to the MOC. The fact that it was merged doesn't justify updating the architecture document that is public without discussion with at least the MOC people who are participating in the public deployment of OSAC.

Thanks for the feedback, Heidi.

The catalog items enhancement proposal (enhancement-proposals#17) was open for over 3 months before being merged. AFAIK we don't hold a dedicated meeting or discussion for every EP - the PR review process is the discussion. That was the window to raise concerns, request changes, or flag gaps in the design.

That said, if there are specific gaps or concerns you'd like to address now, the best path forward would be to either open a new enhancement proposal with the proposed changes, or create a Jira issue so we can discuss the details and plan accordingly. Happy to collaborate on either.

Regarding this PR specifically - it's a doc update to bring the architecture document in line with what's already been implemented and merged. Keeping the docs out of sync doesn't help anyone.

@vladikr

vladikr commented Jun 9, 2026

Copy link
Copy Markdown

Thanks @tzvatot
/lgtm

@hpdempsey

Copy link
Copy Markdown

Thanks for your collaboration. We are working on Jira issues for all of our MOC requirements.

@tzvatot tzvatot marked this pull request as ready for review June 10, 2026 06:54

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
architecture/cluster-fulfillment.md (1)

160-172: ⚠️ Potential issue | 🟠 Major

Fix stale Ansible playbook/variable references in cluster fulfillment docs (will break provisioning).

In architecture/cluster-fulfillment.md (lines 160-172), the doc references osac-aap/playbook_osac_create_hosted_cluster.yml and the osac_template_collections variable, but this repository does not contain the referenced playbook file and osac_template_collections does not appear in any *.yml/*.yaml.

  • Risk (major): operators following these instructions may execute a non-existent playbook / miss required variable wiring, causing hosted cluster creation to fail and extending downtime.
  • Update the playbook path to the correct location (or link the correct repository) and ensure osac_template_collections is defined/used consistently with the referenced templates.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@architecture/cluster-fulfillment.md` around lines 160 - 172, The doc
references a non-existent playbook and an undefined variable: update the Cluster
Creation Playbook reference and variable usage so operators can actually run
provisioning; specifically replace or correct the playbook path
`osac-aap/playbook_osac_create_hosted_cluster.yml` with the real playbook
location (or add a link to the external repo that contains it) and ensure the
variable `osac_template_collections` is defined and consistently named where
templates are included (the section describing Template Execution and any
dynamic include statements that reference the template collection), or rename
references to match the actual variable used in the codebase so the docs and
playbook variable wiring align.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@architecture/cluster-fulfillment.md`:
- Line 228: The doc currently defines validation_schema but doesn't state
where/when it's enforced; update the cluster-fulfillment.md entry for the
validation_schema field to explicitly state the enforcement point(s) (e.g.,
validated at API ingress via admission webhook, validated again during
controller reconciliation, and/or validated in the CLI before submit), describe
the security boundary (server-side vs client-side), and note any differences in
behavior (errors returned at request time vs reconciliation/failure states) so
integrators know whether client validation is sufficient and what to expect when
constraints are violated.
- Around line 60-61: Add a new "Security" subsection documenting authentication,
authorization, and audit requirements for the credential endpoints
(GetKubeconfig, GetPassword and their HTTP variants GetKubeconfigViaHttp,
GetPasswordViaHttp): specify supported authentication mechanisms (bearer token
and/or mTLS), require strong token validation and session expiry, define an
authorization model (RBAC roles and tenant-scoped checks ensuring only cluster
owners/admins can call these methods), mandate audit logging of all calls
including caller identity, timestamp, cluster ID and outcome, and recommend
additional controls (MFA for UI access, IP allowlists, least-privilege roles,
and short-lived credentials) plus examples of required HTTP headers and error
responses for unauthorized/forbidden access.
- Around line 35-41: The documentation references a non-existent proto file and
an unverified "Signal" RPC in the Private API Operations list; update the
"Private API Operations" section to remove or mark the Signal RPC as unverified
and either (a) point to the correct proto file if it exists elsewhere or (b)
state that Signal is deprecated/unsupported until the proto
(clusters_service.proto) is located and its RPCs confirmed; specifically edit
the "Private API Operations" list (the entry mentioning
`fulfillment-service/proto/private/osac/private/v1/clusters_service.proto` and
the `Signal` RPC) to reflect the verified contract or remove the misleading
claim so clients/servers aren’t misled.
- Around line 52-63: The document references a missing proto
(fulfillment-service/proto/public/osac/public/v1/clusters_service.proto) and
lists RPCs like GetKubeconfig, GetKubeconfigViaHttp, GetPassword,
GetPasswordViaHttp and HTTP mappings
(/api/fulfillment/v1/clusters/{id}/kubeconfig and /password) that can’t be
validated; fix by adding the authoritative clusters_service.proto into the repo
at that path (or update the document to the correct existing proto path), ensure
the proto contains the listed RPCs and options (including any HTTP annotations
and response types), and then update architecture/cluster-fulfillment.md to
exactly match the proto message and HTTP binding names (List, Get, Create,
Update, Delete, optimistic lock field) so the documented secret-bearing
endpoints and authorization mappings are identical to the source.

---

Outside diff comments:
In `@architecture/cluster-fulfillment.md`:
- Around line 160-172: The doc references a non-existent playbook and an
undefined variable: update the Cluster Creation Playbook reference and variable
usage so operators can actually run provisioning; specifically replace or
correct the playbook path `osac-aap/playbook_osac_create_hosted_cluster.yml`
with the real playbook location (or add a link to the external repo that
contains it) and ensure the variable `osac_template_collections` is defined and
consistently named where templates are included (the section describing Template
Execution and any dynamic include statements that reference the template
collection), or rename references to match the actual variable used in the
codebase so the docs and playbook variable wiring align.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: osac-project/coderabbit/.coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 49edf2d5-a9d0-405c-9f0e-48edd7fdc1a6

📥 Commits

Reviewing files that changed from the base of the PR and between c747211 and 16c0831.

📒 Files selected for processing (1)
  • architecture/cluster-fulfillment.md

Comment thread architecture/cluster-fulfillment.md
Comment thread architecture/cluster-fulfillment.md
Comment thread architecture/cluster-fulfillment.md
Comment thread architecture/cluster-fulfillment.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants