CORS-4508: Add e2e-gcd installer CI job for Google Cloud Dedicated #81238
rochacbruno wants to merge 11 commits into
Add step-registry components and CI job for testing OpenShift installations on Google Cloud Dedicated using Workload Identity Federation.

New step-registry components:
- ipi-conf-gcd-wif-auth ref: sets up WIF auth via AWS identity chain
- ipi-conf-gcd chain: GCP config with WIF auth step
- ipi-gcd-pre chain: full GCD pre-install chain
- openshift-e2e-gcd workflow: end-to-end test workflow for GCD

The auth flow uses AWS identity chaining: ci-ops injects an AWS config file, the step uses the hub profile (trusted by GCD WIF pool), and the WIF credential config stored as gce.json in the cluster profile secret enables the Google SDK to exchange AWS credentials for GCD tokens.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@rochacbruno: This pull request references CORS-4508 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "5.0.0" version, but no target version was set.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-commands.sh`:
- Around line 21-23: The auth set up in gcloud auth login only affects the
current step container, so the next step will still be unauthenticated. Update
the wif-auth command flow to persist the login state needed by the downstream
ipi-conf-gcp consumer using ${SHARED_DIR} rather than relying on the transient
gcloud auth login session; use the existing wif-auth command script and the
shared-step handoff pattern to make the credentials available across steps
without logging sensitive values.
- Around line 5-6: The WIF auth script is hardcoding AWS_CONFIG_FILE and
AWS_PROFILE, which overrides the configured values and ignores future overrides.
Update the wif-auth command script to honor the existing environment variables
instead of re-exporting fixed values, and keep the configuration sourced from
the wif-auth ref and installer wiring. Use the identifiers wif-auth-commands.sh,
AWS_CONFIG_FILE, and AWS_PROFILE to locate the assignments and remove or gate
them so externally provided values win.
⛔ Files ignored due to path filters (1)
- ci-operator/jobs/openshift/installer/openshift-installer-main-presubmits.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (14)
- ci-operator/config/openshift/installer/openshift-installer-main.yaml
- ci-operator/step-registry/ipi/conf/gcd/OWNERS
- ci-operator/step-registry/ipi/conf/gcd/ipi-conf-gcd-chain.metadata.json
- ci-operator/step-registry/ipi/conf/gcd/ipi-conf-gcd-chain.yaml
- ci-operator/step-registry/ipi/conf/gcd/wif-auth/OWNERS
- ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-commands.sh
- ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-ref.metadata.json
- ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-ref.yaml
- ci-operator/step-registry/ipi/gcd/pre/OWNERS
- ci-operator/step-registry/ipi/gcd/pre/ipi-gcd-pre-chain.metadata.json
- ci-operator/step-registry/ipi/gcd/pre/ipi-gcd-pre-chain.yaml
- ci-operator/step-registry/openshift/e2e/gcd/OWNERS
- ci-operator/step-registry/openshift/e2e/gcd/openshift-e2e-gcd-workflow.metadata.json
- ci-operator/step-registry/openshift/e2e/gcd/openshift-e2e-gcd-workflow.yaml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use ${VAR:-default} pattern so env vars from ref/config are honored
- Remove gcloud auth login since it doesn't persist across steps;
each step authenticates independently via GOOGLE_CLOUD_KEYFILE_JSON
pointing to gce.json from the cluster profile mount
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
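The `${VAR:-default}` pattern named in this commit can be sketched as below. This is a minimal illustration, not the actual script: the function name `apply_aws_defaults` is hypothetical, and the default values are assumptions taken from the PR description (the injected config path and the `hub` profile).

```shell
# Hypothetical helper: keep values supplied by the ref or job config,
# and fall back to a default only when a variable is unset or empty.
apply_aws_defaults() {
  # Assumed default: the config path quoted in the PR description.
  export AWS_CONFIG_FILE="${AWS_CONFIG_FILE:-/var/run/secrets/aws/config/config}"
  # Assumed default: the hub profile that the PR says GCD's WIF pool trusts.
  export AWS_PROFILE="${AWS_PROFILE:-hub}"
}

apply_aws_defaults
echo "AWS_CONFIG_FILE=${AWS_CONFIG_FILE}"
echo "AWS_PROFILE=${AWS_PROFILE}"
```

With this shape, a job that sets `AWS_PROFILE` in its `env` keeps that value, while a job that sets nothing still gets a working default.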
GCD only supports private DNS zones, so the cluster must be published internally. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename e2e-gcd to e2e-gcd-techpreview, add FEATURE_SET: TechPreviewNoUpgrade
- Switch from build13 (GCP) to build07 (AWS) for STS role assumption
- Add PUBLISH: Internal (GCD only supports private DNS zones)
- Remove AWS_CONFIG_FILE/AWS_PROFILE env overrides (injected by ci-operator)
- Simplify WIF auth step to validate credential config and declare GOOGLE_CLOUD_UNIVERSE_DOMAIN env var
- Add barbacbd to OWNERS files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The GCD cluster profile uses a WIF credential config (type external_account) instead of a service account key. Update shared GCP scripts to detect the credential type and handle both:
- ipi-conf-gcp-zones: use gcloud auth login --cred-file for WIF, read project ID from cluster profile instead of gce.json
- ipi-install-install: set GOOGLE_APPLICATION_CREDENTIALS for WIF
- ipi-deprovision-deprovision: same as install
- gather-gcp-console: use gcloud auth login --cred-file for WIF

Also restore AWS_PROFILE=hub in the CI config - GCD WIF trusts only the hub account, not the default target account.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
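The project ID lookup described for ipi-conf-gcp-zones could look roughly like the sketch below. The function name `resolve_gcp_project_id` is hypothetical; the `openshift_gcp_project` file name comes from the diff in this PR, and the `.project_id` field is the one normally present in service account key files. It assumes `jq` is available in the step image.

```shell
# Hypothetical helper: print the GCP project ID for either credential type.
#   $1 = path to gce.json, $2 = cluster profile directory
resolve_gcp_project_id() {
  local cred_file="$1" profile_dir="$2" cred_type
  cred_type="$(jq -r '.type // empty' "${cred_file}")" || return 1
  if [ "${cred_type}" = "external_account" ]; then
    # WIF credential configs normally carry no project_id field,
    # so the project is read from the cluster profile instead.
    cat "${profile_dir}/openshift_gcp_project"
  else
    # Service account keys include the project ID in the JSON itself.
    jq -r '.project_id' "${cred_file}"
  fi
}
```

A caller would use it as `GOOGLE_PROJECT_ID="$(resolve_gcp_project_id "${CLUSTER_PROFILE_DIR}/gce.json" "${CLUSTER_PROFILE_DIR}")"`.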
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: rochacbruno
Actionable comments posted: 1
Inline comments:
In
`@ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-commands.sh`:
- Around line 13-17: The credential-type check in the wif-auth command script
reads .type from CRED_FILE with jq, but malformed gce.json causes an uncaught
parser failure under set -e. Update the validation logic around the CRED_TYPE
assignment in the script so jq parsing is handled explicitly first, and emit a
clear step failure message when the credential file is invalid JSON before
checking for external_account.
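One way to address this finding is sketched below, assuming `jq` is available in the step image. The function name `validate_wif_cred` and the distinct return codes are illustrative choices, not taken from the PR. Running `jq` inside an `if` condition keeps a parse failure from aborting the script under `set -e`, so the step can print its own message first.

```shell
# Hypothetical validator for the WIF credential config.
# Return codes (illustrative): 1 = missing file, 2 = unparseable, 3 = wrong type.
validate_wif_cred() {
  local cred_file="$1" cred_type
  if [ ! -f "${cred_file}" ]; then
    echo "ERROR: credential file ${cred_file} not found" >&2
    return 1
  fi
  # Parse inside a conditional so a jq failure is handled here
  # instead of terminating the step under `set -e`.
  if ! cred_type="$(jq -r '.type // empty' "${cred_file}" 2>/dev/null)"; then
    echo "ERROR: ${cred_file} could not be parsed as a JSON object" >&2
    return 2
  fi
  if [ "${cred_type}" != "external_account" ]; then
    echo "ERROR: expected type external_account, got '${cred_type:-none}'" >&2
    return 3
  fi
}
```

Only the credential type is echoed on failure, so nothing sensitive from the file reaches the job log.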
📒 Files selected for processing (13)
- ci-operator/config/openshift/installer/openshift-installer-main.yaml
- ci-operator/step-registry/ipi/conf/gcd/OWNERS
- ci-operator/step-registry/ipi/conf/gcd/ipi-conf-gcd-chain.metadata.json
- ci-operator/step-registry/ipi/conf/gcd/ipi-conf-gcd-chain.yaml
- ci-operator/step-registry/ipi/conf/gcd/wif-auth/OWNERS
- ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-commands.sh
- ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-ref.metadata.json
- ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-ref.yaml
- ci-operator/step-registry/ipi/gcd/OWNERS
- ci-operator/step-registry/ipi/gcd/pre/OWNERS
- ci-operator/step-registry/ipi/gcd/pre/ipi-gcd-pre-chain.metadata.json
- ci-operator/step-registry/openshift/e2e/gcd/OWNERS
- ci-operator/step-registry/openshift/e2e/gcd/openshift-e2e-gcd-workflow.metadata.json
✅ Files skipped from review due to trivial changes (8)
- ci-operator/step-registry/ipi/conf/gcd/OWNERS
- ci-operator/step-registry/ipi/gcd/pre/OWNERS
- ci-operator/step-registry/ipi/conf/gcd/ipi-conf-gcd-chain.metadata.json
- ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-ref.metadata.json
- ci-operator/step-registry/openshift/e2e/gcd/openshift-e2e-gcd-workflow.metadata.json
- ci-operator/step-registry/ipi/gcd/OWNERS
- ci-operator/step-registry/openshift/e2e/gcd/OWNERS
- ci-operator/step-registry/ipi/conf/gcd/wif-auth/OWNERS
🚧 Files skipped from review as they are similar to previous changes (2)
- ci-operator/step-registry/ipi/gcd/pre/ipi-gcd-pre-chain.metadata.json
- ci-operator/step-registry/ipi/conf/gcd/ipi-conf-gcd-chain.yaml
Wrap jq parsing in an explicit check so invalid JSON produces a clear error message instead of a raw parser failure. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Set BASE_DOMAIN to ci.gcd.devcluster.openshift.com for internal DNS records. The public_hosted_zone file must also be added to the vault secret with the same value. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    - ref: ipi-conf-gcp
    - ref: ipi-conf-gcp-zones

Suggested change:

    - ref: ipi-conf-gcp
    - ref: ipi-conf-gcp-zones
    - ref: ipi-conf-gcp-osimage # <--- I think this should work?
We don't have any published OS images in GCD, right? Similar to AWS EUSC, we may define a custom OS image.
On the job config, we can define environment variables (see reference):
COMPUTE_OSIMAGE # OS image for compute nodes
CONTROL_PLANE_OSIMAGE # OS image for control plane nodes
DEFAULT_MACHINE_OSIMAGE # default for both
Good point on the OS image. I'll add ipi-conf-gcp-osimage to the chain. Do you know which OS image name we should use for GCD? The brief mentions rhcos10 but we deferred that.
I guess we, ourselves, need to upload/publish a custom OS image in the GCD project? Not sure how it's done on the GCP side, but for AWS, we have to build an AMI from a VM image (like this).
I am trying with the rhcos10 that is available on GCD
    COMPUTE_NODE_REPLICAS: "2"
    FEATURE_SET: TechPreviewNoUpgrade
    GOOGLE_CLOUD_UNIVERSE_DOMAIN: apis-berlin-build0.goog
    PUBLISH: Internal
For a private cluster, we also need to set up an existing network, a bastion VM, and a proxy because CI pods can't reach the internal API endpoint. We can reference the gcp-private pre-submit job 👇
It may look something like commit tthvo@da9ed32, allowing overriding the OS image and instance type. WDYT?
release/ci-operator/config/openshift/installer/openshift-installer-main.yaml
Lines 1444 to 1462 in 58781bc
Thanks. I am making changes based on your reference commit
Can you push that as a suggestion or co-author a commit?
Ohh yes, it should be available in my fork https://github.com/tthvo/release at branch gcd. I committed on the top of yours so you can cherry-pick without issue.
Applied your reference commit - added VPC/bastion/proxy to ipi-gcd-pre chain, teardown in post, WIF auth in all provision/deprovision scripts, and configurable bastion image/machine type. Also restored AWS_PROFILE=hub.
          OS_IMAGE_STREAM: rhel-10
        workflow: openshift-e2e-aws
    - always_run: false
      as: e2e-gcd-techpreview
IIRC, once we finalize the job config, we need to copy it to the 5.0, 5.1 and 4.23 release branches too, right? For example, gcp-private has a 5.0 variant:
TBH, it's never clear to me which branches we need to copy to. In general I think config on an existing branch will be carried over to the next branch on branching day. But again, not sure, so it might be good to lock this down.
In general I think config on an existing branch will be carried over to the next branch on branching day
☝️ Right, I notice the config on -main.yaml is the one that will be copied over to next branch.
Right, the config on main will be copied to the next branch on branching day. We can add it to existing release branches once the job is validated.
/pj-rehearse pull-ci-openshift-installer-main-e2e-gcd-techpreview
@patrickdillon: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
For private GCD clusters, add VPC provisioning, bastion host, and proxy to the pre-install chain, with teardown in post steps. Based on tthvo's reference commit.
- Add WIF external_account auth to gcp-provision-vpc, gcp-deprovision-vpc, gcp-provision-bastionhost, gcp-deprovision-bastionhost scripts
- Make bastion machine type and image configurable via env vars (BASTION_MACHINE_TYPE, BASTION_IMAGE_NAME, BASTION_IMAGE_PROJECT)
- Update ipi-gcd-pre chain with VPC, bastion, and proxy steps
- Update openshift-e2e-gcd workflow with bastion and VPC teardown
- Restore AWS_PROFILE: hub in CI config (required for GCD WIF)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add ipi-conf-gcp-osimage step to the GCD chain and set DEFAULT_MACHINE_OSIMAGE to rhcos10, which is the RHCOS image published to GCD. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/pj-rehearse pull-ci-openshift-installer-main-e2e-gcd-techpreview
@rochacbruno: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
    CRED_TYPE=$(jq -r .type "${GCP_SHARED_CREDENTIALS_FILE}")
    if [[ "${CRED_TYPE}" == "external_account" ]]; then
      GOOGLE_PROJECT_ID="$(< ${CLUSTER_PROFILE_DIR}/openshift_gcp_project)"
      gcloud auth login --cred-file="${GCP_SHARED_CREDENTIALS_FILE}" --quiet
Hmm, gcloud is contacting AWS EC2 IMDS endpoint, but it shouldn't be 🤔
(gcloud.config.set) There was a problem refreshing your current auth tokens:
HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with
url: /latest/meta-data/placement/availability-zone (Caused by
NewConnectionError('<urllib3.connection.HTTPConnection object at
0x7f12db7b5198>: Failed to establish a new connection: [Errno 111] Connection refused',))
I'm not sure what info gcloud needs that it must query IMDS metadata. Maybe, we can try providing as many details as possible:
- Providing AWS region:
    export AWS_REGION="$LEASED_RESOURCE" # Getting from Boskos lease
- Assume the IAM role via AWS CLI and directly provide the env vars:
    export AWS_PROFILE=hub # Already set in job
    eval $(aws configure export-credentials --format env)
I think we are missing DPTP adding the home_role_arn to build07's aws-sts-cluster-config. Once STS is activated, ci-operator will inject the AWS credentials, and the Google SDK will be able to use them instead of trying to hit the EC2 metadata endpoint.
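While waiting on that, a pre-flight check along these lines might make the failure mode easier to spot. It is a sketch under an assumption: that the Google auth libraries read AWS credentials and region from the standard AWS environment variables before falling back to the EC2 metadata endpoint. The function name `missing_aws_env` is hypothetical.

```shell
# Hypothetical pre-flight check: report which AWS variables are absent.
# Assumption: when these are set, the Google SDK uses them rather than
# querying the EC2 metadata endpoint (169.254.169.254).
missing_aws_env() {
  local missing=""
  [ -n "${AWS_ACCESS_KEY_ID:-}" ] || missing="${missing} AWS_ACCESS_KEY_ID"
  [ -n "${AWS_SECRET_ACCESS_KEY:-}" ] || missing="${missing} AWS_SECRET_ACCESS_KEY"
  # Either region variable is treated as sufficient here.
  if [ -z "${AWS_REGION:-}" ] && [ -z "${AWS_DEFAULT_REGION:-}" ]; then
    missing="${missing} AWS_REGION"
  fi
  if [ -n "${missing}" ]; then
    # Only variable names are printed, never their values.
    echo "missing:${missing}"
    return 1
  fi
}
```

The failing request in the log above is for `placement/availability-zone`, which would be consistent with the region not being available from the environment, though that reading is a guess rather than a confirmed diagnosis.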
Ooh, cool! Thanks for the details! I guess we'll wait a bit then 😁
barbacbd left a comment
@rochacbruno I think we are missing documentation for WIF Setup
Issue: While the PR mentions cluster-secrets-gcd must be populated, there's no documentation about:
- Exact format of the gce.json credential file
- How to generate WIF credentials for GCD
- What the hub role ARN is and how to obtain it
Recommendation: Add a README covering:
- Prerequisites for running GCD jobs
- WIF identity chain setup process
- Example credential format
- Troubleshooting guide for auth failures
It is possible that we do not need this, but the process should be documented for the installer team or others.
    if [ -f "${SHARED_DIR}/gcp_min_permissions.json" ]; then
    if [[ "$(jq -r .type ${GOOGLE_CLOUD_KEYFILE_JSON} 2>/dev/null)" == "external_account" ]]; then
      echo "$(date -u --rfc-3339=seconds) - Using WIF external_account credentials for GCD..."
      export GOOGLE_APPLICATION_CREDENTIALS="${GOOGLE_CLOUD_KEYFILE_JSON}"

    # Validate universe domain is set for sovereign cloud
    if [[ -z "${GOOGLE_CLOUD_UNIVERSE_DOMAIN:-}" ]]; then
      echo "WARNING: GOOGLE_CLOUD_UNIVERSE_DOMAIN not set for WIF credentials"
    fi
The install script sets GOOGLE_APPLICATION_CREDENTIALS for WIF auth but doesn't validate GOOGLE_CLOUD_UNIVERSE_DOMAIN. While it's set in the job config, adding validation here would catch configuration errors earlier.
GOOGLE_CLOUD_UNIVERSE_DOMAIN validation is not needed here - the universe domain will soon be discovered automatically by the SDK based on the project ID format (eu0: prefix). Patrick mentioned this recently as part of the installer's sovereign cloud detection. Adding validation for something that's going away would just be churn.
    if ! gcloud auth list | grep -E "\*\s+${sa_email}"
    then
      gcloud auth activate-service-account --key-file="${GCP_SHARED_CREDENTIALS_FILE}"
      CRED_TYPE=$(jq -r .type "${GCP_SHARED_CREDENTIALS_FILE}" 2>/dev/null)
I think there is some common gcloud auth logic. Can we make a common file?
Agreed - the WIF external_account detection pattern is repeated across several scripts now. We'll extract it into a shared helper as a follow-up once we confirm the current implementation works end-to-end.
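A shared helper of the kind discussed here might look like the sketch below. The function name `gcp_auth` and the `GCLOUD` override variable are hypothetical; the two `gcloud` invocations are the ones already used in the scripts touched by this PR. It assumes `jq` is available.

```shell
# Hypothetical shared helper: authenticate gcloud for either credential type.
# GCLOUD may be overridden (for example with a stub) when testing.
gcp_auth() {
  local cred_file="$1" gcloud_bin="${GCLOUD:-gcloud}" cred_type
  if ! cred_type="$(jq -r '.type // empty' "${cred_file}" 2>/dev/null)"; then
    echo "ERROR: cannot parse ${cred_file}" >&2
    return 1
  fi
  case "${cred_type}" in
    external_account)
      # WIF credential config (GCD)
      "${gcloud_bin}" auth login --cred-file="${cred_file}" --quiet
      ;;
    service_account)
      # Classic service account key
      "${gcloud_bin}" auth activate-service-account --key-file="${cred_file}"
      ;;
    *)
      echo "ERROR: unsupported credential type '${cred_type}'" >&2
      return 1
      ;;
  esac
}
```

Each script would then replace its inline detection with a single `gcp_auth "${GCP_SHARED_CREDENTIALS_FILE}"` call, which keeps the type check and error messages in one place.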
Document the WIF authentication flow, cluster profile secret format, credential config generation, environment variables, and troubleshooting guide. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@barbacbd Added a README at
@rochacbruno: The following test failed, say
[REHEARSALNOTIFIER]
A total of 34900 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs.
Summary
- `ipi-conf-gcd-wif-auth` step-registry ref that sets up WIF authentication via AWS identity chain
- `ipi-conf-gcd` chain, `ipi-gcd-pre` chain, and `openshift-e2e-gcd` workflow
- `e2e-gcd` test to `openshift/installer` CI config, pinned to `build13`, optional, never auto-run

Auth flow
ci-ops injects an AWS config file at `/var/run/secrets/aws/config/config`. The step sets `AWS_PROFILE=hub` to use the hub account (320297955214) that GCD's WIF pool trusts. The WIF credential config is stored as `gce.json` in the cluster profile secret so the existing install/deprovision steps pick it up automatically via `GOOGLE_CLOUD_KEYFILE_JSON`.

Companion PRs
- `ClusterProfileGCD` (merged)
- `gcd` cluster profile, boskos, secrets (merged)

Test plan
- `cluster-secrets-gcd` populated with WIF credential config as `gce.json`
- `/test e2e-gcd` on an openshift/installer PR

Generated with Claude Code
Summary by CodeRabbit
This PR updates openshift/installer CI configuration in `ci-operator/config/openshift/installer/openshift-installer-main.yaml` to add a new optional GCD E2E installer job named `e2e-gcd-techpreview`. The job does not run automatically (`always_run: false`, `optional: true`), targets cluster `build07`, uses the `gcd` cluster profile, and passes TechPreview settings plus GCD-specific env (`COMPUTE_NODE_REPLICAS=2`, `FEATURE_SET=TechPreviewNoUpgrade`, `GOOGLE_CLOUD_UNIVERSE_DOMAIN=apis-berlin-build0.goog`, `PUBLISH=Internal`). It executes the new `openshift-e2e-gcd` workflow.

To support that workflow, the PR adds new step-registry chains and components:
- `ipi-conf-gcd`: builds the GCD `install-config.yaml` by composing `ipi-conf`, `ipi-conf-telemetry`, `ipi-conf-gcd-wif-auth`, then `ipi-conf-gcp` + `ipi-conf-gcp-zones`, and `ipi-install-monitoringpvc`.
- `ipi-conf-gcd-wif-auth`: validates the presence and format of the WIF credential at `${CLUSTER_PROFILE_DIR}/gce.json`, ensuring it is a `{"type": "external_account", ...}` credential (and fails fast if missing/incorrect). It also sets a default universe domain.
- `ipi-gcd-pre`: chains `ipi-conf-gcd`, `rhcos-conf-osstream`, and `ipi-install`.

It also introduces a new E2E workflow `ci-operator/step-registry/openshift/e2e/gcd/openshift-e2e-gcd-workflow.yaml` (`openshift-e2e-gcd`) that runs:
- `ipi-gcd-pre`
- `openshift-e2e-test`
- `gather-core-dump` and `ipi-gcp-post` (best-effort post steps enabled)

Finally, it wires ownership metadata and approval routing for the new GCD step-registry/workflow components (adding/updating approvers in the relevant `OWNERS` and `*.metadata.json` files).