
CORS-4508: Add e2e-gcd installer CI job for Google Cloud Dedicated#81238

Open
rochacbruno wants to merge 11 commits into
openshift:mainfrom
rochacbruno:CORS-4508/add-gcd-installer-ci-job

Conversation

@rochacbruno

@rochacbruno rochacbruno commented Jun 29, 2026

Member

Summary

  • Add ipi-conf-gcd-wif-auth step-registry ref that sets up WIF authentication via AWS identity chain
  • Add ipi-conf-gcd chain, ipi-gcd-pre chain, and openshift-e2e-gcd workflow
  • Add e2e-gcd test to openshift/installer CI config, pinned to build13, optional, never auto-run

Auth flow

ci-ops injects an AWS config file at /var/run/secrets/aws/config/config. The step sets AWS_PROFILE=hub to use the hub account (320297955214) that GCD's WIF pool trusts. The WIF credential config is stored as gce.json in the cluster profile secret so the existing install/deprovision steps pick it up automatically via GOOGLE_CLOUD_KEYFILE_JSON.
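A step could run a quick preflight on this flow before contacting GCD. Below is a minimal sketch. It assumes the injected file uses the standard AWS CLI config format, in which named profiles appear as `[profile NAME]` sections. The helper name `has_aws_profile` is hypothetical. The path and the `hub` profile name come from the paragraph above.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the AWS config file defines the named profile.
# In the AWS CLI config file, named profiles use "[profile NAME]" headers,
# while the default profile is written as "[default]".
has_aws_profile() {
  local config_file="$1" profile="$2"
  [[ -r "${config_file}" ]] || return 1
  if [[ "${profile}" == "default" ]]; then
    grep -Eq '^[[:space:]]*\[default\][[:space:]]*$' "${config_file}"
  else
    grep -Eq "^[[:space:]]*\[profile[[:space:]]+${profile}\][[:space:]]*$" "${config_file}"
  fi
}
```

A step could then fail fast with `has_aws_profile "${AWS_CONFIG_FILE:-/var/run/secrets/aws/config/config}" "${AWS_PROFILE:-hub}" || exit 1`. Without that check, a missing profile would only surface later as an opaque token-exchange error.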

Companion PRs

Test plan

  • CI checks pass on this PR
  • Vault secret cluster-secrets-gcd populated with WIF credential config as gce.json
  • /test e2e-gcd on an openshift/installer PR

Generated with Claude Code

Summary by CodeRabbit

This PR updates the openshift/installer CI configuration in ci-operator/config/openshift/installer/openshift-installer-main.yaml to add a new optional GCD E2E installer job named e2e-gcd-techpreview. The job does not run automatically (always_run: false, optional: true), targets cluster build07, uses the gcd cluster profile, and passes TechPreview settings plus GCD-specific env (COMPUTE_NODE_REPLICAS=2, FEATURE_SET=TechPreviewNoUpgrade, GOOGLE_CLOUD_UNIVERSE_DOMAIN=apis-berlin-build0.goog, PUBLISH=Internal). It executes the new openshift-e2e-gcd workflow.

To support that workflow, the PR adds new step-registry chains and components:

  • ipi-conf-gcd: builds the GCD install-config.yaml by composing ipi-conf, ipi-conf-telemetry, ipi-conf-gcd-wif-auth, then ipi-conf-gcp + ipi-conf-gcp-zones, and ipi-install-monitoringpvc.
  • ipi-conf-gcd-wif-auth: validates the presence and format of the WIF credential at ${CLUSTER_PROFILE_DIR}/gce.json, ensuring it is a {"type": "external_account", ...} credential (and fails fast if missing/incorrect). It also sets a default universe domain.
  • ipi-gcd-pre: chains ipi-conf-gcd, rhcos-conf-osstream, and ipi-install.
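Below is a sketch of the fail-fast validation described for ipi-conf-gcd-wif-auth. It is an illustration, not the step's actual script. The function name `validate_wif_credential` is hypothetical, and the sketch assumes `jq` is available in the step image, as the script snippets later in this thread also assume. The JSON is parsed explicitly, so malformed input produces a clear message instead of a raw parser failure. The universe-domain default is left out because its value is not shown here.

```shell
#!/bin/bash
# Hypothetical sketch of a fail-fast check for the WIF credential config.
validate_wif_credential() {
  local cred_file="$1" cred_type
  if [[ ! -f "${cred_file}" ]]; then
    echo "ERROR: credential file not found: ${cred_file}" >&2
    return 1
  fi
  # jq -e exits non-zero for invalid JSON and for a missing/null .type field,
  # so both cases get a readable error instead of a raw jq parse failure.
  if ! cred_type="$(jq -er '.type' "${cred_file}" 2>/dev/null)"; then
    echo "ERROR: ${cred_file} is not valid JSON or has no .type field" >&2
    return 1
  fi
  if [[ "${cred_type}" != "external_account" ]]; then
    echo "ERROR: expected credential type external_account, got: ${cred_type}" >&2
    return 1
  fi
  # Only the path and the type are printed, never the file contents.
  echo "Found external_account credential config at ${cred_file}"
}
```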

It also introduces a new E2E workflow ci-operator/step-registry/openshift/e2e/gcd/openshift-e2e-gcd-workflow.yaml (openshift-e2e-gcd) that runs:

  • pre: ipi-gcd-pre
  • test: openshift-e2e-test
  • post: gather-core-dump and ipi-gcp-post (best-effort post steps enabled)

Finally, it wires ownership metadata and approval routing for the new GCD step-registry/workflow components (adding/updating approvers in the relevant OWNERS and *.metadata.json files).

Add step-registry components and CI job for testing OpenShift
installations on Google Cloud Dedicated using Workload Identity
Federation.

New step-registry components:
- ipi-conf-gcd-wif-auth ref: sets up WIF auth via AWS identity chain
- ipi-conf-gcd chain: GCP config with WIF auth step
- ipi-gcd-pre chain: full GCD pre-install chain
- openshift-e2e-gcd workflow: end-to-end test workflow for GCD

The auth flow uses AWS identity chaining: ci-ops injects an AWS config
file, the step uses the hub profile (trusted by GCD WIF pool), and the
WIF credential config stored as gce.json in the cluster profile secret
enables the Google SDK to exchange AWS credentials for GCD tokens.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 29, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 29, 2026

Contributor

@rochacbruno: This pull request references CORS-4508 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "5.0.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds GCD CI step-registry entries for WIF authentication validation, chained pre-configuration, an openshift-e2e-gcd workflow, and an optional installer job using the GCD cluster profile.

Changes

GCD E2E Workflow with WIF Auth

  • WIF auth step
    Files: ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-commands.sh, ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-ref.yaml, ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-ref.metadata.json, ci-operator/step-registry/ipi/conf/gcd/wif-auth/OWNERS
    Summary: Adds the GCD WIF auth validation script, step reference, metadata, and approvers.
  • GCD config chains
    Files: ci-operator/step-registry/ipi/conf/gcd/ipi-conf-gcd-chain.yaml, ci-operator/step-registry/ipi/conf/gcd/ipi-conf-gcd-chain.metadata.json, ci-operator/step-registry/ipi/conf/gcd/OWNERS, ci-operator/step-registry/ipi/gcd/pre/ipi-gcd-pre-chain.yaml, ci-operator/step-registry/ipi/gcd/pre/ipi-gcd-pre-chain.metadata.json, ci-operator/step-registry/ipi/gcd/pre/OWNERS
    Summary: Adds the GCD config chain and pre-chain that order WIF auth before install-config, OS stream, and install steps.
  • E2E workflow and job registration
    Files: ci-operator/step-registry/openshift/e2e/gcd/openshift-e2e-gcd-workflow.yaml, ci-operator/step-registry/openshift/e2e/gcd/openshift-e2e-gcd-workflow.metadata.json, ci-operator/step-registry/openshift/e2e/gcd/OWNERS, ci-operator/config/openshift/installer/openshift-installer-main.yaml, ci-operator/step-registry/ipi/gcd/OWNERS
    Summary: Adds the openshift-e2e-gcd workflow and registers the optional e2e-gcd-techpreview installer job using the gcd cluster profile.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • openshift/release#80743: Shares the GCD cluster-profile wiring used by the new GCD-targeted installer job and workflow.
🚥 Pre-merge checks | ✅ 15
✅ Passed checks (15 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly matches the main change by introducing an e2e-gcd installer CI job for Google Cloud Dedicated.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Stable And Deterministic Test Names: ✅ Passed. The PR only changes CI YAML/sh/OWNERS files; no Go test files or Ginkgo titles were added, and scans found no It/Describe/Context/When calls.
  • Test Structure And Quality: ✅ Passed. No Ginkgo test files or test logic were changed; this PR only adds CI config, step-registry metadata, and shell/YAML definitions, so the check is not applicable.
  • Microshift Test Compatibility: ✅ Passed. No new Ginkgo test code was added; the PR only changes CI YAML and a shell validator, so MicroShift-specific compatibility isn’t implicated.
  • Single Node Openshift (Sno) Test Compatibility: ✅ Passed. No new Go/Ginkgo test code was added; the PR only wires CI/step-registry YAML and scripts, so SNO test-compatibility checks are not applicable.
  • Topology-Aware Scheduling Compatibility: ✅ Passed. The PR only adds CI config/step-registry; changed files contain no scheduling constraints, node selectors, affinity, spread rules, or PDB/replica logic.
  • Ote Binary Stdout Contract: ✅ Passed. No OTE binary process-level code changed; the PR only adds ci-operator config/step-registry YAML, metadata, OWNERS, and a shell step script.
  • Ipv6 And Disconnected Network Test Compatibility: ✅ Passed. The PR only adds CI/workflow wiring and metadata; no new Ginkgo test bodies or IPv4-only parsing/host URL construction appear in touched paths.
  • No-Weak-Crypto: ✅ Passed. No MD5/SHA1/DES/RC4/3DES/Blowfish/ECB, custom crypto, or secret/token comparisons were added; only unrelated RSA/ECDSA config appears.
  • Container-Privileges: ✅ Passed. The new GCD job and step-registry files add no privileged, hostPID/Network/IPC, SYS_ADMIN, root, or allowPrivilegeEscalation settings.
  • No-Sensitive-Data-In-Logs: ✅ Passed. No new logs expose secrets or PII; the only echoes print a file path, credential type, and universe domain, not passwords/tokens.

Comment @coderabbitai help to get the list of available commands.

@openshift-ci openshift-ci Bot requested review from rwsu and sadasu June 29, 2026 19:51

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-commands.sh`:
- Around lines 21-23: The auth setup via gcloud auth login only affects the
current step container, so the next step will still be unauthenticated. Update
the wif-auth command flow to persist the login state needed by the downstream
ipi-conf-gcp consumer using ${SHARED_DIR} rather than relying on the transient
gcloud auth login session; use the existing wif-auth command script and the
shared-step handoff pattern to make the credentials available across steps
without logging sensitive values.
- Around lines 5-6: The WIF auth script is hardcoding AWS_CONFIG_FILE and
AWS_PROFILE, which overrides the configured values and ignores future overrides.
Update the wif-auth command script to honor the existing environment variables
instead of re-exporting fixed values, and keep the configuration sourced from
the wif-auth ref and installer wiring. Use the identifiers wif-auth-commands.sh,
AWS_CONFIG_FILE, and AWS_PROFILE to locate the assignments and remove or gate
them so externally provided values win.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: a7eeddba-3c5d-4476-9a63-188c19373fef

📥 Commits

Reviewing files that changed from the base of the PR and between 55615d9 and 5ebd08b.

⛔ Files ignored due to path filters (1)
  • ci-operator/jobs/openshift/installer/openshift-installer-main-presubmits.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (14)
  • ci-operator/config/openshift/installer/openshift-installer-main.yaml
  • ci-operator/step-registry/ipi/conf/gcd/OWNERS
  • ci-operator/step-registry/ipi/conf/gcd/ipi-conf-gcd-chain.metadata.json
  • ci-operator/step-registry/ipi/conf/gcd/ipi-conf-gcd-chain.yaml
  • ci-operator/step-registry/ipi/conf/gcd/wif-auth/OWNERS
  • ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-commands.sh
  • ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-ref.metadata.json
  • ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-ref.yaml
  • ci-operator/step-registry/ipi/gcd/pre/OWNERS
  • ci-operator/step-registry/ipi/gcd/pre/ipi-gcd-pre-chain.metadata.json
  • ci-operator/step-registry/ipi/gcd/pre/ipi-gcd-pre-chain.yaml
  • ci-operator/step-registry/openshift/e2e/gcd/OWNERS
  • ci-operator/step-registry/openshift/e2e/gcd/openshift-e2e-gcd-workflow.metadata.json
  • ci-operator/step-registry/openshift/e2e/gcd/openshift-e2e-gcd-workflow.yaml

Comment thread ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-commands.sh Outdated
Comment thread ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-commands.sh Outdated
rochacbruno and others added 2 commits June 29, 2026 21:16
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use ${VAR:-default} pattern so env vars from ref/config are honored
- Remove gcloud auth login since it doesn't persist across steps;
  each step authenticates independently via GOOGLE_CLOUD_KEYFILE_JSON
  pointing to gce.json from the cluster profile mount

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
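This commit rests on one point: ci-operator runs each step in its own container, so a `gcloud auth login` performed in one step is gone by the next. That leads to a per-step pattern like the sketch below. The function name is hypothetical. The `${VAR:-default}` defaults mirror the values in the PR description, and the caller is assumed to pass `CLUSTER_PROFILE_DIR` as the argument.

```shell
#!/bin/bash
# Hypothetical per-step setup: each step derives its own auth environment
# instead of inheriting a login session from a previous step's container.
setup_step_auth_env() {
  local profile_dir="${1:?cluster profile directory required}"
  # ${VAR:-default} keeps any value already provided by the ref or job config.
  export AWS_CONFIG_FILE="${AWS_CONFIG_FILE:-/var/run/secrets/aws/config/config}"
  export AWS_PROFILE="${AWS_PROFILE:-hub}"
  # Point the Google tooling at the WIF credential config from the cluster profile mount.
  export GOOGLE_CLOUD_KEYFILE_JSON="${profile_dir}/gce.json"
}
```

Usage inside a step would be `setup_step_auth_env "${CLUSTER_PROFILE_DIR}"`. Because nothing is written to `${SHARED_DIR}`, no credential material is handed between steps.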
@rochacbruno

Member Author

/cc @patrickdillon @tthvo @barbacbd

Comment thread ci-operator/step-registry/openshift/e2e/gcd/OWNERS
Comment thread ci-operator/config/openshift/installer/openshift-installer-main.yaml Outdated
Comment thread ci-operator/config/openshift/installer/openshift-installer-main.yaml Outdated
Comment thread ci-operator/config/openshift/installer/openshift-installer-main.yaml Outdated
Comment thread ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-ref.yaml Outdated
rochacbruno and others added 3 commits June 30, 2026 10:36
GCD only supports private DNS zones, so the cluster must be published
internally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename e2e-gcd to e2e-gcd-techpreview, add FEATURE_SET: TechPreviewNoUpgrade
- Switch from build13 (GCP) to build07 (AWS) for STS role assumption
- Add PUBLISH: Internal (GCD only supports private DNS zones)
- Remove AWS_CONFIG_FILE/AWS_PROFILE env overrides (injected by ci-operator)
- Simplify WIF auth step to validate credential config and declare
  GOOGLE_CLOUD_UNIVERSE_DOMAIN env var
- Add barbacbd to OWNERS files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The GCD cluster profile uses a WIF credential config (type
external_account) instead of a service account key. Update shared GCP
scripts to detect the credential type and handle both:

- ipi-conf-gcp-zones: use gcloud auth login --cred-file for WIF,
  read project ID from cluster profile instead of gce.json
- ipi-install-install: set GOOGLE_APPLICATION_CREDENTIALS for WIF
- ipi-deprovision-deprovision: same as install
- gather-gcp-console: use gcloud auth login --cred-file for WIF

Also restore AWS_PROFILE=hub in the CI config - GCD WIF trusts only
the hub account, not the default target account.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
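For the installer binary (ipi-install-install and ipi-deprovision-deprovision), the commit sets `GOOGLE_APPLICATION_CREDENTIALS` only when the credential is `external_account`. Below is a sketch of that branch. The function name is hypothetical, and `jq` is assumed to be present. Service account keys are left on the existing `GOOGLE_CLOUD_KEYFILE_JSON` path unchanged.

```shell
#!/bin/bash
# Hypothetical sketch: export Application Default Credentials for WIF configs only.
export_adc_for_wif() {
  local keyfile="${1:?path to credential file required}" cred_type
  # '.type // empty' prints nothing when the field is absent; jq errors also yield "".
  cred_type="$(jq -r '.type // empty' "${keyfile}" 2>/dev/null)" || cred_type=""
  if [[ "${cred_type}" == "external_account" ]]; then
    echo "Using WIF external_account credentials from ${keyfile}"
    export GOOGLE_APPLICATION_CREDENTIALS="${keyfile}"
  fi
}
```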
@openshift-ci

openshift-ci Bot commented Jun 30, 2026

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rochacbruno
Once this PR has been reviewed and has the lgtm label, please assign patrickdillon, stbenjam for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-commands.sh`:
- Around lines 13-17: The credential-type check in the wif-auth command script
reads .type from CRED_FILE with jq, but malformed gce.json causes an uncaught
parser failure under set -e. Update the validation logic around the CRED_TYPE
assignment in the script so jq parsing is handled explicitly first, and emit a
clear step failure message when the credential file is invalid JSON before
checking for external_account.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 4d137b13-138e-4876-867b-768bf91e31f6

📥 Commits

Reviewing files that changed from the base of the PR and between 484d340 and f5b5480.

⛔ Files ignored due to path filters (1)
  • ci-operator/jobs/openshift/installer/openshift-installer-main-presubmits.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (13)
  • ci-operator/config/openshift/installer/openshift-installer-main.yaml
  • ci-operator/step-registry/ipi/conf/gcd/OWNERS
  • ci-operator/step-registry/ipi/conf/gcd/ipi-conf-gcd-chain.metadata.json
  • ci-operator/step-registry/ipi/conf/gcd/ipi-conf-gcd-chain.yaml
  • ci-operator/step-registry/ipi/conf/gcd/wif-auth/OWNERS
  • ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-commands.sh
  • ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-ref.metadata.json
  • ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-ref.yaml
  • ci-operator/step-registry/ipi/gcd/OWNERS
  • ci-operator/step-registry/ipi/gcd/pre/OWNERS
  • ci-operator/step-registry/ipi/gcd/pre/ipi-gcd-pre-chain.metadata.json
  • ci-operator/step-registry/openshift/e2e/gcd/OWNERS
  • ci-operator/step-registry/openshift/e2e/gcd/openshift-e2e-gcd-workflow.metadata.json
✅ Files skipped from review due to trivial changes (8)
  • ci-operator/step-registry/ipi/conf/gcd/OWNERS
  • ci-operator/step-registry/ipi/gcd/pre/OWNERS
  • ci-operator/step-registry/ipi/conf/gcd/ipi-conf-gcd-chain.metadata.json
  • ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-ref.metadata.json
  • ci-operator/step-registry/openshift/e2e/gcd/openshift-e2e-gcd-workflow.metadata.json
  • ci-operator/step-registry/ipi/gcd/OWNERS
  • ci-operator/step-registry/openshift/e2e/gcd/OWNERS
  • ci-operator/step-registry/ipi/conf/gcd/wif-auth/OWNERS
🚧 Files skipped from review as they are similar to previous changes (2)
  • ci-operator/step-registry/ipi/gcd/pre/ipi-gcd-pre-chain.metadata.json
  • ci-operator/step-registry/ipi/conf/gcd/ipi-conf-gcd-chain.yaml

Comment thread ci-operator/step-registry/ipi/conf/gcd/wif-auth/ipi-conf-gcd-wif-auth-commands.sh Outdated
rochacbruno and others added 2 commits June 30, 2026 11:10
Wrap jq parsing in an explicit check so invalid JSON produces a clear
error message instead of a raw parser failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Set BASE_DOMAIN to ci.gcd.devcluster.openshift.com for internal DNS
records. The public_hosted_zone file must also be added to the vault
secret with the same value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
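`BASE_DOMAIN` in the job config and the `public_hosted_zone` file in the vault secret must hold the same value. A check can catch drift between them. Below is a hedged sketch. The helper name is hypothetical, and it assumes the file contains only the domain name.

```shell
#!/bin/bash
# Hypothetical check: BASE_DOMAIN (if set) must match the profile's public_hosted_zone.
check_base_domain() {
  local profile_dir="${1:?cluster profile directory required}" zone_file zone
  zone_file="${profile_dir}/public_hosted_zone"
  if [[ ! -r "${zone_file}" ]]; then
    echo "ERROR: ${zone_file} is missing from the cluster profile secret" >&2
    return 1
  fi
  # Strip whitespace, including the trailing newline, from the secret value.
  zone="$(tr -d '[:space:]' < "${zone_file}")"
  if [[ -n "${BASE_DOMAIN:-}" && "${BASE_DOMAIN}" != "${zone}" ]]; then
    echo "ERROR: BASE_DOMAIN=${BASE_DOMAIN} but public_hosted_zone=${zone}" >&2
    return 1
  fi
  echo "${zone}"
}
```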
Comment on lines +7 to +8
- ref: ipi-conf-gcp
- ref: ipi-conf-gcp-zones

Member


Suggested change (keeps the two existing refs and adds ipi-conf-gcp-osimage):
- ref: ipi-conf-gcp
- ref: ipi-conf-gcp-zones
- ref: ipi-conf-gcp-osimage # <--- I think this should work?

We don't have any published OS images in GCD, right? Similar to AWS EUSC, we may define a custom OS image.

On the job config, we can define environment variables (see reference):

COMPUTE_OSIMAGE # OS image for compute nodes
CONTROL_PLANE_OSIMAGE # OS image for control plane nodes
DEFAULT_MACHINE_OSIMAGE # default for both
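The comment above gives the precedence: `DEFAULT_MACHINE_OSIMAGE` applies to both pools unless a pool-specific variable is set. Below is a minimal sketch of that lookup. It assumes a pool-specific value always wins over the default. The function name is hypothetical, and this is not the ipi-conf-gcp-osimage step itself.

```shell
#!/bin/bash
# Hypothetical resolution: a pool-specific image overrides DEFAULT_MACHINE_OSIMAGE.
resolve_osimage() {
  local pool="$1"
  case "${pool}" in
    compute)       echo "${COMPUTE_OSIMAGE:-${DEFAULT_MACHINE_OSIMAGE:-}}" ;;
    control-plane) echo "${CONTROL_PLANE_OSIMAGE:-${DEFAULT_MACHINE_OSIMAGE:-}}" ;;
    *)             echo "unknown pool: ${pool}" >&2; return 1 ;;
  esac
}
```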

Member Author


Good point on the OS image. I'll add ipi-conf-gcp-osimage to the chain. Do you know which OS image name we should use for GCD? The brief mentions rhcos10 but we deferred that.

Member


I guess we, ourselves, need to upload/publish a custom OS image in the GCD project? Not sure how it's done on the GCP side, but for AWS, we have to build an AMI from a VM image (like this).

cc @patrickdillon

Member Author


I'm trying the rhcos10 image that is available on GCD.

COMPUTE_NODE_REPLICAS: "2"
FEATURE_SET: TechPreviewNoUpgrade
GOOGLE_CLOUD_UNIVERSE_DOMAIN: apis-berlin-build0.goog
PUBLISH: Internal

Member


For a private cluster, we also need to set up an existing network, a bastion VM, and a proxy, because CI pods can't reach the internal API endpoint. We can reference the gcp-private presubmit job 👇

It may look something like commit tthvo@da9ed32, which allows overriding the OS image and instance type. WDYT?

- as: gcp-private
  optional: true
  run_if_changed: (gcp|google)
  steps:
    cluster_profile: openshift-org-gcp
    env:
      PUBLISH: Internal
    post:
    - chain: cucushift-installer-rehearse-gcp-ipi-private-deprovision
    pre:
    - ref: gcp-provision-vpc
    - ref: ignition-bastionhost
    - ref: gcp-provision-bastionhost
    - ref: proxy-config-generate
    - chain: ipi-conf-gcp
    - chain: ipi-install
    - ref: cucushift-installer-check-gcp-private
    test:
    - chain: cucushift-installer-check-cluster-health

Member Author


Thanks. I am making changes based on your reference commit

Member


Can you push that as a suggestion or co-author a commit?

Ohh yes, it should be available in my fork https://github.com/tthvo/release at branch gcd. I committed on top of yours, so you can cherry-pick without issue.

Member Author


Applied your reference commit - added VPC/bastion/proxy to ipi-gcd-pre chain, teardown in post, WIF auth in all provision/deprovision scripts, and configurable bastion image/machine type. Also restored AWS_PROFILE=hub.

OS_IMAGE_STREAM: rhel-10
workflow: openshift-e2e-aws
- always_run: false
as: e2e-gcd-techpreview

Member


IIRC, once we finalize the job config, we need to copy it to the 5.0, 5.1, and 4.23 release branches too, right? For example, gcp-private has a 5.0 variant:

Contributor


TBH, it's never clear to me which branches we need to copy to. In general I think config on an existing branch will be carried over to the next branch on branching day. But again, not sure, so it might be good to lock this down.

Member


In general I think config on an existing branch will be carried over to the next branch on branching day

☝️ Right, I notice the config in -main.yaml is the one that gets copied over to the next branch.

Member Author


Right, the config on main will be copied to the next branch on branching day. We can add it to existing release branches once the job is validated.

@patrickdillon

Contributor

/pj-rehearse pull-ci-openshift-installer-main-e2e-gcd-techpreview

@openshift-merge-bot

Contributor

@patrickdillon: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

rochacbruno and others added 2 commits July 1, 2026 20:46
For private GCD clusters, add VPC provisioning, bastion host, and proxy
to the pre-install chain, with teardown in post steps. Based on tthvo's
reference commit.

- Add WIF external_account auth to gcp-provision-vpc, gcp-deprovision-vpc,
  gcp-provision-bastionhost, gcp-deprovision-bastionhost scripts
- Make bastion machine type and image configurable via env vars
  (BASTION_MACHINE_TYPE, BASTION_IMAGE_NAME, BASTION_IMAGE_PROJECT)
- Update ipi-gcd-pre chain with VPC, bastion, and proxy steps
- Update openshift-e2e-gcd workflow with bastion and VPC teardown
- Restore AWS_PROFILE: hub in CI config (required for GCD WIF)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
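The configurable bastion settings can be sketched as a small argument builder. The helper name and the `BASTION_ARGS` array are hypothetical. The fallback values are placeholders, not the defaults used by the real gcp-provision-bastionhost script. Only the three variable names come from the commit message. The `gcloud compute instances create` flags used (`--zone`, `--machine-type`, `--image`, `--image-project`) are standard ones.

```shell
#!/bin/bash
# Hypothetical sketch: build the bastion create command from overridable env vars.
# Fallback values are placeholders, not the real step's defaults.
bastion_create_args() {
  local name="${1:?bastion instance name required}" zone="${2:?zone required}"
  local machine_type="${BASTION_MACHINE_TYPE:-n2-standard-2}"
  local image_name="${BASTION_IMAGE_NAME:-example-bastion-image}"
  local image_project="${BASTION_IMAGE_PROJECT:-example-image-project}"
  # Global array so the caller can run: gcloud "${BASTION_ARGS[@]}"
  BASTION_ARGS=(compute instances create "${name}"
    --zone="${zone}"
    --machine-type="${machine_type}"
    --image="${image_name}"
    --image-project="${image_project}")
}
```

Building the arguments in an array keeps each value a single word even if it contains spaces, so the final `gcloud "${BASTION_ARGS[@]}"` call needs no further quoting.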
Add ipi-conf-gcp-osimage step to the GCD chain and set
DEFAULT_MACHINE_OSIMAGE to rhcos10, which is the RHCOS image
published to GCD.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@rochacbruno

Member Author

/pj-rehearse pull-ci-openshift-installer-main-e2e-gcd-techpreview

@openshift-merge-bot

Contributor

@rochacbruno: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

CRED_TYPE=$(jq -r .type "${GCP_SHARED_CREDENTIALS_FILE}")
if [[ "${CRED_TYPE}" == "external_account" ]]; then
GOOGLE_PROJECT_ID="$(< ${CLUSTER_PROFILE_DIR}/openshift_gcp_project)"
gcloud auth login --cred-file="${GCP_SHARED_CREDENTIALS_FILE}" --quiet

Member


Hmm, gcloud is contacting the AWS EC2 IMDS endpoint, but it shouldn't be 🤔

(gcloud.config.set) There was a problem refreshing your current auth tokens: 
HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with 
url: /latest/meta-data/placement/availability-zone (Caused by 
NewConnectionError('<urllib3.connection.HTTPConnection object at 
0x7f12db7b5198>: Failed to establish a new connection: [Errno 111] Connection refused',))

I'm not sure what information gcloud needs that makes it query the IMDS metadata. Maybe we can try providing as many details as possible:

  1. Provide the AWS region:
     export AWS_REGION="$LEASED_RESOURCE" # Getting from Boskos lease
  2. Assume the IAM role via the AWS CLI and provide the credentials directly as env vars:
     export AWS_PROFILE=hub # Already set in job
     eval "$(aws configure export-credentials --format env)"

Member Author


I think we're still waiting for DPTP to add the home_role_arn to build07's aws-sts-cluster-config. Once STS is activated, ci-operator will inject the AWS credentials, and the Google SDK will be able to use them instead of trying to hit the EC2 metadata endpoint.
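Some hedged background, to our understanding: for an `aws1` credential source, Google's auth libraries first read the AWS region and credentials from the environment (`AWS_REGION`/`AWS_DEFAULT_REGION`, `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY`/`AWS_SESSION_TOKEN`). They fall back to the IMDS URLs in the credential config only when those are missing, and they do not read `AWS_PROFILE` or the AWS config file. That would be consistent with the 169.254.169.254 error quoted above. Below is a diagnostic sketch with a hypothetical function name. It prints variable names only, never values.

```shell
#!/bin/bash
# Hypothetical diagnostic: report whether the AWS values that Google's auth library
# looks for in the environment are present, or whether it would fall back to IMDS.
check_aws_env_for_wif() {
  local missing=0
  if [[ -z "${AWS_REGION:-}" && -z "${AWS_DEFAULT_REGION:-}" ]]; then
    echo "WARN: AWS_REGION/AWS_DEFAULT_REGION unset; region lookup may hit IMDS" >&2
    missing=1
  fi
  if [[ -z "${AWS_ACCESS_KEY_ID:-}" || -z "${AWS_SECRET_ACCESS_KEY:-}" ]]; then
    echo "WARN: AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY unset; credential lookup may hit IMDS" >&2
    missing=1
  fi
  return "${missing}"
}
```

Running `eval "$(aws configure export-credentials --format env)"`, as suggested earlier in this thread, is one way to populate the credential variables from the `hub` profile before calling gcloud.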

Member


Ooh, cool! Thanks for the details! I guess we'll wait a bit then 😁

Member Author


waiting on #81400

@barbacbd barbacbd left a comment

Contributor


@rochacbruno I think we are missing documentation for WIF Setup

Issue: While the PR mentions cluster-secrets-gcd must be populated, there's no documentation about:

  • Exact format of the gce.json credential file
  • How to generate WIF credentials for GCD
  • What the hub role ARN is and how to obtain it

Recommendation: Add a README covering:

  1. Prerequisites for running GCD jobs
  2. WIF identity chain setup process
  3. Example credential format
  4. Troubleshooting guide for auth failures

It is possible that we do not need this, but the process should be documented for the installer team or others.
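For the "example credential format" item, here is a sketch of the general shape of an AWS-sourced `external_account` config, as produced by `gcloud iam workload-identity-pools create-cred-config ... --aws`. Every value in CAPS is a placeholder. The audience, token, and impersonation endpoints for GCD may differ from the public Google Cloud endpoints shown. Note that `region_url` is the same IMDS URL that appears in the gcloud error quoted earlier in this thread.

```shell
#!/bin/bash
# Write a placeholder WIF credential config for illustration only.
# Real files are typically generated with (placeholders in CAPS):
#   gcloud iam workload-identity-pools create-cred-config \
#     projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/POOL_ID/providers/PROVIDER_ID \
#     --service-account=SA_EMAIL --aws --output-file=gce.json
write_example_cred_config() {
  local out="${1:?output path required}"
  cat > "${out}" <<'EOF'
{
  "type": "external_account",
  "audience": "//iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/POOL_ID/providers/PROVIDER_ID",
  "subject_token_type": "urn:ietf:params:aws:token-type:aws4_request",
  "service_account_impersonation_url": "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/SA_EMAIL:generateAccessToken",
  "token_url": "https://sts.googleapis.com/v1/token",
  "credential_source": {
    "environment_id": "aws1",
    "region_url": "http://169.254.169.254/latest/meta-data/placement/availability-zone",
    "url": "http://169.254.169.254/latest/meta-data/iam/security-credentials",
    "regional_cred_verification_url": "https://sts.{region}.amazonaws.com?Action=GetCallerIdentity&Version=2011-06-15"
  }
}
EOF
}
```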

if [ -f "${SHARED_DIR}/gcp_min_permissions.json" ]; then
if [[ "$(jq -r .type ${GOOGLE_CLOUD_KEYFILE_JSON} 2>/dev/null)" == "external_account" ]]; then
echo "$(date -u --rfc-3339=seconds) - Using WIF external_account credentials for GCD..."
export GOOGLE_APPLICATION_CREDENTIALS="${GOOGLE_CLOUD_KEYFILE_JSON}"

Contributor


    # Validate universe domain is set for sovereign cloud
    if [[ -z "${GOOGLE_CLOUD_UNIVERSE_DOMAIN:-}" ]]; then
      echo "WARNING: GOOGLE_CLOUD_UNIVERSE_DOMAIN not set for WIF credentials"
    fi

The install script sets GOOGLE_APPLICATION_CREDENTIALS for WIF auth but doesn't validate GOOGLE_CLOUD_UNIVERSE_DOMAIN. While it's set in the job config, adding validation here would catch configuration errors earlier.

Member Author


GOOGLE_CLOUD_UNIVERSE_DOMAIN validation is not needed here - the universe domain will soon be discovered automatically by the SDK based on the project ID format (eu0: prefix). Patrick mentioned this recently as part of the installer's sovereign cloud detection. Adding validation for something that's going away would just be churn.

if ! gcloud auth list | grep -E "\*\s+${sa_email}"
then
gcloud auth activate-service-account --key-file="${GCP_SHARED_CREDENTIALS_FILE}"
CRED_TYPE=$(jq -r .type "${GCP_SHARED_CREDENTIALS_FILE}" 2>/dev/null)

Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is some common gcloud auth logic here. Can we move it into a common file?

Member Author

Agreed - the WIF external_account detection pattern is repeated across several scripts now. We'll extract it into a shared helper as a follow-up once we confirm the current implementation works end-to-end.
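Below is a minimal sketch of what such a shared helper might look like, assuming each step script would source it. It reuses the `jq -r .type` detection already used in these scripts. It then chooses between `gcloud auth login --cred-file`, which accepts `external_account` configs, and the existing `gcloud auth activate-service-account --key-file` path. The names `gcloud_auth_args` and `gcloud_activate_credentials` are hypothetical. The argument builder is kept separate so the selection logic can be tested without calling gcloud.

```bash
#!/usr/bin/env bash
# Hypothetical shared helper, intended to be sourced by the step scripts.

# Print, one per line, the gcloud arguments for a credential type and file.
gcloud_auth_args() {
  local cred_type="$1" file="$2"
  case "${cred_type}" in
    external_account)
      # WIF credential config (e.g. AWS-sourced): gcloud reads it via --cred-file.
      printf '%s\n' auth login "--cred-file=${file}" --quiet ;;
    service_account)
      # Classic service account key: keep the existing activation path.
      printf '%s\n' auth activate-service-account "--key-file=${file}" ;;
    *)
      echo "ERROR: unsupported credential type '${cred_type}'" >&2
      return 1 ;;
  esac
}

# Detect the credential type with jq, then run the matching gcloud command.
gcloud_activate_credentials() {
  local file="$1" cred_type out
  local -a args
  cred_type="$(jq -r '.type // empty' "${file}" 2>/dev/null)"
  # Declared separately above so `|| return 1` sees the builder's exit status.
  out="$(gcloud_auth_args "${cred_type}" "${file}")" || return 1
  mapfile -t args <<< "${out}"
  gcloud "${args[@]}"
}
```

A script such as the one above could then replace its inline branch with `gcloud_activate_credentials "${GCP_SHARED_CREDENTIALS_FILE}"`.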

Document the WIF authentication flow, cluster profile secret format,
credential config generation, environment variables, and
troubleshooting guide.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@rochacbruno

Member Author

@barbacbd Added a README at ci-operator/step-registry/ipi/conf/gcd/README.md covering the WIF authentication flow, cluster profile secret format, credential config generation, environment variables, and troubleshooting guide.
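For the troubleshooting guide, a quick preflight along these lines could help separate environment problems from WIF or IAM problems. This is a sketch under stated assumptions. The AWS config path and `AWS_PROFILE=hub` come from the PR description. It also assumes `GOOGLE_CLOUD_KEYFILE_JSON` points at the cluster profile's `gce.json`. `gcd_preflight` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical preflight for GCD jobs: report every missing prerequisite before
# any AWS or gcloud call is attempted. Returns non-zero if any check fails.
gcd_preflight() {
  local aws_config="${1:-/var/run/secrets/aws/config/config}" failed=0
  if [[ ! -f "${aws_config}" ]]; then
    echo "FAIL: AWS config file not found at ${aws_config}" >&2
    failed=1
  fi
  # The hub profile is the account the GCD WIF pool is described as trusting.
  if [[ "${AWS_PROFILE:-}" != "hub" ]]; then
    echo "FAIL: AWS_PROFILE is '${AWS_PROFILE:-}', expected 'hub'" >&2
    failed=1
  fi
  if [[ -z "${GOOGLE_CLOUD_KEYFILE_JSON:-}" || ! -r "${GOOGLE_CLOUD_KEYFILE_JSON}" ]]; then
    echo "FAIL: GOOGLE_CLOUD_KEYFILE_JSON is unset or unreadable" >&2
    failed=1
  fi
  return "${failed}"
}
```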

@openshift-ci

openshift-ci Bot commented Jul 2, 2026

Contributor

@rochacbruno: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/rehearse/openshift/installer/main/e2e-gcd-techpreview | cb78111 | link | unknown | /pj-rehearse pull-ci-openshift-installer-main-e2e-gcd-techpreview |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot

Contributor

[REHEARSALNOTIFIER]
@rochacbruno: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| pull-ci-cloud-bulldozer-orion-main-payload-control-plane-6nodes | cloud-bulldozer/orion | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-main-operator-e2e-aws | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-release-5.1-operator-e2e-aws | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-release-5.0-operator-e2e-aws | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-release-4.23-operator-e2e-aws | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-release-4.22-operator-e2e-aws | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-release-4.21-operator-e2e-aws | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-release-4.20-operator-e2e-aws | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-main-operator-e2e-fips | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-main-operator-e2e-azure-rhcos10-fips | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-main-operator-e2e-azure-rhcos10 | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-release-5.1-operator-e2e-fips | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-release-5.1-operator-e2e-azure-rhcos10-fips | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-release-5.1-operator-e2e-azure-rhcos10 | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-release-5.0-operator-e2e-fips | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-release-5.0-operator-e2e-azure-rhcos10-fips | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-release-5.0-operator-e2e-azure-rhcos10 | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-release-4.23-operator-e2e-fips | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-release-4.23-operator-e2e-azure-rhcos10-fips | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-release-4.23-operator-e2e-azure-rhcos10 | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-release-4.22-operator-e2e-fips | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-release-4.22-operator-e2e-azure-rhcos10-fips | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-release-4.22-operator-e2e-azure-rhcos10 | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-release-4.21-operator-e2e-fips | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |
| pull-ci-openshift-secrets-store-csi-driver-operator-release-4.20-operator-e2e-fips | openshift/secrets-store-csi-driver-operator | presubmit | Registry content changed |

A total of 34900 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs.

A full list of affected jobs can be found here

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.


Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

5 participants