Skip to content

OCPBUGS-77404: bugfix: mount /etc/containers so image controllers respect cluster-wide image registry configurations#439

Open
everettraven wants to merge 9 commits into
openshift:masterfrom
everettraven:ocpbugs-77404
Open

OCPBUGS-77404: bugfix: mount /etc/containers so image controllers respect cluster-wide image registry configurations#439
everettraven wants to merge 9 commits into
openshift:masterfrom
everettraven:ocpbugs-77404

Conversation

@everettraven

@everettraven everettraven commented Jun 8, 2026

Copy link
Copy Markdown

Description

Fixes OCPBUGS-77404 by updating the openshift-controller-manager deployment to mount the /etc/containers directory from the host that the image controllers using the containers/image package will read for registry configurations.

The Machine Configuration Operator is responsible for syncing cluster configurations through the image APIs to this directory on the nodes.

In order to do this, the SCC needed to be adjusted to one that allowed host path mounting. To limit the changes as much as possible I updated the SCC to hostaccess from restricted-v3 so that we can mount the host path but prevent running the container as root, maintaining the user namespace security boundary.

Summary by CodeRabbit

  • Chores
    • Controller now requests the host-access security constraint; new role and rolebinding grant that access.
    • Operator synchronizes the cluster pull-secret into the controller namespace at startup.
    • Controller namespace annotated for privileged pod-security.
  • New Features
    • Controller mounts host container configuration read-only and exposes a registry auth file via an environment variable for image pulls.
  • Tests
    • Tests updated to expect the new registry auth env var and adjusted formatting in expected configs.

…de image registry configurations

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 8, 2026
@openshift-ci-robot

Copy link
Copy Markdown
Contributor

@everettraven: This pull request references Jira Issue OCPBUGS-77404, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Description

Fixes OCPBUGS-77404 by updating the openshift-controller-manager deployment to mount the /etc/containers directory from the host that the image controllers using the containers/image package will read for registry configurations.

The Machine Configuration Operator is responsible for syncing cluster configurations through the image APIs to this directory on the nodes.

In order to do this, the SCC needed to be adjusted to one that allowed host path mounting. To limit the changes as much as possible I updated the SCC to hostaccess from restricted-v3 so that we can mount the host path but prevent running the container as root, maintaining the user namespace security boundary.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2ec2703f-0fe5-44f1-85d3-4b7a172d8a31

📥 Commits

Reviewing files that changed from the base of the PR and between 038c418 and 28a62c9.

📒 Files selected for processing (1)
  • bindata/assets/openshift-controller-manager/deploy.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • bindata/assets/openshift-controller-manager/deploy.yaml

Walkthrough

Deployment and operator changes: controller-manager now requests the hostaccess SCC, mounts host /etc/containers read-only, sets REGISTRY_AUTH_FILE=/var/run/secrets/image-auth/auth.json, adds RBAC for hostaccess, syncs the global pull-secret into the operator namespace, updates tests, and marks the namespace pod-security as privileged.

Changes

Controller manager hostaccess and related wiring

Layer / File(s) Summary
Security context and host volume mount
bindata/assets/openshift-controller-manager/deploy.yaml, bindata/assets/openshift-controller-manager/ns.yaml
Pod annotation changed to request hostaccess; added REGISTRY_AUTH_FILE env; added read-only etc-containers volumeMount at /etc/containers; defined etc-containers hostPath volume; namespace pod-security annotations set to privileged.
RBAC Role and RoleBinding
bindata/assets/openshift-controller-manager/scc-role.yaml, bindata/assets/openshift-controller-manager/scc-rolebinding.yaml
Added system:openshift-controller-manager:hostaccess-role Role permitting use on securitycontextconstraints/hostaccess and a RoleBinding for openshift-controller-manager-sa.
Operator startup and static assets
pkg/operator/starter.go
Configure resourceSyncer.SyncSecret for openshift-config/pull-secret → operator namespace; include new SCC Role and RoleBinding manifests in static resource controller assets.
Tests: expected config and env updates
pkg/operator/sync_openshiftcontrollermanager_v311_00_test.go
Reformat expected controller lists in config tests and add REGISTRY_AUTH_FILE entries to expected env in TestDeploymentWithProxy cases; small formatting edits.

🎯 4 (Complex) | ⏱️ ~45 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (2 warnings, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Topology-Aware Scheduling Compatibility ⚠️ Warning bindata/assets/openshift-controller-manager/deploy.yaml sets nodeSelector node-role.kubernetes.io/master and required podAntiAffinity (requiredDuringSchedulingIgnoredDuringExecution) without any Co... Make controller-manager scheduling topology-aware: remove/replace node-role.kubernetes.io/master nodeSelector and required hostname anti-affinity (or adjust with topology-aware hooks and safe maxUnavailable), so it works on SNO/TNF/TNA/H...
No-Sensitive-Data-In-Logs ❓ Inconclusive Could not access PR diff to scan all changed files; in pkg/operator/starter.go the new pull-secret sync only returns fmt.Errorf and does not log secret contents. Authenticate/obtain PR diff and scan all modified files for log/printf/klog/zap calls that include or interpolate secrets/tokens/PII.
✅ Passed checks (12 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title clearly and concisely describes the main change: mounting /etc/containers in the controller-manager deployment to enable cluster-wide image registry configuration support. The title is specific, directly relates to the core changeset modifications, and uses the issue reference (OCPBUGS-77404) for tracking.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed Repo Ginkgo tests are only in test/e2e/*.go; their It/Describe titles are static strings (e.g., [Skipped:MicroShift], [Timeout:30m]) with no pod/node/namespace/IP/timestamps.
Test Structure And Quality ✅ Passed The repo/PR changes include no Ginkgo/Gomega tests (no Describe/It/BeforeEach/Eventually/Consistently/Expect); the modified test file uses standard Go testing funcs only.
Microshift Test Compatibility ✅ Passed No new Ginkgo e2e tests; PR only updates manifests and a Go unit test file, and the updated test file contains no “ginkgo” usage. citeturn0view0turn5view0
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR updates operator manifests and a unit test file; inspected sync_openshiftcontrollermanager_v311_00_test.go and found no Ginkgo e2e (Describe/It/Context) usage, so SNO assumptions check is not ap...
Ote Binary Stdout Contract ✅ Passed PR changes are YAML and starter.go/test formatting; in the touched Go files, there are no fmt.Print/os.Stdout writes, only klog usage (no new stdout contract violation detected).
Ipv6 And Disconnected Network Test Compatibility ✅ Passed PR #439 updates bindata SCC/deployment manifests and a pkg/operator unit test; no new Ginkgo e2e tests were added/touched, so the IPv6/disconnected test-compatibility check is not applicable.
No-Weak-Crypto ✅ Passed Scanned PR-mentioned files (starter.go, test, and SCC/deploy/ns YAMLs) for MD5/SHA1/DES/RC4/3DES/Blowfish/ECB and crypto package indicators; no weak-crypto keywords/imports found.
Container-Privileges ✅ Passed controller-manager container securityContext sets allowPrivilegeEscalation: false and drops ALL capabilities; no privileged/hostPID/hostNetwork/hostIPC/SYS_ADMIN fields present in deploy.yaml.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci

openshift-ci Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kaleemsiddiqu for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

path: tls-ca-bundle.pem
- emptyDir: {}
name: tmp
- hostPath:

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open Question: Does this automatically handle pull secrets for private registries as well?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't, but I've opened openshift/openshift-controller-manager#447 to help with this.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
bindata/assets/openshift-controller-manager/deploy.yaml (2)

60-63: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add resource limits for memory and CPU.

The coding guidelines require resource limits on every container. Currently, only resource requests are defined. Without limits, the controller-manager pod could consume unbounded resources, potentially impacting control plane stability.

📊 Proposed fix to add resource limits
         resources:
           requests:
             memory: 100Mi
             cpu: 100m
+          limits:
+            memory: 200Mi
+            cpu: 200m

Note: Adjust limit values based on observed resource usage patterns.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bindata/assets/openshift-controller-manager/deploy.yaml` around lines 60 -
63, The resources block currently only sets requests (memory: 100Mi, cpu: 100m);
add a limits section alongside requests to enforce upper bounds (e.g., limits:
memory: 200Mi, cpu: 200m or other values tuned to observed usage) so the
openshift-controller-manager container cannot consume unbounded resources;
update the same resources mapping (the resources key for the controller-manager
container) to include a limits subsection with both memory and cpu entries.

Source: Coding guidelines


49-54: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add explicit runAsNonRoot: true to the container securityContext.

While the hostaccess SCC may enforce non-root execution, the manifest should explicitly declare runAsNonRoot: true for defense in depth and to match the PR's stated goal of "preventing running the container as root." Without this setting, the security guarantee depends entirely on the SCC, which could change.

🔒 Proposed fix to add explicit runAsNonRoot
       - name: controller-manager
         securityContext:
+          runAsNonRoot: true
           readOnlyRootFilesystem: true
           allowPrivilegeEscalation: false
           capabilities:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bindata/assets/openshift-controller-manager/deploy.yaml` around lines 49 -
54, The container's securityContext is missing an explicit runAsNonRoot setting;
update the container spec's securityContext (the existing securityContext block
that contains readOnlyRootFilesystem, allowPrivilegeEscalation, and
capabilities.drop) to add runAsNonRoot: true so the pod manifest explicitly
prevents running as UID 0 (defense in depth) while preserving the other keys
(readOnlyRootFilesystem, allowPrivilegeEscalation, capabilities.drop).

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bindata/assets/openshift-controller-manager/deploy.yaml`:
- Line 36: The Deployment currently sets openshift.io/required-scc: hostaccess
which is too permissive—either replace it with the restricted-v3 SCC or wire a
purpose-built custom SCC and document why elevated host access is required;
additionally, in the podSpec/container securityContext ensure runAsNonRoot: true
is set, explicitly set automountServiceAccountToken: false at the podSpec level,
and add resources.limits for the controller container to complement existing
resources.requests so limits are enforced.

---

Outside diff comments:
In `@bindata/assets/openshift-controller-manager/deploy.yaml`:
- Around line 60-63: The resources block currently only sets requests (memory:
100Mi, cpu: 100m); add a limits section alongside requests to enforce upper
bounds (e.g., limits: memory: 200Mi, cpu: 200m or other values tuned to observed
usage) so the openshift-controller-manager container cannot consume unbounded
resources; update the same resources mapping (the resources key for the
controller-manager container) to include a limits subsection with both memory
and cpu entries.
- Around line 49-54: The container's securityContext is missing an explicit
runAsNonRoot setting; update the container spec's securityContext (the existing
securityContext block that contains readOnlyRootFilesystem,
allowPrivilegeEscalation, and capabilities.drop) to add runAsNonRoot: true so
the pod manifest explicitly prevents running as UID 0 (defense in depth) while
preserving the other keys (readOnlyRootFilesystem, allowPrivilegeEscalation,
capabilities.drop).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0f6912d5-6a19-4d54-a6d9-64aca0dbe28d

📥 Commits

Reviewing files that changed from the base of the PR and between 34f95b0 and a7aa392.

📒 Files selected for processing (1)
  • bindata/assets/openshift-controller-manager/deploy.yaml

annotations:
target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
openshift.io/required-scc: restricted-v3
openshift.io/required-scc: hostaccess

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for existing custom SCC definitions in the repository
rg -n "kind:\s*SecurityContextConstraints" -A 20

Repository: openshift/cluster-openshift-controller-manager-operator

Length of output: 81


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="bindata/assets/openshift-controller-manager/deploy.yaml"

echo "== File size =="
wc -l "$FILE"

echo "== SCC annotation occurrences (context) =="
rg -n "openshift\.io/required-scc|required-scc" "$FILE" -n

echo "== runAsNonRoot occurrences =="
rg -n "runAsNonRoot" "$FILE" || true

echo "== resources.limits occurrences =="
rg -n "resources:|limits:" "$FILE"

echo "== Extract: container securityContext/resources/volumes around likely sections =="
# show around where the reviewer cited lines (roughly); adjust if file differs
sed -n '1,130p' "$FILE" | nl -ba | sed -n '1,130p'
sed -n '90,180p' "$FILE" | nl -ba | sed -n '1,120p'

echo "== hostPath volume /etc/containers mount check =="
rg -n "etc/containers|hostPath|readOnly" "$FILE"

echo "== Anti-affinity / topology-related scheduling constraints =="
rg -n "antiAffinity|requiredDuringSchedulingIgnoredDuringExecution|topologyKey|hostname" "$FILE" || true

Repository: openshift/cluster-openshift-controller-manager-operator

Length of output: 499


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Search for any custom SCC definitions/usage in repo =="
rg -n "SecurityContextConstraints" -S . || true
rg -n "hostaccess|openshift\.io/required-scc" -S . || true
rg -n "SecurityContextConstraint" -S . || true

Repository: openshift/cluster-openshift-controller-manager-operator

Length of output: 19745


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="bindata/assets/openshift-controller-manager/deploy.yaml"

echo "== Extract around SCC (lines 1-60) =="
awk 'NR<=60{print NR ":" $0}' "$FILE"

echo "== Extract around resources (55-95) =="
awk 'NR>=55 && NR<=95{print NR ":" $0}' "$FILE"

echo "== Extract around hostPath/volumes (95-140) =="
awk 'NR>=95 && NR<=140{print NR ":" $0}' "$FILE"

echo "== Extract last lines (141-200) =="
awk 'NR>=141 && NR<=200{print NR ":" $0}' "$FILE"

echo "== Field presence checks =="
for pat in \
  "runAsNonRoot" \
  "allowPrivilegeEscalation" \
  "readOnlyRootFilesystem" \
  "capabilities:" \
  "automountServiceAccountToken" \
  "limits:" \
  "livenessProbe" \
  "readinessProbe" \
  "hostPID" \
  "hostNetwork" \
  "hostIPC" \
  "privileged" \
  "antiAffinity" \
  "topologyKey"; do
  echo "-- $pat"
  rg -n "$pat" "$FILE" || true
done

echo "== Search for custom SCC manifests outside vendor =="
rg -n --glob '!vendor/**' "kind:\s*SecurityContextConstraints" -S . || true

Repository: openshift/cluster-openshift-controller-manager-operator

Length of output: 6288


Major security hardening gaps in controller-manager OpenShift Deployment

        openshift.io/required-scc: hostaccess
  • hostaccess is less restricted than restricted-v3 and doesn’t align with the guideline requiring a restricted/custom-scoped SCC—either justify the need or wire a custom SCC.
  • Missing runAsNonRoot: true at pod/container securityContext (defense-in-depth).
  • Missing resources.limits (only resources.requests are set).
  • automountServiceAccountToken is not explicitly set to false.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bindata/assets/openshift-controller-manager/deploy.yaml` at line 36, The
Deployment currently sets openshift.io/required-scc: hostaccess which is too
permissive—either replace it with the restricted-v3 SCC or wire a purpose-built
custom SCC and document why elevated host access is required; additionally, in
the podSpec/container securityContext ensure runAsNonRoot: true is set,
explicitly set automountServiceAccountToken: false at the podSpec level, and add
resources.limits for the controller container to complement existing
resources.requests so limits are enforced.

Source: Coding guidelines

by syncing the global pull-secret to the openshift-controller-manager namespace
mounted as a volume on the openshift-controller-manager pods and setting the
REGISTRY_AUTH_FILE environment variable to the mounted file.

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to remove this as e2e-upgrade reports

event happened 36 times, something is wrong: namespace/openshift-controller-manager hmsg/827f4eb8d4 replicaset/controller-manager-99bbc9cc8 - reason/FailedCreate Error creating: pods "controller-manager-99bbc9cc8-" is forbidden: unable to validate against any security context constraint: [metadata.annotations[seccomp.security.alpha.kubernetes.io/pod]: Forbidden: seccomp may not be set, metadata.annotations[container.seccomp.security.alpha.kubernetes.io/controller-manager]: Forbidden: seccomp may not be set] (21:59:34Z) result=reject }

@ehearne-redhat

Copy link
Copy Markdown

@bpalmer The changes in general look fine to me, the seccomp concern is the only one I have. Could you take a look at that and let me know if this is an issue? I dont see any mention https://docs.redhat.com/en/documentation/openshift_container_platform/4.22/html/authentication_and_authorization/managing-pod-security-policies#default-sccs_configuring-internal-oauth about it not being allowed.

However it says privileged containers cannot use seccompprofile at https://docs.redhat.com/en/documentation/openshift_container_platform/4.22/html/security_and_compliance/seccomp-profiles .

…ment

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/operator/sync_openshiftcontrollermanager_v311_00_test.go (1)

752-755: ⚡ Quick win

Assert the new mount contract alongside REGISTRY_AUTH_FILE.

These updates only verify the env var. The feature also depends on the matching secret volume/volumeMount for /var/run/secrets/image-auth/auth.json and the /etc/containers host mount, so those regressions would still pass this suite.

This follows the PR objective to mount both the global pull-secret and /etc/containers into the deployment.

Also applies to: 788-791, 819-822

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/operator/sync_openshiftcontrollermanager_v311_00_test.go` around lines
752 - 755, The test currently only asserts the REGISTRY_AUTH_FILE env var; also
assert that the deployment spec includes a Volume referencing the global
pull-secret (a Secret volume whose SecretName equals the cluster pull secret)
and a corresponding VolumeMount mounting that secret to
/var/run/secrets/image-auth/auth.json, and assert a HostPath (or projected host)
Volume and VolumeMount for /etc/containers so the host config is mounted into
the container; update the assertions around REGISTRY_AUTH_FILE (and the other
similar blocks) to check for a Volume with the secret name, a VolumeMount with
MountPath "/var/run/secrets/image-auth" (or the exact file mount), and a
Volume/VolumeMount that mounts "/etc/containers" from the host into the pod spec
to ensure both secret and host mounts are present.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bindata/assets/openshift-controller-manager/scc-rolebinding.yaml`:
- Around line 7-9: The RoleBinding's roleRef is missing the apiGroup; update the
RoleBinding where roleRef.name is
"system:openshift-controller-manager:hostaccess-role" (the hostaccess
RoleBinding) to include roleRef.apiGroup: rbac.authorization.k8s.io so roleRef
has keys kind, name, and apiGroup like other RoleBindings (e.g.,
route-controller-manager-leader-rolebinding); modify the manifest defining
roleRef to add the apiGroup field accordingly.

---

Nitpick comments:
In `@pkg/operator/sync_openshiftcontrollermanager_v311_00_test.go`:
- Around line 752-755: The test currently only asserts the REGISTRY_AUTH_FILE
env var; also assert that the deployment spec includes a Volume referencing the
global pull-secret (a Secret volume whose SecretName equals the cluster pull
secret) and a corresponding VolumeMount mounting that secret to
/var/run/secrets/image-auth/auth.json, and assert a HostPath (or projected host)
Volume and VolumeMount for /etc/containers so the host config is mounted into
the container; update the assertions around REGISTRY_AUTH_FILE (and the other
similar blocks) to check for a Volume with the secret name, a VolumeMount with
MountPath "/var/run/secrets/image-auth" (or the exact file mount), and a
Volume/VolumeMount that mounts "/etc/containers" from the host into the pod spec
to ensure both secret and host mounts are present.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1df28a42-a6e6-4738-bd20-49f2979e7e19

📥 Commits

Reviewing files that changed from the base of the PR and between a7aa392 and 1ed86f9.

📒 Files selected for processing (5)
  • bindata/assets/openshift-controller-manager/deploy.yaml
  • bindata/assets/openshift-controller-manager/scc-role.yaml
  • bindata/assets/openshift-controller-manager/scc-rolebinding.yaml
  • pkg/operator/starter.go
  • pkg/operator/sync_openshiftcontrollermanager_v311_00_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • bindata/assets/openshift-controller-manager/deploy.yaml

Comment thread bindata/assets/openshift-controller-manager/scc-rolebinding.yaml
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
@everettraven

Copy link
Copy Markdown
Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 11, 2026
@openshift-ci-robot

Copy link
Copy Markdown
Contributor

@everettraven: This pull request references Jira Issue OCPBUGS-77404, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (jitsingh@redhat.com), skipping review request.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@everettraven

Copy link
Copy Markdown
Author

This last failure has a really odd failure. It says the openshift-controller-manager-sa service account doesn't exist but I can see an event from when it was created and the audit logs show no sign of that service account being deleted (or created).

Going to see if maybe this was a fluke.

/retest-required

@ehearne-redhat

Copy link
Copy Markdown

@everettraven - do we need to add pod-security.kubernetes.io annotations to https://github.com/openshift/cluster-openshift-controller-manager-operator/blob/master/bindata/assets/openshift-controller-manager/ns.yaml as the pod is now privileged ?

It says here that pods that

Pods that do not comply with the pod security admission defined globally or at the namespace level are not admitted to the cluster and cannot run.

It says here that

All system-created namespaces that are prefixed with openshift- , except for openshift-operators

have pod security admission synchronization disabled permanently.

Looking at the must-gather, I can see pod security admission is set to restricted.

spec:
  logLevel: Normal
  managementState: Managed
  observedConfig:
    admission:
      pluginConfig:
        PodSecurity:
          configuration:
            defaults:
              audit: restricted
              audit-version: latest
              enforce: restricted
              enforce-version: latest
              warn: restricted
              warn-version: latest

Kubernetes docs states under this configuration, we need RuntimeDefault seccomp profile. Since we are running in hostaccess SCC now, we cannot use this profile.

Looking at the same doc, the minimum we can set pod-security.kubernetes.io annotations to is privileged because of the hostaccess SCC.

So, we will probably need to change the namespace to use these labels:

apiVersion: v1
kind: Namespace
metadata:
  name: openshift-controller-manager
  annotations:
    openshift.io/node-selector: ""
    workload.openshift.io/allowed: "management"
  labels:
    openshift.io/cluster-monitoring: "true"
    openshift.io/run-level: "" # specify no run-level turns it off on install and upgrades
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged

I have found an example of an openshift-* namespace using these labels here.

@everettraven

Copy link
Copy Markdown
Author

@ehearne-redhat Yeah, we will need to do that but I don't think that is what is causing the failures here. I'm trying to sort out why we are seeing an issue with pod creation related to non-existence of the service account. Something seems to be removing the service account before we attempt to create the pods from the deployment here.

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
@openshift-ci-robot

Copy link
Copy Markdown
Contributor

@everettraven: This pull request references Jira Issue OCPBUGS-77404, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (jitsingh@redhat.com), skipping review request.

Details

In response to this:

Description

Fixes OCPBUGS-77404 by updating the openshift-controller-manager deployment to mount the /etc/containers directory from the host that the image controllers using the containers/image package will read for registry configurations.

The Machine Configuration Operator is responsible for syncing cluster configurations through the image APIs to this directory on the nodes.

In order to do this, the SCC needed to be adjusted to one that allowed host path mounting. To limit the changes as much as possible I updated the SCC to hostaccess from restricted-v3 so that we can mount the host path but prevent running the container as root, maintaining the user namespace security boundary.

Summary by CodeRabbit

  • Chores
  • Controller now requests the host-access security constraint and is granted that access via a new role and rolebinding.
  • Operator synchronizes the cluster pull-secret into the controller namespace at startup.
  • New Features
  • Controller mounts host container configuration read-only and exposes a registry auth file via an environment variable for image pulls.
  • Chores
  • Controller namespace annotated for privileged pod-security.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Comment thread pkg/operator/starter.go
Comment on lines 314 to 315
go staticResourceController.Run(ctx, 1)
go operator.Run(ctx, 1)

@ehearne-redhat ehearne-redhat Jun 12, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@everettraven I believe there is a race condition as to when the manifests are applied. They appear to be applied concurrently, not sequentially, as the deployment and SA/SCCs/etc are managed by two separate controllers.

We probably need to add a check somewhere near https://github.com/openshift/cluster-openshift-controller-manager-operator/blob/master/pkg/operator/sync_openshiftcontrollermanager_v311_00.go#L134 to check if the service account + related resources are present before applying the deployment manifest.

Adding something like...

// Verify service account exists before creating deployment
  _, err = c.kubeClient.CoreV1().ServiceAccounts(util.TargetNamespace).Get(context.TODO(),
  "openshift-controller-manager-sa", metav1.GetOptions{})
  if err != nil {
      ocmErrors = append(ocmErrors, fmt.Errorf("%q: waiting for service account: %w", operandName, err))
      return syncReturn(c, ocmErrors, rcmErrors, originalOperatorConfig, operatorConfig)
  }

// repeat the same for scc + rolebinding too. 

should ensure we don't hit the race condition.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a race condition is the problem we are seeing.

I'm seeing an event propagate that says the openshift-controller-manager-sa service account is getting created but there appears to be no evidence of it existing in the must gather that is collected from the test clusters.

Even if the controller enters a degraded state for some amount of time, eventual consistency should ensure that eventually the service account we are waiting for exists and the deployment will continue to roll out.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually when I check must gather I never see serviceaccounts.yaml or something like that in most namespaces. Only some namespaces I have seen this file.

I see this in /core/events.yaml

 message: 'Error creating: pods "controller-manager-f4dc56774-" is forbidden: error
    looking up service account openshift-controller-manager/openshift-controller-manager-sa:
    serviceaccount "openshift-controller-manager-sa" not found'
message: 'Error creating: pods "controller-manager-f4dc56774-" is forbidden: unable
    to validate against any security context constraint: provider "hostaccess": Forbidden:
    not usable by user or serviceaccount'

but then it follows up with

message: 'Created pod: controller-manager-f4dc56774-xbj7h'

There are also messages saying it can't find the config configMap, serving-cert secret too.

And also

message: 'MountVolume.SetUp failed for volume "image-auth" : references non-existent
    secret key: .dockercfgjson'

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's interesting is the 19s difference in timestamp between pod creation and deletion.

lastTimestamp: "2026-06-12T13:37:31Z"
  message: 'Created pod: controller-manager-f4dc56774-xbj7h'
lastTimestamp: "2026-06-12T13:37:50Z"
  message: 'Deleted pod: controller-manager-f4dc56774-xbj7h'

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking it wouldn't hurt to add checks to make sure these resources exist before we apply the deploy.yaml manifest. The worst thing that could happen is that we discover the controller works fine. :D

For me, the logs show that manifests are missing which cause errors. If we can make sure the manifests are there first before applying, we can at least rule out that scenario. We could be looking at different logs however... :)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think you found the smoking gun with those events. I hadn't yet found time to dig deeper into the events timeline yet so thanks for that. The wrong key reference definitely could cause this.

Also TIL that the service accounts don't seem to show up in the must gathers :). I wonder why that is?

@ehearne-redhat ehearne-redhat Jun 15, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent a little time investigating why this is, and it actually comes down to not even using the must-gather command at all!

So, the interesting thing is, service accounts are captured in some namespaces.

ehearne-mac:quay-proxy-ci-openshift-org-openshift-ci-sha256-9da0e7fd8e430ccec3c8da980b0718e0f826c599f580a98189679e9c584370f1 ehearne$ find . -path "*/serviceaccounts/*"
./namespaces/openshift-cluster-olm-operator/core/serviceaccounts/cluster-olm-operator.yaml
./namespaces/openshift-cloud-network-config-controller/core/serviceaccounts/cloud-network-config-controller.yaml
./namespaces/openshift-operator-controller/core/serviceaccounts/operator-controller-controller-manager.yaml
./namespaces/openshift-cluster-csi-drivers/core/serviceaccounts/aws-ebs-csi-driver-operator.yaml
./namespaces/openshift-multus/core/serviceaccounts/multus.yaml
./namespaces/openshift-multus/core/serviceaccounts/multus-ancillary-tools.yaml
./namespaces/openshift-multus/core/serviceaccounts/metrics-daemon-sa.yaml
./namespaces/openshift-multus/core/serviceaccounts/multus-ac.yaml
./namespaces/openshift-multus/core/serviceaccounts/multus-node-identity.yaml
./namespaces/openshift-network-operator/core/serviceaccounts/iptables-alerter.yaml
./namespaces/openshift-network-console/core/serviceaccounts/networking-console-plugin.yaml
./namespaces/openshift-network-node-identity/core/serviceaccounts/network-node-identity.yaml
./namespaces/openshift-cloud-credential-operator/core/serviceaccounts/pod-identity-webhook.yaml
./namespaces/openshift-cluster-storage-operator/core/serviceaccounts/csi-snapshot-controller-operator.yaml
./namespaces/openshift-insights/core/serviceaccounts/gather.yaml
./namespaces/openshift-insights/core/serviceaccounts/operator.yaml
./namespaces/openshift-ovn-kubernetes/core/serviceaccounts/ovn-kubernetes-control-plane.yaml
./namespaces/openshift-ovn-kubernetes/core/serviceaccounts/ovn-kubernetes-node.yaml
./namespaces/openshift-network-diagnostics/core/serviceaccounts/network-check-target.yaml
./namespaces/openshift-network-diagnostics/core/serviceaccounts/network-diagnostics.yaml
ehearne-mac:quay-proxy-ci-openshift-org-openshift-ci-sha256-9da0e7fd8e430ccec3c8da980b0718e0f826c599f580a98189679e9c584370f1 ehearne$ 

It comes down to what namespaces use oc adm inspect --namespace as oc adm inspect when ns is specified will gather all resources into the dump, including service accounts.

https://github.com/openshift/oc/blob/40ce70fca070aafb0273563ce5a7f0a5ba1fcdb2/pkg/cli/admin/inspect/namespace.go#L22

Any namespace that has update-service-operator will get serviceaccounts included to inspect in must gather as a result.

https://github.com/openshift/must-gather/blob/16ac27eedcc79f5b57d77eb01e6b187b84fe7daa/collection-scripts/gather_osus#L9

Funnily enough, this was included as part of the first PR for must-gather :) openshift/must-gather#1 .

secret:
secretName: pull-secret
items:
- key: .dockercfgjson

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
@openshift-ci

openshift-ci Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

@everettraven: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn 28a62c9 link true /test e2e-aws-ovn
ci/prow/e2e-upgrade 28a62c9 link true /test e2e-upgrade
ci/prow/e2e-aws-operator 28a62c9 link true /test e2e-aws-operator

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants