OCPBUGS-77404: bugfix: mount /etc/containers so image controllers respect cluster-wide image registry configurations by everettraven · Pull Request #439 · openshift/cluster-openshift-controller-manager-operator

everettraven · 2026-06-08T15:21:41Z

Description

Fixes OCPBUGS-77404 by updating the openshift-controller-manager deployment to mount the /etc/containers directory from the host that the image controllers using the containers/image package will read for registry configurations.

The Machine Configuration Operator is responsible for syncing cluster configurations through the image APIs to this directory on the nodes.

In order to do this, the SCC needed to be adjusted to one that allowed host path mounting. To limit the changes as much as possible I updated the SCC to hostaccess from restricted-v3 so that we can mount the host path but prevent running the container as root, maintaining the user namespace security boundary.

Summary by CodeRabbit

Chores
- Controller now requests the host-access security constraint; new role and rolebinding grant that access.
- Operator synchronizes the cluster pull-secret into the controller namespace at startup.
- Controller namespace annotated for privileged pod-security.
New Features
- Controller mounts host container configuration read-only and exposes a registry auth file via an environment variable for image pulls.
Tests
- Tests updated to expect the new registry auth env var and adjusted formatting in expected configs.

…de image registry configurations Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

openshift-ci-robot · 2026-06-08T15:21:49Z

@everettraven: This pull request references Jira Issue OCPBUGS-77404, which is invalid:

expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Description

Fixes OCPBUGS-77404 by updating the openshift-controller-manager deployment to mount the /etc/containers directory from the host that the image controllers using the containers/image package will read for registry configurations.

The Machine Configuration Operator is responsible for syncing cluster configurations through the image APIs to this directory on the nodes.

In order to do this, the SCC needed to be adjusted to one that allowed host path mounting. To limit the changes as much as possible I updated the SCC to hostaccess from restricted-v3 so that we can mount the host path but prevent running the container as root, maintaining the user namespace security boundary.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai · 2026-06-08T15:21:56Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2ec2703f-0fe5-44f1-85d3-4b7a172d8a31

📥 Commits

Reviewing files that changed from the base of the PR and between 038c418 and 28a62c9.

📒 Files selected for processing (1)

bindata/assets/openshift-controller-manager/deploy.yaml

🚧 Files skipped from review as they are similar to previous changes (1)

bindata/assets/openshift-controller-manager/deploy.yaml

Walkthrough

Deployment and operator changes: controller-manager now requests the hostaccess SCC, mounts host /etc/containers read-only, sets REGISTRY_AUTH_FILE=/var/run/secrets/image-auth/auth.json, adds RBAC for hostaccess, syncs the global pull-secret into the operator namespace, updates tests, and marks the namespace pod-security as privileged.

Changes

Controller manager hostaccess and related wiring

Layer / File(s)	Summary
Security context and host volume mount `bindata/assets/openshift-controller-manager/deploy.yaml`, `bindata/assets/openshift-controller-manager/ns.yaml`	Pod annotation changed to request `hostaccess`; added `REGISTRY_AUTH_FILE` env; added read-only `etc-containers` volumeMount at `/etc/containers`; defined `etc-containers` hostPath volume; namespace pod-security annotations set to `privileged`.
RBAC Role and RoleBinding `bindata/assets/openshift-controller-manager/scc-role.yaml`, `bindata/assets/openshift-controller-manager/scc-rolebinding.yaml`	Added `system:openshift-controller-manager:hostaccess-role` Role permitting `use` on `securitycontextconstraints/hostaccess` and a RoleBinding for `openshift-controller-manager-sa`.
Operator startup and static assets `pkg/operator/starter.go`	Configure `resourceSyncer.SyncSecret` for `openshift-config/pull-secret` → operator namespace; include new SCC Role and RoleBinding manifests in static resource controller assets.
Tests: expected config and env updates `pkg/operator/sync_openshiftcontrollermanager_v311_00_test.go`	Reformat expected controller lists in config tests and add `REGISTRY_AUTH_FILE` entries to expected env in `TestDeploymentWithProxy` cases; small formatting edits.

🎯 4 (Complex) | ⏱️ ~45 minutes

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (2 warnings, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Topology-Aware Scheduling Compatibility	⚠️ Warning	bindata/assets/openshift-controller-manager/deploy.yaml sets nodeSelector node-role.kubernetes.io/master and required podAntiAffinity (requiredDuringSchedulingIgnoredDuringExecution) without any Co...	Make controller-manager scheduling topology-aware: remove/replace node-role.kubernetes.io/master nodeSelector and required hostname anti-affinity (or adjust with topology-aware hooks and safe maxUnavailable), so it works on SNO/TNF/TNA/H...
No-Sensitive-Data-In-Logs	❓ Inconclusive	Could not access PR diff to scan all changed files; in pkg/operator/starter.go the new pull-secret sync only returns fmt.Errorf and does not log secret contents.	Authenticate/obtain PR diff and scan all modified files for log/printf/klog/zap calls that include or interpolate secrets/tokens/PII.

✅ Passed checks (12 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title clearly and concisely describes the main change: mounting /etc/containers in the controller-manager deployment to enable cluster-wide image registry configuration support. The title is specific, directly relates to the core changeset modifications, and uses the issue reference (OCPBUGS-77404) for tracking.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	Repo Ginkgo tests are only in test/e2e/*.go; their It/Describe titles are static strings (e.g., [Skipped:MicroShift], [Timeout:30m]) with no pod/node/namespace/IP/timestamps.
Test Structure And Quality	✅ Passed	The repo/PR changes include no Ginkgo/Gomega tests (no Describe/It/BeforeEach/Eventually/Consistently/Expect); the modified test file uses standard Go `testing` funcs only.
Microshift Test Compatibility	✅ Passed	No new Ginkgo e2e tests; PR only updates manifests and a Go unit test file, and the updated test file contains no “ginkgo” usage. citeturn0view0turn5view0
Single Node Openshift (Sno) Test Compatibility	✅ Passed	PR updates operator manifests and a unit test file; inspected sync_openshiftcontrollermanager_v311_00_test.go and found no Ginkgo e2e (Describe/It/Context) usage, so SNO assumptions check is not ap...
Ote Binary Stdout Contract	✅ Passed	PR changes are YAML and starter.go/test formatting; in the touched Go files, there are no fmt.Print/os.Stdout writes, only klog usage (no new stdout contract violation detected).
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	PR `#439` updates bindata SCC/deployment manifests and a pkg/operator unit test; no new Ginkgo e2e tests were added/touched, so the IPv6/disconnected test-compatibility check is not applicable.
No-Weak-Crypto	✅ Passed	Scanned PR-mentioned files (starter.go, test, and SCC/deploy/ns YAMLs) for MD5/SHA1/DES/RC4/3DES/Blowfish/ECB and crypto package indicators; no weak-crypto keywords/imports found.
Container-Privileges	✅ Passed	controller-manager container securityContext sets allowPrivilegeEscalation: false and drops ALL capabilities; no privileged/hostPID/hostNetwork/hostIPC/SYS_ADMIN fields present in deploy.yaml.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

openshift-ci · 2026-06-08T15:22:12Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kaleemsiddiqu for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

everettraven · 2026-06-08T15:25:31Z

              path: tls-ca-bundle.pem
      - emptyDir: {}
        name: tmp
+      - hostPath:


Open Question: Does this automatically handle pull secrets for private registries as well?

It doesn't, but I've opened openshift/openshift-controller-manager#447 to help with this.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

bindata/assets/openshift-controller-manager/deploy.yaml (2)
60-63: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add resource limits for memory and CPU.

The coding guidelines require resource limits on every container. Currently, only resource requests are defined. Without limits, the controller-manager pod could consume unbounded resources, potentially impacting control plane stability.
📊 Proposed fix to add resource limits
         resources:
           requests:
             memory: 100Mi
             cpu: 100m
+          limits:
+            memory: 200Mi
+            cpu: 200m
Note: Adjust limit values based on observed resource usage patterns.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bindata/assets/openshift-controller-manager/deploy.yaml` around lines 60 -
63, The resources block currently only sets requests (memory: 100Mi, cpu: 100m);
add a limits section alongside requests to enforce upper bounds (e.g., limits:
memory: 200Mi, cpu: 200m or other values tuned to observed usage) so the
openshift-controller-manager container cannot consume unbounded resources;
update the same resources mapping (the resources key for the controller-manager
container) to include a limits subsection with both memory and cpu entries.
Source: Coding guidelines

49-54: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add explicit runAsNonRoot: true to the container securityContext.

While the hostaccess SCC may enforce non-root execution, the manifest should explicitly declare runAsNonRoot: true for defense in depth and to match the PR's stated goal of "preventing running the container as root." Without this setting, the security guarantee depends entirely on the SCC, which could change.
🔒 Proposed fix to add explicit runAsNonRoot
       - name: controller-manager
         securityContext:
+          runAsNonRoot: true
           readOnlyRootFilesystem: true
           allowPrivilegeEscalation: false
           capabilities:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bindata/assets/openshift-controller-manager/deploy.yaml` around lines 49 -
54, The container's securityContext is missing an explicit runAsNonRoot setting;
update the container spec's securityContext (the existing securityContext block
that contains readOnlyRootFilesystem, allowPrivilegeEscalation, and
capabilities.drop) to add runAsNonRoot: true so the pod manifest explicitly
prevents running as UID 0 (defense in depth) while preserving the other keys
(readOnlyRootFilesystem, allowPrivilegeEscalation, capabilities.drop).
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bindata/assets/openshift-controller-manager/deploy.yaml`:
- Line 36: The Deployment currently sets openshift.io/required-scc: hostaccess
which is too permissive—either replace it with the restricted-v3 SCC or wire a
purpose-built custom SCC and document why elevated host access is required;
additionally, in the podSpec/container securityContext ensure runAsNonRoot: true
is set, explicitly set automountServiceAccountToken: false at the podSpec level,
and add resources.limits for the controller container to complement existing
resources.requests so limits are enforced.

---

Outside diff comments:
In `@bindata/assets/openshift-controller-manager/deploy.yaml`:
- Around line 60-63: The resources block currently only sets requests (memory:
100Mi, cpu: 100m); add a limits section alongside requests to enforce upper
bounds (e.g., limits: memory: 200Mi, cpu: 200m or other values tuned to observed
usage) so the openshift-controller-manager container cannot consume unbounded
resources; update the same resources mapping (the resources key for the
controller-manager container) to include a limits subsection with both memory
and cpu entries.
- Around line 49-54: The container's securityContext is missing an explicit
runAsNonRoot setting; update the container spec's securityContext (the existing
securityContext block that contains readOnlyRootFilesystem,
allowPrivilegeEscalation, and capabilities.drop) to add runAsNonRoot: true so
the pod manifest explicitly prevents running as UID 0 (defense in depth) while
preserving the other keys (readOnlyRootFilesystem, allowPrivilegeEscalation,
capabilities.drop).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0f6912d5-6a19-4d54-a6d9-64aca0dbe28d

📥 Commits

Reviewing files that changed from the base of the PR and between 34f95b0 and a7aa392.

📒 Files selected for processing (1)

bindata/assets/openshift-controller-manager/deploy.yaml

coderabbitai · 2026-06-08T15:28:49Z

      annotations:
        target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
-        openshift.io/required-scc: restricted-v3
+        openshift.io/required-scc: hostaccess


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Search for existing custom SCC definitions in the repository rg -n "kind:\s*SecurityContextConstraints" -A 20

Repository: openshift/cluster-openshift-controller-manager-operator

Length of output: 81

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE="bindata/assets/openshift-controller-manager/deploy.yaml" echo "== File size ==" wc -l "$FILE" echo "== SCC annotation occurrences (context) ==" rg -n "openshift\.io/required-scc|required-scc" "$FILE" -n echo "== runAsNonRoot occurrences ==" rg -n "runAsNonRoot" "$FILE" || true echo "== resources.limits occurrences ==" rg -n "resources:|limits:" "$FILE" echo "== Extract: container securityContext/resources/volumes around likely sections ==" # show around where the reviewer cited lines (roughly); adjust if file differs sed -n '1,130p' "$FILE" | nl -ba | sed -n '1,130p' sed -n '90,180p' "$FILE" | nl -ba | sed -n '1,120p' echo "== hostPath volume /etc/containers mount check ==" rg -n "etc/containers|hostPath|readOnly" "$FILE" echo "== Anti-affinity / topology-related scheduling constraints ==" rg -n "antiAffinity|requiredDuringSchedulingIgnoredDuringExecution|topologyKey|hostname" "$FILE" || true

Repository: openshift/cluster-openshift-controller-manager-operator

Length of output: 499

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== Search for any custom SCC definitions/usage in repo ==" rg -n "SecurityContextConstraints" -S . || true rg -n "hostaccess|openshift\.io/required-scc" -S . || true rg -n "SecurityContextConstraint" -S . || true

Repository: openshift/cluster-openshift-controller-manager-operator

Length of output: 19745

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE="bindata/assets/openshift-controller-manager/deploy.yaml" echo "== Extract around SCC (lines 1-60) ==" awk 'NR<=60{print NR ":" $0}' "$FILE" echo "== Extract around resources (55-95) ==" awk 'NR>=55 && NR<=95{print NR ":" $0}' "$FILE" echo "== Extract around hostPath/volumes (95-140) ==" awk 'NR>=95 && NR<=140{print NR ":" $0}' "$FILE" echo "== Extract last lines (141-200) ==" awk 'NR>=141 && NR<=200{print NR ":" $0}' "$FILE" echo "== Field presence checks ==" for pat in \ "runAsNonRoot" \ "allowPrivilegeEscalation" \ "readOnlyRootFilesystem" \ "capabilities:" \ "automountServiceAccountToken" \ "limits:" \ "livenessProbe" \ "readinessProbe" \ "hostPID" \ "hostNetwork" \ "hostIPC" \ "privileged" \ "antiAffinity" \ "topologyKey"; do echo "-- $pat" rg -n "$pat" "$FILE" || true done echo "== Search for custom SCC manifests outside vendor ==" rg -n --glob '!vendor/**' "kind:\s*SecurityContextConstraints" -S . || true

Repository: openshift/cluster-openshift-controller-manager-operator

Length of output: 6288

Major security hardening gaps in controller-manager OpenShift Deployment

openshift.io/required-scc: hostaccess

hostaccess is less restricted than restricted-v3 and doesn’t align with the guideline requiring a restricted/custom-scoped SCC—either justify the need or wire a custom SCC.

Missing runAsNonRoot: true at pod/container securityContext (defense-in-depth).

Missing resources.limits (only resources.requests are set).

automountServiceAccountToken is not explicitly set to false.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bindata/assets/openshift-controller-manager/deploy.yaml` at line 36, The Deployment currently sets openshift.io/required-scc: hostaccess which is too permissive—either replace it with the restricted-v3 SCC or wire a purpose-built custom SCC and document why elevated host access is required; additionally, in the podSpec/container securityContext ensure runAsNonRoot: true is set, explicitly set automountServiceAccountToken: false at the podSpec level, and add resources.limits for the controller container to complement existing resources.requests so limits are enforced.

Source: Coding guidelines

by syncing the global pull-secret to the openshift-controller-manager namespace mounted as a volume on the openshift-controller-manager pods and setting the REGISTRY_AUTH_FILE environment variable to the mounted file. Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

ehearne-redhat · 2026-06-09T10:57:26Z

I think we need to remove this as e2e-upgrade reports

event happened 36 times, something is wrong: namespace/openshift-controller-manager hmsg/827f4eb8d4 replicaset/controller-manager-99bbc9cc8 - reason/FailedCreate Error creating: pods "controller-manager-99bbc9cc8-" is forbidden: unable to validate against any security context constraint: [metadata.annotations[seccomp.security.alpha.kubernetes.io/pod]: Forbidden: seccomp may not be set, metadata.annotations[container.seccomp.security.alpha.kubernetes.io/controller-manager]: Forbidden: seccomp may not be set] (21:59:34Z) result=reject }

ehearne-redhat · 2026-06-09T11:02:03Z

@bpalmer The changes in general look fine to me, the seccomp concern is the only one I have. Could you take a look at that and let me know if this is an issue? I dont see any mention https://docs.redhat.com/en/documentation/openshift_container_platform/4.22/html/authentication_and_authorization/managing-pod-security-policies#default-sccs_configuring-internal-oauth about it not being allowed.

However it says privileged containers cannot use seccompprofile at https://docs.redhat.com/en/documentation/openshift_container_platform/4.22/html/security_and_compliance/seccomp-profiles .

…ment Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

pkg/operator/sync_openshiftcontrollermanager_v311_00_test.go (1)
752-755: ⚡ Quick win

Assert the new mount contract alongside REGISTRY_AUTH_FILE.

These updates only verify the env var. The feature also depends on the matching secret volume/volumeMount for /var/run/secrets/image-auth/auth.json and the /etc/containers host mount, so those regressions would still pass this suite.

This follows the PR objective to mount both the global pull-secret and /etc/containers into the deployment.

Also applies to: 788-791, 819-822
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/operator/sync_openshiftcontrollermanager_v311_00_test.go` around lines
752 - 755, The test currently only asserts the REGISTRY_AUTH_FILE env var; also
assert that the deployment spec includes a Volume referencing the global
pull-secret (a Secret volume whose SecretName equals the cluster pull secret)
and a corresponding VolumeMount mounting that secret to
/var/run/secrets/image-auth/auth.json, and assert a HostPath (or projected host)
Volume and VolumeMount for /etc/containers so the host config is mounted into
the container; update the assertions around REGISTRY_AUTH_FILE (and the other
similar blocks) to check for a Volume with the secret name, a VolumeMount with
MountPath "/var/run/secrets/image-auth" (or the exact file mount), and a
Volume/VolumeMount that mounts "/etc/containers" from the host into the pod spec
to ensure both secret and host mounts are present.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bindata/assets/openshift-controller-manager/scc-rolebinding.yaml`:
- Around line 7-9: The RoleBinding's roleRef is missing the apiGroup; update the
RoleBinding where roleRef.name is
"system:openshift-controller-manager:hostaccess-role" (the hostaccess
RoleBinding) to include roleRef.apiGroup: rbac.authorization.k8s.io so roleRef
has keys kind, name, and apiGroup like other RoleBindings (e.g.,
route-controller-manager-leader-rolebinding); modify the manifest defining
roleRef to add the apiGroup field accordingly.

---

Nitpick comments:
In `@pkg/operator/sync_openshiftcontrollermanager_v311_00_test.go`:
- Around line 752-755: The test currently only asserts the REGISTRY_AUTH_FILE
env var; also assert that the deployment spec includes a Volume referencing the
global pull-secret (a Secret volume whose SecretName equals the cluster pull
secret) and a corresponding VolumeMount mounting that secret to
/var/run/secrets/image-auth/auth.json, and assert a HostPath (or projected host)
Volume and VolumeMount for /etc/containers so the host config is mounted into
the container; update the assertions around REGISTRY_AUTH_FILE (and the other
similar blocks) to check for a Volume with the secret name, a VolumeMount with
MountPath "/var/run/secrets/image-auth" (or the exact file mount), and a
Volume/VolumeMount that mounts "/etc/containers" from the host into the pod spec
to ensure both secret and host mounts are present.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1df28a42-a6e6-4738-bd20-49f2979e7e19

📥 Commits

Reviewing files that changed from the base of the PR and between a7aa392 and 1ed86f9.

📒 Files selected for processing (5)

bindata/assets/openshift-controller-manager/deploy.yaml
bindata/assets/openshift-controller-manager/scc-role.yaml
bindata/assets/openshift-controller-manager/scc-rolebinding.yaml
pkg/operator/starter.go
pkg/operator/sync_openshiftcontrollermanager_v311_00_test.go

🚧 Files skipped from review as they are similar to previous changes (1)

bindata/assets/openshift-controller-manager/deploy.yaml

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

everettraven · 2026-06-11T13:58:04Z

/jira refresh

openshift-ci-robot · 2026-06-11T13:58:10Z

@everettraven: This pull request references Jira Issue OCPBUGS-77404, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (5.0.0) matches configured target version for branch (5.0.0)
bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (jitsingh@redhat.com), skipping review request.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

everettraven · 2026-06-11T20:11:56Z

This last failure has a really odd failure. It says the openshift-controller-manager-sa service account doesn't exist but I can see an event from when it was created and the audit logs show no sign of that service account being deleted (or created).

Going to see if maybe this was a fluke.

/retest-required

ehearne-redhat · 2026-06-12T09:46:59Z

@everettraven - do we need to add pod-security.kubernetes.io annotations to https://github.com/openshift/cluster-openshift-controller-manager-operator/blob/master/bindata/assets/openshift-controller-manager/ns.yaml as the pod is now privileged ?

It says here that pods that

Pods that do not comply with the pod security admission defined globally or at the namespace level are not admitted to the cluster and cannot run.

It says here that

All system-created namespaces that are prefixed with openshift- , except for openshift-operators

have pod security admission synchronization disabled permanently.

Looking at the must-gather, I can see pod security admission is set to restricted.

spec:
  logLevel: Normal
  managementState: Managed
  observedConfig:
    admission:
      pluginConfig:
        PodSecurity:
          configuration:
            defaults:
              audit: restricted
              audit-version: latest
              enforce: restricted
              enforce-version: latest
              warn: restricted
              warn-version: latest

Kubernetes docs states under this configuration, we need RuntimeDefault seccomp profile. Since we are running in hostaccess SCC now, we cannot use this profile.

Looking at the same doc, the minimum we can set pod-security.kubernetes.io annotations to is privileged because of the hostaccess SCC.

So, we will probably need to change the namespace to use these labels:

apiVersion: v1
kind: Namespace
metadata:
  name: openshift-controller-manager
  annotations:
    openshift.io/node-selector: ""
    workload.openshift.io/allowed: "management"
  labels:
    openshift.io/cluster-monitoring: "true"
    openshift.io/run-level: "" # specify no run-level turns it off on install and upgrades
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged

I have found an example of an openshift-* namespace using these labels here.

everettraven · 2026-06-12T12:31:47Z

@ehearne-redhat Yeah, we will need to do that but I don't think that is what is causing the failures here. I'm trying to sort out why we are seeing an issue with pod creation related to non-existence of the service account. Something seems to be removing the service account before we attempt to create the pods from the deployment here.

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

openshift-ci-robot · 2026-06-12T12:33:49Z

@everettraven: This pull request references Jira Issue OCPBUGS-77404, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (5.0.0) matches configured target version for branch (5.0.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (jitsingh@redhat.com), skipping review request.

Details

In response to this:

Description

Fixes OCPBUGS-77404 by updating the openshift-controller-manager deployment to mount the /etc/containers directory from the host that the image controllers using the containers/image package will read for registry configurations.

The Machine Configuration Operator is responsible for syncing cluster configurations through the image APIs to this directory on the nodes.

In order to do this, the SCC needed to be adjusted to one that allowed host path mounting. To limit the changes as much as possible I updated the SCC to hostaccess from restricted-v3 so that we can mount the host path but prevent running the container as root, maintaining the user namespace security boundary.

Summary by CodeRabbit

Chores

Controller now requests the host-access security constraint and is granted that access via a new role and rolebinding.

Operator synchronizes the cluster pull-secret into the controller namespace at startup.

New Features

Controller mounts host container configuration read-only and exposes a registry auth file via an environment variable for image pulls.

Chores

Controller namespace annotated for privileged pod-security.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

ehearne-redhat · 2026-06-12T14:54:07Z

 	go staticResourceController.Run(ctx, 1)
 	go operator.Run(ctx, 1)


@everettraven I believe there is a race condition as to when the manifests are applied. They appear to be applied concurrently, not sequentially, as the deployment and SA/SCCs/etc are managed by two separate controllers.

We probably need to add a check somewhere near https://github.com/openshift/cluster-openshift-controller-manager-operator/blob/master/pkg/operator/sync_openshiftcontrollermanager_v311_00.go#L134 to check if the service account + related resources are present before applying the deployment manifest.

Adding something like...

// Verify service account exists before creating deployment _, err = c.kubeClient.CoreV1().ServiceAccounts(util.TargetNamespace).Get(context.TODO(), "openshift-controller-manager-sa", metav1.GetOptions{}) if err != nil { ocmErrors = append(ocmErrors, fmt.Errorf("%q: waiting for service account: %w", operandName, err)) return syncReturn(c, ocmErrors, rcmErrors, originalOperatorConfig, operatorConfig) } // repeat the same for scc + rolebinding too.

should ensure we don't hit the race condition.

I don't think a race condition is the problem we are seeing.

I'm seeing an event propagate that says the openshift-controller-manager-sa service account is getting created but there appears to be no evidence of it existing in the must gather that is collected from the test clusters.

Even if the controller enters a degraded state for some amount of time, eventual consistency should ensure that eventually the service account we are waiting for exists and the deployment will continue to roll out.

Usually when I check must gather I never see serviceaccounts.yaml or something like that in most namespaces. Only some namespaces I have seen this file.

I see this in /core/events.yaml

message: 'Error creating: pods "controller-manager-f4dc56774-" is forbidden: error looking up service account openshift-controller-manager/openshift-controller-manager-sa: serviceaccount "openshift-controller-manager-sa" not found'

message: 'Error creating: pods "controller-manager-f4dc56774-" is forbidden: unable to validate against any security context constraint: provider "hostaccess": Forbidden: not usable by user or serviceaccount'

but then it follows up with

message: 'Created pod: controller-manager-f4dc56774-xbj7h'

There are also messages saying it can't find the config configMap, serving-cert secret too.

And also

message: 'MountVolume.SetUp failed for volume "image-auth" : references non-existent secret key: .dockercfgjson'

What's interesting is the 19s difference in timestamp between pod creation and deletion.

lastTimestamp: "2026-06-12T13:37:31Z" message: 'Created pod: controller-manager-f4dc56774-xbj7h'

lastTimestamp: "2026-06-12T13:37:50Z" message: 'Deleted pod: controller-manager-f4dc56774-xbj7h'

I'm thinking it wouldn't hurt to add checks to make sure these resources exist before we apply the deploy.yaml manifest. The worst thing that could happen is that we discover the controller works fine. :D

For me, the logs show that manifests are missing which cause errors. If we can make sure the manifests are there first before applying, we can at least rule out that scenario. We could be looking at different logs however... :)

Ah, I think you found the smoking gun with those events. I hadn't yet found time to dig deeper into the events timeline yet so thanks for that. The wrong key reference definitely could cause this.

Also TIL that the service accounts don't seem to show up in the must gathers :). I wonder why that is?

I spent a little time investigating why this is, and it actually comes down to not even using the must-gather command at all!

So, the interesting thing is, service accounts are captured in some namespaces.

ehearne-mac:quay-proxy-ci-openshift-org-openshift-ci-sha256-9da0e7fd8e430ccec3c8da980b0718e0f826c599f580a98189679e9c584370f1 ehearne$ find . -path "*/serviceaccounts/*" ./namespaces/openshift-cluster-olm-operator/core/serviceaccounts/cluster-olm-operator.yaml ./namespaces/openshift-cloud-network-config-controller/core/serviceaccounts/cloud-network-config-controller.yaml ./namespaces/openshift-operator-controller/core/serviceaccounts/operator-controller-controller-manager.yaml ./namespaces/openshift-cluster-csi-drivers/core/serviceaccounts/aws-ebs-csi-driver-operator.yaml ./namespaces/openshift-multus/core/serviceaccounts/multus.yaml ./namespaces/openshift-multus/core/serviceaccounts/multus-ancillary-tools.yaml ./namespaces/openshift-multus/core/serviceaccounts/metrics-daemon-sa.yaml ./namespaces/openshift-multus/core/serviceaccounts/multus-ac.yaml ./namespaces/openshift-multus/core/serviceaccounts/multus-node-identity.yaml ./namespaces/openshift-network-operator/core/serviceaccounts/iptables-alerter.yaml ./namespaces/openshift-network-console/core/serviceaccounts/networking-console-plugin.yaml ./namespaces/openshift-network-node-identity/core/serviceaccounts/network-node-identity.yaml ./namespaces/openshift-cloud-credential-operator/core/serviceaccounts/pod-identity-webhook.yaml ./namespaces/openshift-cluster-storage-operator/core/serviceaccounts/csi-snapshot-controller-operator.yaml ./namespaces/openshift-insights/core/serviceaccounts/gather.yaml ./namespaces/openshift-insights/core/serviceaccounts/operator.yaml ./namespaces/openshift-ovn-kubernetes/core/serviceaccounts/ovn-kubernetes-control-plane.yaml ./namespaces/openshift-ovn-kubernetes/core/serviceaccounts/ovn-kubernetes-node.yaml ./namespaces/openshift-network-diagnostics/core/serviceaccounts/network-check-target.yaml ./namespaces/openshift-network-diagnostics/core/serviceaccounts/network-diagnostics.yaml ehearne-mac:quay-proxy-ci-openshift-org-openshift-ci-sha256-9da0e7fd8e430ccec3c8da980b0718e0f826c599f580a98189679e9c584370f1 ehearne$

It comes down to what namespaces use oc adm inspect --namespace as oc adm inspect when ns is specified will gather all resources into the dump, including service accounts.

https://github.com/openshift/oc/blob/40ce70fca070aafb0273563ce5a7f0a5ba1fcdb2/pkg/cli/admin/inspect/namespace.go#L22

Any namespace that has update-service-operator will get serviceaccounts included to inspect in must gather as a result.

https://github.com/openshift/must-gather/blob/16ac27eedcc79f5b57d77eb01e6b187b84fe7daa/collection-scripts/gather_osus#L9

Funnily enough, this was included as part of the first PR for must-gather :) openshift/must-gather#1 .

ehearne-redhat · 2026-06-12T16:30:15Z

+        secret:
+          secretName: pull-secret
+          items:
+            - key: .dockercfgjson


should this be .dockerconfigjson ?

https://github.com/openshift/openshift-controller-manager/blob/master/vendor/k8s.io/api/core/v1/types.go#L7946-L7953

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

openshift-ci · 2026-06-12T20:02:09Z

@everettraven: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-aws-ovn	`28a62c9`	link	true	`/test e2e-aws-ovn`
ci/prow/e2e-upgrade	`28a62c9`	link	true	`/test e2e-upgrade`
ci/prow/e2e-aws-operator	`28a62c9`	link	true	`/test e2e-aws-operator`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

bugfix: mount /etc/containers so image controllers respect cluster-wi…

a7aa392

…de image registry configurations Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

openshift-ci Bot requested review from bertinatto and ricardomaraschini June 8, 2026 15:22

everettraven commented Jun 8, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 8, 2026

View reviewed changes

everettraven added 3 commits June 8, 2026 13:44

assets: add hostaccess use permissions

b41e22c

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

testing: update tests with deployment changes

78b69b8

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

ehearne-redhat reviewed Jun 9, 2026

View reviewed changes

fixup! remove seccompprofile from openshift-controller-manager deploy…

1ed86f9

…ment Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

coderabbitai Bot reviewed Jun 10, 2026

View reviewed changes

Comment thread bindata/assets/openshift-controller-manager/scc-rolebinding.yaml

everettraven added 2 commits June 10, 2026 11:07

fixup! add apigroup to rolebinding

38f570c

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

fixup! inverted pull secret syncing

1643cc1

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 11, 2026

fixup! Add PSA privileged labels to ns

038c418

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

ehearne-redhat reviewed Jun 12, 2026

View reviewed changes

fixup! dockerconfigjson

28a62c9

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>

		go staticResourceController.Run(ctx, 1)
		go operator.Run(ctx, 1)

Uh oh!

Conversation

everettraven commented Jun 8, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Summary by CodeRabbit

Uh oh!

openshift-ci-robot commented Jun 8, 2026

Description

Uh oh!

coderabbitai Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Pre-merge checks failed

❌ Failed checks (2 warnings, 1 inconclusive)

Uh oh!

openshift-ci Bot commented Jun 8, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ehearne-redhat commented Jun 9, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

everettraven commented Jun 11, 2026

Uh oh!

openshift-ci-robot commented Jun 11, 2026

Uh oh!

everettraven commented Jun 11, 2026

Uh oh!

ehearne-redhat commented Jun 12, 2026

Uh oh!

everettraven commented Jun 12, 2026

Uh oh!

openshift-ci-robot commented Jun 12, 2026

Description

Summary by CodeRabbit

Uh oh!

ehearne-redhat Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ehearne-redhat Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

openshift-ci Bot commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

everettraven commented Jun 8, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 8, 2026 •

edited

Loading

ehearne-redhat Jun 12, 2026 •

edited

Loading

ehearne-redhat Jun 15, 2026 •

edited

Loading