☁️ Cloud & DevOps

Kubernetes Policy as Code: Kyverno & Argo CD

Marcus Cole
Cloud & DevOps Lead

Platform engineer who's been through every infrastructure era — bare metal, VMs, containers, serverless. Has strong opinions about YAML files and even stronger opinions about over-engineering.

Tags: GitOps with Argo CD · Kyverno policies · Kubernetes security · admission controller

We need to talk about the dark side of GitOps.

If you've spent any time in the modern infrastructure ecosystem, you've likely adopted GitOps with Argo CD. You've been told that declarative infrastructure is the promised land. You push a commit, Argo CD spots the drift, and your cluster state magically aligns with your repository. It feels incredibly smooth—until a tired engineer accidentally pushes a Deployment YAML that requests 500 CPUs, runs as root, and pulls an untested latest image tag.

Because Argo CD is dutiful and efficient, it will immediately sync that grenade into production. Your nodes will thrash, your pods will evict, and your phone will ring at 3 AM.

This is the reality check: GitOps without guardrails is just a highly efficient mechanism for deploying outages. The bottleneck in our infrastructure isn't how fast we can deploy code; it's how safely we can trust the code being deployed. We need Kubernetes policy as code.

The Core Problem: The Blind Forklift

Argo CD is not a security tool. It is a delivery mechanism.

Think of your Kubernetes cluster as a busy commercial harbor, with Argo CD as an automated forklift. The forklift's only job is to read a manifest (the shipping manifest in Git) and move containers off the ship and onto the dock. It doesn't care if the container is leaking toxic chemicals or if it's stacked upside down. If the paperwork says "move it," the forklift moves it.

When we rely solely on Git reviews to catch misconfigurations, we are relying on human perfection. Humans get tired. Humans copy-paste from Stack Overflow. We need a system that sits between the forklift and the dock—a pragmatic, automated inspector that enforces the laws of physics in our cluster.

Under the Hood: The Admission Controller Checkpoint

Before we start installing tools, let's look at how Kubernetes actually processes a deployment. We need to understand the 'magic' before we use it.

When Argo CD (or a human typing kubectl apply) sends a YAML manifest to Kubernetes, it hits the API Server. The API Server is essentially a glorified database with a REST interface. But before it saves that YAML into etcd (the cluster's brain), the request passes through a gauntlet called Admission Controllers.

Admission controllers are like the building inspectors of Kubernetes. They look at the blueprints before the concrete is poured. There are two types:
1. Mutating Admission Webhooks: These can modify the request (e.g., "You forgot to add a sidecar container, let me inject that for you").
2. Validating Admission Webhooks: These simply vote Yes or No. ("This pod is trying to run as root. Rejected.")

Kyverno is a policy engine that plugs directly into this webhook system.
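To make the plumbing concrete, here is a trimmed-down sketch of the kind of ValidatingWebhookConfiguration Kyverno registers for itself at install time. You never write this by hand; Kyverno creates and manages it automatically, and the resource name, service name, and paths below are illustrative:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: kyverno-resource-validating-webhook-cfg  # name is illustrative
webhooks:
- name: validate.kyverno.svc-fail
  clientConfig:
    service:
      name: kyverno-svc       # the API server calls this in-cluster service
      namespace: kyverno
      path: /validate/fail
  rules:
  - apiGroups: ["*"]
    apiVersions: ["*"]
    operations: ["CREATE", "UPDATE"]
    resources: ["pods"]
  failurePolicy: Fail         # if Kyverno is unreachable, block the request
  sideEffects: None
  admissionReviewVersions: ["v1"]
```

The key field is failurePolicy: it decides whether the API server blocks or waves through requests when the webhook itself is down, which is exactly the trade-off discussed in the FAQ below.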

(Diagram: Argo CD GitOps sync → YAML → Kube API Server (the front door) → Kyverno admission webhook → if validated: etcd / cluster state; if rejected: deployment fails.)

The Pragmatic Solution: Kyverno + Argo CD

Why Kyverno instead of OPA Gatekeeper? Pragmatism. Gatekeeper requires you to learn Rego, a specialized logic language. If a server is burning down at 3 AM, I do not want to parse a custom logic language to figure out why a pod won't schedule. Kyverno uses standard Kubernetes YAML. If you can write a Deployment, you can write a Kyverno policy.

Let's build a checkpoint. We are going to deploy Kyverno via Argo CD, and then we are going to write a policy that bans the latest image tag in our cluster.

Prerequisites

Before we start, you need:
  • A running Kubernetes cluster (Minikube, Kind, or a managed cloud cluster).
  • kubectl installed and configured.
  • Argo CD installed on your cluster.
  • A Git repository connected to your Argo CD instance.

Step 1: Deploying Kyverno via Argo CD

We don't install infrastructure manually. We use Argo CD to manage Kyverno itself. This ensures our policy engine is version-controlled and recoverable.

Why are we using an Argo CD Application custom resource here? Because it tells Argo CD exactly where to find the Helm chart for Kyverno and how to configure it. We are using the official Kyverno Helm repository.

Create a file named kyverno-app.yaml in your Git repository:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kyverno
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://kyverno.github.io/kyverno/
    targetRevision: 3.1.4 # Always pin your versions!
    chart: kyverno
    helm:
      parameters:
        - name: admissionController.replicas
          value: "3" # High availability is mandatory for admission controllers
  destination:
    server: https://kubernetes.default.svc
    namespace: kyverno
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

Commit this file to your repository, and let Argo CD sync it. You now have the checkpoint installed.
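Before moving on, confirm the engine is actually sitting in the request path. Assuming the Helm chart's default names and the kyverno namespace from the Application above:

```shell
# Are the admission controller replicas running?
kubectl get pods -n kyverno

# Did Kyverno register its validating webhooks with the API server?
kubectl get validatingwebhookconfigurations | grep -i kyverno
```

If the webhook configurations are missing, nothing is being validated, no matter how many policies you write.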

Step 2: Writing a Pragmatic Policy

Now we need to write the law.

Why ban the latest tag? Because latest is mutable. If you deploy nginx:latest today and nginx:latest again tomorrow, you might get two completely different versions of Nginx. If tomorrow's version breaks your app, you cannot roll back, because rolling back just pulls latest again. It is an operational nightmare.

Here is how we tell Kyverno to block it. We will use a ClusterPolicy, which applies to all namespaces.

Create a file named disallow-latest-tag.yaml in your Git repository:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
  annotations:
    policies.kyverno.io/title: Disallow Latest Tag
    policies.kyverno.io/severity: medium
    policies.kyverno.io/description: >-
      The :latest tag is mutable and can lead to unexpected errors.
      This policy validates that the image tag is not latest.
spec:
  validationFailureAction: Enforce
  background: true
  rules:
  - name: require-image-tag
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "You cannot use the 'latest' tag. Pin your image to a specific version."
      pattern:
        spec:
          containers:
          - image: "*:!latest"

Let's break down the 'Why' in this YAML:

  • validationFailureAction: Enforce: This is the teeth of the policy. If set to Audit, Kyverno would just log a warning but let the pod run. We are setting it to Enforce to block the deployment entirely.

  • match: We are targeting Pod resources.

  • pattern: This is where Kyverno shines. We don't need to write complex code. We just provide a YAML structure that mimics a Pod. *:!latest means "any image registry/name, but the tag MUST NOT equal 'latest'."
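One caveat worth knowing before you rely on this pattern: `*:!latest` only rejects images whose tag string is literally "latest". A sketch of how the pattern evaluates different image strings (the image names are examples):

```yaml
# How the pattern "*:!latest" treats different image strings:
image: nginx:1.24.0   # allowed — a tag is present and it is not "latest"
image: nginx:latest   # rejected — tag equals "latest"
image: nginx          # allowed by THIS pattern, even though the runtime will
                      # still pull :latest; pair it with a "*:*" rule that
                      # requires an explicit tag, as the upstream Kyverno
                      # sample policy does
```

In other words, for airtight coverage you want two rules: one that requires a tag at all, and one that forbids the tag being latest.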


Commit this file and let Argo CD deploy it.

Step 3: Verification

Let's prove the system works. We will try to bypass the checkpoint.

Create a file called bad-pod.yaml locally (do not put this in Git yet):

apiVersion: v1
kind: Pod
metadata:
  name: bad-nginx
  namespace: default
spec:
  containers:
  - name: nginx
    image: nginx:latest

Run kubectl apply -f bad-pod.yaml.

You should immediately see this error:

Error from server: error when creating "bad-pod.yaml": admission webhook "validate.kyverno.svc-fail" denied the request: 

resource Pod/default/bad-nginx was blocked due to the following policies 

disallow-latest-tag:
  require-image-tag: "You cannot use the 'latest' tag. Pin your image to a specific version."

The admission controller caught it. The API server rejected it. etcd remains clean. If Argo CD tried to sync this manifest, it would report a failed sync with this exact error message, alerting the developer that their manifest is invalid.

Now, test the happy path. Change the image to nginx:1.24.0 and apply it. The pod will spin up without issue.
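For reference, the passing manifest differs only in the image tag:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: good-nginx
  namespace: default
spec:
  containers:
  - name: nginx
    image: nginx:1.24.0  # pinned tag satisfies the policy
```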

Troubleshooting

When you introduce admission controllers, you are putting a piece of software directly in the critical path of your cluster's API. If it breaks, deployments break.

Pitfall 1: Webhook Timeout Errors

  • Symptom: You try to deploy a pod and get an error like context deadline exceeded or failed calling webhook validate.kyverno.svc.

  • The Fix: This usually means the API server cannot reach the Kyverno pods. Check your network policies, ensure the Kyverno pods are actually running (kubectl get pods -n kyverno), and verify that you deployed Kyverno with multiple replicas for high availability.
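A quick triage sequence for this failure mode (assuming the default install namespace; the pod label is the one used by recent Kyverno Helm charts and may differ in your version):

```shell
# Are the admission controller pods healthy, and what are they logging?
kubectl get pods -n kyverno
kubectl logs -n kyverno -l app.kubernetes.io/component=admission-controller --tail=50

# What happens to requests while Kyverno is unreachable?
kubectl get validatingwebhookconfigurations -o yaml | grep failurePolicy
```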


Pitfall 2: Developers are angry because everything is blocked
  • Symptom: You applied 20 policies at once, and now no one can deploy anything.

  • The Fix: Pragmatism. When introducing Kyverno to an existing cluster, ALWAYS set validationFailureAction: Audit first. Let it run for a week. Check the Kyverno policy reports to see who is violating the rules. Work with those teams to fix their manifests, and only switch to Enforce when the audit logs are clean.
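The rollout pattern is a one-line change in the policy spec plus a reporting loop:

```yaml
spec:
  validationFailureAction: Audit  # log violations, do not block deployments
```

Kyverno records violations as PolicyReport resources; `kubectl get policyreports -A` shows the per-namespace tallies. When the reports come back clean, flip the field to Enforce and commit.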


What You Built

You didn't just install a tool; you built a trust boundary. By combining Argo CD's declarative deployment model with Kyverno's policy enforcement, you've created a system where developers can move fast, but the cluster defends itself against configuration drift and human error. You've ensured that the "best practices" aren't just written in a wiki somewhere—they are laws enforced by the cluster itself.

FAQ

Why use Kyverno instead of OPA Gatekeeper?
Kyverno uses native Kubernetes YAML for writing policies, whereas OPA Gatekeeper requires learning Rego. For teams that want to minimize cognitive load and stick to tools they already know, Kyverno is more pragmatic and easier to troubleshoot during an incident.

Will Kyverno slow down my cluster deployments?
Technically, yes, because every matching API request must be validated by the Kyverno webhook. However, this latency is typically measured in milliseconds. Unless you have thousands of complex policies or an under-resourced cluster, the delay is imperceptible to users and CI/CD pipelines.

What happens if the Kyverno pods crash?
Kubernetes admission webhooks have a failurePolicy. If set to Fail, no matching resources can be created while Kyverno is down (secure but brittle). If set to Ignore, resources bypass validation while Kyverno is down (highly available but less secure). Always run Kyverno with multiple replicas to mitigate this risk.

Can Kyverno modify resources instead of just blocking them?
Yes. Kyverno supports mutate policies. For example, if a developer forgets to add a required label or a sidecar container, Kyverno can automatically inject it into the manifest before it is saved to the cluster.
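A minimal sketch of such a mutate policy, assuming you want every Pod to carry a team label (the policy and label names are illustrative; the `+( )` anchor syntax means "add only if absent"):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-team-label
spec:
  rules:
  - name: add-team-label
    match:
      any:
      - resources:
          kinds:
          - Pod
    mutate:
      patchStrategicMerge:
        metadata:
          labels:
            +(team): unassigned  # injected only when no team label exists
```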

There is no perfect system. There are only recoverable systems.
