☁️ Cloud & DevOps

Mastering Kubernetes Observability for Autoscaling

Marcus Cole
Cloud & DevOps Lead

Platform engineer who's been through every infrastructure era — bare metal, VMs, containers, serverless. Has strong opinions about YAML files and even stronger opinions about over-engineering.

Tags: Karpenter autoscaling · Kubernetes-native CI/CD · reduce MTTR · Tekton pipelines · incident response

The Reality Check: When Your Dashboards Lie to You

It's 3:00 AM. Your pager goes off. You drag yourself out of bed, open your laptop, and stare at the monitoring dashboard. The alert is screaming in red: NODE_CPU_UTILIZATION > 90%.

You rub your eyes and dig into the cluster. But the application isn't crashing. Traffic is flowing fine. Error rates are at zero. You just got woken up because your autoscaler did exactly what it was designed to do—pack workloads tightly onto compute instances to save your company money.

This is the reality of modern infrastructure. We have spent the last decade building incredibly dynamic systems—where servers blink in and out of existence based on millisecond-level traffic fluctuations—yet we are still monitoring them like it's 2010. We are treating ephemeral compute nodes like they are bare-metal servers sitting in a rack down the hall.

If you want to survive the current ecosystem of microservices, dynamic autoscaling, and Kubernetes-native pipelines, your approach to Kubernetes observability has to fundamentally change. We need to stop asking "Is the server healthy?" and start asking "Is the system routing and scheduling work efficiently?"

Let's break down what's actually happening under the hood with tools like Karpenter and Tekton, and look at pragmatic ways to reduce MTTR (Mean Time To Recovery) without drowning in complexity.

The Core Problem: Static Metrics in a Dynamic World

Imagine running a busy restaurant kitchen. In the old days, you had exactly four stoves and four chefs. If a stove broke, or a chef got sick, you had a crisis. Monitoring meant keeping an eye on those four specific stoves.

Today, with tools like Karpenter handling Kubernetes autoscaling, your kitchen is magical. When fifty orders suddenly come in, a new stove and a new chef materialize out of thin air just in time to cook the food. When the rush is over, they vanish.

If you are still monitoring the stoves, you're missing the point. A stove disappearing isn't an outage; it's efficiency. What you actually need to monitor is the order ticket rail. How long is a ticket sitting there before a chef picks it up?

Under the Hood: Karpenter and Provisioning Latency

Karpenter has rapidly become the standard for Kubernetes autoscaling, largely replacing the legacy Cluster Autoscaler. Before you blindly trust it, you need to understand how it makes decisions.

The legacy Cluster Autoscaler worked by watching the autoscaling groups of your cloud provider. It was slow and relied on pre-defined node groups. Karpenter bypasses that entirely. It hooks directly into the Kubernetes scheduler. When a Pod is created but cannot be scheduled (due to lack of CPU/Memory), it enters a Pending state. Karpenter sees this unschedulable Pod, evaluates its exact resource requests, and asks the cloud provider for a custom-sized instance to fit that specific Pod "just in time."
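To make that concrete, here is a minimal sketch of a Karpenter NodePool (v1 API). The name, limits, and the AWS-specific EC2NodeClass reference are illustrative assumptions, not recommendations; check them against your Karpenter version and cloud provider:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose          # illustrative name
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:              # AWS example; other clouds use their own NodeClass
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      expireAfter: 720h          # recycle nodes after 30 days
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m         # how long to wait before consolidating
  limits:
    cpu: "1000"                  # cap the total CPU this pool may provision
```

The disruption stanza is exactly why node counts fluctuate: Karpenter is allowed to terminate underutilized nodes whenever consolidation saves money.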

[Diagram: Pending Pods → Karpenter → JIT node created. Core metric: provisioning latency.]

Because Karpenter constantly consolidates and terminates nodes to save money, tracking node_count or node_cpu_utilization will drive you crazy. Instead, your Kubernetes observability strategy must shift to provisioning intelligence.

The Pragmatic Solution for Autoscaling Observability

Stop alerting on infrastructure health and start alerting on workload friction.

1. Monitor Pod Scheduling Latency: How long does a Pod stay in the Pending state? If Pods regularly sit Pending for more than 60 seconds, either your cloud provider is out of capacity or your Karpenter NodePool is misconfigured.
2. Monitor Node Disruption Budgets: Are your applications safely draining when Karpenter decides to kill a node to save 4 cents an hour?
3. Track the Queue Depth: Alert on the number of unschedulable pods. If the queue is growing, the system is failing to react to demand.
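The first and third points above can be sketched as Prometheus alerting rules. This assumes kube-state-metrics is installed (it exposes kube_pod_status_phase); thresholds are illustrative starting points, not gospel:

```yaml
groups:
  - name: workload-friction
    rules:
      - alert: PodsPendingTooLong
        # Any Pod stuck unschedulable for 2+ minutes means the platform
        # cannot accept new work — this is real friction, unlike node CPU %.
        expr: sum(kube_pod_status_phase{phase="Pending"}) > 0
        for: 2m
        labels:
          severity: page
        annotations:
          summary: "Pods have been unschedulable for over 2 minutes"
      - alert: PendingQueueGrowing
        # Queue depth rising over 5 minutes: the autoscaler is failing
        # to react to demand.
        expr: |
          sum(kube_pod_status_phase{phase="Pending"})
            > sum(kube_pod_status_phase{phase="Pending"} offset 5m)
        for: 10m
        labels:
          severity: warn
        annotations:
          summary: "Unschedulable pod queue is growing"
```

Note that both alerts fire on workload symptoms, so they stay valid no matter how aggressively Karpenter churns the underlying nodes.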

The Hidden Costs of Kubernetes-Native CI/CD

Speaking of dynamic systems, let's talk about Tekton, a Kubernetes-native framework for CI/CD governed by the Continuous Delivery Foundation (CDF). It treats pipelines as standard Kubernetes resources (Custom Resource Definitions, or CRDs).

On paper, this sounds incredibly elegant. You don't need a separate Jenkins server; your cluster just runs pipelines as Pods. But as a pragmatist, I have to point out the dark side of this "elegance."

Under the Hood: The Etcd Bloat

When you trigger a Tekton pipeline, it creates a PipelineRun object. That object spawns multiple TaskRun objects. Those objects spawn Kubernetes Pods.

Every single one of these objects is stored in etcd, the key-value database that acts as the brain of your Kubernetes cluster. If you have a busy engineering team pushing code hundreds of times a day, Tekton is generating thousands of CRDs and Pods.

If you don't aggressively clean these up, your etcd database will bloat. When etcd slows down, the entire Kubernetes API slows down. Suddenly, Karpenter can't schedule nodes fast enough, deployments time out, and your cluster falls over—all because you kept a record of a successful unit test from three weeks ago.

[Diagram: PipelineRun → TaskRun (Build) → Pod, each stored in etcd. Without TTL cleanup, the etcd database bloats and crashes.]

The Pragmatic Solution for Tekton

Don't adopt Tekton just because it carries a foundation's stamp of approval. Adopt it only if you genuinely need container-native, isolated build environments that scale with your cluster.

If you do use it, you must configure TTL (Time To Live) controllers immediately. The best code is the code you don't write, and the best Kubernetes resource is the one that deletes itself when it's no longer useful. Set your tekton.dev/prune labels so that successful pipeline runs are purged from the cluster after 24 hours. Send the logs to an external system like Loki or Datadog, and keep your cluster state clean.
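If you install Tekton via the operator, its TektonConfig resource includes a pruner stanza that runs this cleanup on a schedule. A sketch, assuming the operator's v1alpha1 API (verify field names against your operator version):

```yaml
apiVersion: operator.tekton.dev/v1alpha1
kind: TektonConfig
metadata:
  name: config
spec:
  pruner:
    resources:            # which run objects to prune from etcd
      - pipelinerun
      - taskrun
    keep: 20              # retain only the 20 most recent runs per resource
    schedule: "0 * * * *" # prune hourly (standard cron syntax)
```

Ship the logs off-cluster before the runs are pruned; once the PipelineRun object is deleted, its in-cluster history goes with it.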

Reducing MTTR in Customer-Facing Systems

When customer-facing systems fail, the clock starts ticking. In a microservices architecture, Mean Time To Recovery (MTTR) is the ultimate metric for brand protection. But how do you reduce MTTR when a single user request touches 40 different microservices?

I see teams buy expensive observability platforms, install agents everywhere, and end up with a dashboard containing 500 different charts. When an incident happens, they spend 45 minutes just trying to figure out which chart to look at. More data does not equal faster incident response. Context equals faster incident response.

Traditional vs. Pragmatic Observability

Let's look at how we need to shift our perspective to actually reduce MTTR.

| Focus Area | Traditional Monitoring | Modern Kubernetes Observability | Why It Matters for MTTR |
|---|---|---|---|
| Compute | Node CPU/Memory % | Pod scheduling latency | Tells you if the platform can accept new workloads. |
| Traffic | Total requests/sec | Error budgets & route latency | Identifies exactly which user journeys are failing. |
| Scale | Number of active nodes | Queue depth & disruption rate | Reveals if the autoscaler is thrashing or stable. |
| Alerts | "Database CPU is 85%" | "Checkout API is failing 5% of requests" | Aligns engineering response with actual customer pain. |

To reduce MTTR, you have to trace the edge. Start at the ingress controller or API gateway. If the customer is experiencing an error, that error will manifest at the edge. From there, use distributed tracing (like OpenTelemetry) to follow the request down the stack.

Do not start at the bottom (infrastructure) and try to guess how it affects the top (the customer). Start at the top and follow the broken pipe down.
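As one edge-in example, here is a Prometheus rule that alerts when an ingress starts failing more than 5% of requests. It assumes the NGINX ingress controller's metrics (nginx_ingress_controller_requests); other controllers expose equivalents under different names:

```yaml
groups:
  - name: edge-red-metrics
    rules:
      - alert: IngressHighErrorRate
        # Error ratio per ingress: 5xx responses divided by all responses,
        # measured over a 5-minute window at the edge.
        expr: |
          sum(rate(nginx_ingress_controller_requests{status=~"5.."}[5m])) by (ingress)
            /
          sum(rate(nginx_ingress_controller_requests[5m])) by (ingress)
            > 0.05
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "{{ $labels.ingress }} is failing more than 5% of requests"
```

When this fires, the label tells you which customer-facing route is broken, and the trace for any failing request leads you down to the responsible service.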

What You Should Do Next

If you want to stabilize your systems and stop getting useless pages at 3 AM, here are the pragmatic steps you need to take this week:

1. Audit Your Alerts: Go through your alerting rules. Delete any alert tied to CPU or Memory utilization on autoscaled worker nodes. Replace them with alerts on pod_pending_time and HTTP 5xx error rates at your ingress.
2. Implement Resource Hygiene: If you are using Tekton, Argo Workflows, or Kubernetes Jobs, verify that you have TTL controllers active. Run kubectl get pods --all-namespaces --field-selector=status.phase=Succeeded to see how much garbage is sitting in your cluster.
3. Instrument the Edge: Ensure your ingress controllers are emitting RED metrics (Rate, Errors, Duration). When an incident occurs, this is your starting line.
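For plain Kubernetes Jobs, step 2 above is a single built-in field, ttlSecondsAfterFinished, which deletes the Job and its Pods automatically. A minimal example (the Job name and workload are illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-report            # illustrative name
spec:
  ttlSecondsAfterFinished: 86400  # delete the Job and its Pods 24h after it finishes
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: report
          image: busybox
          command: ["sh", "-c", "echo done"]
```

The best Kubernetes resource really is the one that deletes itself: with the TTL set, nothing accumulates in etcd once the run succeeds.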

The industry will constantly try to sell you on the next layer of abstraction. But underneath the magic, it's all just scheduling queues, network routing, and database state. Master those fundamentals, and you won't need to fear the pager.

There is no perfect system. There are only recoverable systems.

FAQ

Why shouldn't I alert on high CPU usage in Kubernetes? In a statically sized environment, high CPU means you are running out of capacity. In an autoscaled Kubernetes environment (using Karpenter), high CPU means the autoscaler is efficiently bin-packing workloads to save money. Alerting on this creates alert fatigue. You should alert on pending pods instead.
What is Karpenter and how does it differ from Cluster Autoscaler? Cluster Autoscaler relies on cloud provider Auto Scaling Groups (ASGs) and scales node groups up or down. Karpenter bypasses ASGs, observes unschedulable pods directly, and provisions custom-sized nodes "just in time" based on the exact requirements of the pending workloads.
How does Tekton cause etcd bloat? Tekton uses Kubernetes Custom Resource Definitions (CRDs) to manage CI/CD pipelines. Every pipeline run creates multiple objects and pods. If these are not cleaned up using a Time-To-Live (TTL) controller, they remain stored in etcd, eventually degrading the performance of the entire Kubernetes API.
What is the most effective way to reduce MTTR in microservices? Shift from infrastructure monitoring to edge-in observability. Start by monitoring the RED metrics (Rate, Errors, Duration) at your API Gateway or Ingress. When an issue occurs, use distributed tracing to follow the failing request down to the specific broken service, rather than guessing based on server metrics.

