☁️ Cloud & DevOps

Case Study: Kubernetes Gateway API Migration on AWS

Marcus Cole
Cloud & DevOps Lead

Platform engineer who's been through every infrastructure era — bare metal, VMs, containers, serverless. Has strong opinions about YAML files and even stronger opinions about over-engineering.

AWS Load Balancer Controller · Ingress annotations · platform engineering · cloud native infrastructure · traffic routing

If you have spent enough time operating distributed systems, you know the dread of the 3 AM pager alarm. It is rarely a complete hardware failure that wakes you up. More often than not, it is a configuration error. Someone fat-fingered a string, missed a comma, or pasted a slightly malformed JSON block into a Kubernetes Ingress annotation, and suddenly, production traffic is dropping into a black hole.

For years, we have accepted this as the cost of doing business in cloud native infrastructure. But the reality is that relying on untyped, unvalidated string annotations to configure mission-critical load balancers is like steering a ship by writing instructions on sticky notes and slapping them on the captain's wheel. It works until a note falls off, and then you hit a reef.

With the recent General Availability of Kubernetes Gateway API support in the AWS Load Balancer Controller, we finally have a way to fix the plumbing properly. This is not just another shiny tool to add to your platform engineering stack; it is a fundamental correction of a long-standing architectural flaw in how we handle traffic routing.

Let's break down how we approached this migration, why it matters, and how you can apply these lessons to build more resilient infrastructure.

The Reality Check: The Ingress Annotation Nightmare

Before we look at the solution, we have to acknowledge the pain of the past. The original Kubernetes Ingress resource was designed to be simple—perhaps too simple. It provided a basic way to route HTTP traffic to services. But enterprise infrastructure is never simple. We needed custom health checks, SSL redirection, specific load balancing algorithms, and sticky sessions.

Because the Ingress API lacked these fields, controller maintainers (like the AWS Load Balancer Controller team) had to hack them in using annotations.

We ended up with YAML files that looked like this:

# The Old Way: A fragile wall of text
metadata:
  annotations:
    alb.ingress.kubernetes.io/target-group-attributes: load_balancing.algorithm.type=least_outstanding_requests,deregistration_delay.timeout_seconds=30
    alb.ingress.kubernetes.io/healthcheck-path: /healthz
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'

This is not configuration; this is string-encoded technical debt. There is no schema validation. Your IDE cannot help you. If you miss a quotation mark around that JSON array, kubectl apply will happily accept it, but the controller will fail silently or crash at runtime. By the time you realize traffic isn't routing, your customers are already seeing 502 Bad Gateway errors.
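To make the failure mode concrete, here is a hypothetical broken variant of the listen-ports annotation above. The YAML itself is valid, so kubectl apply accepts it without complaint, but the embedded JSON is missing its closing bracket:

```yaml
# Hypothetical broken variant: the annotation value is an opaque string
# to Kubernetes, so this passes apply-time validation. The controller
# only discovers the malformed JSON during reconciliation.
metadata:
  annotations:
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}'
```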

The Challenge: Scaling Traffic Routing Without Breaking Things

At BriefStack, our core problem was not the AWS Application Load Balancers (ALBs) themselves. AWS infrastructure is remarkably stable. The bottleneck was our deployment pipeline and the blast radius of configuration changes.

As our platform engineering practices matured—aligning with recent CNCF reports showing teams standardizing on robust delivery tools like Helm and Backstage—we realized our traffic routing layer was the weakest link.

We had dozens of development teams deploying services into shared clusters. Because Ingress resources often share the same underlying ALB to save costs, one team deploying a malformed annotation could corrupt the listener rules for the entire load balancer, taking down unrelated services.

We needed three things:
1. Type Safety: Configuration must be validated before it is applied.
2. Role Separation: Application developers should not be able to break infrastructure components.
3. Consistency: A unified API for both Layer 4 (TCP/UDP) and Layer 7 (HTTP/gRPC) traffic.

Under the Hood: The Harbor Logistics of Gateway API

To understand why the Kubernetes Gateway API solves this, we need to step away from the code and look at how physical systems handle complex routing. Think of your Kubernetes cluster as a busy commercial harbor.

In the old Ingress model, every ship captain (developer) walked up to the dock and shouted their own docking procedures, unloading requirements, and cargo destinations at the workers. Chaos ensued.

The Gateway API introduces a structured logistics chain with clear separation of duties:

1. GatewayClass (The Port Authority): Defines the types of docks available (e.g., AWS ALB vs. internal NGINX). This is managed by the infrastructure provider.
2. Gateway (The Dock Master): Provisions a specific physical dock, assigns an IP address, and opens specific ports (e.g., Port 443 for HTTPS). This is managed by the cluster operator.
3. HTTPRoute (The Cargo Handler): Determines that cargo matching specific criteria (URL paths, headers) goes to specific trucks (Kubernetes Services). This is managed by the application developer.

[Diagram: Traffic Routing Evolution, Ingress vs Gateway API. Left, the monolithic model: a single Ingress resource with messy string annotations and a single point of failure, routing to a Kubernetes Service. Right, the role-oriented model: a Gateway owned by the platform team and HTTPRoutes owned by app teams, routing to Kubernetes Services.]

By separating these concerns, the platform team can provision a secure, TLS-enabled Gateway, and development teams can attach their HTTPRoutes to it without ever touching the underlying load balancer configuration. If a developer writes a bad route, it only affects their specific cargo; the dock remains open for everyone else.
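For reference, the platform-team half of this chain can be sketched as follows. This is a minimal sketch rather than a production manifest: the controllerName string and the certificate Secret are assumptions to verify against your controller's documentation, while the Gateway name and namespace match the parentRefs used later in this post.

```yaml
# Port Authority: declares which controller implements this class of dock.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: aws-alb
spec:
  controllerName: gateway.k8s.aws/alb   # assumed value; check your controller's docs
---
# Dock Master: provisions the shared load balancer and opens port 443.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: main-external-gateway
  namespace: platform-infra
spec:
  gatewayClassName: aws-alb
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: wildcard-cert       # hypothetical TLS certificate Secret
      allowedRoutes:
        namespaces:
          from: All                   # let app teams attach routes from any namespace
```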

The Pragmatic Solution: Implementing the Shift

When we migrated to the new AWS Load Balancer Controller with Gateway API support, we committed to doing it the hard way: no hybrid configurations, no lingering annotations.

Before writing any YAML, we defined our objective: We wanted strict validation for our target group configurations so that deployment pipelines would fail before applying bad state to the cluster.

Here is how we replaced the fragile annotations with Custom Resource Definitions (CRDs). Notice how every parameter now has a specific type and structure.

# The Pragmatic Way: Type-safe TargetGroupConfiguration
apiVersion: elbv2.k8s.aws/v1alpha1
kind: TargetGroupConfiguration
metadata:
  name: auth-service-tg-config
  namespace: auth-team
spec:
  targetGroupAttributes:
    - key: load_balancing.algorithm.type
      value: least_outstanding_requests
    - key: deregistration_delay.timeout_seconds
      value: "30"
  healthCheck:
    path: /healthz
    intervalSeconds: 15

Because this is a CRD, Kubernetes validates the schema. If someone tries to pass an array where a string is expected, the Kubernetes API server rejects the request immediately. The feedback loop shrinks from minutes of production downtime to an instant rejection at apply time, or even earlier if your pipeline runs a server-side dry run.
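As a quick illustration of that apply-time rejection, here is a hypothetical invalid variant, assuming the CRD schema types the attribute value as a string (as the quoted "30" in the example above suggests):

```yaml
# Hypothetical invalid variant: rejected at kubectl apply because the
# schema expects value to be a string, not an integer.
spec:
  targetGroupAttributes:
    - key: deregistration_delay.timeout_seconds
      value: 30    # should be "30"; the API server refuses this manifest
```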

Next, we bound this configuration to our application traffic using an HTTPRoute:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: auth-route
  namespace: auth-team
spec:
  parentRefs:
    - name: main-external-gateway
      namespace: platform-infra
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /auth
      backendRefs:
        - name: auth-service
          port: 8080

Notice the parentRefs block. The application team simply points to the main-external-gateway managed by the platform team. They do not need to know if it is an ALB, an NLB, or how the SSL certificates are bound. They just declare their routing intent.
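One practical consequence of parentRefs is that the attachment handshake is observable. The Gateway API spec standardizes status conditions on the route, so an app team can confirm their route was accepted without any access to the load balancer. A sketch of what inspecting the route's status might show (the exact conditions and messages vary by controller):

```yaml
# Status written back by the controller (structure per the Gateway API
# spec; the exact conditions reported vary by implementation).
status:
  parents:
    - parentRef:
        name: main-external-gateway
        namespace: platform-infra
      conditions:
        - type: Accepted       # the Gateway admitted this route
          status: "True"
        - type: ResolvedRefs   # backendRefs resolved to a real Service
          status: "True"
```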

Results & Numbers

Moving away from Ingress annotations wasn't just a theoretical exercise in clean architecture; it yielded measurable improvements in our operational stability.

| Metric | Legacy Ingress Model | Gateway API Model |
| --- | --- | --- |
| Config Validation Time | Runtime (fails during ALB sync) | Apply-time (fails at kubectl apply) |
| Blast Radius | High (one bad annotation breaks the ALB) | Low (a bad route isolates to one service) |
| Platform Support Tickets | ~15 per week (routing issues) | ~2 per week |
| Lines of Config per App | ~40 lines of dense YAML | ~25 lines of readable YAML |
| Cross-Namespace Routing | Required complex workarounds | Native support via parentRefs |

By enforcing schema validation at the API server level, we eliminated the class of errors that previously caused our most frustrating outages.

Lessons Learned

Adopting the Kubernetes Gateway API is a major shift. Here is what we learned in the trenches:

What Worked:

  • GitOps Integration: Because the Gateway API relies on standard CRDs, our ArgoCD pipelines could finally perform proper dry-runs and drift detection on load balancer configurations.

  • Clear Boundaries: The friction between the platform team and development teams vanished. Developers owned their HTTPRoutes; we owned the Gateway.


What Didn't:
  • The Big Bang Cutover: You cannot simply delete an Ingress and apply a Gateway in the same breath without dropping traffic. The AWS controller has to provision new target groups and listeners. We had to run both in parallel, update DNS weights, and slowly bleed traffic over to the new Gateway.
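The parallel-run phase boils down to two resources pointing at the same Service while DNS weights shift. A minimal sketch of the legacy half, with hypothetical names (the real manifest would also carry the ALB annotations being migrated away from):

```yaml
# Kept alive during migration: this legacy Ingress and the new HTTPRoute
# both target auth-service, so either hostname serves identical traffic.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: auth-legacy            # decommissioned once DNS fully favors the Gateway
  namespace: auth-team
spec:
  rules:
    - http:
        paths:
          - path: /auth
            pathType: Prefix
            backend:
              service:
                name: auth-service
                port:
                  number: 8080
```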


What to Watch Out For:
  • CRD Versioning: The Gateway API has evolved through v1alpha1, v1beta1, and v1. Ensure your controllers and CRDs are perfectly aligned, or you will face mysterious compatibility errors.


Lessons for Your Team

If you are managing Kubernetes in production, stop building new infrastructure on the Ingress API. The ecosystem has moved on. The CNCF's recent data shows that mature platform engineering teams are standardizing on robust, validated tools, and the Gateway API is the definitive standard for cloud native routing.

Audit your existing Ingress annotations. Calculate the cost of a silent configuration failure. Then, start planning your migration to the Gateway API by setting up a parallel Gateway and migrating your lowest-risk service first.

Technology is just a tool for solving problems, and sometimes the best way to solve a problem is to remove the fragile magic and replace it with boring, validated fundamentals.

There is no perfect system. There are only recoverable systems.


Frequently Asked Questions

Why is the Kubernetes Gateway API replacing Ingress?
Ingress was designed for simple HTTP routing and lacked native support for advanced load balancing features, forcing reliance on fragile, unvalidated string annotations. The Gateway API provides a structured, type-safe, and extensible model using Custom Resource Definitions (CRDs) that natively support complex routing, TCP/UDP traffic, and role-based access control.

Does the AWS Load Balancer Controller support both Ingress and Gateway API?
Yes. The AWS Load Balancer Controller continues to support the legacy Ingress API while offering General Availability (GA) support for the Gateway API. This allows teams to run both side-by-side during migration phases without breaking existing workloads.

How does the Gateway API separate roles between teams?
It uses distinct resources for different personas. Infrastructure providers manage GatewayClasses (the underlying load balancer type), platform operators manage Gateways (the actual load balancer instance and ports), and application developers manage HTTPRoutes (the rules directing traffic to specific services). This prevents developers from accidentally breaking infrastructure-level configurations.

Can I migrate from Ingress to Gateway API without downtime?
Yes, but it requires careful DNS management. You should deploy the new Gateway API resources alongside your existing Ingress resources, verify the new load balancer provisions correctly, and then use weighted DNS routing (like Route 53) to gradually shift traffic from the old Ingress endpoint to the new Gateway endpoint before decommissioning the Ingress.
