[K8s Provider] Apply is Hanging: A Deep Dive into Timeouts + [K8s Provider] "Resource not found": Resolving Race Conditions with CRDs

Kubernetes Provider Troubleshooting
Tags: aws, kubernetes, terraform, crd

[K8s Provider] Apply is Hanging: A Deep Dive into Timeouts

When a Kubernetes cluster scales up or the control plane slows down, Terraform often hangs with no error message at all, only to throw an unhelpful timeout error 10 minutes later.

This post isn't just advice to "increase the time." It covers the exact causes of 'Context Deadline Exceeded' that I experienced while operating large-scale GKE/EKS clusters, and a 3-step approach to solving them.

1. Symptoms

You run terraform apply, but the logs stop during the creation of a specific resource (usually a Namespace or CRD). Exactly 5 or 10 minutes later, an error like the one below occurs:

[Actual Error Log]

Error: context deadline exceeded
  on modules/k8s/main.tf line 42, in resource "kubernetes_namespace" "example":
  42: resource "kubernetes_namespace" "example" {

Error: Get "https://10.100.0.1:443/api/v1/namespaces/example": dial tcp 10.100.0.1:443: i/o timeout

2. Root Cause

This issue is generally divided into two main causes:

  • Connectivity: The environment running Terraform (Local or CI/CD Runner) cannot access the Private Endpoint of the K8s API Server.
  • Throttling: Terraform calls numerous APIs during the refresh phase. If the K8s API Server cannot process the requests and queues them, it eventually returns a timeout. This is frequent in managed services like GKE when the Master Node resources are limited.
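For the connectivity cause, a common fix on CI runners is to stop depending on a local kubeconfig (which usually doesn't exist on a runner) and point the provider at the cluster endpoint directly. Below is a hedged sketch for EKS; the cluster name is an illustrative assumption:

```hcl
# Look up the cluster endpoint and CA instead of reading ~/.kube/config
data "aws_eks_cluster" "this" {
  name = "my-cluster" # hypothetical cluster name
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.this.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)

  # Fetch a short-lived token at plan/apply time; avoids stale
  # kubeconfig credentials on ephemeral runners
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", "my-cluster"]
  }
}
```

Note that this only fixes authentication and discovery; if the API server has a private endpoint, the runner still has to sit inside (or be peered with) the cluster's VPC.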

3. Solution 1: Provider Level Timeout Tuning

The first lever is at the provider level. There is no single provider-wide timeout argument, so the realistic goal here is to reduce the API traffic Terraform generates during the refresh phase, which is often what pushes requests past the deadline.

[Code Application]

provider "kubernetes" {
  config_path = "~/.kube/config"

  # Enable the kubernetes_manifest resource (only needed on provider
  # versions before 2.4.0, where it was still experimental)
  experiments {
    manifest_resource = true
  }

  # Skip diffing noisy cloud-controller annotations; this cuts down
  # the number of API calls made during the refresh phase
  ignore_annotations = [
    "^service\\.beta\\.kubernetes\\.io\\/aws-load-balancer-.*"
  ]
}

*(Note: Setting timeouts blocks on individual resources is often more effective than the Provider setting itself.)*

4. Solution 2: Individual Resource Timeout Settings

Load Balancer (Ingress/Service) creation takes a long time because it calls the Cloud Vendor's API (e.g., AWS ALB). In this case, you must specify timeouts within the resource block.

resource "kubernetes_ingress_v1" "example" {
  metadata {
    name = "example"
  }
  # ... (omitted)

  wait_for_load_balancer = true # Wait for IP allocation

  timeouts {
    create = "20m" # Default 10m -> increased to 20m
    update = "20m"
    delete = "20m"
  }
}

5. Solution 3: State Separation (The Most Definite Method)

If the timeout occurs during the plan phase itself, the cause is usually that a single state manages too many resources (e.g., 500 ConfigMaps), and every plan has to refresh all of them. Increasing the timeout won't solve this.
You need to split your Terraform modules.

  • Layer 1: Namespace, RBAC, CRD (Infrastructure with few changes)
  • Layer 2: Application Deployments (Workloads with frequent changes)
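One way to wire the two layers together is terraform_remote_state, so Layer 2 can read Layer 1's outputs from its own state. A minimal sketch; the backend settings and output names are assumptions:

```hcl
# layer1/main.tf -- rarely-changing infrastructure: namespaces, RBAC, CRDs
resource "kubernetes_namespace" "apps" {
  metadata {
    name = "apps"
  }
}

output "apps_namespace" {
  value = kubernetes_namespace.apps.metadata[0].name
}

# layer2/main.tf -- frequently-changing workloads, in their own state
data "terraform_remote_state" "layer1" {
  backend = "s3" # hypothetical backend; match whatever layer 1 uses
  config = {
    bucket = "my-tf-state"
    key    = "k8s/layer1.tfstate"
    region = "us-east-1"
  }
}

resource "kubernetes_config_map_v1" "app_config" {
  metadata {
    name      = "app-config"
    namespace = data.terraform_remote_state.layer1.outputs.apps_namespace
  }
  data = {
    LOG_LEVEL = "info"
  }
}
```

Because each layer is planned and applied separately, a routine workload change in Layer 2 no longer refreshes the hundreds of resources in Layer 1.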
[Validation Results] After separating the modules, plan time dropped from 3 minutes to 20 seconds, and the intermittent timeouts caused by API Server load disappeared entirely.

[K8s Provider] "Resource not found": Resolving Race Conditions with CRDs

This is the most common issue when installing Kubernetes Operators (e.g., Cert-Manager, ArgoCD) via Terraform. If you try to create a CR (Custom Resource) immediately after installing the CRD (Custom Resource Definition), you hit an error right away.

1. Symptoms

You install Cert-Manager via Helm and immediately try to create a ClusterIssuer resource.

[Actual Error Log]

Error: resource mapping not found for name: "letsencrypt-prod" namespace: "" from "main.tf": 
no matches for kind "ClusterIssuer" in version "cert-manager.io/v1"
ensure CRDs are installed first

2. The Chicken and Egg Problem

During the plan phase, before anything is applied, Terraform asks the Kubernetes API Server: "Is the resource I'm about to create (ClusterIssuer) valid?"
However, at that point Cert-Manager hasn't been installed yet, so the API Server responds: "What is a ClusterIssuer? That resource type doesn't exist."
Even with depends_on, the same error occurs if the provider's API discovery cache was populated before the CRD was registered.

3. Solution 1: Separate Deployment (Blueprint Pattern)

The most certain method is to separate the Terraform for CRD installation and the Terraform for CR usage. However, if you are stuck with a single pipeline where this is impossible, use the next method.
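"Separate" here means two root modules with their own state files, applied in order by the pipeline, so that by the time the CR's plan runs, the CRD already exists in the cluster. A minimal sketch; the directory layout and the manifest file name are illustrative:

```hcl
# stacks/10-operators/main.tf -- applied first; installs Cert-Manager
# (make sure the chart also installs its CRDs, e.g. installCRDs=true)
resource "helm_release" "cert_manager" {
  name             = "cert-manager"
  repository       = "https://charts.jetstack.io"
  chart            = "cert-manager"
  namespace        = "cert-manager"
  create_namespace = true
}

# stacks/20-issuers/main.tf -- applied second, from its own state; by now
# the ClusterIssuer CRD is registered, so plan-time API discovery succeeds
resource "kubernetes_manifest" "cluster_issuer" {
  manifest = yamldecode(file("${path.module}/cluster-issuer.yaml")) # hypothetical manifest file
}
```

The ordering guarantee lives in the pipeline (apply 10-operators, then 20-issuers), not in depends_on, which is exactly what sidesteps the plan-time discovery problem.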

4. Solution 2: kubernetes_manifest with a Forced Wait (time_sleep)

The helm_release resource has no reliable way to "wait until the CRD is registered." Inserting a time_sleep resource to force a wait is the 'dirty but sure' fix most used in the field.

[Code Application]

resource "helm_release" "cert_manager" {
  name             = "cert-manager"
  chart            = "cert-manager"
  repository       = "https://charts.jetstack.io"
  namespace        = "cert-manager"
  create_namespace = true

  set {
    name  = "installCRDs"
    value = "true"
  }
}

# Important: Force a wait until the API Server recognizes the CRD
resource "time_sleep" "wait_for_crd" {
  create_duration = "30s" # Experience shows 30s is sufficient
  depends_on      = [helm_release.cert_manager]
}

resource "kubernetes_manifest" "cluster_issuer" {
  manifest = {
    apiVersion = "cert-manager.io/v1"
    kind       = "ClusterIssuer"
    metadata = {
      name = "letsencrypt-prod"
    }
    spec = {
      # ...
    }
  }

  # Run only after time_sleep finishes
  depends_on = [time_sleep.wait_for_crd]
}

5. Validation

1. Run terraform destroy to start from a clean state.
2. Run terraform apply all at once.
3. Check the logs to see if it pauses for 30 seconds with the message time_sleep.wait_for_crd: Creating... after helm_release completes.
4. If kubernetes_manifest is created without errors afterward, it is a success.
