
Top 110+ Kubernetes Interview Questions for Any DevOps Role

In coding interviews and software engineer interview preparation, Kubernetes topics often trip people up: cluster design, YAML manifests, kubelet behavior, kubectl commands, networking, and storage questions come up frequently. Have you practiced explaining pod lifecycle, Services, Ingress, RBAC rules, Helm charts, autoscaling, and how you would debug a failing Deployment under time pressure? This guide collects clear explanations, example answers, real-world scenario questions, and troubleshooting steps so you can walk into a DevOps interview prepared for Kubernetes questions at every level, from basic to advanced to scenario-based.

To help with that, Interview Coder offers an undetectable coding assistant for interviews that creates realistic Kubernetes questions, runs mock sessions, provides clear feedback on answers and YAML, and helps you rehearse system design, deployment scripts, CI/CD pipelines, monitoring, and everyday debugging tasks, so you build steady confidence.

Why Is Kubernetes Knowledge So Essential For Modern DevOps?


Kubernetes manages containers at scale, so teams stop treating infrastructure as one-off scripts. It enforces declarative configs, automates lifecycle tasks, and gives operations a common control plane for clusters on AWS, Azure, GCP, or on premises.

Hiring managers expect fluency because Kubernetes skills reduce downtime, speed releases, and let teams adopt microservices without chaos. Companies often attach six-figure pay ranges to those skills in North America, with averages from roughly $144,030 to $202,202.

Kubernetes Fundamentals Quick Reference: What You Should Know First

Kubernetes is an open source container orchestration platform originally created at Google and now maintained by the Cloud Native Computing Foundation. It schedules containers inside pods, uses YAML for declarative configs, and stores cluster state in etcd.

It covers service discovery, load balancing, rolling updates, self-healing, and horizontal or vertical scaling using metrics. If you already know pods, deployments, services, and persistent storage, skip this section; otherwise, practice creating pods, services, PVCs, and Deployments with kubectl.

Core Components Explained: What Runs the Cluster and Why It Matters

Control plane components include the kube-apiserver, which handles API requests; etcd, which holds cluster state; the kube-scheduler, which assigns pods to nodes; and the controller manager, which runs controllers such as the Deployment and Node controllers.

Worker node components include kubelet, which enforces pod manifests, and kube-proxy, which implements service networking. Understanding how these parts interact helps with debugging API errors, resource contention, and controller loops when something misbehaves.

Kubernetes Architecture: How Master and Worker Nodes Coordinate Workloads

Kubernetes follows a control plane and worker node model:

  • The control plane makes decisions
  • Nodes run containers in pods
  • Scheduler places workloads using resource requests and policy

Pods wrap one or more containers and share networking and storage. Services expose groups of pods via ClusterIP, NodePort, or LoadBalancer, and Ingress manages HTTP routing. Knowing where to look for logs, events, and metrics on both control plane and worker nodes speeds troubleshooting.

Key Features Interviewers Ask About: The Mechanisms That Prove You Know Kubernetes

Expect questions on rolling updates, liveness and readiness probes, resource requests and limits, configmaps and secrets, persistent volumes and storage classes, CronJobs, StatefulSets, DaemonSets, ReplicaSets, and PodDisruptionBudgets.

Be ready to explain how horizontal pod autoscaling uses CPU, memory, or custom metrics, and how Cluster Autoscaler interacts with cloud providers. Add examples of kubectl commands you use daily, such as:

  • kubectl get pods --all-namespaces
  • kubectl describe pod
  • kubectl logs -f
  • kubectl exec -it
  • kubectl rollout status deployment/<name>

How Kubernetes Bridges Development and Operations: The Practical Glue

Kubernetes gives developers a consistent runtime and operators a standard control plane. Developers package apps as images and declare the desired state; operations provide the cluster, networking, and storage.

CI/CD pipelines run kubectl apply or helm upgrade to push changes through environments, while observability stacks like Prometheus and Grafana monitor health. This shared workflow reduces friction between teams and lets each side focus on its responsibilities.

Real Interview Topics and Sample Kubernetes Interview Questions

  • Cluster management and installation: How do you set up a highly available control plane? What does kubeadm init do? How do you recover from etcd loss?
  • Networking and service discovery: Explain kube-proxy modes and how CNI plugins differ. What are NetworkPolicy rules and where do they apply?
  • Storage and stateful apps: How do PVC, PV, and StorageClass relate? When do you use StatefulSet instead of Deployment?
  • Security and RBAC: How do you design least privilege roles? How do you secure the Kube API server and etcd?
  • Scheduling and affinity: How does taint and toleration work? How do you force pods to co-locate or anti-co-locate?
  • Observability and debugging: What metrics and alerts do you track? How do you diagnose a CrashLoopBackOff?
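
For the scheduling questions above, it helps to have a small manifest in mind. Here is a sketch of a toleration paired with pod anti-affinity; the taint key dedicated and the label app: web are illustrative, not from any particular cluster:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
  labels:
    app: web
spec:
  # Allow scheduling onto nodes tainted dedicated=web:NoSchedule
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "web"
      effect: "NoSchedule"
  affinity:
    podAntiAffinity:
      # Anti-co-locate: refuse nodes already running a Pod labeled app=web
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: web
          topologyKey: kubernetes.io/hostname
  containers:
    - name: web
      image: nginx:1.21
```

Swapping podAntiAffinity for podAffinity with the same selector co-locates Pods instead.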

Troubleshooting Scenarios You Should Practice Under Pressure

Practice resolving a pod stuck in Pending by checking node capacity, events, and PVC binding; fix ImagePullBackOff by inspecting image name, tags, and registry credentials; recover from a misbehaving Deployment by scaling down, rolling back with kubectl rollout undo, or using kubectl rollout history.

For control plane issues, read kube-apiserver logs and use etcdctl to check etcd state; note that kubectl get componentstatuses is deprecated in recent Kubernetes versions, so rely on component health endpoints and logs instead. Use kubectl describe node and kubectl get events to find resource or network problems.

Hands-On Commands and Tactics to Mention in Interviews

  • Inspect resources: kubectl get pods -o wide, kubectl describe pod <name>, kubectl get events
  • Logs and exec: kubectl logs -f <pod> [-c container], kubectl exec -it <pod> -- /bin/sh
  • Rollouts: kubectl rollout status deployment/<name>, kubectl rollout undo deployment/<name>
  • Node maintenance: kubectl cordon <node>, kubectl drain <node> --ignore-daemonsets --delete-emptydir-data (the flag formerly named --delete-local-data)
  • Copy and port: kubectl cp, kubectl port-forward svc/<svc> 8080:80

Cite these commands in short examples when asked to show troubleshooting steps.

Storage, Persistence, and Stateful Workloads Interview Focus

Explain the differences between ephemeral storage in pods and persistent volumes managed by storage classes and CSI drivers. Discuss ReadWriteOnce versus ReadWriteMany access modes and the need for volume claims in StatefulSets. Outline backup approaches:

  • Snapshotting PVs at the storage layer
  • Using Velero for cluster backups
  • Application-level backups for databases

Security, Policies, and Governance Interview Focus

Cover RBAC objects like Role, ClusterRole, RoleBinding, and ClusterRoleBinding. Describe NetworkPolicy for restricting pod traffic, and note that the deprecated PodSecurityPolicy has been replaced by Pod Security Admission.

Mention secrets handling: use external secret stores like HashiCorp Vault or cloud KMS, and avoid baking secrets into images or plain configmaps. Talk about image scanning in CI and runtime admission controls like OPA Gatekeeper for policy enforcement.

Ecosystem Tools and Extensions You Should Know

Know Helm charts for package management, Operators for domain-specific automation, service meshes like Istio or Linkerd for advanced traffic controls and observability, Prometheus for metrics, Fluentd or Fluent Bit for logs, and Jaeger for tracing. Be ready to explain when to use Helm versus plain manifests and how Operators reduce manual reconciliation.

Practical Interview Prep Plan and Certification Advice

Master Docker first, then run a local cluster with kind or minikube, and move to cloud clusters on a free tier. Work through labs that simulate outages and upgrades. Prepare CKA for cluster administration and CKAD for application design; both force hands-on practice and exam-style time pressure. Use mock interviews and whiteboard sessions to rehearse architecture explanations and failure modes.

Soft Skills and Behavioral Prompts That Interviewers Will Probe

Expect questions about teamwork on incidents, incident postmortems, and how you communicate outages to stakeholders. Talk through a specific incident, what you observed, the steps you took, and how you prevented recurrence with automation or runbooks. Practice clear, concise answers that show ownership and learning.

Mock Questions and Exercises You Can Practice Right Now

  • Task: Create a Deployment for a simple web app, expose it via a Service, and perform a rolling update while maintaining availability.
  • Task: Reproduce a crash loop, use kubectl describe and logs to find the cause, and patch the Deployment.
  • Question: Design a CI/CD flow that deploys to staging and production with canary releases and automated rollbacks.

Work these exercises in a timed environment and narrate your decisions as you go.

What Interviewers Probe About System Design and Architecture

They will ask how you design clusters for multi-tenancy, how to isolate teams using namespaces and RBAC, and how to plan for capacity and cost. Show understanding of HA control planes, cross-region clusters, backup and restore of etcd, and how observability and alerting tie into an SRE-style runbook. Ask clarifying questions during design problems to reveal trade-offs you consider.

Related Reading

  • Vibe Coding
  • React Interview Questions
  • AWS Interview Questions
  • Leetcode 75
  • Jenkins Interview Questions
  • Leetcode Patterns
  • Java Interview Questions And Answers
  • Azure Interview Questions
  • SQL Server Interview Questions
  • Leetcode Blind 75
  • C# Interview Questions
  • AngularJS Interview Questions
  • TypeScript Interview Questions

30 Basic Kubernetes Interview Questions and Answers


1. Pod Primer: What is a Pod in Kubernetes?

A Pod is the smallest deployable unit in Kubernetes. It represents one or more containers that share the same network namespace and storage volumes. Containers in a Pod communicate over localhost and share the same IP address and ports. Pods are ephemeral; if a Pod dies, the controller usually creates a replacement Pod with a new IP.

Example Pod YAML:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
    - name: nginx-container
      image: nginx:1.21
      ports:
        - containerPort: 80

2. kubectl Essentials: What Is the Purpose of Kubectl?

kubectl is the main command-line tool for interacting with the cluster and managing Kubernetes resources. Use it to list objects, view logs, apply manifests, and open shells inside containers.

Common commands:

kubectl get pods                          # list all Pods
kubectl get services                      # list all Services
kubectl logs <pod-name>                   # view Pod logs
kubectl exec -it <pod-name> -- /bin/sh    # open a shell inside a Pod

3. Deployment Basics: What is a Deployment in Kubernetes?

A Deployment manages the lifecycle of Pods and ReplicaSets, ensuring a specified number of replicas run and stay healthy. It supports rolling updates, rollbacks, and self-healing.

Example Deployment YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.21
          ports:
            - containerPort: 80

4. Service Fundamentals: What Is a Kubernetes Service, and Why Is It Needed?

A Service exposes a stable network endpoint for a set of Pods. Because Pods can be created and destroyed, their IPs change; a Service gives a fixed IP and DNS name so clients and other services can reach them reliably.

Example Service YAML:

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: ClusterIP

5. Service Types Explained: What Service Types Are Available in Kubernetes?

Kubernetes supports these Service types:

  • ClusterIP: Internal cluster access only, default type.
  • NodePort: Opens a static port on every node to expose the Service externally.
  • LoadBalancer: Provisions a cloud load balancer to expose the Service with a public IP.
  • ExternalName: Maps a Kubernetes Service to an external DNS name.

6. ConfigMaps and Secrets: What Is the Role of ConfigMaps and Secrets in Kubernetes?

ConfigMaps store non-sensitive configuration data as key-value pairs. Secrets store sensitive data, encoded or managed by external providers. Pods can mount config as files, environment variables, or command line args.

Example ConfigMap YAML:

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-config
data:
  database_url: "postgres://db.example.com"

Example Secret YAML (base64 encoded):

apiVersion: v1
kind: Secret
metadata:
  name: my-secret
type: Opaque
data:
  password: cGFzc3dvcmQ=  # "password" encoded in Base64

7. Namespace How To: What Are Namespaces in Kubernetes?

A Namespace provides a virtual partition inside a cluster to separate resources for teams, environments, or projects. Use namespaces to avoid name collisions and to apply RBAC and resource quotas per group.

Examples:

# create a namespace called dev
kubectl create namespace dev

# create a Pod in that namespace
kubectl run nginx --image=nginx --namespace=dev

# get Pods in that namespace
kubectl get pods --namespace=dev

8. Labels and Selectors: How Do Labels and Selectors Work in Kubernetes?

Labels are key-value pairs attached to objects like Pods. Selectors filter and target groups of objects based on labels. Labels enable services, deployments, and queries to identify relevant Pods.

Example Pod with labels:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
  labels:
    environment: production
    app: nginx
spec:
  containers:
    - name: nginx-container
      image: nginx:1.21
      ports:
        - containerPort: 80

Select Pods by label:

kubectl get pods -l environment=production

9. Persistent Storage: What Are PVs and PVCs?

A PersistentVolume (PV) is a cluster resource that provides durable storage independent of Pod lifecycle. A PersistentVolumeClaim (PVC) is a user request for storage, bound to a PV. Storage can be statically or dynamically provisioned using StorageClasses.

Example PV and PVC YAML:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: "/mnt/data"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

10. Ingress Routing: What Is a Kubernetes Ingress and How Is It Used?

An Ingress is an API object that routes external HTTP and HTTPS traffic to Services in the cluster based on host or path rules. An Ingress controller implements the routing rules and provides TLS termination, virtual hosts, and URL path-based routing for web traffic.
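
As a minimal sketch of such routing rules (hostname and Service name are placeholders):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  rules:
    - host: app.example.com        # route by Host header
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-service  # Service that receives the traffic
                port:
                  number: 80
```

An Ingress controller such as NGINX must be installed for these rules to take effect.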

11. Handling Secrets and Config: How Do You Handle Secrets and Configuration Management in Kubernetes?

Use Secrets for sensitive data and ConfigMaps for non-sensitive configuration. Mount them as volumes or inject them as environment variables. For stronger security, integrate with secret managers such as HashiCorp Vault, cloud provider key management services, or enable encryption at rest for secret objects.

12. Deployment Automation: How Do You Automate Kubernetes Deployments?

Automate deployments with Helm charts, Operators, or GitOps workflows. Helm packages manifests and handles templating and upgrades. Operators encode application operational knowledge into controllers. GitOps keeps manifests in Git and uses reconciliation tools like Argo CD or Flux to apply changes automatically.

13. Scaling Strategies: How Do You Scale Kubernetes Applications Horizontally and Vertically?

Horizontal scaling adds replicas to a Deployment or StatefulSet. Use HorizontalPodAutoscaler to scale based on CPU, memory, or custom metrics. Vertical scaling increases CPU and memory limits for containers; use VerticalPodAutoscaler or update resource specs manually. Horizontal scaling improves throughput while vertical scaling increases per instance capacity.
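
As a declarative alternative to kubectl autoscale, the HPA can be written as a manifest. This sketch assumes the metrics server is installed and a Deployment named web exists:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web            # assumed Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # scale out above 60% average CPU
```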

14. Namespace vs Label: What Is the Difference Between a Kubernetes Namespace and a Label?

A Namespace partitions cluster resources and provides administrative isolation, quotas, and RBAC scoping. Labels tag objects with key-value pairs to enable selection, grouping, and filtering across namespaces. Use namespaces for isolation and labels for organization and selection.

15. Load Balancing and Routing: How Does Kubernetes Handle Load Balancing and Network Traffic Routing?

A Service acts as the primary load-balancing abstraction. It provides a stable IP and DNS name and distributes traffic to backing Pods through kube-proxy's iptables or IPVS rules. For external traffic, NodePort and LoadBalancer types, or an Ingress in front of Services, route requests to the correct backend Pods.

16. Secret vs ConfigMap: What Is a Kubernetes Secret, and How Is It Different from a Kubernetes ConfigMap?

A Secret stores sensitive data such as passwords, tokens, and keys. ConfigMap stores non-sensitive configuration data. Secrets are encoded and can be configured for stricter access via RBAC, encryption at rest, and external secret managers; ConfigMaps focus on flexibility and easy updates.

17. Stateful Apps: How Do You Deploy a Stateful Application in Kubernetes?

Use a StatefulSet to deploy stateful applications that require stable network IDs and stable persistent storage. StatefulSet ensures ordered deployment, stable names, and stable PVCs for each replica. Combine StatefulSet with a StorageClass to provision persistent storage for each Pod.
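
The StatefulSet-plus-StorageClass pattern can be sketched as follows; the image, StorageClass name, and sizes are illustrative:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db            # headless Service that provides stable DNS names
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: postgres:15
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:      # one PVC per replica: data-db-0, data-db-1, ...
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard   # assumed StorageClass
        resources:
          requests:
            storage: 10Gi
```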

18. Deployment vs DaemonSet: What Is the Difference Between a Kubernetes Deployment and a Kubernetes DaemonSet?

A Deployment manages multiple identical replicas of application Pods and supports rolling updates and rollbacks. A DaemonSet ensures a single Pod instance runs on every node or on a subset of nodes, useful for node-level agents such as log collectors and monitoring agents.

19. Node Maintenance: How to Do Maintenance Activity on a K8s Node?

Put a node into maintenance mode and evict Pods safely.

Use these commands:

kubectl cordon <node-name>                      # mark node unschedulable
kubectl drain <node-name> --ignore-daemonsets   # evict Pods safely

To list nodes before maintenance:

kubectl get nodes

Drain waits for Pods to terminate or move, honoring PodDisruptionBudgets and skipping DaemonSet Pods when requested.

20. Centralized Logging: How to Get the Central Logs from a Pod?

Choose a logging pattern and deploy collection agents.

Common approaches:

  • Node-level logging agent running as a DaemonSet.
  • Sidecar container that ships logs from the application container.
  • Sidecar with a logging agent for processing before shipping.
  • Export logs directly from the application to a logging endpoint.

A typical stack: Filebeat or Fluentd running as a DaemonSet forwards logs to Kafka or directly to an ELK or EFK stack for central aggregation and search.

21. Cluster Monitoring: How to Monitor the Kubernetes Cluster?

Prometheus is the common monitoring solution for Kubernetes. Key components:

  • Prometheus server scrapes and stores time series metrics.
  • Client libraries instrument applications.
  • Push gateway supports short-lived jobs.
  • Exporters expose metrics for services such as HAProxy or node exporters.
  • Alertmanager routes and deduplicates alerts to teams and tools.

22. Security Practices: What Are the Various Things That Can Be Done to Increase Kubernetes Security?

Harden the cluster with these controls:

  • Network policies to restrict Pod-to-Pod communication.
  • RBAC to limit user and service account permissions.
  • Namespaces to partition workloads.
  • Admission controllers to prevent privileged containers and enforce policies.
  • Enable audit logging and encrypt secrets at rest.
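
The network-policy control above can be sketched as a manifest. This hypothetical policy allows ingress to app=api Pods only from app=web Pods in the same namespace:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-web
spec:
  podSelector:
    matchLabels:
      app: api          # policy applies to these Pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: web  # only web Pods may connect
```

NetworkPolicy only takes effect when the cluster's CNI plugin enforces it.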

23. Load Balancer Role: What Is the Role of Load Balancer in Kubernetes?

A load balancer distributes incoming traffic across multiple backend Pods or nodes so the application remains available under load in cloud environments. A LoadBalancer Service provisions an external load balancer with a single public IP that forwards requests to the cluster and then to the correct Pods via Services.

24. Init Containers: What’s an Init Container and When Can It Be Used?

Init containers run before the app container starts and prepare the environment. Use them to:

  • Wait for an external dependency with a simple sleep or check loop.
  • Populate a shared volume, for example, by cloning a git repo.
  • Run database migrations before launching the main application.
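
A sketch of the wait-for-dependency pattern; the Service name my-database and app image are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-init
spec:
  initContainers:
    - name: wait-for-db
      image: busybox:1.36
      # Simple check loop: block until the database Service name resolves
      command: ["sh", "-c", "until nslookup my-database; do sleep 2; done"]
  containers:
    - name: app
      image: my-app:1.0   # placeholder image; starts only after init succeeds
```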

25. Pod Disruption Budget: What is PDB (Pod Disruption Budget)?

A Pod Disruption Budget declares the minimum number of Pods that must remain available during voluntary disruptions, such as node drains. It prevents evictions that would violate availability targets.

Example PDB YAML:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: zookeeper

26. Core Services on Nodes: What Are the Various K8s Services Running on Nodes and the Role of Each Service?

Kubernetes nodes run core components that keep the cluster functional.

Worker node components:

  • kube-proxy: maintains network rules that map Services to Pods and routes traffic.
  • kubelet: agent that registers the node with the control plane, watches Pod specs, and ensures containers run as defined.

Control plane components:

  • kube-apiserver: central API endpoint and entry point to the cluster.
  • kube-scheduler: assigns Pods to nodes based on resources and constraints.
  • kube-controller-manager: runs control loops that reconcile cluster state with the desired state via the API server.

27. Resource Controls: How Do We Control the Resource Usage of a Pod?

Set resource requests and limits on containers. Requests inform the scheduler of required resources. Limits cap resource usage to prevent a single container from starving others.

Example:

apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
    - name: example1
      image: example/example1
      resources:
        requests:
          memory: "128Mi"
          cpu: "250m"
        limits:
          memory: "256Mi"
          cpu: "500m"

28. Kubelet Explained: What Is Kubelet?

Kubelet is the node agent that manages container lifecycle on a node. It registers the node with the control plane, reports resource and health status, and ensures containers specified in Pod specs run and stay running.

Example Deployment manifest showing resource limits for a Pod:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: my-nginx
          image: nginx:latest
          resources:
            limits:
              memory: "125Mi"
              cpu: "750m"
          ports:
            - containerPort: 80

29. StatefulSet vs Deployment: Explain the Difference Between a StatefulSet and a Deployment.

StatefulSet manages pods that require stable network identities and persistent storage. It enforces ordered startup, scaling, and termination, and provides stable persistent volume claims per replica. Deployment manages stateless replica pods that are interchangeable, supports rolling updates and rollbacks, and focuses on replication and availability rather than stable identity.

30. kube-proxy Role: What Is the Role of the Kube-Proxy in Kubernetes and How Does It Facilitate Communication Between Pods?

kube-proxy programs network rules on each node to allow Services to route traffic to backing Pods. It watches Service and Endpoints objects and updates iptables, IPVS, or user space rules so that traffic sent to a Service IP or node port reaches the correct Pod endpoints, enabling service discovery and load distribution within the cluster.

Related Reading

  • Cybersec
  • Git Interview Questions
  • Front End Developer Interview Questions
  • DevOps Interview Questions And Answers
  • Leetcode Roadmap
  • Leetcode Alternatives
  • System Design Interview Preparation
  • Ansible Interview Questions
  • Engineering Levels
  • jQuery Interview Questions
  • ML Interview Questions
  • Selenium Interview Questions And Answers
  • ASP.NET MVC Interview Questions
  • NodeJS Interview Questions
  • Deep Learning Interview Questions
  • LockedIn

23 Intermediate Kubernetes Interview Questions and Answers


1. Kubernetes Networking: How Cluster Communication Works

Kubernetes gives every Pod its own IP address so Pods can talk directly to one another without NAT. A Container Network Interface plugin provides that flat IP space. Services then present stable endpoints for changing Pod sets by mapping a single Cluster IP to backend Pod IPs.

kube-proxy on each node implements Service forwarding rules using iptables or IPVS, and Ingress controllers handle north-south HTTP and HTTPS routing. Service mesh layers add observability and policy between services when you need fine-grained control.

Why This Matters

Predictable Pod addressing and stable Service endpoints make service discovery and horizontal scaling simple, while the CNI choice and kube-proxy mode affect latency, throughput, and operational complexity.

2. RBAC Deep Dive: Permissions That Protect Your Cluster

Role-based access control uses Role or ClusterRole objects to define allowed verbs on resources within API groups. RoleBindings and ClusterRoleBindings attach those roles to Users, Groups, or ServiceAccounts. Apply least privilege: give service accounts the minimum verbs and resources they require. Audit with kubectl auth can-i to test permissions, and use aggregated ClusterRoles for cluster-wide policies.

Example read-only role:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "watch", "list"]

Bind it to a user:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
subjects:
  - kind: User
    name: dummy
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

Why This Matters

RBAC enforces multi-tenant isolation and prevents privilege escalation. Protect system namespaces, rotate credentials, and restrict who can create ClusterRoleBindings to stop accidental cluster-wide access.

3. Autoscaling: Horizontal, Vertical, and Cluster Level Strategies

Kubernetes autoscaling has three layers. Horizontal Pod Autoscaler scales replica counts using CPU, memory, or custom metrics. Vertical Pod Autoscaler adjusts requests and limits for a Pod. Cluster Autoscaler changes node counts based on unschedulable Pods and underutilized nodes.

Example HPA:

kubectl autoscale deployment nginx --cpu-percent=50 --min=1 --max=10

Operational Details

HPA reacts to metrics through the metrics server or external adapters; tune stabilization windows to avoid flapping. VPA and HPA can conflict; use VPA in recommendation or update mode, not always in auto update with HPA. Cluster Autoscaler must honor Pod Disruption Budgets and node taints. Plan scaling for startup latency, Pod scheduling constraints, and stateful workloads.
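
The stabilization windows mentioned above are tuned through the HPA's behavior stanza in autoscaling/v2; the values in this fragment are illustrative, not recommendations:

```yaml
# fragment of a HorizontalPodAutoscaler spec (autoscaling/v2)
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # require 5 min of low metrics before scaling down
    policies:
      - type: Pods
        value: 2                     # remove at most 2 Pods per 60s period
        periodSeconds: 60
  scaleUp:
    stabilizationWindowSeconds: 0    # react immediately to load spikes
```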

4. Debugging Pods: Practical Commands and Techniques

Start with kubectl logs to read container output and kubectl describe pod to inspect events and status conditions. Use kubectl exec to run commands inside a live container or kubectl cp to extract files. For transient failures, use kubectl get pods --field-selector=status.phase=Failed. When you cannot exec, use kubectl debug to spawn an ephemeral container with elevated tooling.

Common failure modes:

  • Image pull errors
  • CrashLoopBackOff from OOM kills
  • Readiness probe failures preventing traffic

Check node conditions and kubelet logs for resource pressure. When debugging network issues, examine CNI plugin logs and use tcpdump inside a debug container.

5. Rolling Updates and Rollbacks: Controlled Change Management

Deployments support rolling updates using maxSurge and maxUnavailable to control replacement pace. The controller creates new Pods, waits for readiness probes, then removes old Pods. Use readinessProbe to avoid sending traffic before the Pod is ready. For fast experimentation, pick a small maxUnavailable and a larger maxSurge for smoother rollouts.

Commands:

  • kubectl set image deployment/my-deployment nginx=nginx:1.21
  • kubectl rollout status deployment my-deployment
  • kubectl rollout undo deployment my-deployment

Advanced Patterns

Use canary releases or blue-green for staged validation, and track revision history with kubectl rollout history. Keep progressDeadlineSeconds and revisionHistoryLimit tuned for rollback hygiene.

6. Ingress: HTTP and HTTPS Routing at the Edge

An Ingress object declares rules for host and path-based routing and delegates enforcement to an Ingress controller such as NGINX, Traefik, Contour, or cloud providers. Ingress can terminate TLS and rewrite paths. Use IngressClass and annotations to select controller-specific features like health checks or SSL redirect.

Example:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
spec:
  rules:
    - host: my-app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service
                port:
                  number: 80

Why This Matters

Ingress centralizes routing for many services, simplifies certificate management with cert-manager, and reduces the number of external load balancers you run.

7. Resource Requests, Limits, and QoS Classes

Requests reserve CPU and memory for scheduling. Limits cap runtime usage. Kubernetes assigns QoS classes based on requests and limits: Guaranteed when requests equal limits for all containers, Burstable when requests are set but do not match limits, and BestEffort when no requests are set.

Example:

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"

Operational Guidance

Set requests so the scheduler places Pods properly and set limits to prevent noisy neighbors. Use LimitRange objects to enforce defaults in a namespace and resource quotas to guard cluster capacity.
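
A LimitRange sketch that applies namespace defaults as described above; the namespace name and values are illustrative:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: dev           # assumed namespace
spec:
  limits:
    - type: Container
      default:             # applied when a container sets no limits
        cpu: "500m"
        memory: "256Mi"
      defaultRequest:      # applied when a container sets no requests
        cpu: "250m"
        memory: "128Mi"
```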

8. Resource Overruns and Init Containers: What Happens and When to Use Init Containers

If a container exceeds its memory limit, the kernel triggers an OOM kill, and the container restarts according to its restartPolicy. If CPU usage exceeds its limit, the container is throttled rather than killed, which slows processing but avoids restart loops.

Init containers run to completion before application containers start. Use them to wait for dependencies, initialize volumes, or run migrations. They run sequentially and have separate images and resource settings, so they are ideal for boot-time checks or one-time setup tasks.

9. Pod Disruptions and High Availability Patterns

Pod Disruption Budgets declare how many Pods must remain available during voluntary disruptions. Anti-affinity and topology spread constraints force Pods to distribute across failure domains. Use Taints and Tolerations to control scheduling on special nodes.

Example PDB:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app

Practical Considerations

PDBs block disruptive actions until availability is satisfied, which affects rolling upgrades and node autoscaling. Combine them with readiness probes and graceful termination to maintain user experience.
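
Graceful termination can be sketched in a Pod spec; the preStop sleep is a hypothetical drain hook, and the image name is a placeholder:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-app
spec:
  terminationGracePeriodSeconds: 60   # time allowed between SIGTERM and SIGKILL
  containers:
    - name: app
      image: my-app:1.0               # placeholder image
      lifecycle:
        preStop:
          exec:
            # give load balancers time to remove this endpoint before shutdown
            command: ["sh", "-c", "sleep 10"]
```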

10. ConfigMap: Injecting Non-Confidential Configuration

ConfigMaps hold configuration data that Pods can consume as environment variables or mounted files. They let you separate configuration from images and update settings without a rebuild.

Pod example snippet:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
    - name: container-name
      image: image
      volumeMounts:
        - name: volume-name
          mountPath: /etc/configmap
  volumes:
    - name: volume-name
      configMap:
        name: configmap-name

Best Practice

Use checksum annotations on Deployments to force rollout when a ConfigMap changes, and prefer Secrets for sensitive data.
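One common way to implement the checksum trick is with Helm templating; the template path below is chart-specific and illustrative:

```yaml
spec:
  template:
    metadata:
      annotations:
        # A ConfigMap change alters this hash, which changes the Pod template
        # and therefore triggers a rolling update of the Deployment
        checksum/config: '{{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}'
```

Without Helm, kubectl rollout restart deployment/<name> achieves the same effect manually.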

11. Namespaces: Organizing and Isolating Workloads

Namespaces partition cluster resources and help teams share a cluster without name collisions. Use namespaces to scope RBAC, apply network policies, and attach ResourceQuota and LimitRange objects to control consumption.

Operational Tips

Avoid running everything in the default namespace. Create namespaces per team, environment, or application, and use kubectl config set-context to switch defaults when working across namespaces.

12. TLS with Ingress: Secure Traffic at the Edge

Add spec.tls entries and reference a Secret that holds the certificate and key. Use cert-manager to automate certificate issuance and renewal with ACME providers.

Example TLS block:

spec:
  tls:
    - hosts:
        - some_app.com
      secretName: someapp-secret-tls

Consider TLS termination at the Ingress for centralized certificate management, and enable strict transport security headers via controller annotations.

13. Make That Ingress: Complete Configuration Example

Working Ingress manifest using the current API:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: someapp-ingress
spec:
  rules:
    - host: my.host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: someapp-internal-service
                port:
                  number: 8080

Use IngressClass to bind to a specific controller and add annotations for health checks or path rewrites when needed.

14. Expose a Service Externally: Load Balancer and NodePort Options

To expose the service externally, set type LoadBalancer and optionally a nodePort. On cloud platforms, a LoadBalancer Service provisions a cloud load balancer. For on-prem clusters, use MetalLB or another external load balancer implementation.

Example:

spec:
  selector:
    app: some-app
  type: LoadBalancer
  ports:
    - protocol: UDP
      port: 8080
      targetPort: 8080
      nodePort: 32412

Also consider ExternalName for simple DNS mapping, and use annotations to control cloud provider-specific load balancer behavior.

15. etcd: The Cluster State Store and Why It Matters

etcd is the consistent key-value store that holds cluster state, including object definitions and status. Kubernetes components watch etcd for changes and react. Keep etcd healthy with regular backups, secure it with TLS and RBAC, and size the cluster for write throughput and snapshot frequency.

Operational Notes

etcd is sensitive to latency and disk IO. Run it on dedicated control plane nodes, monitor quorum and lag, and test restore procedures frequently.

16. How Rolling Updates Work Inside a Deployment

A rolling update creates new Pods while terminating old ones incrementally according to maxSurge and maxUnavailable settings. The controller waits for Pods to pass readiness checks before terminating the old replica. Use progressDeadlineSeconds to detect stalled rollouts and revisionHistoryLimit to bound storage of older versions.

When you need immediate replacement, use the Recreate strategy, but expect short downtime. For stateful applications, prefer StatefulSets, which manage identity and persistent storage differently.
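The knobs above fit together in a Deployment spec like this sketch; names and the image tag are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 4
  revisionHistoryLimit: 5        # bound how many old ReplicaSets are kept
  progressDeadlineSeconds: 300   # mark the rollout stalled after 5 minutes
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most one extra Pod during the rollout
      maxUnavailable: 0    # never drop below the desired replica count
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:2.0
```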

17. What Is a Namespace: A Virtual Sub Cluster

A Namespace gives logical separation inside a cluster. Resources such as Pods, Services, and ConfigMaps live inside a namespace. System components use specific namespaces like kube-system and kube-public, and workloads generally belong in custom namespaces.

Commands:

  • kubectl create namespace my-team
  • kubectl get pods --namespace my-team
  • kubectl config set-context --current --namespace=my-team

Namespaces make policy application and billing attribution clearer when many teams share a cluster.

18. Labels and Selectors: The Glue Between Objects

Labels are key-value pairs attached to objects. Selectors pick sets of objects by matching label expressions. Use matchLabels for simple equality and matchExpressions for set-based logic. Labels drive Service backends, Deployment selectors, and monitoring targets.

Caveat

Deployment selectors are immutable after creation. Ensure the template labels match the selector to avoid orphaned Pods and unintended behavior.
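A set-based selector sketch; the label keys environment and canary are illustrative:

```yaml
selector:
  matchLabels:
    app: my-app
  matchExpressions:
    # match Pods in either environment
    - key: environment
      operator: In
      values: ["staging", "production"]
    # exclude Pods carrying a canary label
    - key: canary
      operator: DoesNotExist
```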

19. Kube Proxy: How Services Get Implemented on Nodes

Kube proxy watches Service and Endpoint objects and programs node-level networking rules. It can operate in iptables or IPVS mode; IPVS offers better performance and smoother connection handling for large-scale clusters. EndpointSlice reduces watch volume by grouping endpoints.

When Service traffic grows, the proxy mode and host-level networking configuration influence latency and throughput. Consider external load balancing or accelerated service mesh data planes for complex routing needs.

20. Persistent Volumes: Durable Storage for Containers

A PersistentVolume represents storage provisioned either statically or dynamically by a StorageClass and bound to a PersistentVolumeClaim. PVs include a capacity, accessModes such as ReadWriteOnce, ReadOnlyMany, and ReadWriteMany, and a reclaimPolicy such as Retain, Delete, or Recycle.

Example PV:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: mypv
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: "/mnt/data"

Use CSI drivers for cloud and on-prem storage, enable snapshots and resizing for operational flexibility, and choose WaitForFirstConsumer binding when topology matters for scheduling.

21. DaemonSet versus ReplicaSet: Different Scaling Models

A ReplicaSet ensures a desired number of identical Pods run across the cluster and is ideal for stateless, scalable services. A DaemonSet runs exactly one Pod on selected nodes and is appropriate for node-level agents like log collectors, monitoring agents, or network plugins.

When you need per-node functionality, use a DaemonSet; when you need to scale based on traffic, use a ReplicaSet or, better, a Deployment that manages one.
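A minimal DaemonSet sketch for a node-level log agent; the image, label values, and host paths are assumptions, not a prescribed setup:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      tolerations:
        # also run on control plane nodes, if desired
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      containers:
        - name: collector
          image: fluent/fluent-bit:2.2
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
```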

22. Cross Node Pod Communication: How Pods Talk Across Machines

Pods use the cluster network to address each other directly across nodes thanks to the CNI establishing routes or overlays. Services provide stable cluster IPs, and headless Services or DNS let clients resolve Pod endpoints for direct connections. NetworkPolicies restrict cross-Pod traffic by selecting Pods and specifying allowed ingress or egress rules.

When you see intermittent failures, check MTU, overlay encapsulation overhead, CNI health, and node routing tables. Apply NetworkPolicy to lock down traffic between namespaces and limit the blast radius.

23. Why Kubernetes: Key Benefits for Production Systems

Kubernetes orchestrates containers automatically, so you get automated scheduling, self-healing, rolling updates, and service discovery. It supports autoscaling, load balancing, persistent storage, and secrets management while integrating with CI/CD and observability tools. Extensibility through CRDs and a broad ecosystem makes it suitable for multi-cloud and hybrid cloud deployments, and RBAC plus network policies provide enterprise-grade security controls.

38 Advanced Kubernetes Interview Questions and Answers

1. StatefulSets vs Deployments: Stable Identity and Ordered Lifecycle for Stateful Apps

StatefulSets provide stable network identities, stable persistent storage, and ordered create/terminate semantics for Pods. Use them when replicas must keep a stable hostname, retain per-pod storage, or start and stop in sequence. Deployments manage stateless workloads with interchangeable Pods and fast rolling updates.

Trade Offs

StatefulSets limit pod replacement parallelism and complicate upgrades, while Deployments favor availability and simple scaling. For databases prefer StatefulSet plus a dynamic PersistentVolumeClaim template and careful readiness probes.
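A sketch of the StatefulSet-plus-volumeClaimTemplates pattern; names, image, and sizes are illustrative:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db-headless    # headless Service gives stable DNS names db-0, db-1, ...
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:       # one PVC per Pod, retained across restarts
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```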

2. Service Mesh Explained: Consistent Service-to-Service Features Without Code Changes

A service mesh inserts a network layer between services to handle traffic management, mutual TLS, tracing, and metrics. It delivers L7 routing, retries, circuit breaking, and identity for services. Popular options are Istio, Linkerd, and Consul.

Trade Offs

Istio is feature rich but complex and heavier on control plane resources; Linkerd is lighter and easier to operate but has fewer advanced policy features. Use sidecars and mTLS to centralize crosscutting concerns and reduce per-application changes.

3. Cluster Hardening Checklist: Four-Layer Security Model and Best Practices

Apply security across cloud provider, control plane, container image, and application code. Use cloud IAM and VPC firewall rules, enable RBAC and audit logging, lock down the API server with authentication and admission controls, scan images and use nonroot users, and manage secrets via sealed secrets or an external vault. Add network policies, pod security admission, and image provenance signing. Balance strict policies with developer productivity and automate validation with admission webhooks.

4. Taints and Tolerations: How to Repel and Allow Pods on Nodes

Taints mark nodes to repel Pods unless those Pods have matching tolerations. Use taints to reserve nodes for special workloads or to isolate noisy hardware. Add tolerations in PodSpec when a workload must run on tainted nodes.

Example Taint

kubectl taint nodes node1 key1=value1:NoSchedule

Trade offs: too many taints make scheduling complex; prefer node affinity for positive targeting and taints for negative enforcement.
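A Pod that should still land on the tainted node carries a matching toleration in its spec:

```yaml
spec:
  tolerations:
    - key: "key1"
      operator: "Equal"   # use Exists to tolerate any value for this key
      value: "value1"
      effect: "NoSchedule"
```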

5. Sidecar Containers: Local Helpers That Share a Pod’s Lifecycle

Sidecars run in the same Pod as the main container and share volumes and the network namespace. Use them for logging collectors, proxies for service mesh (Envoy), configuration syncers, or health exporters. Sidecars simplify observability and security without changing app code. Watch resource contention and restart behavior; use lifecycle hooks and init containers to coordinate startup.

6. Three Common Pod Failure Modes: How to Spot and Fix Pending, CrashLoopBackOff, ImagePullBackOff

Pending: check node capacity, unsatisfiable nodeSelector, or missing PVCs. Use kubectl describe pod and look at events. CrashLoopBackOff:

  • Inspect container logs
  • Misconfigured command
  • Readiness probe failures

Fix by correcting config and adding liveness/readiness probes.

ImagePullBackOff

Verify image name, tag, and registry credentials. Collect events and image pull logs for root cause. Instrument probes and resource requests to reduce noisy failures.

7. Mutating Admission Webhooks: Dynamically Change API Requests Before Persistence

A mutating webhook intercepts create or update API calls and can modify resource manifests before they are stored. Common uses: auto-inject sidecars, set defaults, inject security context, or attach required labels. Webhooks must be secure, highly available, and versioned. Keep the webhook fast and idempotent; fail closed or open according to risk tolerance. Test webhooks in a canary namespace before cluster-wide rollout.

8. Zero-Downtime Deployments: Rolling, Canary, and Blue Green With Readiness Control

Use Kubernetes rolling updates to replace Pods gradually and readiness probes to avoid sending traffic to unready replicas. For safer rollouts, use Canary Deployments with traffic splitting in the service mesh or ingress controller.

Blue-green switches traffic between two environments for a simple cutover, but requires double capacity. Design readiness and liveness probes carefully, ensure database migrations are backward compatible, and automate rollback criteria in CI/CD.

9. CRDs: Extend the Kubernetes API With Custom Resource Types

Custom Resource Definitions add new object kinds to the cluster so kubectl and the API can manage domain objects like BackupJobs or ModelVersions. Use CRDs when you need a Kubernetes-native lifecycle for a higher-level abstraction and want to build operators.

Include validation schema and additional printer columns. Validate CRD changes in a test cluster and plan upgrade paths for versioned CRD conversions.

10. Operators: Application Controllers That Encode Operational Knowledge

Operators combine CRDs with a controller that runs a reconciliation loop to automate deployment, scaling, backup, and recovery of complex applications. They encode runbooks as code.

Build operators when manual operations are error-prone or when you need self-healing application logic.

Trade-Offs

Operators centralize complexity but add another codebase and operational surface. Use controller-runtime or operator-sdk and design idempotent reconciliation loops.

11. Horizontal Pod Autoscaler Internals: Metrics Driven Scaling Loop

HPA watches resource or custom metrics and adjusts replica counts to meet targets. It runs a control loop that polls metrics, computes desired replicas, and calls the scale subresource. v2 supports multiple metric types and custom metrics. Common pitfalls:

  • Not exporting the correct metrics
  • Missing metrics adapter
  • Insufficient headroom for burst traffic

Combine HPA with Cluster Autoscaler and resource requests to avoid unschedulable Pods.
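A minimal autoscaling/v2 manifest targeting CPU utilization; the Deployment name and thresholds are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale to keep average CPU near 70% of requests
```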

12. Custom Resources: How They Behave and How You Use Them With Kubectl

Once a CRD is installed, users create custom resource instances and manage them with standard tools. Custom resources let developers model domain concepts inside Kubernetes and use native RBAC and audit trails. Keep CRD schemas strict to catch invalid configs and use conversion webhooks for API evolution. Consider CRD storage size and avoid unbounded status fields.

13. Port forwarding chain: Container 8080 -> Service 8080 -> Ingress 8080 -> Browser 80

Configure the Service to target container port 8080 and create an Ingress that routes host traffic on port 80 to the service port 8080. Confirm an ingress controller is deployed and bound to nodes. Example service spec:

  • port: 8080, targetPort: 8080 in the Service spec

Check ingress controller logs and service endpoints when traffic does not reach the Pod.

14. External Connectivity Options: NodePort, LoadBalancer, Ingress, and Proxies

Expose pods externally using NodePort, cloud LoadBalancer, or an Ingress resource with a controller for L7 routing. For smaller setups use kubectl proxy or port-forward for ad hoc access. Each approach balances usability, security, and cost: NodePort is simple but exposes node ports, LoadBalancer integrates with cloud providers but costs money, and Ingress centralizes routing and TLS termination.

15. Run a Pod on a Specific Node: nodeName, nodeSelector, and nodeAffinity choices

Use nodeName for a hard bind to a node. Use nodeSelector for simple label matching and nodeAffinity for expressive rules and soft preferences. Prefer nodeAffinity over nodeSelector for future flexibility. Avoid nodeName when you need portability; use affinity and taints for scalable scheduling policies.
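A sketch combining a hard requirement with a soft preference; the disktype label and zone value are illustrative:

```yaml
spec:
  affinity:
    nodeAffinity:
      # hard rule: only schedule on nodes labeled disktype=ssd
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values: ["ssd"]
      # soft rule: prefer a specific zone when possible
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 50
          preference:
            matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values: ["us-east-1a"]
```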

16. Docker Swarm vs Kubernetes: Differences in Scope and Capabilities

Kubernetes provides a full control plane with declarative desired state, autoscaling, and a richer API surface. Docker Swarm is simpler to set up and easier for small clusters, but lacks advanced features like native autoscaling and a pluggable control plane. Kubernetes demands more operational work but gives stronger guarantees around scheduling, health checks, and extensibility with CRDs and operators.

17. Secret Reference in Deployment: Pulling Secrets Into Env Vars

An env entry with valueFrom.secretKeyRef maps an environment variable to a key in a Secret; the Pod populates USER_PASSWORD from the password key of some-secret. This keeps credentials out of manifests. When using this pattern, set RBAC to limit Secrets access and prefer projected secrets or external vaults for rotation.
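The snippet being described is not reproduced above; a minimal version consistent with the text would look like this:

```yaml
containers:
  - name: app
    image: my-app:1.0   # placeholder image
    env:
      - name: USER_PASSWORD
        valueFrom:
          secretKeyRef:
            name: some-secret
            key: password
```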

18. Custom Resource Recap: CRs as API Extensibility

A custom resource acts like any native Kubernetes object but models domain-specific concepts. They work through the API server, use the same RBAC, and support kubectl operations. Use custom resources to make higher-level automation and operator logic declarative.

19. StorageClass Fundamentals: Dynamic Provisioning and Storage Profiles

StorageClass defines provisioner, parameters, reclaim policy, and volume binding mode. PVCs request a StorageClass to dynamically provision PersistentVolumes. Choose a reclaim policy and binding mode to control reclaim behavior and scheduling. Test performance and throughput of a storage class under realistic load for stateful workloads.

20. Controllers Explained: Control Loops That Enforce Desired State

Controllers watch objects and take actions to reconcile the current state to the desired state. Examples include ReplicaSet, Deployment, StatefulSet, and DaemonSet. Build custom controllers for domain automation; ensure leader election for high availability, and implement exponential backoff for transient errors.

21. Deployment Rollout Strategies: Rolling, Recreate, and Blue Green Patterns

RollingUpdate replaces pods incrementally to maintain availability. Recreate tears down old pods before creating new ones, and it can be simpler for non-backward-compatible changes. Blue green runs two environments and swaps traffic; it requires extra capacity. Choose a strategy based on downtime tolerance, database migration risks, and testing needs.

22. CRD Deeper Use Cases: Extend Kubernetes and Pair With Custom Controllers

CRDs let you introduce concepts like CertificateRequests or BackupPlans. Use them with controllers to run reconciliation loops and maintain application invariants. Validate CRDs with OpenAPI v3 schema and provide a status subresource for controllers to report state. Plan CRD versioning and conversion to avoid breaking upgrades.

23. Cluster Security and Access Control: Features and Hardening Practices

Use RBAC with least privilege, enable audit logging, enable network policies, and enforce Pod Security Admission or Pod Security Standards. Use separate service accounts for controllers and restrict token lifetimes. Regularly rotate credentials and automate policy enforcement with admission webhooks. Monitor audit logs and integrate with SIEM for suspicious activity.

24. NetworkPolicy Mechanics: Intent-Based Network Controls for Pods

NetworkPolicy allows you to define which pods can talk to other pods and endpoints. Policies are namespaced and can be applied to ingress and egress. Remember that behavior depends on the CNI plugin; some default deny behavior requires at least one policy per direction. Use policies to isolate workloads and reduce the blast radius from compromised pods.
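A common pattern is a namespace-wide default deny followed by narrow allows; the labels and namespace name here are illustrative:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: my-namespace
spec:
  podSelector: {}          # empty selector matches every Pod in the namespace
  policyTypes:
    - Ingress
    - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: my-namespace
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```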

25. Helm Charts: Packaging, Templating, and Release Management

Helm charts group Kubernetes manifests and templates with values for customization. Charts enable versioned application installs, parameter overrides, and dependency management. Use chart best practices:

  • Keep values organized
  • Validate rendered manifests
  • Store charts in an artifact repository

Helm simplifies CI/CD, but keep secrets out of values.yaml or use sealed secrets.

26. Taints and Tolerations Review: Repel Then Allow Scheduling

Taints mark nodes so only Pods that tolerate them can schedule. Tolerations do not guarantee placement; they only allow Pods to be considered. Use node affinity for positive placement and taints for exclusion. Configure tolerations with operator Exists or Equal depending on matching needs.

27. Init Containers: Setup Steps Before Your App Starts

Init containers run sequentially before app containers. Use them to perform migrations, fetch secrets, set file permissions, or perform one-time initialization. They run to completion and can use the same volumes as app containers. Keep init containers lightweight and idempotent to avoid blocking Pod startup.

28. Service Types Overview: ClusterIP, NodePort, LoadBalancer, ExternalName

ClusterIP exposes a service inside the cluster only. NodePort opens a static port on all nodes for external access. LoadBalancer provisions a cloud load balancer and routes traffic to the service. ExternalName maps a service to an external DNS name. Choose the type that fits availability, exposure, and cost requirements.

29. Custom Operator Concept: Encoding Operational Playbooks as Controllers

Custom operators combine CRDs and controllers to perform lifecycle management for specific applications. They automate creation, failover, upgrades, and backups. When building an operator, split responsibilities between reconciliation, status reporting, and finalizers for cleanup. Use strong test coverage and e2e tests to validate operator behavior under failure.

30. Control Plane Internals: Components That Manage Cluster State

The control plane consists of the API server, etcd, controller manager, and scheduler. The API server serves paths for objects, etcd stores the canonical cluster state, controllers reconcile declared state, and the scheduler places new pods. Separate control plane components onto dedicated nodes or managed services to improve reliability.

31. kube-apiserver Purpose: The Gateway to Cluster State and Validation

kube-apiserver exposes the REST API used by users and components. It authenticates and authorizes requests, validates objects, and serves watch streams to controllers. All writes persist to etcd through the API server. Secure it with TLS, limit external access, and enable audit logs for forensic analysis.

32. Rolling Back Deployments: kubectl Rollout Undo Patterns

Use kubectl rollout undo deployment/<name> to revert to the previous ReplicaSet or use --to-revision to pick a specific revision. Ensure deployments keep revision history (revisionHistoryLimit) and test rollback paths in CI to avoid incompatible states. Pair rollbacks with automated health checks to detect regressions.

33. Pod Disruption Budgets: Protect Availability During Maintenance

A Pod Disruption Budget sets the minimum number or percentage of replicas that must stay available during voluntary disruptions. Use PDBs to guide automated activities like node drain and cluster autoscaler. Avoid overly strict PDBs that block legitimate cluster operations; set them relative to application criticality.

34. kube-controller-manager role: Running Built-In Control Loops

kube-controller-manager runs multiple controllers that reconcile resources such as replication, endpoints, and namespaces. It performs work by watching the API server state and issuing changes. Configure leader election for HA and monitor controller loop latencies to detect performance issues.

35. kube-apiserver Repeated Role Clarification: Frontend, Validation, and Communication Hub

The API server validates and persists resource definitions, serves the Kubernetes API surface, and coordinates component interactions. Its performance and availability determine overall cluster responsiveness.

36. Node Failure Handling and Resiliency: Detection and Automatic Recovery

kubelet heartbeat and node controllers mark nodes NotReady when they fail. The scheduler and controllers create replacement Pods on healthy nodes to maintain replica counts. Use eviction thresholds, pod anti-affinity, and multi-zone node pools to improve fault tolerance. Test chaos scenarios to verify application resilience.

37. RBAC Setup: Roles, RoleBindings, and Best Practice Workflow

Define coarse role models, then map users or service accounts to Roles or ClusterRoles via RoleBinding or ClusterRoleBinding. Use least privilege, group-based bindings, and review bindings regularly. Automate RBAC tests, and grant short-lived elevated access for troubleshooting using an approval flow.

38. Cloud Controller Manager: Cloud Integration and Node Lifecycle Tasks

Cloud controller manager isolates cloud provider interactions. It manages node lifecycle tasks, load balancer provisioning, and route management via cloud APIs. Use this component to keep cloud-specific logic out of core controllers and to enable multi-cloud portability. Monitor cloud API rate limits and credential expiry to prevent unexpected failures.

39. Essential Kubectl Commands: Quick Operational Toolbox

Common kubectl commands to remember: kubectl api-resources, kubectl autoscale, kubectl annotate, kubectl cluster-info, kubectl attach, kubectl apply, kubectl rollout, kubectl edit, kubectl config use-context, and kubectl config current-context. Use kubectl explain to understand API fields and kubectl describe for event details.

40. Helm Explained: Package Manager, Templating, and Release Lifecycle

Helm packages Kubernetes manifests into charts with templating and values for parameterization. Charts enable repeatable installs, upgrades, and rollbacks. Use repositories for chart distribution, and treat Helm releases as part of your CI/CD. Keep secrets out of chart values or encrypt them with a secrets plugin.

25 Scenario-Based Kubernetes Interview Questions and Answers

1. Taints vs Node Affinity: When to Repel or Attract Pods

Quick distinction and how I think through placement:

  • Observe intent. Do I want to keep most pods off a node unless they opt in, or do I want to guide the scheduler toward certain nodes? If the former, use taints; if the latter, use node affinity.
  • Taints repel pods unless pods have matching tolerations. I verify by checking node taints: kubectl describe node <node-name> and look under Taints.
  • Node affinity attracts pods to nodes with specific labels. I inspect pod spec for affinity: kubectl get pod <pod> -o yaml and look under spec.affinity.nodeAffinity.
  • Use case reasoning. For strict enforcement, apply taints so that only qualified pods are scheduled. For placement preference, use node affinity so the scheduler prefers those nodes, but can place elsewhere if needed.
  • Diagnostic check. If pods land on unexpected nodes, compare node labels, taints, and pod tolerations to reconcile scheduler behavior.

2. Enforce GPU-only Scheduling: Label, Taint, and Tolerate for ML Workloads

A practical plan and verification steps:

  • Label GPU nodes to mark them: kubectl label nodes <node> gpu=true. I verify with kubectl get nodes --show-labels.
  • Taint GPU nodes to repel non-ML pods: kubectl taint nodes <node> dedicated=gpu:NoSchedule. I confirm with kubectl describe node <node>.
  • Update ML workload Pod specs: Add a toleration for dedicated=gpu:NoSchedule and a node affinity that selects gpu=true. Example checks: kubectl get deploy <ml-deploy> -o yaml.
  • Test scheduling: Deploy a test ML Pod with the toleration and affinity and confirm it lands only on GPU nodes.
  • Diagnostic steps if it fails: Check tolerations and affinity blocks in the pod yaml, ensure kube-scheduler logs show why placement failed, and ensure labels and taints match exactly.

3. CrashLoopBackOff Debugging: Stepwise Kubernetes Troubleshooting

How I approach repeated crashes:

  • Fetch logs: kubectl logs <pod> --previous if the container restarts frequently, otherwise kubectl logs <pod>.
  • Inspect events and pod status: kubectl describe pod <pod> and read the Events section for probe failures or image pull issues.
  • Check probes and startup sequence. Validate whether readinessProbe and livenessProbe settings might be killing the container too early.
  • Validate environment and config: Confirm ConfigMaps, Secrets, and environment variables are present and correct.
  • Look for OOMKilled and resource issues: kubectl get pod <pod> -o wide and kubectl describe pod <pod> for OOMKilled messages.
  • Run interactively to reproduce and inspect the runtime: kubectl exec -it <pod> -- /bin/sh or run a debug pod with the same image and entrypoint to iterate quickly.
  • If logs are empty, check image CMD or entrypoint and consider overriding to sleep so I can exec into the container.

4. Node Failure on AWS EKS: Expected Behavior and Checks

How I reason about failure and recovery:

  • Detect failure: kubelet stops reporting heartbeats, node enters NotReady. I see this via kubectl get nodes.
  • Controller behavior: Pods on the node become Unknown or show Terminating and then controllers like Deployments trigger scheduling elsewhere.
  • Autoscaling and replacement: EKS node groups or Auto Scaling Groups may create a replacement node. I check the ASG console and EKS node group status.
  • Availability safeguards: Use PodDisruptionBudgets to keep a minimum number of replicas available during evictions and replacements.
  • Diagnostics: Review kubelet and cloud provider logs, check AWS EC2 instance status, confirm drain behavior via kubectl describe node, and look for eviction events.

5. Replication Controllers, ReplicaSets, and Deployments: Roles and When to Use Each

Thought process for controller choice:

  • Identify required features. If you need rollouts and history, choose Deployment.
  • ReplicationController is legacy and effectively deprecated. I avoid using it in modern clusters.
  • ReplicaSet replaces ReplicationController and supports set-based selectors. A ReplicaSet manages pod replicas, but Deployments manage ReplicaSets and provide rollouts and rollbacks.
  • Diagnostic angle: When rollout issues occur, inspect underlying ReplicaSets and revision history with kubectl rollout history deployment <name> and kubectl get rs.

6. Rolling Back a Failing Deployment: Steps and Checks

Practical rollback workflow:

  • Trigger rollback to the last known good revision: kubectl rollout undo deployment <name>.
  • Inspect history: kubectl rollout history deployment <name> to pick a specific revision if needed.
  • After rollback, monitor pods and events: kubectl get pods --watch and kubectl describe pod to ensure health checks pass.
  • Harden process: tune probes and rollout settings so future rollouts fail fast and trigger automated rollback if configured.

7. Zero Downtime Deployments: Safe Rollout Configuration and Reasoning

A tactical checklist and validation:

  • Set a rolling update strategy that keeps availability: use strategy.rollingUpdate with maxUnavailable: 0 and maxSurge: 1, and confirm the Deployment spec includes these values.
  • Use readinessProbes so pods only receive traffic after they are ready, and use preStop hooks to drain in-flight requests.
  • Prefer canary or blue-green patterns for risk reduction. Tools like Argo Rollouts help enforce gradual traffic shifts.
  • Validate system behavior by observing service endpoints, watching metrics, and running smoke tests during rollout.

8. Securing Secrets on Kubernetes in AWS: Practical Safeguards

Steps to reduce secret exposure and validate security:

  • Enable encryption at rest for Kubernetes Secrets using AWS KMS and configure EKS accordingly.
  • Ensure TLS for in-transit encryption between components and for ingress traffic.
  • Apply RBAC so only authorized identities can access secrets. Audit roles and bindings with kubectl get rolebinding and kubectl get clusterrolebinding.
  • Use External Secrets Operator or AWS Secrets Manager to avoid storing plaintext secrets in etcd. Sync secrets from Secrets Manager or Parameter Store.
  • Verify access by attempting to read a Secret with an unprivileged service account to confirm RBAC denies access.

9. ResourceQuotas and LimitRanges: Control and Default Constraints

How to decide and verify quotas:

  • ResourceQuotas set namespace wide limits for CPU, memory, and object counts like pods. I list them with kubectl get resourcequota -n <ns>.
  • LimitRanges set defaults and maximums per container for requests and limits. I inspect them with kubectl get limitrange -n <ns>.
  • Diagnostic use: When pods fail to schedule due to missing requests, check LimitRanges for default values to ensure the scheduler has the correct resource data.
  • Operational plan: Combine both to prevent noisy neighbors and to shape cluster capacity usage.
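A sketch of the combined setup for a namespace; names and numbers are illustrative:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: team-defaults
  namespace: team-a
spec:
  limits:
    - type: Container
      default:             # applied as limits when a container sets none
        cpu: 500m
        memory: 512Mi
      defaultRequest:      # applied as requests when a container sets none
        cpu: 250m
        memory: 256Mi
```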

10. LoadBalancer vs Ingress in AWS: When to Use Each

Decision steps and verification:

  • Service type LoadBalancer provisions an AWS ELB per Service. I confirm created ELBs in the AWS console.
  • Ingress centralizes routing through a single ALB using the AWS Load Balancer Controller, routing many services via host or path rules.
  • Consider cost and limits. If you have many services, prefer Ingress to reduce the ELB count. If a service needs a dedicated IP or special networking, use LoadBalancer.
  • Troubleshoot routing by checking Service and Ingress status and controller logs.
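An Ingress routed through a single ALB might look like this sketch (the hostname and backend Service are placeholders; annotations assume the AWS Load Balancer Controller is installed):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb          # handled by the AWS Load Balancer Controller
  rules:
    - host: app.example.com      # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-service    # hypothetical backend Service
                port:
                  number: 80
```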

11. Restrict Cross-Namespace Traffic: NetworkPolicy Application and Testing

How I design and confirm network isolation:

  • Start with default deny rules and explicitly allow required ingress and egress using NetworkPolicies with Calico or Cilium.
  • Apply policies and then test from a pod in one namespace to a pod in another using curl or netcat.
  • Inspect effective policies with kubectl get networkpolicies -n <ns> and kubectl describe networkpolicy <name>.
  • If traffic is still allowed, verify the CNI plugin is installed and supports NetworkPolicy enforcement.
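The default-deny-then-allow pattern looks like this sketch (the namespace is a placeholder):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: team-a          # hypothetical namespace
spec:
  podSelector: {}            # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: team-a
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}    # allow ingress only from pods in this same namespace
```

Cross-namespace traffic into team-a is now denied unless a further policy explicitly allows it.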

12. API Server Down in EKS: Impact and Immediate Checks

What I check and how I reason about continuity:

  • Recognize limitations: When the API server is unreachable, you cannot create or modify Kubernetes objects.
  • Existing workloads keep running because kubelets continue managing pods on nodes.
  • Verify node-level operations and probe kubelet logs for local issues while the control plane is restored.
  • Once API connectivity returns, controllers reconcile state and may initiate changes; I watch for sudden scheduling events and make sure controllers do not spike activity unexpectedly.

13. Static Pods: Purpose and Node Level Management

How static pods behave and how I validate them:

  • Static pods live in /etc/kubernetes/manifests/ on a node, and kubelet manages them directly.
  • Use them for core node components in single-node or bootstrap scenarios. I check files on the node and kubelet logs for creation events.
  • They appear in the API server only as read-only mirror pods; no controller created them, and the kubelet alone manages their lifecycle.
  • For troubleshooting, inspect the manifest file, kubelet logs, and container runtime status on the node.
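A static pod is an ordinary Pod manifest dropped into the kubelet's manifest directory. This hypothetical example (name, image, and port are placeholders) would be started by the kubelet directly, with a mirror pod appearing in the API server:

```yaml
# Saved as /etc/kubernetes/manifests/node-exporter.yaml on the node
apiVersion: v1
kind: Pod
metadata:
  name: node-exporter
  namespace: kube-system
spec:
  containers:
    - name: node-exporter
      image: prom/node-exporter:v1.7.0   # placeholder image
      ports:
        - containerPort: 9100
```

Deleting the mirror pod via kubectl has no lasting effect; the kubelet recreates it as long as the manifest file exists.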

14. Configure Kubectl for an EKS Cluster: Command and Verification

Steps to connect and confirm access:

  • Run: aws eks update-kubeconfig --name <cluster-name> --region <region>
  • The command uses IAM credentials to retrieve the cluster endpoint and authentication data and updates kubeconfig.
  • Verify access with kubectl get nodes and kubectl get pods -n kube-system.
  • If you cannot connect, check AWS CLI credentials, IAM permissions, and cluster endpoint network access.

15. Monitoring and Logging Kubernetes Workloads on AWS: Observability Checklist

Recommended stack and validation steps:

  • Metrics: Deploy Prometheus and Grafana or use Amazon Managed Prometheus and Amazon Managed Grafana. Validate scrape targets and dashboards.
  • Logging: Forward logs with Fluent Bit to Amazon CloudWatch Logs and confirm logs appear in the correct log groups.
  • Tracing: Add AWS X-Ray or OpenTelemetry to services and ensure traces surface in the tracing backend.
  • Verify alerting through CloudWatch Alarms or Prometheus Alertmanager and test alert routes to pager or Slack.

16. Rolling Updates in Production: Safe Rollout and Monitoring

A safe update process and monitoring plan:

  • Configure Deployment with maxUnavailable: 0 and readinessProbes to avoid dropping traffic.
  • Trigger the rollout and watch pod readiness and metrics.
  • Monitor health with CloudWatch Alarms or Prometheus Alerts and ensure rollback actions are ready.
  • For gradual strategies, integrate Argo Rollouts or Flagger to shift traffic progressively and test canaries for correctness.
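For the gradual strategy, a minimal Argo Rollouts canary sketch might look like this (the image, weights, and pause durations are illustrative, not prescriptive):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 5
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-registry/my-app:v2   # placeholder image
  strategy:
    canary:
      steps:
        - setWeight: 20            # shift 20% of traffic to the new version
        - pause: {duration: 5m}    # hold and watch metrics before continuing
        - setWeight: 50
        - pause: {duration: 5m}
```

If metrics degrade during a pause, you abort and the Rollout returns traffic to the stable version.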

17. Register a Custom Resource Definition: Steps and Controller Design

How I register and validate a CRD:

  • Create the CRD YAML and apply it: kubectl apply -f crd.yaml.
  • Confirm creation with kubectl get crd and kubectl api-resources.
  • Deploy a controller that reconciles CR instances and validate by creating a custom resource and watching the controller logs.
  • Troubleshoot by checking CRD validation errors, API server logs, and controller RBAC permissions.
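A minimal CRD sketch (the group, kind, and schema fields are hypothetical) that could be applied with kubectl apply -f crd.yaml:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.example.com        # must be <plural>.<group>
spec:
  group: example.com               # hypothetical API group
  scope: Namespaced
  names:
    plural: backups
    singular: backup
    kind: Backup
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                schedule:
                  type: string     # e.g. a cron expression the controller interprets
```

After applying it, kubectl api-resources should list backups, and creating a Backup object gives the controller something to reconcile.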

18. Kubernetes API Server Role: Functions to Monitor and Protect

Key responsibilities and diagnostic checks:

  • The API server serves as the control plane entry, handling authentication, validation, and persisting objects to etcd.
  • It mediates communication between the scheduler, controllers, and kubelets, so I monitor latency and request rates.
  • When debugging control plane issues, check API server metrics, audit logs, and etcd health.
  • If requests fail, inspect authentication and admission controllers for errors and review API server flags for misconfiguration.

19. Feature Gates: Turning Optional Features On and Off

How I enable and validate experimental features:

  • Feature Gates toggle optional Kubernetes features. I set them via component flags, for example --feature-gates=EphemeralContainers=true.
  • Apply the flag on relevant components, such as the API server or kubelet, depending on the feature.
  • Verify the feature is active by reviewing component logs and trying the feature end-to-end.
  • If behavior is unexpected, confirm consistent flags across components and check Kubernetes version compatibility.

20. Production Grade Kubernetes for Fintech on AWS: Architecture and Controls

Architecture choices and operational checks:

  • Use EKS with Fargate for managed serverless workloads and EC2 node groups for workloads needing GPUs or special networking. Deploy across multiple AZs for resilience.
  • Use IAM Roles for Service Accounts for least privilege and the External Secrets Operator with AWS Secrets Manager for secret rotation.
  • Add a service mesh like Istio or Linkerd for mTLS and policy enforcement, and Calico for network policies.
  • Manage infrastructure as code with Terraform and GitOps with ArgoCD. Enforce security with CIS benchmarks, Pod Security Standards, and OPA Gatekeeper.
  • Observability: Prometheus, Loki, Grafana, and CloudWatch integration. Validate backups, disaster recovery, and run periodic security scans and penetration tests.

21. Debugging a Slow Application: Systematic Performance Triage

Stepwise diagnostic plan:

  • Start with metrics: run kubectl top pods --sort-by=cpu and kubectl top pods --sort-by=memory to find hot pods.
  • Describe the problematic pod: kubectl describe pod <pod-name> to check for throttling, restarts, or probe failures.
  • Inspect logs for timeouts and errors: kubectl logs <pod-name>.
  • Test network paths: kubectl exec -it <pod-name> -- ping my-database and kubectl exec -it <pod-name> -- curl http://my-service to measure latency.
  • Check node health and resource exhaustion: kubectl get nodes and kubectl describe node <node-name>.
  • If CPU throttling appears, increase requests and limits or rightsize pods and consider horizontal pod autoscaling.
  • Reproduce load in staging and iterate with profilers and tracing to pinpoint code-level hotspots.

22. Nginx Reachable but URL Fails: Network and Routing Checklist

How I methodically find the fault:

  • Confirm pod health: kubectl get pods -o wide and kubectl describe pod nginx-web to verify ready status.
  • Inspect Service mapping: kubectl describe service nginx-service to verify targetPort, port, and selector match container ports.
  • Ensure Service selects the pod by label and that endpoints exist: kubectl get endpoints nginx-service.
  • Check NetworkPolicies that might block traffic: kubectl get networkpolicies and kubectl describe networkpolicy <policy-name>.
  • Verify Ingress and DNS: kubectl describe ingress nginx-ingress and test curl to the external IP to see ALB or controller errors.
  • If using an ALB or ELB, check the AWS console for listener and target group health and ensure security groups allow HTTP traffic.

23. Deployment Fails After Upgrade: Fast Recovery and Investigation

Recovery workflow and root cause hunting:

  • Roll back immediately to restore service: kubectl rollout undo deployment my-app.
  • Inspect deployment history to see what changed: kubectl rollout history deployment my-app.
  • Gather logs from the failing pods: kubectl logs -l app=my-app and check image pull errors or runtime exceptions.
  • Check readiness and liveness probes for misconfiguration that could mark pods unhealthy.
  • Verify image tags and registry access to rule out image pull issues.

24. Microservice Cannot Reach External Database: Connectivity Troubleshooting

Steps to reestablish database connectivity:

  • Test reachability from inside a pod: kubectl exec -it <pod-name> -- nc -zv my-database.example.com 5432 and check whether the TCP connection succeeds (curl does not speak the database protocol, so test the socket directly).
  • Validate DNS resolution inside pods: kubectl exec -it <pod-name> -- nslookup my-database.example.com to confirm CoreDNS is resolving.
  • Check outbound policies: kubectl get networkpolicies and kubectl describe networkpolicy <policy-name> to see if egress is blocked.
  • Verify firewall and VPC routes in AWS for the cluster subnets to the database host and confirm security groups allow the connection.
  • If a NAT or egress gateway is used, validate its health and logs.

25. Pending Pods Due to Exhausted Resources: Scheduling Triage and Fixes

How I diagnose scheduler constraints and restore capacity:

  • Confirm the scheduling error: kubectl describe pod <pending-pod> will show messages like 0/3 nodes are available: 3 Insufficient cpu.
  • Check node resource usage: kubectl top nodes and kubectl describe node <node-name> to find exhausted nodes.
  • Identify heavy consumers across namespaces: kubectl top pods --all-namespaces and look for runaway processes.
  • Apply resource requests and limits on containers so the scheduler has accurate inputs and no single pod starves the cluster.
  • Free capacity by scaling down nonessential workloads: kubectl scale deployment <deployment-name> --replicas=0 and schedule batch work during off-peak times.
  • Add nodes or increase autoscaler limits to raise available cluster capacity and recheck pending pods until they schedule.
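Setting explicit requests and limits is the durable fix; a minimal sketch (pod name, image, and numbers are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: worker               # hypothetical pod
spec:
  containers:
    - name: worker
      image: busybox:1.36
      command: ["sleep", "3600"]
      resources:
        requests:            # what the scheduler uses to place the pod
          cpu: 250m
          memory: 256Mi
        limits:              # hard caps enforced at runtime
          cpu: 500m
          memory: 512Mi
```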

Related Reading

  • Coding Interview Tools
  • Jira Interview Questions
  • Coding Interview Platforms
  • Common Algorithms For Interviews
  • Questions To Ask Interviewer Software Engineer
  • Java Selenium Interview Questions
  • Python Basic Interview Questions
  • RPA Interview Questions
  • Angular 6 Interview Questions
  • Best Job Boards For Software Engineers
  • Leetcode Cheat Sheet
  • Software Engineer Interview Prep
  • Technical Interview Cheat Sheet
  • Common C# Interview Questions

Nail Coding Interviews with Interview Coder's Undetectable Coding Assistant − Get Your Dream Job Today

Grinding LeetCode for months to maybe pass one tech interview? There's a smarter way. Interview Coder is your AI-powered coding assistant for coding interviews, completely undetectable and invisible to screen sharing. While your classmates stress over thousands of practice problems, you'll have an AI assistant that solves coding challenges in real-time during your actual interviews.

Used by 87,000+ developers landing offers at FAANG, Big Tech, and top startups. Stop letting LeetCode anxiety kill your confidence. Join the thousands who've already taken the shortcut to their dream job. Download Interview Coder and turn your next coding interview into a guaranteed win.


Ready to Pass Any SWE Interview with 100% Undetectable AI?

Start Your Free Trial Today