Diving Into Kubernetes Diagnostics: My K8sGPT Journey as a Rookie
Greetings, Kubernetes enthusiasts! As a rookie to native Kubernetes—still finding my footing in logs and YAML—I was excited to dive into K8sGPT. This CNCF Sandbox project harnesses AI to scan Kubernetes clusters, diagnose issues, and explain them in simple English, weaving in SRE expertise with cutting-edge technology. I set it up on a local VMware VM running Ubuntu Server 24.04.3, and the experience was both educational and practical. In this blog, I’ll share what K8sGPT is, how I got it running, my hands-on experiences as a beginner, and an invitation for you to explore it too.
What is K8sGPT?
K8sGPT is a tool designed to simplify Kubernetes troubleshooting with a two-phase approach. Phase 1 runs built-in checks to detect issues, while Phase 2 enriches these findings with AI explanations using LLMs like OpenAI or self-hosted options. According to its website, it codifies SRE experience into analyzers, pulling relevant data and anonymizing sensitive details to provide actionable insights. Key features include:
- Filters: Predefined rules for resources like Pods, ConfigMaps, and Deployments.
- AI Enhancements: Offers natural language explanations and remediation steps.
- Extensibility: Supports integrations (e.g., Trivy) and custom analyzers.
- Flexibility: Works with hosted or self-hosted AI backends.
- CLI Focus: Simple commands for quick analysis.
It’s a great starting point for rookies like me to learn SRE practices and a flexible tool for seasoned engineers to customize. I followed the official documentation to set it up, which provided clear guidance on installation and configuration.
My Setup: A Local VMware Adventure
I set up K8sGPT on a local VMware VM with Ubuntu Server 24.04.3, using 32 cores, 20GB RAM, and 200GB storage—perfect for a beginner’s cluster. Here’s how I got started:
1. VM and Kubernetes Install:
   - Launched the VM and connected via SSH.
   - Installed Kubernetes.
   - Checked status with `kubectl get pods -A` and saw Cilium, CoreDNS, and other components running (e.g., `cilium-envoy-chwqk 1/1 Running`).
   - Enabled kubectl completion with `source <(kubectl completion bash)`.
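Condensed into a copy-pasteable form, the cluster checks above look like this (a sketch, assuming `kubectl` is already configured for the new cluster):

```shell
# Sanity-check the fresh cluster (assumes kubectl can reach it)
kubectl get nodes
kubectl get pods -A    # Cilium, CoreDNS, etc. should show Running

# Enable completion for this session, and persist it for new shells
source <(kubectl completion bash)
echo 'source <(kubectl completion bash)' >> ~/.bashrc
```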
2. K8sGPT Install:
   - Downloaded the binary: `curl -sLO https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.4.25/k8sgpt_amd64.deb`.
   - Installed it with `sudo dpkg -i k8sgpt_amd64.deb`.
   - Confirmed with `which k8sgpt` (`/usr/bin/k8sgpt`) and `k8sgpt version` (v0.4.25).
   - Enabled completion with `source <(k8sgpt completion bash)`.
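Put together, the K8sGPT install is just a few commands (pinned to the v0.4.25 release I used):

```shell
# Download the v0.4.25 amd64 Debian package and install it
curl -sLO https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.4.25/k8sgpt_amd64.deb
sudo dpkg -i k8sgpt_amd64.deb

# Verify the binary landed and reports the expected version
which k8sgpt      # /usr/bin/k8sgpt
k8sgpt version    # v0.4.25

# Enable completion for the current session
source <(k8sgpt completion bash)
```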
3. AI Configuration:
   - Started with OpenAI: `k8sgpt auth add --backend openai --model gpt-4o-mini`.
   - Tested self-hosted LocalAI later, though 20GB RAM struggled with larger models.
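As a sketch, wiring up the OpenAI backend and running a first AI-enhanced scan looks like this (`k8sgpt auth add` prompts for an API key if you don't supply one):

```shell
# Register OpenAI as the AI backend (prompts for an API key)
k8sgpt auth add --backend openai --model gpt-4o-mini

# Check which backends are configured
k8sgpt auth list

# Run an analysis with AI explanations
k8sgpt analyze --explain
```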
The process took about 25 minutes, with Kubernetes stabilization being the longest wait. My VM’s specs worked well, but RAM was a bottleneck for self-hosted AI—consider 32GB as suggested by the lab.
My Experiences: Insights and Challenges as a Rookie
Following the lab, I broke and fixed my cluster to test K8sGPT. Here’s what I learned:
The Wins: Making Sense of Chaos
- Phase 1 (Without AI): Ran `k8sgpt analyze`, which flagged unused ConfigMaps like `kube-root-ca.crt`. Created a broken pod (`kubectl run brokenpod --image=nginx:unknown_version`), and it caught “ErrImagePull” with details. The `-s` flag showed per-analyzer timings (e.g., the Pod analyzer took 26ms), which helped me understand performance.
- Phase 2 (With AI): Added `--explain`, and the AI explained the pod issue: “Failed to pull image due to invalid tag; use ’nginx:latest’.” I initially hit an error (“AI provider not specified”), but `k8sgpt auth` fixed it. Self-hosted LocalAI worked but was slower on my CPU-only setup.
- Filters and Rules: Listed active filters with `k8sgpt filters list`, which covered Pods, Deployments, and more. Peeked at the `pod.go` source to see how it detects errors like “ErrImagePull”.
- Fixing Issues: Patched the pod with `kubectl patch pod brokenpod -p '{"spec":{"containers":[{"name":"brokenpod","image":"nginx:latest"}]}}'`; afterwards, `k8sgpt analyze --filter Pod` confirmed no errors after a short wait.
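For reference, the whole break-and-fix loop above fits in four commands (the pod name and image tags are just the ones from my test):

```shell
# Break: create a pod with a tag that doesn't exist
kubectl run brokenpod --image=nginx:unknown_version

# Detect: let K8sGPT flag the ErrImagePull
k8sgpt analyze --filter Pod

# Fix: a pod's container image is mutable, so patch it in place
kubectl patch pod brokenpod -p \
  '{"spec":{"containers":[{"name":"brokenpod","image":"nginx:latest"}]}}'

# Verify: re-run the analysis; the error should clear shortly
k8sgpt analyze --filter Pod
```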
It turned my rookie struggles into manageable steps, with AI guiding me like a mentor.
The Gotchas:
- LocalAI is Resource-Heavy: With only 20GB RAM on my VM, loading models was slow, particularly larger ones. The documentation notes that language models need significant memory—ideally 32GB or more—and my experience confirmed it. On my modest setup, a hosted AI backend was the faster choice.
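If you still want to try the self-hosted route, pointing K8sGPT at LocalAI is short; this is a sketch assuming LocalAI’s OpenAI-compatible API is listening on localhost:8080 and the model name matches one your instance actually serves:

```shell
# Register a self-hosted, OpenAI-compatible backend
# (the base URL and model name depend on your LocalAI setup)
k8sgpt auth add --backend localai \
  --baseurl http://localhost:8080/v1 \
  --model <your-model-name>

# Use it explicitly for one analysis
k8sgpt analyze --explain --backend localai
```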
Wrapping Up: K8sGPT as a Rookie’s Best Friend
K8sGPT transforms Kubernetes troubleshooting with AI-driven clarity, making it accessible even for rookies like me. My VM-based trial showed its potential, though it needed some resource tweaks. I’m excited to keep exploring it in my learning journey.
Ready to try it? Dive into the official Getting Started Guide.
Happy clustering!
Bastiaan van der Bijl