When I first got eBPF installed, I’ll admit — I felt a little overwhelmed. There were all these tools with cryptic names like tcplife, profile-bpfcc, and opensnoop.
So instead of trying to learn everything at once, I decided to use eBPF the same way I troubleshoot problems: one symptom at a time.
In this post, I’ll walk you through four practical eBPF tools I now use regularly in Kubernetes clusters, with real examples of how they helped me.
Scenario 1: Tracking Down Slow Network Calls
One evening, a simple service-to-database request was dragging. Logs told me it was “timing out,” but not why. Was it DNS? Was the database just slow? Or was something else choking the connection?
That’s when I reached for tcplife-bpfcc.
Run this command on the node where the pod is scheduled:
sudo tcplife-bpfcc
It shows every TCP connection, how long it lived, and how much data moved.
Example output:
PID COMM IP SADDR DADDR DURATION_MS BYTES
1223 python3 4 10.42.0.15:45872 10.42.1.25:5432 3054 145
1244 curl 4 10.42.0.15:51132 142.250.72.14:443 123 2034
Here I could see my Python process making repeated connections to Postgres, each living for ~3 seconds before failing. It wasn’t DNS. It wasn’t CPU. It was a slow database response.
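If you already know which process you suspect, you can cut the noise down. A minimal sketch, assuming tcplife-bpfcc's -p filter and using the PID from the output above:
# only trace TCP sessions owned by the suspect Python process
sudo tcplife-bpfcc -p 1223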
Lesson learned: tcplife makes invisible network delays visible.
Scenario 2: Who’s Eating My CPU?
Another time, our cluster autoscaler was spinning up extra nodes even though workloads weren’t supposed to be CPU-heavy. Metrics just said “high CPU,” but they didn’t explain who the culprit was.
That’s where profile-bpfcc comes in — a CPU profiler that doesn’t need you to instrument your code.
Command:
sudo profile-bpfcc -F 99
This samples CPU stacks at 99Hz and can generate flame graphs if piped into visualization tools. Even just in raw output, it’s gold:
python3 app.py
-> handle_request
-> json.dumps
-> encode_basestring_ascii
Turns out, a JSON serialization loop was chewing cycles way more than expected. We optimized it, and the autoscaler calmed down.
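If you want the full flame graph view instead of raw stacks, fold the output and run it through Brendan Gregg's FlameGraph scripts. A rough sketch, assuming profile-bpfcc's -f (folded output) flag and that flamegraph.pl from the FlameGraph repo is available on the node:
# sample all CPUs at 99Hz for 30 seconds and emit folded stacks
sudo profile-bpfcc -F 99 -f 30 > stacks.folded
# render the folded stacks as an interactive SVG flame graph
./flamegraph.pl stacks.folded > cpu.svg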
Lesson learned: Profiling with eBPF means you don’t need a special agent inside every pod — you can see CPU usage across the whole node.
Scenario 3: Mysterious File Errors
A team complained that a service was “randomly failing” to read config files. Kubernetes ConfigMaps mounted fine, but somehow the app wasn’t seeing them consistently.
Instead of tailing logs and guessing, I used opensnoop-bpfcc:
sudo opensnoop-bpfcc
It prints every file open attempt in real time. Output looked like this:
PID COMM FD ERR PATH
1321 java -1 ENOENT /etc/config/settings.yaml
Boom. The app was trying to open a file that didn’t exist in the container image. This wasn’t Kubernetes’ fault — it was a bad config reference.
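On a busy node the firehose of every open() gets noisy fast, so I usually narrow it down. A quick sketch, assuming opensnoop-bpfcc's -x (failed opens only) and -p (single PID) options:
# show only failed opens from the Java process above
sudo opensnoop-bpfcc -x -p 1321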
Lesson learned: opensnoop turns vague “file not found” errors into exact paths and processes.
Scenario 4: Hunting Memory Leaks
Long-running pods sometimes consumed more memory than expected. To catch this, I tried memleak-bpfcc:
sudo memleak-bpfcc
It tracks allocations that haven’t been freed, grouped by process and allocation size.
Output:
PID COMM ALLOC_SIZE COUNT
1443 python3 512 1042
1443 python3 1024 256
This showed me that a Python process was leaking memory in large chunks. Sure enough, after checking the code, a poorly managed cache was to blame.
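One practical note: to watch user-space allocators like malloc, memleak generally has to be pointed at a specific process. A minimal sketch, assuming the -p flag and an interval argument that controls how often outstanding allocations are reported:
# report still-outstanding allocations from PID 1443 every 10 seconds
sudo memleak-bpfcc -p 1443 10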
Lesson learned: With eBPF, you don’t have to wait for OOMKills to happen — you can catch leaks as they grow.
Putting It Together in Kubernetes
Here’s the workflow I now follow:
- Find the node running the pod you care about:
kubectl get pod -o wide
- SSH into the node (or use kubectl debug node/...).
- Run the relevant eBPF tool:
tcplife-bpfcc → network slowdowns
profile-bpfcc → CPU profiling
opensnoop-bpfcc → file access issues
memleak-bpfcc → memory leaks
- Map PIDs to containers (there’s a fuller sketch after this list):
crictl ps
docker ps  # if using Docker
This gives you container-aware visibility into what’s happening under the hood.
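The PID-to-container mapping is the glue that ties a raw eBPF line back to a pod. Here’s a rough sketch of how I do it on a containerd node; <container-id> is a placeholder, and the grep assumes crictl inspect exposes the host PID in its JSON output, which has been true on the containerd clusters I’ve worked with:
# quickest check: a PID's cgroup path usually embeds its container ID
cat /proc/1223/cgroup
# or go the other way: list containers, then inspect one for its host PID
crictl ps
sudo crictl inspect <container-id> | grep -i '"pid"'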
Recap
In this post, we walked through four practical eBPF tools:
- tcplife-bpfcc for slow network calls.
- profile-bpfcc for CPU bottlenecks.
- opensnoop-bpfcc for file access issues.
- memleak-bpfcc for memory leaks.
Each one gave me visibility into problems that traditional logs and metrics completely missed.
In Part 3, we’ll go even further — combining these tools into real Kubernetes monitoring workflows. Think “debugging a slow microservice call end-to-end” or “catching a noisy neighbor draining resources.”