Part 2: Hands-On — Using eBPF Tools to Monitor Applications in Kubernetes

When I first got the eBPF tools installed, I’ll admit I felt a little overwhelmed. There were all these tools with cryptic names like tcplife-bpfcc, profile-bpfcc, and opensnoop-bpfcc.

So instead of trying to learn everything at once, I decided to use eBPF the same way I troubleshoot problems: one symptom at a time.

In this post, I’ll walk you through four practical eBPF tools I now use regularly in Kubernetes clusters, with real examples of how they helped me.

Scenario 1: Tracking Down Slow Network Calls

One evening, a simple service-to-database request was dragging. Logs told me it was “timing out,” but not why. Was it DNS? Was the database just slow? Or was something else choking the connection?

That’s when I reached for tcplife-bpfcc.

Run this command on the node where the pod is scheduled:

sudo tcplife-bpfcc

It shows every TCP connection, how long it lived, and how much data moved.

Example output:

PID    COMM      IP  SADDR              DADDR               DURATION(MS)  BYTES
1223   python3   4   10.42.0.15:45872   10.42.1.25:5432     3054          145
1244   curl      4   10.42.0.15:51132   142.250.72.14:443   123           2034

Here I could see my Python process making repeated connections to Postgres, each living for ~3 seconds before failing. It wasn’t DNS. It wasn’t CPU. It was a slow database response.

Lesson learned: tcplife makes invisible network delays visible.
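A tip for when the output gets noisy: tcplife can filter by port. On my version of the BCC tools it takes -D for remote ports and -L for local ports (check tcplife-bpfcc -h if yours differs), so to watch only the Postgres traffic:

sudo tcplife-bpfcc -D 5432

That strips the output down to just the connections you actually care about.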

Scenario 2: Who’s Eating My CPU?

Another time, our cluster autoscaler was spinning up extra nodes even though workloads weren’t supposed to be CPU-heavy. Metrics just said “high CPU,” but they didn’t explain who the culprit was.

That’s where profile-bpfcc comes in — a CPU profiler that doesn’t need you to instrument your code.

Command:

sudo profile-bpfcc -F 99

This samples CPU stacks at 99 Hz and can generate flame graphs if piped into visualization tools (more on that below). Even just in raw output, it’s gold:

python3 app.py
    -> handle_request
    -> json.dumps
    -> encode_basestring_ascii

Turns out, a JSON serialization loop was chewing cycles way more than expected. We optimized it, and the autoscaler calmed down.

Lesson learned: Profiling with eBPF means you don’t need a special agent inside every pod — you can see CPU usage across the whole node.
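If you want the actual flame graph rather than raw stacks, profile can emit folded output with -f, and Brendan Gregg’s FlameGraph scripts turn that into an SVG. Roughly, assuming the flags on your BCC version match mine:

sudo profile-bpfcc -F 99 -f 30 > out.folded      # sample all CPUs for 30 seconds, folded stacks
git clone https://github.com/brendangregg/FlameGraph
./FlameGraph/flamegraph.pl out.folded > cpu.svg  # open cpu.svg in a browser

Add -p <PID> if you only want one process instead of the whole node.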


Scenario 3: Mysterious File Errors

A team complained that a service was “randomly failing” to read config files. Kubernetes ConfigMaps mounted fine, but somehow the app wasn’t seeing them consistently.

Instead of tailing logs and guessing, I used opensnoop-bpfcc:

sudo opensnoop-bpfcc

It prints every file open attempt in real time. Output looked like this:

PID    COMM      FD  ERR  PATH
1321   java      -1  ENOENT  /etc/config/settings.yaml

Boom. The app was trying to open a file that didn’t exist in the container image. This wasn’t Kubernetes’ fault — it was a bad config reference.

Lesson learned: opensnoop turns vague “file not found” errors into exact paths and processes.
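On a busy node, watching every open system-wide gets overwhelming fast. On my BCC version, opensnoop takes -x to show only failed opens and -n to filter by process name, which is usually exactly the combination you want here:

sudo opensnoop-bpfcc -x -n java     # only failed opens from processes whose name contains "java"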


Scenario 4: Hunting Memory Leaks

Long-running pods sometimes consumed more memory than expected. To catch this, I pointed memleak-bpfcc at the suspect process (run without -p, it traces kernel allocations rather than your application):

sudo memleak-bpfcc -p <PID> 10

Every 10 seconds it reports outstanding allocations: memory that was allocated but never freed.

Simplified, the output looked like this:

PID    COMM     ALLOC SIZE  COUNT
1443   python3  512         1042
1443   python3  1024        256

This showed me that a Python process was leaking memory in large chunks. Sure enough, after checking the code, a poorly managed cache was to blame.

Lesson learned: With eBPF, you don’t have to wait for OOMKills to happen — you can catch leaks as they grow.
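One refinement that cut down on noise for me: ignore allocations that are young enough to still be legitimately in flight. At least on my version of the tool, memleak takes -o with an age in milliseconds:

sudo memleak-bpfcc -p <PID> -o 60000 10   # only report allocations older than 60 seconds

That filters out normal request-scoped memory and leaves the allocations that genuinely never come back.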


Putting It Together in Kubernetes

Here’s the workflow I now follow:

  1. Find the node running the pod you care about: kubectl get pod -o wide
  2. SSH into the node (or use kubectl debug node/...).
  3. Run the relevant eBPF tool:
    • tcplife-bpfcc → network slowdowns
    • profile-bpfcc → CPU profiling
    • opensnoop-bpfcc → file access issues
    • memleak-bpfcc → memory leaks
  4. Map PIDs back to containers. The /proc/<PID>/cgroup file contains the container ID, which you can match against crictl ps (or docker ps if the node uses Docker); see the sketch below.
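Here’s a rough sketch of step 4. The PID below is just a stand-in for whatever an eBPF tool reported, and the exact cgroup path format depends on your container runtime and cgroup version, so treat it as a starting point:

PID=1443                      # hypothetical PID taken from a tool's output
cat /proc/$PID/cgroup         # the cgroup path embeds the pod UID and container ID
sudo crictl ps                # match that container ID to a container and pod name
sudo crictl inspect <container-id> | grep -i '"pid"'   # or go the other way: container to PID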

This gives you container-aware visibility into what’s happening under the hood.


Recap

In this post, we walked through four practical eBPF tools:

  • tcplife-bpfcc for slow network calls.
  • profile-bpfcc for CPU bottlenecks.
  • opensnoop-bpfcc for file access issues.
  • memleak-bpfcc for memory leaks.

Each one gave me visibility into problems that traditional logs and metrics completely missed.

In Part 3, we’ll go even further — combining these tools into real Kubernetes monitoring workflows. Think “debugging a slow microservice call end-to-end” or “catching a noisy neighbor draining resources.”
