💥 The Incident

A few days ago, someone decided to DDoS the entire IP range of my Hong Kong VPS provider.

My machine had fail2ban running. It did its job — maybe a little too enthusiastically. Within minutes it had banned over 20,000 IPs, allocating memory for each rule until the box ran out of RAM entirely and triggered a kernel panic. The VPS went dark… Great… :/

So here’s the irony: fail2ban didn’t fail because it was badly configured. It failed because of a fundamental architectural problem. Every packet in a flood still has to be received by the kernel, handed up the networking stack, and then evaluated before fail2ban can react. Under a real volumetric flood, that cost alone — tens of thousands of soft interrupts per second — is enough to saturate a single vCPU and collapse the machine before any rule can take effect.

This sent me down the XDP/eBPF rabbit hole.


❓ What is XDP?

XDP (eXpress Data Path) is a Linux kernel technology that lets you attach eBPF programs directly to the NIC driver, processing packets before they enter the networking stack. This means you can drop, pass, or redirect packets at wire speed — before skb allocation, before iptables, before any userspace process sees them.

The traditional packet filtering path looks like this:

NIC → Driver → skb allocation → netfilter/iptables → kernel TCP/IP stack → application

With XDP:

NIC → Driver → [XDP program runs here] → DROP / PASS / REDIRECT

Dropped packets never touch the kernel stack. No memory allocation. No soft interrupt cascade. No fail2ban. Just gone. 😎👍

This is why XDP is so effective against volumetric floods: the CPU cost of dropping a packet at the XDP layer is roughly 34–65 nanoseconds on a KVM VPS — versus hundreds of nanoseconds (and potential memory pressure) for a packet that travels all the way up the stack.


🔧 What I Built

The result is basic_xdp: an XDP-based port-whitelist firewall for Linux, with two components.

1. 🧱 The XDP Firewall (xdp_firewall.c)

The eBPF program runs at the driver level and enforces a TCP/UDP port whitelist stored in BPF maps. Key design decisions:

  • BPF ARRAY maps (indexed by port number, 65536 entries) — O(1) lookup, no hash collisions, hot-updatable from userspace without reloading the program.
  • ACK passthrough before whitelist check — reply packets of established connections have random ephemeral destination ports, so they must be passed before the whitelist lookup runs. Only pure SYN packets (new connection attempts) are checked against the whitelist.
  • IPv4 fragment dropping — fragmented packets can bypass port-based filters since only the first fragment carries the transport header. Fragments are dropped and counted separately.
  • IPv6 extension header traversal — crafted packets with chained extension headers (Hop-by-Hop, Routing, Destination Options, Fragment) can place the TCP/UDP header at an unexpected offset. The program walks up to 6 extension headers to find the real transport layer before applying the whitelist check.
  • Per-CPU packet counters — a BPF_MAP_TYPE_PERCPU_ARRAY tracks pass/drop counts per protocol, locklessly. Readable with bpftool map dump.
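The decision path condenses into a few lines of plain Python — a host-side model for illustration, not the actual eBPF C program (the list below stands in for the BPF ARRAY map, and the whitelisted port is just an example):

```python
TCP_SYN, TCP_ACK = 0x02, 0x10

# Stand-in for the 65536-entry BPF ARRAY map: index = dst port, value = allowed
tcp_whitelist = [0] * 65536
tcp_whitelist[22] = 1  # example: SSH whitelisted

def tcp_verdict(dst_port: int, tcp_flags: int) -> str:
    """Model of the XDP program's TCP decision path."""
    # ACK passthrough: anything that is not a pure SYN is treated as a reply
    # to an established connection, so it passes before the whitelist lookup.
    if (tcp_flags & TCP_ACK) or not (tcp_flags & TCP_SYN):
        return "XDP_PASS"
    # Pure SYN = new connection attempt: check the whitelist.
    return "XDP_PASS" if tcp_whitelist[dst_port] else "XDP_DROP"
```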

2. 🕵️ The Port-Sync Daemon (xdp-sync-ports.py)

Managing firewall rules manually is how misconfigurations happen. Every time you start a new service, you have to remember to add its port. Miss one, and your service is silently blocked. Add too many, and your attack surface grows.

The daemon solves this by keeping the BPF map in sync with the system’s actual listening ports automatically. But the interesting part is how it detects changes.

Most “auto-sync” firewall solutions use a cron job or a fixed-interval polling loop — slow to react when a service starts, and wasteful when nothing changes.

Instead, the daemon subscribes to the Linux Netlink Process Connector — a kernel interface that delivers real-time PROC_EVENT_EXEC and PROC_EVENT_EXIT notifications whenever a process starts or exits. When a new service starts and calls bind(), the daemon learns about it within milliseconds, waits 300ms for the bind to complete (debounce), then scans /proc via psutil and syncs any new ports.

The typical end-to-end latency from “service starts” to “port whitelisted” is under one second, with no polling overhead.
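Subscribing to the process connector takes a single netlink message. A minimal sketch of the handshake (constant values from linux/connector.h and linux/cn_proc.h; opening the socket requires root):

```python
import os
import socket
import struct

NETLINK_CONNECTOR = 11
CN_IDX_PROC = CN_VAL_PROC = 1
NLMSG_DONE = 3
PROC_CN_MCAST_LISTEN = 1

def build_listen_msg() -> bytes:
    """nlmsghdr (16 bytes) + cn_msg (20 bytes) + the LISTEN op (4 bytes)."""
    op = struct.pack("=I", PROC_CN_MCAST_LISTEN)
    cn_msg = struct.pack("=IIIIHH", CN_IDX_PROC, CN_VAL_PROC, 0, 0, len(op), 0) + op
    nlh = struct.pack("=IHHII", 16 + len(cn_msg), NLMSG_DONE, 0, 0, os.getpid())
    return nlh + cn_msg

def subscribe() -> socket.socket:
    """Open the process-events socket (root / CAP_NET_ADMIN needed)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM, NETLINK_CONNECTOR)
    sock.bind((os.getpid(), CN_IDX_PROC))
    sock.send(build_listen_msg())
    return sock  # recv() now yields PROC_EVENT_EXEC / PROC_EVENT_EXIT payloads
```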

Debounce logic to prevent self-triggering

There’s a subtle problem with event-driven port scanning: tools like cron, logrotate, and even psutil’s own /proc reads spawn child processes, which generate a burst of EXEC/EXIT events. If each event triggers an immediate scan, you get a feedback loop.

The daemon handles this with a simple but effective strategy:

  • On the first EXEC/EXIT event, arm a debounce timer (300ms) instead of syncing immediately.
  • During the debounce window, stop select()-ing on the netlink socket entirely — just sleep(). This prevents new events from re-arming the timer.
  • After syncing, drain any queued events before returning to the event loop. This discards events generated by our own psutil scan.

Direct bpf(2) syscall, no bpftool dependency

Rather than shelling out to bpftool, the daemon calls the bpf(2) syscall directly via ctypes. This removes the runtime dependency on bpftool and avoids subprocess overhead on every update.

A write-through in-memory cache tracks which ports are currently whitelisted. On each sync, only the diff (ports added or removed) results in syscalls — not a full map rewrite.

Permanent port whitelist

A TCP_PERMANENT dict in the daemon ensures certain ports (e.g., SSH) are never removed from the whitelist even if the listener temporarily disappears (during a service restart, for instance). This prevents the daemon from locking you out of your own machine during a race condition.
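In sketch form (the dict contents here are illustrative):

```python
# Ports that must never leave the whitelist, even with no live listener.
TCP_PERMANENT = {22: "ssh"}

def desired_ports(listening: set) -> set:
    """Union of currently-listening ports and the permanent whitelist, so a
    restarting sshd can never knock port 22 out of the BPF map."""
    return listening | set(TCP_PERMANENT)
```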


📊 Real-World Performance Benchmark

This benchmark simulates a volumetric UDP flood attack. We used a high-performance AMD EPYC™ 7Y43 server as the “Attacker” to stress-test a 1 vCPU AMD Ryzen 9 3900X instance protected by Basic XDP.

Test Environment

  • 🇭🇰 Attacker: AMD EPYC™ 7Y43 @ 2.55GHz (Generating ~367k PPS / 188 Mbps)
  • 🇺🇸 Target (Receiver): AMD Ryzen 9 3900X @ 2.0GHz (1 vCPU, 1GB RAM)
  • Tool: pktgen (Linux Kernel Packet Generator)
  • Attacker and target connected over public internet

Comparative Results

| Metric | Basic XDP OFF | Basic XDP ON | Improvement |
|---|---|---|---|
| Softirq (si) CPU usage | 85.9% | 3.0% | ~28x reduction |
| System responsiveness | Extremely laggy | Smooth | Significant |
| Packet handling | Processed by kernel stack | Dropped at driver level | – |

When XDP is off, the kernel networking stack processes every incoming packet, consuming nearly all available CPU through soft interrupts. With XDP on, the same 367k PPS flood is absorbed at the driver level — the machine stays fully responsive and SSH remains stable throughout.

XDP OFF — softirq at 85.9% under flood:

[screenshot: XDP OFF]

XDP ON — same flood, CPU drops to 3.0%:

[screenshot: XDP ON]

XDP ON — before attack:

[screenshot: XDP ON, before attack]

XDP ON — after attack:

[screenshot: XDP ON, after attack]


🔍 How to reproduce

# On the target instance:
git clone https://github.com/Kookiejarz/basic_xdp.git
cd basic_xdp
sudo bash setup_xdp.sh

The setup script compiles the eBPF program, loads it onto your network interface, pins the BPF maps to /sys/fs/bpf/xdp_fw/, and installs the sync daemon as a systemd service.

# Verify the daemon is running:
systemctl status xdp-sync-ports

# Watch port sync events live:
journalctl -fu xdp-sync-ports

# Read packet counters:
bpftool map dump pinned /sys/fs/bpf/xdp_fw/pkt_counters

To reproduce the benchmark:

# Load the kernel module
modprobe pktgen

# Configure the device (replace with your interface name)
PGDEV=/proc/net/pktgen/INTERFACE
echo "rem_device_all" > /proc/net/pktgen/kpktgend_0
echo "add_device INTERFACE" > /proc/net/pktgen/kpktgend_0

# Set attack parameters
echo "count 10000000" > $PGDEV             # Send 10 million packets
echo "pkt_size 64" > $PGDEV                # Small packets put more stress on the CPU
echo "dst TARGET_IP" > $PGDEV              # Target IP
echo "dst_mac TARGET_MAC" > $PGDEV         # Next-hop MAC (the gateway's MAC when the target is not on the local link)
echo "clone_skb 100" > $PGDEV              # Reuse each skb 100 times to speed up generation

# Start the flood (blocks until all packets are sent)
echo "start" > /proc/net/pktgen/pgctrl