OOM killer: what it is, why it happens, and how to stop it
The Reflex Team · 10 min · February 2026
The Linux OOM killer is not out to get PHP specifically, but PHP-FPM is an easy target because you voluntarily run a pool of fat processes, and the kernel ranks its victims largely by how much memory each one holds.
What the OOM killer actually is
When the kernel cannot satisfy a memory allocation request and reclaiming caches is not enough, it picks a process (or several) and sends SIGKILL. No graceful shutdown. Your framework never runs terminate().
Kernel log lines look like:
Out of memory: Killed process 18421 (php-fpm8.2) total-vm:2458112kB, anon-rss:198764kB
That is your post-mortem receipt.
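If you want that receipt as data rather than a line to squint at, a small parser is enough. The following is a minimal sketch that assumes the line looks like the sample above; the function name parse_oom_kill and the dictionary keys are illustrative, and real kernel lines usually carry a timestamp prefix and extra fields that this pattern simply skips.

```python
import re

# Matches only the fields shown in the sample line above. Extra fields and
# timestamp prefixes on real kernel lines are ignored, not validated.
KILL_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?total-vm:(?P<total_vm_kb>\d+)kB"
    r".*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def parse_oom_kill(line):
    """Return a dict of fields from one kernel OOM-kill line, or None."""
    m = KILL_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m.group("pid")),
        "name": m.group("name"),
        "total_vm_kb": int(m.group("total_vm_kb")),
        "anon_rss_kb": int(m.group("anon_rss_kb")),
    }


sample = ("Out of memory: Killed process 18421 (php-fpm8.2) "
          "total-vm:2458112kB, anon-rss:198764kB")
print(parse_oom_kill(sample))
```

The anon-rss figure is the one to compare against your expected per-worker footprint; total-vm is virtual address space and tends to overstate what the worker actually held.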
Why PHP-FPM is exposed
Each worker is a separate process with its own RSS. memory_limit protects PHP from itself, not from the sum of children × peak memory. If traffic spikes and thirty workers each grab 200 MB, you are in trouble before memory_limit trips—because the kernel looks at system free memory, not per-interpreter limits.
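The thirty-worker scenario is worth writing down as arithmetic. This is only a sketch: the 256 MB memory_limit and the 4096 MB host are assumed values chosen for illustration, not figures from any particular setup.

```python
def worst_case_pool_mb(workers, peak_rss_mb):
    """Memory the pool can claim if every worker hits its peak at once."""
    return workers * peak_rss_mb


MEMORY_LIMIT_MB = 256  # assumed per-request memory_limit (illustrative)
HOST_RAM_MB = 4096     # assumed host RAM (illustrative)

demand = worst_case_pool_mb(workers=30, peak_rss_mb=200)
print(demand)                 # 6000
print(200 < MEMORY_LIMIT_MB)  # True: no single worker trips memory_limit
print(demand > HOST_RAM_MB)   # True: the pool as a whole exceeds the host
```

Every worker stays under its own limit, yet the pool asks for more than the machine has. That gap is where the OOM killer operates.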
Read the evidence
- dmesg -T | grep -i "killed process": who died and when.
- journalctl -k on systemd hosts: same story, nicer timestamps.
- FPM slow log and nginx upstream timers: symptoms before death.
Tuning pm.max_children with arithmetic
Rough guardrail (simplified):
max_children ≈ (available_ram_for_fpm) / (expected_peak_rss_per_worker)
Leave headroom for MySQL, Redis, nginx, page cache, and the OS. If your math says forty children on a 2 GiB VPS, your app is not "efficient"—your server is undersized or your workers are too heavy.
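Here is the guardrail as a few lines of code, as a rough sketch. The function and parameter names are made up for illustration, and the 768 MiB reserve and 80 MiB per-worker peak are assumed numbers; substitute what you actually measure.

```python
def max_children(total_ram_mb, reserved_mb, peak_rss_mb):
    """Floor of (RAM left for FPM) / (peak RSS per worker), never below 0."""
    if peak_rss_mb <= 0:
        raise ValueError("peak_rss_mb must be positive")
    available = total_ram_mb - reserved_mb
    return max(available, 0) // peak_rss_mb


# Hypothetical 2 GiB host: 2048 MiB total, 768 MiB held back for the
# database, cache, web server, page cache, and OS; 80 MiB peak per worker.
print(max_children(2048, 768, 80))  # 16
```

Rounding down is deliberate: a fractional worker is headroom, not capacity.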
Swap: friend and liar
A little swap can prevent trivial spikes from killing workers. Lots of swap just turns fast death into slow thrashing. Watch si/so in vmstat. If you are swapping hard under load, you are already losing.
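To watch si/so without staring at a terminal, you can pull the two columns out of captured vmstat output. This is a minimal sketch: swap_activity is a hypothetical helper, it locates the columns by header name rather than position, and the sample text below uses invented numbers.

```python
def swap_activity(vmstat_text):
    """Return a list of (si, so) pairs from captured `vmstat` output."""
    si_idx = so_idx = None
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "si" in fields and "so" in fields:
            # Column-name row; vmstat repeats it periodically.
            si_idx, so_idx = fields.index("si"), fields.index("so")
            continue
        if si_idx is None or not fields or not fields[0].isdigit():
            continue  # banner row, blank line, or data before any header
        if len(fields) <= max(si_idx, so_idx):
            continue  # truncated row
        samples.append((int(fields[si_idx]), int(fields[so_idx])))
    return samples


# Invented sample: a quiet interval followed by heavy swapping.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0  51200  81234  10240 302144    0    0    12    40  310  620  9  3 87  1  0
 6  3 204800  20480   2048  65536  912 1480  4096  6144 1900 4100 22 14 20 44  0
"""

print(swap_activity(SAMPLE))  # [(0, 0), (912, 1480)]
```

When vmstat runs with an interval, its first data row reports averages since boot, so judge load by the later rows. Sustained non-zero values in both columns under traffic are the thrashing signal.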
oom_score_adj (advanced)
Operators sometimes nudge OOM priority so the agent or sshd survives longer than a runaway worker. That is a scalpel, not a bandage—document why you did it.
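On Linux the knob is the per-process file /proc/<pid>/oom_score_adj, which accepts values from -1000 (never pick this process) to 1000 (pick it first). Below is a minimal sketch of setting it; set_oom_score_adj and its proc_root parameter are illustrative names, and proc_root exists only so the function can be tried against a scratch directory.

```python
from pathlib import Path


def set_oom_score_adj(pid, value, proc_root="/proc"):
    """Write an OOM adjustment for `pid`; the valid range is -1000..1000."""
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    path = Path(proc_root) / str(pid) / "oom_score_adj"
    # Lowering the value for a real process normally needs root
    # (CAP_SYS_RESOURCE); otherwise this raises PermissionError.
    path.write_text(f"{value}\n")
    return path
```

A call such as set_oom_score_adj(sshd_pid, -500), where sshd_pid is whatever PID you looked up, makes that process a less likely victim. The setting belongs to the running process and is gone once it restarts, which is one more reason to record why and where you applied it.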
What Reflex adds
We care about OOM kills because they are frequently the first domino: upstream resets, retry storms, partial writes. Reflex correlates kernel OOM events with pool pressure and recent deploy markers so you are not reconstructing timelines from three tabs at 3am.
Bottom line
Respect the kernel budget. Measure worker RSS honestly. Treat max_children as a capacity plan, not a default in a tutorial. Your future self—sleeping—will thank you.