The server crashed again. No warning, no Telegram notification, no graceful shutdown. Just the email from UptimeRobot and silence. Found it powered off in the basement with a black screen. Had to press the power button manually to bring it back.

This is the first unexplained crash since the new Debian install. Time to play detective.

The Evidence

Checked the reboot history:

last -x reboot

The server had been running stable for 3 days straight (June 2 → June 5). Then around 11:19 AM, the logs just... stop. No shutdown sequence, no error messages, no kernel panic. The journal entries go from normal cron jobs and DHCP requests to nothing.

Jun 05 11:15:01 mosearcserver CRON[73741]: (root) CMD (/usr/local/bin/notify-service.sh webhook)
Jun 05 11:19:20 mosearcserver dhcpcd[814]: enp4s0f2: requesting DHCPv6 information
-- end of logs --

An abrupt log stop with no shutdown sequence means the system didn't decide to turn off — it was forced off. Either the power went out, or the kernel froze so hard it couldn't even write an error message.
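
That "did it shut down or just die" check can be scripted. What follows is a minimal sketch, not the method I actually used: the function name boot_ending is made up, and it assumes a clean shutdown leaves a "Journal stopped" line from systemd-journald near the end of that boot's journal. The exact marker text may differ between systemd versions.

```shell
# boot_ending: classify how a boot ended from its journal text on stdin.
# ASSUMPTION: a clean shutdown leaves a "Journal stopped" line from
# systemd-journald among the last lines of that boot's journal.
boot_ending() {
    # Only the tail matters; shutdown messages come last.
    if tail -n 50 | grep 'Journal stopped' > /dev/null; then
        echo clean
    else
        echo abrupt
    fi
}

# Usage on the real machine (-b -1 selects the previous boot):
#   journalctl -b -1 --no-pager | boot_ending
```

Fed the two log lines above, this prints "abrupt", which matches what reading the journal by eye suggests.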

Ruling Things Out

Went through the usual suspects:

  • Battery (UPower)? — Checked for UPower activity. The kernel shows ACPI: battery: Slot [BAT0] (battery absent). No battery in the system, no UPower installed. Not the ghost battery problem from Chapter 4.
  • Overheating? — dmesg shows thermal zone at 36°C. Cool as a cucumber. No thermal warnings anywhere.
  • Memory errors? — No MCE (Machine Check Exception) errors, no memory corruption logs. Hardware looks clean.
  • Kernel panic? — No crash dumps in /var/crash/, no kern.log. The filesystem wasn't even marked as dirty after the crash — fsck was skipped on the next boot.
  • C-State idle freeze? — GRUB parameters are set correctly: processor.max_cstate=1 intel_idle.max_cstate=1. This was fixed.
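
Some of these checks are easy to repeat after the next crash. Here is a small sketch for the C-state one, assuming the parameters show up verbatim in /proc/cmdline. The helper name has_kernel_param and its optional file argument are my own invention for illustration, not a standard tool.

```shell
# has_kernel_param PARAM [CMDLINE_FILE]
# Succeeds if PARAM appears as a whole word on the kernel command line.
# CMDLINE_FILE defaults to /proc/cmdline; the second argument exists only
# so the function can be tried against a sample file.
has_kernel_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # Put each parameter on its own line, then require an exact line match.
    tr -s ' \t' '\n\n' < "$file" | grep -Fx -- "$param" > /dev/null
}

# Usage on the server:
#   has_kernel_param processor.max_cstate=1 && echo "C-state cap active"
# Kernel messages from the previous boot can be searched in a similar way:
#   journalctl -k -b -1 --no-pager | grep -iE 'mce|machine check|thermal'
```

Matching whole words matters here: a plain grep for max_cstate=1 would also match a parameter such as max_cstate=10.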

Every diagnostic came back clean. No smoking gun.

The Suspects

With no concrete evidence, I'm left with three possibilities:

  1. Power outage or flicker. The most likely candidate. Since the battery was removed, the laptop has zero protection against power interruptions. Even a half-second flicker would kill it instantly. No logs, no warning, just dead.
  2. Charger issue. The power adapter could be intermittently cutting out — old chargers overheat and briefly stop delivering power. Similar to a power outage but caused by the hardware.
  3. Kernel freeze. A one-time kernel hang so severe that nothing gets written to disk. Rare, but possible on older hardware. Would also leave no trace in the logs.

The Experiment

I needed a way to figure out which one it was. And then I realized: I already have the perfect diagnostic tool sitting on a shelf. The old battery.

Yes, the same battery that caused the phantom shutdowns in Chapter 4. The one I removed because it kept lying to UPower about its charge level. But here's the thing — this is a new OS. This Debian install doesn't even have UPower installed:

mose@mosearcserver:~$ which upower
mose@mosearcserver:~$ dpkg -l | grep upower
mose@mosearcserver:~$

No UPower means no service watching the battery level, no CriticalPowerAction, no phantom shutdowns. The battery can lie all it wants — nobody's listening.

So the battery becomes a pure diagnostic tool:

  • If the server stops crashing → the battery absorbed a power interruption that would have killed the server. Verdict: power problem, either the mains (suspect 1) or the charger (suspect 2). Buy a UPS, and keep an eye on the charger.
  • If the server crashes again → the battery kept it alive through any power issues, so the crash must be kernel or hardware. Verdict: software/hardware problem. Investigate further.

Reinserted the battery. Now we wait.
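
With the battery back in, the machine can also record when wall power drops instead of dying with it. This is a rough sketch I have not run on this server. It assumes the charger appears under /sys/class/power_supply with type "Mains" and an "online" file, which is the usual kernel layout. The function names and the ac-watch log tag are hypothetical.

```shell
# ac_state [SYSFS_DIR]
# Prints "online", "offline" or "unknown" for the first mains adapter found.
# SYSFS_DIR defaults to /sys/class/power_supply.
ac_state() {
    dir=${1:-/sys/class/power_supply}
    for supply in "$dir"/*; do
        [ -r "$supply/type" ] || continue
        [ "$(cat "$supply/type")" = "Mains" ] || continue
        case "$(cat "$supply/online" 2>/dev/null)" in
            1) echo online ;;
            0) echo offline ;;
            *) echo unknown ;;
        esac
        return 0
    done
    # No mains adapter was found at all.
    echo unknown
}

# log_ac_changes: poll once a second and log each transition to syslog/journal.
log_ac_changes() {
    prev=$(ac_state)
    while sleep 1; do
        cur=$(ac_state)
        if [ "$cur" != "$prev" ]; then
            logger -t ac-watch "AC adapter went from $prev to $cur"
            prev=$cur
        fi
    done
}

# Usage: run log_ac_changes in the background, then later check
#   journalctl -t ac-watch
```

Polling once a second can miss a flicker shorter than that, so an empty log would not prove the power was steady. Watching kernel events with udevadm monitor --subsystem-match=power_supply may catch shorter drops, assuming the adapter driver reports them.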

Prep Work

While I'm waiting for the experiment to run, I also enabled persistent journaling so future crashes leave more evidence:

# /etc/systemd/journald.conf
[Journal]
Storage=persistent

sudo systemctl restart systemd-journald

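Without this setting, journald runs with its default, Storage=auto: it writes to disk only if /var/log/journal exists and otherwise keeps the journal in RAM under /run/log/journal, where a hard crash wipes the logs from that boot session. With persistent storage, journald creates the directory if needed, writes to disk, and the logs survive reboots. Writes are still flushed in batches, so the final seconds before a power cut can go missing, but if the server does crash again, there might be a few extra lines of evidence that weren't captured before.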
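
The same setting can live in a drop-in file instead of the main config, which keeps the distribution's journald.conf untouched. This is a sketch under assumptions: the helper name write_journald_dropin and the file name persistent.conf are arbitrary choices of mine, and the directory argument is there only so it can be tried somewhere harmless first.

```shell
# write_journald_dropin [CONF_DIR]
# Writes a drop-in that forces on-disk journal storage.
# CONF_DIR defaults to /etc/systemd/journald.conf.d (needs root there).
write_journald_dropin() {
    conf_dir=${1:-/etc/systemd/journald.conf.d}
    mkdir -p "$conf_dir" &&
        printf '[Journal]\nStorage=persistent\n' > "$conf_dir/persistent.conf"
}

# Afterwards, as root on the real system:
#   systemctl restart systemd-journald
#   journalctl --list-boots    # after the next reboot, older boots should be listed
```

If journalctl --list-boots shows more than one boot after a restart, the journal is making it to disk.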


Status: Waiting

The battery is in. UPower is absent. The experiment is running. If nothing crashes in the next two weeks, I'm buying that UPS and calling it solved. If it crashes again... well, that'll be a much more interesting blog post.

Stay tuned.