Good question, thank you. From the outside, the server was completely unresponsive (no ping, no httpd, no IMAP, no sshd), but according to the logs, it wasn't quite dead.
- During the entire downtime, the cron log shows one line per minute, which appears to be normal:
12:58:01 server CROND[12451]: (root) CMD (/usr/local/directadmin/dataskq)
Interestingly, a couple of days before the downtime, the hourly entries changed. Originally, they looked like this:
Sep 21 08:01:01 server CROND[9323]: (root) CMD (run-parts /etc/cron.hourly)
Sep 21 08:01:01 server run-parts(/etc/cron.hourly)[9323]: starting 0anacron
Sep 21 08:01:01 server run-parts(/etc/cron.hourly)[9332]: finished 0anacron
...but a couple of days before the incident, they changed to:
Sep 28 05:01:01 server CROND[10554]: (root) CMD (run-parts /etc/cron.hourly)
Sep 28 05:01:01 server run-parts(/etc/cron.hourly)[10554]: starting 0anacron
Sep 28 05:01:01 server anacron[10563]: Anacron started on 2013-09-28
Sep 28 05:01:01 server run-parts(/etc/cron.hourly)[10565]: finished 0anacron
Sep 28 05:01:01 server anacron[10563]: Job `cron.daily' locked by another anacron - skipping
Sep 28 05:01:01 server anacron[10563]: Normal exit (0 jobs run)
...and stayed like that until the server was finally rebooted after the "crash". I googled this and came across this old bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=517321
From what I can make of this, this issue might actually be preventing proper logging on my server (so the actual cause of the downtime may be quite trivial, but impossible to troubleshoot because of the poor logs). This would also fit with a logging issue I came across a couple of months ago and was never able to fix:
http://forum.directadmin.com/showthread.php?t=47008
- "messages" has nothing for that time span except for two lines, which I had initially overlooked:
05:20:22 server init: serial (hvc0) main process ended, respawning
05:24:23 server init: serial (hvc0) main process ended, respawning
(These two lines appear almost 7 hours after the server went down; after 8 hours I hit the hard reset. Not sure whether this is relevant or not.)
- The mail log has nothing for that time span.
- The Apache logs seem to be configured to be deleted (!) daily (I would have expected the older log files to be archived), so there's nothing to check there.
- dmesg has no timestamps (not sure if that is normal), and without them it seems of limited use to me.
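To check the cron side of this more systematically than by scrolling through the file, a rough Python sketch like the following could flag gaps in the per-minute CROND entries and find when the "locked by another anacron" messages first appeared. It assumes the standard syslog "Mon DD HH:MM:SS" prefix and, because those lines carry no year, assumes 2013 (taken from the "Anacron started on 2013-09-28" line). The function names and the two-minute threshold are my own illustrative choices, not anything provided by cron or anacron.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

# Assumption: syslog lines carry no year, so one is supplied here.
# 2013 comes from the "Anacron started on 2013-09-28" entry.
ASSUMED_YEAR = 2013
LOCK_MSG = "locked by another anacron"


def parse_stamp(line: str, year: int = ASSUMED_YEAR) -> Optional[datetime]:
    """Parse the leading 'Mon DD HH:MM:SS' of a syslog line; None if absent."""
    # split() with no separator also copes with space-padded days ("Sep  1").
    parts = line.split(None, 3)
    if len(parts) < 4:
        return None
    month, day, clock = parts[0], parts[1], parts[2]
    try:
        return datetime.strptime(f"{year} {month} {day} {clock}",
                                 "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_cron_gaps(lines: Iterable[str],
                   max_gap: timedelta = timedelta(minutes=2)
                   ) -> List[Tuple[datetime, datetime]]:
    """Return (before, after) pairs of consecutive CROND entries more than max_gap apart."""
    gaps: List[Tuple[datetime, datetime]] = []
    prev: Optional[datetime] = None
    for line in lines:
        if "CROND[" not in line:
            continue  # only count entries written by crond itself
        ts = parse_stamp(line)
        if ts is None:
            continue
        if prev is not None and ts - prev > max_gap:
            gaps.append((prev, ts))
        prev = ts
    return gaps


def first_lock_skip(lines: Iterable[str]) -> Optional[datetime]:
    """Return the timestamp of the first 'locked by another anacron' message, if any."""
    for line in lines:
        if LOCK_MSG in line:
            return parse_stamp(line)
    return None


if __name__ == "__main__":
    # Lines copied from the excerpts in this post; the reported gap only
    # reflects that the excerpt skips a week of entries.
    sample = [
        "Sep 21 08:01:01 server CROND[9323]: (root) CMD (run-parts /etc/cron.hourly)",
        "Sep 28 05:01:01 server CROND[10554]: (root) CMD (run-parts /etc/cron.hourly)",
        "Sep 28 05:01:01 server anacron[10563]: Job `cron.daily' locked by another anacron - skipping",
    ]
    for before, after in find_cron_gaps(sample):
        print(f"no CROND entries between {before} and {after}")
    print("first lock skip:", first_lock_skip(sample))
```

To run it against the real file, reading the lines with `open("/var/log/cron")` and passing them to both functions should work; /var/log/cron is the usual location on CentOS/RHEL, but that path is an assumption about this particular box. Since dataskq runs every minute, any gap longer than a couple of minutes would suggest that crond (or the syslog daemon) stopped writing at that point.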