Weird - 3 DA servers went offline

dorucrisan · Sep 1, 2025

Something weird here. All of a sudden, 2 servers running AlmaLinux 9 and DirectAdmin went offline the same day and could not be accessed from outside. They had been running for 2 years or so without issues. With a monitor connected, they can access the internet and can do yum update or such. I installed a new, third server, same AL9+DA, for tests. It worked for maybe a couple of hours, then went offline too, exactly like the other two. With a monitor connected, this one says it is in "emergency mode". I do not know what is happening or how to solve it. I installed a 4th server, this time using Rocky 9 + DA. This one has been working for the past 24h without issues. Is there any recent network bug in AL9? Does anybody have a clue? Thanks in advance.

ericosman · Sep 2, 2025

That's very strange, normally Linux only boots into emergency mode when it has an issue during boot...
Are you sure there is no hardware defect? Did you change something, or maybe someone else who had access to the server did?

dorucrisan · Sep 2, 2025

Yes, I am aware of fstab errors that can cause this, but there is nobody in this location except me and I did not do anything wrong.
Is there any way to restart a service maybe, or edit a file or whatever? The server can reach the outside from the local keyboard, but it cannot be accessed from outside and does not respond to ping either. I have 3 Alma 9 servers here that died yesterday for no reason. I cannot understand it. The new Rocky server still works.

Ohm J · Sep 2, 2025

If you want to know, you need to recover access from the emergency boot and look at the logs in "/var/log".

Whether it is a hardware failure or something else, the logs should be there.

dorucrisan · Sep 2, 2025

Ohm J said:
If you want to know, you need to recover access from the emergency boot and look at the logs in "/var/log".

Whether it is a hardware failure or something else, the logs should be there.

Thank you, I don't know how to do that. 3 servers in a day is too much. It can't be a hardware failure anyway. I suspect an AL9 update or something similar caused the issue.

ericosman · Sep 2, 2025

Did you maybe have a power outage?

Richard G · Sep 2, 2025

dorucrisan said:
I suspect an AL9 update or something similar caused the issue.

I don't think so. If that were the case, more people would have run into this issue.
Indeed, 3 in a day is a lot, but that could also very well point to a local issue such as, as said, a power outage, a lightning strike or something else, depending on where the servers are located.

dorucrisan · Sep 6, 2025

Richard G said:
I don't think so. If that were the case, more people would have run into this issue.
Indeed, 3 in a day is a lot, but that could also very well point to a local issue such as, as said, a power outage, a lightning strike or something else, depending on where the servers are located.

Thanks, understood, but that is not the case. It looks like a software issue in AL9, or in DA in conjunction with AL9. I am now trying to reinstall a fresh AL9 on the same machine that worked for the past 2 years, a Dell server. After reboot I get this message and the server does not connect to the network. I am sure the problem is AL related. I wonder what is causing it.
LATER EDIT: Looks like there is a problem with mounting a separate data drive for backups. The fstab syntax is not the same: the drive now has to be identified by UUID, not sdb, sdc, etc. I suspect this is the cause of the boot failure with the "emergency mode" message. Something became deprecated in old installs.

Richard G · Sep 6, 2025

It does indeed look like something with AlmaLinux. Very odd that this suddenly happened. Is this a dedicated server or a VPS?
I also have some servers that still have 1 line in there pointing to /dev/shm, but no problems yet.

dorucrisan said:
Looks like there is a problem with mounting a separate data drive for backups,

That could easily be tested by commenting out that mount: put a # in front of it in /etc/fstab and then see if booting works again.
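
As a rough sketch of that test, assuming a bash shell: the helper below (`comment_out_mount` is a made-up name, not a standard tool) keeps a backup of the fstab file and prefixes a # to the entry for one mount point. The file path and mount point are passed in, so it can be tried on a copy first.

```shell
# Hypothetical helper: comment out the fstab entry for one mount point.
# Usage: comment_out_mount /etc/fstab /BACKUP
comment_out_mount() {
    local fstab="$1" mountpoint="$2"
    # Keep a backup copy next to the original before changing anything.
    cp -p "$fstab" "$fstab.bak"
    # Prefix '#' to every non-comment line whose second field is the mount point;
    # all other lines are written back unchanged.
    awk -v mp="$mountpoint" '
        $0 !~ /^[ \t]*#/ && $2 == mp { print "#" $0; next }
        { print }
    ' "$fstab.bak" > "$fstab"
}
```

From the emergency shell the root file system may be mounted read-only, in which case `mount -o remount,rw /` is needed before /etc/fstab can be changed.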

At least it's not a DA issue but an OS issue.
I found that RHEL has the solution here: https://access.redhat.com/solutions/7047461
Unfortunately one has to be a subscriber to see the solution, and I don't have a RHEL subscription.

dorucrisan · Sep 6, 2025

Yes, you can do that if you can log in, but in some cases you cannot log in locally on the machine. It is not yet clear what the cause is; it may be due to the newer way of registering drives in fstab, based on drive UUID rather than drive name (sdb, sdc, etc.). According to some posts, the server may mistakenly change drive names while rebooting. Then the system goes into emergency mode, like mine, and I cannot escape from there. I know there is a way, I just don't know it. I have now registered the second drive by UUID in fstab, and one of the dead fellows started. Will keep digging. I have another server still working the old style, so I guess I have to avoid rebooting it. And yes, I am no expert, but it seems to be an OS issue, not DA.
Just FYI, this is how the last line in fstab looks now, the one that mounts the additional drive.

UUID=b0252b79-9255-4d7b-8127-7b56a169ffda /BACKUP xfs defaults 0 0

The old style was like this:

/dev/sdb1 /BACKUP ext4 defaults 0 0
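
To find entries that still use the old style, something along these lines might help. It is only a sketch: `find_device_name_entries` is a made-up helper name, and the list of name prefixes it looks for (sd, vd, nvme) is an assumption that may not cover every setup.

```shell
# Hypothetical helper: list fstab entries that refer to a drive by kernel
# device name (for example /dev/sdb1) rather than by UUID= or LABEL=.
# Usage: find_device_name_entries            (reads /etc/fstab)
#        find_device_name_entries ./fstab    (reads a copy)
find_device_name_entries() {
    local fstab="${1:-/etc/fstab}"
    # Skip comment lines; print lines whose first field starts with
    # /dev/sd, /dev/vd or /dev/nvme (assumed prefixes, extend as needed).
    awk '$0 !~ /^[ \t]*#/ && $1 ~ /^\/dev\/(sd|vd|nvme)/ { print }' "$fstab"
}
```

For each entry it prints, `blkid -s UUID -o value /dev/sdb1` (with the actual device name) shows the UUID to put after `UUID=`. The file system type field has to match what is really on the drive; in the two lines above it is xfs in one and ext4 in the other.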

Richard G · Sep 6, 2025

dorucrisan said:
I know there is a way, I just don't know it.

Normally you can log in to rescue mode or use the datacenter's KVM. Then you have to mount the drive manually. After mounting you can change the required files.

But if you cannot even log in to rescue mode or via KVM, then it's a real problem.

Normally both UUID and /dev/sdX should work, but it's much better and safer to use UUID, as /dev/sdb etc. is a very old method and can also cause other odd issues.
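
Before rebooting a server that still has old-style entries, a quick check like the following could show whether every fstab source resolves right now. This is a sketch only: `check_fstab_sources` is a made-up name, it handles just unquoted `UUID=` and `/dev/...` sources, and it cannot tell whether /dev/sdb1 will still be the same disk after the reboot.

```shell
# Hypothetical helper: report fstab sources that do not exist at the moment.
# Usage: check_fstab_sources [fstab file] [by-uuid directory]
# Returns 0 when every checked source exists, 1 otherwise.
check_fstab_sources() {
    local fstab="${1:-/etc/fstab}" uuid_dir="${2:-/dev/disk/by-uuid}"
    local source rest missing=0
    while read -r source rest; do
        case "$source" in
            ''|'#'*)
                # Blank line or comment: nothing to check.
                continue ;;
            UUID=*)
                # UUID sources show up as entries under /dev/disk/by-uuid.
                [ -e "$uuid_dir/${source#UUID=}" ] || { echo "missing: $source"; missing=1; } ;;
            /dev/*)
                # Device-name sources must exist as a path under /dev.
                [ -e "$source" ] || { echo "missing: $source"; missing=1; } ;;
        esac
    done < "$fstab"
    return "$missing"
}
```

util-linux also ships its own check, `findmnt --verify`, which reads /etc/fstab and reports problems it finds.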

Active8 · Sep 8, 2025

I see you are using 9.5, but 9.6 is the latest version and its kernel is 5.14.0-570.39.1.el9_6.x86_64

dorucrisan · Sep 8, 2025

Active8 said:
I see you are using 9.5, but 9.6 is the latest version and its kernel is 5.14.0-570.39.1.el9_6.x86_64

Yes, correct. Doesn't yum update bring it to the latest version? Is there a way to update on a working server?

Active8 · Sep 8, 2025

dorucrisan said:
Doesn't yum update bring it to the latest version? Is there a way to update on a working server?

It does, but if that doesn't work, something is wrong with your server.
Try cleaning the cache:

Code:

dnf clean all
dnf update

dorucrisan · Sep 25, 2025

Active8 said:
It does, but if that doesn't work, something is wrong with your server.
Try cleaning the cache:

Code:

dnf clean all
dnf update

Thank you, that worked.
