Weird - 3 DA servers went offline

dorucrisan

Verified User
Joined
Oct 23, 2021
Messages
198
Location
Bucharest / Romania
Something weird here. All of a sudden, 2 servers running AlmaLinux 9 and DirectAdmin went offline on the same day and could not be accessed from outside. They had been running for 2 years or so without issues. With a monitor connected, they can access the internet and can run yum update and the like. I installed a new, 3rd server, same AL9+DA, for tests. It worked for maybe a couple of hours, then went offline too, exactly like the other two. With a monitor connected, this one says it is in "emergency mode". I do not know what is happening or how to solve it. I installed a 4th server, this time using Rocky 9 + DA. This one has been working for the past 24h without issues. Is there any recent network bug in AL9? Does anybody have a clue? Thanks in advance.
 
That's very strange; normally Linux only boots into emergency mode when it hits a problem during boot...
Are you sure there is no hardware defect? Did you change something, or did maybe someone else who had access to the server?
 
Yes, I am aware of fstab errors that can cause this, but there is nobody in this location except me and I did not do anything wrong.
Is there any way to restart a service maybe, or edit a file or whatever? The server can reach the outside from the local keyboard, but it cannot be accessed from outside and does not respond to ping either. I have 3 Alma 9 servers here that died yesterday for no reason. I cannot understand it. The new Rocky server still works.
 
If you want to know, you need to recover access from the emergency boot and look at the logs in "/var/log".

Whether it is a hardware failure or something else, the logs should be there.
 
Thank you, I don't know how to do that. 3 servers in a day is too much. It can't be a hardware failure anyway. I suspect an AL9 update or something similar caused the issue.
 
I don't think so. If that were the case, more people would be running into this issue.
Indeed, 3 in a day is a lot, but that could also very well point to a local issue such as, as said, power outages, a lightning strike or something else, depending on where the servers are located.
 
Thanks, understood, but that is not the case. It looks like a software issue in AL9, or in DA in conjunction with AL9. Now I am trying to reinstall a fresh AL9 on the same machine that worked for the past 2 years, a Dell server. After reboot I get this message and the server does not connect to the network. I am sure the problem is AL related. I wonder what is causing it.
LATER EDIT: It looks like there is a problem with mounting a separate data drive for backups. The fstab syntax is not the same: the drive now has to be identified by UUID, not by sdb, sdc, etc. I suspect this is the cause of the boot failure with the "emergency mode" message. Something became deprecated in old installs.

[Attached screenshot: 1757147767746.png]
 
It does indeed look like something with AlmaLinux. Very odd that this happened so suddenly. Is this a dedicated server or a VPS?
I've got some servers too that still have 1 line in there for /dev/shm, but no problems yet.

Regarding "Looks like there is a problem with mounting a separate data drive for backups": that can be tested easily by commenting out that mount. Put a # in front of its line in /etc/fstab and then see if booting works again.
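
For anyone who wants to script that test, here is a minimal Python sketch of commenting out a mount. It only works on the text of the file; the function name `comment_out_mount` and the sample root entry `UUID=1111-2222` are made up for illustration, and it assumes plain whitespace-separated fstab fields.

```python
def comment_out_mount(fstab_text: str, mount_point: str) -> str:
    """Return fstab_text with every active entry for mount_point commented out."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # An active entry has at least a device and a mount point
        # and is not already a comment.
        is_active = len(fields) >= 2 and not fields[0].startswith("#")
        if is_active and fields[1] == mount_point:
            line = "# " + line
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical sample content, not taken from a real server.
    sample = (
        "UUID=1111-2222 / xfs defaults 0 0\n"
        "/dev/sdb1 /BACKUP ext4 defaults 0 0\n"
    )
    print(comment_out_mount(sample, "/BACKUP"), end="")
```

To use it for real you would read /etc/fstab as root, keep a backup copy, and write the result back before rebooting.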

At least it's not a DA issue but an OS issue.
I found that RHEL has the solution here: https://access.redhat.com/solutions/7047461
Unfortunately one has to be a subscriber to be able to see the solution, and I don't have a RHEL subscription.
 
Yes, you can do that if you can log in, but in some cases you cannot log in locally on the machine. It is not yet clear what the cause is; it may be due to the new way of registering drives in fstab, based on drive UUID rather than drive name (sdb, sdc, etc.). According to some posts, the server may mistakenly change a drive's name while rebooting. The system then goes into emergency mode, like mine, and I cannot escape from there. I know there is a way, I just don't know it. I have now registered the second drive by UUID in fstab and one of the dead fellows started. Will keep digging. I have another server still working the old style. I guess I have to avoid rebooting it. And yes, I am no expert, but it seems to be an OS issue, not DA.
Just FYI, this is how the last line in fstab looks now, the one that loads the additional drive.

UUID=b0252b79-9255-4d7b-8127-7b56a169ffda /BACKUP xfs defaults 0 0

The old style was like this:

/dev/sdb1 /BACKUP ext4 defaults 0 0
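
To spot old-style entries like that one on a server that has not been rebooted yet, something along these lines might help. It is only a sketch: the name `find_kernel_named_entries` and the regular expression are my own assumptions, and it only looks for /dev/sdX style names (not NVMe or other device names).

```python
import re

# Kernel-assigned names such as /dev/sdb1 or /dev/sdc. The order in which
# the kernel hands these out is not guaranteed to stay the same across reboots.
KERNEL_NAME = re.compile(r"^/dev/sd[a-z]+\d*$")


def find_kernel_named_entries(fstab_text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, device, mount_point) for entries using /dev/sdX names."""
    hits = []
    for number, line in enumerate(fstab_text.splitlines(), start=1):
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue  # blank line, comment, or malformed entry
        if KERNEL_NAME.match(fields[0]):
            hits.append((number, fields[0], fields[1]))
    return hits


if __name__ == "__main__":
    # The two fstab lines quoted in this thread, used as sample input.
    sample = (
        "UUID=b0252b79-9255-4d7b-8127-7b56a169ffda /BACKUP xfs defaults 0 0\n"
        "/dev/sdb1 /BACKUP ext4 defaults 0 0\n"
    )
    for number, device, mount_point in find_kernel_named_entries(sample):
        print(f"line {number}: {device} mounted on {mount_point}")
```

Any line it reports is a candidate for switching to UUID= before the next reboot.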
 
Regarding "I know there is a way, I just don't know it": normally you can log in to rescue mode directly on the machine, or use the KVM from the datacenter. Then you have to mount the drive manually. After mounting, you can change the required files.

But if you cannot log in to rescue mode at all, not even via KVM, then it's a real problem.

Normally both UUID and /dev/sdX should work, but it's way better and safer to use UUID, as /dev/sdb etc. is a very old method and can also cause other odd issues.
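
If you want to look up the UUID for a drive from a script, a rough sketch could look like this. It assumes a Linux system where udev maintains the /dev/disk/by-uuid symlinks; the function name `uuid_for_device` is made up, and /dev/sdb1 is just the example device from this thread.

```python
import os
from typing import Optional


def uuid_for_device(device: str, by_uuid_dir: str = "/dev/disk/by-uuid") -> Optional[str]:
    """Return the filesystem UUID whose symlink resolves to device, or None."""
    target = os.path.realpath(device)
    try:
        names = os.listdir(by_uuid_dir)
    except FileNotFoundError:
        return None  # directory absent, e.g. udev does not manage it here
    for name in names:
        # Each entry is a symlink named after the UUID, pointing at the device.
        if os.path.realpath(os.path.join(by_uuid_dir, name)) == target:
            return name
    return None


if __name__ == "__main__":
    uuid = uuid_for_device("/dev/sdb1")
    if uuid is None:
        print("no UUID found for /dev/sdb1")
    else:
        print(f"UUID={uuid}")
```

The printed UUID=... value is what would go in the first field of the fstab line instead of /dev/sdb1.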
 
I see you are using 9.5, but 9.6 is the latest version and its kernel is 5.14.0-570.39.1.el9_6.x86_64
 