Segmentation fault after update v1.648 (ef029e9) to v1.649 (980b861)

sec-is

In the logs I see this: 2023-05-10 01:33:36 localhost: called: install
This is an automatic update of DA.

This is the message I got:
This is an automated message notifying you that DirectAdmin has been successfully updated with warnings.
v1.648 (ef029e9) to v1.649 (980b861)
Warning message:
/usr/local/directadmin/scripts/update.sh:
Removed symlink /etc/systemd/system/multi-user.target.wants/da-popb4smtp.service.
To view what has changed, please visit:

The warning is not important: it did not show up on later versions, and it did no harm, as DA has enhanced their program to include that service.

Since this update I see the following in the error log (it starts about three minutes after the update):
2023:05:10-01:36:46: *** Segmentation fault *** Waiting for child exit : 0 servers :
2023:05:10-01:43:30: *** Segmentation fault *** start :
2023:05:10-01:43:30: *** Segmentation fault *** start :
2023:05:10-01:43:30: *** Segmentation fault *** start :
2023:05:10-01:43:30: *** Segmentation fault *** start :
2023:05:10-01:43:30: *** Segmentation fault *** start :
2023:05:10-01:43:30: *** Segmentation fault *** start :
2023:05:10-01:43:30: *** Segmentation fault *** start :
2023:05:10-01:43:30: *** Segmentation fault *** start :
2023:05:10-01:43:30: *** Segmentation fault *** start :
2023:05:10-01:43:30: *** Segmentation fault *** start :
2023:05:10-01:43:30: *** Segmentation fault *** Waiting for child exit : 0 servers :
2023:05:10-02:00:51: *** Segmentation fault *** Waiting for child exit : 0 servers :
2023:05:10-02:04:12: *** Segmentation fault *** Waiting for child exit : 0 servers :

The error log continues to grow with messages like above.
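
To see how often the crashes occur, the log lines above can be grouped per hour. This is a minimal sketch that assumes the timestamp format shown above (`YYYY:MM:DD-HH:MM:SS`) and that the error log is at /var/log/directadmin/error.log; the function name is made up for illustration.

```shell
# Count "Segmentation fault" lines per hour in a DirectAdmin-style error log.
# Assumption: each line starts with a timestamp like 2023:05:10-01:43:30,
# so the first 13 characters identify the date and hour.
count_segfaults_per_hour() {
    awk '/Segmentation fault/ { n[substr($0, 1, 13)]++ }
         END { for (h in n) print h, n[h] }' "$1" | sort
}

# Usage (the path is an assumption, adjust it to your system):
# count_segfaults_per_hour /var/log/directadmin/error.log
```

Each output line is an hour prefix followed by the number of matching lines in that hour.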

Today I got a call from a customer telling me he was not able to log in to DirectAdmin (port 2222).
(I did not know about the segmentation faults; for me the journey starts here and now, the day I post this.)

I clicked on the link and found some info about the license. Well, the license is okay, so I followed the instructions and stopped DA:
# systemctl stop directadmin

Then I had to wait 10 minutes and start DA:
# sleep 600; systemctl start directadmin

I was able to log in. I went to the license check page and ran a check. As expected: valid.
Then I saw I could update da. A minor release. I updated it.
2023-05-12 20:50:06 : called: install

But the problem was still there.
2023:05:12-21:50:35: *** Segmentation fault *** Waiting for child exit : 9 servers :

Again I am locked out of DA:

License check failure See the Debug Guide

Reason: transient license check failure: too many requests, try again later
Current Server Time: Fri, 12 May 2023 22:16:26 UTC

What is going on?

Also note: earlier today (after seeing the segmentation faults, but before the latest update of da) I did a reboot, thinking it may be system related.

Other services are running okay. But I am afraid all of this restarting may make the system unreliable. There are tasks which DirectAdmin needs to do on time and which may go wrong. I don't know, we'll see, but I do not want to miss a backup or a log rotation, etc.

# da update
directadmin current v1.649 206fb0a42026959f2fc5b25a7bce453eed92eae8 linux_amd64: already latest
(this is a CentOS 7.2 (edit: 7.4 I should say) box, running for a long time now, never had any problems).

I tried to read some of the logs in DA, via the log reader. But DA keeps on crashing itself. It is not that I need to open a specific page in DA, it does it all by itself.

A NEW TRY:
# ./directadmin b703
2023/05/12 23:09:04 info license updated successfully
Debug mode. Level 703
2023/05/12 23:09:04 info executing task task=action=vacation
And after I log in, I see these lines pass by:
2023/05/12 23:10:09 info license updated successfully
2023/05/12 23:12:08 info license updated successfully
*** Segmentation fault *** start :
*** Segmentation fault *** start :
*** Segmentation fault *** Waiting for child exit : 9 servers :

The ONLY line I see just before the 'Segmentation fault' is the 'license update'.

Gee, why does it need to check the license every 2 to 3 minutes? (and then crash)

Anyhow, I am back to being locked out.
This time there is no child left. That is good.
But the crashing over and over again is not.
** it has gotten very late (the time on the log is not my timezone), I need some sleep and see tomorrow. As long as all other services keep running, it is not too bad. But customers complaining is another thing to deal with, I hope this gets solved asap.

A few hours later .. .. I woke up, maybe I should revert the version?
When I was able to log in, I switched the DA version from current to stable.
# dig +short -t txt alpha-version.directadmin.com beta-version.directadmin.com current-version.directadmin.com stable-version.directadmin.com
"v=1.650&commit=bca6c9604bf150e9742514887e6914912111a9e2&rt=2023-05-13T02:07:11Z&du=1h&df=15m"
"v=1.649&commit=969d9a0ae99c388cb03996e8b78c368ae7220c23&rt=2023-05-13T01:41:15Z&du=24h&df=1h"
"v=1.649&commit=969d9a0ae99c388cb03996e8b78c368ae7220c23&rt=2023-05-13T01:43:52Z&du=336h&df=8h"
"v=1.648&commit=b54781ca43a54f07892354a2ae5c8543d3e268eb&rt=2023-05-10T08:19:34Z&du=168h&df=24h"

Unfortunately, 1.648 has been updated many times; 2023-05-10T08:19:34 seems to be the last one, and I might need an even older one (a few hotfixes earlier?).
I did get 1.648, which should be okay, as the problems started after I got a higher version. So I installed the 'stable' version.
Of course, immediately after the update I got the 'transient license check failure: too many requests' back.
But I do not yet see a 'Segmentation fault'. Then again, DA has only been running for a minute, and it is not even working.
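
To compare the channels without reading the raw strings, the TXT records returned by dig above can be split into fields. This is a rough sketch that relies only on the `key=value&key=value` layout visible in the output; the function names are hypothetical, and the thread does not say what `rt`, `du` and `df` mean, so they are treated as opaque fields.

```shell
# Print one "key=value" pair per line from a version TXT record such as
# "v=1.648&commit=...&rt=...&du=168h&df=24h" (surrounding quotes are dropped).
parse_version_record() {
    printf '%s\n' "$1" | tr -d '"' | tr '&' '\n'
}

# Extract a single field (for example "v" or "commit") from a record.
record_field() {
    parse_version_record "$1" | awk -F= -v key="$2" '$1 == key { print $2 }'
}

# Usage:
# record_field "$(dig +short -t txt stable-version.directadmin.com)" v
```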

Testing:
# ./directadmin b703
2023/05/13 08:36:48 error license check failure error=transient license check failure: too many requests, try again later
Debug mode. Level 703
^C

Still no go.
A few minutes later:
# ./directadmin b703
2023/05/13 08:40:00 info license updated successfully
Debug mode. Level 703

2023/05/13 08:40:00 info executing task task=action=cache&value=safemode
2023/05/13 08:40:00 info executing task task=action=cache&value=showallusers
2023/05/13 08:40:00 info executing task task=action=convert&value=cronbackups
2023/05/13 08:40:00 info executing task task=action=convert&value=suspendedmysql
2023/05/13 08:40:00 info executing task task=action=syscheck
2023/05/13 08:40:02 info executing task task=action=rewrite&value=cron_path
2023/05/13 08:40:02 info executing task task=action=vacation

So it is running again. I will exit debug mode and let it start 'normally' and let it run for a while and come back later to add to this thread, letting you know the progress.
^C2023/05/13 08:41:27 error waiting for legacy sub-process to terminate error=waitid: no child processes

After starting directadmin, the 'transient license check failure: too many requests' error IMMEDIATELY came back.

Sigh. Some help from someone is very much appreciated! Maybe @smtalk can jump in? @fln?
 
Six hours later, the 'old' version is still running fine and I have no more segmentation faults. @DirectAdmin Support: please fix your latest update so it no longer causes a segmentation fault. PM me if you want to see it for yourself; I'll grant you a temporary login.
 
We have exactly the same error as you, @sec-is: a segmentation fault every few seconds after DirectAdmin starts.
Memory usage climbs to almost 100%, and the log reports a failed/killed state.
This error did not occur in the previous version, 1.648; it appears only in 1.649.
 
For other readers: you may use 'da update stable' to go back one version, it helped me out.
I also noticed something very strange: my swapfile was getting filled more than RAM. With 8 GB RAM and 4 GB swap, why would swap fill up with 1.4 GB and RAM with 1 GB (not sure of the exact numbers, but it was something like that)? I noticed this change after the update. I tried rebooting, but the swap usage increased rapidly again. There may have been a spike in RAM usage causing this. DA is going to research this.
Do you also have CentOS 7.2 running?
 
Why don't you update your OS? CentOS 7.9 was released in 2020, and you are still on 7.2.
Also check vm.swappiness:
sysctl -a | grep swa
You can edit /etc/sysctl.conf and set the value you need; 5 is usually better than the default of 60:
vm.swappiness=5
and then run:
sysctl -p
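
To check whether swap really fills faster than RAM, the two figures can be read from /proc/meminfo before and after changing the setting. This is only a sketch: the function name is made up, and "used RAM" is approximated as MemTotal minus MemAvailable, which assumes your kernel exposes the MemAvailable field.

```shell
# Print used RAM and used swap in MiB from a meminfo-style file
# (default: /proc/meminfo). Values in that file are in kB.
# Assumption: "used" RAM is MemTotal - MemAvailable, one common
# approximation rather than the only possible definition.
mem_swap_usage() {
    awk '
        $1 == "MemTotal:"     { mt = $2 }
        $1 == "MemAvailable:" { ma = $2 }
        $1 == "SwapTotal:"    { st = $2 }
        $1 == "SwapFree:"     { sf = $2 }
        END {
            printf "ram_used_mib=%d swap_used_mib=%d\n", (mt - ma) / 1024, (st - sf) / 1024
        }
    ' "${1:-/proc/meminfo}"
}

# Usage:
# mem_swap_usage
```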
 
Thanks for reporting the issue and giving us access to investigate this further. We have released a fix for the unexpected segmentation fault errors (which in turn led to multiple DA restarts and the licensing system starting to rate-limit). Anyone affected by this issue can upgrade with the da update command. The automatic update might not work because the server is being continuously restarted.
 
Closing words from @fln to me:
  • A license check happens 2 minutes after DA starts; after that the interval grows exponentially, so once DA has been running for a while the check happens only once every ~4 hours.
  • It does not matter how many child processes are started; only one process checks the license validity.
  • Licensing rate-limiting does not apply to license checks made while DA is running; only the initial license check on DA start is rate-limited. So it does not matter how often DA checks the license info: as long as it is not restarted, it will never hit the rate limit. In our case the problem was that, due to abrupt termination of the web server, DA would restart (once every 3 minutes) until it hit the restart rate limit.
So the essence of the issue is that, due to unexpected restrictions on `/proc` on CentOS 7, DA would restart every 3 minutes. After some time it would hit our licensing rate-limiting system and start throwing licensing errors. As soon as DA stopped restarting (when we fixed the issue, or when it was rolled back to stable v1.648), the licensing errors were gone. Based on our licensing info, this issue affects only a small portion of all DA installations, and we expect the new release with a fix to be rolled out soon to every affected system automatically.
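
For readers who want to picture the timing described above, here is a small illustration. It is not DirectAdmin's actual schedule: the thread only gives the 2-minute first check, exponential growth and the ~4 hour steady state, so the doubling factor and the hard cap below are assumptions, and both function names are hypothetical.

```shell
# Hypothetical license-check schedule in minutes.
# Assumptions: the interval doubles each time and is capped at 240 minutes.
license_check_intervals() {
    interval=2      # minutes until the first check (from the thread)
    cap=240         # ~4 hours, in minutes (from the thread)
    while [ "$interval" -lt "$cap" ]; do
        echo "$interval"
        interval=$((interval * 2))
    done
    echo "$cap"
}

# Number of restarts (and so initial license checks) per hour
# if DA restarts every N minutes (3 in this thread).
restarts_per_hour() {
    echo $((60 / $1))
}

# Usage:
# license_check_intervals
# restarts_per_hour 3
```

A crash loop with a restart every 3 minutes means an initial, rate-limited check on every start, while a healthy long-running process under these assumptions would settle into a handful of checks per day.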
 