Solved API (CMD_API_DOMAIN) is super slow with 1.62

kam

============================
Updated on 2021-6-15: problem solved

I fixed it by changing the "SSL Certificate" setting from "Use best match certificate" to anything else.
With the old version, I didn't encounter this problem when I chose "Use best match certificate" with no certificate.
Maybe version 1.62 changed something in the script:
it keeps looping to execute OpenSSL for every domain that has the "Use best match certificate" SSL option selected but has no certificate.


Most users may not encounter this issue.
But it affects anyone who already has many domains in DA and uses "Use best match certificate" with no certificates as the default option.
After upgrading to 1.62, they will find that the waiting time to add / modify / delete domains is much longer than usual, while it works well in the old version with the same settings.

==============================


Hello,

I have around 1200 domains.
80 of them are added as top-level domains, and the rest are added as aliases (DomainPointer).

After upgrading to version 1.62, the API took 56 seconds to handle a modify request.
I remember that it only took around 1~2 seconds before the upgrade. Can anyone please look into it?

Thanks


Query
CMD_API_DOMAIN?action=modify&domain=anywhere.in&bandwidth=3333&uquota=shared&ssl=&cgi=&php=

Response
Slow_API.jpg
 
There has been a noted change to the API. Did you test out the updates? I don't see your exact call listed, but maybe it will lead you to an answer.

Behavior Changes
json=yes calls to the user=fred GET version of these calls have output changes
  • CMD_API_SHOW_USER_DOMAINS
  • CMD_SHOW_USER
  • CMD_API_SHOW_USER_USAGE
  • CMD_API_SHOW_USER_CONFIG
Please test your API calls if you're using these, as the "quota" value was not in the correct format before. It's now an array.

Behavior change: CMD_API_SHOW_USER_DOMAINS, CMD_SHOW_USER, CMD_API_SHOW_USER_USAGE, CMD_API_SHOW_USER_CONFIG: quota


www.directadmin.com
There is a GET option which lets you revert to the old value, as a workaround (but please update your scripts to use the corrected array method)
 
I checked it manually with Postman. It's a straightforward update request via the API, and I can confirm that it's very slow.

Also, while waiting for the API to respond, the server CPU usage goes up to 100%. I don't think this behavior is normal. 😅

slow_update.jpg
 
Quote:
After upgrade to version 1.62, the API interface took 56 seconds to handle a modify request.
Hi @kam,

My first guess would be a custom hook, and the fact that it's hovering around 1 minute looks like some standard timeout.
Still, it needs to be debugged, so I'd recommend running DA in debug mode 3000, triggering the modification, and seeing what it does.

It will likely stop showing output at some point while it's "doing something", and that, plus the few pages+ before would be relevant for debugging what it's actually doing. Level 3000 is going to spit out a lot, but if it's a hook, we'd be looking for output starting with:
Code:
System::executeIfExists: running:
to spot which script it is.

You can also check for hooks in:
/usr/local/directadmin/scripts/custom/*
/usr/local/directadmin/plugins/*/hooks/*

(sh files, or hook-named folders, with any .sh files inside)

Look for either all_pre.sh/all_post.sh, or something related to modifying the domain.

Again, that's only if it's a hook causing it. The debug output might be endless (stuck in a loop, perhaps), in which case let us know if you spot repeated info. Or it might show nothing at all, in which case let us know the info just above that point so we can try to sort out what it's doing.

One other trick (might be simpler than the debug output) is to throw a USR1 signal at the process.
This will tell that process to dump its last logged process location to /var/log/directadmin/error.log.
This might be useful, or it might be too specific to know what's going on, but worth a shot, eg:
Code:
killall -USR1 directadmin; tail -f /var/log/directadmin/error.log

Also check if it does the same thing from the GUI (Evolution skin), as it's the same code.. so should do the same thing.
If not, then that might point to something with the login keys, if you're using one.

If you paste anything here, be sure not to include any sensitive information.

John
 
Hello,

I received the errors below while waiting for the API to respond. [Debug mode enabled]
[root@server]# ./directadmin b3000 | grep string

Nearly 15K lines of errors were generated.
I can't find any file named "httpd_tokens" on my server.

I have not set anything for custom httpd.
But somehow, after the upgrade to 1.62, the script tries to read all the custom httpd settings, which do not exist, and generates errors.

ErrorLog.jpg
 
At the higher debug level, that output might be normal. It's just saying they don't exist, and would likely move on.
What we're interested in would be "non stop" output that's taking too long.. or something in a loop.
Does the above output go for 10+ seconds or is the output changing to something else?
Or.. does the output just stop and you're waiting for the browser response, but nothing is being output in the console?
This might require a ticket to debug, as we may need to log in to see what's going on (I could probably track it down fairly quickly if logged in).
 
Thanks for your help. I think I made a mistake by running it with `| grep string`, which is why I was unable to get the big picture.

This time, I ran it again with
./directadmin b3000

Now I have found which process gets stuck and consumes most of the time while looping.

execute('/usr/bin/openssl x509 -text -certopt no_header,no_version,no_serial,no_signame,no_pubkey,no_sigdump,no_aux -in /etc/httpd/conf/ssl.crt/server.crt', maxsize=145, fd=1, env=0)

When CMD_API_DOMAIN is called, whether for add / modify / delete, the script loops over all domains and executes the openssl process for each one. Even for an alias (domain pointer), it runs the `openssl` execution for the same parent domain again and again.

Most of the time is spent waiting for that openssl process to complete.


I also tried deleting a domain through the DirectAdmin web interface, and I can confirm that it has the same issue.
It loops over the `openssl` execution, and I have to wait more than 55 seconds to delete a domain. 😅
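
To put a number on the looping, the debug output can be captured to a file and the openssl executions counted. This is only a rough sketch: it assumes the output was saved with something like `./directadmin b3000 2>&1 | tee /tmp/da-debug.log` (the log path and the function name are made up for illustration, they are not part of DirectAdmin), and that each execution is logged on a line containing `execute('/usr/bin/openssl`, as in the line quoted above.

```shell
# Count how many times the captured debug output shows openssl being executed.
# $1: path to a file holding the captured debug output (hypothetical path).
count_openssl_calls() {
    # -F: match the text literally; -c: print the number of matching lines.
    # grep exits non-zero when the count is 0, so mask that with "|| true".
    grep -cF "execute('/usr/bin/openssl" "$1" || true
}

# Example: count_openssl_calls /tmp/da-debug.log
```

If the count is close to the number of domains and pointers on the account, that would be consistent with one openssl call per domain for a single request.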
 
Problem Solved

I fixed it by changing the "SSL Certificate" setting from "Use best match certificate" to anything else.
With the old version, I didn't encounter this problem when I chose "Use best match certificate" with no certificates.
Maybe version 1.62 changed something in the script:
it keeps looping to execute OpenSSL for every domain that has the "Use best match certificate" SSL option selected but has no certificate.

Most users may not encounter this issue. But it affects anyone who already has many domains in DA and uses "Use best match certificate" with no certificates as the default option.
After upgrading to 1.62, they will find that the waiting time to add / modify / delete domains is much longer than usual, while it works well in the old version with the same settings.
 
Hi @kam,

Thanks for the feedback. Can you confirm which files are "big"? As that would be my best guess as to why it's slow.
The files that could be in play are (with the commands to generate a line count):
Code:
cat /etc/virtual/domains | wc -l
cat /etc/virtual/domainowners | wc -l
cat /etc/virtual/snidomains | wc -l
so we can narrow down what might be slow and why.
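
The three counts can also be collected in one go. The helper below is just a convenience sketch (the function name `count_lines` is made up); it prints the line count next to each path and flags any file that is missing instead of failing.

```shell
# Print "<line count> <path>" for each file given, flagging missing files.
count_lines() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            # Reading from stdin keeps wc from printing the file name itself;
            # tr strips the padding some wc implementations add.
            printf '%s %s\n' "$(wc -l < "$f" | tr -d ' ')" "$f"
        else
            printf 'missing %s\n' "$f"
        fi
    done
}

# Example:
# count_lines /etc/virtual/domains /etc/virtual/domainowners /etc/virtual/snidomains
```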
I might be able to hunt for optimizations for the given files (I'll check now anyway)

I'll also look into caching the server.crt info, so it's not called over and over :)

John
 
Quote:
Thanks for the feedback. Can you confirm which files are "big"? As that would be my best guess as to why it's slow.


root@kam:/usr/local/directadmin/custombuild# cat /etc/virtual/domains | wc -l
955
root@kam:/usr/local/directadmin/custombuild# cat /etc/virtual/domainowners | wc -l
954
root@kam:/usr/local/directadmin/custombuild# cat /etc/virtual/snidomains | wc -l
3

------------------------
I fixed the slowness above by choosing "self signed certificate" instead of "Use best match certificate".

But then I ran into a 100% CPU problem.

full_load.jpg


I found that dataskq keeps looping over all the alias domains and attempting to obtain a Let's Encrypt cert.
But I had already chosen the "self signed certificate" for the parent domain that all the alias domains point to, so this should not happen. Somehow it fails to detect the self-signed certificate setting and keeps looping over all alias domains for a Let's Encrypt cert.

In this case, I had no choice but to disable the automatic SSL certificate management to get rid of the 100% load problem.
/usr/local/directadmin/directadmin set admin_ssl_check_retries 0
service directadmin restart
 
Thanks, I've added extra caching (plus stat() check to cache the previous cases that want to ensure they're not using stale info):

but I've not yet pushed it.. until the above is resolved.

For those multiple dataskq calls, can you confirm if the domain aliases already have a request created?
I assume that when moving away from the auto ssl mode, the requests for the pointers (likely subdomains too) are still there?
For the domain that had the retry disabled (changed away from "use best match" to anything else), check its pointers/subdomains to see if there are retry files still present here:
Code:
/usr/local/directadmin/data/users/USER/domains/*.ssl.next_retry
where that file being present implies the retries will continue on (slowing over time).
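
A quick way to see how many of those retry files are left is to count them per domains directory. The function below is a minimal sketch (the name `count_retry_files` is made up); it only assumes the `*.ssl.next_retry` naming and the directory layout given in the path above.

```shell
# Count *.ssl.next_retry files directly inside a user's domains directory.
# $1: e.g. /usr/local/directadmin/data/users/USER/domains
count_retry_files() {
    # -maxdepth 1 keeps the search to the directory itself. Unlike
    # "ls *.ssl.next_retry", find prints nothing (count 0) when no file matches.
    find "$1" -maxdepth 1 -type f -name '*.ssl.next_retry' | wc -l | tr -d ' '
}

# Example: report the count for every user.
# for d in /usr/local/directadmin/data/users/*/domains; do
#     printf '%s %s\n' "$(count_retry_files "$d")" "$d"
# done
```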

For now, I'll fix the code on that assumption, but please confirm anyway to ensure I'm fixing the correct thing :)

John
 
Update: The pre-release binaries are now available with the above change, but I've also added a change such that, if you pick anything other than "Best Match" (aka: old "Shared Server Cert"), it will now clear the .ssl and .ssl.next_retry files for this domain, plus all subdomains/pointers under this domain, so they don't retry.

If you'd like the binaries now, use the pre-release guide:

If that doesn't resolve the issue, let me know and we can dig further (I might need more info though. If you can create a ticket, that would speed up the process :))

John
 
Quote:
For those multiple dataskq calls, can you confirm if the domain aliases already have a request created?


I have two parent domains under the user account named "kam".
The first parent domain (ending with .cc) has 950 alias domains pointing to it and uses a self-signed certificate.
The other parent domain (ending with .tv) uses a Let's Encrypt certificate.


DA_Alias.jpg


DA_SSL.jpg



I can confirm that .ssl.next_retry files exist for many alias domains.
root@kam:/usr/local/directadmin/data/users/kam/domains# ls *.ssl.next_retry | wc -l
580


==================================================

If you want to dig further, you can replicate the problem by creating two parent domains, say domain1.com and domain2.com.
Set domain1.com to use a self-signed certificate.
Set domain2.com to use a Let's Encrypt certificate.
Then create domain3.com to domain999.com as alias domains and point them to domain1.com.
I think that will be the best way to investigate this problem.
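
For anyone setting up that test state, the list of alias names does not need to be typed by hand. The sketch below only generates the names (the function name `alias_names` is made up); actually creating the pointers, through the GUI or the API, is left out because the exact call is not shown in this thread.

```shell
# Print the alias names domain<first>.com .. domain<last>.com, one per line.
alias_names() {
    first=$1
    last=$2
    i=$first
    while [ "$i" -le "$last" ]; do
        printf 'domain%s.com\n' "$i"
        i=$((i + 1))
    done
}

# Example: alias_names 3 999 > /tmp/aliases.txt
```

With the range 3 to 999 this gives 997 aliases, which is in the same range as the 950 pointers on the affected account.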


Kam
 
Thanks for the info. I've added a few areas for improvement:

Pre-release binaries should be done uploading in about 2 minutes.

I'll be pushing 1.62.1 today with this and other fixes, so let me know ASAP if the issue has not been resolved.
If not, please clarify the exact command being used (I now know the "state" to duplicate, thank you)

John
 