Let's Encrypt stopped renewing for all domains

krisiskris

We've been running DirectAdmin with Let's Encrypt on one specific server for years, without any issues. But all of a sudden, it's not possible to create or renew SSL certificates for any of the domains on the server.
The error we're getting for all domains is more or less:
my.domain was skipped due to unreachable http://my.domain/.well-known/acme-challenge/letsencrypt_cbc569309e0eaaecea80f917e070bc81 file.
www.my.domain was skipped due to unreachable http://www.my.domain/.well-known/acme-challenge/letsencrypt_0c3667bb8c9b20a82d401c58a038ffde file.
No domains pointing to this server to generate the certificate for.
As this looks like a DNS issue at first glance, we double-checked all A and AAAA records, but they all point to the correct server. The same is true for every other domain running into this issue. We also changed the letsencrypt.sh script to use 1.1.1.1 for DNS instead of Google's default DNS server, with no difference in outcome. All domains also have a CAA record permitting Let's Encrypt to issue certificates.
We've tried forcing the renewal via IPv4, as I've read IPv6 can cause issues, also without any luck.
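One way to cross-check the records against the server itself is sketched below. It rests on assumptions: the helper names (resolve, local_addrs, missing_from) are made up for illustration, dig and ip are assumed to be installed, and 1.1.1.1 is simply the resolver mentioned above. The idea is to compare what DNS publishes with the global addresses actually configured on the host, which would also flag an AAAA record pointing at an address the server does not hold.

```shell
# Print the records of one type (A or AAAA) for a domain, one per line.
# The resolver defaults to 1.1.1.1; pass another one as the third argument.
resolve() {
  dig +short "$2" "$1" @"${3:-1.1.1.1}"
}

# List the global IPv4 and IPv6 addresses configured on this host,
# one per line, with the prefix length stripped.
local_addrs() {
  ip -o addr show scope global | awk '{ sub(/\/.*/, "", $4); print $4 }'
}

# Read published addresses on stdin and report each one that is not in the
# newline-separated list given as $1. This is a plain string comparison, so
# it assumes both sides print IPv6 addresses in the same compressed form.
missing_from() {
  local known="$1" addr
  while IFS= read -r addr; do
    [ -z "$addr" ] && continue
    printf '%s\n' "$known" | grep -qxF -- "$addr" || echo "not on this host: $addr"
  done
}
```

Running { resolve my.domain A; resolve my.domain AAAA; } | missing_from "$(local_addrs)" would print every published address that is not configured locally; no output means they all match. On a server behind NAT a mismatch for the IPv4 address is expected.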

To test if the users' public_html/.well-known/ directories are accessible, we created a basic test file. These are all working fine; its contents are readable from remote web browsers.
We did notice that after creating the subdirectory public_html/.well-known/acme-challenge and placing the test file in there, it wasn't accessible remotely and returned a 404 error. This happens over both http and https. Not sure if this is normal behaviour, though.
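A reachability test like this can also be run separately over IPv4 and IPv6, which a browser will not show on its own. The following is only a sketch: challenge_url, probe_family and probe_domain are hypothetical helper names, and test.txt stands for whatever test file was placed in the challenge directory.

```shell
# Build the HTTP-01 challenge URL for a domain and a file name.
challenge_url() {
  local domain="$1" file="$2"
  printf 'http://%s/.well-known/acme-challenge/%s\n' "$domain" "$file"
}

# Fetch a URL over one address family (4 or 6) and print the HTTP status.
# curl prints 000 when it could not connect at all.
probe_family() {
  local family="$1" url="$2"
  curl "-$family" --silent --output /dev/null --max-time 10 \
       --write-out '%{http_code}\n' "$url"
}

# Probe the same challenge URL over IPv4 and IPv6 and show both results.
probe_domain() {
  local url
  url="$(challenge_url "$1" "$2")"
  echo "IPv4: $(probe_family 4 "$url")"
  echo "IPv6: $(probe_family 6 "$url")"
}
```

Calling probe_domain my.domain test.txt from a machine outside the server prints one status per address family. A 200 or a redirect over IPv4 next to 000 over IPv6 would suggest the problem is in the IPv6 path rather than in the web server configuration.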

We've compared situations with a similar server running the same OS and DirectAdmin versions, but can't seem to find any reason why that server is working correctly, compared to the one we're having issues with.

edit:
DirectAdmin systemlog shows the following error:
2024:08:26-13:22:44: LetsEncrypt(164759): /usr/local/directadmin/scripts/letsencrypt.sh request 'my.domain' secp384r1 /usr/local/directadmin/data/users/user/domains/my.domain.ssltmpVD1g4v
2024:08:26-13:22:50: LetsEncrypt(164759): exit code: 1 for domain='my.domain'
2024:08:26-13:23:48: httpd reloaded

Any help on this would be greatly appreciated. Thanks in advance,
Kris
 
First, try stopping the firewall and test again.

If it still doesn't work, query DNS from inside the server.
Code:
dig my.domain.com @8.8.8.8

curl --dns-servers 8.8.8.8 http://my.domain.com
 
Firewall was stopped with no difference in result.
Dig returns correct results from multiple tested locations and using a multitude of DNS servers for queries.
 
Quote:
We did notice when creating the subdirectory public_html/.well-known/acme-challenge

You should not create that directory yourself. I would suggest removing what you created in there.

Have a look here, you need to use a curl command to test and only put the test file in the /var/www/html/.well-known/acme-challenge/ directory.
 
The directory was only created as a test and was promptly removed afterward. This was only done in a single user account, the problem affects hundreds of users/domains.
I went through the entire troubleshooting guide earlier, without any success.

One thing I did notice is that running:
echo "test" >> /var/www/html/.well-known/acme-challenge/test.txt
And then visiting the site with URL:
I actually get a permanent redirect error, due to the forced https option in the domain settings. After changing the curl URL to use https, it returns perfectly fine results:
HTTP/2 200
last-modified: Mon, 26 Aug 2024 15:37:49 GMT
etag: "a-62097e84a7091"
accept-ranges: bytes
content-length: 10
x-content-type-options: nosniff
x-xss-protection: 1; mode=block
vary: User-Agent
content-type: text/plain
date: Mon, 26 Aug 2024 15:38:23 GMT
server: Apache/2
Does this mean the Let's Encrypt renewal script could be running into an issue between http and https? Is there a way to force the renewal via https?

Another thing that comes to mind: we recently upgraded the server OS from CentOS 7 to AlmaLinux 8.10 due to the CentOS EOL. Could this be causing issues? Everything else is working perfectly fine on the server.
 
I'm not sure; I never use the forced https redirect from the DirectAdmin panel, since .htaccess or a website's built-in function works better than the option DA provides.

Maybe, if you have an https redirect via .htaccess or your website's own function and use the DA option at the same time, it could cause a redirect loop.
 
I just tried creating a new SSL for a domain that doesn't have the https redirect enabled. Running the curl against the http url gives the correct response, but I'm still not able to create or renew a certificate for that domain. So that doesn't seem to be the cause of the issue, either.
 
Quote:
Does this mean the Let's Encrypt renewal script could be running into an issue between http and https?

No, I don't think so, as I also have lots of domains using the forced https redirect which renew without issues.

It's odd indeed. Are you using plain Apache or something like Nginx or OLS?
It must be something unusual, because as you say it suddenly happens with all your domains and not just one.

You could try a da build all d command and also the permissions command:
Code:
cd /usr/local/directadmin/scripts
./set_permissions.sh all
 
As far as I know it's just running Apache, nothing has changed to these settings.
I ran the da build all d command, which took a while but seemed to complete without any issues. Unfortunately, it made no difference to the certificate issues; still getting the same results.
 
To test if the issue is IPv6 related, I temporarily disabled IPv6 entirely on the server and ran the renew command again.
This was the output:
/usr/local/directadmin/scripts/letsencrypt.sh renew my.domain
/usr/local/directadmin/scripts/letsencrypt.sh: line 84: True: command not found
/usr/local/directadmin/scripts/letsencrypt.sh: line 84: True: command not found
/usr/local/directadmin/scripts/letsencrypt.sh: line 84: True: command not found
/usr/local/directadmin/scripts/letsencrypt.sh: line 84: True: command not found
2024/08/27 11:38:57 [INFO] [my.domain, www.my.domain] acme: Obtaining SAN certificate
2024/08/27 11:38:58 [INFO] [my.domain] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/395734064496
2024/08/27 11:38:58 [INFO] [www.my.domain] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/395734064506
2024/08/27 11:38:58 [INFO] [my.domain] acme: Could not find solver for: tls-alpn-01
2024/08/27 11:38:58 [INFO] [my.domain] acme: use http-01 solver
2024/08/27 11:38:58 [INFO] [www.my.domain] acme: Could not find solver for: tls-alpn-01
2024/08/27 11:38:58 [INFO] [www.my.domain] acme: use http-01 solver
2024/08/27 11:38:58 [INFO] [my.domain] acme: Trying to solve HTTP-01
2024/08/27 11:39:12 [INFO] [www.my.domain] acme: Trying to solve HTTP-01
2024/08/27 11:39:30 [INFO] Deactivating auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/395734064496
2024/08/27 11:39:30 [INFO] Deactivating auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/395734064506
2024/08/27 11:39:31 Could not obtain certificates:
error: one or more domains had a problem:
[my.domain] acme: error: 400 :: urn:ietf:params:acme:error:connection :: 136.144.234.99: Fetching https://my.domain/.well-known/acme-challenge/ezCEOUohdqdMcDBfPiDQZVcwx7_OENhSgxXaZQe6eXI: Error getting validation data
[www.my.domain] acme: error: 400 :: urn:ietf:params:acme:error:connection :: 136.144.234.99: Fetching https://www.my.domain/.well-known/acme-challenge/eGGxNsBzbpwyRewwrTQH4GUd49r9_TqcP1RjPExBgjI: Error getting validation data
Failed to issue new certificate
The error regarding line 84 is related to IPv6 and can be ignored in this case, I think.
Although there's slightly more output than previously, the outcome seems to be more or less the same.
 
Fixed!

So, okay, this was a tricky one. After disabling IPv6 as mentioned above, I dug further into the IPv6 settings to see if everything was correct.
First thing I noticed was errors in the DirectAdmin logs:
2024:08:27-10:03:27: ioctl can't find the server's ip address for eth0.
Running ifconfig from the shell, I noticed the name of the interface was inconsistent with that log entry: it was shown as ens3, not eth0. This is probably a leftover issue from the migration from CentOS 7 to AlmaLinux 8. Going into DirectAdmin's IP management, I noticed the interface was named ens3 there as well, but the IPv6 address was completely different from what it's supposed to be. I tried adding the correct IPv6 address but got an error that the IP was already registered (although it wasn't visible anywhere).

Back to the shell CLI, I went through the nmcli settings and noticed the interface was actually called eth0 there and linked to some type of dummy interface named ens3. I decided to get rid of the ens3 interface entirely and just stick with the eth0 naming.
To do this, I had to edit the /etc/default/grub file and edit the GRUB_CMDLINE_LINUX entry to include "net.ifnames=0 biosdevname=0", then regenerate the bootloader and eventually reboot the server.
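The GRUB edit described above could be scripted roughly as follows. This is a sketch, not the exact commands that were run: add_kernel_args is a hypothetical helper, and the grub.cfg path in the commented usage assumes a BIOS-booted AlmaLinux 8 system (UEFI systems keep the file elsewhere).

```shell
# Append kernel arguments to the GRUB_CMDLINE_LINUX line of a grub defaults
# file. A copy of the original is kept next to it as <file>.bak.
add_kernel_args() {
  local file="$1" args="$2"
  # Nothing to do if the arguments are already on the line.
  grep -q "^GRUB_CMDLINE_LINUX=.*${args}" "$file" && return 0
  cp -p "$file" "${file}.bak"
  # Rewrite from the backup into the original file so that the original
  # file's ownership and permissions stay as they were.
  sed "s/^GRUB_CMDLINE_LINUX=\"\(.*\)\"/GRUB_CMDLINE_LINUX=\"\1 ${args}\"/" \
    "${file}.bak" > "$file"
}

# Usage on the server (as root), left commented out on purpose:
#   add_kernel_args /etc/default/grub "net.ifnames=0 biosdevname=0"
#   grub2-mkconfig -o /boot/grub2/grub.cfg   # assumed path for BIOS boot
#   reboot
```

The helper only touches the defaults file; regenerating the bootloader configuration and rebooting remain separate manual steps, so the change can be reviewed before it takes effect.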

This fixed the naming issues for the interface, but the IPv6 address was still incorrect. So from here, I first deleted all IPv6 information from DirectAdmin's IP management. Then I went back into the shell, corrected the IPv6 address on eth0 and added the correct gateway address.
After testing the IPv6 connectivity with ping6 and ip -6 route, I added the correct IP into DirectAdmin, linked it to the IPv4 address and made sure the correct addresses were activated on the devices tab.
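For reference, the connectivity test mentioned above might look something like the sketch below. The helper names default_gw6 and check_ipv6 are hypothetical, the 2001:db8:: addresses in the commented nmcli lines are documentation placeholders rather than the real ones, and the NetworkManager connection is assumed to be named eth0.

```shell
# Read `ip -6 route` output on stdin and print the default gateway, if any.
default_gw6() {
  awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Ping the IPv6 default gateway of an interface, then an external IPv6 host.
# `ping -6` is used here; on current iputils it does the same as ping6.
check_ipv6() {
  local iface="$1" target="$2" gw
  gw="$(ip -6 route show dev "$iface" | default_gw6)"
  if [ -z "$gw" ]; then
    echo "no IPv6 default route on $iface"
    return 1
  fi
  echo "IPv6 gateway on $iface: $gw"
  # -I selects the interface, which a link-local (fe80::) gateway requires.
  ping -6 -c 3 -I "$iface" "$gw" && ping -6 -c 3 "$target"
}

# Setting the address and gateway with nmcli could look like this
# (placeholder values, left commented out on purpose):
#   nmcli connection modify eth0 ipv6.method manual \
#     ipv6.addresses 2001:db8::10/64 ipv6.gateway 2001:db8::1
#   nmcli connection up eth0
```

check_ipv6 eth0 followed by the name of any external host that is reachable over IPv6 returns non-zero when either ping fails, so it can be rerun after each change until both succeed.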

To test Let's Encrypt, I requested a renewal for one of the domains, and it now works like a charm:
2024/08/27 13:54:38 [INFO] [my.domain, www.my.domain] acme: Validations succeeded; requesting certificates
2024/08/27 13:54:39 [INFO] [my.domain] Server responded with a certificate for the preferred certificate chains "ISRG Root X1".
Certificate for my.domain,www.my.domain has been created successfully!

As this is a very specific case, where the cause of the issues was related to a Linux distro change and faulty IPv6 information, I doubt it will be of much use to anyone facing similar Let's Encrypt issues. But you never know, it might help someone else out in the future.

Thanks for the support in this case,
Kris
 
Quote:
ioctl can't find the server's ip address for eth0.

This is definitely a leftover from the migration; I had the same on one of our servers.
I fixed it differently, but it comes down to the same thing: the interface was renamed.
Thanks for coming back and sharing the outcome/fix.
 