Something is gracefully reloading HTTPD, resulting in error

Wouter0100

Verified User
Joined
Nov 9, 2012
Messages
25
Last months we improved our monitoring and we started to monitor systemd services. As a result of this, we discovered that from time to time - something is gracefully reloading HTTPD at the same time that HTTPD is restarting. This results in an error. I don't know how long this has been going on, could also be started when we added new servers.

For example:
Sep 16 14:51:05 hostname systemd[1]: Stopping The Apache HTTP Server...
Sep 16 14:51:06 hostname systemd[1]: Started /usr/sbin/httpd -k graceful.
Sep 16 14:51:07 hostname httpd[1730168]: (98)Address already in use: AH00072: make_sock: could not bind to address [::]:80
Sep 16 14:51:07 hostname httpd[1730168]: (98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:80
Sep 16 14:51:07 hostname httpd[1730168]: no listening sockets available, shutting down
Sep 16 14:51:07 hostname httpd[1730168]: AH00015: Unable to open logs
Sep 16 14:51:07 hostname httpd[1730168]: httpd not running, trying to start
Sep 16 14:51:07 hostname systemd[1]: run-r9269d5650ee14cb18a559d9f75ac83b3.service: Main process exited, code=exited, status=1/FAILURE
Sep 16 14:51:07 hostname systemd[1]: run-r9269d5650ee14cb18a559d9f75ac83b3.service: Failed with result 'exit-code'.
Sep 16 14:51:10 hostname systemd[1]: httpd.service: State 'stop-sigterm' timed out. Killing.
Sep 16 14:51:10 hostname systemd[1]: httpd.service: Killing process 223181 (httpd) with signal SIGKILL.
Sep 16 14:51:10 hostname systemd[1]: httpd.service: Killing process 1347277 (httpd) with signal SIGKILL.
Sep 16 14:51:10 hostname systemd[1]: httpd.service: Killing process 1717512 (httpd) with signal SIGKILL.
Sep 16 14:51:10 hostname systemd[1]: httpd.service: Killing process 1727902 (lsphp) with signal SIGKILL.
Sep 16 14:51:10 hostname systemd[1]: httpd.service: Killing process 1728045 (lsphp) with signal SIGKILL.
Sep 16 14:51:10 hostname systemd[1]: httpd.service: Killing process 1728489 (lsphp) with signal SIGKILL.
Sep 16 14:51:10 hostname systemd[1]: httpd.service: Killing process 1728542 (lsphp) with signal SIGKILL.
Sep 16 14:51:10 hostname systemd[1]: httpd.service: Killing process 1728555 (lsphp) with signal SIGKILL.
Sep 16 14:51:10 hostname systemd[1]: httpd.service: Killing process 1728641 (lsphp) with signal SIGKILL.
Sep 16 14:51:10 hostname systemd[1]: httpd.service: Killing process 1728651 (lsphp) with signal SIGKILL.
Sep 16 14:51:10 hostname systemd[1]: httpd.service: Killing process 1728675 (lsphp) with signal SIGKILL.
Sep 16 14:51:10 hostname systemd[1]: httpd.service: Killing process 1728711 (lsphp) with signal SIGKILL.
Sep 16 14:51:10 hostname systemd[1]: httpd.service: Killing process 1728787 (lsphp) with signal SIGKILL.
Sep 16 14:51:10 hostname systemd[1]: httpd.service: Killing process 1728795 (lsphp) with signal SIGKILL.
Sep 16 14:51:10 hostname systemd[1]: httpd.service: Killing process 1730179 (lsphp) with signal SIGKILL.
Sep 16 14:51:10 hostname systemd[1]: httpd.service: Main process exited, code=killed, status=9/KILL
Sep 16 14:51:10 hostname systemd[1]: httpd.service: Failed with result 'timeout'.
Sep 16 14:51:10 hostname systemd[1]: Stopped The Apache HTTP Server.

Here you see that Apache is being stopped and directly after, a `/usr/sbin/httpd -k graceful` is executed as a transient systemd service. This last graceful command results in an error - which results in the transient service failing and thus systemd getting in an "unhealthy" or "error" state.

To our current research, the command that is being executed (`/usr/sbin/httpd -k graceful`) is not the `ExecReload=` directive of the httpd service. I tested this theory, but it seems not the case.

It happens from time to time. Last I recorded was:
okt 14 02:17:36 hostname systemd[1]: Stopping The Apache HTTP Server...
okt 14 02:17:40 hostname systemd[1]: Started /usr/sbin/httpd -k graceful.
...
okt 14 02:17:41 hostname httpd[2533510]: (98)Address already in use: AH00072: make_sock: could not bind to address [::]:80
okt 14 02:17:41 hostname httpd[2533510]: (98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:80
okt 14 02:17:41 hostname httpd[2533510]: no listening sockets available, shutting down
okt 14 02:17:41 hostname httpd[2533510]: AH00015: Unable to open logs
okt 14 02:17:41 hostname httpd[2533510]: httpd not running, trying to start
okt 14 02:17:41 hostname systemd[1]: run-r515fe7c22ab641e88dfe152e50e3d512.service: Main process exited, code=exited, status=1/FAILURE
okt 14 02:17:41 hostname systemd[1]: run-r515fe7c22ab641e88dfe152e50e3d512.service: Failed with result 'exit-code'.
okt 14 02:17:41 hostname systemd[1]: httpd.service: State 'stop-sigterm' timed out. Killing.
okt 14 02:17:41 hostname systemd[1]: httpd.service: Killing process 1116854 (httpd) with signal SIGKILL.
okt 14 02:17:41 hostname systemd[1]: httpd.service: Killing process 2468867 (httpd) with signal SIGKILL.
okt 14 02:17:41 hostname systemd[1]: httpd.service: Killing process 2469320 (httpd) with signal SIGKILL.
okt 14 02:17:41 hostname systemd[1]: httpd.service: Killing process 2531813 (lsphp) with signal SIGKILL.
okt 14 02:17:41 hostname systemd[1]: httpd.service: Killing process 2532040 (lsphp) with signal SIGKILL.
okt 14 02:17:41 hostname systemd[1]: httpd.service: Killing process 2532080 (lsphp) with signal SIGKILL.
okt 14 02:17:41 hostname systemd[1]: httpd.service: Killing process 2532085 (lsphp) with signal SIGKILL.
okt 14 02:17:41 hostname systemd[1]: httpd.service: Main process exited, code=killed, status=9/KILL
okt 14 02:17:41 hostname systemd[1]: httpd.service: Failed with result 'timeout'.
okt 14 02:17:41 hostname systemd[1]: Stopped The Apache HTTP Server.
The above issue seems not only when stopping HTTPD, but also when starting it.
okt 14 02:17:44 hostname systemd[1]: Started The Apache HTTP Server.
okt 14 02:17:45 hostname systemd[1]: Started /usr/sbin/httpd -k graceful.

Does anyone know what is going on?

Our setup where this occurs is fairly standard: DirectAdmin, CloudLinux and Imunify360. We already e-mailed CloudLinux and Imunify360 asking them if their software is responsible for the above, but they denied it was.
 
"Address already in use"

The investigation at this point should be to find out what, other than Apache, is occupying port 80.
Running something like `netstat -nap | grep 80` should help you work that out.
 
Does anyone know what is going on?

DirectAdmin has its own monitoring system, and if it finds Apache is down, then it tries to restart it. Though I don't have a clue on why your Apache process fails, the reason why is it restarted might be DirectAdmin.

In order to check this version you might review /var/log/directadmin/errortaskq.log for an existence of the following kind of records:

Code:
2022:11:11-19:59:54: service httpd wasn't running, starting httpd

of course the time/date in your case should match time/date from Apache logs.

Then you might check configuration if DirectAdmin for the value of "graceful_restarts".

Code:
[root@server directadmin]# /usr/local/directadmin/directadmin c | grep graceful_restarts
graceful_restarts=1
[root@root@server directadmin]#

If you see =1, then DirectAdmin uses graceful restarts for restarting/reloading a webserver

and check value of "reload_apache_after_rotation":

Code:
[root@server directadmin]# /usr/local/directadmin/directadmin c | grep reload_apache_after_rotation
reload_apache_after_rotation=1
[root@server directadmin]#

Accordingly reloads/restarts DirectAdmin a webserver after a log rotation if reload_apache_after_rotation=1, and any changes are done in Directadmin to PHP/Apache configuration by an user/admin.
 
"Address already in use"

The investigation at this point should be to find out what, other than Apache, is occupying port 80.
Running something like `netstat -nap | grep 80` should help you work that out.
According to the logs it seems apache itself that is still running. For example, Apache is stopping at 14:51:05 and stopped at 14:51:10. But something runs an additional command at 14:51:06, which obviously could return an error as Apache is not fully stopped yet.

Are you using Litespeed server too ?
Only lsphp.

DirectAdmin has its own monitoring system, and if it finds Apache is down, then it tries to restart it. Though I don't have a clue on why your Apache process fails, the reason why is it restarted might be DirectAdmin.

In order to check this version you might review /var/log/directadmin/errortaskq.log for an existence of the following kind of records:

Code:
2022:11:11-19:59:54: service httpd wasn't running, starting httpd

of course the time/date in your case should match time/date from Apache logs.

Then you might check configuration if DirectAdmin for the value of "graceful_restarts".

Code:
[root@server directadmin]# /usr/local/directadmin/directadmin c | grep graceful_restarts
graceful_restarts=1
[root@root@server directadmin]#

If you see =1, then DirectAdmin uses graceful restarts for restarting/reloading a webserver

and check value of "reload_apache_after_rotation":

Code:
[root@server directadmin]# /usr/local/directadmin/directadmin c | grep reload_apache_after_rotation
reload_apache_after_rotation=1
[root@server directadmin]#

Accordingly reloads/restarts DirectAdmin a webserver after a log rotation if reload_apache_after_rotation=1, and any changes are done in Directadmin to PHP/Apache configuration by an user/admin.

Thanks, I searched nearly all log files but was unable to find anything - but in the log file you mentioned I see the following:
2022:10:17-10:33:30: Error gracefuling service httpd : uid 0 gid 0 : /usr/bin/systemctl graceful httpd.service >/dev/null 2>/dev/null : returned 1
This error seems to be returned because graceful is not a thing:
Code:
[root@web001 ~]# /usr/bin/systemctl graceful httpd.service
Unknown operation graceful.

I'm not sure this is directly related to the problem we're seeing, as our other logs don't go back that far - so I'll have to come back if our monitoring picks up a new case.
 
And it happened again:
Nov 22 22:51:45 web001 systemd[1]: Stopping The Apache HTTP Server...
Nov 22 22:51:45 web001 systemd[1]: Started /usr/sbin/httpd -k graceful.
...
Nov 22 22:51:46 web001 httpd[4106904]: (98)Address already in use: AH00072: make_sock: could not bind to address [::]:80
Nov 22 22:51:46 web001 httpd[4106904]: (98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:80
Nov 22 22:51:46 web001 httpd[4106904]: no listening sockets available, shutting down
Nov 22 22:51:46 web001 httpd[4106904]: AH00015: Unable to open logs
Nov 22 22:51:46 web001 httpd[4106904]: httpd not running, trying to start
Nov 22 22:51:46 web001 systemd[1]: run-r4094ffd4b11644f8b7d168a68639ef7f.service: Main process exited, code=exited, status=1/FAILURE
Nov 22 22:51:46 web001 systemd[1]: run-r4094ffd4b11644f8b7d168a68639ef7f.service: Failed with result 'exit-code'.

And the /var/log/directadmin/errortaskq.log does not contain anything related to it.

I have for now added `graceful_restarts=0`, to see if that helps.
 
Unfortunately, still no luck. `graceful_restarts=0` is configured.
Jan 19 14:42:39 web001 systemd[1]: Stopping The Apache HTTP Server...
Jan 19 14:42:43 web001 systemd[1]: Started /usr/sbin/httpd -k graceful.
...
Jan 19 14:42:43 web001 httpd[1283428]: (98)Address already in use: AH00072: make_sock: could not bind to address [::]:80
Jan 19 14:42:43 web001 httpd[1283428]: (98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:80
Jan 19 14:42:43 web001 httpd[1283428]: no listening sockets available, shutting down
Jan 19 14:42:43 web001 httpd[1283428]: AH00015: Unable to open logs
Jan 19 14:42:43 web001 httpd[1283428]: httpd not running, trying to start
Jan 19 14:42:43 web001 systemd[1]: run-rd63d7b093c5640479f7eee3c7584388d.service: Main process exited, code=exited, status=1/FAILURE
Jan 19 14:42:43 web001 systemd[1]: run-rd63d7b093c5640479f7eee3c7584388d.service: Failed with result 'exit-code'.
Jan 19 14:42:44 web001 systemd[1]: httpd.service: State 'stop-sigterm' timed out. Killing.
Jan 19 14:42:44 web001 systemd[1]: httpd.service: Killing process 895241 (httpd) with signal SIGKILL.
Jan 19 14:42:44 web001 systemd[1]: httpd.service: Killing process 895282 (httpd) with signal SIGKILL.
Jan 19 14:42:44 web001 systemd[1]: httpd.service: Killing process 1281286 (httpd) with signal SIGKILL.
Jan 19 14:42:44 web001 systemd[1]: httpd.service: Killing process 1282451 (lsphp) with signal SIGKILL.
Jan 19 14:42:44 web001 systemd[1]: httpd.service: Killing process 1283175 (lsphp) with signal SIGKILL.
Jan 19 14:42:44 web001 systemd[1]: httpd.service: Killing process 1283212 (lsphp) with signal SIGKILL.
Jan 19 14:42:44 web001 systemd[1]: httpd.service: Killing process 1283219 (lsphp) with signal SIGKILL.
Jan 19 14:42:44 web001 systemd[1]: httpd.service: Killing process 1283252 (lsphp) with signal SIGKILL.
Jan 19 14:42:44 web001 systemd[1]: httpd.service: Main process exited, code=killed, status=9/KILL
Jan 19 14:42:44 web001 systemd[1]: httpd.service: Failed with result 'timeout'.
Jan 19 14:42:44 web001 systemd[1]: Stopped The Apache HTTP Server.
Jan 19 14:42:44 web001 systemd[1]: Starting The Apache HTTP Server...
...
Jan 19 14:42:45 web001 systemd[1]: Started The Apache HTTP Server.

It seems like HTTP is stopped but it takes longer, and that's screwing things up.
 
Have no other ideas, but just general recommendations:

- check monitoring software you might have running on the server
- check system logs for possible clues
- check crontasks for possible custom scripts and services which might effect Apache

So you might dig deeper and do a research on what else might cause it.
 
Back
Top