Connecting to sock failed 11: Resource temporarily unavailable

vancanneyt (Verified User, joined Dec 13, 2012, 92 messages)
Today I started seeing continuous errors in a domain's Nginx error log:
Code:
2020/02/05 18:50:59 [error] 5227#0: *8848958 connect() to unix:/usr/local/php73/sockets/domain.sock failed (11: Resource temporarily unavailable) while connecting to upstream, client: 116.2.77.205, server: domain.com, request: "GET /valid/link HTTP/1.1", upstream: "fastcgi://unix:/usr/local/php73/sockets/domain.sock:", host: "www.domain.com", referrer: "https://www.domain.com/valid/link"

I checked the PHP-FPM log, but it doesn't show anything interesting except:
Code:
[05-Feb-2020 19:15:00] NOTICE: [pool domain] child 9953 exited with code 0 after 4541.393172 seconds from start

When those errors are logged, the access log shows a 502 HTTP response as a result:
Code:
113.58.235.13 - - [05/Feb/2020:19:18:26 +0100] "GET /valid/link" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"

Does anyone have an idea what's happening?
 
Likely a segfault from a third-party PHP extension. Are any additional PHP extensions loaded?
 
I updated the memcached extension to the latest version as a start.
I do suspect something else, though. While it was happening, CPU usage and load were above normal, and it didn't happen on other pages of that domain that also use memcached. If PHP were segfaulting, that should have shown up on all pages, which wasn't the case. I also discovered more invalid traffic, which made me suspect an attack on a specific part of the domain that drove the load up, but that still doesn't explain why the error occurred. Load and traffic returned to normal a few hours after I posted this topic, without restarting anything or disabling the extension, and the errors no longer show up in the logs.
 
When Linux (and some other operating systems) is under load, the "Resource temporarily unavailable" error is more likely to occur. In my programs that get errno = 11 (EAGAIN, "Resource temporarily unavailable"), I just keep retrying until the call succeeds, a different error is returned, or a retry limit is hit. It is quite possible that one of the extensions does not have similar retry logic and does a hard fail instead. In the case of connect() on a Unix socket, you would get this error when the socket's queue of pending connections (the listen backlog) is full, probably because the high load prevents the queue from being processed fast enough.
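To make the retry idea concrete, here is a minimal sketch in Python of what I mean. The names retry_on_eagain and connect_unix, the retry count and the delay are all made up for illustration; this is not what Nginx or any PHP extension actually does, just the general pattern of retrying on EAGAIN and giving up on anything else.

```python
import errno
import socket
import time


def retry_on_eagain(operation, retries=5, delay=0.05):
    """Call operation(); retry only while it fails with EAGAIN (errno 11 on Linux).

    `retries` and `delay` are illustrative defaults, not recommended values.
    """
    for attempt in range(1, retries + 1):
        try:
            return operation()
        except OSError as exc:
            # Any other error, or EAGAIN on the final attempt, is passed on.
            if exc.errno != errno.EAGAIN or attempt == retries:
                raise
            time.sleep(delay * attempt)  # wait a little longer each time


def connect_unix(path):
    """Non-blocking connect to a Unix socket; closes the socket on failure."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        sock.connect(path)
    except OSError:
        sock.close()
        raise
    return sock
```

You would call it as something like `retry_on_eagain(lambda: connect_unix("/path/to/pool.sock"))`, where the socket path is a placeholder. Note that retrying only smooths over short bursts; if the backlog stays full, the retry limit is hit and the error is raised anyway.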
 
Memcached hasn't reached its maxcon setting, and the number of simultaneously connected IPs on the server was low. The server handled ten times more connections last week without any of these issues. So how can it all fail with fewer requests per second?

Update: it looks like a hardware issue caused the problem.
 