Server goes down every night.

prale

Hello, I have DirectAdmin (DA) on Debian Etch.
For the past five days in a row the system has gone down every night,
usually just after the tally script runs.

I have a cron job that runs a PHP script every minute; it checks the mailbox of a support account to create ticket numbers, etc.
At first I thought this might be the problem.
I also read something on help.directadmin.com about Apache processes being killed instead of restarted.

But the whole server goes down, not just Apache.
I can't reach it by ping, SSH, FTP, or HTTP; the server is completely unreachable from outside until I reboot it from the recovery console.

I hope someone can help me find the problem. The system ran perfectly for months, and these problems started five days ago.

I should add that it didn't happen on the last two days, but today it happened again.

Thanks in advance
 
Error.log:

Code:
2009:04:05-18:10:02: Socket write error: fd is connected to a pipe or socket whose reading end is closed.  When this  happens the writing process will also receive a SIG_PIPE signal.  (Thus, the write return value is seen only if  the program catches, blocks or ignores this signal.)
2009:04:05-18:10:02: Error reading from xx.xx.xxx.xxx: 
2009:04:05-18:10:07: Socket write error: fd is connected to a pipe or socket whose reading end is closed.  When this  happens the writing process will also receive a SIG_PIPE signal.  (Thus, the write return value is seen only if  the program catches, blocks or ignores this signal.)
2009:04:05-18:10:07: Error reading from xx.xx.xxx.xxx:
2009:04:10-21:24:03: Socket write error: fd is connected to a pipe or socket whose reading end is closed.  When this  happens the writing process will also receive a SIG_PIPE signal.  (Thus, the write return value is seen only if  the program catches, blocks or ignores this signal.)
2009:04:10-21:24:03: Error reading from xx.xx.xxx.xxx:

errortask.log:
Code:
2009:04:09-12:25:04: Error rereading service proftpd : uid 0 gid 0 : /etc/init.d/proftpd reread                           >>/dev/null 2>>/dev/null

system.log:
Code:
2009:04:09-00:10:05: Tally User xxx Complete
2009:04:09-00:10:05: Tally User xxx Begin
2009:04:09-00:10:06: Tally User xxx Complete
2009:04:09-00:10:06: Tally User xxx Begin
2009:04:09-00:10:06: Tally User xxx Complete
2009:04:09-00:10:06: Tally Reseller xxx Complete
2009:04:09-00:10:06: Tally All Complete
2009:04:09-00:11:04: httpd restarted
2009:04:09-12:25:04: httpd restarted
2009:04:09-12:25:04: named reloaded
2009:04:09-12:25:04: sshd reloaded
2009:04:10-00:10:02: Tally All Begin
2009:04:10-00:10:02: Tally Reseller xxx Begin
2009:04:10-00:10:02: Tally User xxx Begin
2009:04:10-00:10:02: Tally User xxx Complete
2009:04:10-00:10:02: Tally User xxx Begin
2009:04:10-00:10:02: Tally User xxx Complete
2009:04:10-00:10:02: Tally User xxx Begin
2009:04:10-00:10:02: Tally User xxx Complete
2009:04:10-00:10:02: Tally User xxx Begin
2009:04:10-00:10:04: Tally User xxx Complete
2009:04:10-00:10:04: Tally User xxx Begin
2009:04:10-00:10:04: Tally User xxx Complete
2009:04:10-00:10:04: Tally User xxx Begin
2009:04:10-00:10:04: Tally User xxx Complete
2009:04:10-00:10:04: Tally User xxx Begin
2009:04:10-00:10:04: Tally User xxx Complete
2009:04:10-00:10:04: Tally User xxx Begin
2009:04:10-00:10:04: Tally User xxx Complete
2009:04:10-00:10:04: Tally User xxx Begin
2009:04:10-00:10:04: Tally User xxx Complete
2009:04:10-00:10:04: Tally User xxx Begin
2009:04:10-00:10:04: Tally User xxx Complete
2009:04:10-00:10:04: Tally User xxx Begin
2009:04:10-00:10:04: Tally User xxx Complete
2009:04:10-00:10:04: Tally User xxx Begin
2009:04:10-00:10:05: Tally User xxx Complete
2009:04:10-00:10:05: Tally User xxx Begin
2009:04:10-00:10:05: Tally User xxx Complete
2009:04:10-00:10:05: Tally User xxx Begin
2009:04:10-00:10:05: Tally User xxx Complete
2009:04:10-00:10:05: Tally User xxx Begin
2009:04:10-00:10:06: Tally User xxx Complete
2009:04:10-00:10:06: Tally User xxx Begin
2009:04:10-00:10:06: Tally User xxx Complete
2009:04:10-00:10:06: Tally User xxx Begin
2009:04:10-00:10:06: Tally User xxx Complete
2009:04:10-00:10:06: Tally Reseller xxx Complete
2009:04:10-00:10:06: Tally All Complete
2009:04:10-00:11:04: httpd restarted
 
I found something in messages.log.
At all the times the server was down, I see this:

Apr 10 20:29:09 serverxxx kernel: ip_conntrack: table full, dropping packet.
Apr 10 20:29:14 serverxxx kernel: printk: 53 messages suppressed.
Apr 10 20:29:14 serverxxx kernel: ip_conntrack: table full, dropping packet.
Apr 10 20:29:19 serverxxx kernel: printk: 52 messages suppressed.
Apr 10 20:29:19 serverxxx kernel: ip_conntrack: table full, dropping packet.
Apr 10 20:29:24 serverxxx kernel: printk: 71 messages suppressed.
Apr 10 20:29:24 serverxxx kernel: ip_conntrack: table full, dropping packet.
Apr 10 20:29:29 serverxxx kernel: printk: 43 messages suppressed.
Apr 10 20:29:29 serverxxx kernel: ip_conntrack: table full, dropping packet.
Apr 10 20:29:34 serverxxx kernel: printk: 38 messages suppressed.
Apr 10 20:29:34 serverxxx kernel: ip_conntrack: table full, dropping packet.
Apr 10 20:29:39 serverxxx kernel: printk: 26 messages suppressed.
Apr 10 20:29:39 serverxxx kernel: ip_conntrack: table full, dropping packet.
Apr 10 20:29:44 serverxxx kernel: printk: 44 messages suppressed.
Apr 10 20:29:44 serverxxx kernel: ip_conntrack: table full, dropping packet.
Apr 10 20:29:49 serverxxx kernel: printk: 45 messages suppressed.

This is not just today; I can see the same messages from a few days back, when the server also went down.
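
To line these messages up against the nights the server went down, the log can be summarized per day. This is only a rough sketch: the function name is made up for illustration, and the default path /var/log/messages is an assumption (pass the real file as the first argument). Because of the "printk: N messages suppressed" lines, the counts are a lower bound on the packets actually dropped.

```shell
#!/bin/sh
# Count "ip_conntrack: table full" kernel messages per day in a
# syslog-style file. count_table_full_per_day is a hypothetical helper;
# the default log path is an assumption.
count_table_full_per_day() {
    logfile="${1:-/var/log/messages}"
    # Syslog lines start with "Mon DD HH:MM:SS", so fields 1 and 2
    # are the month and the day of the month.
    grep 'ip_conntrack: table full' "$logfile" |
        awk '{ count[$1 " " $2]++ } END { for (d in count) print d, count[d] }' |
        sort
}
```

Calling `count_table_full_per_day /var/log/messages` prints one line per day in the form "month day count", which can be compared with the dates of the outages.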
 
OK, /proc/sys/net/ipv4/ip_conntrack_max is set to 65448.
This is the default, so I guess it should be OK.
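
Whether the limit is really being reached can be checked by comparing the number of tracked connections with that maximum. The sketch below assumes a kernel that exposes the table as /proc/net/ip_conntrack with one line per connection (newer kernels use nf_conntrack paths instead); the function name is hypothetical, and reading the table normally requires root.

```shell
#!/bin/sh
# Print how full the connection tracking table is.
# conntrack_usage is a hypothetical helper; both default paths are
# assumptions that fit older ip_conntrack kernels.
conntrack_usage() {
    table="${1:-/proc/net/ip_conntrack}"
    maxfile="${2:-/proc/sys/net/ipv4/ip_conntrack_max}"
    # One line per tracked connection, so the line count is the usage.
    used=$(awk 'END { print NR }' "$table")
    max=$(cat "$maxfile")
    pct=$(( used * 100 / max ))
    echo "$used of $max entries in use (${pct}%)"
}
```

Running `conntrack_usage` from cron every few minutes and appending the output to a file would show whether usage climbs toward the limit before the server becomes unreachable.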

But I found this in APF firewall settings:

# This is the maximum number of "sessions" (connection tracking entries) that
# can be handled simultaneously by the firewall in kernel memory. Increasing
# this value too high will simply waste memory - setting it too low may result
# in some or all connections being refused, in particular during denial of
# service attacks.
SYSCTL_CONNTRACK="34576"

I have now changed it to 65448; hopefully this solves my problem.
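
A changed SYSCTL_CONNTRACK value presumably only reaches the kernel once APF reloads (for example with `apf -r`), so it is worth confirming that the configured and live values agree. A minimal sketch, assuming the config lives at /etc/apf/conf.apf and the kernel exposes the limit at the path quoted above; the function name is hypothetical.

```shell
#!/bin/sh
# Compare SYSCTL_CONNTRACK in the APF config with the live kernel limit.
# check_apf_conntrack is a hypothetical helper; the default config path
# is an assumption.
check_apf_conntrack() {
    conf="${1:-/etc/apf/conf.apf}"
    maxfile="${2:-/proc/sys/net/ipv4/ip_conntrack_max}"
    # Take the last uncommented SYSCTL_CONNTRACK assignment, quoted or not.
    wanted=$(sed -n 's/^SYSCTL_CONNTRACK="\{0,1\}\([0-9][0-9]*\)"\{0,1\}.*$/\1/p' "$conf" | tail -n 1)
    live=$(cat "$maxfile")
    if [ "$wanted" = "$live" ]; then
        echo "match: $live"
    else
        echo "mismatch: conf=$wanted kernel=$live"
        return 1
    fi
}
```

If `check_apf_conntrack` reports a mismatch after the firewall has been restarted, something other than APF is setting the limit, or APF is not applying it.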
 