Partial Server Freezing

bclancey

On each of the last two days, we have had to hard reboot our server because it had become unresponsive. The console showed the OS closing numerous exim and mysql processes because the machine was out of memory.
The errortaskq log contained entries such as:

2005:01:03-13:34:18: DaTaskQ is already running. Delete the file /var/run/dataskq.pid to reset
2005:01:03-13:34:30: DaTaskQ is already running. Delete the file /var/run/dataskq.pid to reset
2005:01:03-13:34:32: service mysqld wasn't running, starting it
2005:01:03-13:34:37: service mysqld wasn't running, starting it
2005:01:03-13:34:42: DaTaskQ is already running. Delete the file /var/run/dataskq.pid to reset
2005:01:03-13:34:43: service mysqld wasn't running, starting it
2005:01:03-13:34:49: service mysqld wasn't running, starting it
2005:01:03-13:34:49: service mysqld wasn't running, starting it
2005:01:03-13:34:49: service mysqld wasn't running, starting it
2005:01:03-13:39:46: DaTaskQ is already running. Delete the file /var/run/dataskq.pid to reset
2005:01:03-13:40:52: DaTaskQ is already running. Delete the file /var/run/dataskq.pid to reset
. . . etc
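
For anyone who wants to quantify entries like these, here is a minimal Python sketch that counts mysqld restart attempts and DaTaskQ pid complaints per minute. The function name `summarize` is made up for illustration, and the timestamp layout (year:month:day-hour:minute:second) is an assumption read off the excerpt above, not taken from DirectAdmin documentation.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpt: YYYY:MM:DD-HH:MM:SS: message
LINE_RE = re.compile(r"^(\d{4}):(\d{2}):(\d{2})-(\d{2}):(\d{2}):\d{2}: (.*)$")

def summarize(lines):
    """Return (restarts, pid_complaints) as Counters keyed by minute."""
    restarts = Counter()
    pid_complaints = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank or unrecognised lines
        year, month, day, hour, minute, msg = m.groups()
        key = f"{year}-{month}-{day} {hour}:{minute}"
        if "mysqld wasn't running" in msg:
            restarts[key] += 1
        elif "DaTaskQ is already running" in msg:
            pid_complaints[key] += 1
    return restarts, pid_complaints

# The eleven entries quoted above, used as sample input.
PID_MSG = "DaTaskQ is already running. Delete the file /var/run/dataskq.pid to reset"
SQL_MSG = "service mysqld wasn't running, starting it"
SAMPLE = [
    "2005:01:03-13:34:18: " + PID_MSG,
    "2005:01:03-13:34:30: " + PID_MSG,
    "2005:01:03-13:34:32: " + SQL_MSG,
    "2005:01:03-13:34:37: " + SQL_MSG,
    "2005:01:03-13:34:42: " + PID_MSG,
    "2005:01:03-13:34:43: " + SQL_MSG,
    "2005:01:03-13:34:49: " + SQL_MSG,
    "2005:01:03-13:34:49: " + SQL_MSG,
    "2005:01:03-13:34:49: " + SQL_MSG,
    "2005:01:03-13:39:46: " + PID_MSG,
    "2005:01:03-13:40:52: " + PID_MSG,
]

restarts, pid_complaints = summarize(SAMPLE)
for minute, count in sorted(restarts.items()):
    print(minute, "mysqld restart attempts:", count)
```

On the excerpt this reports six restart attempts within the single minute 13:34, which suggests mysqld was being killed almost as fast as it was restarted.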

I have searched the messages here and cannot find a definitive answer on how to resolve this problem.
Any suggestions?
 
jlasman said:
Can you post the output of your "top" command?

Jeff

Following is the current output. I am not having a problem at the moment. Hopefully, this provides some insight.

Code:
top - 18:34:22 up 3 days,  4:16,  1 user,  load average: 0.16, 0.24, 0.10
Tasks:  89 total,   2 running,  87 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.7% us,  0.7% sy,  0.0% ni, 98.0% id,  0.7% wa,  0.0% hi,  0.0% si
Mem:    517164k total,   411888k used,   105276k free,   148504k buffers
Swap:  1048312k total,     2864k used,  1045448k free,   154892k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
15521 apache    15   0 20576  13m  16m S  0.3  2.6   0:08.79 httpd
15836 apache    15   0 20996  13m  16m S  0.3  2.7   0:08.34 httpd
28131 statpub   16   0  2784  912 1620 R  0.3  0.2   0:00.03 top
    1 root      16   0  1928  360 1316 S  0.0  0.1   0:06.00 init
    2 root      34  19     0    0    0 S  0.0  0.0   0:00.01 ksoftirqd/0
    3 root       5 -10     0    0    0 S  0.0  0.0   0:00.00 events/0
    4 root       5 -10     0    0    0 S  0.0  0.0   0:00.00 kblockd/0
    6 root      15 -10     0    0    0 S  0.0  0.0   0:00.00 khelper
    5 root      15   0     0    0    0 S  0.0  0.0   0:00.00 khubd
    8 root      15   0     0    0    0 S  0.0  0.0   0:01.92 pdflush
   10 root      15 -10     0    0    0 S  0.0  0.0   0:00.00 aio/0
    9 root      15   0     0    0    0 S  0.0  0.0   0:03.50 kswapd0
  118 root      19   0     0    0    0 S  0.0  0.0   0:00.00 kseriod
  154 root      15   0     0    0    0 S  0.0  0.0   0:10.05 kjournald
  972 root      15   0     0    0    0 S  0.0  0.0   0:00.00 kjournald
  973 root      15   0     0    0    0 S  0.0  0.0   0:05.02 kjournald
 1396 root      15   0  2896  600 1600 S  0.0  0.1   0:00.03 dhclient
 1457 root      16   0  3432  496 1296 S  0.0  0.1   0:06.07 syslogd
 1461 root      16   0  2164  388 1244 S  0.0  0.1   0:00.00 klogd
 1487 rpc       15   0  2456  416 1372 S  0.0  0.1   0:00.00 portmap
 1506 rpcuser   19   0  2312  472 1380 S  0.0  0.1   0:00.00 rpc.statd
 1531 root      16   0  2136  452 1296 S  0.0  0.1   0:00.01 rpc.idmapd
 1650 root      16   0  4696  772 3424 S  0.0  0.1   0:01.34 sshd
 1663 root      16   0  2396  488 1684 S  0.0  0.1   0:00.00 xinetd
 1672 root      16   0   704  240  512 S  0.0  0.0   0:00.16 da-popb4smtp
 1681 nobody    15   0  4304   28 3868 S  0.0  0.0   0:00.45 directadmin
 1695 mail      16   0  7976 1224 6232 S  0.0  0.2   0:09.76 exim
 1705 root      17   0  3076  420 1444 S  0.0  0.1   0:00.00 gpm
 1769 root      16   0  1676  436 1364 S  0.0  0.1   0:01.29 vm-pop3d
 1789 root      16   0  2884  504 1356 S  0.0  0.1   0:00.31 crond
 1842 daemon    16   0  2440  484 1348 S  0.0  0.1   0:00.00 atd
 1857 root      16   0  2672  256 1292 S  0.0  0.0   0:00.00 mdadm
 1872 root      18   0  2344  252 1232 S  0.0  0.0   0:00.00 mingetty
 1873 root      18   0  1828  252 1232 S  0.0  0.0   0:00.00 mingetty
 1874 root      18   0  2348  252 1232 S  0.0  0.0   0:00.00 mingetty
 1875 root      18   0  1636  252 1232 S  0.0  0.0   0:00.00 mingetty
 1876 root      18   0  3124  252 1232 S  0.0  0.0   0:00.00 mingetty
 1877 root      18   0  2636  252 1232 S  0.0  0.0   0:00.00 mingetty
 2058 named     19   0 37260 1096 4760 S  0.0  0.2   0:00.00 named
 8211 nobody    16   0  4304   28 3868 S  0.0  0.0   0:00.00 directadmin
 8212 nobody    16   0  4304   28 3868 S  0.0  0.0   0:00.00 directadmin
 8213 nobody    16   0  4304   28 3868 S  0.0  0.0   0:00.00 directadmin
 8216 nobody    16   0  4304   28 3868 S  0.0  0.0   0:00.00 directadmin
 8217 nobody    15   0  4304   28 3868 S  0.0  0.0   0:00.00 directadmin
14074 root      15   0     0    0    0 S  0.0  0.0   0:03.38 pdflush
 7085 root      16   0 20016  11m  16m S  0.0  2.4   0:00.17 httpd
12466 apache    15   0 21096  13m  16m S  0.0  2.7   0:13.01 httpd
13494 apache    16   0 21088  13m  16m S  0.0  2.7   0:09.31 httpd
13496 apache    15   0 21004  13m  16m S  0.0  2.7   0:13.81 httpd
15399 apache    15   0 20544  13m  16m S  0.0  2.6   0:11.92 httpd
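
As a rough reading of the header above: on this older version of top, the "buffers" figure is on the Mem: line and the "cached" figure is on the Swap: line, and both are memory the kernel can mostly give back under pressure. The sketch below adds them to the free figure. The helper name `reclaimable_kb` is hypothetical, and treating all buffer and cache memory as reclaimable is a simplifying assumption.

```python
import re

def reclaimable_kb(mem_line, swap_line):
    """Return (total_kb, roughly_available_kb) from top's Mem:/Swap: lines."""
    def fields(line):
        # "517164k total" -> {"total": 517164}
        return {name: int(val) for val, name in re.findall(r"(\d+)k\s+(\w+)", line)}

    mem = fields(mem_line)
    swap = fields(swap_line)  # this top layout prints "cached" on the Swap: line
    available = mem["free"] + mem["buffers"] + swap["cached"]
    return mem["total"], available

mem_line = "Mem:    517164k total,   411888k used,   105276k free,   148504k buffers"
swap_line = "Swap:  1048312k total,     2864k used,  1045448k free,   154892k cached"

total, available = reclaimable_kb(mem_line, swap_line)
print(total, available)  # 517164 408672
```

So at the moment of this snapshot roughly 400 MB of the 512 MB was free or reclaimable, and swap was almost untouched. That is consistent with the machine being healthy when the snapshot was taken and says little about its state during a freeze.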
 
It would be much easier to read if you post it so it'll appear as monospaced text.

But no, without a review of top while the problem is occurring it's impossible to even guess.

:(

Jeff
 
I can appreciate that.
Unfortunately, when the problem occurs, I cannot access the machine via ssh or via the console -- that is, by plugging a monitor and a keyboard into the actual computer. At the console, I see the OS complaining about shutting down Exim and MySQL processes because of a lack of memory.
I cannot access the website on the machine and I cannot exchange email. The machine is -- from my perspective -- completely frozen. The only way to reboot is to power the machine down -- though I do a few Ctrl-Alt-Dels in the hope that it convinces the OS to start shutting down.
That makes it impossible to run top or any other diagnostic tool.
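
One way around this is to record memory figures to disk continuously, so that the last lines written before a freeze can be read after the reboot. Below is a minimal Python sketch of that idea. It assumes a Python 3 interpreter and a Linux /proc/meminfo; the log path, the interval, and all function names are arbitrary placeholders.

```python
import os
import time

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: value_in_kB}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info

def snapshot_line(info, now=None):
    """Format one timestamped log line from a parsed meminfo dict."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    wanted = ("MemFree", "Buffers", "Cached", "SwapFree")
    # -1 marks a field that was missing from the input
    return stamp + " " + " ".join(f"{k}={info.get(k, -1)}kB" for k in wanted)

def log_forever(path="/root/memwatch.log", interval=60):
    """Append a snapshot every `interval` seconds (placeholder defaults)."""
    while True:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        with open(path, "a") as out:
            out.write(snapshot_line(info) + "\n")
            out.flush()
            os.fsync(out.fileno())  # force the line to disk before a possible hard reset
        time.sleep(interval)
```

Calling `log_forever()` from a background process would leave a trail showing how quickly free memory and swap fall before each freeze. A similar periodic dump of the process list would show which processes are growing.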

We run a separate machine which does not have DA installed and we do not encounter the same problem. However, that machine is not as busy.

I am completely up to date on DA patches -- though I have not done any interim patches on the installed modules.

The problem occurred on Monday and Tuesday and has not surfaced again since. It had occurred from time to time in the previous few weeks.

I have reviewed the system logs and can find no more clues than those posted in the original message about what is happening around the time of the problem.

Maybe I will see something in the future or run across someone who had a similar problem and fixed it.
 
The problem I reported at the start of this thread has not recurred.
But I have been watching closely and have noticed that the number of mysql instances reported by the "top" command increases each day.
Second, "netstat -an" shows that the number of mysql.sock entries listening for connections is also increasing each day.
Is this normal behavior? We do not have a lot of scripts accessing mysql because we use an in-house database system for content. However, our website is relatively busy.

I manually restarted /etc/rc.d/init.d/mysqld and this brought the number of instances of mysql down.
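
To track that growth over time rather than by eye, the output of "netstat -an" can be tallied by socket state. This is a minimal Python sketch: the function name is made up, and it simply assumes that each relevant line contains the text "mysql.sock" plus one of the usual Unix-socket state words.

```python
from collections import Counter

# State words that net-tools netstat prints for Unix domain sockets.
STATES = ("LISTENING", "CONNECTED", "CONNECTING", "DISCONNECTING")

def count_mysql_sockets(netstat_text, needle="mysql.sock"):
    """Count lines mentioning `needle`, grouped by socket state."""
    counts = Counter()
    for line in netstat_text.splitlines():
        if needle not in line:
            continue
        tokens = line.split()
        # Lines with no recognised state word are counted as "OTHER".
        state = next((t for t in tokens if t in STATES), "OTHER")
        counts[state] += 1
    return counts
```

Feeding it the text captured from "netstat -an" once an hour and logging the CONNECTED count would show whether connections only ever accumulate between mysqld restarts.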
 
My server froze again overnight. Since the machine was frozen, I could not run top. I did a hard reboot by turning the power off and on.

Upon restart, the DA system logs showed it was constantly restarting mysql during the night.
I am trying a simple fix to the scripts that access mysql. Instead of using mysql_pconnect, I will use mysql_connect and close the connection when the script exits. Hopefully, this will solve the problem.
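
The reasoning behind the change can be illustrated with a toy model. With persistent connections, each web server worker that has touched the database keeps its connection open after the script ends. With connect-and-close, the count returns to zero after every request. The Python sketch below is only a simulation with stand-in classes (`FakeServer`, `FakeConnection`); it does not talk to MySQL or PHP and is not a measurement of this server.

```python
class FakeServer:
    """Stand-in for a database server that only counts open connections."""
    def __init__(self):
        self.open_connections = 0

    def connect(self):
        self.open_connections += 1
        return FakeConnection(self)

class FakeConnection:
    def __init__(self, server):
        self.server = server
        self.closed = False

    def close(self):
        if not self.closed:
            self.closed = True
            self.server.open_connections -= 1

def serve_persistent(server, workers, requests):
    """Each worker opens a connection on first use and never closes it."""
    held = {}
    for i in range(requests):
        worker = i % workers  # requests are spread round-robin over workers
        if worker not in held:
            held[worker] = server.connect()
    return server.open_connections

def serve_per_request(server, requests):
    """Each request opens a connection and closes it before returning."""
    for _ in range(requests):
        conn = server.connect()
        try:
            pass  # the query would run here
        finally:
            conn.close()
    return server.open_connections

print(serve_persistent(FakeServer(), workers=8, requests=100))  # 8
print(serve_per_request(FakeServer(), requests=100))            # 0
```

In the persistent model the number of open connections tracks the number of workers that have ever run a database script, which would fit the slow daily growth described earlier if workers are long-lived.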
 