Investigating Exim Dumps on FBSD6.x

BigWil

I have had discussions in other threads about a lot of crashes. With all of our machines at the datacenter, it is hard to grab the panic information as it occurs, though I was finally able to catch a little info in /var/log/messages.

I would like to get some dialog going so that maybe we can figure out the cause. I have heard of these panics only on FreeBSD 6.x machines: in reports from other FreeBSD 6.x users and on many of our own machines. So this is not a fluke but a recurring problem across various machines whose common factor is Exim on FreeBSD 6.x.

Looking at the 6 dumps I have received, I see some similarities. Any ideas as to what I should do next? Should I take this to Exim, FreeBSD, or both? See the dumps below.

BigWil

machine1# kgdb kernel.debug /var/crash/vmcore.0
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x5c
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc06d6024
stack pointer = 0x28:0xf5922b10
frame pointer = 0x28:0xf5922b2c
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 20950 (exim)
trap number = 12
panic: page fault
Uptime: 2d21h45m21s
Dumping 991 MB (2 chunks)
chunk 0: 1MB (159 pages) ... ok
chunk 1: 991MB (253696 pages) 976 960 944 928 912 896 880 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16

#0 doadump () at pcpu.h:165
165 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) exit
Undefined command: "exit". Try "help".
(kgdb) quit


machine1# kgdb kernel.debug /var/crash/vmcore.1
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x5c
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc06d6024
stack pointer = 0x28:0xf588bb10
frame pointer = 0x28:0xf588bb2c
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 65479 (exim)
trap number = 12
panic: page fault
Uptime: 20h21m43s
Dumping 991 MB (2 chunks)
chunk 0: 1MB (159 pages) ... ok
chunk 1: 991MB (253696 pages) 976 960 944 928 912 896 880 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16

#0 doadump () at pcpu.h:165
165 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) quit


machine1# kgdb kernel.debug /var/crash/vmcore.2
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:

code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 18992 (exim)
trap number = 12
panic: page fault
Uptime: 2d23h10m58s
Dumping 991 MB (2 chunks)
chunk 0: 1MB (159 pages) ... ok
chunk 1: 991MB (253696 pages) 976 960 944 928 912 896 880 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16

#0 doadump () at pcpu.h:165
165 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) quit


machine1# kgdb kernel.debug /var/crash/vmcore.3
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:

instruction pointer = 0x20:0xc06d6024
stack pointer = 0x28:0xf5792b10
frame pointer = 0x28:0xf5792b2c
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 84201 (exim)
trap number = 12
panic: page fault
Uptime: 3d2h40m13s
Dumping 991 MB (2 chunks)
chunk 0: 1MB (159 pages) ... ok
chunk 1: 991MB (253696 pages) 976 960 944 928 912 896 880 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16

#0 doadump () at pcpu.h:165
165 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) quit


machine1# kgdb kernel.debug /var/crash/vmcore.4
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:

panic: page fault
Uptime: 3d15h27m6s
Dumping 991 MB (2 chunks)
chunk 0: 1MB (159 pages) ... ok
chunk 1: 991MB (253696 pages) 976 960 944 928 912 896 880 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16

#0 doadump () at pcpu.h:165
165 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) quit


machine1# kgdb kernel.debug /var/crash/vmcore.5
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:

trap number = 12
panic: page fault
Uptime: 2d14h14m47s
Dumping 991 MB (2 chunks)
chunk 0: 1MB (159 pages) ... ok
chunk 1: 991MB (253696 pages) 976 960 944 928 912 896 880 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16

#0 doadump () at pcpu.h:165
165 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) quit
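
Of the six transcripts above, the three that still contain the instruction pointer line (vmcore.0, vmcore.1 and vmcore.3) all show 0x20:0xc06d6024, and the two that contain the fault address both show 0x5c. One way to check that kind of similarity across many dumps is to pull the trap fields out of each saved transcript and count them. The following is only a minimal Python sketch, not something from the original posts: parse_panic and summarize are hypothetical helper names, and it assumes each kgdb session has been saved as plain text.

```python
import re
from collections import Counter

# Fields of interest in the "Fatal trap" block printed by the kernel.
FIELD_RE = re.compile(
    r"^(fault virtual address|instruction pointer|current process)"
    r"[ \t]*=[ \t]*(.+)$",
    re.MULTILINE,
)


def parse_panic(text):
    """Return the trap fields found in one kgdb transcript (may be partial)."""
    return {name: value.strip() for name, value in FIELD_RE.findall(text)}


def summarize(transcripts):
    """Count how often each instruction pointer appears across transcripts."""
    counts = Counter()
    for text in transcripts:
        fields = parse_panic(text)
        # A truncated message buffer may not contain the field at all.
        counts[fields.get("instruction pointer", "unknown")] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "Fatal trap 12: page fault while in kernel mode\n"
        "fault virtual address = 0x5c\n"
        "instruction pointer = 0x20:0xc06d6024\n"
        "current process = 20950 (exim)\n"
    )
    print(parse_panic(sample))
    print(summarize([sample, "panic: page fault\n"]))
```

If the same instruction pointer keeps turning up, the standard gdb commands bt (backtrace) and list *0xc06d6024 inside kgdb should show which kernel function that address belongs to. The transcripts above stop at frame #0, so a backtrace is probably the first thing either project would ask for.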
 
I too have set up FreeBSD boxes under 6.1 that have mysteriously crashed and restarted for no apparent reason. That debug information is certainly interesting. I can't say that our crashes were attributable to exim and libthread_db.so, but it's an interesting thought. At the moment we have only one 6.1 box in production; all the rest are 6.2, and none of those have crashed since going to 6.2. The 6.1 box did have some mysterious, random reboots when we first placed it in production: it would reboot every 9 - 12 days. After the 3rd reboot it never happened again, and the uptime on that server is now 287 days.

Instead of looking for a needle in a haystack, it may be in your best interest to rebuild the entire userland. I know for a fact that the DC will install some release of FreeBSD, but updates will always have come out since then, and applying them should be the very first thing you do when you deploy a new FreeBSD box. For instance, we recently deployed 2 new FreeBSD 6.2 boxes. The first thing we did was a complete source upgrade; then we recompiled everything, including the kernel, to bring the userland up to date.

I recommend that you cvsup to the latest version of 6.1 and compile the source for the type of CPU installed in your box to optimize the code. Set CPUTYPE=YOUR_CPU in make.conf, replacing YOUR_CPU with your processor type.

Since doing this on every new box we haven't had a single reboot. Trying to chase debug output without any conclusive answers is just as frustrating as dealing with a box that crashes. Try it; it may just save you a whole lot of headaches.
 
Pucky,

Unfortunately, it happens on our FreeBSD 6.2 boxes also. I upgraded two of them in hopes that some bug in 6.1 had been fixed. No dice. The ONLY reason I used the 6.1 dump info in the example above is that it is the only machine that seems to want to dump properly. The rest don't dump core and then reboot... they just reboot. But I think that is a whole different problem.

I sure miss FBSD 4.10. I never needed to enable debugging at all as it never crashed and never dumped at all. It just purrrrred.

I did come across a couple of instances out on the net where people asked if Exim was built on this machine or whether it was a binary from elsewhere. If da_exim has an incompatibility I guess I could be seeing it. I will keep this thread posted on any progress.

BigWil
 
Then if it's happening on other boxes, have you looked at your hardware setup? Are you setting up all these boxes with the same hardware? If you are, could it be that something is simply incompatible and causing issues? Also, have you thought about RAM? It's highly unlikely that all of them have bad memory modules, but I know for a fact that the RAM modules can sometimes affect whether the box stays online. For example, if you're using 256 MB modules, have you tried a single 1 GB stick instead? You could also try mixing them, e.g. 4 x 256 MB and 1 x 1 GB, or whatever.

You could also try compiling exim from ports and doing away with da_exim.

Also, if you're in a position to, I'd take a good look at your power supply. Is it powerful enough for the server hardware? PSUs that are flaky or running out of juice are a classic cause of reboot issues too.

But really, I think the panic is only reporting exim because that's what was running when she went down, so it may be a false positive. But I could be wrong.
 