Fatal trap 12: page fault while in kernel mode

patrik · Mar 1, 2007

Yesterday one of our servers crashed. I logged into the DRAC console and rebooted the machine. A few minutes later it crashed again, so I rebooted once more. Then it crashed again a few minutes later, but this time I managed to get a screenshot of the terminal.

This is what it said:

Code:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 06
fault virtual address	= 0x4
fault code		= supervisor read, page not present
instruction pointer	= 0x20:0xc05c6094
stack pointer		= 0x28:0xea6b98cc
frame pointer		= 0x28:0xea6b98d0
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 1269 (da-popb4smtp)
trap number		= 12
panic: page fault
cpuid = 2
Uptime: 2m5s
Cannot dump. No dump device defined.
Automatic reboot in 15 seconds - press a key on the console to abort

After the next reboot I immediately stopped the da-popb4smtp process, and since then it has run smoothly. I'm not 100% sure that da-popb4smtp has anything to do with this, though. It could be that the page fault simply doesn't occur anymore, and that da-popb4smtp just happened to be the current process when I got the screenshot.
I did update DirectAdmin to 1.292 (from 1.28 I think) earlier that day (14:57). The server first crashed somewhere around 15:50.

It is running FreeBSD 6.1-STABLE.
What could be the cause of the problem? Bad memory?
Is there any point in "defining a dump device"? And how is that done?

pucky · Mar 1, 2007

Have you searched Google? http://www.google.com/search?hl=en&q=Fatal+trap+12:+page+fault+while+in+kernel+mode+freebsd

I know we had a problem with a box rebooting every 14 days at one stage, and then it mysteriously stopped. That was 85 days ago and since then it has never happened again. First I questioned the NOC about them rebooting the box and they denied it. Then it happened about 3 more times at 10- and 20-day uptime intervals with no evidence as to why. But it has not happened in the last 85 days. Maybe a related update along the way fixed a bug or something serious, but it was impossible to find. FreeBSD 6.1 here also. Thankfully we have not seen the above problems.

Your problem, on the other hand, could be related to hardware, I feel. It could be memory, and that's a place to start. It doesn't hurt to get your memory modules replaced and then go from there. No point in trying to get them checked, as sometimes those results are completely unreliable.

patrik · Mar 1, 2007

Yes, I have searched on Google but thought I would get better answers in a discussion here. It's easy to change memory modules, so I guess that could be a start.

nobaloney · Mar 1, 2007

We removed two 512 sticks from one of our servers and replaced them with two 1G sticks. It started failing within 24 hours of starting every time.

We pulled those two sticks and installed a different set of two 1G sticks. It has been working for a week already.

Jeff

BigWil · Jun 12, 2007

Jeff,

Hey did your kernel panic ever come back after replacing those sticks?

We're still getting these kernel trap 12 panics. These machines with the same sticks ran fine back in the 4.10 days, but ever since going to 6.1 and 6.2 these panics happen and there isn't the slightest log as to the reason why.

I see many others out on the web having the same issues and not a word out of the FreeBSD groups and core as to why it could be happening or how to fix it.

BigWil

nobaloney · Jun 12, 2007

Nope, on our (CentOS) servers it was the memory.

I don't know enough about FreeBSD to even guess.

May I ask who your avatar is? (Mine is me ... no really, it's Einstein, but when I let my hair grow out it looks like me.)

Jeff

BigWil · Jun 13, 2007

Oh I don't remember. Went through a collection of avatars a long time ago and said "hey I used to be that hot before age caught up to me" and presto I have had it since.

BigWil

patrik · Jun 13, 2007

We haven't replaced our memory sticks (yet). The system is working just fine without da-popb4smtp started.

BigWil · Jun 13, 2007

I don't think, at least in our case, that it is popb4smtp because we stopped using that about 2 years ago. We use authentication only.

Back in the days of 4.10 we had never even heard of a Fatal trap 12. Heck, we never had a kernel panic at all back then. I remember the glorious uptime messages of 100 or more days. But now that we have gone to 6.x and we scan our own spam we are lucky on most machines if we see a week. The good news is that most of the machines dump and restart. Though we have one supermicro in particular that is stubborn and requires a manual power cycle. The guys at the datacenter know her by name at this point.

BigWil

patrik · Jun 13, 2007

You're happy if your 6.2 servers run for a week? Heck, sounds like you're having some serious issues. We run several 6.x servers and we aren't experiencing many problems at all, actually. The panic I talked about in this thread is the first panic we've had, and that has been solved by now.

nobaloney · Jun 17, 2007

BigWil said:
I remember the glorious uptime messages of 100 or more days.

And I remember uptime on BSD-OS (a commercial version of BSD) and also on early Slackware Linux distributions, of over a year, so it's all relative.

One of our CentOS servers today shows an uptime of 162 days. Another shows 113 days. Heck, my Linux desktop has an uptime of 43 days, though X crashed last week and I had to re-login.

Generally the problems requiring you to reboot are memory usage related; often swapfile related. You may just need more memory.

BigWil said:
But now that we have gone to 6.x and we scan our own spam we are lucky on most machines if we see a week.

If you're scanning all spam, and not using blocklists first to get rid of 70% of it, then yes, you're going to have a memory problem; remember that SpamAssassin is a Perl script and is not the best at memory management.

BigWil said:
The good news is that most of the machines dump and restart.

And is the bad news that it has to do that?

BigWil said:
Though we have one supermicro in particular that is stubborn and requires a manual power cycle. The guys at the datacenter know her by name at this point.

Scary to me. One of our routers is a SuperMicro system running one of the BSDs (sorry, don't know which one or which version) and it's only required a reboot once in three years; that time it failed because of a power outage. I'd look into the memory issue if I were you.

Jeff

BigWil · Jun 18, 2007

nobaloney said:
Generally the problems requiring you to reboot are memory usage related; often swapfile related. You may just need more memory.

Dual Xeon, 2GB RAM, 4GB SWAP. Not so sure that more is needed, but possibly better use of it could at least help avoid the problem. See next.

nobaloney said:
If you're scanning all spam, and not using blocklists first to get rid of 70% of it, then yes, you're going to have a memory problem; remember that SpamAssassin is a Perl script and is not the best at memory management.

Yes, this is my suspicion. Either SpamAssassin or Clam, as the problem didn't occur on any of the machines until we were put in the situation where we needed to run them full time on the servers. We used to pass everything through Postini, which kept everything pretty clean before it got to the machines.

So which blocklists are you recommending? We already run RBL. Are you talking about the /etc/virtual blocklists specific to hosts? Do you have a list of well known spam relays? Care to share?

nobaloney said:
And is the bad news that it has to do that?

Yah that would be the bad news as well.

BigWil

elvandar · Jun 19, 2007

Please refer to http://www.freebsd.org/doc/en/books/developers-handbook for how to obtain kernel crash information from the dump you mentioned, then send-pr the information to the freebsd-bugs team (http://www.freebsd.org/send-pr.html) with a summary of what you obtained from the kernel dump and a location where we (the FreeBSD team) can download the dump if needed.

Only that way can you see what is going on; the information presented so far is worthless for investigation (sorry to put it so bluntly).
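
As a rough sketch of what "defining a dump device" involves (the first panic in this thread ended with "Cannot dump. No dump device defined."): on FreeBSD 6.x it normally comes down to two lines in /etc/rc.conf. The device name shown in the comments is a hypothetical example, and the location of the debug kernel depends on how the kernel was built, so treat both as assumptions and check rc.conf(5), dumpon(8) and savecore(8) on your own system.

```shell
# /etc/rc.conf fragment (sh syntax): a minimal sketch, not a tested configuration.

# Use the first suitable swap device listed in /etc/fstab as the crash dump device.
# An explicit device also works, e.g. dumpdev="/dev/da0s1b" (hypothetical name).
dumpdev="AUTO"

# Directory where savecore(8) stores the dump on the next boot (this is the default).
dumpdir="/var/crash"

# To enable dumping without a reboot (assumed swap device name, run as root):
#   dumpon -v /dev/da0s1b
# After the next panic and reboot, inspect the saved dump with a debug kernel:
#   kgdb /path/to/kernel.debug /var/crash/vmcore.0
#   (kgdb) bt
```

The swap partition generally needs to be at least as large as physical RAM for a full dump to fit. The backtrace printed by `bt` is the part worth including in a problem report.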

Regards,
Remko
FreeBSD.org

nobaloney · Jun 21, 2007

BigWil, we find that if we run SpamBlocker on all domains (see /etc/virtual/use_rbl_domains) it cuts down enough on the email coming into the box so SpamAssassin works well with the rest.

Jeff

BigWil · Jun 22, 2007

Jeff,

I've been doing that since day one and wouldn't have it any other way:
lrwxr-xr-x 1 root mail 7 May 3 2006 use_rbl_domains -> domains

However, we still get TONS of mail, and all of it that makes it through gets scanned for viruses using clamd and SpamAssassin. Very heavy footprint, but I guess that is all that can be done at this point.

A lot of the traffic that I do notice while I am watching is out of China, Korea, Argentina, Russia and Indonesia, and almost all of it is spam. RBL catches most and SA catches the rest, but at the cost of a lot of resources.

Well I got a cold one waiting for me.... I haven't had a break in days.

Cheers,

BigWil

nobaloney · Jun 23, 2007

BigWil,

Have you updated to SpamBlocker3? The way it calls ClamAV is supposed to use fewer resources than the methods posted here for earlier versions of the exim.conf file.

It's still beta, and it's not perfect. But it may work for you.

Jeff

BigWil · Jun 23, 2007

Jeff,

We have been using SpamBlocker3 for a very long time now. Unless you added something and I was unaware:

# uncomment to define AntiVirus scanner here:
av_scanner = clamd:/var/run/clamav/clamd

BigWil

nobaloney · Jun 23, 2007

No further answer today, then.

Jeff

BigWil · Jun 23, 2007

Jeff,

Though I appreciate your help, I think the only answer is that the level of spam our domains receive is so high that it is kicking our mem and vmem butts because of some deficiency in FBSD 6.x. Not much we can do but reboot a machine once in a while.

Oh, and to answer elvandar, which I forgot to do until now.... The issue and dumps have been submitted to FBSD a couple of times now. Their only answer is that it has to be an issue with hardware. Hardware on 15 different machines, all of which worked perfectly with 4.x, yet all coincidentally went bad when we made the 6.x upgrades. ;-)

You both enjoy your days..... especially you Jeff you need a break.

BigWil

nobaloney · Jun 25, 2007

I've decided I'm going to take a vacation this year, but I don't know when yet.

I need it.

Jeff
