Rsyslogd and Journald

jonium · Dec 20, 2023

Hello,
Can you confirm that rsyslogd is mandatory for CSF to run correctly?
I have some CentOS boxes running both rsyslogd and journald, and sometimes rsyslogd uses 100% of the CPU and the server goes down...

DirectAdmin, latest version
CustomBuild 2

Richard G · Dec 20, 2023

rsyslogd is part of the OS. The system needs it to generate the system logs and write to them.
It's not only CSF that uses it; everything does.

It's best to investigate why it starts using 100% of the CPU. Check whether the system logs contain a massive number of entries, or whether something else in a log file gives a clue.

Maybe others can also give good tips for this. Perhaps @Zhenyapan, since he has a lot of servers, has some good ideas.
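
One way to act on the suggestion above (checking whether one program is flooding the system logs) is to count entries per program. This is a minimal Python sketch, assuming the classic syslog line layout seen later in this thread (`Mon DD HH:MM:SS host program[pid]: message`). The function name, the sample lines, and the log path mentioned in the comment are illustrative assumptions.

```python
import re
from collections import Counter

# Classic syslog layout: "Mon DD HH:MM:SS host program[pid]: message".
# The "[pid]" part is optional (kernel lines do not carry one).
SYSLOG_LINE = re.compile(
    r"^\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+([^\s\[:]+)(?:\[\d+\])?:"
)


def count_by_program(lines):
    """Return a Counter mapping program name -> number of matching lines."""
    counts = Counter()
    for line in lines:
        match = SYSLOG_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative sample. In practice the lines would come from a log file,
# e.g. open("/var/log/messages") on a RHEL-style system (path is an assumption).
sample = [
    "Jan 03 17:58:05 myserver.xxx named[3833]: limit responses to 66.249.93.0/24",
    "Jan 03 17:59:05 myserver.xxx named[3833]: stop limiting responses to 66.249.93.0/24",
    "Jan 12 15:08:43 localhost kernel: TSC deadline timer enabled",
]

# Print programs from noisiest to quietest.
for program, total in count_by_program(sample).most_common():
    print(f"{total:6d}  {program}")
```

A program that accounts for most of the lines is a reasonable first suspect for the rsyslogd load.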

Ohm J · Dec 20, 2023

Maybe you're getting a DDoS on your bind9 (named) service?

jonium · Jan 3, 2024

jamgames2 said:
Maybe you're getting a DDoS on your bind9 (named) service?

Hello jamgames2,
Why do you suspect that?
Is that the service most likely to cause this effect?

Ohm J · Jan 3, 2024

If you don't silence or disable logging for some channels, they can flood the logging system.

From my experience, I just suspect bind9 is misbehaving. For a more reliable answer, hire someone to check the server directly.

jonium · Jan 3, 2024

Here are some recent Bind logs:

Code:

Jan 03 17:58:05 myserver.xxx named[3833]: limit responses to 66.249.93.0/24 for www.domain2.it IN A  (0b0f904b)
Jan 03 17:58:05 myserver.xxx named[3833]: client @0x7f57a41bbfd0 66.249.93.160#60012 (www.domain2.it): rate limit slip response to 66.249.93.0/24 for www.domain2.it IN A  (0b0f904b)
Jan 03 17:58:05 myserver.xxx named[3833]: client @0x7f57a41bbfd0 66.249.93.14#36571 (www.domain2.it): rate limit drop response to 66.249.93.0/24 for www.domain2.it IN A  (0b0f904b)
Jan 03 17:59:05 myserver.xxx named[3833]: stop limiting responses to 66.249.93.0/24 for www.domain2.it IN A  (0b0f904b)
Jan 03 18:19:45 myserver.xxx named[3833]: limit responses to 51.178.111.0/24 for www.domain1.fr IN A  (a0ed7875)
Jan 03 18:19:45 myserver.xxx named[3833]: client @0x7f57a41908f0 51.178.111.233#17725 (www.domain1.fr): rate limit slip response to 51.178.111.0/24 for www.domain1.fr IN A  (a0ed7875)
Jan 03 18:19:45 myserver.xxx named[3833]: client @0x7f57a41908f0 51.178.111.240#21270 (www.domain1.fr): rate limit drop response to 51.178.111.0/24 for www.domain1.fr IN A  (a0ed7875)
Jan 03 18:19:45 myserver.xxx named[3833]: client @0x7f57a4182150 51.178.111.233#47406 (www.domain1.fr): rate limit slip response to 51.178.111.0/24 for www.domain1.fr IN A  (a0ed7875)
Jan 03 18:19:45 myserver.xxx named[3833]: client @0x7f57a4182150 51.178.111.225#33627 (www.domain1.fr): rate limit drop response to 51.178.111.0/24 for www.domain1.fr IN A  (a0ed7875)
Jan 03 18:19:45 myserver.xxx named[3833]: client @0x7f57a4182150 51.178.111.237#53646 (www.domain1.fr): rate limit slip response to 51.178.111.0/24 for www.domain1.fr IN A  (a0ed7875)
Jan 03 18:19:45 myserver.xxx named[3833]: client @0x7f57a4182150 51.178.111.240#52638 (www.domain1.fr): rate limit drop response to 51.178.111.0/24 for www.domain1.fr IN A  (a0ed7875)
Jan 03 18:19:45 myserver.xxx named[3833]: client @0x7f57a4182150 51.178.111.225#37799 (www.domain1.fr): rate limit slip response to 51.178.111.0/24 for www.domain1.fr IN A  (a0ed7875)
Jan 03 18:19:45 myserver.xxx named[3833]: client @0x7f57a4212d90 51.178.111.240#50440 (www.domain1.fr): rate limit drop response to 51.178.111.0/24 for www.domain1.fr IN A  (a0ed7875)

jonium · Jan 3, 2024

Code:

named[3833]: increase from 500 to 750 RRL entries with 503 bins; average search length 2.0
named[3833]: client @0x7f57a4212d90 213.219.38.223#43752 (tripadvisor.com): query (cache) 'tripadvisor.com/A/IN' denied
named[3833]: client @0x7f57a41bbfd0 213.219.38.223#27749 (uber.com): query (cache) 'uber.com/A/IN' denied

Obviously, tripadvisor.com and uber.com aren't hosted on my server...
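
To see which client networks and which clients dominate entries like the ones above, the lines can be tallied. This is a rough Python sketch. The regular expressions are written against the exact message wording shown in this thread and may need adjusting for other BIND versions, and `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Patterns follow the message wording quoted in this thread (an assumption
# about the format; other BIND versions may word these lines differently).
RATE_LIMIT = re.compile(r"rate limit (slip|drop) response to (\S+) for (\S+)")
DENIED = re.compile(
    r"client @\S+ ([0-9A-Fa-f.:]+)#\d+ \([^)]*\): query \(cache\) '([^']+)' denied"
)


def summarize(lines):
    """Count rate-limited responses per subnet and denied cache queries per client."""
    limited = Counter()
    denied = Counter()
    for line in lines:
        hit = RATE_LIMIT.search(line)
        if hit:
            limited[hit.group(2)] += 1  # key: the limited subnet
            continue
        hit = DENIED.search(line)
        if hit:
            denied[hit.group(1)] += 1  # key: the client IP address
    return limited, denied


# Sample lines copied from the logs posted above.
sample = [
    "named[3833]: client @0x7f57a41bbfd0 66.249.93.160#60012 (www.domain2.it): rate limit slip response to 66.249.93.0/24 for www.domain2.it IN A  (0b0f904b)",
    "named[3833]: client @0x7f57a41bbfd0 66.249.93.14#36571 (www.domain2.it): rate limit drop response to 66.249.93.0/24 for www.domain2.it IN A  (0b0f904b)",
    "named[3833]: client @0x7f57a41908f0 51.178.111.233#17725 (www.domain1.fr): rate limit slip response to 51.178.111.0/24 for www.domain1.fr IN A  (a0ed7875)",
    "named[3833]: client @0x7f57a4212d90 213.219.38.223#43752 (tripadvisor.com): query (cache) 'tripadvisor.com/A/IN' denied",
    "named[3833]: client @0x7f57a41bbfd0 213.219.38.223#27749 (uber.com): query (cache) 'uber.com/A/IN' denied",
]

limited, denied = summarize(sample)
print("rate-limited subnets:", limited.most_common())
print("denied cache queries by client:", denied.most_common())
```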

Ohm J · Jan 3, 2024

I don't know for sure, but that could be a reason.

Try disabling the rate-limit logs and capping the other logs at 20 MB.

Put this at the end of the file, or next to the related code.
#named.conf

Code:

logging {
    category rate_limiting_log { null; };

    channel default_log {
        file "/var/named/log/default" versions 3 size 20m;
        print-time yes;
        print-category yes;
        print-severity yes;
        severity info;
    };
};

About the "/var/named/log/default" location: please adjust it to match your server configuration. RHEL and Debian use different storage locations.

jonium · Jan 3, 2024

Is this the right path?
/var/named/data

Also found:

Code:

named[3833]: client @0x7f57a41739b0 199.43.206.245#43085 (sl): query (cache) 'sl/ANY/IN' denied

Ohm J · Jan 3, 2024

Yes. You may need to create the log folder manually.

That's how DDoS attacks and bots behave. They don't care what happens; they just spam your server for as long as they can.

jonium · Jan 5, 2024

I got the following error when restarting named:

/etc/named.conf:65: undefined category: 'rate_limiting_log'

jonium · Jan 5, 2024

The category name is rate-limit.
I retried and now it seems to work. It created /var/named/log/default, but it doesn't seem to populate it.

jonium · Jan 5, 2024

So now I have no logs for the BIND service?

Ohm J · Jan 5, 2024

Normally all logs should go to the "default" channel.
Maybe you need to change the owner of the "log" folder to "named:named". Basically, I only turn logging on when I need to debug something and turn it off when I'm finished, to avoid problems from any weird bot or DDoS traffic.
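
A possible reason the new file stays empty, offered as a guess rather than a confirmed diagnosis: the snippet posted earlier declares the `default_log` channel, but no `category` statement routes messages to it, and in BIND a channel only receives what some category sends to it (for example `category default { default_log; };`). The Python sketch below flags channels that are declared but never referenced. The regex-based parsing is deliberately naive (it ignores comments and nested braces), and `unreferenced_channels` is a hypothetical helper, not a BIND tool.

```python
import re


def unreferenced_channels(config_text):
    """Return channel names that no category statement refers to."""
    # Channels are declared as: channel <name> { ... };
    declared = set(re.findall(r"\bchannel\s+([\w-]+)\s*\{", config_text))
    referenced = set()
    # Categories are declared as: category <name> { chan1; chan2; };
    for body in re.findall(r"\bcategory\s+[\w-]+\s*\{([^}]*)\}", config_text):
        referenced.update(name.strip() for name in body.split(";") if name.strip())
    return sorted(declared - referenced)


# Sample modeled on the snippet from this thread, with the corrected
# category name; the path is the one used in the thread.
SAMPLE = """
logging {
    category rate-limit { null; };
    channel default_log {
        file "/var/named/log/default" versions 3 size 20m;
        severity info;
    };
};
"""

print(unreferenced_channels(SAMPLE))
```

With a line such as `category default { default_log; };` added to the configuration, the function returns an empty list.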

jonium · Jan 5, 2024

I already changed the owner.
OK, I suspected as much and just needed confirmation. Thank you.
I'll monitor the server over the next few days and hope the problem is fixed.

Greetings

jonium · Jan 12, 2024

The server seems to be more stable, but it still reboots sometimes.
After the latest of many reboots, I found the following log:

Code:

Jan 12 15:08:43 localhost kernel: Switched APIC routing to cluster x2apic.
Jan 12 15:08:43 localhost kernel: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
Jan 12 15:08:43 localhost kernel: smpboot: CPU0: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz (fam: 06, model: 9e, stepping: 0d)
Jan 12 15:08:43 localhost kernel: TSC deadline timer enabled
Jan 12 15:08:43 localhost kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: be00000000800400
Jan 12 15:08:43 localhost kernel: mce: [Hardware Error]: TSC 0 ADDR ffffffff91e0553b MISC ffffffff91e0553b
Jan 12 15:08:43 localhost kernel: mce: [Hardware Error]: PROCESSOR 0:906ed TIME 1705068520 SOCKET 0 APIC 0 microcode f0
Jan 12 15:08:43 localhost kernel: Performance Events: PEBS fmt3+, Skylake events, 32-deep LBR, full-width counters, Intel PMU driver.
Jan 12 15:08:43 localhost kernel: ... version:                4
Jan 12 15:08:43 localhost kernel: ... bit width:              48
Jan 12 15:08:43 localhost kernel: ... generic registers:      4
Jan 12 15:08:43 localhost kernel: ... value mask:             0000ffffffffffff
Jan 12 15:08:43 localhost kernel: ... max period:             00007fffffffffff
Jan 12 15:08:43 localhost kernel: ... fixed-purpose events:   3
Jan 12 15:08:43 localhost kernel: ... event mask:             000000070000000f
Jan 12 15:08:43 localhost kernel: NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
Jan 12 15:08:43 localhost kernel: mce: [Hardware Error]: Machine check events logged
Jan 12 15:08:43 localhost kernel: mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 3: be00000000800400
Jan 12 15:08:43 localhost kernel: smpboot: Booting Node   0, Processors  #1 #2 #3 #4
Jan 12 15:08:43 localhost kernel: mce: [Hardware Error]: TSC 0
Jan 12 15:08:43 localhost kernel: ADDR ffffffff91e0553b MISC ffffffff91e0553b
Jan 12 15:08:43 localhost kernel: mce: [Hardware Error]: PROCESSOR 0:906ed TIME 1705068520 SOCKET 0 APIC 6 microcode f0
Jan 12 15:08:43 localhost kernel:  #5 #6 #7 #8
Jan 12 15:08:43 localhost kernel: TAA CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html for more details.
Jan 12 15:08:43 localhost kernel: MMIO Stale Data CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/processor_mmio_stale_data.html for more details.
Jan 12 15:08:43 localhost kernel:  #9 #10 #11 #12 #13 #14 #15 OK

Now I'm reading these pages:

https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html

and
https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/processor_mmio_stale_data.html

Has anyone else ever had this problem?

Ohm J · Jan 12, 2024

It looks like something is wrong with your CPU.

I have never worked with bare-metal servers; mostly I work with VPSes on Xen or Proxmox, so it's hard for me to say whether the problem is in the kernel or the hardware.

jonium · Jan 29, 2024

jamgames2 said:
I don't know for sure, but that could be a reason.

Try disabling the rate-limit logs and capping the other logs at 20 MB.

Put this at the end of the file, or next to the related code.
#named.conf

Code:

logging {
    category rate_limiting_log { null; };

    channel default_log {
        file "/var/named/log/default" versions 3 size 20m;
        print-time yes;
        print-category yes;
        print-severity yes;
        severity info;
    };
};

About the "/var/named/log/default" location: please adjust it to match your server configuration. RHEL and Debian use different storage locations.

After more than 3 weeks I didn't have that problem anymore.
Thank you

Richard G · Jan 29, 2024

jonium said:
I didn't have that problem anymore.

Care to share how you fixed it? Was it indeed a hardware issue, or did the issue disappear by itself (that also happens sometimes)?

jonium · Jan 29, 2024

Richard G said:
Care to share how you fixed it? Was it indeed a hardware issue, or did the issue disappear by itself (that also happens sometimes)?

Sorry,
I thought it was clear: I simply applied jamgames2's hints: https://forum.directadmin.com/threads/rsyslogd-and-journald.69690/post-369724
Greetings
Pier Paolo
