Offline responses to tbird's question about loghost performance

Response 1)

for those of you who are running centralized loghosts, what's the
average amount of data you collect in a day? what's the average
CPU utilization on your server?

<<1MB/day, and negligible, respectively.

We do distributed data decimation: our filters reside on the endpoints, and we regard anything that spews more than about one message per hour as being in need of tuning. This approach scales gracefully over time, as the central loghost never becomes a bottleneck, and the ongoing tuning gets associated with the deployment of new kit.
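As a rough illustration of the "more than a message per hour" rule of thumb, here is a minimal Python sketch that counts messages per source over an observation window and lists the sources that exceed the threshold. The function name, the (source, message) tuple layout and the sample data are assumptions made for illustration; they are not the respondent's actual filters.

```python
from collections import Counter

def noisy_sources(events, window_hours, max_per_hour=1.0):
    """Return {source: messages_per_hour} for sources above max_per_hour.

    events is an iterable of (source, message) tuples collected over
    window_hours hours (a hypothetical layout, not a real syslog API).
    """
    counts = Counter(source for source, _message in events)
    rates = {source: n / window_hours for source, n in counts.items()}
    return {source: rate for source, rate in rates.items() if rate > max_per_hour}

# Hypothetical sample: one day of messages from two sources.
sample = [("sshd", "session opened")] * 30 + [("cron", "job ran")] * 10
tuning_candidates = noisy_sources(sample, window_hours=24)
```

Anything that shows up in the result is a candidate for tuning at the endpoint, before it is ever forwarded to the central loghost.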

I've been at shops that Keep Everything, and some of them have horking big logservers running warm all the time, accumulating hundreds of GB/day. Making any actual use of that data is a major project.

i'm trying to wave my hands in the air violently and make up some
performance numbers.

Figure a range somewhere between <1KB/day/machine (tightly tuned, only the most interesting events forwarded, or relatively idle machines) and rates approaching an interesting fraction of 1MB/sec/machine (busy machines, completely untuned, full logging of everything including such audit trace info as logs of every syscall that does a permission check --- open/create/mkdir/unlink, exec, kill, ...). I do like to keep all log data for a long time, but I like to keep it distributed about the boxes where it's generated; I don't like the labor of trying to consolidate huge volumes of log data, most of which is noise.

Response 2)

I have a box which collects Cisco syslogs.

On a weekday there are on average 28000 messages (about 3.2 Mbytes).

The syslog service takes about 0.3% of the CPU (syslog cputime/uptime; is this a good way of working it out?)
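On the question in parentheses: dividing the daemon's accumulated CPU time by the machine's uptime gives a long-run average share of one CPU, assuming the daemon has been running since boot. It says nothing about short bursts. A minimal sketch follows; the 30-day uptime and the CPU seconds are hypothetical numbers chosen only to reproduce the 0.3% figure.

```python
def cpu_share_percent(cpu_seconds, uptime_seconds):
    """Average share of one CPU used by a process, as a percentage."""
    if uptime_seconds <= 0:
        raise ValueError("uptime must be positive")
    return 100.0 * cpu_seconds / uptime_seconds

# Hypothetical inputs: 30 days of uptime, 7776 s of accumulated syslogd CPU time.
uptime_seconds = 30 * 24 * 3600
share = cpu_share_percent(7776, uptime_seconds)
```

On a multiprocessor machine, divide the result by the number of CPUs to get the share of total capacity.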

The box is running Linux and is a PIII at 750MHz with an IDE hard disk.

Response 3)

500K per active server? A lot less for relatively inactive servers. Multiply by five or six servers, and that's a couple of megs. Totally unscientific. I haven't configured these servers to log bloody everything, though, but even so there's a heck of a lot more chaff than wheat. I would suspect that I'm at the low end of the folks on this list...

Response 4)

CPU utilization hasn't been our problem. . .

I did a quick check of the logs I haven't archived yet (about 90 days' worth) and we average 11,035 events per logged server per day. We're currently collecting from 91 servers, which amounts to about 1,000,000 events per day. We have a mixture of PIX, Unix and Windows machines sending their logs. The PIX logs could overwhelm everything else (sometimes they would climb to 2,000,000 or more events per hour, so we had to start ignoring most messages from those until we can figure out how to pre-process them). We're also not collecting our web logs this way; they are just too bulky. We collect an average of 1GB of web log per day on our main web server, which we immediately compress using gzip to about 70MB of flat file and process, mainly for statistics, once per month.
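The figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are made up for illustration, and the compression ratio assumes 1GB means 1000MB.

```python
SECONDS_PER_DAY = 24 * 3600

events_per_server_per_day = 11035
servers = 91

# Daily total across all logged servers (the "about 1,000,000" figure).
total_events_per_day = events_per_server_per_day * servers

# Average arrival rate at the collector, in events per second.
average_rate = total_events_per_day / SECONDS_PER_DAY

# PIX bursts of 2,000,000 events per hour, in events per second.
pix_peak_rate = 2_000_000 / 3600

# Web logs: 1GB per day compressed to about 70MB.
compression_ratio = 1000 / 70

print(f"{total_events_per_day} events/day, {average_rate:.1f} events/s on average")
print(f"PIX peak of {pix_peak_rate:.0f} events/s, about {pix_peak_rate / average_rate:.0f}x the average")
print(f"gzip ratio of roughly {compression_ratio:.0f}:1")
```

The gap between the average rate and the PIX peak is what makes pre-filtering attractive: the collector would have to be sized for the burst, not the mean.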

For our central syslog collector, we're using a 2-processor Dell PowerEdge 2550 with a gigabyte of RAM, running SQL Server 2000 and SL4NT as its syslog daemon:

Processor 1
  Processor Family:    Pentium III
  Processor Version:   Model 8 Stepping 10
  Current Speed:       1000 MHz

Processor 2
  Processor Family:    Pentium III
  Processor Version:   Model 8 Stepping 10
  Current Speed:       1000 MHz

Memory
  Total Physical Memory Size:             1024 MB
  Total Physical Memory available to OS:  1023 MB
  Free Physical Memory:                   10 MB
  Total Virtual Memory Size:              2064 MB
  Free Virtual Memory:                    2028 MB

It talks to a Dell PowerVault with the database logs mirrored but the actual syslog data on RAID 5. Disk I/O rate is the main problem; CPU utilization is usually less than 10% except during queries. To alleviate the disk I/O problem we're looking at moving to a SAN environment with RAID 0+1. Backups are the biggest problem; that's why we decided a SAN was the way to go if we got much more than 100GB online.

Right now we have space for about 100GB of logs. Periodically (whenever it looks like my disks might fill up), I export events older than 90 days to XML files, compress them and burn them to CD. That's been working pretty well. The XML files can be easily and reliably imported back into a database if they need to be searched. That's never happened. :-)
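The export-and-restore cycle described here can be sketched in Python using only the standard library. This is not the respondent's SQL Server procedure: the (timestamp, host, message) tuple layout, the XML element and attribute names, the function names and the sample events are all assumptions made for illustration.

```python
import gzip
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

def archive_old_events(events, now, max_age_days=90):
    """Split events into (kept, archive).

    Events older than max_age_days are serialized to gzip-compressed XML;
    the rest are returned unchanged. The schema here is hypothetical.
    """
    cutoff = now - timedelta(days=max_age_days)
    root = ET.Element("events")
    kept = []
    for timestamp, host, message in events:
        if timestamp < cutoff:
            node = ET.SubElement(root, "event", time=timestamp.isoformat(), host=host)
            node.text = message
        else:
            kept.append((timestamp, host, message))
    archive = gzip.compress(ET.tostring(root, encoding="utf-8"))
    return kept, archive

def restore_events(archive):
    """Read an archive produced by archive_old_events back into tuples."""
    root = ET.fromstring(gzip.decompress(archive))
    return [
        (datetime.fromisoformat(node.get("time")), node.get("host"), node.text or "")
        for node in root
    ]

# Hypothetical sample data: one event past the 90-day cutoff, one recent.
now = datetime(2003, 1, 1)
events = [
    (datetime(2002, 9, 1, 12, 0), "pix1", "deny tcp <src>"),
    (datetime(2002, 12, 25), "web1", "login ok"),
]
kept, archive = archive_old_events(events, now)
restored = restore_events(archive)
```

Checking that a restore reproduces the archived events before the originals are deleted is a cheap way to confirm that the archive really is importable.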

Response 5)

My machine:

FreeBSD 4.6-STABLE #15: Tue Aug 20 14:37:29 EDT 2002
CPU: Pentium III/Pentium III Xeon/Celeron (1125.77-MHz 686-class CPU) [1 cpu]
avail memory = 516698112 (504588K bytes)
Lines of log/day: 1.6M
Megs of log/day: 200M
Avg cpu %: 1-3%

I have a log watcher running continuously as well. The CPU utilization above doesn't count when my log report program runs (2x a day).

Response 6)

I have no idea about CPU utilization, but it could not have been much, even when I was on a 100 MHz Pentium. I ran a central log server for a software development project, and when I left the project I had about 80 UNIX systems, four Windows systems and a Cisco router forwarding about 4 megabytes per day. The peak was about 15 megabytes per day, and that was with fewer systems and without the router. The drop was primarily due to getting the systems properly configured (shutting off unused and chatty services). The central log server only collected data; any processing was done offline. I guess there would have been a spike to 100% CPU utilization every time the logs were copied to the analysis system.

Good luck in making up numbers. It is very situation dependent. The key question you missed asking was "what kind of realtime processing is done on the data?" Systems that only collect will have minimal CPU utilization, and systems that are doing IDS and trend analysis could keep multiple CPUs busy.
