[logs] Analyzing tons of logs

Anton Chuvakin anton at chuvakin.org
Wed Mar 28 20:20:22 PDT 2007


Chetan and all,

> How do we go about log analysis if we have tons (maybe in trillions) of logs
> from let's say tcpdump (raw logs) or some firewall (like netscreen or pix)?
> What would be the best way to normalize and analyze these logs in the
> shortest possible time?

Let's see here: assuming 1 trillion records of 200 bytes each (typical
for a PIX log line, though way too small for a captured packet), we are
looking at roughly 180TB of data (2 x 10^14 bytes, counted in binary
terabytes). To analyze... not just to store.
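The back-of-the-envelope estimate above can be checked with a few lines of
Python. The 100 MB/s scan rate is a hypothetical figure chosen only to show
why "to analyze" is the hard part; substitute whatever your disks and parser
actually sustain.

```python
RECORDS = 10**12          # one trillion log records
BYTES_PER_RECORD = 200    # typical PIX log line, per the estimate above
SCAN_RATE = 100 * 10**6   # hypothetical sustained read+parse rate: 100 MB/s on one box

total_bytes = RECORDS * BYTES_PER_RECORD
decimal_tb = total_bytes / 10**12      # terabytes as disk vendors count them
binary_tib = total_bytes / 2**40       # terabytes as most tools report them
pass_days = total_bytes / SCAN_RATE / 86400  # one sequential pass, in days

print(f"{decimal_tb:.0f} TB decimal, {binary_tib:.1f} TiB binary")
print(f"{pass_days:.1f} days for a single pass at the assumed rate")
```

At the assumed rate a single read of the data takes about 23 days, before any
correlation or reporting is done.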

So, I have a sneaking suspicion that ALL the mentioned solutions will
fail miserably, albeit without embarrassing their creators (because
that's a looooooooooot of data!). I have to admit that Jose is
probably right: you might need to write some purpose-specific code
here. Look up some old posts by Marcus Ranum (for example
http://www.andrews.hu/guru/msg583.html and the messages around it) for
useful tips on super-fast, purpose-specific log processing.
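To make "purpose-specific code" concrete, here is a minimal sketch, assuming
you only care about one question (which sources are denied most often) and
assuming PIX "Deny" messages shaped like the hypothetical sample in the
comment below. It is not Ranum's method, just an illustration of the idea: one
sequential pass, a cheap substring filter before any regex, and memory that
grows with distinct source addresses rather than with log volume. The names
`DENY_RE` and `count_denies` are made up for this example.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical line layout (verify against your own firewall output):
#   %PIX-4-106023: Deny tcp src outside:10.1.2.3/4455 dst inside:192.168.0.5/80 by access-group "acl_out"
# Group 1 = protocol, group 2 = source IP. Interface names are assumed to be \w+.
DENY_RE = re.compile(r"%PIX-\d-106023: Deny (\w+) src \w+:([\d.]+)/\d+")

def count_denies(lines: Iterable[str]) -> Counter:
    """Count denied packets per source IP in a single pass over the input."""
    counts: Counter = Counter()
    for line in lines:
        # Fast path: skip lines that cannot match before paying for the regex.
        if "106023" not in line:
            continue
        m = DENY_RE.search(line)
        if m:
            counts[m.group(2)] += 1
    return counts
```

Because it accepts any iterable of lines, the same function can be fed
`sys.stdin` at the end of a `zcat` pipeline or one file per worker process,
with the per-worker `Counter` objects summed afterwards.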

Best,
-- 
Anton Chuvakin, Ph.D., GCIA, GCIH, GCFA
      http://www.chuvakin.org
  http://chuvakin.blogspot.com
    http://www.info-secure.org