[logs] Analyzing tons of logs

Jose Nazario jose at monkey.org
Wed Mar 28 10:23:03 PDT 2007


On Wed, 28 Mar 2007, Chetan Gupta wrote:

> How do we go about log analysis if we have tons (maybe in trillions) of 
> logs from lets say tcpdump (raw logs) or some firewall (like netscreen 
> or pix)? What would be the best way to normalize and analyze these logs 
> in the shortest possible time? Import them into a database? Use a 
> commercial application like arcsight? loglogic? simple text editor like 
> editplus?

don't kick me for saying this, but you haven't posed any questions about 
what you're trying to address with this log analysis. traffic over time? 
failed logins? attacks? application usage? server usage and utilization?

what you want to do will dictate what tools you'll use, and hence what 
normalization you'll do.

first things first, make sure all logs use the same timestamp reference 
(e.g. UTC). if not, normalize that.
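
as a rough illustration of that normalization, here is a minimal python sketch. it assumes each source logs naive local timestamps and that you already know the fixed UTC offset for that source. the function name to_utc, the format string and the offset argument are all made up for this example, and a fixed offset ignores daylight saving changes within a single log.

```python
from datetime import datetime, timedelta, timezone


def to_utc(stamp: str, fmt: str, utc_offset_hours: float) -> str:
    """Parse a naive local timestamp and return it as an ISO 8601 UTC string.

    utc_offset_hours is the (assumed known) offset of the logging device,
    e.g. -7 for PDT. It is a fixed offset, so DST transitions are not handled.
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(stamp, fmt).replace(tzinfo=local_tz)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    # a device logging in PDT (UTC-7)
    print(to_utc("2007-03-28 10:23:03", "%Y-%m-%d %H:%M:%S", -7))
```

once every record carries the same reference, sorting and correlating across sources is a plain string or integer comparison.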

secondly, for network traces, a few breakdowns can be useful. this step 
will explode the data storage requirements, but gives you a bunch more 
indices to query on:
- split all traffic into sessions and save those out in individual files
- run an IDS over it and look for known attacks and alerts
- run AV over the session payloads to look for known bad stuff
- identify what applications are in use and tag the traces that way
- organize the traces by source, dest, service (proto/port), payloads, and 
alerts
   - you can use a database with foreign keys or even just a filesystem
    with links
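
to make the "database with foreign keys" option a bit more concrete, here is a small sketch using python's built-in sqlite3 module. the table and column names (hosts, sessions, alerts, trace_path and so on) are invented for illustration, not a schema from any particular tool, and at real volumes you would likely want a bigger database than sqlite.

```python
import sqlite3

# Hypothetical schema: one row per host, per session file, and per IDS alert.
SCHEMA = """
CREATE TABLE hosts (
    id   INTEGER PRIMARY KEY,
    addr TEXT UNIQUE NOT NULL
);
CREATE TABLE sessions (
    id         INTEGER PRIMARY KEY,
    src_id     INTEGER NOT NULL REFERENCES hosts(id),
    dst_id     INTEGER NOT NULL REFERENCES hosts(id),
    proto      TEXT    NOT NULL,
    dst_port   INTEGER NOT NULL,
    start_utc  TEXT    NOT NULL,
    trace_path TEXT    NOT NULL   -- where the per-session file was saved
);
CREATE TABLE alerts (
    id         INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES sessions(id),
    signature  TEXT    NOT NULL
);
CREATE INDEX idx_sessions_service ON sessions(proto, dst_port);
"""


def open_index(path=":memory:"):
    """Create the index database and turn on foreign key enforcement."""
    db = sqlite3.connect(path)
    db.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    db.executescript(SCHEMA)
    return db


def host_id(db, addr):
    """Return the id for an address, inserting it on first sight."""
    db.execute("INSERT OR IGNORE INTO hosts(addr) VALUES (?)", (addr,))
    return db.execute("SELECT id FROM hosts WHERE addr = ?", (addr,)).fetchone()[0]


def add_session(db, src, dst, proto, dst_port, start_utc, trace_path):
    """Record one session file and return its id."""
    cur = db.execute(
        "INSERT INTO sessions(src_id, dst_id, proto, dst_port, start_utc, trace_path)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (host_id(db, src), host_id(db, dst), proto, dst_port, start_utc, trace_path),
    )
    return cur.lastrowid


def add_alert(db, session_id, signature):
    """Attach an IDS alert to an existing session."""
    db.execute(
        "INSERT INTO alerts(session_id, signature) VALUES (?, ?)",
        (session_id, signature),
    )


def sessions_with_alerts(db, proto, dst_port):
    """List (trace_path, signature) pairs for one service."""
    return db.execute(
        "SELECT s.trace_path, a.signature FROM sessions s"
        " JOIN alerts a ON a.session_id = s.id"
        " WHERE s.proto = ? AND s.dst_port = ?"
        " ORDER BY s.id, a.id",
        (proto, dst_port),
    ).fetchall()
```

the filesystem variant is the same idea with directories per source, dest and service, and links pointing back at the one saved session file.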

for syslog data, a number of high performance engines exist. you can dump 
your data into those systems, run some analysis on the content and then 
you have a nice searchable database.

for text based logs (e.g. those PIX logs, Windows server logs, etc), find 
the lowest common denominator format that preserves the info and use that, 
e.g. syslog.
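
as one possible reading of "use syslog as the common format", here is a minimal sketch that wraps an already-extracted message in a classic BSD-style syslog line (priority, timestamp, host, tag, message). the function name, the host and tag values, and the default facility/severity (local0/info) are assumptions for the example. the message is a placeholder, not a real device log.

```python
from datetime import datetime, timezone

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def to_syslog_line(when_utc, host, tag, message, facility=16, severity=6):
    """Format one record as a BSD-style syslog line.

    The priority value is facility * 8 + severity; 16 is local0 and 6 is
    informational. The day of month is space padded to two characters.
    """
    pri = facility * 8 + severity
    stamp = f"{MONTHS[when_utc.month - 1]} {when_utc.day:2d} {when_utc:%H:%M:%S}"
    return f"<{pri}>{stamp} {host} {tag}: {message}"


if __name__ == "__main__":
    when = datetime(2007, 3, 28, 17, 23, 3, tzinfo=timezone.utc)
    print(to_syslog_line(when, "fw1", "pix", "example message"))
```

note that this timestamp style drops the year and the timezone, so keep the normalized UTC time somewhere else if you need it later.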

hope that helps.

________
jose nazario, ph.d.		    jose at monkey.org
http://monkey.org/~jose/ 	    http://monkey.org/~jose/secnews.html
 				    http://www.wormblog.com/

