[logs] Analyzing tons of logs
Mikael Kuisma
kuisma at ping.se
Thu Mar 29 07:54:55 PDT 2007
Chetan Gupta,
I am very familiar with your problem, since I have been working with it for
the last eight years.
The Problem is well known: too much data, arriving at a never-ending pace. A
firewall generates a new log entry for each new session, so, for example,
the traffic created by one single user surfing to one single web server can
over time generate a near-infinite number of log entries. Multiply this by
the number of clients and by the number of services, and we all realize it is
not possible to handle, even using databases or the excellent Splunk IT
search engine. You are talking about trillions of entries, and I guess you
mean that literally. Still, you are not really interested in the individual
entries, but in the traffic they represent; in other words, you want to
"normalize" them.
The Solution is data mining. Take the example above as a bottom-up approach
to the problem: assume that, among all the log data, you have a thousand
entries all about that user surfing to a particular host (e.g. alice:1234 ->
bob:80, etc.). What you are really interested in here is not the thousand
log entries for all the sessions, but the fact that alice surfs to bob, and
maybe when and how much. Or, even more likely, you are not at all interested
in this particular traffic, and wish to remove it silently. Therefore, you
can replace the thousand entries with one single entry in the form of the
pattern "alice:* -> bob:80", and attach some meters to the pattern to keep
track of when, how much, and so on. Then take the next entry not yet covered
by a pattern, form a new pattern, and repeat this for all entries in the log
file until only patterns, and no individual log entries, are left.
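As a rough illustration of this aggregation step, here is a minimal Python sketch (not a description of how ASDIC works internally). It assumes each log entry has already been parsed into a hypothetical tuple of (time, source host, source port, destination host, destination port, bytes), and that the ephemeral source port is the only field to replace with a wildcard. The names `Meters`, `to_pattern` and `aggregate` are made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical pre-parsed entry: (time, src, sport, dst, dport, bytes)
Entry = Tuple[datetime, str, int, str, int, int]


@dataclass
class Meters:
    """Counters attached to a pattern: how often, how much, and when."""
    count: int = 0
    total_bytes: int = 0
    first_seen: Optional[datetime] = None
    last_seen: Optional[datetime] = None

    def update(self, ts: datetime, nbytes: int) -> None:
        self.count += 1
        self.total_bytes += nbytes
        if self.first_seen is None or ts < self.first_seen:
            self.first_seen = ts
        if self.last_seen is None or ts > self.last_seen:
            self.last_seen = ts


def to_pattern(src: str, dst: str, dport: int) -> str:
    # The source port is replaced by a wildcard, so every session
    # from src to dst:dport collapses into the same pattern.
    return f"{src}:* -> {dst}:{dport}"


def aggregate(entries: Iterable[Entry]) -> Dict[str, Meters]:
    """Replace individual entries with patterns plus their meters."""
    patterns: Dict[str, Meters] = {}
    for ts, src, _sport, dst, dport, nbytes in entries:
        key = to_pattern(src, dst, dport)
        patterns.setdefault(key, Meters()).update(ts, nbytes)
    return patterns


# Usage: three sessions collapse into two patterns.
entries = [
    (datetime(2007, 3, 29, 7, 0), "alice", 1234, "bob", 80, 500),
    (datetime(2007, 3, 29, 7, 5), "alice", 1301, "bob", 80, 700),
    (datetime(2007, 3, 29, 7, 9), "alice", 1422, "carol", 25, 300),
]
result = aggregate(entries)
for pattern, meters in result.items():
    print(pattern, meters.count, meters.total_bytes)
```

The dictionary keyed by pattern string is what keeps memory bounded by the number of distinct patterns rather than the number of log entries.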
The Gain is not only that you reduce the logs by a factor of a thousand (as
in this example; often much more in real-life situations), but also that the
number of patterns does not grow with the traffic the way the log data does.
Instead, the number of patterns converges to a fixed number, and only
increases with new kinds of traffic in the network. This means you can also
use this method to detect anomalies in your network environment: a new
pattern signals a new kind of traffic. Each pattern can also be studied and
measured individually, both because patterns are quite limited in total
number, and because each one in some way extracts the actual meaning of the
log entries ("alice surfs to bob").
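A minimal Python sketch of the anomaly-detection idea, assuming you keep a baseline set of patterns learned from earlier logs and flag any pattern that is not in it. The flow tuple format and the names `to_pattern` and `new_patterns` are hypothetical; a real setup would also have to decide when a new pattern is accepted into the baseline.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical flow record: (src, sport, dst, dport)
Flow = Tuple[str, int, str, int]


def to_pattern(flow: Flow) -> str:
    # Wildcard the source port, as in "alice:* -> bob:80".
    src, _sport, dst, dport = flow
    return f"{src}:* -> {dst}:{dport}"


def new_patterns(baseline: Set[str], flows: Iterable[Flow]) -> List[str]:
    """Return patterns absent from the baseline, in first-seen order."""
    seen: Set[str] = set()
    novel: List[str] = []
    for flow in flows:
        p = to_pattern(flow)
        if p not in baseline and p not in seen:
            seen.add(p)
            novel.append(p)
    return novel


baseline = {"alice:* -> bob:80", "alice:* -> carol:25"}
today = [
    ("alice", 40001, "bob", 80),
    ("alice", 40002, "bob", 80),
    ("mallory", 51515, "bob", 22),
]
print(new_patterns(baseline, today))  # ['mallory:* -> bob:22']
```

Because the baseline only grows with new kinds of traffic, the set stays small and the membership check is cheap no matter how many sessions pass through.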
The Tool to do this is of course your own choice. You can use regular
expressions in Perl scripts or similar, but for performance and scalability
I must recommend a tool named ASDIC from Ping Research, since I am the main
architect behind it. =) It will have no problem with a trillion (sic) log
entries, and processes about 50,000 entries per second. You can read more
about it and download it from http://info.ping.se. Unfortunately, ASDIC
must be installed on a dedicated server, and is not very quick to install,
but if Sun Microsystems gives us permission, we will start distributing it
as a VMware appliance with Solaris and ASDIC preinstalled. We have contacted
Sun about this, but have not received a response yet.
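For the regular-expression route, a Python equivalent of such a script might look like this. The log line format below is an assumption made up for this sketch (it is not the actual NetScreen or PIX format), and the expression only handles IPv4 addresses; a real script would need one expression per log format, and this sketch makes no claim about throughput.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical format: "... <src_ip>:<sport> -> <dst_ip>:<dport> ..."
LINE_RE = re.compile(
    r"(?P<src>\d{1,3}(?:\.\d{1,3}){3}):(?P<sport>\d+)"
    r" -> "
    r"(?P<dst>\d{1,3}(?:\.\d{1,3}){3}):(?P<dport>\d+)"
)


def patterns_from_lines(lines: Iterable[str]) -> Counter:
    """Count log lines per "src:* -> dst:dport" pattern."""
    counts: Counter = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip lines that do not match the assumed format
        counts[f"{m['src']}:* -> {m['dst']}:{m['dport']}"] += 1
    return counts


lines = [
    "2007-03-29 07:54:55 ACCEPT tcp 10.0.0.5:1234 -> 192.0.2.80:80",
    "2007-03-29 07:54:57 ACCEPT tcp 10.0.0.5:1240 -> 192.0.2.80:80",
    "2007-03-29 07:55:02 DROP udp 10.0.0.9:5353 -> 192.0.2.53:53",
    "garbage line",
]
counts = patterns_from_lines(lines)
print(counts)
```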
Although this posting might be quite technical, it is much simplified, and
does not address many of the issues with this approach to data mining
firewall logs and network traffic. Please see http://info.ping.se for more
detailed information, or feel free to send me an email.
Best Regards,
Mikael Kuisma, CTO (kuisma at ping.se)
Ping Research
On 3/28/07, Chetan Gupta <Chetan.Gupta at in.ey.com> wrote:
>
>
> Dear List Members,
>
> I am looking for opinions from the experts on a particular problem.
>
> How do we go about log analysis if we have tons (maybe trillions) of
> logs from, let's say, tcpdump (raw logs) or some firewall (like NetScreen or
> PIX)?
> What would be the best way to normalize and analyze these logs in the
> shortest possible time?
> Import them into a database? Use a commercial application like ArcSight?
> LogLogic? A simple text editor like EditPlus?
> Any suggestions/comments would be appreciated.
>
> Regards,
>
> Thanks and Regards,
> ERNST & YOUNG (r)
> Ernst & Young Pvt. Ltd
>
>
More information about the LogAnalysis mailing list