[logs] Analyzing tons of logs

Peter Sicilia pete at sensage.com
Thu Mar 29 13:05:01 PDT 2007


As a point of reference, in our 100B-record environment that Dan Barahona
mentioned, we were able to load 26TB worth of data at a sustained rate of
300K records per second. Also, rather than taking a data-bloat hit, as you
might expect with an RDBMS, we actually reduced the size of the data
and provided a fully redundant HA solution. We were then able to scan that
data at roughly 25M records per second (rps), specifically for "needle in
the haystack" types of queries.
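The figures above can be turned into a quick sanity check. The sketch below is only arithmetic on the quoted numbers, and it rests on assumptions the message does not state: that the 26TB corresponds to the 100B records, that every record is loaded at the sustained 300K rps rate, that a "needle in the haystack" query scans all records, and decimal units (1TB = 10^12 bytes).

```python
# Back-of-envelope arithmetic on the figures quoted above.
# Assumptions (not stated in the original message): the 26 TB is the full
# 100B-record data set, loading runs at the sustained rate throughout, and a
# query scans every record. Decimal units: 1 TB = 10**12 bytes.

RECORDS = 100e9      # 100 billion records
RAW_BYTES = 26e12    # 26 TB of data
LOAD_RPS = 300e3     # sustained load rate, records per second
SCAN_RPS = 25e6      # approximate scan rate, records per second


def summarize(records: float, raw_bytes: float, load_rps: float, scan_rps: float) -> dict:
    """Return average record size, total load time and full-scan time."""
    return {
        "avg_record_bytes": raw_bytes / records,
        "load_days": records / load_rps / 86_400,
        "full_scan_minutes": records / scan_rps / 60,
    }


if __name__ == "__main__":
    for name, value in summarize(RECORDS, RAW_BYTES, LOAD_RPS, SCAN_RPS).items():
        print(f"{name}: {value:.1f}")
```

Under those assumptions the numbers work out to roughly 260 bytes per record, a little under four days of continuous loading for all 100B records, and about 67 minutes for a full scan.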

This was all done using low-cost Dell servers and EMC Centera storage, so
we're not talking about millions of dollars in hardware and services either.
Of course, we could scale to whatever size problem you might have.

There are viable options available.

Cheers,
Pete

Peter Sicilia

Director of Technical Business Development
SenSage Inc.
http://www.sensage.com/ 

cell: +1 415-902-7908
email: pete at sensage.com

-----Original Message-----
From: loganalysis-bounces at loganalysis.org
[mailto:loganalysis-bounces at loganalysis.org] On Behalf Of Raffael Marty
Sent: Thursday, March 29, 2007 1:48 PM
To: Daniel Cid
Cc: dcid at ossec.net; loganalysis at loganalysis.org; Chetan Gupta
Subject: Re: [logs] Analyzing tons of logs

All nice and interesting. I agree with a couple of the postings. Anton
is spot on, and so are you. Chetan, have you imported a few terabytes of
data into ANY system? Have you done the math on hard drive access times,
network bandwidth, etc.? Good luck!
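Doing that math can be as simple as the sketch below. Every parameter is a hypothetical example, not a figure from this thread: one trillion records at an assumed 200 bytes each, a fully utilized 1 Gbit/s link, and a single disk reading sequentially at 100 MB/s. It ignores protocol overhead and seek (access) times, so real transfers would be slower.

```python
# Rough transfer-time estimate. All parameters are hypothetical examples.

def transfer_hours(total_bytes: float, bytes_per_second: float) -> float:
    """Hours needed to move total_bytes at a sustained bytes_per_second."""
    return total_bytes / bytes_per_second / 3600


RECORDS = 1e12            # "maybe in trillions" of log records
BYTES_PER_RECORD = 200    # assumed average raw record size
NETWORK_BPS = 1e9 / 8     # assumed 1 Gbit/s link, fully utilized, in bytes/s
DISK_BPS = 100e6          # assumed 100 MB/s sequential read, no seeks

total = RECORDS * BYTES_PER_RECORD  # 2e14 bytes, i.e. 200 TB

print(f"data volume: {total / 1e12:.0f} TB")
print(f"network copy: {transfer_hours(total, NETWORK_BPS):.0f} h")
print(f"single-disk sequential read: {transfer_hours(total, DISK_BPS):.0f} h")
```

With these numbers, 200TB needs on the order of 440 hours (about 18 days) just to cross the link, and over 550 hours to be read once from a single disk, before any parsing or analysis starts.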

Even assuming that you _can_ get the data into some system in acceptable
time, the next challenge is going to be the analysis. I am not sure
about your specific case, but I am assuming you don't know too much
about the data that you are analyzing. Or do you have a particular goal
in mind (e.g., looking for all the traffic from a certain user/IP)?

If you are trying to analyze the data and get a grip on what is going
on, you might want to try visualization (http://secviz.org). Take a
smaller chunk of your data and visualize it to see what the "big themes"
are. Once you have a handle on that, you can filter your data
down and visualize the filtered results to get yet another overview of
what you are dealing with. This can go on until you find what you need.

There are quite a few tools (e.g., afterglow.sourceforge.net) in the
open source space that can help you with all of this.

Good luck!

  -raffy

PS: Anton, this _is_ the place for a visualization discussion :)

> Hi Chetan,
> 
> For the amount of data that you want to analyze, I
> agree with Anton: there is no single solution
> (commercial or open source) that can handle that.
> They will all fail miserably. First of all, you
> will have huge network bandwidth usage, not to
> mention the disk space and CPU/memory power needed
> to analyze all of that (especially considering most
> tools use regex).
> 
> What I would suggest is some form of segmentation or
> partitioning of all this data. You can create one
> log analysis "station" for each department or each
> section of your company. This way you can perform
> your analysis based on the goals of each department.
> 
> For example, for your main servers inside the DMZ, you
> can set up a "DMZ" log station and monitor
> the logs from there. This way traffic doesn't need to
> leave each segment, and the memory/disk/CPU
> requirements stay manageable (oh, and it is scalable).
> 
> 
> Hope it helps...
> 
> --
> Daniel B. Cid
> dcid ( at ) ossec.net
> 
> 
> 
> 
> 
> 
> --- Chetan Gupta <Chetan.Gupta at in.ey.com> wrote:
> 
> > Dear List Members,
> > 
> > I am looking for opinions from the experts on a
> > particular problem.
> > 
> > How do we go about log analysis if we have tons
> > (maybe trillions) of
> > logs from, let's say, tcpdump (raw logs) or some
> > firewall (like NetScreen or
> > PIX)?
> > What would be the best way to normalize and analyze
> > these logs in the
> > shortest possible time?
> > Import them into a database? Use a commercial
> > application like ArcSight?
> > LogLogic? A simple text editor like EditPlus?
> > Any suggestions/comments would be appreciated.
> > 
> > Regards,
> > 
> > Thanks and Regards,
> > ERNST & YOUNG ®
> > Ernst & Young Pvt. Ltd
> > 
> > Chetan Gupta
> > Consultant
> > Risk and Business Solutions
> > FIDS
> > _______________________________________________________
> > 
> > 
> > Mobile:      +91 - 9810718489
> > Fax:          +91 - 11 - 2661 1012
> > URL:          http://www.ey.com/in
> > _______________________________________________________
> > _______________________________________________
> > LogAnalysis mailing list
> > LogAnalysis at loganalysis.org
> > http://www.loganalysis.org/mailman/listinfo/loganalysis
> 
> 

-- 

Raffael Marty, GCIA, CISSP                    raffael.marty at arcsight.com
Manager                                  Strategic Application Solutions
ArcSight, Inc.                                         +1 (408) 864 2662
Security Data Visualization:                           http://secviz.org
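Daniel's suggestion in the quoted message above, one log "station" per department or network segment, comes down to an assignment rule: which station owns which records. The sketch below illustrates only that rule. The segment names and subnets are hypothetical (the addresses are RFC 5737 documentation ranges), and it assumes log lines whose first whitespace-separated field is the source IP.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, List

# Hypothetical segment map: each network segment gets its own log "station".
SEGMENTS = {
    "dmz": ipaddress.ip_network("192.0.2.0/24"),
    "finance": ipaddress.ip_network("198.51.100.0/24"),
}


def station_for(src_ip: str, segments=SEGMENTS, default: str = "other") -> str:
    """Return the name of the station responsible for a record's source IP."""
    addr = ipaddress.ip_address(src_ip)  # raises ValueError for non-IP input
    for name, net in segments.items():
        if addr in net:
            return name
    return default


def partition(lines: List[str], segments=SEGMENTS) -> Dict[str, List[str]]:
    """Group hypothetical 'src dst ...' log lines by the station that owns them."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        try:
            name = station_for(fields[0], segments)
        except ValueError:
            name = "unparsed"  # first field was not an IP address
        buckets[name].append(line)
    return dict(buckets)


if __name__ == "__main__":
    logs = [
        "192.0.2.10 203.0.113.5 80",
        "198.51.100.7 192.0.2.10 443",
        "203.0.113.9 192.0.2.10 22",
    ]
    for station, records in partition(logs).items():
        print(station, len(records))
```

In the setup Daniel describes, devices in each segment would send their logs straight to the local station, so raw records never leave the segment; a function like `station_for` is just a way to state, and test, which segment each source belongs to.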
