[logs] Analyzing tons of logs

Raffael Marty rmarty at arcsight.com
Thu Mar 29 11:35:05 PDT 2007


Be careful what you are looking at. There are two cases: 

1. You are collecting the events in real time. In that case you won't have much of a problem, since the events trickle in over time.
2. You have to load all that data at once. Good luck. Do the math: 10^11 events / 10^6 eps = 10^5 s, which is roughly 28 hours. That's quite a lot of seconds!
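The back-of-the-envelope arithmetic in case 2 can be checked with a few lines of code. This is a minimal sketch: the 10^6 events-per-second rate is the figure from the example above, not a measured benchmark, and `bulk_load_seconds` is a hypothetical helper. It also shows the 10^12 ("trillions") case from the original question.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400

def bulk_load_seconds(events: int, events_per_second: int) -> float:
    # Assumes a constant sustained ingest rate; real loaders slow down
    # with parsing, indexing and disk contention.
    return events / events_per_second

RATE = 10**6  # events per second, the rate used in the example above

# 10^11 is the example above; 10^12 matches "trillions" in the question.
for events in (10**11, 10**12):
    s = bulk_load_seconds(events, RATE)
    print(f"{events:.0e} events: {s:.0f} s = "
          f"{s / SECONDS_PER_HOUR:.1f} h = {s / SECONDS_PER_DAY:.1f} days")
```

At that rate, 10^11 events take a little over a day to load and 10^12 events take more than eleven days.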

  -raffy

-----Original Message-----
From: loganalysis-bounces at loganalysis.org on behalf of Dan Barahona
Sent: Thu 3/29/2007 10:40 AM
To: Daniel Cid
Cc: dcid at ossec.net; loganalysis at loganalysis.org; Chetan Gupta
Subject: Re: [logs] Analyzing tons of logs
 
I agree with all of the posters about the challenge of taking in that volume of data and doing anything meaningful with it. I don't agree that it's an impossible challenge, though.

<disclaimer>I work for SenSage - a vendor in this space</disclaimer>

I say it's not impossible because we have customers today who collect massive volumes of tcpdump data, store long histories of it, and are able to analyze, re-sessionize, and search that data. As for scalability, you can read about a recent 100-billion-record dataset we created for call record analysis: http://www.net-security.org/secworld.php?id=4251
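Re-sessionizing means grouping individual packet records back into the conversations they belong to. A common approach, sketched here under stated assumptions and not as a description of how SenSage does it, keys each packet on its 5-tuple (protocol, addresses, ports). The two endpoints are put in a canonical order so that both directions of a conversation land in the same session. The `Packet` record and its fields are hypothetical; raw tcpdump output would first have to be parsed into this form.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    # Hypothetical pre-parsed packet record; field names are illustrative.
    ts: float
    proto: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    length: int

def session_key(p: Packet) -> tuple:
    # Sort the two endpoints so A->B and B->A produce the same key.
    a = (p.src_ip, p.src_port)
    b = (p.dst_ip, p.dst_port)
    lo, hi = sorted((a, b))
    return (p.proto, lo, hi)

def sessionize(packets):
    """Group packets into bidirectional sessions keyed by a canonical 5-tuple."""
    sessions = defaultdict(list)
    for p in packets:
        sessions[session_key(p)].append(p)
    return dict(sessions)

pkts = [
    Packet(0.0, "tcp", "10.0.0.5", 51000, "192.0.2.10", 80, 60),
    Packet(0.1, "tcp", "192.0.2.10", 80, "10.0.0.5", 51000, 1500),
    Packet(0.2, "tcp", "10.0.0.6", 51001, "192.0.2.10", 80, 60),
]
print(len(sessionize(pkts)))  # 2: the first two packets are one conversation
```

This sketch never splits a session, so a 5-tuple reused hours later would be merged with the earlier conversation. Production sessionizers usually also close sessions on an idle timeout or on TCP FIN/RST.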

Sorry for the blatant vendor plug, but the point is that new technologies exist that are highly optimized for analyzing this type of data.

Best regards,

Dan

Dan Barahona
Vice President, Emerging Markets
SenSage, Inc.
dan.barahona at sensage.com
415.808.5911 (w)
415.505.3007 (m)

Daniel Cid wrote: 

	Hi Chetan,
	
	For the amount of data that you want to analyze, I
	agree with Anton: there is no single solution
	(commercial or open source) that can handle that.
	They will all fail miserably. First of all, you
	will have huge network bandwidth usage, not to
	mention the disk space and CPU/memory power needed
	to analyze all of that (especially considering that
	most tools use regexes).
	
	What I would suggest is some form of segmentation or
	partition of all this data. You can create one
	log analysis "station" for each department or each
	section of your company. This way you can perform
	your analysis based on the goals of each department.
	
	For example, for the main servers inside your DMZ, you
	can set up a "DMZ" log station and monitor their logs
	from there. This way traffic doesn't need to leave
	each segment, and the memory/disk/CPU requirements
	stay easily manageable (oh, and it is scalable).
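The segmentation idea can be sketched as a plan that assigns each source network to the log station responsible for it, so events are analyzed in the segment where they originate. This is a minimal sketch under assumptions. The subnets, the station names and the fallback `central-station` are all hypothetical. In practice such a plan would more likely be expressed in each segment's syslog forwarding configuration than applied by one central script, since a central script would pull the traffic back across segments.

```python
import ipaddress

# Hypothetical segment plan: each network gets its own log analysis station.
# Networks should not overlap; the first match in insertion order wins.
SEGMENTS = {
    ipaddress.ip_network("192.168.10.0/24"): "dmz-station",
    ipaddress.ip_network("10.1.0.0/16"): "finance-station",
    ipaddress.ip_network("10.2.0.0/16"): "engineering-station",
}
DEFAULT_STATION = "central-station"  # assumption: fallback for unplanned sources

def station_for(source_ip: str) -> str:
    """Return the station that should collect logs from source_ip."""
    addr = ipaddress.ip_address(source_ip)
    for net, station in SEGMENTS.items():
        if addr in net:
            return station
    return DEFAULT_STATION

def partition(events):
    """Split (source_ip, message) pairs into per-station buckets."""
    buckets = {}
    for source_ip, message in events:
        buckets.setdefault(station_for(source_ip), []).append(message)
    return buckets

print(station_for("192.168.10.25"))  # dmz-station
print(station_for("172.16.0.1"))     # central-station
```

Each station then only has to size its memory, disk and CPU for its own segment's event rate.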
	
	
	Hope it helps...
	
	--
	Daniel B. Cid
	dcid ( at ) ossec.net
	
	--- Chetan Gupta <Chetan.Gupta at in.ey.com> wrote:
	
		Dear List Members,
		
		I am looking for the experts' opinions on a
		particular problem.
		
		How do we go about log analysis if we have tons
		(maybe trillions) of logs from, let's say, tcpdump
		(raw logs) or some firewall (like NetScreen or
		PIX)?
		What would be the best way to normalize and analyze
		these logs in the shortest possible time?
		Import them into a database? Use a commercial
		application like ArcSight or LogLogic? A simple
		text editor like EditPlus?
		Any suggestions/comments would be appreciated.
		
		Regards,
		
		Thanks and Regards,
		ERNST & YOUNG ®
		Ernst & Young Pvt. Ltd
		
		Chetan Gupta
		Consultant
		Risk and Business Solutions
		FIDS
		Mobile:  +91 - 9810718489
		Fax:     +91 - 11 - 2661 1012
		URL:     http://www.ey.com/in
	_______________________________________________
	LogAnalysis mailing list
	LogAnalysis at loganalysis.org
	http://www.loganalysis.org/mailman/listinfo/loganalysis
	
	
