[logs] regexless parsing, again?

Daniel Cid danielcid at yahoo.com.br
Mon Sep 17 12:30:32 PDT 2007


Hi Anton,

Wow, you know how to start a good discussion in here
:)

Anyway, after reading some of the replies, I had to
jump in and share some of my thoughts...

First of all, I think most projects do log analysis
wrong. They confuse log decoding with rule matching
and end up with hundreds of regexes that are checked
against every log. Regexes can be used to extract
specific fields from a log, but they should not be the
main method of doing the analysis...

How do I do it with ossec? Well, first, I divide the
process into two stages: log decoding, and then
"classification" or rule matching. Second, instead of
running thousands of regexes against every log, I build
a decoding/rule tree that limits the number of checks
per log.
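To make the decoding stage concrete, here is a minimal sketch in Python of splitting a log line into named fields with plain string operations and no regexes. It assumes a BSD-syslog-style layout ("Mon DD HH:MM:SS host program[pid]: message"); the `decode_syslog` function and `DecodedLog` type are hypothetical names for illustration, not taken from ossec's source.

```python
from typing import NamedTuple, Optional


class DecodedLog(NamedTuple):
    # Hypothetical field set for a BSD-syslog-style line.
    timestamp: str
    hostname: str
    program_name: str
    pid: Optional[str]
    message: str


def decode_syslog(line: str) -> Optional[DecodedLog]:
    """Decode "Mon DD HH:MM:SS host program[pid]: message" without regexes.

    Returns None when the line does not fit that (assumed) layout.
    """
    # The timestamp is fixed width: 15 characters, e.g. "Sep 17 12:30:32".
    if len(line) < 16 or line[15] != " ":
        return None
    timestamp, rest = line[:15], line[16:]
    # The hostname runs up to the next space.
    hostname, sep, rest = rest.partition(" ")
    if not sep:
        return None
    # The tag ("program[pid]" or just "program") ends at the first ": ".
    tag, sep, message = rest.partition(": ")
    if not sep or not tag:
        return None
    pid = None
    if tag.endswith("]") and "[" in tag:
        program_name, _, pid = tag[:-1].partition("[")
    else:
        program_name = tag
    return DecodedLog(timestamp, hostname, program_name, pid, message)


decoded = decode_syslog(
    "Sep 17 12:30:32 myhost sshd[1234]: Failed password for root from 10.0.0.1 port 22 ssh2")
print(decoded.program_name)  # sshd
```

The decoding is done once per line, so everything downstream can work on `program_name` and `message` instead of rescanning the raw text.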

For example, when an sshd message arrives, ossec
first looks for the program_name sshd, and only if it
matches does the log go to the sshd rules. The same
goes for all the other message formats: after the
decoding is done, the log goes only to the rules
specific to that decoder (sshd, proftpd, etc.). In the
end, we have more than 500 rules, but on average only
8 or 10 are checked for each log...
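A minimal sketch of the decoding/rule tree idea in Python, assuming the log has already been decoded into a program name and a message. Rules are grouped under the decoder they belong to, and child rules are tried only after their parent matches; plain substring tests stand in for "fast word matching". The rule IDs, descriptions, and match strings are hypothetical, and the traversal policy (follow the first matching branch) is an assumption, not ossec's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Rule:
    rule_id: int
    description: str
    # Cheap predicate on the decoded message (plain substring tests here).
    match: Callable[[str], bool]
    # More specific rules, tried only after this rule has matched.
    children: List["Rule"] = field(default_factory=list)


def classify(program_name: str, message: str,
             tree: Dict[str, List[Rule]]) -> Tuple[Optional[Rule], int]:
    """Return the most specific matching rule and how many checks it took."""
    checks = 0
    # Only the rules registered for this decoder are candidates at all.
    candidates = tree.get(program_name, [])
    best: Optional[Rule] = None
    while candidates:
        next_level: List[Rule] = []
        for rule in candidates:
            checks += 1
            if rule.match(message):
                best = rule
                next_level = rule.children
                break  # follow the first matching branch only
        candidates = next_level
    return best, checks


# Hypothetical rule tree: IDs, descriptions and match strings are made up.
RULES: Dict[str, List[Rule]] = {
    "sshd": [
        Rule(100, "sshd message", lambda m: True, children=[
            Rule(101, "authentication failed",
                 lambda m: "Failed password" in m,
                 children=[Rule(103, "failed login as root",
                                lambda m: "for root " in m)]),
            Rule(102, "authentication success",
                 lambda m: m.startswith("Accepted")),
        ]),
    ],
    "proftpd": [
        Rule(200, "proftpd message", lambda m: True, children=[
            Rule(201, "FTP login failed", lambda m: "Login failed" in m),
        ]),
    ],
}

rule, checks = classify(
    "sshd", "Failed password for root from 10.0.0.1 port 22 ssh2", RULES)
print(rule.rule_id, checks)  # 103 3
```

With this layout the per-log cost depends on the one branch that is walked, not on the total number of rules in the tree; a program with no registered rules costs zero checks.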

What am I trying to say with all this? Tools that depend
only on regexes are slow and hard to maintain. But if you
combine regexes with fast word matching, proper decoding,
tree-based searching, etc., you can get somewhere :)
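One way to combine the two, sketched in Python: a cheap substring test decides whether a message is relevant, and a regex runs only afterwards to extract a field. The function name, the regex, and the sample message format (modeled on typical OpenSSH "Failed password" lines) are assumptions for illustration.

```python
import re
from typing import Optional

# Compiled once; only ever run on messages that already passed the word check.
_SRC_IP = re.compile(r"from (\d{1,3}(?:\.\d{1,3}){3}) port \d+")


def failed_login_source(message: str) -> Optional[str]:
    """Return the source IP of a (hypothetical) sshd failed-password message."""
    # Fast path: a plain substring test rejects most messages without
    # touching the regex engine.
    if "Failed password" not in message:
        return None
    # Slow path: the regex only extracts a field; it does not classify.
    m = _SRC_IP.search(message)
    return m.group(1) if m else None


print(failed_login_source("Failed password for root from 10.0.0.1 port 22 ssh2"))  # 10.0.0.1
```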

*Btw, I am the author/developer of ossec.

Thanks,

--
Daniel B. Cid
dcid ( at ) ossec.net


--- Anton Chuvakin <anton at chuvakin.org> wrote:

> All,
> 
> I think it is a good time to revisit this fun subject that we
> _revisited_ back in 2005: regexless log message processing. (e.g. see
> my post "regex-less parsing of messages" and the prolonged discussion
> that followed here:
> http://lists.jammed.com/loganalysis/2005/12/index.html)
> 
> So, has the world changed since that glorious time? :-) I think it
> did, but only a little. We do have a lot more weird logs to analyze,
> log indexing got much better (but the quality and presentation of
> parsed data still beats the indexed data) and more people want to do
> the log management right (there is also this compliance thing, but I
> digress..)
> 
> Anybody care to restart the discussion and see what the collective
> wisdom of loganalysis can produce?
> 
> As a semi-humorous warning, please don't suggest the following - we've
> seen these before:
> 
> - wait until all logs are in a common XML schema (we know how this one
>   ends: MJR emerges out of the darkest part of the woods and kicks
>   everybody's ass :-))
> - use our award-losing UI to "easily" create the regexes
> - be happy with keyword searching
> - just write the darn regexes
>   (also see http://lists.jammed.com/loganalysis/2005/12/0025.html)
> 
> Ready, set, GO!!!
> 
> Best,
> -- 
> Anton Chuvakin, Ph.D., GCIA, GCIH, GCFA
> http://www.chuvakin.org
> http://chuvakin.blogspot.com
> http://www.info-secure.org
> _______________________________________________
> LogAnalysis mailing list
> LogAnalysis at loganalysis.org
> http://www.loganalysis.org/mailman/listinfo/loganalysis
> 




