[logs] regexless parsing, again?

David Corlette dcorlette at novell.com
Fri Sep 14 12:59:45 PDT 2007


Hmmm...

Although I tend to agree that logs are nasty, and that regexes can be a pain, I'm not sure it makes sense to try a completely new solution.  In our product we actually use several different types of parsers depending on the format of the input.  For example, WMI messages, database rows, and some syslog rows can often be read with our name-value pair parser, even in cases where a regex would be impractical because the order of the fields is not guaranteed.  In other cases we use a "split"-like operator for CSV and similar formats.  But there are also plenty of cases where we use regexes, because that seems the most appropriate tool.
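To make the distinction concrete, here is a minimal sketch of the two non-regex styles mentioned above.  It is not our product's parser; the function names parse_name_value and parse_split are hypothetical, and it assumes name=value tokens separated by whitespace with optional double-quoted values.  The point is that the name-value parser produces the same result no matter what order the fields arrive in, which is exactly where a single fixed regex gets awkward.

```python
import csv
import io
import shlex


def parse_name_value(line, kv_sep="="):
    """Hypothetical order-independent name-value pair parser.

    Splits the line on whitespace (shlex keeps double-quoted values
    together), then splits each token on the first kv_sep.
    Tokens without a separator are ignored.
    """
    fields = {}
    for token in shlex.split(line):
        if kv_sep in token:
            key, value = token.split(kv_sep, 1)
            fields[key] = value
    return fields


def parse_split(line, delimiter=","):
    """Hypothetical split-style parser for CSV-like rows.

    Uses the csv module so that quoted fields may contain the delimiter.
    """
    return next(csv.reader(io.StringIO(line), delimiter=delimiter))


if __name__ == "__main__":
    a = parse_name_value('user=bob src=10.0.0.1 msg="login failed"')
    b = parse_name_value('msg="login failed" src=10.0.0.1 user=bob')
    print(a == b)  # same fields, different order
    print(parse_split('2007-09-14,bob,"login, failed"'))
```

Swapping the field order changes nothing in the parsed result, and the quoted comma inside the CSV row stays in its field rather than being treated as a separator.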

That said, it would make sense to extend the language we use with a parser such as the one you describe.  I guess what I'm saying, though, is that I'm not sure I'd ditch regexes or other parsers in favor of a single other parser.  Why not let the author pick the parser that's most appropriate for the message format?  Or perhaps there are situations where regexes are appropriate for some subsection of a message?
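A rough sketch of what "let the author pick" could look like: a per-source table that maps each log source to whichever parser suits it, plus a wrapper that runs a regex over just one field produced by another parser (the "subsection of a message" case).  Everything here is illustrative; the source names, field names, and helpers such as with_subfield_regex are hypothetical, not an existing API.

```python
import csv
import io
import re
import shlex


def kv_parser(line):
    # Order-independent name=value pairs; shlex keeps quoted values together.
    return dict(tok.split("=", 1) for tok in shlex.split(line) if "=" in tok)


def csv_parser(columns):
    # Returns a parser that maps CSV fields onto the given column names.
    def parse(line):
        row = next(csv.reader(io.StringIO(line)))
        return dict(zip(columns, row))
    return parse


def regex_parser(pattern):
    # Returns a parser whose field names come from the regex's named groups.
    compiled = re.compile(pattern)

    def parse(line):
        m = compiled.search(line)
        return m.groupdict() if m else {}
    return parse


def with_subfield_regex(parser, field, pattern):
    # Runs a regex over a single field produced by another parser and
    # merges any named groups it captures into the result.
    compiled = re.compile(pattern)

    def parse(line):
        fields = parser(line)
        m = compiled.search(fields.get(field, ""))
        if m:
            fields.update(m.groupdict())
        return fields
    return parse


# Hypothetical per-source configuration: the rule author chooses the parser.
PARSERS = {
    "firewall_kv": with_subfield_regex(kv_parser, "msg", r"user (?P<user>\w+)"),
    "db_export": csv_parser(["time", "user", "action"]),
    "sshd": regex_parser(r"Failed password for (?P<user>\S+) from (?P<src>\S+)"),
}


def parse(source, line):
    return PARSERS[source](line)


if __name__ == "__main__":
    print(parse("firewall_kv", 'action=deny msg="login failed for user bob"'))
    print(parse("db_export", "2007-09-14 12:45,bob,login"))
    print(parse("sshd", "Failed password for root from 10.1.2.3 port 22 ssh2"))
```

The design choice is that no single parser has to cover every format: the name-value and CSV parsers handle structure, and a regex is used only where free text actually needs it.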

>>> On Fri, Sep 14, 2007 at 12:45 PM, in message
<6.2.0.14.2.20070914122810.02e90960 at ranum.com>, "Marcus J. Ranum"
<mjr at ranum.com> wrote: 
> Mordechai T. Abzug wrote:
>>Is the intent to eliminate the need for manual configuration, or just
>>to exchange manual regex configuration for a manual lex+yacc-style
>>setup?
> 
> A couple years ago Abe and I burned a bunch of cycles (literally!
> It's cool to have friends with supercomputers...) trying a couple
> of approaches we came up with to auto-generate log parsing
> rules. It turned out (pretty much what we expected) that it's a
> really hard problem. It also taught me that logs are nastier than
> I thought they were - which was a real eye-opener because I
> already knew logs are pretty nasty.
