[logs] regexless parsing, again?

Marcus J. Ranum mjr at ranum.com
Sat Sep 15 13:40:19 PDT 2007


Tom Le wrote:
>No you don't.  You only have to try the # of times until you match.

If you do that, then you have no way of detecting multiple matches
on a single input line. Which raises the question of "how do you know
which one is right?" If you don't, then you have to worry about the
ordering of your match rules and that's absolute in(s)anity.
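
To make the trade-off concrete, here is a minimal sketch in Python. The rule
names, patterns, and sample log line are all invented for illustration and do
not come from any real product. It shows that stopping at the first match
makes the answer depend on rule order, while detecting the overlap means
trying every rule.

```python
import re

# Hypothetical rules, made up for this example.
RULES = [
    ("ssh_failed_login", re.compile(r"Failed password for (\S+)")),
    ("any_failure", re.compile(r"Failed \w+ for ")),
]

def first_match(line, rules):
    """Stop at the first rule that matches; return its name or None."""
    for name, rx in rules:
        if rx.search(line):
            return name
    return None

def all_matches(line, rules):
    """Try every rule; return the names of all rules that match."""
    return [name for name, rx in rules if rx.search(line)]

line = "sshd[411]: Failed password for root from 10.0.0.5 port 22 ssh2"

in_order = first_match(line, RULES)                   # depends on rule order
reversed_order = first_match(line, list(reversed(RULES)))
overlap = all_matches(line, RULES)                    # exposes the ambiguity
```

With these two rules the same line is classified differently depending on
which rule comes first, and only the try-everything version reveals that both
rules match.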

>Even if you get no match you don't have to try them all.  You can use preparsers to limit the # of regex rules to match against.

I said that earlier. Under the category of "lipstick on a pig."
You can definitely do clever work-arounds to make regexps
less awful. But that doesn't mean that they're not at least
a little bit awful.
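
For readers who have not seen the preparser work-around, a rough sketch
follows. It assumes syslog-style lines of the form "prog[pid]: message"; the
rule table, rule names, and patterns are hypothetical. A cheap string split
picks the bucket, and only that bucket's regexes are tried.

```python
import re

# Hypothetical rule table keyed by program name; all entries are made up.
RULES_BY_PROGRAM = {
    "sshd": [
        ("failed_login", re.compile(r"Failed password for (\S+)")),
        ("accepted_login", re.compile(r"Accepted \w+ for (\S+)")),
    ],
    "sudo": [
        ("sudo_command", re.compile(r"COMMAND=(.+)$")),
    ],
}

def preparse(line):
    """Split 'prog[pid]: message' without a regex; return (program, message)."""
    head, sep, message = line.partition(": ")
    if not sep:
        return None, line          # no 'prog: ' prefix found
    program = head.split("[", 1)[0]
    return program, message

def classify(line):
    """Return the names of matching rules, trying only the program's bucket."""
    program, message = preparse(line)
    candidates = RULES_BY_PROGRAM.get(program, [])
    return [name for name, rx in candidates if rx.search(message)]
```

This cuts the number of attempts per line, but within a bucket every rule is
still tried, so the ordering and ambiguity questions do not go away.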

>This sounds like you are looking at the more traditional use of regexes, where one regex = one log event.  You can use regexes to match just parts of a log message (or data stream or whatever).  By matching parts within a log message, you can use vector, hierarchical, or state-space approaches to build a faster "matching engine" while still using regexes.

If you are so fortunate as to have log messages that are matchable
in that way, yes. Snort log messages, for example, can be almost all
matched by 3 or 4 regexes. But try HP printers. :)  Just getting the
quoting rules for HP printer log messages is enough to make a
rational person think seriously about putting a gun to their head.

>Another example: use regex to define your "matching rules" and then convert from regex to DFA at implementation.

Yes; that's "industrial strength lipstick."

I'm trying to understand your point. If I can summarize it, it appears
to be: "No. Marcus, you're wrong. Regexes CAN still be used even
though using them is awkward, brain-damaging, and ugly. If you are
stubborn enough, there is no need to try to do better."

I'm guessing you must love Microsoft Windows, too.

mjr. 


More information about the LogAnalysis mailing list