[logs] regexless parsing, again?

Tom Le dottom at gmail.com
Mon Sep 24 16:24:30 PDT 2007


> BTW, here is a patent for log management, which (among other things)
> "explains" how to "parse" unknown logs, apparently with no manually
> written regexes in sight...

http://www.freshpatents.com/System-and-method-for-analysis-and-management-of-logs-and-events-dt20060817ptan20060184529.php?type=description

> "[0031] Another preferred embodiment of the present invention
> describes a method for parsing log data with undefined grammar. The
> method comprises the following steps: a) storing more than one pattern
> object record of different grammar types, b) receiving at least a
> portion of raw log data input from at least one computerized system,
> c) identifying the delimiter of the portion of raw log data's grammar,
> d) using the delimiter for generating a new pattern object
> representing the grammar type of the log data, the new pattern object
> comprising a list of terms, and e) storing the new pattern object. "

Sounds like a standard tokenization methodology.  Other network vendors have
implemented similar methods using dynamic token dictionaries built from byte
streams.  The same approach can be applied to log messages.  You identify log
messages not by a regex but by the token values within the message, the
"grammar" of the tokens, etc.  You can have dictionary tokens, grammar
tokens, tokens-of-tokens, etc.
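
To make that concrete, here is a rough Python sketch of the idea, not
anything taken from the patent or from a vendor product.  It assumes
whitespace is the delimiter, treats any token seen in at least min_count
lines as a "dictionary" token, and maps everything else to a coarse
"grammar" class.  The names (build_dictionary, token_class, signature), the
class labels, and the sample lines are all made up for illustration.

```python
import ipaddress
from collections import Counter, defaultdict


def build_dictionary(lines, min_count=2):
    """Collect tokens that occur in at least min_count lines (hypothetical threshold)."""
    counts = Counter()
    for line in lines:
        counts.update(set(line.split()))  # count each token once per line
    return {tok for tok, n in counts.items() if n >= min_count}


def token_class(tok, dictionary):
    """Map a token to a grammar class or keep it as a literal, with no regexes."""
    if tok.isdigit():
        return "<NUM>"
    try:
        ipaddress.ip_address(tok)
        return "<IP>"
    except ValueError:
        pass
    if tok in dictionary:
        return tok  # dictionary token: kept as a literal
    return "<WORD>"  # unknown token: reduced to its class


def signature(line, dictionary):
    """The token "grammar" of a line, used to identify its message type."""
    return tuple(token_class(t, dictionary) for t in line.split())


def group_by_signature(lines, dictionary):
    """Bucket lines that share the same token signature."""
    groups = defaultdict(list)
    for line in lines:
        groups[signature(line, dictionary)].append(line)
    return dict(groups)


# Invented sample data, only for illustration.
sample = [
    "Accepted password for alice from 10.0.0.5 port 2201",
    "Accepted password for bob from 10.0.0.9 port 4410",
    "Connection closed by 192.168.1.7",
]
dictionary = build_dictionary(sample)
groups = group_by_signature(sample, dictionary)
```

With this sample the two "Accepted password" lines end up under one
signature, since the user name, address and port are reduced to classes,
and the "Connection closed" line gets its own.  A real implementation would
also have to guess the delimiter and update the dictionary as new lines
arrive, which this sketch does not attempt.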


More information about the LogAnalysis mailing list