[logs] open source artificial ignorance-like systems

Marcus J. Ranum mjr at ranum.com
Tue Apr 17 14:35:24 PDT 2007


Chris Buechler wrote:
>What I've spent a lot of time looking for, with no success, is an open source system that will implement something similar but smarter than a simple grep from cron.

There's a thingie on my website called "retail" which was designed
to feed data in chunks into the Fargo log processor. Retail is here:
http://www.ranum.com/security/computer_security/code/index.html

Fargo was an engine I wrote in 2001 that was way ahead of everything
else I've ever done. Unfortunately, the source code was lost in a
tragic series of mistakes. Some of the ideas are encapsulated in the
manual which I wrote for it, here:
http://www.ranum.com/security/computer_security/archives/fargo-tutorial.pdf
I'm still too demoralized about what happened to try to rewrite it - I
get angry every time I think about it. :(

Since then I have done a few small experiments with better
ways to process truly ginormous amounts of log data at
ridiculous speed. I realized a few years ago that a wrapper around
a set of inputs into lex(1) would result in an amazingly fast
optimal parse-tree specification for selecting log data. The
wrapper would have to manage a work-flow based on a
white/black/grey-list model with a simple feedback loop. I coded
a small piece of such a beastie for Ron Dilley a couple of years
ago, and the proof of concept ripped through logs as fast as I
could throw data at it off the hard drive. So the approach
works - basically you're building nested calls to scanf(...)
to implement a tree-structured matching engine. Coding
such a thing manually is "quite a drag" but someone could
write a pretty user interface to manage the process and
handle recompilation, etc.
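
To make the nested-scanf idea concrete, here is a minimal hand-written
sketch in C of what one corner of such a tree might look like, combined
with white/black/grey-list verdicts and a crude feedback hook. It assumes
syslog lines with the timestamp and hostname already stripped. Every name
in it (classify_line, match_sshd, verdict_t, feed_line) is hypothetical.
It is not code from Fargo or from the proof of concept mentioned above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* White = known-boring (ignore), black = known-bad (alert),
   grey = never seen before (send to a human). Names are hypothetical. */
typedef enum { VERDICT_GREY = 0, VERDICT_WHITE, VERDICT_BLACK } verdict_t;

/* Second level of the tree: only reached once the program name is "sshd".
   Each sscanf() call is one branch. %n records how much input was consumed,
   so a branch only wins if it matched the entire message. */
static verdict_t match_sshd(const char *msg)
{
    char user[64], addr[64];
    unsigned port;
    int n = 0;

    if (sscanf(msg, "Accepted password for %63s from %63s port %u ssh2%n",
               user, addr, &port, &n) == 3 && msg[n] == '\0')
        return VERDICT_WHITE;
    n = 0;
    if (sscanf(msg, "Failed password for %63s from %63s port %u ssh2%n",
               user, addr, &port, &n) == 3 && msg[n] == '\0')
        return VERDICT_BLACK;
    return VERDICT_GREY;    /* sshd said something no rule covers yet */
}

/* Root of the tree: split "prog[pid]: message" once, then branch on prog.
   Assumes the timestamp and hostname have already been stripped. */
verdict_t classify_line(const char *line)
{
    char prog[32];
    unsigned pid;
    int off = 0;

    assert(line != NULL);
    if (sscanf(line, "%31[^[][%u]: %n", prog, &pid, &off) != 2 || off == 0)
        return VERDICT_GREY;                 /* not even syslog-shaped */
    if (strcmp(prog, "sshd") == 0)
        return match_sshd(line + off);
    if (strcmp(prog, "CRON") == 0)
        return VERDICT_WHITE;                /* whole subtree whitelisted */
    return VERDICT_GREY;
}

/* The feedback loop: count verdicts and print grey lines for review.
   An analyst turns each one into a new white or black branch, then the
   matcher is regenerated and recompiled. */
typedef struct { unsigned white, black, grey; } tally_t;

void feed_line(tally_t *t, const char *line)
{
    switch (classify_line(line)) {
    case VERDICT_WHITE: t->white++; break;
    case VERDICT_BLACK: t->black++; printf("ALERT:  %s\n", line); break;
    case VERDICT_GREY:  t->grey++;  printf("REVIEW: %s\n", line); break;
    }
}
```

With these rules, an sshd "Failed password for root ..." line comes back
black, while "Failed password for invalid user bob ..." comes back grey
because no branch matches it. That is exactly the kind of line the
feedback loop is meant to put in front of a human.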

By the way - that's the "trick" for handling really large amounts
of log data really fast. Don't write crap in a scripting language
that parses against a list of regexps. Write a specification
language that outputs a nested tree of C function calls,
which get fed into a compiler to produce a direct executable.
It's such a simple and obvious idea... It doesn't productize;
I think that's why nobody's done it. :(
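
Here is a minimal sketch in C of the generator side. It rests on several
assumptions. The "specification language" is faked with a C array of
rules, each holding a program name, a scanf format with every conversion
suppressed, and a verdict. Program names must be valid C identifiers, and
formats must not contain quote or backslash characters. The generator
emits a two-level tree of C functions as source text ready for a compiler:
a root dispatches on the program name, and each program's function tries
its formats in turn. All names (rule_t, generate_matcher_source, classify,
match_<prog>) are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a rule specification. %* suppresses assignment,
   so the generated code needs no per-rule variables. */
typedef struct {
    const char *prog;      /* first level of the tree */
    const char *format;    /* second level: one sscanf() branch */
    const char *verdict;   /* WHITE or BLACK */
} rule_t;

static const rule_t rules[] = {
    { "sshd", "Accepted password for %*s from %*s port %*u ssh2", "WHITE" },
    { "sshd", "Failed password for %*s from %*s port %*u ssh2",   "BLACK" },
    { "CRON", "(root) CMD (%*[^)])",                              "WHITE" },
};
static const size_t nrules = sizeof rules / sizeof rules[0];

static char out[4096];
static size_t used;

/* printf-style append to the output buffer; refuses to advance on overflow. */
static void emit(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int w = vsnprintf(out + used, sizeof out - used, fmt, ap);
    va_end(ap);
    assert(w >= 0 && (size_t)w < sizeof out - used);   /* no truncation */
    if (w > 0 && (size_t)w < sizeof out - used)
        used += (size_t)w;
}

/* True if rules[i] is the first rule mentioning its program. */
static int first_for_prog(size_t i)
{
    for (size_t j = 0; j < i; j++)
        if (strcmp(rules[j].prog, rules[i].prog) == 0)
            return 0;
    return 1;
}

/* Returns C source for the matcher; write it to a .c file and compile it. */
const char *generate_matcher_source(void)
{
    used = 0;
    out[0] = '\0';
    emit("#include <stdio.h>\n#include <string.h>\n\n");
    emit("enum { GREY, WHITE, BLACK };\n\n");

    /* One function per program, holding that program's branches in order. */
    for (size_t i = 0; i < nrules; i++) {
        if (!first_for_prog(i))
            continue;
        emit("static int match_%s(const char *msg)\n{\n    int n;\n", rules[i].prog);
        for (size_t k = i; k < nrules; k++) {
            if (strcmp(rules[k].prog, rules[i].prog) != 0)
                continue;
            /* %%n becomes %n in the generated sscanf() format. */
            emit("    n = -1; sscanf(msg, \"%s%%n\", &n);\n", rules[k].format);
            emit("    if (n >= 0 && msg[n] == '\\0') return %s;\n", rules[k].verdict);
        }
        emit("    return GREY;\n}\n\n");
    }

    /* Root of the tree: dispatch on the program name. */
    emit("int classify(const char *prog, const char *msg)\n{\n");
    for (size_t i = 0; i < nrules; i++)
        if (first_for_prog(i))
            emit("    if (strcmp(prog, \"%s\") == 0) return match_%s(msg);\n",
                 rules[i].prog, rules[i].prog);
    emit("    return GREY;\n}\n");
    return out;
}
```

Running fputs(generate_matcher_source(), stdout) prints a small C file.
Compiling that file gives a matcher with no regexp engine in the loop,
only a strcmp() at the root and a few sscanf() calls per line. A real
specification language would still need escaping, deeper trees, and
probably lex(1) output for the hot path.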

mjr. 



More information about the LogAnalysis mailing list