[logs] ugliest application logs ever?

Fredrik Bengtsson fredrik.bengtsson at fortego.se
Thu Jan 24 23:09:37 PST 2008


Personally, I dislike the regex approach for all but the simplest 
formats, and instead write customized parsers whose well-documented code 
can easily deal with any quirks of the format in question. They usually 
don't end up with that many lines of code anyway (and they run a whole 
lot faster), especially with a decent library of support routines that 
grows over time.

Take a good NVP (name-value pair) parser: I added the option bits 
MULTIWORD_KEY and MULTIWORD_VALUE to ours so it can deal with both (Fortigate)

   [...] Original Address=172.10.10.1 [...]

and (Clavister?)

   [...] flags=SYN ACK [...]

(but not both at the same time, obviously), plus quoted keys and values, 
of course. Once you've written this, parsers for name-value formats 
usually just contain a long list of lines like

   VALIDATE_RESULT(Parser::FetchStringParameter("recvif", &ptr, &rint));
   VALIDATE_RESULT(Parser::FetchIPv4Parameter("srcip", &ptr, &cip));
   VALIDATE_RESULT(Parser::FetchIPv4Parameter("destip", &ptr, &sip));

where these helpers use the underlying ParseSingleKeyValuePair 
iterator-like call and do additional value validation where necessary. 
That way, you can also get error results like "Expected 'recvif' 
parameter at character position 24, found 'gengis khan'", which are 
usually infinitely more useful than most regex parser error messages.

Makes sense?

/Fredrik


David Corlette wrote:
> We just do a replace on those before we do the NVP parse. E.g.:
> 
> src zone --> src_zone
> dst zone --> dst_zone
> 
> Then we can run our standard NVP parser routine and it works like a charm...
> 
>>>> On Thu, Jan 24, 2008 at  5:20 PM, in message
> <47990F41.2040603 at packetnexus.com>, Jason Lewis <jlewis at packetnexus.com> wrote:
> 
>> Except they didn't standardize the keys...
>>
>> proto=6 src zone=Trust dst zone=Untrust action=Permit
>>
>> There is a space before zone that hoses things up.
>>
>> Dilley, Ron wrote:
>>> Jas,
>>>
>>> This does not look too bad as long as you don't use regex to parse it.
>>>
>>> Key=value all the way . . .
>>>
>>> Ron
>>>
>>>
>>>
>>> On 1/24/08 11:52 AM, "Jason Lewis" <jlewis at packetnexus.com> wrote:
>>>
>>>     I don't know about ugly, but logs that are difficult to parse suck.
>>>
>>>     Netscreen:
>>>     messages:Dec 17 09:35:27 10.14.93.7 ns5xp: NetScreen device_id=ns5xp
>>>     system-notification-00257(traffic): start_time="2002-12-17 09:40:18"
>>>     duration=4 policy_id=0 service=tcp/port:8000 proto=6 src zone=Trust
>>>     dst zone=Untrust action=Permit sent=715 rcvd=6561 src=10.14.94.221
>>>     dst=10.14.90.217 src_port=1039 dst_port=8000 translated ip=10.14.93.7
>>>     port=1217
>>>     messages:Dec 17 09:35:27 10.14.93.7 ns5xp: NetScreen device_id=ns5xp
>>>     system-notification-00257(traffic): start_time="2002-12-17 09:40:18"
>>>     duration=4 policy_id=0 service=tcp/port:8000 proto=6 src zone=Trust
>>>     dst zone=Untrust action=Permit sent=651 rcvd=2782 src=10.14.94.221
>>>     dst=10.14.90.217 src_port=1040 dst_port=8000 translated ip=10.14.93.7
>>>     port=1218
>>>
>>>     There isn't a good delimiter to break the log up, so it requires a
>>>     custom regex. Trying to split on spaces is a nightmare. Give me
>>>     something that lets me quickly grab only what I need. I like
>>>     pipe-delimited.
>>>
>>>     jas
>>>
>>>
>>>     Anton Chuvakin wrote:
>>>     > All,
>>>     >
>>>     > Ah, long time - no post! :-)
>>>     >
>>>     > I wanted to turn this into a formal contest but figured I'd poll the
>>>     > list first: what are the ugliest, most useless application logs that
>>>     > you've seen? Logs that defy log analysis, that are full of numeric
>>>     > codes not explained anywhere? Logs that don't say what they mean (and
>>>     > vice versa)? Logs that omit the most critical piece of info?
>>>     >
>>>     > Here is my example:
>>>     >
>>>     > |22:22:32|BTC| 7|000|DDIC | |R49|Communication error, CPIC
>>>     > return code 020, <application> return code 456
>>>     >
>>>     > Why it sux: numeric codes (twice), ambiguous language, no sense of
>>>     > priority, etc.
>>>     >
>>>     > More?
>>>     >
>>>     > Best,

