[logs] Error messages from syslogd

Rainer Gerhards rgerhards at hq.adiscon.com
Wed Jul 11 23:32:35 PDT 2007


Hi Marcus,

I am following you into the somewhat OT area, so it again is the
moderator's decision ;)

I tend to mostly agree with you. However, I'd like to add a subtle
point.

First off, the most important thing for a logging subsystem is that it
never dies. I fully agree. I go even further - a crashdump and abort
should only happen if there is absolutely no other way of handling the
situation. In my projects, I prefer losing messages (even many of them)
to shutting down the logging subsystem. It's always better to lose many
than to lose all. And after all, there is always hope that things clear
up.
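
To make "lose many rather than lose all" concrete, here is a minimal
sketch in Python. It is an illustration only, not code from any of my
projects; the LossyBuffer class and its method names are made up for
this example. When the buffer is full, the new message is dropped and
counted, and the caller is never blocked or aborted.

```python
import queue


class LossyBuffer:
    """Bounded message buffer that drops messages instead of failing.

    Hypothetical helper for illustration; not taken from a real syslogd.
    """

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self.dropped = 0  # messages discarded because the buffer was full

    def submit(self, message):
        """Enqueue a message; return False and count the loss if full."""
        try:
            self._queue.put_nowait(message)
            return True
        except queue.Full:
            # Losing this one message is preferable to stopping the logger.
            self.dropped += 1
            return False

    def drain(self):
        """Return all buffered messages, oldest first."""
        messages = []
        while True:
            try:
                messages.append(self._queue.get_nowait())
            except queue.Empty:
                return messages
```

Keeping a drop counter means the loss itself can be reported once
things clear up, so the gap in the log is at least visible.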

HOWEVER, I have a different view on service startup. My projects, too,
log things like bind errors IF they happen during service startup. In
the spirit of what I have said before, they still continue to run and
perform as much work as possible. For example, if I can't bind the TCP
port, I can still listen to incoming UDP messages. While I lose TCP, I
do not lose everything (so this is preferred). And you may rightfully
argue that the logging subsystem should try to recover, e.g. by
retrying the bind somewhat later (I am doing this in part, but have not
yet fully reached that goal). In any case, I think it is useful to log
these kinds of errors when they occur in a very early phase of subsystem
initialization. I fear that if I do not log them to the system log,
nobody will ever find the reason, because who looks for a crashdump if
a) the failure is unusual, b) it happens at a remote location, and c)
there is no failure indication in the log?
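
The startup behavior described above could look roughly like the
following Python sketch. It is a simplified illustration under my own
assumptions, not the code I actually run: the function name
open_listeners, the logger name and the return shape are invented for
the example. Each listener is bound independently, a bind failure is
logged, and startup carries on with whatever did bind.

```python
import logging
import socket

# Logger name is an arbitrary choice for this sketch.
log = logging.getLogger("listener-startup")


def open_listeners(host, tcp_port, udp_port):
    """Try to bind each listener; log failures and keep what succeeded.

    Returns (listeners, failed): a dict of bound sockets keyed by
    "tcp"/"udp", and a list of the transports that could not be bound.
    """
    listeners = {}
    failed = []
    for name, sock_type, port in (
        ("tcp", socket.SOCK_STREAM, tcp_port),
        ("udp", socket.SOCK_DGRAM, udp_port),
    ):
        sock = socket.socket(socket.AF_INET, sock_type)
        try:
            sock.bind((host, port))
            if sock_type == socket.SOCK_STREAM:
                sock.listen()
            listeners[name] = sock
        except OSError as exc:
            # Record the problem where an operator will see it, then go on.
            sock.close()
            failed.append(name)
            log.error("cannot bind %s port %d: %s; continuing without it",
                      name, port, exc)
    return listeners, failed
```

The returned list of failed transports is what a later retry could
work from, for example by calling the function again for just those
ports after a delay.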

When reading my lines, please always keep in mind that I keep the logging
subsystem as a whole running under all circumstances.

Any ideas for an improved handling of such situations are appreciated.

Rainer

> -----Original Message-----
> From: loganalysis-bounces at loganalysis.org [mailto:loganalysis-
> bounces at loganalysis.org] On Behalf Of Marcus J. Ranum
> Sent: Thursday, July 12, 2007 2:07 AM
> To: Mordechai T. Abzug
> Cc: loganalysis at loganalysis.org
> Subject: Re: [logs] Error messages from syslogd
> 
> Mordechai T. Abzug wrote:
> 
> [Moderator: this is partially OT and I'll probably applaud if you kill
> it]
> 
> >And what's wrong with this?
> 
> Plenty.
> 
> >I do this, but I didn't learn it in school.  I learned it the hard way
> >while working as a sysadmin+programmer.  If a process suddenly stops
> >working due to some obscure error condition that I thought would never
> >happen
> 
> Code you put in production should never stop working as a result
> of obscure error conditions. And, yes, maybe it means that your
> process needs to create a runtime crashdump file. But that's
> not what the system log is for. System logs are not a replacement
> for:
> 1) reliable code
> 2) debuggers
> 3) graceful abnormal program termination
> 
> A rule of thumb is that if a program needs to abend because of
> some kind of system condition, then it makes sense to put it
> in the log (e.g., file system full, inability to fork a new process,
> etc.) because system problems may affect other running processes.
> 
> There's a general problem with UNIX that Eric was trying
> to solve when he wrote syslogd (I pestered him about
> this at length at USENIX in 2000 when I was working
> on the notes for my syslog tutorial) - lots of programs
> that kept their own runtime logs and no centralized
> management of them. Eric wrote syslogd so that all
> the logs could be brought to one place and cleanly thrown
> away at once. In this regard, Eric was - once again - a
> visionary. Eric also confirmed that the reason that
> the syslog(3) function looks a lot like the fprintf(3)
> function was because he went through the whole
> BSD source tree replacing fprintf wherever appropriate.
> He did point out that if he had to do it all over again
> he'd have done it differently - though the problem that
> bedevils UNIX system logging (free format error message
> spewage)* was not his fault: prior to syslogd UNIX
> applications wrote whatever they wanted to their own
> log files - all syslogd was intended to do was
> centralize the spewage.
> 
> The way syslogs are used on UNIX is largely
> laziness, and it shows. Once you've opened the
> syslog, well, heck, why not just syslog everything?
> It'd be maybe a dozen extra lines of code to write
> a crash-dump function that wrote abend data to
> a place where it actually won't get deleted like
> it might with syslog. In my career as a programmer
> I once had to try to figure out a complex system
> failure, one aspect of which was that a rarely
> failing subsystem was syslogging its death to a
> syslog server that was already dead. I wasted
> 2 days scratching my head trying to figure out
> how to reproduce an error condition that, if the
> programmer who had written the code** wasn't
> a retard, would have been in a crash dump file.
> 
> Other times, you're dealing with software for an
> embedded system (like, say, a firewall....) and it
> doesn't make sense to expect the user to be able
> to
> "cat /var/log/whumpus/crash.dump | mail mjr at ranum.com"
> in which case, sure, letting them grep through
> 1.3gb of syslog spewage to try to find the relevant
> line - yeah, that's much more convenient.
> 
> I've been guilty of syslog spewage myself, plenty
> of times. But that was before I got into the syslog
> analysis side of things and realized that most people
> stick stuff into the log that they should be cleaning
> out as part of the process of writing code that
> does not suck.
> 
> >Isn't this one of the main
> >values of having a logging system, if not *the* main value?
> 
> Only if you're the kind of programmer who thinks
> printf(3) is a debugger.
> 
> mjr.
> ----
> (* That's the technical term for it)
> (** mjr at welch.jhu.edu was his email address)
> _______________________________________________
> LogAnalysis mailing list
> LogAnalysis at loganalysis.org
> http://www.loganalysis.org/mailman/listinfo/loganalysis


