[logs] Error messages from syslogd
Marcus J. Ranum
mjr at ranum.com
Wed Jul 11 17:07:10 PDT 2007
Mordechai T. Abzug wrote:
[Moderator: this is partially OT and I'll probably applaud if you kill it]
>And what's wrong with this?
Plenty.
>I do this, but I didn't learn it in school. I learned it the hard way
>while working as a sysadmin+programmer. If a process suddenly stops
>working due to some obscure error condition that I thought would never
>happen
Code you put in production should never stop working as a result
of obscure error conditions. And, yes, maybe it means that your
process needs to create a runtime crashdump file. But that's
not what the system log is for. System logs are not a replacement
for:
1) reliable code
2) debuggers
3) graceful abnormal program termination
A rule of thumb is that if a program needs to abend because of
some kind of system condition, then it makes sense to put it
in the log (e.g., file system full, inability to fork a new process, etc.)
because system problems may affect other running processes.
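As a rough sketch of that rule of thumb (not code from any real program): the helper names is_system_condition() and report_failure() are made up for illustration, and the particular set of errno values treated as "system conditions" is my assumption. Only syslog(3) and strerror(3) are real calls.

```c
#include <errno.h>
#include <string.h>
#include <syslog.h>

/* Hypothetical helper: returns 1 if the errno value points at a
 * machine-wide resource problem that other processes may also hit.
 * The list of values is an assumption, not a complete catalogue. */
int is_system_condition(int err)
{
    switch (err) {
    case ENOSPC:   /* file system full */
    case EAGAIN:   /* e.g. fork(2) could not create a new process */
    case ENOMEM:   /* out of memory */
    case ENFILE:   /* system-wide file table full */
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical helper: send system conditions to the system log and
 * return 1; return 0 for anything that is the program's own problem
 * and belongs somewhere other than syslog. */
int report_failure(const char *what, int err)
{
    if (is_system_condition(err)) {
        syslog(LOG_CRIT, "%s failed: %s", what, strerror(err));
        return 1;
    }
    return 0;
}
```

A caller would pass the errno it saved right after the failing call, for instance report_failure("fork", errno), and route everything that returns 0 to its own crash handling.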
There's a general problem with UNIX that Eric was trying
to solve when he wrote syslogd (I pestered him about
this at length at USENIX in 2000 when I was working
on the notes for my syslog tutorial) - lots of programs
that kept their own runtime logs and no centralized
management of them. Eric wrote syslogd so that all
the logs could be brought to one place and cleanly thrown
away at once. In this regard, Eric was - once again - a
visionary. Eric also confirmed that the reason that
the syslog(3) function looks a lot like the fprintf(3)
function was because he went through the whole
BSD source tree replacing fprintf wherever appropriate.
He did point out that if he had to do it all over again
he'd have done it differently - though the problem that
bedevils UNIX system logging (free format error message
spewage)* was not his fault: prior to syslogd UNIX
applications wrote whatever they wanted to their own
log files - all syslogd was intended to do was
centralize the spewage.
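To make the fprintf(3)/syslog(3) resemblance concrete, here is a small sketch. The function report_both() and the "disk" message are invented for illustration. The point is only that the format string and arguments carry over unchanged, with a priority taking the place of the FILE pointer.

```c
#include <stdio.h>
#include <syslog.h>

/* Hypothetical helper: report the same event both ways.
 * Returns the character count from fprintf, or a negative value
 * if the write to fp failed. */
int report_both(FILE *fp, const char *dev, int code)
{
    /* Pre-syslogd style: the program writes to its own log stream. */
    int n = fprintf(fp, "disk %s: error %d\n", dev, code);

    /* syslog style: same format string and arguments, with a
     * priority instead of a stream and no trailing newline needed. */
    syslog(LOG_ERR, "disk %s: error %d", dev, code);

    return n;
}
```

Nothing in either call constrains what goes into the message, which is how free-format messages survived the move to a central log.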
The way syslogs are used on UNIX is largely
laziness, and it shows. Once you've opened the
syslog, well, heck, why not just syslog everything?
It'd be maybe a dozen extra lines of code to write
a crash-dump function that wrote abend data to
a place where it actually won't get deleted like
it might with syslog. In my career as a programmer
I once had to try to figure out a complex system
failure, one aspect of which was that a rarely
failing subsystem was syslogging its death to a
syslog server that was already dead. I wasted
2 days scratching my head trying to figure out
how to reproduce an error condition that, if the
programmer who had written the code** wasn't
a retard, would have been in a crash dump file.
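For what it's worth, the "dozen extra lines" could look something like the sketch below. The name crash_dump(), its arguments, and the record layout are all assumptions of mine, and a real abend handler would likely want more state than this. It appends one record to a local file and forces it to disk, so the evidence does not depend on a syslog server being alive.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: append one abend record to `path`.
 * Returns 0 on success, -1 if the file could not be opened or
 * the record could not be written and flushed to disk. */
int crash_dump(const char *path, const char *reason, int saved_errno)
{
    FILE *fp = fopen(path, "a");   /* append, never truncate old dumps */
    if (fp == NULL)
        return -1;

    time_t now = time(NULL);
    int ok = fprintf(fp, "time=%ld pid=%ld errno=%d (%s) reason=%s\n",
                     (long)now, (long)getpid(), saved_errno,
                     strerror(saved_errno), reason) > 0;

    /* Push the record out of stdio and then out of the kernel cache. */
    ok = (fflush(fp) == 0) && ok;
    ok = (fsync(fileno(fp)) == 0) && ok;
    ok = (fclose(fp) == 0) && ok;
    return ok ? 0 : -1;
}
```

A caller would save errno at the point of failure and do something like crash_dump("/var/log/whumpus/crash.dump", "subsystem died", saved) before exiting, where the path is whatever local file the program owns.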
Other times, you're dealing with software for an
embedded system (like, say, a firewall....) and it
doesn't make sense to expect the user to be able
to
"cat /var/log/whumpus/crash.dump | mail mjr at ranum.com"
in which case, sure, letting them grep through
1.3GB of syslog spewage to try to find the relevant
line - yeah, that's much more convenient.
I've been guilty of syslog spewage myself, plenty
of times. But that was before I got into the syslog
analysis side of things and realized that most people
stick stuff into the log that they should be cleaning
out as part of the process of writing code that
does not suck.
>Isn't this one of the main
>values of having a logging system, if not *the* main value?
Only if you're the kind of programmer who thinks
printf(3) is a debugger.
mjr.
----
(* That's the technical term for it)
(** mjr at welch.jhu.edu was his email address)