[logs] Error messages from syslogd
Mordechai T. Abzug
morty at frakir.org
Wed Jul 11 22:03:13 PDT 2007
On Wed, Jul 11, 2007 at 08:07:10PM -0400, Marcus J. Ranum wrote:
> Code you put in production should never stop working as a result
> of obscure error conditions. And, yes, maybe it means that your
> process needs to create a runtime crashdump file. But that's
> not what the system log is for. System logs are not a replacement
> for:
> 1) reliable code
> 2) debuggers
> 3) graceful abnormal program termination
I would love to live in a world where all code put into production is
bug-free. In practice, we live in a world where code put into
production usually contains bugs. Which is why vendors release
patches, right?
This is especially true when working with $vendor's buggy COTS
program, for which I cannot apply a debugger, make it more reliable,
or prevent it from crashing or otherwise misbehaving without more
information. When a COTS product has a problem, looking at logs,
including syslogs, can be instrumental in fixing said problem. I
don't want to send the vendor a crashdump and wait 6 months for a
patch; I want to see a debug message saying that the product had
successfully started and finished X, then crashed while doing Y, so I
can figure out how Y differs from X and maybe work around the problem.
This is not theoretical -- I do this kind of thing all the time, and
it's part of why I care about logs.
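A minimal sketch of that kind of "finished X, died in Y" breadcrumb
logging, in Python. The logger name and the step() helper are
hypothetical, made up for illustration; this is not any vendor's code.

```python
import logging
from contextlib import contextmanager

# Hypothetical logger name, used only for this sketch.
log = logging.getLogger("breadcrumbs-sketch")

@contextmanager
def step(name):
    """Log when a unit of work starts and when it finishes cleanly."""
    log.debug("starting %s", name)
    yield
    # Reached only if the body did not raise. After a crash, the last
    # "starting ..." line with no matching "finished ..." names the
    # step that failed.
    log.debug("finished %s", name)
```

Used as "with step('X'): ..." around each phase, and assuming debug
messages are enabled and routed somewhere persistent, a failure in Y
leaves "starting X", "finished X", "starting Y" in the log and nothing
after.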
> A rule of thumb is that if a program needs to abend
This is not just about abnormal termination; it's also about a program
*not* *working*. For example, the firewall is up, but it has stopped
allowing telnet connections. The logs have tn-gw whining that it has
too many policies -- apparently, the vendor in question had a
hardcoded limit of 100 tn-gw policies in netperm. And I'm sure you
know which firewall vendor I'm talking about. ;) Thanks to having a
useful failure message in the logs, I could then work around the
problem (merge some policies temporarily) and request a patch from the
vendor more easily than by just saying "it's broke, I don't know why,
fix it." [BTW: I had to wait just under three months for the vendor
in question to actually issue the patch.] I could also have
experimented around (i.e. restored the last working config from config
management and spent a while trying to figure out what the bug was),
but again, that would have taken a lot longer than just reading
a friendly log message.
Log messages, even for obscure error conditions, can help save a lot
of time when troubleshooting. Sure, you might be able to find the
problem without the log message, but the log message makes it a lot
easier. And the log message has no cost under normal operations --
it's only in the code path that deals with the error, not in the code
path for normal operations.
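To illustrate the point about the error path, here is a rough Python
sketch loosely modeled on the tn-gw example above. The names
MAX_POLICIES and load_policies, and the message text, are assumptions
made up for illustration, not the vendor's actual code.

```python
import logging

# Hypothetical logger name, used only for this sketch.
log = logging.getLogger("tn-gw-sketch")

# Hypothetical hard-coded limit, echoing the 100-policy example.
MAX_POLICIES = 100

def load_policies(policies):
    """Return the policies to use; log only if the limit is exceeded."""
    if len(policies) > MAX_POLICIES:
        # Error path: the only place a log call is made.
        log.error("too many policies: %d configured, limit is %d",
                  len(policies), MAX_POLICIES)
        raise ValueError("policy limit exceeded")
    # Normal path: no logging call is executed at all.
    return list(policies)
```

To send these messages to syslog, one option is to attach a
logging.handlers.SysLogHandler to the logger; the socket address
(commonly /dev/log on Linux) depends on the system.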
> The way syslogs are used on UNIX is largely laziness, and it
> shows. Once you've opened the syslog, well, heck, why not just
> syslog everything?
No, it's pure genius. Raw syslog has been so successful precisely
because it takes so little effort to get value. Speaking from a
sysadmin perspective, I would *way* rather have syslog spewage than a
more austere logging system that is better-structured but might not
contain the critical clue I need to actually fix a problem quickly.
The ease of logging errors means that errors do actually get logged,
which is *wonderful*.
One thing that makes me nervous about all the hotshot logging
frameworks is that, by imposing too many rules on programmers, they
may discourage programmers from actually reporting errors.
The goal of logging is to get useful information, not to make life
easier for the log analysis vendors.
- Morty