Jeff Saxe's intro to multilog

From my reading of his Web pages, djb's general attitude seems to be this: qmail is an ncredibly high-performance mail server. All mail server activity should be logged, but when he tried to send this output to standard syslog on various Unix flavors, syslog dropped lines under high load. Therefore syslog is completely broken and should be discarded, and since he suffers from an extreme case of Not-Invented-Here syndrome, his replacement is multilog.

NEW!
We support the SANS Log Management Summit 2006 (Click Here for more information)

(Bonus: Now You can download Tbird's SANS Webcast Slides)

The output of qmail (or of tcpserver, his inbound SMTP or other socket listener, or of any program) is piped into standard input of multilog. multilog takes a number of arguments on the command line which tell it where to put the logs, whether to timestamp each line, how big one log file should get (bytes, not length of time) before rotating and starting a new file, etc. The current log is named "current", and when it fills up, it gets renamed to "bignumerictimestamp.s", where all the timestamps are in an extremely precise and long-ranged format called TAI64 (precise down to the picosecond or something, range out to the age of the universe). When you're reading a log back out, he has a utility called tai64nlocal through which you can pipe the lines, and it will manufacture a date/time in local format and timezone. If multilog terminates cleanly, it renames the "current" file to another name, and when it starts up again, it names it back to "current", so you know nothing funny happened. If the machine crashes or multilog is otherwise killed harshly, then when it starts up again "current" already exists, and it renames that file to a ".u" file to flag it as possibly incomplete or corrupted, then begins a new "current". After a while, then, a log directory will look like...

-rw-r--r-- 1 root root 572455 Feb 14 2003 @400000003e4d69602cfb6f2c.u
-rw-r--r-- 1 root root 3454 Feb 15 2003 @400000003e512fd21b7d8b54.u
-rwxr--r-- 1 root root 998025 Mar 10 20:55 @400000003e6d422a21b5f77c.s
-rwxr--r-- 1 root root 998110 Mar 10 21:02 @400000003e6d43b6224da39c.s
-rwxr--r-- 1 root root 998035 Mar 10 21:09 @400000003e6d456b0674a814.s
-rwxr--r-- 1 root root 998088 Mar 10 21:16 @400000003e6d4717022821b4.s
-rwxr--r-- 1 root root 998004 Mar 10 21:23 @400000003e6d48b010a48944.s
-rwxr--r-- 1 root root 998039 Mar 10 21:30 @400000003e6d4a470f609f64.s
-rwxr--r-- 1 root root 998020 Mar 10 21:37 @400000003e6d4bd8023e0efc.s
-rwxr--r-- 1 root root 998088 Mar 10 21:43 @400000003e6d4d5b1b89b0b4.s
-rwxr--r-- 1 root root 998086 Mar 10 21:50 @400000003e6d4ee323f53d04.s
-rwxr--r-- 1 root root 998019 Mar 10 21:56 @400000003e6d506a0f36bcbc.s
-rwxr--r-- 1 root root 998081 Mar 10 22:03 @400000003e6d52010319cce4.s
-rw-r--r-- 1 root root 323250 Mar 11 15:54 current
-rw------- 1 root root 0 Feb 14 15:22 lock
-rw-r--r-- 1 root root 0 Feb 17 13:54 state

(Note that the finished files are ".s" and have the execute bit set, while the unexpectedly-cut-off ones are ".u" and do not have x set.) A file will contain lines like...

@400000003e6e4cff0ece1f7c tcpserver: status: 1/400
@400000003e6e4cff0ed0790c tcpserver: pid 5171 from 127.0.0.1
@400000003e6e4cff0ed658f4 tcpserver: ok 5171 localhost:127.0.0.1:25 localhost:127.0.0.1::39480
@400000003e6e4cff118c4194 tcpserver: end 5171 status 0
@400000003e6e4cff118c4d4c tcpserver: status: 0/400

...which when piped through tai64nlocal in a shell set for the Eastern U.S. time zone, produces...

2003-03-11 15:54:13.248389500 tcpserver: status: 1/400
2003-03-11 15:54:13.248543500 tcpserver: pid 5171 from 127.0.0.1
2003-03-11 15:54:13.248928500 tcpserver: ok 5171 localhost:127.0.0.1:25 localhost:127.0.0.1::39480
2003-03-11 15:54:13.294404500 tcpserver: end 5171 status 0
2003-03-11 15:54:13.294407500 tcpserver: status: 0/400

Since (in his opinion) there's no standardized and good-quality way to maintain services that are easily started, signaled, stopped, and restarted automatically when they crash, he also wrote the "supervise" program to do these jobs. If you make a "/service/qmail" directory for supervising qmail itself, plus a "/service/qmail/log" subdirectory for the logging service that's associated with it, they will be both started nicely together and the output of one will be automatically piped into the other, and if either one crashes it will be restarted and the pipes reattached without text loss, so you won't have a situation where the service continues but the logs are lost forever. So djb's whole worldview fits together nicely, if you buy into it.

Main differences from standard logging:

  • There is absolutely no option for remote logging or permanent database archival; all logging goes to flat files in a local disk filesystem. This is what allows him to be so sure of not dropping lines, assuming the local filesystem has atomic renaming of files, journaling or other crash-protection features, etc. What happens to the log entries after multilog is done with them is the sysadmin's problem, although if all you want is for the oldest logs to be thrown away, that's built in.
  • Because of hard-coded filenames and other features, two copies of multilog may not log into the same directory. You can't produce files /var/log/somelog and /var/log/anotherlog from two different processes; you have to collect the logs in subdirectories, /var/log/somelog/* and /var/log/anotherlog/*.
  • The currently-appended log is always named "current" (so a tail -f current always gets the right one), but then it does not get renamed "current.0" -- it doesn't use the "roll" naming convention. Old logs are renamed with a TAI64 timestamp indicating when they were cut off, so a file's name is basically coordinated with its place in the chronological order of total logs. This implies that an ordinary alphanumeric sort of the filenames of the logs is sufficient to concatenate a bunch of them in the correct order (you don't have to merge and sort the possibly gigabytes of text lines within the files), and if you're looking for a specific time or range of times in a set of log files, you can convert the times in which you're interested into TAI64 format and "dive into" the set of logs without needlessly over-grepping.
  • Normally the only thing multilog does upon cutting off a chunk of log is to rename it, but if you want some batch process to go through each file after it's cut off, you can pass a program to the command line arguments. Note that it's not processing in real time, only after each chunk is done, although if you want to tell multilog to cut off a chunk right now (for instance, to analyze recent suspicious activity) you can send it a SIGALRM. (If you're using supervise, you can do "svc -a /service/qmail/log", so you need not go hunting for the PID of the multilog process.)

djb also has an incredibly terse, blunt, unsympathetic, and one even might say hard-headed style of documentation for man pages and INSTALL documents, which you'll see if you poke around his site a little. His apparent rationale is this: Here I have presented "just the facts, ma'am", precisely those facts and steps which should be necessary and sufficient for you to accomplish the task you're attempting with my software. If you can't follow directions precisely, then you will fail and you deserve to fail; just ask someone else more competent to help you. If there aren't any usage examples for a program, that's because you should be able to completely digest the man page of a program you've never seen before, juggle all the command-line option combinations in your head, and come up with the exact configuration that will do what you want, and if you can't, then again you probably need to consult an oracle or a more experienced admin." With that said, then, my simple logging for my qmail server sends its output to "multilog t s1000000 n2000 /var/log/qmail", which means to collect logs in /var/log/qmail/*, timestamp each line, keep up to 1MB in each file, and keep up to 2000 files before throwing away the oldest. (Therefore this will retain about the most recent 2GB of logs.) If you use the above "supervise" directory structure, the upper-level service "run" script starts the program sending output to normal standard out (probably with the clever "exec 2>&1" shell trick to collapse standard error into standard out as well), and then the "run" script for the log subdirectory is simply something like...

#!/bin/sh
exec setuidgid qmaill multilog t s1000000 n2000 /var/log/qmail

This uses his setuidgid program to demote the logging process into running non-root, so the /var/log/qmail directory must be writable by the ordinary user qmail.

Also, I guess djb knows that some people are accustomed to syslog and won't instantly switch from it, so he includes a utility with qmail called "splogger" that simply takes lines from standard-in and sends them to syslog.

[Optional]
[Optional]