How did you implement log management on your servers?

Solution 1:

I've got about 30 servers, and I just use straight up syslog to send all the logs to a single logging server. For backup, all of the machines are also configured to store their own logs locally for a few days, using logrotate to take care of the rotation and deletion of old logs.
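The client side of this is just stock configuration. Something like the following (the name `loghost`, the paths, and the retention counts are placeholders, not my exact files):

```conf
# /etc/syslog.conf (each client) -- forward everything to the central
# loghost over UDP while still logging locally
*.*                         @loghost

# /etc/logrotate.d/local-syslog -- keep a few days of local copies as backup
/var/log/messages /var/log/syslog {
    daily
    rotate 5
    compress
    missingok
    notifempty
    postrotate
        /bin/kill -HUP `cat /var/run/syslogd.pid 2>/dev/null` 2>/dev/null || true
    endscript
}
```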

Each of my application servers runs a small perl script to send its logs to syslog, which then forwards them on to the loghost (perl script below).

Then on the loghost we have some custom scripts that are similar to logcheck that basically watch the incoming logs for anything suspicious.
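Those watcher scripts aren't anything fancy; the shape of them is roughly this (a sketch, not the real scripts -- the patterns, the ignore file, and the combined-log path are all illustrative):

```shell
#!/bin/sh
# Minimal logcheck-style sweep: print any line in the combined log that
# matches a "suspicious" pattern and is not excused by the ignore list.
# Cron it and pipe the output to mail. Paths here are placeholders.
LOG=${1:-/var/log/all-hosts.log}        # combined log on the loghost
IGNORE=${2:-/etc/logsweep.ignore}       # one regex per line of known noise

if [ -r "$LOG" ] && [ -r "$IGNORE" ]; then
    grep -Ei 'error|fail|denied|refused|segfault' "$LOG" | grep -Evf "$IGNORE"
fi
```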

We also have all of the email from every host going to one place, so if any program complains by mail, we see all of the messages. These could theoretically go to a single mailbox that a program could analyze and act on.
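The mail side is ordinary alias plumbing; for example (the address is a placeholder):

```conf
# /etc/aliases on every host -- send root's mail (cron output, daemon
# complaints, etc.) to one central mailbox, then run newaliases
root: logs@central.example.com
```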

Here is my logging perl script. You use it by piping a program's output into it; it syslogs each line and echoes it back out so you can send it elsewhere (I pipe it on to multilog). Give it the -q option to log to syslog only.

#!/usr/bin/perl

use strict;
use warnings;
use Sys::Syslog;
use Getopt::Long;

# Defaults: tag lines with this host's name, at local0.info.
my $SERVER_NAME = `hostname`;
chomp $SERVER_NAME;
my $FACILITY = 'local0';
my $PRIORITY = 'info';
my $quiet    = 0;

GetOptions ('s=s' => \$SERVER_NAME, 'f=s' => \$FACILITY, 'p=s' => \$PRIORITY, 'q+' => \$quiet);

#Sys::Syslog::setlogsock('unix');
openlog($SERVER_NAME, 'ndelay', $FACILITY);

syslog($PRIORITY, "Logging Started -- Logger version 1.1") unless $quiet;

$| = 1;    # unbuffered, so lines pass straight through to the next stage

while (<>) {
    print $_ unless $quiet or /^\s+$/;
    chomp;
    # Pass the line through a '%s' format so input containing '%'
    # isn't misinterpreted as a printf format string by syslog().
    syslog($PRIORITY, '%s', $_) if $_;
}

closelog();

Solution 2:

Although I haven't implemented it yet, I'm planning to move all of my log-generating machines to rsyslog and set up a bastion-type server to act as the syslog collector. From there, I think the free version of Splunk can do everything I need to pull information out.
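The client half of that plan is a one-liner in rsyslog. A sketch, with `bastion.example.com` standing in for the collector:

```conf
# /etc/rsyslog.conf fragment (each client) -- forward everything to the
# bastion over TCP ('@@' means TCP, a single '@' means UDP)
*.* @@bastion.example.com:514

# On the bastion itself, load the TCP listener:
# module(load="imtcp")
# input(type="imtcp" port="514")
```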

Now just to implement it...

Solution 3:

I use a central syslog host. Each edge system sends *.debug to the central loghost. The central syslog host runs syslog-ng, and has rules to split logs so that each machine generates its own files named for that day. It also dumps everything into a single file, against which I run a descendant of logcheck.sh.
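The splitting rules look roughly like this in syslog-ng.conf (a sketch; the paths and port are examples, not my exact config):

```conf
# Central host: one file per machine per day, plus a combined file
# for the logcheck-style sweep.
source s_net { udp(ip(0.0.0.0) port(514)); };

destination d_perhost {
    file("/var/log/hosts/$HOST/$YEAR-$MONTH-$DAY.log" create_dirs(yes));
};
destination d_all { file("/var/log/all-hosts.log"); };

log { source(s_net); destination(d_perhost); destination(d_all); };
```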

Once a day I run a log compactor, which gzips any logs older than 7 days and deletes anything older than 28 days. Between the two, it gives logs an expected life of 35 days on the server, which means that all logs should make it to monthly backups, where they can be recovered for up to two years.
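A sketch of that compactor using find (the directory is a placeholder; this version measures both ages from the file's mtime, so adjust the deletion threshold if your clock starts after compression):

```shell
#!/bin/sh
# Daily log compactor: delete anything older than 28 days, then gzip
# any remaining logs older than 7 days. Deleting first means we never
# waste time compressing a file that's about to go away; gzip preserves
# the original mtime, so the deletion pass still sees the true age.
LOGDIR=${1:-/var/log/hosts}     # placeholder path

if [ -d "$LOGDIR" ]; then
    find "$LOGDIR" -type f -mtime +28 -delete
    find "$LOGDIR" -type f -name '*.log' -mtime +7 -exec gzip -f {} \;
fi
```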

It's storage-intensive, but it seems to be the best way to assure coverage.