I'm pretty impressed with Splunk, especially version 4. Pretty graphs, alerting (Enterprise only), and fast, accurate searching. It's a great product.

However, the cost is just way too high to consider for full production use at our company. All we really need is to be able to index different logs in a central place and have reasonable searching on that. Having alerts based on a saved search is also really nice. We don't really go beyond that.

In fact, our biggest usage has been in deploying new applications. Everything gets logged via log4net to either the Event log on Windows or a text file on Linux. Splunk makes it pretty easy to quickly search across those to make sure all the parts of the app are working ok -- that's saved us tons of time versus hunting down individual logging sources.

What alternatives exist in this market? I have a sinking feeling Splunk's pricing is so high because they have the best product by far, and they know it. We want the server to run on Windows.

I'd be open to a split model, using one product for general logs (collect via syslog/Snare), and a dedicated product for our custom apps (like Log4Net Dashboard).

Would using a simple syslog server such as Kiwi, sent to SQL Server (perhaps with fulltext enabled) work?

I'd hope the cost should be well under 5 figures, USD. (And yes, I know, we're cheap. We're a startup with little money, and BizSpark takes care of all our MS licensing.)

Edit: I should add, we have about 10 physical servers, 20 VMs, and a couple firewalls and switches. 90% is Windows.


Solution 1:

Note: This is all about Linux and free software, as that's what I mostly use, but you should be fine with a syslog client on Windows sending the logs to a Linux syslog server.

Logging to an SQL server: With only ~30 machines, you should be fine with pretty much any centralised syslog-alike and an SQL backend. I use syslog-ng and MySQL on Linux for this very thing.
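As a rough illustration of the "syslog into an SQL backend" idea, here is a minimal Python sketch. It is not how syslog-ng or Kiwi write to a database. SQLite stands in for MySQL or SQL Server so the example is self-contained, and the table layout, the function names, and the loose BSD-style (RFC 3164) line format are all assumptions made for the example.

```python
import re
import sqlite3

# Loosely matches a BSD-style (RFC 3164) syslog line such as:
#   <34>Oct 11 22:14:15 web01 myapp[123]: Something failed
# Real-world lines vary a lot; this pattern is an assumption for the sketch.
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^:\[\s]+)(?:\[\d+\])?: "
    r"(?P<message>.*)$"
)


def parse_syslog_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = SYSLOG_RE.match(line.strip())
    if m is None:
        return None
    pri = int(m.group("pri"))
    return {
        "facility": pri // 8,  # PRI = facility * 8 + severity
        "severity": pri % 8,
        "timestamp": m.group("timestamp"),
        "host": m.group("host"),
        "tag": m.group("tag"),
        "message": m.group("message"),
    }


def open_store(path=":memory:"):
    """Create the (hypothetical) logs table and an index on host."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS logs ("
        " id INTEGER PRIMARY KEY, facility INTEGER, severity INTEGER,"
        " timestamp TEXT, host TEXT, tag TEXT, message TEXT)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_logs_host ON logs (host)")
    return conn


def store_line(conn, line):
    """Parse one line and insert it; return False if it could not be parsed."""
    rec = parse_syslog_line(line)
    if rec is None:
        return False
    conn.execute(
        "INSERT INTO logs (facility, severity, timestamp, host, tag, message)"
        " VALUES (:facility, :severity, :timestamp, :host, :tag, :message)",
        rec,
    )
    return True


def search(conn, term, host=None):
    """Substring search on the message, optionally limited to one host."""
    sql = "SELECT host, tag, message FROM logs WHERE message LIKE ?"
    params = ["%" + term + "%"]
    if host is not None:
        sql += " AND host = ?"
        params.append(host)
    return conn.execute(sql + " ORDER BY id", params).fetchall()
```

The `LIKE '%term%'` search scans every row, which is fine for ~30 machines at first but is exactly where a fulltext index starts to matter as the table grows.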

Pretty frontends for graphing are the main problem -- there seem to be a lot of hacked-up front-ends which will grab items from the logs and show how many hits, alerts etc., but I've not found anything integrated and clean. Admittedly this is the main thing that you're looking for... (If I find anything good then I'll update this section!)

Alerting: I use SEC on a Linux server to find bad things happening in the logs and alert me via various methods. It's incredibly flexible and not as clicky as Splunk. There's a nice tutorial here which guides you through a lot of the possible features.
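To give a feel for the kind of rule an event correlator evaluates, here is a small Python sketch of a "N matching lines within T seconds" alert. This is not SEC's rule syntax and does not reproduce its exact window semantics; the class name, parameters, and reset behaviour are assumptions chosen for illustration.

```python
import re
from collections import deque


class ThresholdRule:
    """Fire when `threshold` matching lines arrive within `window` seconds."""

    def __init__(self, pattern, threshold, window):
        self.pattern = re.compile(pattern)
        self.threshold = threshold
        self.window = window
        self.hits = deque()  # timestamps of recent matches

    def feed(self, timestamp, line):
        """Return True if this line pushes the rule over its threshold."""
        if not self.pattern.search(line):
            return False
        self.hits.append(timestamp)
        # Drop matches that have fallen out of the sliding window.
        while self.hits and timestamp - self.hits[0] > self.window:
            self.hits.popleft()
        if len(self.hits) >= self.threshold:
            self.hits.clear()  # reset so one burst produces one alert
            return True
        return False
```

In use, each incoming log line would be passed to `feed()` along with its timestamp in seconds, and a `True` result would trigger whatever notification you prefer (mail, pager, and so on).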

I also use Nagios for graphs of various stats and some alerting which I don't get from the logs (such as when services are down etc). This can be easily customized to add graphs of anything you like. I have added graphs of items such as the number of hits made to an http server, by having the agent use the check_logfiles plugin to count the number of hits in the logs (it saves the position it gets up to for each check period).
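The "saves the position it gets up to" trick can be sketched in a few lines of Python. This is not the check_logfiles plugin itself, just the same idea under simple assumptions: the state is kept in a small JSON file, a match is a plain substring match, and a log that has become shorter than the saved offset is treated as rotated. The function name and file layout are hypothetical.

```python
import json
import os


def count_new_matches(log_path, state_path, needle):
    """Count lines containing `needle` added since the previous call."""
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as fh:
            offset = json.load(fh).get("offset", 0)

    # If the log is now shorter than the saved offset, assume it was rotated
    # or truncated and start again from the beginning.
    if os.path.getsize(log_path) < offset:
        offset = 0

    count = 0
    # Binary mode keeps seek()/tell() offsets exact byte positions.
    with open(log_path, "rb") as log:
        log.seek(offset)
        for raw in log:
            if needle.encode() in raw:
                count += 1
        offset = log.tell()

    # Remember where we stopped so the next check only sees new lines.
    with open(state_path, "w") as fh:
        json.dump({"offset": offset}, fh)
    return count
```

Called once per check period, the return value is the number of new hits in that period, which is the figure that ends up on the graph.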

Overall, it depends on how much your time costs, as there are many options you can use, but they aren't as integrated as Splunk and will probably take more effort to get doing what you want. The Nagios graphs are straightforward to set up but don't give you historical data from before you added the graph, whereas with Splunk (and presumably other front-ends) you can look back at past logs and graph things you've only just thought of looking at.

Note also that the SQL database format and indexing will have a huge effect on the speed of queries, so your idea of fulltext indexing should speed up searches tremendously. I'm not sure if MySQL or PostgreSQL will do something similar.
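To show why fulltext indexing helps so much, here is a toy inverted index in Python. It is only a sketch of the general technique, not how SQL Server, MySQL, or PostgreSQL implement their fulltext features; the class and method names are made up for the example.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return WORD_RE.findall(text.lower())


class TinyFullTextIndex:
    """Toy inverted index: maps each word to the ids of messages containing it."""

    def __init__(self):
        self.messages = []
        self.postings = defaultdict(set)  # word -> set of message ids

    def add(self, message):
        msg_id = len(self.messages)
        self.messages.append(message)
        for word in tokenize(message):
            self.postings[word].add(msg_id)
        return msg_id

    def search(self, query):
        """Return messages containing every word in `query`."""
        words = tokenize(query)
        if not words:
            return []
        # Intersect the posting sets instead of scanning every message.
        ids = set.intersection(*(self.postings.get(w, set()) for w in words))
        return [self.messages[i] for i in sorted(ids)]
```

A search only touches the entries for the words in the query, so the cost depends on how many messages contain those words rather than on the total number of log lines, which is the difference from a `LIKE '%term%'` scan.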

Edit: MySQL will do fulltext indexing, but only on MyISAM tables prior to MySQL 5.6. Support for InnoDB was added in 5.6.

Edit: PostgreSQL can do full text search, of course: http://www.postgresql.org/docs/9.0/static/textsearch.html

Solution 2:

More aimed at *nix than Windows, but Octopussy does support Windows, and seems to aim at the same kind of thing as Splunk.

Solution 3:

I'm in the middle of trying out a number of monitoring solutions, though I mainly want to monitor Windows. Most of the systems are geared to SNMP monitoring, which manages to pull out a remarkable amount of info without agents.

These are some of the systems I've tried so far:

Nagios - Open source. A pig to configure but highly rated and seems very flexible. It seems to be essentially a counter recorder and does not allow for remote script execution, so it cannot be used to pick up on configuration problems, à la MS System Center or Kaseya. Agentless, but essentially useless without the NSClient tool installed on each client.

Cacti - Pretty and straightforward graphing tool based on pulling SNMP stats. Agentless.

OpsView - Based on Nagios but easier to configure and has a better front end.

HypericHQ - Easy to get up and running under Windows. The base version is free and does plenty. There is a commercial HypericHQ enterprise. Agent has to be installed on each client.

Zabbix - Another nice monitoring tool. It's easier to use than Nagios. Has an agent you can install on Windows and client machines. I've only explored this one a bit so far.

Zenoss - Open source. I have been very impressed by how professional Zenoss is. It's an SNMP-based monitor and has loads of extensions for monitoring HP ProLiants, Windows services, MS SQL Server, and MySQL. The extensions all work via SNMP, so nothing needs to be installed on the client machines. I haven't explored it all yet and there appears to be much functionality which I have yet to exploit. It's based on Zope, so unless you are up to speed on Zope installs I'd recommend downloading the pre-prepared VM - it works like a dream straight out of the box.

On the commercial front you could take a look at a few tools:

Kaseya - costs about 6k per year for 250 nodes, if I remember correctly, but is a superb tool and has a very active user community. It's aimed at the MSP market and allows monitoring of multiple companies' systems. It can be used internally without problems.

GFI Hounddog - simpler than Kaseya but very cheap at the moment. Definitely worth a look.

There are a number of solutions out there sold as MSP systems but which are essentially monitors + remote admin combined.

Ian