Does anyone use check_mk for Nagios? Anything I should be aware of before considering it?
http://mathias-kettner.de/check_mk.html
I've been testing it out on a couple of development machines and it seems pretty nifty. However, I cannot find much information on deployments of it. Does anyone run this actively? Did anyone rule it out as an option for some reason?
Solution 1:
Disclaimer: I used to work on that project because I felt it was extremely powerful (and I still think so).
I have used it since around 2009 and, except for legacy setups, have never touched a "normal" (one may say legacy) Nagios setup again. It would feel like a waste of time.
The largest setup I know of has ~1200 monitoring servers (not monitored servers). That one has also been published, but the original question predates it.
It's now used in quite a few places that weren't happy with plain Nagios compared with larger-scale NMS products like OpenView, and that have since changed their minds.
The key difference is not scalability (which 37signals seems to enjoy quite a lot), nor the autodetection of monitorable things on a remote system, which makes setup a no-brainer and even alerts you if something new has been added but is not being monitored.
No, the really big thing in the long run is the configuration, which is strictly rule-based (and written out as Python). A few hundred lines of Check_MK config are enough to generate 200K lines of boring old Nagios syntax that you'll never look back at.
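To give a feel for why rule-based configuration shrinks things so much, here is a minimal sketch in Python. It is not Check_MK's actual config format or code: the host names, tags, rule tuples, and check command names are all made up for illustration. It only shows the general idea of a few tag-based rules being expanded into many Nagios `define service` blocks.

```python
# Illustrative sketch only: hypothetical data model, NOT Check_MK's real syntax.

# Hosts and their tags (all names are invented for this example).
hosts = {
    "web01": {"linux", "prod", "web"},
    "web02": {"linux", "prod", "web"},
    "db01": {"linux", "prod", "db"},
    "test01": {"linux", "test"},
}

# Each rule: (tags a host must have, service description, check command).
# The check command names are placeholders.
rules = [
    ({"linux"}, "CPU load", "check_load"),
    ({"web"}, "HTTP", "check_http"),
    ({"db", "prod"}, "DB connections", "check_db_conn"),
]


def matching_services(host_tags, rules):
    """Return (description, command) for every rule whose tags the host has."""
    return [(desc, cmd) for required, desc, cmd in rules if required <= host_tags]


def render_nagios(hosts, rules):
    """Expand the rules into Nagios-style service definitions.

    The blocks are simplified: a real Nagios service also needs a template
    ("use ...") or further mandatory directives.
    """
    blocks = []
    for name in sorted(hosts):
        for desc, cmd in matching_services(hosts[name], rules):
            blocks.append(
                "define service {\n"
                f"    host_name            {name}\n"
                f"    service_description  {desc}\n"
                f"    check_command        {cmd}\n"
                "}"
            )
    return "\n\n".join(blocks)


config = render_nagios(hosts, rules)
print(config)
```

Three rules and four hosts already expand to seven service blocks. Adding a host with the right tags picks up all matching services without touching the rules, which is where the ratio between hand-written and generated lines comes from.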
- It also has a web-based config editor. With inheritance. And validation.
- The GUI is, among other things, optimized for WAN links. It's actually a full web framework, which is why there are also dashboards and a log classification engine that can take in syslog or SNMP for Nagios processing with flexible rulesets.
- All the checks are written to high quality standards and it shows in time saved for the user.
There are no ponies, though.
- People often get confused about the interaction between Check_MK and Nagios, which is not trivial but is actually nicely separated: Check_MK writes the config, Nagios runs with that config and calls Check_MK to monitor systems.
- If someone is not using the graphical config editor "WATO", they're assumed to be at an expert level in Nagios.
- There's no GUI Ops manual! (But there is inline help that can be enabled on the fly.)
- Perfectly working IPv6 support patches have been floating around for years and have gone nowhere yet.
There are many more pros and cons to bring up, but I think I've already shown both sides quite well. Personally I like the efficiency of Check_MK setups and am really annoyed if I have to work with old-school Nagios setups. Even if they use nice template frameworks or are driven from Puppet, they still feel stone-aged and helpless to me in comparison.
Disclaimer: see above ;)
Solution 2:
Does anyone use it? Yes.
37signals (a software company) just posted an overview of how they monitor their systems using Nagios, and the major benefits they saw when they started using check_mk. http://37signals.com/svn/posts/3178-nagios-monitoring-performance