I would be interested to hear the serverfault community's experiences with Ksplice in production.

Quick blurb from Wikipedia:

Ksplice is a free and open source extension of the Linux kernel which allows system administrators to apply security patches to a running kernel without having to reboot the operating system.

and

Ksplice can, without restarting the kernel, apply any source code patch that only needs to modify the kernel code. Unlike other hot update systems, Ksplice takes as input only a unified diff and the original kernel source code, and it updates the running kernel correctly, with no further human assistance required. Additionally, taking advantage of Ksplice does not require any preparation before the system is originally booted (the running kernel does not need to have been specially compiled, for example). In order to generate an update, Ksplice must determine what code within the kernel has been changed by the source code patch.

So a few questions:

How has the stability been? Have you encountered any odd issues with its 'rebootless live patching' of the kernel? Any kernel panics or horror stories?

I have been running it on a few test systems, and so far it's been working as advertised, but I am interested in other sysadmins' experiences with Ksplice before going 'all in' and deploying it on our production servers.

So, anybody using Ksplice in production?

Update: hmm, I'm not seeing any real activity on this question after a couple of hours (besides some kind upvotes and favs). To spark some activity, I'll ask a few more questions and see if we can get this discussion going...

"If you are aware of Ksplice, is there a reason you are not using it?"

"Do you feel it's still too bleeding edge, unproven or untested?"

"Does Ksplice not fit well within your current patch-management system?"

"Do you hate having systems that have long (and secure) uptimes?" ;-)


Solution 1:

(First, a disclaimer: I work for Ksplice.)

We use it on our own production infrastructure, naturally, but more importantly, so do our 500+ corporate customers (number as of Dec '10).

One sysadmin asked the same question on a Red Hat Enterprise Linux user mailing list and received a number of answers, a few of which are excerpted below:

We've been running Ksplice in production for a few months on a dozen or so hosts. So far it works as advertised.

and

I have > 500 machines under my control, about 445 of them are connected to uptrack (rhel 4 & 5). We used ksplice to block a few root exploits before we had a chance to reboot machines. Since we are still testing we rolled out the new kernel anyway but I've run for weeks ksplice'd without a problem.

One concern people have raised is not about Ksplice's stability, but rather about its integration with existing auditing and monitoring tools:

The only "gotcha" about using ksplice is that there aren't any "ksplice-aware" auditing tools available yet.

As you might expect, this is now an area in which we're investing heavily.

Solution 2:

I heard about Ksplice, and at the time I thought it was a good idea. No downtime, no reboot. But then I looked into it a bit further and became too scared to try it.

My reasons for avoiding it are:

  • The Linux kernel is very complex already. Ksplice adds to the complexity. More complexity = more to fail.

  • It would be reckless to experiment with Ksplice on a remote server where a failure would cause long downtime and a costly repair.

  • The only benefit in my case would be a higher uptime statistic.

Solution 3:

I've been using Ksplice on my home server (where uptime isn't critical but is a nice-to-have). Haven't had any problems with it at all - occasional updates through Apt to the client, never any problems with the kernel updates themselves, and no (noticeable) instability.

The usual "YMMV" disclaimer applies, though! ;-)

Solution 4:

Ksplice is an open-source kernel extension, but bear in mind that while the software is free and available for anyone to use, it's created specifically by and for a company that does Linux patch management (also called "Ksplice"). Ksplice (the kernel mod) is really only useful if you have ksplice-usable patches for your kernel, which you're probably never going to see unless you have a support contract with Ksplice (the company).

So, while ksplice (the tool) is reasonably mature, that's really only relevant if you're considering using Ksplice (the company) for your patch management.

Solution 5:

Good question. My initial response would be something along the lines of "why do I need this?"

Most people probably don't need it. Even in a five-nines setup, "scheduled maintenance" is often a clause in the SLA that allows for this kind of downtime. If you have an HA setup, then switch to the failover, install the kernel on one box, reboot, and repeat on the other. If you can't afford even five minutes of downtime on a box, then you need a failover setup anyway.
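To put rough numbers on the five-nines point, here is a minimal sketch of the arithmetic. The function names are made up for illustration, and it assumes a 365-day year and that a reboot counts against the availability target (i.e. there is no scheduled-maintenance clause in the SLA):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year


def downtime_budget_minutes(availability: float) -> float:
    """Minutes of downtime per year permitted by an availability target.

    `availability` is a fraction, e.g. 0.99999 for "five nines".
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return MINUTES_PER_YEAR * (1.0 - availability)


def reboots_within_budget(availability: float, reboot_minutes: float) -> int:
    """Whole reboots of `reboot_minutes` each that fit in the yearly budget."""
    if reboot_minutes <= 0:
        raise ValueError("reboot_minutes must be positive")
    return int(downtime_budget_minutes(availability) // reboot_minutes)


if __name__ == "__main__":
    for target in (0.999, 0.9999, 0.99999):
        budget = downtime_budget_minutes(target)
        fits = reboots_within_budget(target, 5)
        print(f"{target}: {budget:.2f} min/year, {fits} five-minute reboot(s)")
```

Under those assumptions, five nines leaves about 5.26 minutes per year, so a single five-minute reboot on a standalone box uses up nearly the whole budget. That is why the choice really comes down to a maintenance clause, a failover pair, or rebootless patching.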

While it is a novel technology, I don't see much pragmatic use for it yet. Kernel security updates are necessary, of course, and should be patched ASAP, but how much time/effort/worry does this save you vs simply installing a new kernel and rebooting? What if something goes wrong? How much time have you then lost by re-imaging the system, assuming you are fortunate enough to have a PXE-type recovery option?

Also, as mentioned above, remotely experimenting with a technology like this could be a catastrophe if it goes wrong on multiple servers. In your testing, are you using the exact same hardware as you are in the DC? What plays nice on one machine may not play nice on another.

Just my $0.02.