How can I deploy rolling OS upgrades & reboots with Puppet or MCollective?

I'm looking for the best way to perform regular rolling upgrades for my infrastructure.

Typically, this involves doing this on each host, one at a time:

sudo yum update -y && sudo reboot

But, I'm hitting limits of that being a scalable.

I want to only reboot one node at a time within each of my roles, so that, say, I don't take down all of my load balancers, or DB cluster members, at the same time.

Ideally, I'd wanna do something like:

for role in $(< roles_list.txt) ; do
    mco package update_all_and_reboot \
        --batch 1 --batch-sleep 90 \
        -C $role -F environment=test
done

But, that doesn't quite seem to exist. I'm not sure if using the "shell" agent is the best approach, either?

mco shell run 'yum update -y && reboot' \
    --batch 1 --batch-sleep 90

Am I just looking at the wrong sort of tool for this job, though? Is there something better for managing these sort of rolling reboots, but that I can somehow link up with my Puppet-assigned roles, so that I can be comfortable that I'm not taking down anything important all at once, but that I can still do some parallel updates & reboots?


Configuration

Deploy

cd /usr/share/ruby/vendor_ruby/mcollective/application
wget https://raw.githubusercontent.com/arnobroekhof/mcollective-plugin-power/master/application/power.rb

and

cd /usr/libexec/mcollective/mcollective/agent
wget https://raw.githubusercontent.com/arnobroekhof/mcollective-plugin-power/master/agent/power.ddl
wget https://raw.githubusercontent.com/arnobroekhof/mcollective-plugin-power/master/agent/power.rb

on both hosts, i.e. test-server1 and test-server2.

Services

Restart mcollective on both services:

[vagrant@test-server1 ~]# sudo service mcollective restart

and

[vagrant@test-server2 ~]# sudo service mcollective restart

Commands

Run the following commands on the mcollective server node:

The host test-server2 is listening:

[vagrant@test-server1 ~]$ mco ping
test-server2                             time=25.32 ms
test-server1                             time=62.51 ms


---- ping statistics ----
2 replies max: 62.51 min: 25.32 avg: 43.91

Reboot the test-server2:

[vagrant@test-server1 ~]$ mco power reboot -I test-server2

 * [ ============================================================> ] 1 / 1

test-server2                             Reboot initiated

Finished processing 1 / 1 hosts in 123.94 ms

The test-server2 is rebooting:

[vagrant@test-server1 ~]$ mco ping
test-server1                             time=13.87 ms


---- ping statistics ----
1 replies max: 13.87 min: 13.87 avg: 13.87

and it has been rebooted:

[vagrant@test-server1 ~]$ mco ping
test-server1                             time=22.88 ms
test-server2                             time=54.27 ms


---- ping statistics ----
2 replies max: 54.27 min: 22.88 avg: 38.57

Note that it is possible to shutdown a host as well:

[vagrant@test-server1 ~]$ mco power shutdown -I test-server2

 * [ ============================================================> ] 1 / 1

test-server2                             Shutdown initiated

Finished processing 1 / 1 hosts in 213.18 ms

Original code

/usr/libexec/mcollective/mcollective/agent/power.rb

module MCollective
  module Agent
    class Power<RPC::Agent

      action "shutdown" do
  out = ""
  run("/sbin/shutdown -h now", :stdout => out, :chomp => true )
  reply[:output] = "Shutdown initiated"
      end

      action "reboot" do
  out = ""
  run("/sbin/shutdown -r now", :stdout => out, :chomp => true )
  reply[:output] = "Reboot initiated"
      end

    end
  end
end

# vi:tabstop=2:expandtab:ai:filetype=ruby

/usr/libexec/mcollective/mcollective/agent/power.ddl

metadata    :name        => "power",
            :description => "An agent that can shutdown or reboot them system",
            :author      => "A.Broekhof",
            :license     => "Apache 2",
            :version     => "2.1",
            :url         => "http://github.com/arnobroekhof/mcollective-plugins/wiki",
            :timeout     => 5

action "reboot", :description => "Reboots the system" do
    display :always

    output :output,
           :description => "Reboot the system",
           :display_as => "Power"
end

action "shutdown", :description => "Shutdown the system" do
    display :always

    output :output,
           :description => "Shutdown the system",
           :display_as  => "Power"
end

/usr/share/ruby/vendor_ruby/mcollective/application/power.rb

class MCollective::Application::Power<MCollective::Application
  description "Linux Power broker"
  usage "power [reboot|shutdown]"

  def post_option_parser(configuration)
    if ARGV.size == 1
      configuration[:command] = ARGV.shift
    end
  end

  def validate_configuration(configuration)
    raise "Command should be one of reboot or shutdown" unless configuration[:command] =~ /^shutdown|reboot$/

  end

  def main
    mc = rpcclient("power")

    mc.discover :verbose => true
    mc.send(configuration[:command]).each do |node|
      case configuration[:command]
      when "reboot"
        printf("%-40s %s\n", node[:sender], node[:data][:output])
      when "shutdown"
        printf("%-40s %s\n", node[:sender], node[:data][:output])
      end 
    end

    printrpcstats

    mc.disconnect

  end

end

# vi:tabstop=2:expandtab:ai

Modified code

/usr/libexec/mcollective/mcollective/agent/power.ddl

metadata    :name        => "power",
            :description => "An agent that can shutdown or reboot them system",
            :author      => "A.Broekhof",
            :license     => "Apache 2",
            :version     => "2.1",
            :url         => "http://github.com/arnobroekhof/mcollective-plugins/wiki",
            :timeout     => 5

action "update-and-reboot", :description => "Reboots the system" do
    display :always

    output :output,
           :description => "Reboot the system",
           :display_as => "Power"
end

/usr/libexec/mcollective/mcollective/agent/power.rb

module MCollective
  module Agent
    class Power<RPC::Agent    
      action "update-and-reboot" do
        out = ""
        run("yum update -y && /sbin/shutdown -r now", :stdout => out, :chomp => true )
        reply[:output] = "Reboot initiated"
      end
    end
  end
end

# vi:tabstop=2:expandtab:ai:filetype=ruby

Command

[vagrant@test-server1 ~]$ mco power update-and-reboot -I test-server2

 * [ ============================================================> ] 1 / 1


Finished processing 1 / 1 hosts in 1001.22 ms