Chef recipe order of execution redux

Given the following recipe:

ruby_block "block1" do
    block do
        puts "in block1"
    end
    action :create
end


remote_file "/tmp/foo" do
    puts "in remote_file"
    source "https://yahoo.com"
end

I'd expect the ruby_block to run first (because it comes first) and then the remote_file.

I'd like to use the ruby_block to determine the url for the remote_file to download from, so the order is important.

If it wasn't for my puts() statements I'd assume that these are getting run in the expected order, because the log says:

==> default: [2014-06-12T17:49:19+00:00] INFO: ruby_block[block1] called
==> default: [2014-06-12T17:49:19+00:00] INFO: remote_file[/tmp/foo] created file /tmp/foo
==> default: [2014-06-12T17:49:20+00:00] INFO: remote_file[/tmp/foo] updated file contents /tmp/foo

But above that, my puts() statements come out as follows:

==> default: in remote_file
==> default: in block1

If you think that the resources are being run in the expected order, consider this recipe:

ruby_block "block1" do
    block do
        node.default['test'] = {}
        node.default['test']['foo'] = 'https://google.com'
        puts "in block1"
    end
    action :create
end


remote_file "/tmp/foo" do
    puts "in remote_file"
    source node.default['test']['foo']
end

This one fails as follows:

==> default: [2014-06-12T17:55:38+00:00] ERROR: {} is not a valid `source` parameter for remote_file. `source` must be an absolute URI or an array of URIs.
==> default: [2014-06-12T17:55:38+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)

The string "in block1" doesn't appear in the output, so the ruby_block was never run.

So the question is, how can I force the ruby_block to run, and run first?


Good question - both of your examples work the way that I would expect, but it isn't immediately obvious why.

As StephenKing wrote in his response, the first thing to understand is that recipes are compiled (to produce a set of resources), and then resources are converged (to effect changes to your system). These two phases are often interleaved - some of your resources might be converged before Chef has finished compiling all of your recipes. Erik Hollensbe covers this in some detail in his post "The Chef Resource Run Queue".


Here's your first example again:

ruby_block "block1" do
    block do
        puts "in block1"
    end
    action :create
end

remote_file "/tmp/foo" do
    puts "in remote_file"
    source "https://yahoo.com"
end

These are the steps that Chef will go through in processing that example.

  1. First, the ruby_block declaration is compiled, which results in a resource called ruby_block[block1] being added to the resource collection. The contents of the block (the first puts statement) don't run yet - they are saved to be run when this resource is converged.
  2. Next, the remote_file declaration is compiled. This results in a resource called remote_file[/tmp/foo] being added to the resource collection, with a source of "https://yahoo.com". In the process of compiling this declaration, the second puts statement will be executed - this has the side effect of printing "in remote_file", but it doesn't affect the resource that is put into the resource collection.
  3. With nothing else to compile, Chef starts converging the resources in the resource collection. The first one is ruby_block[block1], and Chef runs the ruby code in the block - printing "in block1". After it finishes running the block, it logs a message to say that the resource was called.
  4. Finally, Chef converges remote_file[/tmp/foo]. Again, it logs a message (or two) associated with that activity.

That should produce the following sequence of output:

  1. Nothing printed when the ruby_block is compiled.
  2. "in remote_file" will be printed while the remote_file is compiled.
  3. "in block1" will be printed while the ruby_block is converged.
  4. A Chef log message will be printed after the ruby_block is converged.
  5. Other Chef log messages will be printed during/after the remote_file is converged.
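
To make that sequence concrete, here is a toy model in plain Ruby. It is only a sketch of the compile-then-converge idea, not how Chef is implemented: ToyResource, ToyRun, declare and first_example_events are names I made up for illustration, and the INFO line is a simplified stand-in for Chef's real log messages.

```ruby
# Toy model of the compile and converge phases. Not Chef's real code.
class ToyResource
  attr_reader :label

  def initialize(label)
    @label = label
    @saved_block = nil
  end

  # Like ruby_block's `block` parameter: the code is stored, not called.
  def block(&code)
    @saved_block = code
  end

  # Like remote_file's `source` parameter: a plain value, recorded as given.
  def source(url)
    @source = url
  end

  # Converging runs the stored block, if there is one.
  def converge
    @saved_block.call if @saved_block
  end
end

class ToyRun
  def initialize(events)
    @events = events
    @collection = []
  end

  # Compile: evaluate the declaration body now and queue the resource.
  def declare(label, &body)
    resource = ToyResource.new(label)
    resource.instance_eval(&body)
    @collection << resource
  end

  # Converge: act on the queued resources in declaration order.
  def converge
    @collection.each do |resource|
      resource.converge
      @events << "INFO: #{resource.label} converged"
    end
  end
end

def first_example_events
  events = []
  run = ToyRun.new(events)

  run.declare("ruby_block[block1]") do
    block { events << "in block1" } # saved for later
  end
  run.declare("remote_file[/tmp/foo]") do
    events << "in remote_file"      # runs during compile
    source "https://yahoo.com"
  end

  run.converge
  events
end

puts first_example_events
```

The recorded events come out in the same order as the list above: the remote_file's message first, then the ruby_block's message, then one converge message per resource.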

Onto your second example:

ruby_block "block1" do
    block do
        node.default['test'] = {}
        node.default['test']['foo'] = 'https://google.com'
        puts "in block1"
    end
    action :create
end

remote_file "/tmp/foo" do
    puts "in remote_file"
    source node.default['test']['foo']
end

As with the first example, we don't expect anything to be printed while the ruby_block is compiled - the whole "block" is saved, and its contents won't run until that resource is converged.

The first output we see is "in remote_file", as the puts statement is executed when Chef compiles the remote_file resource. On the next line, we set the source parameter to the value of node.default['test']['foo'], which is apparently {}. That's not a valid value for source, so the Chef run terminates at that point - before the code in the ruby_block ever runs.

Therefore, the expected output of this recipe is:

  1. No output while compiling the ruby_block
  2. "in remote_file" printed while compiling the remote_file
  3. An error due to the invalid source parameter

Hopefully that helps you to understand the behaviour you're seeing, but we still have a problem to solve.

Although you asked "how can I force the ruby_block to run first?", your comment to StephenKing suggests this isn't really what you want - if you really wanted that block to run first, you could put it directly into your recipe code. Alternatively, you could use the .run_action() method to force the resource to be converged as soon as it is compiled - but you say that there are still more resources that need to converge before the ruby_block can be useful.
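
For illustration only, here is a plain Ruby toy of what forcing a resource at compile time does to the ordering. ForcedDemoResource and run_action_events are hypothetical names, and this run_action simply calls the saved block; it mimics the idea behind Chef's .run_action() rather than its actual behaviour.

```ruby
# Toy resource: holds a block of work until someone asks for it to run.
class ForcedDemoResource
  def initialize(&code)
    @code = code
  end

  # Stand-in for converging this one resource immediately.
  def run_action(_action)
    @code.call
  end
end

def run_action_events
  events = []
  collection = []

  # Compile phase starts here.
  # Declared normally: queued and left for the converge phase.
  collection << ForcedDemoResource.new { events << "queued block ran" }

  # Declared and forced: its work happens while we are still compiling.
  ForcedDemoResource.new { events << "forced block ran" }.run_action(:run)
  events << "compile finished"

  # Converge phase: only the queued resource is left to run.
  collection.each { |resource| resource.run_action(:run) }
  events
end

puts run_action_events
```

The forced block runs before compilation finishes, even though it was declared second. That is exactly why it doesn't help here: anything it depends on from earlier resources would not have been converged yet.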

As we've seen above, resources aren't "run", they're first "compiled" and then "converged". With that in mind, what you need is for the remote_file resource to use some data that is not known when it is compiled, but will be known when it is converged. In other words, something like the "block" parameter in the ruby_block - a piece of code that doesn't run until later. Something like this:

remote_file "/tmp/foo" do
    puts "in remote_file"
    # this syntax isn't valid...
    source do 
        node.default['test']['foo']
    end
end

Fortunately, such a thing does exist - it's called Lazy Attribute Evaluation. Using that feature, your second example would look like this:

ruby_block "block1" do
    block do
        node.default['test'] = {}
        node.default['test']['foo'] = 'https://google.com'
        puts "in block1"
    end
    action :create
end

remote_file "/tmp/foo" do
    puts "in remote_file"
    source lazy { node['test']['foo'] }
end

And the expected output of this recipe?

  1. No output while compiling the ruby_block
  2. "in remote_file" printed while compiling the remote_file
  3. "in block1" printed while converging the ruby_block
  4. Chef log message showing the ruby_block was converged
  5. Chef log messages showing the remote_file was converged
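
If it helps to see why the lazy version behaves differently, here is a small plain Ruby sketch of the idea. ToyLazy, ToyRemoteFile, toy_lazy and lazy_example are names I invented, and a bare Hash stands in for the node attributes, so the eager read yields nil here rather than the {} that Chef reported. It is a model of delayed evaluation in general, not of Chef's internals.

```ruby
# Toy stand-in for a delayed value: wraps a block to be called later.
class ToyLazy
  def initialize(&code)
    @code = code
  end

  def resolve
    @code.call
  end
end

# Hypothetical helper playing the role of `lazy { ... }` in a recipe.
def toy_lazy(&code)
  ToyLazy.new(&code)
end

class ToyRemoteFile
  def initialize(source)
    @source = source # either a plain value or a ToyLazy
  end

  # At converge time, a delayed value is finally evaluated.
  def converge
    @source.is_a?(ToyLazy) ? @source.resolve : @source
  end
end

def lazy_example
  node = {} # stand-in for node attributes, empty at compile time

  # Compile phase: declarations are evaluated, nothing is converged.
  set_url = -> { node["foo"] = "https://google.com" }  # like the ruby_block
  eager = ToyRemoteFile.new(node["foo"])               # value read now
  lazy  = ToyRemoteFile.new(toy_lazy { node["foo"] })  # value read later

  # Converge phase: the block runs first, then the files resolve sources.
  set_url.call
  { eager: eager.converge, lazy: lazy.converge }
end

p lazy_example
```

The eager resource captured whatever the attribute held at compile time, before the block had set it, while the lazy one only looks the value up once the earlier resource has been converged.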