Why are Ruby blocks constructed like this?

There is an obvious choice here, and it’s not this one

I took up a position with a new company that involves the use of SaltStack. Since I’d never used it before, I decided to try my hand at setting up a proof of concept, and for that I wanted to use multiple machines. So I decided to use Vagrant, another tool I’m not familiar with, to declaratively spin up a few VMs.

This is what I first tried:

machines = {
  master:   { box: "bento/ubuntu-22.04", master: true  },
  superior: { box: "bento/ubuntu-22.04", master: false },
  michigan: { box: "bento/ubuntu-22.04", master: false },
  ontario:  { box: "bento/ubuntu-22.04", master: false },
  erie:     { box: "bento/ubuntu-22.04", master: false },
  huron:    { box: "bento/ubuntu-22.04", master: false }
}

Vagrant.configure("2") do |config|
  for hostname, options in machines
    config.vm.define hostname do |machine|
      machine.vm.box = options[:box]
      machine.vm.hostname = hostname
      machine.vm.network "private_network", type: "dhcp"
      machine.vm.provision :salt, install_master: options[:master]
    end
  end
end

This magically worked, or so I thought, until many failed attempts at automatically settings up DNS resolution for the machine’s friendly names. I noticed that the hostnames of my machines were sometimes all huron or all michigan or something other than master, even though the name of the machine on the Vagrant side was correct. Then it dawned on me: it’s using whatever the last value of options in the loop was!

This will become clearer with the simple examples I wrote to verify that this was indeed what was happening. We’ll start with a naive example that actually works as expected:

def simply_yield
  yield
end

simply_yield do
  for i in 1..10
    simply_yield do
      puts i
    end
  end
end

This prints the numbers 1 through 10, as expected, and not 10 ten times. This makes sense, following the line of execution, assuming blocks are immediately executed after the method to which they were passed, which they are in this case. So it must be that the blocks passed to config.vm.define in Vagrant are not!

That makes sense, as provisioners can be executed separately from bringing the machines up. It stands to reason that the block themselves, as closures, are stored in state somewhere, so they can then be invoked. Let’s write an example that does exactly that:

$functions = []

def store_for_later(&block)
  $functions << block
end

for i in 1..10
  store_for_later do
    puts i
  end
end

$functions.each do |f|
  f.call
end

If you run this, you’ll see that it prints 10 ten times. We’ve found the problem! In other words, the i in the block / closure that was stored for future use is a reference to the loop variable being updated. This is weird, because in theory the loop variable should be scoped to the loop block and die when the loop ends.

Adding a puts i after the loop to verify just in case shows that it is still accessible. Maybe Ruby suffers from the same problem as Python 2, where variables in list comprehensions and loops leak into the outer scope.

Curious to see if Python 3 would be better behaved, I wrote the following bit of code:

functions = [(lambda: i) for i in range(10)]
# print(i) here yields a NameError, as expected
print([f() for f in functions])

This prints [9, 9, 9, 9, 9, 9, 9, 9, 9, 9], which is not what I expected. Other ways of defining the functions, such as using def, resulted in the same problem. Python suffers from the same problem!

So I guess we should just use sane languages where state isn’t a problem. Like Haskell! The following just works.

functions :: [Int -> Int]
functions = [const i | i <- [1..10]]
-- can also be defined as map const [1..10]

main :: IO ()
main = mapM_ (print . ($ 0)) functions