Ruby hash default value behavior

Solution 1:

The other answers seem to indicate that the difference in behavior is due to Integers being immutable and Arrays being mutable. But that is misleading. The difference is not that the creator of Ruby decided to make one immutable and the other mutable. The difference is that you, the programmer decided to mutate one but not the other.

The question is not whether Arrays are mutable, the question is whether you mutate it.

You can get both the behaviors you see above, just by using Arrays. Observe:

One default Array with mutation

hsh = Hash.new([])

hsh[:one] << 'one'
hsh[:two] << 'two'

hsh[:nonexistent]
# => ['one', 'two']
# Because we mutated the default value, nonexistent keys return the changed value

hsh
# => {}
# But we never mutated the hash itself, therefore it is still empty!

One default Array without mutation

hsh = Hash.new([])

hsh[:one] += ['one']
hsh[:two] += ['two']
# This is syntactic sugar for hsh[:two] = hsh[:two] + ['two']

hsh[:nonexistant]
# => []
# We didn't mutate the default value, it is still an empty array

hsh
# => { :one => ['one'], :two => ['two'] }
# This time, we *did* mutate the hash.

A new, different Array every time with mutation

hsh = Hash.new { [] }
# This time, instead of a default *value*, we use a default *block*

hsh[:one] << 'one'
hsh[:two] << 'two'

hsh[:nonexistent]
# => []
# We *did* mutate the default value, but it was a fresh one every time.

hsh
# => {}
# But we never mutated the hash itself, therefore it is still empty!


hsh = Hash.new {|hsh, key| hsh[key] = [] }
# This time, instead of a default *value*, we use a default *block*
# And the block not only *returns* the default value, it also *assigns* it

hsh[:one] << 'one'
hsh[:two] << 'two'

hsh[:nonexistent]
# => []
# We *did* mutate the default value, but it was a fresh one every time.

hsh
# => { :one => ['one'], :two => ['two'], :nonexistent => [] }

Solution 2:

It is because Array in Ruby is mutable object, so you can change it internal state, but Fixnum isn't mutable. So when you increment value using += internally it get that (assume that i is our reference to Fixnum object):

  1. get object referenced by i
  2. get it internal value (lets name it raw_tmp)
  3. create new object that internal value is raw_tmp + 1
  4. assign reference to created object to i

So as you can see, we created new object, and i reference now to something different than at the beginning.

In the other hand, when we use Array#<< it works that way:

  1. get object referenced by arr
  2. to it's internal state append given element

So as you can see it is much simpler, but it can cause some bugs. One of them you have in your question, another one is thread race when booth are trying simultaneously append 2 or more elements. Sometimes you can end with only some of them and with thrashes in memory, when you use += on arrays too, you will get rid of both of these problems (or at least minimise impact).