Performance of Arrays and Hashes in Ruby

Solution 1:

Hashes are much faster for lookups:

require 'benchmark'
Document = Struct.new(:id,:a,:b,:c)
documents_a = []
documents_h = {}
1.upto(10_000) do |n|
  d = Document.new(n)
  documents_a << d
  documents_h[d.id] = d
end
searchlist = Array.new(1000){ rand(10_000)+1 }

Benchmark.bm(10) do |x|
  x.report('array'){searchlist.each{|el| documents_a.any?{|d| d.id == el}} }
  x.report('hash'){searchlist.each{|el| documents_h.has_key?(el)} }
end

#                user     system      total        real
#array       2.240000   0.020000   2.260000 (  2.370452)
#hash        0.000000   0.000000   0.000000 (  0.000695)

Solution 2:

When using unique values, you can use the Ruby Set which has been previously mentioned. Here are benchmark results. It's slightly slower than the hash though.

                 user     system      total        real
array        0.460000   0.000000   0.460000 (  0.460666)
hash         0.000000   0.000000   0.000000 (  0.000219)
set          0.000000   0.000000   0.000000 (  0.000273)

I simply added to @steenslag's code which can be found here https://gist.github.com/rsiddle/a87df54191b6b9dfe7c9.

I used ruby 2.1.1p76 for this test.

Solution 3:

Ruby has a set class in its standard library, have you considering keeping an (additional) set of IDs only?

http://stdlib.rubyonrails.org/libdoc/set/rdoc/index.html

To quote the docs: "This is a hybrid of Array’s intuitive inter-operation facilities and Hash’s fast lookup".