Performance of Arrays and Hashes in Ruby
Solution 1:
Hashes are much faster for lookups:
require 'benchmark'
Document = Struct.new(:id,:a,:b,:c)
documents_a = []
documents_h = {}
1.upto(10_000) do |n|
d = Document.new(n)
documents_a << d
documents_h[d.id] = d
end
searchlist = Array.new(1000){ rand(10_000)+1 }
Benchmark.bm(10) do |x|
x.report('array'){searchlist.each{|el| documents_a.any?{|d| d.id == el}} }
x.report('hash'){searchlist.each{|el| documents_h.has_key?(el)} }
end
# user system total real
#array 2.240000 0.020000 2.260000 ( 2.370452)
#hash 0.000000 0.000000 0.000000 ( 0.000695)
Solution 2:
When using unique values, you can use the Ruby Set which has been previously mentioned. Here are benchmark results. It's slightly slower than the hash though.
user system total real
array 0.460000 0.000000 0.460000 ( 0.460666)
hash 0.000000 0.000000 0.000000 ( 0.000219)
set 0.000000 0.000000 0.000000 ( 0.000273)
I simply added to @steenslag's code which can be found here https://gist.github.com/rsiddle/a87df54191b6b9dfe7c9.
I used ruby 2.1.1p76 for this test.
Solution 3:
Ruby has a set class in its standard library, have you considering keeping an (additional) set of IDs only?
http://stdlib.rubyonrails.org/libdoc/set/rdoc/index.html
To quote the docs: "This is a hybrid of Array’s intuitive inter-operation facilities and Hash’s fast lookup".