es6 Map and Set complexity, v8 implementation
Is it a fair assumption that in v8 implementation retrieval / lookup is O(1)?
(I know that the standard doesn't guarantee that)
Is it a fair assumption that in v8 implementation retrieval / lookup is O(1)?
Yes. V8 uses a variant of hash tables that generally have O(1)
complexity for these operations.
For details, you might want to have a look at https://codereview.chromium.org/220293002/ where OrderedHashTable
is implemented based on https://wiki.mozilla.org/User:Jorend/Deterministic_hash_tables.
For people don't want to dig into the rabbit hole too deep:
1: We can assume good hash table implementations have practically O(1) time complexity.
2: Here is a blog posted by V8 team explains how some memory optimization was done on its hashtable implementation for Map
, Set
, WeakSet
, and WeakMap
: Optimizing hash tables: hiding the hash code
Based on 1 and 2: V8's Set and Map's get
& set
& add
& has
time complexity practically is O(1).
let map = new Map();
let obj = {};
const benchMarkMapSet = size => {
console.time("benchMarkMapSet");
for (let i = 0; i < size; i++) {
map.set(i, i);
}
console.timeEnd("benchMarkMapSet");
};
const benchMarkMapGet = size => {
console.time("benchMarkMapGet");
for (let i = 0; i < size; i++) {
let x = map.get(i);
}
console.timeEnd("benchMarkMapGet");
};
const benchMarkObjSet = size => {
console.time("benchMarkObjSet");
for (let i = 0; i < size; i++) {
obj[i] = i;
}
console.timeEnd("benchMarkObjSet");
};
const benchMarkObjGet = size => {
console.time("benchMarkObjGet");
for (let i = 0; i < size; i++) {
let x = obj[i];
}
console.timeEnd("benchMarkObjGet");
};
let size = 2e6;
benchMarkMapSet(size);
benchMarkObjSet(size);
benchMarkMapGet(size);
benchMarkObjGet(size);
benchMarkMapSet: 382.935ms
benchMarkObjSet: 76.077ms
benchMarkMapGet: 125.478ms
benchMarkObjGet: 2.764ms
benchMarkMapSet: 373.172ms
benchMarkObjSet: 77.192ms
benchMarkMapGet: 123.035ms
benchMarkObjGet: 2.638ms
The original question is already answered, but O(1) doesn't tell a lot about how efficient the implementation is.
First of all, we need to understand what variation of hash table is used for Maps. "Classical" hash tables won't do as they do not provide any iteration order guarantees, while ES6 spec requires insertion to be in the iteration order. So, Maps in V8 are built on top of so-called deterministic hash tables. The idea is similar to the classical algorithm, but there is another layer of indirection for buckets and all entries are inserted and stored in a contiguous array of a fixed size. Deterministic hash tables algorithm indeed guarantees O(1) time complexity for basic operations, like set
or get
.
Next, we need to know what's the initial size of the hash table, the load factor, and how (and when) it grows/shrinks. The short answer is: initial size is 4, load factor is 2, the table (i.e. the Map) grows x2 as soon as its capacity is reached, and shrinks as soon as there is more than 1/2 of deleted entries.
Let’s consider the worst-case when the table has N out of N entries (it's full), all entries belong to a single bucket, and the required entry is located at the tail. In such a scenario, a lookup requires N moves through the chain elements.
On the other hand, in the best possible scenario when the table is full, but each bucket has 2 entries, a lookup will require up to 2 moves.
It is a well-known fact that while individual operations in hash tables are “cheap”, rehashing is not. Rehashing has O(N) time complexity and requires allocation of the new hash table on the heap. Moreover, rehashing is performed as a part of insertion or deletion operations, when necessary. So, for instance, a map.set()
call could be more expensive than you would expect. Luckily, rehashing is a relatively infrequent operation.
Apart from that, such details as memory layout or hash function also matter, but I'm not going to go into these details here. If you're curious how V8 Maps work under the hood, you may find more details here. I was interested in the topic some time ago and tried to share my findings in a readable manner.