Why is allocating an array of Union{T, Missing} an order of magnitude slower than an array of T?

Allocating an array of Union{T, Missing} is very expensive in Julia. Is there any workaround it?

julia> @time Vector{Union{Missing, Int}}(undef, 10^7);
  0.031052 seconds (2 allocations: 85.831 MiB)

julia> @time Vector{Union{Int}}(undef, 10^7);
  0.000027 seconds (3 allocations: 76.294 MiB)

Solution 1:

Because if you make a Union of Missing with a bitstype like Int then Julia sets the flag that such a vector initially stores missing in each of its entries:

julia> Vector{Union{Missing, Int}}(undef, 10^7)
10000000-element Vector{Union{Missing, Int64}}:
 missing
 missing
 ⋮
 missing
 missing

If you used non-bitstype then such a flag for each entry does not have to be set as you can see here:

julia> Vector{Union{Missing, String}}(undef, 10^7)
10000000-element Vector{Union{Missing, String}}:
 #undef
 #undef
   ⋮
 #undef
 #undef

and in consequence the performance is the same:

julia> @btime Vector{Union{String}}(undef, 10^7);
  11.672 ms (3 allocations: 76.29 MiB)

julia> @btime Vector{Union{Missing, String}}(undef, 10^7);
  11.480 ms (2 allocations: 76.29 MiB)

Solution 2:

The difference is that union arrays get zero-initialized. You can see the code that decides this here:

https://github.com/JuliaLang/julia/blob/3f024fd0ab9e68b37d29fee6f2a9ab19819102c5/src/array.c#L191

This ends up as a call to memset:

https://github.com/JuliaLang/julia/blob/3f024fd0ab9e68b37d29fee6f2a9ab19819102c5/src/array.c#L144-L145

So as a check, we can compare zeros vs allocating the union array:

julia> @time Vector{Union{Missing, Int}}(undef, 10^7);
  0.020609 seconds (2 allocations: 85.831 MiB)

julia> @time zeros(Int, 10^7);
  0.018375 seconds (2 allocations: 76.294 MiB)

Quite comparable timings.

However, I don't think this performance difference should end up mattering in your application unless you have structured it in a quite strange way. There is very little work you can do with that array until the allocation time becomes insignificant. For example, just setting the values of the uninitialized array makes the timing vs the union array quite similar:

julia> function f()
           a = Vector{Int}(undef, 10^7)
           for i in eachindex(a)
               a[i] = 1
           end
           a
       end;

julia> function f_union()
           a = Vector{Union{Missing, Int}}(undef, 10^7)
           for i in eachindex(a)
               a[i] = 1
           end
           a
       end;

julia> @time f();
  0.015566 seconds (2 allocations: 76.294 MiB)

julia> @time f_union();
  0.026414 seconds (2 allocations: 85.831 MiB)