What is the best way to create a sparse array in C++?

I am working on a project that requires the manipulation of enormous matrices, specifically pyramidal summation for a copula calculation.

In short, I need to keep track of a relatively small number of values (usually a value of 1, and in rare cases more than 1) in a sea of zeros in the matrix (multidimensional array).

A sparse array allows the user to store a small number of values, and assume all undefined records to be a preset value. Since it is not physically possibly to store all values in memory, I need to store only the few non-zero elements. This could be several million entries.

Speed is a huge priority, and I would also like to dynamically choose the number of variables in the class at runtime.

I currently work on a system that uses a binary search tree (b-tree) to store entries. Does anyone know of a better system?

For C++, a map works well. Several million objects won't be a problem. 10 million items took about 4.4 seconds and about 57 meg on my computer.

My test application is as follows:

#include <stdio.h>
#include <stdlib.h>
#include <map>

class triple {
public:
    int x;
    int y;
    int z;
    bool operator<(const triple &other) const {
        if (x < other.x) return true;
        if (other.x < x) return false;
        if (y < other.y) return true;
        if (other.y < y) return false;
        return z < other.z;
    }
};

int main(int, char**)
{
    std::map<triple,int> data;
    triple point;
    int i;

    for (i = 0; i < 10000000; ++i) {
        point.x = rand();
        point.y = rand();
        point.z = rand();
        //printf("%d %d %d %d\n", i, point.x, point.y, point.z);
        data[point] = i;
    }
    return 0;
}

Now to dynamically choose the number of variables, the easiest solution is to represent index as a string, and then use string as a key for the map. For instance, an item located at [23][55] can be represented via "23,55" string. We can also extend this solution for higher dimensions; such as for three dimensions an arbitrary index will look like "34,45,56". A simple implementation of this technique is as follows:

std::map data<string,int> data;
char ix[100];

sprintf(ix, "%d,%d", x, y); // 2 vars
data[ix] = i;

sprintf(ix, "%d,%d,%d", x, y, z); // 3 vars
data[ix] = i;

The accepted answer recommends using strings to represent multi-dimensional indices.

However, constructing strings is needlessly wasteful for this. If the size isn’t known at compile time (and thus std::tuple doesn’t work), std::vector works well as an index, both with hash maps and ordered trees. For std::map, this is almost trivial:

#include <vector>
#include <map>

using index_type = std::vector<int>;

template <typename T>
using sparse_array = std::map<index_type, T>;

For std::unordered_map (or similar hash table-based dictionaries) it’s slightly more work, since std::vector does not specialise std::hash:

#include <vector>
#include <unordered_map>
#include <numeric>

using index_type = std::vector<int>;

struct index_hash {
    std::size_t operator()(index_type const& i) const noexcept {
        // Like boost::hash_combine; there might be some caveats, see
        // <https://stackoverflow.com/a/50978188/1968>
        auto const hash_combine = [](auto seed, auto x) {
            return std::hash<int>()(x) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        };
        return std::accumulate(i.begin() + 1, i.end(), i[0], hash_combine);
    }
};

template <typename T>
using sparse_array = std::unordered_map<index_type, T, index_hash>;

Either way, the usage is the same:

int main() {
    using i = index_type;

    auto x = sparse_array<int>();
    x[i{1, 2, 3}] = 42;
    x[i{4, 3, 2}] = 23;

    std::cout << x[i{1, 2, 3}] + x[i{4, 3, 2}] << '\n'; // 65
}

Boost has a templated implementation of BLAS called uBLAS that contains a sparse matrix.

https://www.boost.org/doc/libs/release/libs/numeric/ublas/doc/index.htm

Eigen is a C++ linear algebra library that has an implementation of a sparse matrix. It even supports matrix operations and solvers (LU factorization etc) that are optimized for sparse matrices.

Why does Qt use its own make tool, qmake?

How to modify key in a dictionary in C#

jquery: get value of custom attribute

Which of sprintf/snprintf is more secure?

Create a dataframe from another df which selected only those columns having min value of each row excluding 0 as min and exclude operation on few rows

Changing class on radio button change

Mirror at most 1 value from source, then temporarily drop values until another observable emits

What format does the wxpython Image object expect for data when instantiating it using the wx.Image(width, height, data) constructor

Tiller: dial tcp 127.0.0.1:80: connect: connection refused

After upgrading to Catalina `csrutil disable` doesn't seem to work anymore [duplicate]

Once a link to a photo has been created in iCloud, is there a way to invalidate the link & remove the photo?

Cannot mount HFS partition - imaged using dd