I asked already something similar, but this time I will be more specific.

I need to perform, within a for loop, the Cholesky factorization of a generally large positive definite symmetrix matrix (about 1000x1000). Now, to do this, I have been giving a try to:

1) Apache Math library

2) Parallel Colt library

3) JLapack library

In any of the three above-mentioned cases, the time consumption is terribly long, if compared to MATLAB, for instance.

Therefore I am wondering if there is any highly-optimized external tool for Cholesky factorization in Java: I have been thinking, for example, of the CHOLMOD algorithm, which is actually internally called within MATLAB and other tools.

I'd really appreciate having a thorough feedback on this matter.


Here is a good summary of some BLAS libraries for Java: performance-of-java-matrix-math-libraries. You can also see a benchmark of many of many of these libraries at Java-Matrix-Benchmark.

However, most of these libraries don't appear to be tuned for solving large sparse matrices in my experience. In my case what I did was implement the solving using Eigen through JNI.

Eigen has a good discussion on its linear solvers including one on CHOLMOD.

For my case for a 8860x8860 sparse matrix using Eigen's solver through JNI was 20 times faster than parallel colt and 10 times faster than my own dense solver. More importantly is that it appears to scale as n^2 rather than n^3 and it uses much less memory than my dense solver (I was running out of memory scaling up).

There is actually a wrapper for Eigen with Java called JEigen which uses JNI. However, it did not have the sparse matrix solving implemented so it does not wrap everything.

I initially used JNA but was not happy with the overhead. Wikipedia has a good example on how to use JNI. Once you write the function declarations and compile them with javac you use javah to create the header file for C++.

For example for

//Cholesky.java
package cfd.optimisation;
//ri, ci, v : matrix row indices, column indices, and values
//y = Ax where A is a nxn matrix with nnz non-zero values
public class Cholesky {
    private static native void solve_eigenLDLTx(int[] ri, int[] ci, double[] v, double[] x, double[] y, int n, int nnz);
}

Using javah produced a header file cfd_optimization_Cholesky.h with a declaration

JNIEXPORT void JNICALL Java_cfd_optimisation_Cholesky_solve_1eigenLDLTx
        (JNIEnv *, jclass, jintArray, jintArray, jdoubleArray, jdoubleArray, jdoubleArray, jint, jint); 

And here is how I implemented the solver

JNIEXPORT void JNICALL Java_cfd_optimisation_Cholesky_solve_1eigenLDLTx(JNIEnv *env, jclass obj, jintArray arrri, jintArray arrci, jdoubleArray arrv, jdoubleArray arrx, jdoubleArray arry, jint jn, jint jnnz) {
    int n = jn;
    int *ri = (int*)env->GetPrimitiveArrayCritical(arrri, 0);
    int *ci = (int*)env->GetPrimitiveArrayCritical(arrci, 0);
    double *v = (double*)env->GetPrimitiveArrayCritical(arrv, 0);
    int nnz = jnnz;

    double *x = (double*)env->GetPrimitiveArrayCritical(arrx, 0);
    double *y = (double*)env->GetPrimitiveArrayCritical(arry, 0);

    Eigen::SparseMatrix<double> A = colt2eigen(ri, ci, v, nnz, n);
    //Eigen::MappedSparseMatrix<double> A(n, n, nnz, ri, ci, v);

    Eigen::VectorXd a(n), b(n);
    for (int i = 0; i < n; i++) a(i) = x[i];
    //a = Eigen::Map<Eigen::VectorXd>(x, n).cast<double>(); 
    Eigen::SimplicialCholesky<Eigen::SparseMatrix<double> > solver;
    solver.setMode(Eigen::SimplicialCholeskyLDLT);
    b = solver.compute(A).solve(a);
    for (int i = 0; i < n; i++) y[i] = b(i);
    env->ReleasePrimitiveArrayCritical(arrri, ri, 0);
    env->ReleasePrimitiveArrayCritical(arrci, ci, 0);
    env->ReleasePrimitiveArrayCritical(arrv, v, 0);
    env->ReleasePrimitiveArrayCritical(arrx, x, 0);
    env->ReleasePrimitiveArrayCritical(arry, y, 0);
}

The function colt2eigen creates a sparse matrix from two integer arrays containing the row and column indices and a double array of the values.

Eigen::SparseMatrix<double> colt2eigen(int *ri, int *ci, double* v, int nnz, int n) {
    std::vector<Eigen::Triplet<double>> tripletList;
    for (int i = 0; i < nnz; i++) { 
        tripletList.push_back(Eigen::Triplet<double>(ri[i], ci[i], v[i]));  
    }
    Eigen::SparseMatrix<double> m(n, n);
    m.setFromTriplets(tripletList.begin(), tripletList.end());
    return m;
}

One of the tricky parts was getting these arrays from Java and Colt. To do this I did this

//y = A x: x and y are double[] arrays and A is DoubleMatrix2D
int nnz = A.cardinality();
DoubleArrayList v = new DoubleArrayList(nnz);
IntArrayList ci = new IntArrayList(nnz);
IntArrayList ri = new IntArrayList(nnz);

A.forEachNonZero((row, column, value) -> {
    v.add(value); ci.add(column); ri.add(row); return value;}
);

Cholesky.solve_eigenLDLTx(ri.elements(), ci.elements(), v.elements(), x, y, n, nnz);

I haven't worked with any of these tools, but my suspicion is you're getting hit by the fact that Java doesn't use the native processor floating-point square root instruction in some versions/on some platforms.

See: Where can I find the source code for Java's Square Root function?

If absolute accuracy isn't a requirement, you could try switching one of the above implementations to use an approximation of square root (see Fast sqrt in Java at the expense of accuracy for suggestions), which should be somewhat faster.