Could not initialize class org.nd4j.linalg.factory.Nd4j in docker container

I am trying to import a Keras model file in a Docker container, using a program that contains the following Java code:

String weights = getFile(_modelFile).getAbsolutePath();
String modelConfiguration = getFile(_configFile).getAbsolutePath();
_model = KerasModelImport.importKerasSequentialModelAndWeights(modelConfiguration, weights);

I am using Docker with WSL2, and the Docker container runs Oracle Linux 7.

If I run the program directly in WSL2 it works fine, but if I run it in the Docker container I get the following error:

2022-01-11 09:34:36,124 [ListenerHTTP-46 ] WARN wedo.jaf.protocols.json.JSONServlet JAF_G1000 [] - wedo.jaf.JafError: Failed to invoke operation method - java.lang.NoClassDefFoundError - java.lang.NoClassDefFoundError: Could not initialize class org.nd4j.linalg.factory.Nd4j
    at wedo.jaf.services.operations.MethodOperationManager$JavaMethodOperation.execute(MethodOperationManager.java:851)
    at wedo.jaf.services.operations.WrappedOperation.execute(WrappedOperation.java:141)
    at wedo.jaf.services.sessions.SessionBase.execute(SessionBase.java:263)
    at wedo.jaf.protocols.json.JSONServlet.processOperation(JSONServlet.java:217)
    at wedo.jaf.protocols.json.JSONServlet.doPost(JSONServlet.java:161)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:713)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.server.Server.handle(Server.java:516)
    at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:383)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.nd4j.linalg.factory.Nd4j
    at org.deeplearning4j.nn.modelimport.keras.Hdf5Archive.readDataSet(Hdf5Archive.java:295)
    at org.deeplearning4j.nn.modelimport.keras.Hdf5Archive.readDataSet(Hdf5Archive.java:109)
    at org.deeplearning4j.nn.modelimport.keras.utils.KerasModelUtils.importWeights(KerasModelUtils.java:284)
    at org.deeplearning4j.nn.modelimport.keras.KerasSequentialModel.<init>(KerasSequentialModel.java:151)
    at org.deeplearning4j.nn.modelimport.keras.KerasSequentialModel.<init>(KerasSequentialModel.java:57)
    at org.deeplearning4j.nn.modelimport.keras.utils.KerasModelBuilder.buildSequential(KerasModelBuilder.java:326)
    at org.deeplearning4j.nn.modelimport.keras.KerasModelImport.importKerasSequentialModelAndWeights(KerasModelImport.java:296)
    at wedo.ml.model.ModelH5.getModel(ModelH5.java:327)
    at wedo.ml.model.ModelH5.validateH5Files(ModelH5.java:238)
    at wedo.ml.operations.H5Operations.validateH5Model(H5Operations.java:53)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at wedo.jaf.services.operations.MethodOperationManager$JavaMethodOperation.execute(MethodOperationManager.java:841)
    ... 36 common frames omitted

Nd4j is a Dl4j dependency and it is present in both executions, which makes me think the problem could be Docker itself or the OS in the Docker container.

I hope I have explained my problem well. Thanks in advance.

EDIT: Both versions of Dl4j and Nd4j are 1.0.0-M1.1


Solution 1:

This still doesn't show a cause. NoClassDefFoundErrors are usually related to clashing dependencies. You could have different versions of dl4j/nd4j on your classpath, but I doubt it. Most of the time this is a side effect of a native dependency crash.
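To get at the real cause: "Could not initialize class" means the static initializer of that class had already failed once earlier in the same JVM. The first failure is an ExceptionInInitializerError, and that one carries the actual reason. The sketch below is a hypothetical helper, not part of dl4j or nd4j: it forces initialization of a class by name and walks the cause chain. The names `InitFailureProbe`, `BrokenAtInit`, and `rootCauseOfInit` are made up for illustration, and `BrokenAtInit` only simulates a failing initializer so the example runs on its own.

```java
class InitFailureProbe {

    // Hypothetical stand-in for a class whose static initializer fails,
    // e.g. because a native library cannot be loaded.
    static class BrokenAtInit {
        static final int VALUE = compute();

        static int compute() {
            throw new IllegalStateException("simulated native load failure");
        }
    }

    /**
     * Forces static initialization of the named class.
     * Returns null if initialization succeeds, otherwise the deepest cause.
     */
    static Throwable rootCauseOfInit(String className) {
        try {
            Class.forName(className, true, InitFailureProbe.class.getClassLoader());
            return null;
        } catch (Throwable t) {
            // Catching Throwable on purpose: initializer failures are Errors.
            Throwable root = t;
            while (root.getCause() != null && root.getCause() != root) {
                root = root.getCause();
            }
            return root;
        }
    }

    public static void main(String[] args) {
        // Pass a class name as the first argument; defaults to the simulated failure.
        String target = args.length > 0 ? args[0] : BrokenAtInit.class.getName();
        Throwable root = rootCauseOfInit(target);
        if (root == null) {
            System.out.println(target + " initialized without errors");
        } else {
            System.out.println("Root cause of failed initialization of " + target + ":");
            root.printStackTrace(System.out);
        }
    }
}
```

In your application you would call it with "org.nd4j.linalg.factory.Nd4j" as early as possible after startup, before anything else touches Nd4j. On Java 8 (which your trace suggests) a later attempt only yields the bare NoClassDefFoundError without a cause, so only the first attempt is informative.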

Of note here:

I wouldn't recommend running the Keras converter (or any model import process) inline. I would recommend converting the models separately, mainly for performance reasons.

Whatever your problem is, there are usually a few likely causes:

  1. glibc version with hdf5. Keras import uses hdf5 under the covers, which means C code.

  2. Nd4j native dependency crash: this is also usually glibc related. We load nd4j into memory to create and set native arrays (which means more Java calling into C++), and that can trigger a crash depending on what OS you're running on.

  3. Another hdf5 error: this could be an invalid model or some hdf5 version error.
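To check the glibc angle from points 1 and 2, you can compare the glibc version inside the container with the one in WSL2. Below is a minimal sketch that runs `ldd --version` and parses the version from its first line. It assumes a glibc-based image where `ldd` is on the PATH. `GlibcCheck`, `parseGlibcVersion`, and `isOlderThan` are hypothetical names, and the minimum version is deliberately left as a command-line argument because I'm not stating here which glibc the native binaries require.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GlibcCheck {

    // Matches a trailing "major.minor", e.g. "ldd (GNU libc) 2.17".
    private static final Pattern VERSION = Pattern.compile("(\\d+)\\.(\\d+)\\s*$");

    /** Returns {major, minor} parsed from the end of the line, or null if absent. */
    static int[] parseGlibcVersion(String lddFirstLine) {
        Matcher m = VERSION.matcher(lddFirstLine);
        if (!m.find()) {
            return null;
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    /** True if version v is strictly older than major.minor. */
    static boolean isOlderThan(int[] v, int major, int minor) {
        return v[0] < major || (v[0] == major && v[1] < minor);
    }

    public static void main(String[] args) throws Exception {
        Process p = new ProcessBuilder("ldd", "--version")
                .redirectErrorStream(true)
                .start();
        String firstLine;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            firstLine = r.readLine();
        }
        p.waitFor();

        int[] v = firstLine == null ? null : parseGlibcVersion(firstLine);
        if (v == null) {
            System.out.println("Could not determine glibc version from: " + firstLine);
            return;
        }
        System.out.println("glibc " + v[0] + "." + v[1]);

        // Optional: pass the required major and minor version as arguments.
        if (args.length == 2) {
            int major = Integer.parseInt(args[0]);
            int minor = Integer.parseInt(args[1]);
            System.out.println(isOlderThan(v, major, minor)
                    ? "Older than required " + major + "." + minor
                    : "Meets required " + major + "." + minor);
        }
    }
}
```

Run it once in WSL2 and once in the container. If the container reports a noticeably older glibc than the environment where the import works, that points toward causes 1 or 2 rather than a broken model file.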

In any case, we would need more information before we can help you. What you've reported here isn't enough. Could you mention your Docker container OS and what version of dl4j/nd4j is bundled here?

Edit: I see it's Oracle Linux 7, which is effectively RHEL/CentOS. If you're using Docker, I would recommend a newer image.

Beyond that, if it is an nd4j-related crash (still not verifiable from your stack trace) and you are using the latest version, you might be seeing a crash due to the glibc version.

If so, there was a recent update to the nd4j classifiers, which you can find here: https://repo1.maven.org/maven2/org/nd4j/nd4j-native/1.0.0-M1.1/

Older glibc versions need to use the linux-x86_64-compat classifier as a migration path.
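To confirm which nd4j artifacts actually end up in the container, you can scan the classpath for nd4j jars and check their classifier. This is only a sketch under assumptions: it relies on the standard Maven file naming (artifactId-version-classifier.jar) and on the jars being listed in `java.class.path`, which may not hold for a fat jar or a webapp class loader. `Nd4jClasspathScan` and its methods are hypothetical names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class Nd4jClasspathScan {

    /** Returns the file names of all nd4j jars found in the given classpath string. */
    static List<String> nd4jJars(String classpath, String separator) {
        List<String> hits = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            int slash = Math.max(entry.lastIndexOf('/'), entry.lastIndexOf('\\'));
            String name = entry.substring(slash + 1);
            if (name.startsWith("nd4j-") && name.endsWith(".jar")) {
                hits.add(name);
            }
        }
        return hits;
    }

    /** True if an nd4j-native jar with the linux-x86_64-compat classifier is present. */
    static boolean hasCompatNative(List<String> jars) {
        for (String jar : jars) {
            if (jar.startsWith("nd4j-native-") && jar.endsWith("-linux-x86_64-compat.jar")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> jars = nd4jJars(System.getProperty("java.class.path"), File.pathSeparator);
        for (String jar : jars) {
            System.out.println(jar);
        }
        System.out.println("linux-x86_64-compat present: " + hasCompatNative(jars));
    }
}
```

The same listing also shows whether more than one nd4j version is on the classpath, which covers the clashing-dependencies possibility.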