HDBSCAN: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

I am trying to initialize HDBSCAN for clustering in JupyterLab. I use Python 3.7.6.

import numpy as np
import pandas as pd

from sklearn.datasets import load_digits

from sklearn.manifold import TSNE
import hdbscan

The same error (see the title) always appears, and so far I do not know exactly where it comes from.

I have looked through several posts for solutions, but none of them has helped so far.

For example, I have:

  1. uninstalled and reinstalled numpy.
  2. installed numpy >= 1.20.0
  3. tried lines like pip install package --no-cache-dir --no-binary :all:
  4. tried the following combination of package versions: hdbscan=0.8.19, matplotlib=3.2.2, numpy=1.15.4, pandas=0.23.4, scikit-learn=0.20.1, scipy=1.1.0, tensorflow=1.13.1.

I have also tried installing packages like tensorboard, but it did not help. Everything is installed via the terminal with pip.

I am starting to think that the problem might be deeper, but maybe I overlooked something important.

Can somebody help me find the bug, please?

Best regards

Philipp


I guess you've probably seen this very long HDBSCAN GitHub issue where there still doesn't seem to be a clear solution. Unfortunately it seems to affect different systems in weird ways, and there is a huge list of possible solutions and things that worked for other people. (Personally, just reinstalling numpy worked for me when I had a similar problem last week.)

The fact that you can try so many things and still have it not work seems suspicious. Maybe something else about your Python install or the way you're trying them is affecting the solutions? For instance, is JupyterLab definitely using the same Python environment that you're trying these solutions on? (You could test this by uninstalling HDBSCAN and seeing if the error changes instead to a ModuleNotFoundError, i.e. "No module named 'hdbscan'".)
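A less destructive way to check the environment is to ask the notebook kernel where it is running from. The snippet below is only a minimal sketch: `locate` is a hypothetical helper name, and it relies on the standard library's `importlib.util.find_spec`, which looks a package up without importing it, so it should work even while `import hdbscan` raises. Run it once in a JupyterLab cell and once with the `python` you use in the terminal, then compare the output.

```python
import importlib.util
import sys


def locate(package_name):
    """Return the file a package would be imported from, or None if absent."""
    # find_spec only searches for a top-level name; it does not execute the
    # package, so a broken compiled extension cannot raise here.
    spec = importlib.util.find_spec(package_name)
    return None if spec is None else spec.origin


# The interpreter that is actually running this code (kernel or terminal).
print("interpreter:", sys.executable)

# Where numpy and hdbscan would be loaded from in this interpreter.
for name in ("numpy", "hdbscan"):
    print(name, "->", locate(name))
```

If the interpreter path or the package paths differ between JupyterLab and the terminal, the pip commands have probably been changing a different environment than the one the notebook uses.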

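Other than the many solutions in the GitHub issue (which it sounds like you've already tried), I really don't think there's much else you can try other than freshly reinstalling Python. The error goes back to a C API change in NumPy 1.20 that grew the ndarray struct from 80 to 88 bytes: "Expected 88 from C header, got 80 from PyObject" suggests that your hdbscan binary was compiled against NumPy 1.20 or newer but is being imported alongside an older NumPy. It could be that an old NumPy or a cached hdbscan build is lurking in your install every time you try these solutions.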
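To see which side of the NumPy 1.20 boundary the running environment is on, a quick check like the following might help. It is a rough sketch: `is_at_least_1_20` is a made-up helper, and the interpretation printed at the end assumes the usual explanation of this error, namely that the numbers 88 and 80 correspond to NumPy 1.20+ and older NumPy respectively.

```python
def is_at_least_1_20(numpy_version):
    """True if a NumPy version string such as '1.19.5' is 1.20 or newer."""
    # Only major and minor matter here; patch and suffixes are ignored.
    major, minor = (int(part) for part in numpy_version.split(".")[:2])
    return (major, minor) >= (1, 20)


try:
    import numpy
except ImportError:
    print("numpy is not installed in this environment")
else:
    print("numpy", numpy.__version__, "loaded from", numpy.__file__)
    if is_at_least_1_20(numpy.__version__):
        print("NumPy is 1.20 or newer; an hdbscan build made against an "
              "older NumPy may need to be rebuilt or reinstalled.")
    else:
        print("NumPy is older than 1.20; an hdbscan binary compiled against "
              "1.20+ would be expected to fail with this size error.")
```

Run it inside the JupyterLab kernel rather than in the terminal, since that is where the failing import happens.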

Alternatively, you could make a new Python install/environment with a tool like pyenv or Anaconda so that it doesn't break your existing install, and then install just the bare minimum in this new environment (i.e. just HDBSCAN).
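If you would rather not install pyenv or Anaconda, the standard library's `venv` module can create a similar throwaway environment. This is a sketch of that swapped-in approach, not what those tools do internally; `create_clean_env` and the directory name in the usage note are hypothetical.

```python
import os
import venv


def create_clean_env(env_dir, with_pip=True):
    """Create an isolated environment and return the path to its interpreter."""
    # with_pip=True bootstraps pip so packages can be installed into the
    # new environment without touching the existing install.
    venv.create(env_dir, with_pip=with_pip)
    # The interpreter lives in Scripts\ on Windows and bin/ elsewhere.
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return os.path.join(env_dir, bin_dir, exe)
```

Calling, say, `create_clean_env("hdbscan-test-env")` returns the path of the new interpreter. Running that interpreter with `-m pip install hdbscan` and then `-c "import hdbscan"` from the terminal would show whether the error still appears in an environment that contains nothing else.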