Reading the K-means implementation in TensorFlow: http://learningtensorflow.com/lesson6/ and in scikit-learn: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html I'm struggling to decide which implementation to use.

scikit-learn is installed as part of the TensorFlow Docker container, so I can use either implementation.

Reason to use scikit-learn:

scikit-learn contains less boilerplate than the TensorFlow implementation.

Reason to use TensorFlow:

If running on an Nvidia GPU, the algorithm will run in parallel; I'm not sure whether scikit-learn will utilize all available GPUs.

Reading https://www.quora.com/What-are-the-main-differences-between-TensorFlow-and-SciKit-Learn

TensorFlow is more low-level; basically, the Lego bricks that help you to implement machine learning algorithms whereas scikit-learn offers you off-the-shelf algorithms, e.g., algorithms for classification such as SVMs, Random Forests, Logistic Regression, and many, many more. TensorFlow shines if you want to implement deep learning algorithms, since it allows you to take advantage of GPUs for more efficient training.

This statement reinforces my assertion that "scikit-learn contains less boilerplate than the TensorFlow implementation", but it also suggests that scikit-learn will not utilize all available GPUs.
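
To illustrate the boilerplate point, this is roughly all the code the scikit-learn version needs (a minimal sketch; the data array X and the cluster count here are placeholders, not part of the original question):

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(1000, 2)           # placeholder data
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)        # cluster assignment per sample
centroids = kmeans.cluster_centers_   # learned cluster centres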


Solution 1:

TensorFlow only uses the GPU if it is built against CUDA and cuDNN. By default it does not use the GPU, especially if it is running inside Docker, unless you use nvidia-docker and an image with built-in GPU support.
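
A quick way to confirm whether the TensorFlow inside your container actually sees a GPU (sketch assuming TensorFlow 2.x; on 1.x the equivalent check is tf.test.is_gpu_available()):

import tensorflow as tf

# An empty list means a CPU-only build or missing CUDA/cuDNN drivers
print(tf.config.list_physical_devices('GPU'))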

Scikit-learn is not intended to be used as a deep-learning framework and it does not provide any GPU support.

Why is there no support for deep or reinforcement learning / Will there be support for deep or reinforcement learning in scikit-learn?

Deep learning and reinforcement learning both require a rich vocabulary to define an architecture, with deep learning additionally requiring GPUs for efficient computing. However, neither of these fit within the design constraints of scikit-learn; as a result, deep learning and reinforcement learning are currently out of scope for what scikit-learn seeks to achieve.

Extracted from http://scikit-learn.org/stable/faq.html#why-is-there-no-support-for-deep-or-reinforcement-learning-will-there-be-support-for-deep-or-reinforcement-learning-in-scikit-learn

Will you add GPU support in scikit-learn?

No, or at least not in the near future. The main reason is that GPU support will introduce many software dependencies and introduce platform specific issues. scikit-learn is designed to be easy to install on a wide variety of platforms. Outside of neural networks, GPUs don’t play a large role in machine learning today, and much larger gains in speed can often be achieved by a careful choice of algorithms.

Extracted from http://scikit-learn.org/stable/faq.html#will-you-add-gpu-support

Solution 2:

I'm experimenting with a drop-in solution (h2o4gpu) to take advantage of GPU acceleration, in particular for KMeans:

Try this:

from h2o4gpu.solvers import KMeans  # GPU-accelerated, scikit-learn-compatible KMeans
# from sklearn.cluster import KMeans

As of now, version 0.3.2 still doesn't have .inertia_, but I think it's on their TODO list.
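
Since h2o4gpu aims to mirror the scikit-learn API, the rest of the code should stay essentially unchanged. A sketch, assuming the sklearn-style constructor arguments and attributes below are supported in your h2o4gpu version:

import numpy as np
from h2o4gpu.solvers import KMeans  # drop-in for sklearn.cluster.KMeans

X = np.random.rand(10000, 2).astype(np.float32)  # placeholder data
model = KMeans(n_clusters=4).fit(X)              # runs on the GPU if one is available
centroids = model.cluster_centers_
labels = model.predict(X)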

EDIT: I haven't tested it yet, but scikit-cuda seems to be gaining traction.

EDIT: RAPIDS is really the way to go here.
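
With RAPIDS, the cuML KMeans is again close to a drop-in replacement for the scikit-learn one (a minimal sketch, assuming cuML is installed and a CUDA-capable GPU is present):

import numpy as np
from cuml.cluster import KMeans  # RAPIDS cuML, scikit-learn-like API

X = np.random.rand(10000, 2).astype(np.float32)
model = KMeans(n_clusters=4).fit(X)  # fit runs on the GPU
labels = model.labels_
centroids = model.cluster_centers_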

Solution 3:

From my experience, I use the Intel(R) Extension for Scikit-learn (sklearnex) package to utilize the GPU for some sklearn algorithms.

The code I use:

import numpy as np
import dpctl  # required for SYCL device selection / GPU offload
from sklearnex import patch_sklearn, config_context
patch_sklearn()  # replaces supported sklearn estimators with Intel-optimized versions

from sklearn.cluster import DBSCAN  # import after patch_sklearn() so the patch applies

X = np.array([[1., 2.], [2., 2.], [2., 3.],
              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
with config_context(target_offload="gpu:0"):  # offload the computation to the first GPU
    clustering = DBSCAN(eps=3, min_samples=2).fit(X)

Source: oneAPI and GPU support in Intel(R) Extension for Scikit-learn
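
Since the original question was about K-means, the same patching approach should carry over (a sketch; I'm assuming KMeans is among the estimators your sklearnex version can offload to the GPU):

import numpy as np
from sklearnex import patch_sklearn, config_context
patch_sklearn()

from sklearn.cluster import KMeans  # now the Intel-accelerated implementation

X = np.random.rand(10000, 2).astype(np.float32)  # placeholder data
with config_context(target_offload="gpu:0"):     # run on the first GPU if supported
    kmeans = KMeans(n_clusters=4).fit(X)
print(kmeans.cluster_centers_)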