
Using cuML in place of scikit-learn for GPU-enabled machine learning

One of the best-known libraries for machine learning is scikit-learn. Its models and tools are robust, and I especially like that they “just work,” which makes scikit-learn convenient for rapid prototyping and data exploration. Unfortunately, the library is limited to CPU operations, and even with parallelization, algorithms can take a long time to run. Since I work in a lab with access to GPU computing clusters, I was excited to find Nvidia’s RAPIDS suite. There are several components to it; for my purposes, I only needed cuML, a GPU-enabled drop-in replacement for several scikit-learn functions.

Unfortunately, cuML only supports Linux platforms at this time, so for Apple or Windows machines, you are stuck with the scikit-learn library and your CPU. If you have a Linux machine with a CUDA device available, the cuML library is nearly interchangeable with scikit-learn syntax, so you can keep the body of your code the same and just import the library according to the system you’re using.

Installation of cuML is pretty straightforward, though I think the trickiest part of using the RAPIDS libraries—or any CUDA-enabled library, for that matter—is ensuring you have the required CUDA libraries and toolkit installed. If you are looking for how to set that up, this blog post is not for you!

Platform-dependent imports

Below is a data clustering example to demonstrate how easy it is to add GPU capability to your existing scikit-learn workflow. Because I often run the same code on machines with or without CUDA, the only change I made to my code was to make imports conditional on the operating system and whether CUDA is available.

Note: The platform check below is not rigorous; there are better ways to test whether CUDA is available. For my use case, the Linux platform I was using had CUDA enabled.

import sys
import time

if sys.platform == 'linux': # If on Linux, assume a CUDA device is available and use the GPU
    print('CUDA available, using GPU.')
    from cuml import KMeans, TSNE
else:
    print('CUDA unavailable, using CPU.')
    from sklearn.cluster import KMeans
    from sklearn.manifold import TSNE
import numpy as np
from sklearn.datasets import make_blobs
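
As noted above, keying off `sys.platform` is a blunt test. A slightly more robust check (a sketch; it assumes the optional `numba` package, but any CUDA binding with an availability query would work) asks the CUDA runtime directly instead of inferring from the operating system:

```python
def cuda_available() -> bool:
    """Return True if a usable CUDA device is detected, False otherwise."""
    try:
        from numba import cuda  # numba is an assumption; not required by cuML itself
        return cuda.is_available()
    except ImportError:
        # numba isn't installed, so fall back to assuming CPU only
        return False

print('CUDA available, using GPU.' if cuda_available() else 'CUDA unavailable, using CPU.')
```

You could then branch the imports on `cuda_available()` rather than on `sys.platform`.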

k-means clustering & t-SNE

With scikit-learn or cuML imported, you can use the following code to run k-means and t-SNE:

# Start the stopwatch!
start_time = time.time()

# Create some random data with 3 blobs:
centers = [(-10, -10), (0, 0), (10, 10)]
cluster_std = [0.1, 1, 10]
n_clusters = 3
x, labels_true = make_blobs(
    n_samples=10000, # <---- Make this bigger or smaller.
    centers=centers,
    cluster_std=cluster_std,
)

# Run k-means clustering:
labels = KMeans(n_clusters=n_clusters, random_state=2009).fit_predict(x)
# Run t-SNE:
x_embedded = TSNE(n_components=2, random_state=2009).fit_transform(x)

# Stop the stopwatch!
end_time = time.time()
print(f"Completed in {end_time - start_time:.2f} s.")
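
Because `make_blobs` also returns the ground-truth labels, a quick sanity check is to compare the predicted clusters against `labels_true`. Here is a small sketch using scikit-learn's `adjusted_rand_score` (with cuML, convert the predicted labels to a NumPy array first):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

centers = [(-10, -10), (0, 0), (10, 10)]
x, labels_true = make_blobs(n_samples=1000, centers=centers, cluster_std=1.0, random_state=2009)
labels = KMeans(n_clusters=3, n_init=10, random_state=2009).fit_predict(x)

# Adjusted Rand index: 1.0 means a perfect match up to label permutation,
# values near 0 mean the clustering is no better than chance.
score = adjusted_rand_score(labels_true, labels)
print(f"Adjusted Rand index: {score:.3f}")
```

Since the ARI is invariant to label permutation, it doesn't matter that k-means assigns arbitrary cluster IDs.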

As always, double-check the documentation to confirm that the syntax and arguments are the same. I have run across a few cases where the function calls differ slightly between scikit-learn and cuML.