Accelerating Polars with RAPIDS cuDF

Federico Trotta - Sep 17 - - Dev Community

If you’re a data scientist who migrated from Pandas to Polars because of its performance, you may be happy to hear that Polars has been powered up even further thanks to NVIDIA’s cuDF.

Did I get your attention? Well, read along!

Introducing Polars

In today’s analytics world, data frames are the backbone of most data work. Whether you're cleaning data, transforming it, or running complex analyses, data frames let you organize and manipulate data in a way that feels intuitive. This is mainly because data frames:

  • Are versatile: DataFrame APIs are less verbose than SQL for complex queries.
  • Provide easy integration: Data frames integrate well with existing software solutions (for example, with plotting and ML libraries).
  • Provide a single format for data science and engineering: Data frames support both data engineering and data science workflows.

In the last few years, Pandas has been king in this space, but with more data than ever and growing needs for performance, tools like Polars are stepping in to meet those demands without sacrificing the simplicity we’ve come to love from data frames.

In case you didn't know, Polars is a Python library for data analysis that’s gaining popularity as a speedier alternative to Pandas. While Pandas is the go-to for most data scientists and engineers, it can get sluggish when handling really big datasets.

Polars, on the other hand, is built with performance in mind and it’s optimized to handle massive datasets much faster, thanks to its use of parallelization and a more modern backend. So, if you've ever felt like Pandas was holding you back with long processing times, Polars might be the upgrade you’re looking for.

However, if you already use Polars, you may have noticed that even its speedups may not be enough for very large datasets, especially in distributed systems:

comparison between systems
(Image from NVIDIA/Polars)

So, let’s see the solution that has been implemented and what to expect from it.

Accelerating Polars with RAPIDS

When it comes to processing very large datasets, as in industries like quantitative finance and healthcare research, performance requirements are even higher due to the sheer amount of data.

That’s why NVIDIA has accelerated Polars with the RAPIDS cuDF library.

Here’s what they’ve done:

  • The RAPIDS cuDF library accelerates Polars workflows up to 13x+ using NVIDIA GPUs. In particular, it’s directly integrated into the Polars Lazy API, so you don’t need to change your code.
  • It has been designed to make processing 100s of millions of rows of data feel interactive with just a single GPU.
  • The library is fully compatible with the ecosystem of tools built for Polars, thus reducing overhead.
  • It gracefully falls back to the CPU for unsupported queries.

In particular, if you have already written Polars code, moving to RAPIDS cuDF is pretty straightforward: you only need to pass `engine="gpu"` when collecting your query.

For example, this is an example written in plain Polars:

Code in Polars by Federico Trotta

And here’s Polars accelerated with RAPIDS:

Polars code accelerated by RAPIDS by Federico Trotta

What to expect?

First of all, using Polars on a GPU should feel the same as using it on the CPU: just faster for many workflows.

The GPU engine, in fact, fully utilizes the Polars optimizer to ensure efficient execution and minimal memory usage.

Also, while working on accelerating Polars, the team benchmarked it against industry standards and found that, as the data scaled, the performance of accelerated Polars scaled too:

Benchmark by NVIDIA
(The Benchmark made by NVIDIA)

This is perfectly expected, as Polars is accelerated on GPUs (note that the benchmark was run on an NVIDIA H100).

How to use it?

To accelerate Polars with cuDF, you first need to install it in an environment that allows you to use GPUs, for example in Google Colaboratory:

$ pip install polars[gpu] --extra-index-url=https://pypi.nvidia.com

The following example uses a 22 GB dataset (link at the end of the article if you want to test it yourself).

Here’s the time needed for an operation with “standard” Polars:

Polars code by Federico Trotta

And here’s the time needed for the same operation, with accelerated Polars:

Accelerated Polars code by Federico Trotta

So, the same operation took:

  • 12 seconds with Polars.
  • 0.34 seconds with accelerated Polars.

Conclusions

With this new Polars GPU engine, you can potentially reach high performance on huge datasets while keeping the same Polars code you are already using.

So, why not give it a try? You can easily test it using a Colab notebook!

Want to read more? Here are all the details about that release directly on the Polars website.
