Run complete data science workflows with high-speed GPU computing power and parallelise data loading, data manipulation and machine learning for 50 times faster end-to-end data science pipelines.

Rapids Image


Today, data science and machine learning have become the world's largest compute segment. Minor improvements in the accuracy of analytical models can mean billions in profit for a company. To build the best models, data scientists must painstakingly train, evaluate, iterate and re-train to produce highly accurate results and powerful models. With RAPIDS™, processes that used to take days now take minutes, making it easier and faster to create and deploy value-added models.
Rapids Workflow Image
Workflows involve many iterations: raw data is converted into training data, fed into many combinations of algorithms, and subjected to hyperparameter tuning to find the right combination of models, model parameters and data features for optimal accuracy and performance.


RAPIDS is a suite of open-source software libraries and APIs for running data science pipelines entirely on GPUs, reducing training times from days to minutes. RAPIDS is built on NVIDIA® CUDA-X AI™ and combines years of development in graphics, machine learning, deep learning, high-performance computing (HPC) and more.


Data science is all about getting results fast. RAPIDS uses NVIDIA CUDA® under the bonnet to accelerate your workflows by running the entire data science training pipeline on GPUs. This reduces training times from days to minutes and lets models be retrained and deployed far more frequently.


By hiding the complexity of working with the GPU and even the communication protocols behind the scenes within the data centre architecture, RAPIDS creates an easy way to do data science. As more and more data scientists use Python and other high-level languages, providing acceleration without code changes is critical to rapidly improving development time.


RAPIDS can run anywhere - in the cloud or on-premises. You can easily scale it from a workstation to multi-GPU servers to multi-node clusters and deploy it in production with Dask, Spark, MLFlow, and Kubernetes.
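The scale-out model behind Dask can be sketched in plain Python: the data is split into partitions, each partition is processed independently (by a CPU thread here; by a GPU worker with Dask and RAPIDS), and the partial results are combined. A minimal, purely illustrative sketch:

```python
# Sketch of the partition-map-combine model Dask uses to scale RAPIDS
# from a single workstation to a cluster (CPU threads stand in for
# GPU workers, purely for illustration).
from concurrent.futures import ThreadPoolExecutor

data = list(range(1_000))
n_partitions = 4
partitions = [data[i::n_partitions] for i in range(n_partitions)]

with ThreadPoolExecutor(max_workers=n_partitions) as pool:
    partial_sums = list(pool.map(sum, partitions))  # map: one task per partition

total = sum(partial_sums)  # combine: reduce the partial results
print(total)  # 499500
```

The same map-then-combine pattern is what lets the identical user code run on one GPU or on a multi-node cluster.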


The results show that GPUs offer dramatic cost and time savings for both small and large Big Data analytics problems. Using familiar APIs such as Pandas and Dask, RAPIDS running on GPUs is up to 20 times faster than the best CPU baseline at the 10-terabyte scale. With just 16 NVIDIA DGX A100 systems, it matches the performance of 350 CPU-based servers, making the NVIDIA solution 7 times more cost-effective while delivering HPC-level performance.
Rapids Server Image


Common data processing tasks have many steps (data pipelines) that Hadoop cannot process efficiently. Apache Spark solved this problem by keeping all data in system memory, which allowed for more flexible and complex data pipelines but introduced new bottlenecks. Analysing even a few hundred gigabytes (GB) of data could take hours, if not days, on Spark clusters with hundreds of CPU nodes. To realise the true potential of data science, GPUs must be at the heart of the data centre design, which consists of five elements: compute, network, storage, provisioning and software. In general, end-to-end data science workflows are 10 times faster on GPUs than on CPUs.


Rapids Data Processing Image


RAPIDS provides a foundation for a new high-performance data science ecosystem and lowers the barrier to entry for new libraries through interoperability. Integration with leading data science frameworks such as Apache Spark, CuPy, Dask and Numba, as well as numerous deep learning frameworks such as PyTorch, TensorFlow and Apache MXNet, helps to broaden adoption and foster integration with others.
blazingSql Logo Image

BlazingSQL is a high-performance distributed SQL engine in Python, built on RAPIDS, for ETL of massive datasets onto GPUs.

NVTabular Logo Image

NVTabular is based on RAPIDS and accelerates feature engineering and preprocessing for recommender systems on GPUs.

cuStreamz Logo Image

Based on Streamz, written in Python and built on RAPIDS, cuStreamz accelerates streaming data processing on GPUs.

Plotly Logo Image

Integrated with RAPIDS, Plotly Dash enables interactive visual analysis of multi-gigabyte datasets in real time, even on a single GPU.

Apache Spark Logo Image

The RAPIDS Accelerator for Apache Spark provides a set of plug-ins for Apache Spark that use GPUs to accelerate processing via RAPIDS and UCX software.

Ecosystem and adopter logos: Anaconda, BlazingSQL, Capital One, CuPy, Chainer, Deepwave Digital, Gunrock, Quansight, Walmart, Booz Allen Hamilton, Databricks, Graphistry, IBM, Iguazio, Kinetica, Inria, MapR, OmniSci, Preferred Networks, PyTorch, Uber, Ursa Labs, Apache Arrow, Dask, GoAI, Nuclio, Numba, scikit-learn and DMLC XGBoost.


RAPIDS is based on CUDA primitives for low-level computational optimisation, but makes GPU parallelism and high memory bandwidth accessible via user-friendly Python interfaces. RAPIDS supports end-to-end data science workflows, from data loading and preprocessing to machine learning, graph analytics and visualisation. It is a full-featured Python stack that is scalable for enterprise Big Data use cases.


RAPIDS' data loading, preprocessing, and ETL features are built on Apache Arrow for loading, joining, aggregating, filtering, and otherwise manipulating data, all in a pandas-like API familiar to data scientists. Users can expect typical speedups of 10x or greater.
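Because cuDF mirrors the pandas API, GPU acceleration typically needs only a change of import. The hedged sketch below runs on plain pandas; on a machine with RAPIDS installed, swapping the import (e.g. `import cudf as xd`) would execute the same joins, filters and aggregations on the GPU:

```python
import pandas as xd  # on a GPU with RAPIDS installed: import cudf as xd

left = xd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = xd.DataFrame({"key": ["a", "b", "d"], "y": [10, 20, 40]})

# join, filter and aggregate with the familiar pandas-like API
merged = left.merge(right, on="key")   # inner join on "key"
filtered = merged[merged["x"] > 1]     # keep rows where x > 1
total_y = int(filtered["y"].sum())     # aggregate
print(total_y)  # 20
```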


RAPIDS' machine learning algorithms and mathematical primitives follow a familiar Scikit-Learn-like API. Popular tools such as XGBoost, Random Forest and many others are supported for both single-GPU and large-scale data centre implementations. For large datasets, these GPU-based implementations can complete 10-50x faster than their CPU equivalents.
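cuML's estimators keep the familiar scikit-learn `fit`/`predict` shape. The toy two-cluster, one-dimensional k-means below is pure Python and purely illustrative of that API shape; on a GPU the real, accelerated equivalent would come from `cuml.cluster.KMeans`:

```python
class TinyKMeans:
    """Toy two-cluster 1-D k-means mimicking the scikit-learn-style
    fit/predict API that cuML mirrors. Illustrative only."""

    def __init__(self, n_iters=20):
        self.n_iters = n_iters
        self.cluster_centers_ = []

    def fit(self, X):
        centres = [min(X), max(X)]  # initialise centres at the extremes
        for _ in range(self.n_iters):
            groups = [[] for _ in centres]
            for x in X:  # assign each point to its nearest centre
                nearest = min(range(len(centres)), key=lambda i: abs(x - centres[i]))
                groups[nearest].append(x)
            # move each centre to the mean of its assigned points
            centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
        self.cluster_centers_ = centres
        return self

    def predict(self, X):
        return [min(range(len(self.cluster_centers_)),
                    key=lambda i: abs(x - self.cluster_centers_[i]))
                for x in X]

model = TinyKMeans().fit([1.0, 1.2, 0.8, 9.0, 9.5, 8.7])
labels = model.predict([1.1, 9.2])
print(labels)  # [0, 1]
```

Because cuML keeps this interface, existing scikit-learn-style pipelines can often move to the GPU with only the import changed.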

Graph Analytics

RAPIDS graph algorithms such as PageRank, exposed through a NetworkX-like API, efficiently leverage the massive parallelism of GPUs to accelerate the analysis of large graphs by over 1,000x. Explore up to 200 million edges on a single NVIDIA A100 Tensor Core GPU and scale to billions of edges on NVIDIA DGX™ A100 clusters.
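What a PageRank call computes can be sketched with a few lines of power iteration in pure Python; cuGraph runs the same computation over hundreds of millions of edges on the GPU. A hypothetical miniature:

```python
def pagerank(edges, damping=0.85, n_iters=50):
    """Power-iteration PageRank over a list of directed (src, dst) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_degree = {n: 0 for n in nodes}
    for src, _ in edges:
        out_degree[src] += 1
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(n_iters):
        nxt = {n: (1.0 - damping) / len(nodes) for n in nodes}
        # dangling nodes (no out-edges) spread their rank uniformly
        dangling = sum(rank[n] for n in nodes if out_degree[n] == 0)
        for n in nodes:
            nxt[n] += damping * dangling / len(nodes)
        for src, dst in edges:
            nxt[dst] += damping * rank[src] / out_degree[src]
        rank = nxt
    return rank

# tiny example graph: "c" is linked from both "a" and "b"
pr = pagerank([("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")])
```

The ranks form a probability distribution (they sum to 1), and the most-linked node scores highest, exactly as in the GPU-scale version.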


The visualisation functions of RAPIDS support GPU-accelerated cross-filtering. Inspired by the original JavaScript crossfilter library, they enable interactive, super-fast multi-dimensional filtering of tabular datasets of over 100 million rows.
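The idea behind cross-filtering can be shown in miniature: a selection on one dimension instantly narrows every linked view of the same rows. A pure-Python sketch with made-up column names (RAPIDS performs the equivalent on GPU dataframes):

```python
# Toy cross-filter: brushing a range on one column filters what every
# other linked "view" of the data shows. Column names are hypothetical.
rows = [
    {"fare": 5.0, "hour": 9},
    {"fare": 12.5, "hour": 9},
    {"fare": 30.0, "hour": 18},
    {"fare": 7.5, "hour": 18},
]

def select_range(rows, column, lo, hi):
    return [r for r in rows if lo <= r[column] <= hi]

# brushing the "fare" axis to [0, 10] updates the linked "hour" histogram
selected = select_range(rows, "fare", 0.0, 10.0)
hour_counts = {}
for r in selected:
    hour_counts[r["hour"]] = hour_counts.get(r["hour"], 0) + 1
print(hour_counts)  # {9: 1, 18: 1}
```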

Rapids Software Stack Image


While deep learning is effective in areas such as computer vision, natural language processing and recommender systems, there are areas where its use is not mainstream. For problems with tabular data consisting of columns of categorical and continuous variables, techniques such as XGBoost, gradient boosting or linear models are commonly used. RAPIDS streamlines the preprocessing of tabular data on GPUs and provides a seamless handoff of data directly to any framework that supports DLPack, such as PyTorch, TensorFlow and MXNet. These integrations open up new possibilities for creating rich workflows, including some that were previously out of the question, such as feeding new features created by deep learning frameworks back into machine learning algorithms.
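The DLPack handoff mentioned above is a zero-copy exchange protocol. NumPy (1.22 or newer) implements the same protocol, so the mechanics can be sketched on the CPU; with RAPIDS, the analogous call passes a GPU column from cuDF into PyTorch or TensorFlow without copying:

```python
import numpy as np

a = np.arange(4, dtype=np.float32)
b = np.from_dlpack(a)  # zero-copy: b views a's buffer via the DLPack protocol

a[0] = 99.0            # mutating the producer array ...
print(b[0])            # ... is visible through the consumer: no data was copied
```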


There are five key ingredients to building AI-optimised data centres in the enterprise, and the key to the design is placing GPUs at the centre.


With their enormous computing power, systems with NVIDIA GPUs are the central compute building block for AI data centres. NVIDIA DGX systems deliver breakthrough AI performance and can replace an average of 50 dual-socket CPU servers. This is the first step in giving data scientists the industry's most powerful tools for data exploration.


On the software side, RAPIDS hides the complexity of GPU programming and of the communication protocols within the data centre architecture, creating an easy way to do data science. As more and more data scientists use Python and other high-level languages, providing acceleration without code changes is critical to rapidly improving development time.


Remote Direct Memory Access (RDMA) in NVIDIA Mellanox® network interface controllers (NICs), NCCL (the NVIDIA Collective Communications Library) and OpenUCX (an open-source framework for point-to-point communication) have led to huge improvements in training speed. With RDMA, GPUs can communicate directly across nodes at up to 100 gigabits per second (Gb/s), allowing them to span multiple nodes and operate as if they were on a single large server.


Enterprises are turning to Kubernetes and Docker containers for large-scale pipeline deployments. Combining containerised applications with Kubernetes lets enterprises reprioritise workloads on the fly and gives AI data centres more resilience, reliability and scalability.


GPUDirect® Storage enables both NVMe and NVMe over Fabric (NVMe-oF) to read and write data directly to the GPU, bypassing the CPU and system memory. This frees up the CPU and system memory for other tasks, while giving each GPU access to an order of magnitude more data with up to 50 per cent higher bandwidth.