RAPIDS - Accelerated Data Science

The world's largest and most profitable companies are data-driven.
Become a data-driven Real-Time Enterprise!

Data scientists spend much of their time evaluating data and iterating through machine learning (ML) experiments. Every hour spent studying data sets, extracting features, and tuning ML algorithms extends the time it takes to get robust results.

Why are data analysis and machine learning so important?

Organizations are increasingly data-driven, capturing market and environmental data and using analysis and machine learning to identify complex patterns, detect changes, and make predictions that directly impact performance. Managing a business through data-driven processes has become essential to staying at the forefront of the industry. Data-driven organizations must manage a wide variety of data.

NVIDIA shows how much faster RAPIDS runs on NVIDIA GPU-based systems

Why now?

The availability of open-source large-scale data analysis and machine learning software such as Hadoop, NumPy, scikit-learn, pandas, and Spark has triggered the Big Data revolution. Large companies in major industries such as retail, finance, healthcare, and logistics have adopted data analysis to improve their competitiveness, responsiveness, and efficiency. Improvements of just a few percent can impact their bottom line by billions. Data analysis and machine learning are the largest HPC segment today.

The current situation

For businesses trying to stay competitive, it is not easy to learn from increasingly vast volumes of data, cope with the complexity of analysis, or keep up with siloed analytics solutions on legacy infrastructure. What use is valuable data if analyzing it takes far too long? Results delivered quickly avoid losses in value, make additional profits achievable, and limit fraud damage through faster reactions.

What is the real problem?

Today’s data science problems demand a dramatic increase in the scale of data as well as the computational power required to process it.

A day in the life of a Data Scientist

What is the problem that RAPIDS is solving?

Don't send a kid to do a strong man's job, and don't use a CPU for a fast GPU's job! While the world's data doubles each year, CPU computing has hit a brick wall with the end of Moore's law. For the same reasons that scientific computing and deep learning have turned to NVIDIA GPU acceleration, data analytics and machine learning are ideal candidates for GPU acceleration.

What is RAPIDS?

NVIDIA created RAPIDS, an open-source data analytics and machine learning acceleration platform built on more than 15 years of NVIDIA® CUDA® development and machine learning expertise. It is powerful new software for executing end-to-end data science training pipelines entirely on GPUs, reducing training time from days to minutes. RAPIDS is based on Python, provides pandas-like and scikit-learn-like interfaces, is built on the Apache Arrow in-memory data format, and scales from a single GPU to multi-GPU and multi-node configurations.

RAPIDS Data Science Pipeline (Frameworks, Libraries and Other Layers)

RAPIDS integrates easily into the world's most popular Python-based data science workflows. It accelerates data science end-to-end, from data preparation to machine learning to deep learning. And through Arrow, Spark users can easily move data into the RAPIDS platform for acceleration, as sketched below.
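
As a minimal, hedged sketch (assuming the pandas, pyarrow and cudf packages are installed on a CUDA-capable system; the column names are purely illustrative), this is roughly how data from an existing pandas workflow can be handed to RAPIDS via Apache Arrow:

```python
import pandas as pd
import pyarrow as pa
import cudf  # RAPIDS GPU DataFrame library

# An existing CPU-side pandas DataFrame, e.g. exported from an ETL or Spark job
pdf = pd.DataFrame({"customer_id": [1, 2, 1], "amount": [10.5, 99.0, 42.3]})

# Route 1: hand the data over through the Arrow in-memory format
arrow_table = pa.Table.from_pandas(pdf)
gdf = cudf.DataFrame.from_arrow(arrow_table)

# Route 2: convert directly from pandas
gdf2 = cudf.from_pandas(pdf)

# From here on, the familiar pandas-style API runs on the GPU
print(gdf.groupby("customer_id").amount.sum())
```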

Machine Learning to Deep Learning: All on GPU


Is there a solution that speeds up the processing significantly?

Yes. With NVIDIA's push to bring GPU acceleration to machine learning and high-performance data analytics (ML/HPDA), the company reports that the RAPIDS platform delivers 50x speed-ups when training with the XGBoost machine learning algorithm on an NVIDIA DGX-2 supercomputer, compared with CPU-only systems. RAPIDS can therefore reduce data science computing times from days to minutes.
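
For illustration, a hedged sketch of GPU-accelerated XGBoost training as it could be launched from Python (the data is synthetic; on CPU-only systems the tree_method parameter would be "hist" instead of "gpu_hist"):

```python
import numpy as np
import xgboost as xgb

# Synthetic data standing in for a prepared training set
X = np.random.rand(100_000, 50).astype(np.float32)
y = (X[:, 0] + X[:, 1] > 1.0).astype(np.int32)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "binary:logistic",
    "tree_method": "gpu_hist",  # GPU-accelerated histogram algorithm
    "max_depth": 8,
    "eta": 0.1,
}

# Train 100 boosting rounds on the GPU
booster = xgb.train(params, dtrain, num_boost_round=100)
```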


These applications benefit from using RAPIDS:
  • Big Data
  • Forecasting, Trends, Prediction
  • Pattern Recognition
  • Credit Card Fraud Detection
  • Risk Management
Best used with these frameworks and libraries:
  • Hadoop
  • Spark
  • Apache Arrow
  • Python
  • pandas
  • scikit-learn

Recommended Hardware

RAPIDS Recommended Configurations

| RAPIDS Deployment Stage | Recommended GPU Configuration | Minimum CPU Cores | Minimum Main Memory | Boot Drive | Local Data Storage | Networking Connections |
|---|---|---|---|---|---|---|
| Development | 2x Quadro GV100 & NVLink | 10 | 128 GB | 500 GB SSD | 2 TB SSD | 1 GbE / 10 GbE |
| Development & Production | 4x V100 & NVLink | 20 | 256 GB | 500 GB SSD | 4 TB SSD | 1 GbE / 10 GbE |
| Production | 4x V100 SXM2 & NVLink | 20 | 256 GB | 500 GB SSD | 4 TB SSD | 10 GbE / 100 GbE / IB |
| Production | 8x V100 SXM2 & NVLink | 40 | 512 GB | 500 GB SSD | 4 TB SSD / NVMe | 10 GbE / 100 GbE / IB |
| Production | 16x V100 SXM3 & NVSwitch | 56 | 1 TB | 500 GB SSD | 10 TB SSD / NVMe | 40 GbE / 100 GbE / IB |

Development Systems

For development systems, sysGen offers you the devCube or NVIDIA's DGX Station. The devCube is a well tested and proven system used by many of our customers for deep learning tasks. In the following table you will find the general and recommended specs for our systems.


Production Systems

Production servers are built for the high demands of continuous operation and constant utilization. Redundant power supplies and enterprise-class components are part of our offering.


Optimized Software Stack³

NVIDIA RAPIDS AND DEEP LEARNING SOFTWARE STACK

NVIDIA RAPIDS includes cuDF, cuML and cuGraph as its core tools. With cuDF you prepare and wrangle your raw data; cuML then trains machine learning models on the prepared data using GPU-optimized algorithms; finally, the results can be visualized and displayed, as sketched below.
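
A compressed, hedged sketch of that flow, assuming cuDF and cuML are installed; the file name and column names are hypothetical placeholders:

```python
import cudf
from cuml.cluster import KMeans

# 1. Data preparation and wrangling with cuDF
gdf = cudf.read_csv("transactions.csv")        # illustrative file name
gdf = gdf.dropna()
features = gdf[["amount", "merchant_id", "hour_of_day"]]

# 2. Model training with cuML, using a scikit-learn-style estimator
model = KMeans(n_clusters=8)
model.fit(features)
gdf["cluster"] = model.predict(features)

# 3. Move the results back to the CPU for visualization and reporting
result = gdf.to_pandas()
```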

Apache Arrow

Apache Arrow is a columnar, in-memory data structure that delivers efficient and fast data interchange with flexibility to support complex data models.
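
A small illustrative pyarrow example of that columnar, in-memory layout (the column names and values are arbitrary):

```python
import pyarrow as pa

# Build a columnar, in-memory Arrow table; each column is a typed array
table = pa.table({
    "id": [1, 2, 3],
    "value": [0.5, 1.5, 2.5],
})

print(table.schema)           # column names and types
print(table.column("value"))  # access to a single column without row-wise parsing
```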

cuDF

The RAPIDS cuDF library is a DataFrame manipulation library based on Apache Arrow that accelerates loading, filtering, and manipulating data when preparing it for model training. The Python bindings of the CUDA-accelerated core DataFrame primitives mirror the pandas interface for seamless onboarding of pandas users.
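
Assuming cuDF is installed, the pandas-mirroring interface looks roughly like this (the file name and column names are hypothetical placeholders):

```python
import cudf

# Same call pattern as pandas.read_csv, but parsing happens on the GPU
gdf = cudf.read_csv("clicks.csv")

# Filtering and aggregation also mirror the pandas API
recent = gdf.query("duration > 30")
per_user = recent.groupby("user_id").duration.mean()

print(per_user.head())
```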

cuML

RAPIDS cuML is a collection of GPU-accelerated machine learning libraries that aims to provide GPU versions of the machine learning algorithms available in scikit-learn.
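
As a hedged sketch of that scikit-learn-style estimator interface, here is a linear regression on synthetic NumPy data (cuML estimators also accept GPU DataFrames):

```python
import numpy as np
from cuml.linear_model import LinearRegression

# Synthetic training data
X = np.random.rand(10_000, 4).astype(np.float32)
w = np.array([1.5, -2.0, 0.5, 3.0], dtype=np.float32)
y = X @ w + 0.1 * np.random.rand(10_000).astype(np.float32)

# fit / predict follow the familiar scikit-learn pattern, but run on the GPU
model = LinearRegression()
model.fit(X, y)
predictions = model.predict(X)
```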

cuGRAPH

cuGraph is a framework and collection of graph analytics libraries that integrate seamlessly into the RAPIDS data science platform.
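
For illustration, a hedged sketch of running a standard graph algorithm (PageRank) on an edge list held in a cuDF DataFrame; the edge data and the column names src/dst are placeholders:

```python
import cudf
import cugraph

# Edge list stored as a GPU DataFrame (illustrative data)
edges = cudf.DataFrame({
    "src": [0, 0, 1, 2, 2, 3],
    "dst": [1, 2, 2, 0, 3, 0],
})

# Build the graph and run GPU-accelerated PageRank
G = cugraph.Graph()
G.from_cudf_edgelist(edges, source="src", destination="dst")
scores = cugraph.pagerank(G)

print(scores.sort_values("pagerank", ascending=False))
```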

Deep Learning Libraries

RAPIDS provides native array interface support. This means data stored in Apache Arrow can be seamlessly pushed to deep learning frameworks that support the array interface, such as PyTorch and Chainer.
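
A hedged sketch of handing a GPU-resident column to PyTorch without copying through host memory, using the DLPack exchange mechanism that both cuDF and PyTorch support (the column values are illustrative):

```python
import cudf
import torch
from torch.utils.dlpack import from_dlpack

# A cuDF column already resident in GPU memory
gdf = cudf.DataFrame({"feature": [0.1, 0.2, 0.3, 0.4]})

# Export via DLPack and import into PyTorch; the data stays on the GPU
capsule = gdf["feature"].to_dlpack()
tensor = from_dlpack(capsule)

print(tensor.device, tensor.dtype)
```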

Visualization Libraries Coming Soon

RAPIDS will include tightly integrated data visualization libraries based on Apache Arrow. The native GPU in-memory data format provides high-performance, high-FPS data visualization, even with very large datasets.


³ This information is based on publicly available NVIDIA material.


Introducing RAPIDS

During the GPU Technology Conference in Munich, the graphics card manufacturer NVIDIA presented the open-source platform RAPIDS. It is primarily aimed at users in the fields of data science and machine learning and is a collection of libraries intended to enable GPU-accelerated data analysis. In addition to NVIDIA, companies such as IBM, HPE, Oracle and Databricks have announced their support for the project.

The graphics card manufacturer explains that RAPIDS is based on CUDA, its in-house platform for parallel programming. The new platform enables developers to create end-to-end pipelines for data analysis. NVIDIA reports results up to 50 times faster on the DGX-2 supercomputer compared with systems that rely only on CPUs. The platform builds on well-known open-source projects such as Apache Arrow, pandas and scikit-learn, and is designed to bring GPU acceleration to popular Python toolchains. Integration with Apache Spark is also planned.

NVIDIA has been working with members of the Python community for two years to create RAPIDS. Currently, the collection consists of a Python GPU DataFrame library, a C GPU DataFrame library, and alpha versions of the cuML and cuDF libraries. According to NVIDIA founder Jensen Huang, the complete package will advance work in the areas of data analysis and machine learning.

The entire RAPIDS project can be found on GitHub. Further information, including installation instructions, can be found on the official website. Companies like Walmart are already using the new platform.


The Personal Supercomputer for Leading-Edge AI Development

Your data science team depends on computing performance to gain insights, and innovate faster through the power of deep learning and data analytics. Until now, AI supercomputing was confined to the data center, limiting the experimentation needed to develop and test deep neural networks prior to training at scale. Now there’s a solution, offering the power to experiment with deep learning while bringing AI supercomputing performance within arm’s reach.

€ 72.718,75 (€ 86.535,31 incl. VAT)

sysGen/SUPERMICRO SYS-1029GQ-TVRT, includes:
Mainboard Super X11DGQ / 1U Rackmount CSE-118GQPTS-R2K05P2

  • Up to 4 NVIDIA Tesla V100 SXM2 GPUs
  • Up to 300 GB/s GPU-to-GPU
  • NVIDIA NVLINK
  • Optimized for NVIDIA GPUDirect RDMA
  1. Dual socket P (LGA 3647) supports Intel® Xeon® Scalable Processors, 3 UPI up to 10.4GT/s
  2. Up to 1.5TB ECC 3DS LRDIMM, up to DDR4-2666MHz; 12 DIMM slots
  3. 4 PCI-E 3.0 x16 slots
  4. 2 Hot-swap 2.5" SAS/SATA drive bays, 2 Internal 2.5" drive bays
  5. Support 1x M.2 2242/2260/2280, Support M.2 SATA and NVMe
  6. 2x 10GBase-T LAN ports via Intel X540
  7. 7 Heavy duty 4cm counter-rotating fans with air shroud & optimal fan speed control
  8. 2000W Redundant Power Supplies Titanium Level (96%)

Datasheet
€ 3.357,85 (€ 3.995,84 incl. VAT)

The World's First Deep Learning Supercomputer in a box

The NVIDIA® DGX-1™ is the world’s first purpose-built system for deep learning with fully integrated hardware and software that can be deployed quickly and easily. Its revolutionary performance significantly accelerates training time, making the NVIDIA DGX-1 the world’s first deep learning supercomputer in a box.


€ 156.864,73 (€ 186.669,03 incl. VAT)

sysGen/SUPERMICRO SYS-4029GP-TVRT, includes:
Mainboard Super X11DGO-T / 4U Rackmountable CSE-R422BG-1

  • Artificial Intelligence
  • Big Data Analytics
  • High-performance Computing
  • Research Lab/National Lab
  • Astrophysics, Business Intelligence
  1. Dual socket P (LGA 3647) supports Intel® Xeon® Scalable Processors, 3 UPI up to 10.4GT/s
  2. Up to 3TB ECC 3DS LRDIMM, up to DDR4-2666MHz; 24 DIMM slots
  3. 4 PCI-E 3.0 x16 (LP) (GPU tray for GPUDirect RDMA), 2 PCI-E 3.0 x16 (LP, CPU tray)
  4. Support 8 Tesla V100 with 300GB/s NVLINK
  5. Dual 10GBase-T LAN with Intel® X540
  6. 16 Hot-swap 2.5" SAS/SATA drives (Optional 8x NVMe drives supported), 2 NVMe based M.2 SSD
  7. 8x 92mm cooling fans
  8. 2200W Redundant (2+2) Power Supplies; Titanium Level (96%+)

Datasheet
€ 6.263,43 (€ 7.453,47 incl. VAT)

The world's most powerful AI system for the most complex AI challenges

Break through the barriers to AI speed and scale with NVIDIA DGX-2, the first 2 petaFLOPS system that engages 16 fully interconnected GPUs for 10X the deep learning performance. Powered by NVIDIA® DGX™ software and an architecture designed for AI-scale built on NVIDIA NVSwitch, DGX-2 enables you to take on the world’s most complex deep learning challenges.

DGX-2 Specs

Powered by NGC Deep Learning Stack

  • Integrated suite of optimized deep learning software
  • Simplified workload management
  • Free download: NVIDIA DGX systems deep learning software brief

Getting Started Quickly
  • Get started in one day instead of months
  • Simply unpack, plug-in, and start getting results

Greater Productivity
  • Save hundreds of thousands of dollars in engineering effort
  • Avoid months of lost productivity spent on IT

Performance Without Compromise
  • NVIDIA DGX Systems with Volta offer 3X the speed of prior generations
  • Deep learning training, inference and accelerated analytics in one system

NVIDIA's DGX-2 servers are offered with several support options. The minimum support period is one year:
  • 1 year support for educational customers only
  • 2 years support for educational customers only
  • 3 years support for educational customers only
  • 4 years support for educational customers only
  • 1 year support for non-educational customers
  • 2 years support for non-educational customers
  • 3 years support for non-educational customers
  • 4 years support for non-educational customers

The price depends on Standard/Education status, the length of the warranty period and other circumstances. Please ask for a special quotation (see the link on the left: "ADD TO REQUEST / ZUR ANFRAGE HINZUFÜGEN").

€ 296.475,10 (€ 352.805,36 incl. VAT)

The world's most powerful AI system for the most complex AI challenges

Break through the barriers to AI speed and scale with NVIDIA DGX-2H, the first 2 petaFLOPS system that engages 16 fully interconnected GPUs for 10X the deep learning performance. Powered by NVIDIA® DGX™ software and an architecture designed for AI-scale built on NVIDIA NVSwitch, DGX-2H enables you to take on the world’s most complex deep learning challenges.

DGX-2H Specs

Powered by NGC Deep Learning Stack

  • Integrated suite of optimized deep learning software
  • Simplified workload management
  • Free download: NVIDIA DGX systems deep learning software brief

Getting Started Quickly
  • Get started in one day instead of months
  • Simply unpack, plug-in, and start getting results

Greater Productivity
  • Save hundreds of thousands of dollars in engineering effort
  • Avoid months of lost productivity spent on IT

Performance Without Compromise
  • NVIDIA DGX Systems with Volta offer 3X the speed of prior generations
  • Deep learning training, inference and accelerated analytics in one system

NVIDIA's DGX-2H servers are offered with several support options. The minimum support period is one year:
  • 1 year support for educational customers only
  • 2 years support for educational customers only
  • 3 years support for educational customers only
  • 4 years support for educational customers only
  • 1 year support for non-educational customers
  • 2 years support for non-educational customers
  • 3 years support for non-educational customers
  • 4 years support for non-educational customers

The price depends on Standard/Education status, the length of the warranty period and other circumstances. Please ask for a special quotation (see the link on the left: "ADD TO REQUEST / ZUR ANFRAGE HINZUFÜGEN").
