Systeme und Informatikanwendungen Nikisch GmbH - sysGen GmbH - Am Hallacker 48a - 28327 Bremen - info@sysgen.de


Run entire data science workflows with high-speed GPU compute and parallelize data loading, data manipulation, and machine learning for 50X faster end-to-end data science pipelines.

WHY RAPIDS?

Today, data science and machine learning have become the world's largest compute segment. Modest improvements in the accuracy of analytics models translate into billions to the bottom line. To build the best models, data scientists toil to train, evaluate, iterate, and retrain for highly accurate results and performant models. With RAPIDS™, processes that took days take minutes, making it easier and faster to build and deploy value-generating models.
Workflows involve many iterations of transforming raw data into training data, which is fed into many algorithm combinations that undergo hyperparameter tuning to find the right combination of models, model parameters, and data features for optimal accuracy and performance.

BUILDING A HIGH-PERFORMANCE ECOSYSTEM

RAPIDS is a suite of open-source software libraries and APIs for executing data science pipelines entirely on GPUs—and can reduce training times from days to minutes. Built on NVIDIA® CUDA-X AI™, RAPIDS unites years of development in graphics, machine learning, deep learning, high-performance computing (HPC), and more.

Faster Execution Time

Data science is all about speed to results. RAPIDS leverages NVIDIA CUDA® under the hood to accelerate your workflows by running the entire data science training pipeline on GPUs. This shortens both training time and the interval between model deployments from days to minutes.

Use the Same Tools

By hiding the complexities of working with the GPU and even the behind-the-scenes communication protocols within the data center architecture, RAPIDS creates a simple way to get data science done. As more data scientists use Python and other high-level languages, providing acceleration without code changes is essential to rapidly improving development time.

Run Anywhere at Scale

RAPIDS can be run anywhere, in the cloud or on-prem. You can easily scale from a workstation to multi-GPU servers to multi-node clusters, as well as deploy it in production with Dask, Spark, MLflow, and Kubernetes.

Lightning-Fast Performance on Big Data

Results show that GPUs provide dramatic cost and time savings for small and large-scale Big Data analytics problems. Using familiar APIs like Pandas and Dask, at 10 terabyte scale, RAPIDS runs up to 20X faster on GPUs than the top CPU baseline. Using just 16 NVIDIA DGX A100s to achieve the performance of 350 CPU-based servers, NVIDIA’s solution is 7X more cost-effective while delivering HPC-level performance.

FASTER DATA ACCESS, LESS DATA MOVEMENT

Common data processing tasks have many steps (data pipelines), which Hadoop can’t handle efficiently. Apache Spark solved this problem by holding all the data in system memory, which allowed more flexible and complex data pipelines but introduced new bottlenecks. Analyzing even a few hundred gigabytes (GB) of data could take hours, if not days, on Spark clusters with hundreds of CPU nodes. To tap the true potential of data science, GPUs have to be at the center of data center design, which consists of five elements: compute, networking, storage, deployment, and software. Generally speaking, end-to-end data science workflows on GPUs are 10X faster than on CPUs.

Data Processing Evolution

RAPIDS EVERYWHERE

RAPIDS provides a foundation for a new high-performance data science ecosystem and lowers the barrier of entry for new libraries through interoperability. Integration with leading data science frameworks like Apache Spark, CuPy, Dask, and Numba, as well as numerous deep learning frameworks, such as PyTorch, TensorFlow, and Apache MXNet, helps broaden adoption and encourages integration with others.

BlazingSQL is a high-performance distributed SQL engine in Python, built on RAPIDS to ETL massive datasets on GPUs.

Built on RAPIDS, NVTabular accelerates feature engineering and preprocessing for recommender systems on GPUs.

Based on Streamz, written in Python, and built on RAPIDS, cuStreamz accelerates streaming data processing on GPUs.

Integrated with RAPIDS, Plotly Dash enables real-time, interactive visual analytics of multi-gigabyte datasets even on a single GPU.

The RAPIDS Accelerator for Apache Spark provides a set of plug-ins for Apache Spark that leverage GPUs to accelerate processing via RAPIDS and UCX software.

TECHNOLOGY AT THE CORE

RAPIDS relies on CUDA primitives for low-level compute optimization but exposes that GPU parallelism and high memory bandwidth through user-friendly Python interfaces. RAPIDS supports end-to-end data science workflows, from data loading and preprocessing to machine learning, graph analytics, and visualization. It’s a fully functional Python stack that scales to enterprise big-data use cases.

Data Loading and Preprocessing

RAPIDS’s data loading, preprocessing, and ETL features are built on Apache Arrow for loading, joining, aggregating, filtering, and otherwise manipulating data, all in a pandas-like API familiar to data scientists. Users can expect typical speedups of 10X or greater.
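As a rough illustration of the pandas-like style described above, the following sketch filters, joins, and aggregates two small tables. It assumes the RAPIDS DataFrame library is importable as `cudf` and falls back to pandas when it is not, so the same code runs on a machine without a GPU. The table contents and column names are made up for the example.

```python
try:
    import cudf as xdf  # GPU DataFrames (assumes RAPIDS cuDF is installed)
except ImportError:
    import pandas as xdf  # CPU fallback so the sketch runs anywhere

# Hypothetical example data, not taken from any benchmark.
orders = xdf.DataFrame({
    "customer_id": [1, 2, 1, 3, 2, 1],
    "amount": [20.0, 35.5, 12.5, 80.0, 4.5, 7.5],
})
customers = xdf.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["north", "south", "north"],
})

# Filter rows, join the two tables, then aggregate per region.
large_orders = orders[orders["amount"] > 5.0]
joined = large_orders.merge(customers, on="customer_id", how="inner")
totals = joined.groupby("region")["amount"].sum().sort_index()

# Bring the result back to plain Python objects for printing.
if hasattr(totals, "to_pandas"):
    totals = totals.to_pandas()
region_totals = {region: float(value) for region, value in totals.items()}
print(region_totals)
```

Only the import line differs between the GPU and CPU paths, which is the point of keeping the API close to pandas.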

Machine Learning

RAPIDS’s machine learning algorithms and mathematical primitives follow a familiar scikit-learn-like API. Popular tools like XGBoost, Random Forest, and many others are supported for both single GPU and large data center deployments. For large datasets, these GPU-based implementations can complete 10-50X faster than their CPU equivalents.
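The scikit-learn-like API mentioned above can be sketched with a simple linear regression. This is a minimal example that assumes the RAPIDS machine learning library is importable as `cuml`; it falls back to scikit-learn, which offers the same `fit`/`predict` interface, when `cuml` is unavailable. The synthetic data is invented for illustration.

```python
import numpy as np

try:
    from cuml.linear_model import LinearRegression  # GPU estimator (assumes cuML is installed)
except ImportError:
    from sklearn.linear_model import LinearRegression  # CPU fallback, same interface

# Synthetic, noise-free data following y = 3*x0 - 2*x1 + 1 (illustrative only).
rng = np.random.default_rng(0)
X = rng.random((200, 2)).astype(np.float32)
y = (3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0).astype(np.float32)

# The estimator workflow is the familiar construct / fit / predict sequence.
model = LinearRegression()
model.fit(X, y)
predictions = np.asarray(model.predict(X[:5]))

# Largest absolute deviation between predictions and the known targets.
max_error = float(np.max(np.abs(predictions - y[:5])))
print(max_error)
```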

Graph Analytics

RAPIDS’s graph algorithms, such as PageRank, and its NetworkX-like functions make efficient use of the massive parallelism of GPUs to accelerate analysis of large graphs by over 1000X. Explore up to 200 million edges on a single NVIDIA A100 Tensor Core GPU and scale to billions of edges on NVIDIA DGX™ A100 clusters.
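To make concrete what PageRank computes, here is a small CPU reference sketch in plain Python using power iteration. It is not the RAPIDS implementation; the function name, the damping factor of 0.85, the iteration count, and the toy graph are all assumptions chosen for illustration. A GPU library performs the same kind of per-edge update in parallel across millions of edges.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a list of directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        # Every node passes an equal share of its rank along each out-link.
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# Hypothetical four-node graph: a -> b -> c -> a, plus d -> c.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")]
ranks = pagerank(edges)
for node, score in sorted(ranks.items()):
    print(node, round(score, 4))
```

Node `c` receives links from two nodes and ends up with the highest score, while `d`, which nothing links to, keeps only the baseline share.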

Visualization

RAPIDS’s visualization features support GPU-accelerated cross-filtering. Inspired by the original JavaScript Crossfilter library, they enable interactive and super-fast multi-dimensional filtering of tabular datasets with over 100 million rows.
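The idea behind cross-filtering can be shown with a short CPU sketch in plain Python: each chart displays the data as restricted by the selections made in all other charts, but not by its own selection. The function name, the rows, and the filters below are hypothetical and only illustrate the concept; they are not part of any RAPIDS API.

```python
# Hypothetical tabular data; each dict is one row.
rows = [
    {"region": "north", "product": "gpu", "amount": 120},
    {"region": "north", "product": "cpu", "amount": 40},
    {"region": "south", "product": "gpu", "amount": 75},
    {"region": "south", "product": "cpu", "amount": 15},
    {"region": "north", "product": "gpu", "amount": 30},
]


def cross_filtered_counts(rows, filters, view_column):
    """Count rows per value of view_column, applying every filter
    except the one placed on view_column itself."""
    active = [predicate for column, predicate in filters.items()
              if column != view_column]
    counts = {}
    for row in rows:
        if all(predicate(row) for predicate in active):
            key = row[view_column]
            counts[key] = counts.get(key, 0) + 1
    return counts


# Selections a user might make in two linked charts.
filters = {
    "region": lambda row: row["region"] == "north",
    "amount": lambda row: row["amount"] >= 40,
}

product_view = cross_filtered_counts(rows, filters, "product")
region_view = cross_filtered_counts(rows, filters, "region")
print(product_view)
print(region_view)
```

The region chart still shows both regions because its own selection is ignored for its own view, which is what lets a user change that selection interactively.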

Deep Learning Integration

While deep learning is effective in domains like computer vision, natural language processing, and recommenders, there are areas where its use isn’t mainstream. Tabular data problems, which consist of columns of categorical and continuous variables, commonly make use of techniques like XGBoost, gradient boosting, or linear models. RAPIDS streamlines preprocessing of tabular data on GPUs and provides a seamless handoff of data directly to any framework supporting DLPack, like PyTorch, TensorFlow, and MXNet. These integrations open up new opportunities for creating rich workflows, even ones that were previously impractical, such as feeding new features created by deep learning frameworks back into machine learning algorithms.
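The DLPack handoff described above is a zero-copy exchange: the consumer wraps the producer's memory instead of copying it. The sketch below demonstrates the protocol on the CPU with NumPy only (it assumes NumPy 1.22 or newer, which provides `from_dlpack`). On a GPU the analogous step would be exporting from a RAPIDS object and importing in a deep learning framework, for example with calls such as cuDF's `to_dlpack()` and `torch.from_dlpack`; those are mentioned here as an assumption, not taken from the text.

```python
import numpy as np

# Producer side: any array object that implements the DLPack protocol.
source = np.arange(6, dtype=np.float32).reshape(2, 3)

# Consumer side: wrap the producer's buffer without copying it.
shared = np.from_dlpack(source)

# Both objects hold the same values and refer to the same memory.
same_values = bool(np.array_equal(source, shared))
zero_copy = bool(np.shares_memory(source, shared))
print(same_values, zero_copy)
```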

MODERN DATA CENTERS FOR DATA SCIENCE

There are five key ingredients to building AI-optimized data centers in the enterprise. The key to the design is placing GPUs at the center.

Compute

With their tremendous computational performance, systems with NVIDIA GPUs are the core compute building block for AI data centers. NVIDIA DGX systems deliver groundbreaking AI performance and can replace, on average, 50 dual-socket CPU servers. This is the first step to giving data scientists the industry’s most powerful tools for data exploration.

Software

By hiding the complexities of working with the GPU and the behind-the-scenes communication protocols within the data center architecture, RAPIDS creates a simple way to get data science done. As more data scientists use Python and other high-level languages, providing acceleration without code changes is essential to rapidly improving development time.

Networking

Remote direct memory access (RDMA) in NVIDIA Mellanox® network interface controllers (NICs), NCCL2 (NVIDIA Collective Communications Library), and OpenUCX (an open-source point-to-point communication framework) have led to tremendous improvements in training speed. With RDMA allowing GPUs to communicate directly with each other across nodes at up to 100 gigabits per second (Gb/s), they can span multiple nodes and operate as if they were on one massive server.
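To put the 100 Gb/s figure into perspective, the following back-of-the-envelope calculation converts the link rate to gigabytes per second and estimates a best-case transfer time. The 10 GB payload is a hypothetical value, and the estimate ignores protocol overhead and latency, so real transfers will take longer.

```python
def transfer_seconds(payload_gigabytes, link_gigabits_per_second=100.0):
    """Best-case time to move a payload over a link, ignoring overhead."""
    gigabytes_per_second = link_gigabits_per_second / 8.0  # 8 bits per byte
    return payload_gigabytes / gigabytes_per_second


peak_gigabytes_per_second = 100.0 / 8.0
seconds_for_10_gb = transfer_seconds(10.0)  # hypothetical 10 GB payload

print(peak_gigabytes_per_second)
print(seconds_for_10_gb)
```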

Deployment

Enterprises are moving to Kubernetes and Docker containers for deploying pipelines at scale. Combining containerized applications with Kubernetes lets businesses reprioritize which tasks matter most and adds resiliency, reliability, and scalability to AI data centers.

Storage

GPUDirect® Storage allows both NVMe and NVMe over Fabrics (NVMe-oF) devices to read and write data directly to and from the GPU, bypassing the CPU and system memory. This frees up the CPU and system memory for other tasks, while giving each GPU access to orders of magnitude more data at up to 50 percent greater bandwidth.