CCCL Runtime: A Modern C++ Runtime for CUDA

2026-06-22 · Source: NVIDIA Technical Blog · Field: Technology & Digital — Software Development & Engineering, Artificial Intelligence & Machine Learning · Depth: Advanced, long

Summary

NVIDIA has introduced CCCL runtime, a set of idiomatic C++ APIs within its CUDA Core Compute Libraries (CCCL) that modernizes fundamental concepts of the CUDA programming model. It provides safer and more convenient abstractions for core CUDA functionality such as stream management, memory allocation, and kernel launches. It serves as an alternative to the traditional CUDA runtime, aligning with modern C++ features and incorporating lessons from 20 years of CUDA evolution. Its key design principles are strong typing, with dedicated "_ref" types for non-owning objects; explicit dependencies, which support local reasoning and improve composability; and asynchronous-by-default APIs, particularly for memory management via stream-ordered memory pools (available since CUDA 11.2, expanded in CUDA 13.0). The runtime also introduces kernel functors and automatically transforms "cuda::buffer" arguments into "cuda::std::span", which supports compile-time configuration and reduces manual boilerplate.

Key takeaway

For AI Engineers and Machine Learning Engineers developing CUDA C++ applications, adopting NVIDIA's CCCL runtime can improve code safety and maintainability. Moving stream management, memory allocation, and kernel launches to its modern C++ APIs brings strong typing and explicit dependencies. This approach can reduce runtime errors and improves composability, especially in complex multi-library projects. Consider adopting it incrementally, using the provided compatibility helpers to ease migration.

Key insights

CCCL runtime modernizes CUDA C++ development with safer, more convenient APIs through strong typing and explicit dependencies.

Principles

Use dedicated types, not raw identifiers.
Make dependencies explicit for composability.
APIs are asynchronous by default.
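
The first two principles can be illustrated without a GPU. What follows is a minimal host-only C++ sketch, not the CCCL API: every name in namespace sketch (stream, stream_ref, device_ref, enqueue) is a hypothetical stand-in, and a plain integer plays the role of a raw stream identifier. It assumes only that an owning type releases its handle exactly once, that a "_ref" type is a cheap non-owning view, and that a function receives its device and stream as arguments instead of reading hidden global state. Asynchronous execution is not modeled here.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

namespace sketch {  // hypothetical names, not part of CCCL

static int live_handles = 0;  // raw handles currently open
static int next_handle = 1;   // source of fresh raw handle values

// Non-owning view of a stream: cheap to copy, never releases the handle.
class stream_ref {
 public:
  explicit stream_ref(int handle) : handle_(handle) {}
  int get() const { return handle_; }
 private:
  int handle_;
};

// Owning stream: acquires a handle on construction, releases it exactly once.
class stream {
 public:
  stream() : handle_(next_handle++) { ++live_handles; }
  ~stream() { if (handle_ != 0) --live_handles; }
  stream(const stream&) = delete;             // no accidental double release
  stream& operator=(const stream&) = delete;
  stream(stream&& other) noexcept : handle_(std::exchange(other.handle_, 0)) {}
  stream& operator=(stream&& other) noexcept {
    std::swap(handle_, other.handle_);
    return *this;
  }
  int get() const { return handle_; }
  operator stream_ref() const { return stream_ref{handle_}; }
 private:
  int handle_;  // 0 means "moved from, owns nothing"
};

// Dedicated device type, so a device index cannot be mistaken for a stream.
class device_ref {
 public:
  explicit device_ref(int id) : id_(id) {}
  int id() const { return id_; }
 private:
  int id_;
};

struct work_record { int device_id; int stream_handle; int items; };

// Explicit dependencies: the device and stream are parameters, so a caller
// can see what the function uses without knowing any global state.
inline work_record enqueue(device_ref dev, stream_ref s, int items) {
  assert(items >= 0);
  return work_record{dev.id(), s.get(), items};
}

// Strong typing, checked at compile time.
static_assert(!std::is_convertible<int, stream_ref>::value,
              "a raw integer must not silently become a stream");
static_assert(!std::is_copy_constructible<stream>::value,
              "an owning stream must not be copyable");
static_assert(std::is_convertible<stream, stream_ref>::value,
              "an owning stream can be viewed as a stream_ref");

// Usage sketch.
inline void usage() {
  {
    stream s;                     // owns one handle
    assert(live_handles == 1);
    stream_ref view = s;          // non-owning view of the same handle
    stream moved = std::move(s);  // ownership transfers, no new handle
    assert(live_handles == 1);
    work_record r = enqueue(device_ref{0}, moved, 128);
    assert(r.stream_handle == view.get());
  }
  assert(live_handles == 0);      // released exactly once
}

}  // namespace sketch
```

Splitting ownership from viewing in this way lets library functions accept the "_ref" form without taking responsibility for the resource's lifetime, which is one reading of how the principle supports composability.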

Method

The CCCL runtime proposes the following workflow for CUDA C++ development: refer to a device through "cuda::device_ref", manage work on a "cuda::stream", allocate stream-ordered memory from a memory pool with "cuda::make_buffer", and launch a kernel functor with "cuda::launch".

In practice

Adopt CCCL runtime incrementally with compatibility helpers.
Use "cuda::make_buffer" for stream-ordered memory.
Employ kernel functors for automatic template deduction.
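
The last point, together with the buffer-to-span argument transformation mentioned in the summary, can be mimicked on the host. This is a rough sketch in plain C++, not CCCL code: launch, transform_arg, view, buffer, scale_fn, and scale_kernel are hypothetical names, the "launch" is an ordinary loop over thread indices, and std::vector stands in for an owning buffer. It shows only the language mechanics: a function template must have its template arguments spelled out before it can be passed as a value, whereas a functor with a templated call operator has them deduced at the call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {  // hypothetical names, not part of CCCL

// Minimal non-owning view, standing in for a span type.
template <typename T>
struct view {
  T* data;
  std::size_t size;
};

// Owning container, standing in for a buffer type.
template <typename T>
using buffer = std::vector<T>;

// Argument transformation: an owning buffer becomes a view ...
template <typename T>
view<T> transform_arg(buffer<T>& b) { return view<T>{b.data(), b.size()}; }

// ... and any other argument passes through by value.
template <typename T>
T transform_arg(T value) { return value; }

// Stand-in for a launch: calls the kernel once per "thread" on the host,
// transforming each argument first.
template <typename Kernel, typename... Args>
void launch(std::size_t threads, Kernel kernel, Args&&... args) {
  for (std::size_t tid = 0; tid < threads; ++tid) {
    kernel(tid, transform_arg(args)...);
  }
}

// Kernel as a function template: the caller must write scale_fn<float>.
template <typename T>
void scale_fn(std::size_t tid, view<T> out, T factor) {
  if (tid < out.size) out.data[tid] *= factor;
}

// Kernel as a functor: T is deduced when launch invokes operator().
struct scale_kernel {
  template <typename T>
  void operator()(std::size_t tid, view<T> out, T factor) const {
    if (tid < out.size) out.data[tid] *= factor;
  }
};

// Usage sketch: both styles produce the same kind of call.
inline void usage() {
  buffer<float> data{1.0f, 2.0f, 3.0f};
  launch(data.size(), scale_kernel{}, data, 2.0f);   // T deduced as float
  assert(data[1] == 4.0f);
  launch(data.size(), scale_fn<float>, data, 0.5f);  // T written by hand
  assert(data[1] == 2.0f);
}

}  // namespace sketch
```

In the functor form the caller never names the element type, so changing the buffer's element type requires no edit at the launch site.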

Topics

CUDA C++
CCCL Runtime
GPU Programming
Memory Management
Kernel Launch
Modern C++

Code references

NVIDIA/cccl

Best for: NLP Engineer, Computer Vision Engineer, AI Engineer, Machine Learning Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by NVIDIA Technical Blog.