What's new in this version:

New Features:
- This release introduces Heterogeneous Memory Management (HMM), allowing seamless sharing of data between host memory and accelerator devices. HMM is supported on Linux only and requires a recent kernel (6.1.24+ or 6.2.11+).
- HMM requires the use of NVIDIA's GPU Open Kernel Modules driver.
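A minimal sketch of what HMM enables, assuming a Linux system with the open kernel modules and a supported kernel: device code dereferences a pointer obtained from a plain malloc(), with no cudaMalloc() and no explicit copies (the kernel name and sizes below are illustrative, not from the release notes).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// With HMM, a pointer from ordinary malloc() is directly
// dereferenceable from device code -- no cudaMalloc, no cudaMemcpy.
__global__ void scale(int *data, int n, int factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    int *data = static_cast<int *>(malloc(n * sizeof(int)));  // plain host allocation
    for (int i = 0; i < n; ++i) data[i] = i;

    scale<<<(n + 255) / 256, 256>>>(data, n, 2);  // pass the malloc'd pointer directly
    cudaDeviceSynchronize();                      // wait for the GPU before reading on the host

    printf("data[10] = %d\n", data[10]);
    free(data);
    return 0;
}
```

On a system without HMM support, the same code would require cudaMallocManaged() (or explicit copies) instead of malloc().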
- As this is the first release of HMM, some limitations exist:
  - GPU atomic operations on file-backed memory are not yet supported.
  - Arm CPUs are not yet supported.
  - HugeTLBfs pages are not yet supported on HMM (this is an uncommon scenario).
  - The fork() system call is not yet fully supported when attempting to share GPU-accessible memory between parent and child processes.
  - HMM is not yet fully optimized and may perform more slowly than programs using cudaMalloc(), cudaMallocManaged(), or other existing CUDA memory management APIs. The performance of programs not using HMM is unaffected.
- The Lazy Loading feature (introduced in CUDA 11.7) is now enabled by default on Linux with the 535 driver. To disable it on Linux, set the environment variable CUDA_MODULE_LOADING=EAGER before launch. Default enablement on Windows will happen in a future CUDA driver release; to enable the feature on Windows, set the environment variable CUDA_MODULE_LOADING=LAZY before launch.
- Host NUMA memory allocation: allocate CPU memory targeting a specific NUMA node using either the CUDA virtual memory management APIs or the CUDA stream-ordered memory allocator. Applications must ensure that device accesses to pointers backed by host allocations from these APIs occur only after accessibility for the memory has been explicitly requested on the accessing device. Accessing these host allocations from a device without accessibility for the address range is undefined behavior, regardless of whether the device supports pageable memory access.
- Added per-client priority mapping at runtime for the CUDA Multi-Process Service (MPS). This allows multiple processes running under MPS to arbitrate priority at a coarse-grained level without changing application code. A new environment variable, CUDA_MPS_CLIENT_PRIORITY, accepts two values: 0 (NORMAL priority) and 1 (BELOW_NORMAL priority).
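Both of the environment variables above take effect per process, so they can be set in the launching shell. A short sketch of typical usage (the echo is only there to show the values that a subsequently launched CUDA application would see):

```shell
# Disable Lazy Loading on Linux (it is on by default with the 535 driver):
export CUDA_MODULE_LOADING=EAGER

# Run subsequent MPS clients at below-normal priority (0 = NORMAL, 1 = BELOW_NORMAL):
export CUDA_MPS_CLIENT_PRIORITY=1

echo "$CUDA_MODULE_LOADING $CUDA_MPS_CLIENT_PRIORITY"
```

Any CUDA application launched from this shell afterwards inherits both settings.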
CUDA Compilers:
- LibNVVM samples have been moved out of the toolkit and made publicly available on GitHub as part of the NVIDIA/cuda-samples project. Similarly, the nvvmir-samples have been moved from the nvidia-compiler-sdk project on GitHub to the new location of the libNVVM samples in the NVIDIA/cuda-samples project.

Resolved Issues:
- Resolved potential soft lock-ups around rm_run_nano_timer_callback(). A Linux kernel device driver API used for timer management in the Linux kernel interface of the NVIDIA GPU driver was susceptible to a race condition under multi-GPU configurations.
- Fixed a potential GSP-RM hang in kernel_resolve_address().
- Removed a potential GPUDirect RDMA driver crash in nvidia_p2p_put_pages(). The legacy non-persistent memory APIs allowed a third-party driver to invoke nvidia_p2p_put_pages() with a stale page_table pointer that had already been freed by the RM callback as part of the process shutdown sequence. This behavior broke when persistent memory support was added to the legacy nvidia_p2p APIs. The issue is resolved by providing new APIs, nvidia_p2p_get/put_pages_persistent, for persistent memory, restoring the original behavior of the legacy APIs for non-persistent memory. Because this is effectively a change in the API, nvidia-peermem has been updated accordingly, but external consumers of persistent memory mappings will need to switch to the new dedicated APIs.
- Resolved an issue in the watchcat syscall.
- Fixed potential incorrect results in optimized code under high register pressure. NVIDIA found that under certain rare conditions, a register-spilling optimization in PTXAS could produce incorrect compilation results. This issue is fixed for offline (non-JIT) compilation in the CUDA 12.2 release and will be fixed for JIT compilation in the next enterprise driver update. NVIDIA believes this issue to be extremely rare, and applications relying on JIT that are working successfully should not be affected.