What's new in this version:

New Features:
- This release introduces Heterogeneous Memory Management (HMM), allowing seamless sharing of data between host memory and accelerator devices. HMM is supported on Linux only and requires a recent kernel (6.1.24+ or 6.2.11+).
- HMM requires the use of NVIDIA's GPU Open Kernel Modules driver.
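A minimal sketch of what HMM enables, assuming a Linux system that meets the requirements above: a pointer obtained from plain malloc() is passed directly to a kernel, with no cudaMalloc() or cudaMemcpy() involved. (The kernel and variable names are illustrative, not from the release notes.)

```cuda
#include <cstdio>
#include <cstdlib>

// Doubles each element in place; `data` is ordinary malloc() memory,
// which HMM makes directly accessible from the GPU.
__global__ void scale(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
}

int main() {
    const int n = 1 << 20;
    int *data = (int *)malloc(n * sizeof(int));   // plain host allocation
    for (int i = 0; i < n; ++i) data[i] = i;

    // With HMM, the system allocator's pointer is valid on the device:
    // no cudaMalloc()/cudaMemcpy() staging is needed.
    scale<<<(n + 255) / 256, 256>>>(data, n);
    cudaDeviceSynchronize();

    printf("data[42] = %d\n", data[42]);
    free(data);
    return 0;
}
```

On systems without HMM support, the same code would require cudaMallocManaged() (or explicit copies) instead of malloc().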
As this is the first release of HMM, some limitations exist:
- GPU atomic operations on file-backed memory are not yet supported.
- Arm CPUs are not yet supported.
- HugeTLBfs pages are not yet supported with HMM (an uncommon scenario).
- The fork() system call is not yet fully supported when sharing GPU-accessible memory between parent and child processes.
- HMM is not yet fully optimized and may perform slower than programs using cudaMalloc(), cudaMallocManaged(), or other existing CUDA memory management APIs. The performance of programs not using HMM is not affected.

- The Lazy Loading feature (introduced in CUDA 11.7) is now enabled by default on Linux with the 535 driver. To disable it on Linux, set the environment variable CUDA_MODULE_LOADING=EAGER before launch. Default enablement on Windows will happen in a future CUDA driver release; to enable the feature on Windows, set the environment variable CUDA_MODULE_LOADING=LAZY before launch.

- Host NUMA memory allocation: allocate CPU memory targeting a specific NUMA node using either the CUDA virtual memory management APIs or the CUDA stream-ordered memory allocator. Applications must ensure that device accesses to pointers backed by host allocations from these APIs occur only after accessibility for the memory has been explicitly requested on the accessing device. Accessing these host allocations from a device without accessibility for the address range is undefined behavior, regardless of whether the device supports pageable memory access.

- Added per-client priority mapping at runtime for the CUDA Multi-Process Service (MPS). This allows multiple processes running under MPS to arbitrate priority at a coarse-grained level without changing application code. A new environment variable, CUDA_MPS_CLIENT_PRIORITY, accepts two values: 0 (NORMAL priority) and 1 (BELOW_NORMAL priority).
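The host NUMA allocation flow described above might look like the following sketch with the CUDA virtual memory management (driver) APIs. The key point is the explicit cuMemSetAccess() call granting the device access before any device-side dereference; error handling is omitted for brevity, and the NUMA node and device ordinals are illustrative.

```c
#include <cuda.h>

/* Sketch: allocate host memory on NUMA node 0 with the CUDA VMM APIs,
 * then explicitly grant device 0 access before any kernel touches it. */
int main(void) {
    cuInit(0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, 0);

    CUmemAllocationProp prop = {0};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_HOST_NUMA; /* new in 12.2 */
    prop.location.id = 0;                                /* NUMA node 0 */

    size_t gran = 0;
    cuMemGetAllocationGranularity(&gran, &prop,
                                  CU_MEM_ALLOC_GRANULARITY_MINIMUM);

    CUmemGenericAllocationHandle handle;
    cuMemCreate(&handle, gran, &prop, 0);

    CUdeviceptr ptr;
    cuMemAddressReserve(&ptr, gran, 0, 0, 0);
    cuMemMap(ptr, gran, 0, handle, 0);

    /* Required step: without this cuMemSetAccess() call, any device
     * access to `ptr` is undefined behavior. */
    CUmemAccessDesc access = {0};
    access.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    access.location.id = 0;                              /* device ordinal */
    access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    cuMemSetAccess(ptr, gran, &access, 1);

    /* ... launch kernels that use `ptr` ... */

    cuMemUnmap(ptr, gran);
    cuMemAddressRelease(ptr, gran);
    cuMemRelease(handle);
    cuCtxDestroy(ctx);
    return 0;
}
```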
CUDA Compilers:
- LibNVVM samples have been moved out of the toolkit and made publicly available on GitHub as part of the NVIDIA/cuda-samples project. Similarly, the nvvmir-samples have been moved from the nvidia-compiler-sdk project on GitHub to the new location of the libNVVM samples in NVIDIA/cuda-samples.

Resolved Issues:
- Resolved potential soft lock-ups around rm_run_nano_timer_callback(). A Linux kernel timer-management API used in the Linux kernel interface of the NVIDIA GPU driver was susceptible to a race condition under multi-GPU configurations.
- Fixed a potential GSP-RM hang in kernel_resolve_address().
- Removed a potential GPUDirect RDMA driver crash in nvidia_p2p_put_pages(). The legacy non-persistent memory APIs allowed a third-party driver to invoke nvidia_p2p_put_pages() with a stale page_table pointer that had already been freed by the RM callback as part of the process shutdown sequence. This behavior broke when persistent memory support was added to the legacy nvidia_p2p APIs. The issue is resolved by providing new APIs, nvidia_p2p_get/put_pages_persistent, for persistent memory; the original behavior of the legacy APIs for non-persistent memory is thus restored. Because this is effectively an API change, nvidia-peermem has been updated accordingly, but external consumers of persistent memory mappings will need to switch to the new dedicated APIs.
- Resolved an issue in the watchcat syscall.
- Fixed potential incorrect results in optimized code under high register pressure. NVIDIA found that under certain rare conditions, a register-spilling optimization in PTXAS could produce incorrect compilation results. This is fixed for offline (non-JIT) compilation in the CUDA 12.2 release and will be fixed for JIT compilation in the next enterprise driver update. NVIDIA believes this issue to be extremely rare, and applications relying on JIT that are working successfully should not be affected.
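For third-party kernel modules affected by the nvidia_p2p API change above, the migration for persistent mappings might look like this sketch. The signatures shown are assumed from recent nv-p2p.h headers and should be verified against the driver version in use; the function and variable names here are illustrative.

```c
/* Sketch for a third-party kernel module migrating persistent GPUDirect
 * RDMA mappings to the dedicated APIs. Non-persistent users keep the
 * legacy nvidia_p2p_get_pages()/nvidia_p2p_put_pages() calls. */
#include <nv-p2p.h>

static struct nvidia_p2p_page_table *pt;

int map_persistent(u64 va, u64 len)
{
    /* Previously, persistent mappings went through the legacy
     * nvidia_p2p_get_pages(); now use the dedicated API: */
    return nvidia_p2p_get_pages_persistent(va, len, &pt, 0);
}

void unmap_persistent(u64 va)
{
    /* Counterpart of the persistent get; the legacy
     * nvidia_p2p_put_pages() must no longer be used here. */
    nvidia_p2p_put_pages_persistent(va, pt, 0);
    pt = NULL;
}
```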