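In early March, NVIDIA released the CUDA 4.0 RC (see "nVIDIA Releases CUDA 4.0"), but at that time it was available only to registered developers. A month later, NVIDIA has finally released CUDA 4.0 RC2, which anyone can download and use! The official page is: http://developer.nvidia.com/cuda-toolkit-40
The main updates listed by NVIDIA are: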
- Easier Application Porting
  - Share GPUs across multiple threads
  - Use all GPUs in the system concurrently from a single host thread
  - No-copy pinning of system memory, a faster alternative to cudaMallocHost()
  - C++ new/delete and support for virtual functions
  - Support for inline PTX assembly
  - Thrust library of templated performance primitives such as sort, reduce, etc.
  - NVIDIA Performance Primitives (NPP) library for image/video processing
  - Layered Textures for working with same size/format textures at larger sizes and higher performance
- Faster Multi-GPU Programming
  - Unified Virtual Addressing
  - GPUDirect v2.0 support for Peer-to-Peer Communication
- New & Improved Developer Tools
  - Automated Performance Analysis in Visual Profiler
  - C++ debugging in cuda-gdb
  - GPU binary disassembler for Fermi architecture (cuobjdump)
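The main changes are essentially the same as in RC1, so I won't repeat them here. However, in the 4.0 SDK, many of the sample projects now come with Visual Studio 2010 versions, although the coverage is not complete. This should also mean that CUDA programs can finally be written directly in Visual Studio 2010! Next, I hope the final release will come with a complete Visual Studio 2010 development environment.
The downloads for the Windows version are as follows: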