ForceWare 185.65, Graphics Plus Power Pack #3 & nVidia CUDA 2.2 closed beta?

ForceWare 185.65

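First up, nVidia has released a new beta graphics driver, ForceWare 185.65, aimed mainly at the GeForce 9, 100, and 200 series (what about GeForce 8? = =). What's new? Besides adding support for the new GeForce GTX 275, it also includes support for CUDA 2.2!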

Downloads: Windows XP, Windows XP 64, Windows Vista 32, Windows Vista 64

For the other changes in this driver, see AnandTech's article "ATI Radeon HD 4890 vs. NVIDIA GeForce GTX 275", which describes some of what changed in the 185 drivers.

Graphics Plus Power Pack #3

In addition, the nVidia Graphics Plus Power Pack has received a third installment, which adds:

Star Tales – Benchmark Demo

Sacred 2: Fallen Angel – PhysX Game Patch

PhysX Screensaver (it was included before, but this time it seems to come with source code?)

Motion DSP’s vReveal – Try-and-Buy Demo

SETI@home

That is the full list. For now it is only available on the English site, and most of the items are just demo versions.

CUDA 2.2 Beta

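Next up is CUDA 2.2. CUDA 2.1 was released in January of this year, and now a Beta of the new 2.2 is out! However, unlike earlier betas, which were all public, this time you have to be a registered developer (free registration page) to download it... (A CUDA 2.2 closed beta? Welcome to CUDA Online! (just kidding))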

For details on CUDA 2.2 Beta, see the official forum post "CUDA 2.2 features", which also offers the 2.2 Beta programming guide for download. The feature updates are roughly as follows:

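Zero-copy support.
For details, see the post "Cuda 2.2 / Zero-copy access". If I understand it correctly, CUDA code will be able to access host memory directly over PCI-Express! However, it seems that only the MCP7x and GT200 series can do this.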

On Linux, a single GPU can support multiple contexts; Windows will have to wait for the final release. See another post on the official forum for details.

Vista-related changes

CUDA profiler support

Asynchronous memcpy is supported on Vista and Server 2008 (the API already existed, but it had no effect under Vista)

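The CUDA profiler supports more counters on GT200, including memory bandwidth (counters for each transaction size) and instruction counts. This should make it easier to tell whether a program's performance is limited by bandwidth or by compute speed.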

A single allocation can now hold more than 4GB of pinned memory (which should mean page-locked memory). Vista is the exception: it still has a 256MB limit, though that should be raised in the final release.

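Blocking sync on all platforms.
This is a context creation flag that can be used in place of spinlocking or spinlocking with yield (Heresy has hardly ever come across these terms... @@). While a thread is waiting on the GPU, the thread sleeps, and the driver wakes it up once the work is done. It is not the default, because OS scheduling may add latency, but it is very handy if you want to reduce CPU usage.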

Some new functions

__brev(), __brevll() 32-bit and 64-bit bit reversal

__frcp_r{n,z,u,d}() single-precision reciprocal with IEEE rounding

__fsqrt_r{n,z,u,d}() single-precision square root with IEEE rounding

__fdiv_r{n,z,u,d}() single-precision division with IEEE rounding

__fadd_r{u,d}() single-precision addition with directed rounding

__fmul_r{u,d}() single-precision multiplication with directed rounding

__threadfence(): I'm not sure if there are docs for this yet - it's kind of hard to explain, so I'm not going to comment too much about it here because I forget what its exact behavior is. (orz... what am I supposed to say to that?)
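To make the bit-reversal and directed-rounding entries above a little more concrete, here is a minimal host-side sketch in plain C++ of what such operations compute. It is not CUDA code and not the device implementation: the names brev32_ref, brev64_ref, add_with_rounding, add_round_up, and add_round_down are hypothetical helpers made up for illustration, and the rounding part assumes the host compiler and FPU honour std::fesetround.

```cpp
#include <cfenv>
#include <cstdint>

// Hypothetical host-side reference (not a CUDA API): 32-bit bit reversal.
// Bit i of the input ends up at bit (31 - i) of the result.
inline std::uint32_t brev32_ref(std::uint32_t x) {
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);  // swap adjacent bits
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);  // swap bit pairs
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);  // swap nibbles
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);  // swap bytes
    return (x >> 16) | (x << 16);                              // swap 16-bit halves
}

// Hypothetical host-side reference: 64-bit bit reversal built from two 32-bit ones.
inline std::uint64_t brev64_ref(std::uint64_t x) {
    const std::uint64_t lo = brev32_ref(static_cast<std::uint32_t>(x));
    const std::uint64_t hi = brev32_ref(static_cast<std::uint32_t>(x >> 32));
    return (lo << 32) | hi;  // reversed low word becomes the new high word
}

// Hypothetical helper: single-precision a + b under an explicit rounding mode.
// Assumption: the platform honours std::fesetround for float addition.
inline float add_with_rounding(float a, float b, int mode) {
    const int saved = std::fegetround();
    std::fesetround(mode);
    volatile float va = a, vb = b;  // volatile keeps the add from being folded at compile time
    volatile float sum = va + vb;   // performed at run time under the selected mode
    std::fesetround(saved);         // restore the caller's rounding mode
    return sum;
}

inline float add_round_up(float a, float b)   { return add_with_rounding(a, b, FE_UPWARD); }
inline float add_round_down(float a, float b) { return add_with_rounding(a, b, FE_DOWNWARD); }
```

For example, brev32_ref(1u) yields 0x80000000u. Under the stated assumption, add_round_up(1.0f, 1e-10f) returns the next float above 1.0f, while add_round_down(1.0f, 1e-10f) returns exactly 1.0f; that gap is the kind of difference directed rounding exposes, for instance when computing interval bounds.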

Texturing from pitchlinear memory

Improved OGL interop performance.

Context creation flags can be set in CUDART.

Official support for Ubuntu 8.10, RHEL 5.3, and Fedora 10

cuda-gdb for 64-bit Linux
