A Simple CUDA Program: DeviceInfo


In "nVidia CUDA API (Part 2)" it was mentioned that the CUDA SDK provides some basic device-management interfaces, that is, functions like cudaSetDevice( int ). The purpose and usage of these functions should already be covered well enough in that article.

The reason for bringing this up separately here is that Heresy ran into some problems at run time. The main issue is that when a CUDA program computes for too long on the primary display adapter, the Windows desktop may stop updating. This has also been raised on the nVidia forum (see: "Updating desktop to be stopping while running CUDA"), and the FAQ mentions it as well:

33. What is the maximum kernel execution time?
On Windows, individual GPU program launches have a maximum run time of around 5 seconds. Exceeding this time limit usually will cause a launch failure reported through the CUDA driver or the CUDA runtime, but in some cases can hang the entire machine, requiring a hard reset.
This is caused by the Windows "watchdog" timer that causes programs using the primary graphics adapter to time out if they run longer than the maximum allowed time.
For this reason it is recommended that CUDA is run on a GPU that is NOT attached to a display and does not have the Windows desktop extended onto it. In this case, the system must contain at least one NVIDIA GPU that serves as the primary graphics adapter.

So the official recommendation is simply to run CUDA programs on a GPU that is not driving a display.
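
As a side note: newer CUDA runtimes expose a kernelExecTimeoutEnabled field in cudaDeviceProp that reports whether the watchdog run-time limit applies to a device; it is not guaranteed to exist in the 1.x runtime used in this post. If such a runtime is available, a minimal sketch (the helper name is just for illustration) for finding a GPU without the limit could look like this:

#include <stdio.h>
#include <cuda_runtime.h>

// Return the index of the first device whose kernels are NOT subject to the
// watchdog run-time limit, or -1 if every device has the timeout enabled.
// (kernelExecTimeoutEnabled is only available in newer CUDA runtimes.)
int FindDeviceWithoutWatchdog()
{
    int iCount = 0;
    cudaGetDeviceCount( &iCount );
    for( int i = 0; i < iCount; ++i )
    {
        cudaDeviceProp sProp;
        cudaGetDeviceProperties( &sProp, i );
        if( sProp.kernelExecTimeoutEnabled == 0 )
            return i;
    }
    return -1;
}

A device that reports 0 here is typically one that is not driving the Windows desktop, which matches the setup the FAQ recommends.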

The program below does two main things: the first part lists the devices on the current machine that can run CUDA, and the second part sets the device that CUDA will use.

#include "stdio.h"
#include "cuda_runtime.h"

// output given cudaDeviceProp
void OutputSpec( const cudaDeviceProp sDevProp )
{
printf( "Device name: %s ", sDevProp.name );
printf( "Device memory: %d ", sDevProp.totalGlobalMem );
printf( " Memory per-block: %d ", sDevProp.sharedMemPerBlock );
printf( " Register per-block: %d ", sDevProp.regsPerBlock );
printf( " Warp size: %d ", sDevProp.warpSize );
printf( " Memory pitch: %d ", sDevProp.memPitch );
printf( " Constant Memory: %d ", sDevProp.totalConstMem );
printf( "Max thread per-block: %d ", sDevProp.maxThreadsPerBlock );
printf( "Max thread dim: ( %d, %d, %d ) ", sDevProp.maxThreadsDim[0], sDevProp.maxThreadsDim[1], sDevProp.maxThreadsDim[2] );
printf( "Max grid size: ( %d, %d, %d ) ", sDevProp.maxGridSize[0], sDevProp.maxGridSize[1], sDevProp.maxGridSize[2] );
printf( "Ver: %d.%d ", sDevProp.major, sDevProp.minor );
printf( "Clock: %d ", sDevProp.clockRate );
printf( "textureAlignment: %d ", sDevProp.textureAlignment );
}

void main()
{
// part1, check the number of device
int iDeviceCount = 0;
cudaGetDeviceCount( &iDeviceCount );
printf( "Number of GPU: %d ", iDeviceCount );
if( iDeviceCount == 0 )
{
printf( "No supported GPU " );
return;
}

// part2, output information of each device
for( int i = 0; i < iDeviceCount; i )
{
printf( " === Device %i === ", i );
cudaDeviceProp sDeviceProp;
cudaGetDeviceProperties( &sDeviceProp, i );
OutputSpec( sDeviceProp );
}

// part3, set CUDA to use the second device
cudaSetDevice( 1 );

// part4, do something
...

}

Here, the function void OutputSpec( const cudaDeviceProp sDevProp ) is responsible for printing out the information of a CUDA device (passed in as a cudaDeviceProp); the formatting is just something Heresy threw together.

Part 1 in main() first gets the number of CUDA devices, and part 2 then loops over each device, retrieving and printing its information. On the machine Heresy uses for work, which has an 8800GT and an 8800GTX installed, running this program prints the following:

Number of GPU: 2


=== Device 0 ===
Device name: GeForce 8800 GT
Device memory: 536543232
Memory per-block: 16384
Register per-block: 8192
Warp size: 32
Memory pitch: 262144
Constant Memory: 65536
Max thread per-block: 512
Max thread dim: ( 512, 512, 64 )
Max grid size: ( 65535, 65535, 1 )
Ver: 1.1
Clock: 1512000
textureAlignment: 256

=== Device 1 ===
Device name: GeForce 8800 GTX
Device memory: 805044224
Memory per-block: 16384
Register per-block: 8192
Warp size: 32
Memory pitch: 262144
Constant Memory: 65536
Max thread per-block: 512
Max thread dim: ( 512, 512, 64 )
Max grid size: ( 65535, 65535, 1 )
Ver: 1.0
Clock: 1350000
textureAlignment: 256

You can also see here that apart from the basic version and clock rate, the only other difference between the two cards seems to be the amount of memory! All of the CUDA-related parameters are exactly the same.

The cudaSetDevice( 1 ); in part 3 tells the CUDA driver to use the second device (the first one is 0) for CUDA computation; this way, the work done in part 4 will not run on the primary display device.
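
Since the sample above hard-codes device 1, a slightly more defensive variant (just a sketch with a hypothetical helper name, not part of the original sample) would check the device count and the return value of cudaSetDevice( ) before continuing:

#include <stdio.h>
#include <cuda_runtime.h>

// select the second CUDA device if it exists, otherwise fall back to device 0;
// returns the selected device index, or -1 on failure
int SelectComputeDevice()
{
    int iDeviceCount = 0;
    cudaGetDeviceCount( &iDeviceCount );

    // device indices start at 0, so index 1 is the second card
    int iTarget = ( iDeviceCount > 1 ) ? 1 : 0;
    cudaError_t err = cudaSetDevice( iTarget );
    if( err != cudaSuccess )
    {
        printf( "cudaSetDevice(%d) failed: %s\n", iTarget, cudaGetErrorString( err ) );
        return -1;
    }
    return iTarget;
}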

Also, since CUDA currently has two versions, 1.0 and 1.1, if you need to check which version a device supports, you can use the two fields cudaDeviceProp.major and cudaDeviceProp.minor to tell them apart.
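
For example, a small helper (FindDeviceByVersion is just an illustrative name, not part of the original post) that scans the devices and returns the first one supporting at least a given version could look like this:

#include <cuda_runtime.h>

// return the index of the first device whose compute capability is at least
// iMajor.iMinor, or -1 if no such device exists
int FindDeviceByVersion( int iMajor, int iMinor )
{
    int iCount = 0;
    cudaGetDeviceCount( &iCount );
    for( int i = 0; i < iCount; ++i )
    {
        cudaDeviceProp sProp;
        cudaGetDeviceProperties( &sProp, i );
        if( sProp.major > iMajor ||
            ( sProp.major == iMajor && sProp.minor >= iMinor ) )
            return i;
    }
    return -1;
}

With the two cards above, FindDeviceByVersion( 1, 1 ) would return 0 (the 8800 GT), since the 8800 GTX only reports 1.0.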

That said, most people probably don't have two 8-series graphics cards in one machine anyway… ^^"
