CUDA Volume Rendering [Part.2 程式流程]

在前面 part.1 的部分，已經先做了一些簡單的介紹；接下來，開始看程式碼吧∼這一部分，主要是看 volumeRender.cu 中 main function，並大概講一下 volume data 的前置處理。

在 main() 函式裡，主要分成幾個部分：

透過 cutil 來處理執行時的參數。
讀取 volume data，並初始化 volume 資料
設定 glut 環境
起始設定 pixel buffer object
執行 glut 的 main loop

第一部分主要就是透過 cutil 的 cutGetCmdLineArgumenti() 這個函式，來處理 main() 的 argc 以及 argv。而這個程式是設計成可以指定檔案的檔名以及大小，例如：「-file=brain.raw -xsize=512 -ysize=512 -zsize=125」就是指定檔名為 brain.raw，x, y, z 三軸的大小依序為 512, 512, 125。

而第二部分則是先透過 cutFindFilePath() 去找出檔案完整的路徑，然後再透過 loadRawFile() 來讀取 RAW 資料成為一個 unsigned char 的陣列；這邊的 RAW 資料，原則上就是一張一張 2D 灰階圖組合成的單一檔案。接著，就是透過 initCuda()，來把讀取進來 uchar 陣列（本程式一開始就把 uchar 定義為 unsigned char，以下將以 uchar 沿用），轉換成 ray casting 需要的 3D Texture 了∼
（由於他是使用 cudaArray 來 bind 到 3D texture，所以建議可以先參考看看之前寫的《CUDA Texture Part.1 簡介》）

接下來，就是直接看 initCuda() 這個函式了∼這個函式裡做的事，主要包含了兩部分：

建立 volume 資料本身的 3D texture
建立將 volume 資料的灰階，對應到彩色的 transfer function，以及他的 1D texture

將 volume data 建立成 3D texture 的程式如下：

// create 3D array
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc&lt;uchar&gt;();
CUDA_SAFE_CALL( cudaMalloc3DArray(&amp;d_volumeArray, &amp;channelDesc, volumeSize) );
 
// The sample does not work with pageable memory 
// This is a known issue for beta that will be fixed for the public release 
#define USE_PAGE_LOCKED_MEMORY 1 
#if USE_PAGE_LOCKED_MEMORY 
// copy to page-locked mem 
cudaPitchedPtr pagelockedPtr; 
pagelockedPtr.pitch = volumeSize.width*sizeof(uchar); 
pagelockedPtr.xsize = volumeSize.width; 
pagelockedPtr.ysize = volumeSize.height; 
size_t size = volumeSize.width*volumeSize.height*volumeSize.depth*sizeof(uchar); 
CUDA_SAFE_CALL( cudaMallocHost(&amp;(pagelockedPtr.ptr), size) ); 
memcpy(pagelockedPtr.ptr, h_volume, size); 
#endif 
 
// copy data to 3D array 
cudaMemcpy3DParms copyParams = {0}; 
#if USE_PAGE_LOCKED_MEMORY 
copyParams.srcPtr   = pagelockedPtr; 
#else 
copyParams.srcPtr   = make_cudaPitchedPtr((void*)h_volume,
                               volumeSize.width*sizeof(uchar), volumeSize.width, volumeSize.height); 
#endif 
copyParams.dstArray = d_volumeArray;
copyParams.extent   = volumeSize; 
copyParams.kind     = cudaMemcpyHostToDevice; 
CUDA_SAFE_CALL( cudaMemcpy3D(&amp;copyParams) );
  
// set texture parameters 
tex.normalized = true;                      // access with normalized texture coordinates 
tex.filterMode = cudaFilterModeLinear;      // linear interpolation 
tex.addressMode[0] = cudaAddressModeClamp;  // wrap texture coordinates 
tex.addressMode[1] = cudaAddressModeClamp; 
 
// bind array to 3D texture 
CUDA_SAFE_CALL(cudaBindTextureToArray(tex, d_volumeArray, channelDesc));

在這邊，讀進來的資料是 h_volume，他是一個 uchar 的陣列。而和 2D CUDA Array 時一樣，先透過一個 cudaChannelFormatDesc 來描述 cudaArray 的資料型別，然後再透過 cudaMalloc3DArray() 來建立一個大小為 volumeSize 的 3D cudaArray：d_volumeArray。

不過，在註解裡有寫，目前的 2.0 beta 版，還不能直接把一般的記憶體空間的資料，直接複製到 3D 的 cudaArray 裡；必須要先透過 cudaMallocHost() 宣告一個 page lock 的記憶體空間，並把 h_volume 的資料複製到這個 page lock 的變數 pagelockedPtr。然後再設定 cudaMemcpy3DParms，把資料用 cudaMemcpy3D() 複製到 d_volumeArray。比起來，是多了一個先轉成 page lock 變數的過程，不過這個問題在正式版發佈時，應該是會解決的掉的。

而把資料複製到 cudaArray 後，就是設定一下 3D texture 的參數，然後再透過 cudaBindTextureToArray()，把 cudaArray d_volumeArray bind 到 texture<uchar, 3, cudaReadModeNormalizedFloat> tex 了。

而在把 Volume 資料本身處裡完了之後，接下來還有一份額外的陣列。由於在醫學的 volume rendering 中，CT、MRI 這些資料都是灰階的；如果要用彩色來呈現、凸顯某些部位的話，大部分都是用一個 transfer function 來做色彩的對應。

在 initCuda() 這個函式中的後段，就是在處理這份資料。

// create transfer function texture 
float4 transferFunc[] = {
     {  0.0, 0.0, 0.0, 0.0, },
     {  1.0, 0.0, 0.0, 1.0, },
     {  1.0, 0.5, 0.0, 1.0, },
     {  1.0, 1.0, 0.0, 1.0, },
     {  0.0, 1.0, 0.0, 1.0, },
     {  0.0, 1.0, 1.0, 1.0, },
     {  0.0, 0.0, 1.0, 1.0, },
     {  1.0, 0.0, 1.0, 1.0, },
     {  0.0, 0.0, 0.0, 0.0, },
};
 
cudaChannelFormatDesc channelDesc2 = cudaCreateChannelDesc&lt;float4&gt;();
cudaArray* d_transferFuncArray;
CUDA_SAFE_CALL(cudaMallocArray(
          &amp;d_transferFuncArray, &amp;channelDesc2, sizeof(transferFunc)/sizeof(float4), 1));  
CUDA_SAFE_CALL(cudaMemcpyToArray(
          d_transferFuncArray, 0, 0, transferFunc, sizeof(transferFunc), cudaMemcpyHostToDevice));
 
transferTex.filterMode = cudaFilterModeLinear; 
transferTex.normalized = true;    // access with normalized texture coordinates 
transferTex.addressMode[0] = cudaAddressModeClamp;   // wrap texture coordinates 
 
// Bind the array to the texture 
CUDA_SAFE_CALL( cudaBindTextureToArray( transferTex, d_transferFuncArray, channelDesc2));

在這個範例中，他是先宣告一個 float4 的陣列 transferFunc，裡面算是有九個控制點，分別代表灰階值不同所要呈現的階段顏色（介於中間的值，會透過 texture 用線性內插來算）。而這邊他就是用一般的 1D cudaArray 來做，也就不多加介紹了；最後會拿來用的，就是已經 bind 好資料的 texture<float4, 1, cudaReadModeElementType> transferTex。

再來的第三部分，則是透過 glut 來建立 OpenGL 的環境。

// initialize GLUT callback functions 
glutInit(&amp;argc, argv); 
glutInitDisplayMode(GLUT_RGB | GLUT_DOUBLE); 
glutInitWindowSize(width, height); 
glutCreateWindow(&quot;CUDA volume rendering&quot;); 
glutDisplayFunc(display); 
glutKeyboardFunc(keyboard); 
glutMouseFunc(mouse); 
glutMotionFunc(motion); 
glutReshapeFunc(reshape); 
glutIdleFunc(idle);
  
glewInit(); 
if (!glewIsSupported(&quot;GL_VERSION_2_0 GL_ARB_pixel_buffer_object&quot;)) {
     fprintf(stderr, &quot;Required OpenGL extensions missing.&quot;);
     exit(-1); 
}

前面兩行，是在起始化整個 OpenGL 的環境，而第三、四行則是在建立一個 width * height，標題為「CUDA volume rendering」的視窗。而之後的 glut*Func() 則是在設定不同的 callback function。而之後，則是做 glew 的起始化，並確認目前的 OpenGL 環境，是否有支援必要的 pixel buffer object。

而第四部分，就是在 initPixelBuffer() 這個函式中，建立用來當輸出結果的 pixel buffer object。

if (pbo) {
     // delete old buffer
     CUDA_SAFE_CALL(cudaGLUnregisterBufferObject(pbo));
     glDeleteBuffersARB(1, &amp;pbo); 
}
  
// create pixel buffer object for display 
glGenBuffersARB(1, &amp;pbo); 
glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pbo); 
glBufferDataARB(GL_PIXEL_UNPACK_BUFFER_ARB, width*height*sizeof(GLubyte)*4, 0, GL_STREAM_DRAW_ARB); 
glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, 0);
  
CUDA_SAFE_CALL(cudaGLRegisterBufferObject(pbo));
  
// calculate new grid size 
gridSize = dim3(iDivUp(width, blockSize.x), iDivUp(height, blockSize.y));

第一段的 if 是判斷 pbo 這個 buffer object（型別是 GLuint）是否已經被建立，如果有的話，就先把現有的刪除掉。而第二段就是透過 OpenGL 的函式，來建立大小為 width * height，每一點的資料是 4 個 GLubyte 的一塊 GL_PIXEL_UNPACK_BUFFER_ARB 了∼

而在 CUDA 的部分，則是需要透過 cudaGLRegisterBufferObject() 這個函式來註冊 pbo 這個 buffer object；如此，之後才能在 kernel 程式中存取透過 cudaGLMapBufferObject() 所取得的記憶體位址。

當使用完後，如果在 if 的區段中所做，除了必須要透過 OpenGL 的 glDeleteBuffersARB() 來把 buffer object 刪除外，在之前也需要使用 cudaGLUnegisterBufferObject() 來取消這份 buffer object 的註冊。

而在 main() 中最後的 glutMainLoop()，就是開始執行 OpenGL 的 main loop；之後，要顯示內容、控制程式，就是要靠之前設定的 callback function 囉∼

原始發表：Part.2 程式流程

CUDA Volume Rendering [Part.2 程式流程]

Leave a Reply 取消回覆

Related Posts

簡易的程式平行化－OpenMP（五） 變數的平行化

GitLab 簡單的 C++ 專案腳本範例

C++20 std::format part 3 – 自定義型別的處理

簡易的程式平行化－OpenMP（五）變數的平行化