Compile and Debug CUDA enabled mex-file using Visual Studio 2010

Yuanjun Xiong, MMLAB CUHK

Why So Many Tools?

This should be the first question, actually. The answer is to provide more flexibility in coding.

When Mex meets Cuda.

Mex-files are C programs that can be invoked directly by MATLAB, one of the most powerful math software packages on this planet. Usually, we can write a standard C file and use the utility provided by MATLAB to compile and link it. Here is a taste of the C code in a mex source file.

#include "mex.h"
#include <stdio.h>
void mexFunction(int nlhs, mxArray *plhs[],
    int nrhs, const mxArray *prhs[]){
        printf("Hello, Matlab\n");
        return;
    }

Assume you save this snippet as hello.c. Then, in the interactive console of MATLAB, type and enter

mex hello.c

You will get the compiled mex file called hello.mexw64. You can invoke it in Matlab as

hello()

Then you get the output

Hello, Matlab

However, if we want to debug this little program, it becomes a little involved. MATLAB does not provide a debugging tool of its own, since mex is just an extension. MathWorks suggests using Visual Studio’s “attach to process” feature for debugging.

Things become more tangled when we add some CUDA code to the mex file. MATLAB also has a tool to compile CUDA-enabled code and, once again, no debugging tools. It is not difficult to imagine the mess when host code is mixed with device code and we have no idea what is happening inside it.

This is why I wrote this guide.

The Code

We try to compile this small program, say VS_CUDA_MEX.cu.

#include "mex.h"
#include <cuda.h>
#include <cuda_runtime.h>

__global__ void test_cuda_vs(int a, int b){
    return;
}

void mexFunction(int nlhs, mxArray *plhs[],
    int nrhs, const mxArray *prhs[]){

#define D_OUT plhs[0]
        D_OUT = mxCreateDoubleMatrix(1,1,mxREAL);
        double* out = mxGetPr(D_OUT);
        test_cuda_vs<<<1,1>>>(1,2);
        out[0] = 0;
        return;

}

In matlab, we can use the command mex VS_CUDA_MEX.cu to compile it. Before doing this, remember to copy this file

%matlabroot%\toolbox\distcomp\gpu\extern\src\mex\win64\mexopts.bat

to the folder where the source file resides, as suggested by MathWorks. You may refer to this for help setting up CUDA and the other prerequisites.

Build It with Visual Studio

Since we will be using VS for debugging, why not compile the code with it as well? Let’s try.

1. Create the project

We start by creating the project. Any configuration that can generate a DLL file will do. I use Visual Studio 2010, so we create an MFC DLL. In the IDE, click New Project -> Visual C++ -> MFC DLL in the menu, name your project, and finish the new project wizard. In this example, the project is named MexTemp.

2. The C++ Interface

We need to export a function to communicate with MATLAB. Every DLL file exports functions in a similar way. Let’s do this step by step.

Add a new file named mex_interface.cpp to the source folder.
Every MATLAB mex file has a single entry point called mexFunction(). We write this function in the cpp file, for example as in the code below. Since we are going to use CUDA, we also include the CUDA headers.

#include "stdafx.h"
#include "mex.h"
#include <cuda.h>
#include <cuda_runtime.h>

extern "C" void tester(int);

void mexFunction(int nlhs, mxArray *plhs[],
    int nrhs, const mxArray *prhs[]){

#define D_OUT plhs[0]

#define OPT_DIST prhs[3]
        D_OUT = mxCreateDoubleMatrix(1,1,mxREAL);
        double* out = mxGetPr(D_OUT);
        tester(1);
        out[0] = 0;
        return;
}

Please refer to Matlab documentation on how to write mex functions.
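
The listing above defines OPT_DIST as prhs[3], which is only valid when the caller passes at least four inputs. The following is a minimal sketch of an argument-count guard, written without mex.h so that it builds with any C++ compiler. The helper name mex_args_ok and its parameters are hypothetical and not part of the MATLAB API; in a real mexFunction you would report a failed check with mexErrMsgTxt.

```cpp
#include <cassert>

// Hypothetical guard, independent of mex.h.
// nlhs, nrhs:   the counts MATLAB passes to mexFunction.
// min_inputs:   how many prhs[] entries the function reads
//               (4 if prhs[3], i.e. OPT_DIST, is used).
// max_outputs:  how many plhs[] entries the function fills (1 for D_OUT).
bool mex_args_ok(int nlhs, int nrhs, int min_inputs, int max_outputs) {
    return nrhs >= min_inputs && nlhs <= max_outputs;
}

// Usage sketch for the sample function, which reads no inputs
// and fills exactly one output.
void check_sample_signature() {
    assert(mex_args_ok(0, 0, 0, 1));   // mexTemp() with the result left in ans
    assert(mex_args_ok(1, 0, 0, 1));   // x = mexTemp()
    assert(!mex_args_ok(2, 0, 0, 1));  // [x, y] = mexTemp() asks for too much
    assert(!mex_args_ok(1, 3, 4, 1));  // prhs[3] would be out of range
}
```

The check allows nlhs to be 0 because MATLAB still accepts plhs[0] in that case and stores it in ans.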

3. Device Code And The Wrapper

You may wonder why there is an extern "C" in the code. We need a cpp file to export the mex function, but the CUDA device code must be written in files with the .cu extension. So we have to write a “wrapper” that exposes the device code to the mex function; declaring it extern "C" gives it an unmangled name that both files agree on.

To do this, add a new file called my_kernel.cu to the source folder. In this file, we write the __global__ CUDA function and the wrapper, say tester().

#include <cuda.h>
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void test_cuda_vs(int a, int b){
    return;
}

extern "C" void tester(int a)
{
    int test = a;
    test_cuda_vs<<< 1, 1 >>> (1, test);
}

This example is very simple. It runs one thread and does nothing.

But it serves as a complete template for any new content. The function __global__ void test_cuda_vs(int a, int b) is the CUDA kernel. Please refer to the CUDA Programming Guide for more details.

The host function tester() acts as the wrapper for the device function. We can call it from the mex function or anywhere else to invoke the device function indirectly.
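
The wrapper pattern can be tried without a GPU. The following is a minimal host-only sketch, assuming the kernel launch is replaced by an ordinary function. The names test_cuda_vs_host and g_last_b are hypothetical stand-ins and not part of the project. It only illustrates that the declaration seen by mex_interface.cpp and the definition in my_kernel.cu carry the same extern "C" linkage, so both refer to the same symbol.

```cpp
#include <cassert>

// Declaration as it appears in mex_interface.cpp: C linkage, no name mangling.
extern "C" void tester(int);

// Hypothetical stand-in for the kernel. In my_kernel.cu this is a __global__
// function launched with <<<1, 1>>>; here it is plain host code so the sketch
// builds with any C++ compiler.
int g_last_b = 0;  // hypothetical: records the last argument for inspection

void test_cuda_vs_host(int a, int b) {
    (void)a;       // unused, mirrors the empty kernel
    g_last_b = b;
}

// Definition as it appears in my_kernel.cu: the same extern "C" linkage as
// the declaration above.
extern "C" void tester(int a) {
    int test = a;
    // In the real file this line is: test_cuda_vs<<<1, 1>>>(1, test);
    test_cuda_vs_host(1, test);
}

// Usage sketch: the caller only needs the declaration of tester().
void demo_wrapper_call() {
    tester(7);
    assert(g_last_b == 7);  // the wrapper forwarded its argument
}
```

If the declaration said extern "C" and the definition did not (or the other way round), the two files would refer to different symbol names and the link step would fail.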

4. The Module Definition File

To export the entry point to MATLAB, add a new file called mexTemp.def and write

LIBRARY "MexTemp"

EXPORTS mexFunction

We now have all the code we need. Let’s configure the project to compile it.

5. Project Configuration

Running the build command in Visual Studio to compile the code is straightforward. But before that, we have to set some properties. Let’s go through them.

Set up a configuration
a. In the Solution Explorer (usually on the left side of the IDE), right-click the project name and click “Properties”. At the top of the dialog you should see Configuration Manager. Open it. In the Active solution platform dropdown, select <New...>, choose x64 as the new platform, then click OK. Now we have the x64 platform.
b. Select “x64” as the active platform.
Set up the CUDA compiler
a. Right-click the project name again, click Build Customizations and check CUDA 5.5(.targets, .props). The name varies with your CUDA version. This tells VS to use NVCC to compile .cu files.
b. In Properties -> CUDA C/C++ -> Device -> Code Generation, choose the compute capability of your GPU.
c. In Properties -> CUDA C/C++ -> Host -> Runtime Library, choose Multi-threaded Debug DLL (/MDd).
d. In Properties -> Linker -> Input -> Additional Dependencies, add cudart.lib. You may also specify the CUDA include and library paths in Properties -> VC++ Directories -> Include Directories and Properties -> VC++ Directories -> Library Directories.
Set up the C++ compiler & linker
This part is even more involved, and problems may occasionally occur. So good luck!
a. In Properties -> Linker -> Input (Additional Dependencies), add the following libraries. Locate these files on your system and use their actual paths.
C:\Program Files\MATLAB\R2013a\extern\lib\win64\microsoft\libmx.lib
C:\Program Files\MATLAB\R2013a\extern\lib\win64\microsoft\libmex.lib
C:\Program Files\MATLAB\R2013a\extern\lib\win64\microsoft\libmat.lib
b. In Properties -> Linker -> General -> Output File, change the name of the output file to $(OutDir)$(ProjectName).mexw64.
c. In Properties -> C/C++ -> General -> Additional Include Directories, add
C:\Program Files\MATLAB\R2013a\extern\include.
d. In Properties -> C/C++ -> Code Generation -> Runtime Library, choose Multi-threaded Debug DLL (/MDd). This keeps the runtime library consistent between the code compiled by NVCC and the code compiled by the C++ compiler.
e. In Properties -> Linker -> Command Line, add the additional flags /export:mexFunction /dll.
f. In Properties -> General -> Target Extension, change the extension to mexw64 to eliminate a possible warning.
g. In Properties -> Linker -> Input -> Module Definition File, set the module definition file to .\mexTemp.def.
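
The MATLAB paths in the steps above all hang off one install root. As a rough aid for adapting them to another installation, here is a small sketch that derives the three import libraries and the include directory from a given root. The helper names mex_import_libs and mex_include_dir are hypothetical; the sub-paths are the ones quoted in this guide and may differ for other MATLAB releases or platforms.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: the three import libraries for the linker,
// built from the MATLAB install root.
std::vector<std::string> mex_import_libs(const std::string& matlab_root) {
    const std::string lib_dir =
        matlab_root + "\\extern\\lib\\win64\\microsoft\\";
    return { lib_dir + "libmx.lib", lib_dir + "libmex.lib", lib_dir + "libmat.lib" };
}

// Hypothetical helper: the include directory for the C++ compiler.
std::string mex_include_dir(const std::string& matlab_root) {
    return matlab_root + "\\extern\\include";
}

// Usage sketch with the install location used in this guide.
void check_r2013a_paths() {
    const std::string root = "C:\\Program Files\\MATLAB\\R2013a";
    const std::vector<std::string> libs = mex_import_libs(root);
    assert(libs.size() == 3);
    assert(libs[0] == root + "\\extern\\lib\\win64\\microsoft\\libmx.lib");
    assert(mex_include_dir(root) == root + "\\extern\\include");
}
```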

You should now be able to build the project successfully.

Debugging in Visual Studio with Nsight

Many thanks to NVIDIA for providing Nsight. With it we can debug the device code just like CPU code. I am not willing to even recall the stupid experience I had when I tried debugging my GLSL code.

First, please download and install Nsight. Then add NSIGHT_CUDA_DEBUGGER=1 to the environment variables. Now let’s try it.
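
A process reads its environment variables when it starts, so a MATLAB or Visual Studio instance that was already open when the variable was added will not see it until it is restarted. The following is a minimal sketch of a check you could compile and run from a fresh console to confirm that the variable is visible. The helper names flag_is_one and nsight_debugger_flag_set are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: true only when the value is exactly "1".
bool flag_is_one(const char* value) {
    return value != nullptr && std::strcmp(value, "1") == 0;
}

// Reads NSIGHT_CUDA_DEBUGGER from the environment of the current process.
// std::getenv returns a null pointer when the variable is not set.
bool nsight_debugger_flag_set() {
    return flag_is_one(std::getenv("NSIGHT_CUDA_DEBUGGER"));
}

// Usage sketch for the parsing helper.
void check_flag_parsing() {
    assert(flag_is_one("1"));
    assert(!flag_is_one("0"));
    assert(!flag_is_one(nullptr));  // variable not set at all
}
```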

First open MATLAB and navigate to the folder containing the mex file we have just generated. Try running it with mexTemp(). It should work fine. Then we try Nsight’s debugging functionality.

In the IDE, add some breakpoints in the device function. Then open the menu Tools -> Attach to Process. In the pop-up, switch the Transport to Nsight GPU Debugger.

In the Available Processes list, attach to the MATLAB process. You may first need to set the Qualifier to the name of your PC.
The IDE will now switch to the debugging interface. Try running the mex function in MATLAB. It should pause at the breakpoint you specified.

Now you can debug the code just like ordinary C++ code.

To also break at breakpoints in the host function, attach to MATLAB with the default transport.

A Working Solution as Template

Thanks for reading so many words. Here is a working solution template as a reward for your patience.

Download it here.

Usage:

Set a system environment variable matlabroot pointing to the install location of MATLAB (say, C:\Program Files\MATLAB\R2013a). You may need to log off and log in again for the variable to take effect.
Extract the contents of the zip file.
In the folder, open mexFunction.sln. There should be two projects in the solution: mexFunction and mexFunction_CUDA. The CUDA-enabled project may fail to load if you don’t have CUDA installed or are running a different version. But the pure mex project should be OK.
Rename the project to the function name you want, so that the mex file gets the right name.

Integrate into your solutions

I have also provided a property sheet in \properties, called mex.props. You can import it in the Property Manager of VS. Then add a cpp file containing the MATLAB mexFunction. The property sheet will teach the IDE to build mex files.

Look at these links for more reference

Trick and Tips: Debugging CUDA Matlab Mex file codes
StackOverflow: Debug cuda kernel on mexfunction by attaching Matlab with Nsight GPU Debuger

Contact me:
yjxiong@ie.cuhk.edu.hk

MMLAB, CUHK
Mar 5, 2014