I needed to compile and debug a CUDA ".cu" file on my Windows 10 Pro machine with an Nvidia Quadro 2000 GPU card (yes, an old HP Z600 workstation with an old GPU). Installing CUDA 10.2 and the last Quadro 2000 driver (377.83 from 2017) with Visual Studio 2019 worked great.
Until I needed to debug.
Nsight unable to debug error
Break points ignored. Nsight message: A CUDA context was created on a GPU that is not currently debuggable. Breakpoints will be disabled.
Adapter: Quadro 2000 |
After some trial and error (errors at the bottom of this blog entry) I have a working environment that can debug in Visual Studio and compile on the command line. This required going back in time and using executables released more than five years ago. Luckily, all of the problems I encountered were solved long ago and are easy to find online.
I first had to determine that the Nvidia Quadro 2000 GPU uses the Fermi microarchitecture (https://en.wikipedia.org/wiki/CUDA). The last Nsight version (used for CUDA debugging) that works with Fermi is Nsight 4.7, which comes with CUDA 7.5. This means features that require GPUs newer than Fermi will not work in this environment (e.g. cudaMallocManaged).
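A quick way to confirm the architecture from code, rather than from a lookup table, is to ask the runtime for the compute capability. The following is a minimal sketch, not part of my original setup: it assumes the `managedMemory` field of `cudaDeviceProp` is available in your toolkit (I believe it has been there since CUDA 6.0), and the "Fermi" label simply relies on Fermi devices reporting compute capability 2.x.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            printf("cudaGetDeviceProperties(%d) failed: %s\n",
                   dev, cudaGetErrorString(err));
            continue;
        }

        // Fermi devices report compute capability 2.x
        printf("Device %d: %s, compute capability %d.%d%s\n",
               dev, prop.name, prop.major, prop.minor,
               prop.major == 2 ? " (Fermi)" : "");

        // Assumption: managedMemory is present in this toolkit's cudaDeviceProp
        printf("  managed memory supported: %s\n",
               prop.managedMemory ? "yes" : "no");
    }
    return 0;
}
```

Compiled with nvcc like any other .cu file, this reports the same "Major/Minor version number" that deviceQuery prints, in a form that is easy to check before choosing an Nsight and CUDA version.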
Install
Verify Visual Studio 2013 works with CUDA .cu files and GPU
Following: https://riptutorial.com/cuda
In VS2013 Menu Bar: File -> Open -> Project/Solution, open the CUDA samples solution Samples_vs2013.sln.
- In the Solution Explorer (panel on right hand side), Highlight 1_Utilities -> DeviceQuery
- Right-click, Build Solution
- VS2013 Menu Bar: DEBUG -> Start Without Debugging
Result (success) DeviceQuery.cpp
E:\CUDA\CUDA_v7.5\Samples\1_Utilities\deviceQuery\../../bin/win64/Debug/deviceQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Quadro 2000"
CUDA Driver Version / Runtime Version 7.5 / 7.5
CUDA Capability Major/Minor version number: 2.1
Total amount of global memory: 1024 MBytes (1073741824 bytes)
( 4) Multiprocessors, ( 48) CUDA Cores/MP: 192 CUDA Cores
GPU Max Clock rate: 1251 MHz (1.25 GHz)
Memory Clock rate: 1304 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (65535, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 15 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = Quadro 2000
Result = PASS
Press any key to continue . . .
|
Verify Visual Studio 2013 CUDA debugging
Creating a new CUDA project - https://www.youtube.com/watch?v=2EbHSCvGFM0
VS2013 Menu Bar: File -> New Project -> Installed -> Templates -> NVIDIA -> CUDA 7.5
Name the Project checkDebug, click OK

Build and run kernel.cu without debugging (to confirm it works first).
- VS2013 Menu Bar: BUILD -> Build Solution
- VS2013 Menu Bar: DEBUG -> Start Without Debugging
Result (success): checkDebug
{1,2,3,4,5} + {10,20,30,40,50} = {11,22,33,44,55}
Press any key to continue . . . |
Run with debugging
- In kernel.cu, set a breakpoint on line 18: const int arraySize = 5;
- (place the cursor on the line and press F9, or click in the far left column)
- VS2013 Menu Bar: NSIGHT -> Start CUDA Debugging
Nsight will start (will require Administrator privileges to launch), the green icon will appear in the taskbar.

Press F10 to step through the lines. The Autos and Call Stack windows should update as you step.
Result (success)
{1,2,3,4,5} + {10,20,30,40,50} = {11,22,33,44,55} |
Create custom CUDA code in Visual Studio 2013
Try CPU code only
vectorAddCPU.cu
#include <stdio.h>
#include <stdlib.h>   // malloc, free
#define SIZE 1024
// Element-wise addition on the CPU: c[i] = a[i] + b[i]
void VectorAdd(int *a, int *b, int *c, int n)
{
    int i;
    for (i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}
int main()
{
    int *a, *b, *c;
    a = (int *)malloc(SIZE * sizeof(int));
    b = (int *)malloc(SIZE * sizeof(int));
    c = (int *)malloc(SIZE * sizeof(int));
    for (int i = 0; i < SIZE; ++i)
    {
        a[i] = i;
        b[i] = i;
        c[i] = 0;
    }
    VectorAdd(a, b, c, SIZE);
    // Print only the first 10 results
    for (int i = 0; i < 10; ++i)
        printf("c[%d] = %d\n", i, c[i]);
    free(a);
    free(b);
    free(c);
    return 0;
}
|
- VS2013 Menu Bar: BUILD -> Build Solution
Build (success): vectorAddCPU
1>------ Build started: Project: vectorAddCPU, Configuration: Debug Win32 ------
1> Compiling CUDA source file kernel.cu...
1>
1> E:\Projects\CudaTest (VisualStudio)\CUDA 7.5 - Check Dev Env\vectorAddCPU\vectorAddCPU>"E:\CUDA\CUDA_v7.5\Toolkit\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --use-local-env --cl-version 2013 -ccbin "E:\Microsoft Visual Studio\2013\VC\bin" -IE:\CUDA\CUDA_v7.5\Toolkit\include -IE:\CUDA\CUDA_v7.5\Toolkit\include -G --keep-dir Debug -maxrregcount=0 --machine 32 --compile -cudart static -g -DWIN32 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd " -o Debug\kernel.cu.obj "E:\Projects\CudaTest (VisualStudio)\CUDA 7.5 - Check Dev Env\vectorAddCPU\vectorAddCPU\kernel.cu"
1> kernel.cu
1> vectorAddCPU.vcxproj -> E:\Projects\CudaTest (VisualStudio)\CUDA 7.5 - Check Dev Env\vectorAddCPU\Debug\vectorAddCPU.exe
1> copy "E:\CUDA\CUDA_v7.5\Toolkit\bin\cudart*.dll" "E:\Projects\CudaTest (VisualStudio)\CUDA 7.5 - Check Dev Env\vectorAddCPU\Debug\"
1> E:\CUDA\CUDA_v7.5\Toolkit\bin\cudart32_75.dll
1> E:\CUDA\CUDA_v7.5\Toolkit\bin\cudart64_75.dll
1> 2 file(s) copied.
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ========== |
Run vectorAddCPU (a.k.a. kernel.cu)
- VS2013 Menu Bar: DEBUG -> Start Without Debugging
Result (success): run vectorAddCPU.exe
c[0] = 0
c[1] = 2
c[2] = 4
c[3] = 6
c[4] = 8
c[5] = 10
c[6] = 12
c[7] = 14
c[8] = 16
c[9] = 18
Press any key to continue . . .
|
Try GPU code
- VS2013 Menu Bar: File -> New Project -> Installed -> Templates -> NVIDIA -> CUDA 7.5
- Name the Project helloWorldGPU, click OK
HelloWorldGPU.cu
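#include <stdio.h>
#include <stdlib.h>
// Kernel: runs on the device and prints its thread and block index
__global__ void print_from_gpu(void) {
    printf("Hello World! from thread [%d,%d] From device\n", threadIdx.x, blockIdx.x);
}
int main(void) {
    printf("Hello World from host!\n");
    // Launch one block containing one thread
    print_from_gpu<<<1, 1>>>();
    // Wait for the kernel to finish so its output is flushed
    cudaDeviceSynchronize();
    return EXIT_SUCCESS;
}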
|
Run HelloWorldGPU.cu
Result (success): HelloWorldGPU.exe
Hello World from host!
Hello World! from thread [0,0] From device
Press any key to continue . . .
|
Verify CUDA from command line
If missing, find the Visual Studio 2013 command prompt (https://stackoverflow.com/questions/21476588/where-is-developer-command-prompt-for-vs2013 )
Look in: C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Visual Studio 2013
Verify that CUDA appears in the PATH
>echo %PATH%
CUDA in VS2013 Command Prompt Path
E:\Microsoft Visual Studio\2013>echo %PATH%
E:\Microsoft Visual Studio\2013\Common7\IDE\CommonExtensions\Microsoft\TestWindow;C:\Program Files (x86)\Microsoft SDKs\F#\3.1\Framework\v4.0\;C:\Program Files (x86)\Microsoft SDKs\TypeScript\1.0;C:\Program Files (x86)\MSBuild\12.0\bin;E:\Microsoft Visual Studio\2013\Common7\IDE\;E:\Microsoft Visual Studio\2013\VC\BIN;E:\Microsoft Visual Studio\2013\Common7\Tools;C:\WINDOWS\Microsoft.NET\Framework\v4.0.30319;E:\Microsoft Visual Studio\2013\VC\VCPackages;C:\Program Files (x86)\HTML Help Workshop;E:\Microsoft Visual Studio\2013\Team Tools\Performance Tools;C:\Program Files (x86)\Windows Kits\8.1\bin\x86;C:\Program Files (x86)\Microsoft SDKs\Windows\v8.1A\bin\NETFX 4.5.1 Tools\;E:\CUDA\CUDA_v7.5\Toolkit\bin;E:\CUDA\CUDA_v7.5\Toolkit\libnvvp;;;;;;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;E:\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.25.28610\bin\Hostx64\x64;C:\Users\adminroot\.dnx\bin;C:\Program Files\Microsoft DNX\Dnvm\;C:\Program Files\Microsoft SQL Server\110\Tools\Binn\;C:\Program Files (x86)\Microsoft SDKs\TypeScript\1.0\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\Microsoft SQL Server\120\Tools\Binn\;C:\Users\aholi_000\AppData\Local\Microsoft\WindowsApps;
E:\Microsoft Visual Studio\2013>
|
With the VS2013 command prompt, verify nvcc works
>nvcc --version
Result (success) nvcc --version
E:\Microsoft Visual Studio\2013>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:49:10_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
E:\Microsoft Visual Studio\2013>
|
Navigate to where your CUDA 7.5 Samples are stored, into the 1_Utilities directory, in my case: E:\CUDA\CUDA_v7.5\Samples\1_Utilities\deviceQuery
Navigate to deviceQuery.cpp
E:\CUDA\CUDA_v7.5\Samples\1_Utilities\deviceQuery>dir
Volume in drive E is Data
Volume Serial Number is AAD3-8C76
Directory of E:\CUDA\CUDA_v7.5\Samples\1_Utilities\deviceQuery
04/30/2020 10:02 AM <DIR> .
04/30/2020 10:02 AM <DIR> ..
05/27/2015 04:39 PM 13,208 deviceQuery.cpp
08/16/2015 02:32 PM 871 deviceQuery_vs2010.sln
08/16/2015 02:32 PM 4,712 deviceQuery_vs2010.vcxproj
08/16/2015 02:32 PM 871 deviceQuery_vs2012.sln
08/16/2015 02:32 PM 4,757 deviceQuery_vs2012.vcxproj
04/29/2020 01:38 PM 18,219,008 deviceQuery_vs2013.sdf
04/29/2020 11:58 AM 950 deviceQuery_vs2013.sln
04/27/2020 09:39 AM 4,753 deviceQuery_vs2013.vcxproj
08/16/2015 02:32 PM 176 readme.txt
04/29/2020 11:12 AM <DIR> x64
13 File(s) 18,449,834 bytes
3 Dir(s) 1,450,845,192,192 bytes free
E:\CUDA\CUDA_v7.5\Samples\1_Utilities\deviceQuery>
|
Try compiling and running the CUDA sample deviceQuery.cpp
>nvcc -o testDevQuery deviceQuery.cpp
Error: fatal error C1083: Cannot open include file: 'helper_cuda.h': No such file or directory
nvcc compile failure: missing helper_cuda.h
E:\Microsoft Visual Studio\2013>cd \CUDA\CUDA_v7.5\Samples\1_Utilities\deviceQuery
E:\CUDA\CUDA_v7.5\Samples\1_Utilities\deviceQuery>dir
Volume in drive E is Data
Volume Serial Number is AAD3-8C76
Directory of E:\CUDA\CUDA_v7.5\Samples\1_Utilities\deviceQuery
04/29/2020 01:38 PM <DIR> .
04/29/2020 01:38 PM <DIR> ..
05/27/2015 04:39 PM 13,208 deviceQuery.cpp
08/16/2015 02:32 PM 871 deviceQuery_vs2010.sln
08/16/2015 02:32 PM 4,712 deviceQuery_vs2010.vcxproj
08/16/2015 02:32 PM 871 deviceQuery_vs2012.sln
08/16/2015 02:32 PM 4,757 deviceQuery_vs2012.vcxproj
04/29/2020 01:38 PM 18,219,008 deviceQuery_vs2013.sdf
04/29/2020 11:58 AM 950 deviceQuery_vs2013.sln
04/27/2020 09:39 AM 4,753 deviceQuery_vs2013.vcxproj
04/29/2020 12:14 PM 198,144 devQuery.exe
04/29/2020 12:14 PM 648 devQuery.exp
04/29/2020 12:14 PM 1,736 devQuery.lib
08/16/2015 02:32 PM 176 readme.txt
04/29/2020 11:12 AM <DIR> x64
12 File(s) 18,449,834 bytes
3 Dir(s) 1,450,922,340,352 bytes free
E:\CUDA\CUDA_v7.5\Samples\1_Utilities\deviceQuery>nvcc -o devQuery deviceQuery.cpp
deviceQuery.cpp
deviceQuery.cpp(20) : fatal error C1083: Cannot open include file: 'helper_cuda.h': No such file or directory
E:\CUDA\CUDA_v7.5\Samples\1_Utilities\deviceQuery>
|
Point nvcc at the samples' header directory with the -I option.
>nvcc -o devQuery -I E:\CUDA\CUDA_v7.5\Samples\common\inc deviceQuery.cpp
nvcc compile success
E:\CUDA\CUDA_v7.5\Samples\1_Utilities\deviceQuery>nvcc -o devQuery -I E:\CUDA\CUDA_v7.5\Samples\common\inc deviceQuery.cpp
deviceQuery.cpp
Creating library devQuery.lib and object devQuery.exp
E:\CUDA\CUDA_v7.5\Samples\1_Utilities\deviceQuery>dir
Volume in drive E is Data
Volume Serial Number is AAD3-8C76
Directory of E:\CUDA\CUDA_v7.5\Samples\1_Utilities\deviceQuery
04/30/2020 10:02 AM <DIR> .
04/30/2020 10:02 AM <DIR> ..
05/27/2015 04:39 PM 13,208 deviceQuery.cpp
08/16/2015 02:32 PM 871 deviceQuery_vs2010.sln
08/16/2015 02:32 PM 4,712 deviceQuery_vs2010.vcxproj
08/16/2015 02:32 PM 871 deviceQuery_vs2012.sln
08/16/2015 02:32 PM 4,757 deviceQuery_vs2012.vcxproj
04/29/2020 01:38 PM 18,219,008 deviceQuery_vs2013.sdf
04/29/2020 11:58 AM 950 deviceQuery_vs2013.sln
04/27/2020 09:39 AM 4,753 deviceQuery_vs2013.vcxproj
04/30/2020 10:04 AM 198,144 devQuery.exe
04/30/2020 10:04 AM 648 devQuery.exp
04/30/2020 10:04 AM 1,736 devQuery.lib
04/30/2020 10:02 AM 0 nvcc
08/16/2015 02:32 PM 176 readme.txt
04/29/2020 11:12 AM <DIR> x64
13 File(s) 18,449,834 bytes
3 Dir(s) 1,450,922,340,352 bytes free
E:\CUDA\CUDA_v7.5\Samples\1_Utilities\deviceQuery>
|
Run devQuery.exe
Result (success): devQuery.exe
E:\CUDA\CUDA_v7.5\Samples\1_Utilities\deviceQuery>devQuery.exe
devQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Quadro 2000"
CUDA Driver Version / Runtime Version 7.5 / 7.5
CUDA Capability Major/Minor version number: 2.1
Total amount of global memory: 1024 MBytes (1073741824 bytes)
( 4) Multiprocessors, ( 48) CUDA Cores/MP: 192 CUDA Cores
GPU Max Clock rate: 1251 MHz (1.25 GHz)
Memory Clock rate: 1304 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (65535, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 15 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = Quadro 2000
Result = PASS
E:\CUDA\CUDA_v7.5\Samples\1_Utilities\deviceQuery>
|
Create and Run New Project from Command Line
hello.cu
#include <stdio.h>
// __global__ functions, or "kernels", execute on the device
__global__ void hello_kernel(void)
{
    printf("Hello, world from the device!\n");
}
int main(void)
{
    // greet from the host
    printf("Hello, world from the host!\n");
    // launch a kernel with a single thread to greet from the device
    hello_kernel<<<1,1>>>();
    // wait for the device to finish so that we see the message
    cudaDeviceSynchronize();
    return 0;
}
|
Compile
>nvcc -o hello hello.cu
Build (success) hello.cu
E:\Projects\CudaTest (VisualStudio)\CUDA 7.5 - Check Dev Env\hello>nvcc -o hello hello.cu
hello.cu
Creating library hello.lib and object hello.exp
E:\Projects\CudaTest (VisualStudio)\CUDA 7.5 - Check Dev Env\hello>
|
Run
>hello
Result (success): hello.exe
E:\Projects\CudaTest (VisualStudio)\CUDA 7.5 - Check Dev Env\hello>hello
Hello, world from the host!
Hello, world from the device!
E:\Projects\CudaTest (VisualStudio)\CUDA 7.5 - Check Dev Env\hello>
|
Errors
Nsight debug problems
Nsight unable to debug error
Break points ignored. Nsight message: A CUDA context was created on a GPU that is not currently debuggable. Breakpoints will be disabled.
Adapter: Quadro 2000 |
My Nsight 5.2 doesn't work with Fermi family GPUs (e.g. the Quadro 2000)
https://stackoverflow.com/questions/43030274/a-cuda-context-was-created-on-a-gpu-that-is-not-currently-debuggable
Missing #include <stdio.h>
nvcc compile (fails): missing stdio.h
E:\Projects\CudaTest (VisualStudio)\CUDA 7.5 - Check Dev Env\test>nvcc -o test test.cu
test.cu
test.cu(8): error: identifier "printf" is undefined
1 error detected in the compilation of "C:/Users/AHOLI_~1/AppData/Local/Temp/tmpxft_000020dc_00000000-6_test.cpp4.ii".
|
Correct code
#include <stdio.h>
__global__ void foo() {}
int main()
{
    foo<<<1,1>>>();
    cudaDeviceSynchronize();
    printf("CUDA error: %s\n", cudaGetErrorString(cudaGetLastError()));
    return 0;
}
|
test.bat
nvcc -o test test.cu |
Executes normally.
Result (success): test.exe
E:\Projects\CudaTest (VisualStudio)\CUDA 7.5 - Check Dev Env\test>test.bat
E:\Projects\CudaTest (VisualStudio)\CUDA 7.5 - Check Dev Env\test>nvcc -o test test.cu
test.cu
Creating library test.lib and object test.exp
E:\Projects\CudaTest (VisualStudio)\CUDA 7.5 - Check Dev Env\test>test
CUDA error: no error
E:\Projects\CudaTest (VisualStudio)\CUDA 7.5 - Check Dev Env\test>
|
Quadro 2000 (compute capability 2.1) cannot use cudaMallocManaged
Using the revised code from https://www.youtube.com/watch?v=2EbHSCvGFM0
Convert VectorAddCPU.cu to VectorAddGPU.cu
VectorAddGPU.cu
#include <stdio.h>
#define SIZE 1024
__global__ void VectorAdd(int *a, int *b, int *c, int n)
{
    int i = threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}
__global__ void print_from_gpu(void) {
    printf("Hello World! from thread [%d,%d] From device\n", threadIdx.x, blockIdx.x);
}
int main()
{
    int *a, *b, *c;
    printf("Hello World from host!\n");
    print_from_gpu<<<1, 1>>>();
    cudaDeviceSynchronize();
    cudaMallocManaged(&a, SIZE * sizeof(int));
    cudaMallocManaged(&b, SIZE * sizeof(int));
    cudaMallocManaged(&c, SIZE * sizeof(int));
    printf("passed cudaMallocManaged\n");
    for (int i = 0; i < SIZE; ++i)
    {
        a[i] = i;
        b[i] = i;
        c[i] = 0;
    }
    printf("passed var addition");
    VectorAdd<<<1, SIZE>>>(a, b, c, SIZE);
    cudaDeviceSynchronize();
    for (int i = 0; i < 10; ++i)
        printf("c[%d] = %d\n", i, c[i]);
    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return 0;
}
|
Builds correctly.
Build (success): VectorAddGPU.cu
1>------ Build started: Project: vectorAddGPU, Configuration: Debug Win32 ------
1> Compiling CUDA source file vectorAddGPU.cu...
1>
1> E:\Projects\CudaTest (VisualStudio)\CUDA 7.5 - Check Dev Env\vectorAddGPU\vectorAddGPU>"E:\CUDA\CUDA_v7.5\Toolkit\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --use-local-env --cl-version 2013 -ccbin "E:\Microsoft Visual Studio\2013\VC\bin" -IE:\CUDA\CUDA_v7.5\Toolkit\include -IE:\CUDA\CUDA_v7.5\Toolkit\include -G --keep-dir Debug -maxrregcount=0 --machine 32 --compile -cudart static -g -DWIN32 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd " -o Debug\vectorAddGPU.cu.obj "E:\Projects\CudaTest (VisualStudio)\CUDA 7.5 - Check Dev Env\vectorAddGPU\vectorAddGPU\vectorAddGPU.cu"
1> vectorAddGPU.cu
1> vectorAddGPU.vcxproj -> E:\Projects\CudaTest (VisualStudio)\CUDA 7.5 - Check Dev Env\vectorAddGPU\Debug\vectorAddGPU.exe
1> copy "E:\CUDA\CUDA_v7.5\Toolkit\bin\cudart*.dll" "E:\Projects\CudaTest (VisualStudio)\CUDA 7.5 - Check Dev Env\vectorAddGPU\Debug\"
1> E:\CUDA\CUDA_v7.5\Toolkit\bin\cudart32_75.dll
1> E:\CUDA\CUDA_v7.5\Toolkit\bin\cudart64_75.dll
1> 2 file(s) copied.
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ========== |
VS2013 Menu Bar: DEBUG -> Start Without Debugging
The correct output should match the VectorAddCPU.cu result listed earlier in this post.
It does not: the program stops before printing any c[i] values.
Result (failure): VectorAddGPU.exe
Hello World from host!
Hello World! from thread [0,0] From device
passed cudaMallocManaged
Press any key to continue . . .
|
Running in debug, VS2013 Menu Bar: NSIGHT -> Start CUDA Debugging
Fails on Line 37: a[i] = i;
Unhandled Exception
Unhandled exception at 0x004E152B in vectorAddGPU.exe: 0xC0000005: Access violation writing location 0x00000000. |

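Variable "a" is never given a valid address: the write goes to location 0x00000000, so the cudaMallocManaged call did not allocate anything.
The cause is that the Quadro 2000 has CUDA compute capability 2.1, while cudaMallocManaged requires compute capability 3.0 or higher.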
http://selkie.macalester.edu/csinparallel/modules/TimingCUDA/build/html/0-Introduction/Introduction.html
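The crash above could have been caught earlier by checking the value returned by cudaMallocManaged instead of ignoring it. The sketch below is my own illustration, not code from the video: it assumes nothing beyond the standard runtime API, and the exact error text printed on a Fermi card is whatever cudaGetErrorString reports for that failure, which I have not recorded here.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int *a = NULL;

    // Every runtime call returns a cudaError_t; check it before using the pointer
    cudaError_t err = cudaMallocManaged(&a, 1024 * sizeof(int));
    if (err != cudaSuccess) {
        printf("cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        return 1;   // bail out instead of dereferencing an invalid pointer
    }

    // Only reached on GPUs that support managed memory
    a[0] = 42;
    printf("a[0] = %d\n", a[0]);

    cudaFree(a);
    return 0;
}
```

With a check like this, the program exits with a readable message on unsupported hardware rather than hitting an access violation in the initialization loop.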
cudaMalloc and cudaMemcpy (with cudaMemcpyHostToDevice and cudaMemcpyDeviceToHost) must be used instead of cudaMallocManaged on this GPU
https://cuda-tutorial.readthedocs.io/en/latest/tutorials/tutorial01/
// Transfer data from host to device memory
cudaMemcpy(d_a, a, sizeof(float) * N, cudaMemcpyHostToDevice);
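Putting that together, here is a sketch of how VectorAddGPU.cu might look with explicit device allocations and copies in place of managed memory. This is my reconstruction under stated assumptions, not the code from the video or the tutorial: the `CHECK` macro and the `d_a`, `d_b`, `d_c` names are my own, and the single block of SIZE (1024) threads relies on the 1024 threads-per-block limit reported by deviceQuery above.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define SIZE 1024

// Hypothetical helper: abort with a message if a runtime call fails
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            printf("%s failed: %s\n", #call, cudaGetErrorString(err_)); \
            exit(1);                                                    \
        }                                                               \
    } while (0)

__global__ void VectorAdd(const int *a, const int *b, int *c, int n)
{
    int i = threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void)
{
    size_t bytes = SIZE * sizeof(int);

    // Host buffers, filled exactly as in the CPU version
    int *a = (int *)malloc(bytes);
    int *b = (int *)malloc(bytes);
    int *c = (int *)malloc(bytes);
    if (!a || !b || !c) {
        printf("host malloc failed\n");
        return 1;
    }
    for (int i = 0; i < SIZE; ++i) {
        a[i] = i;
        b[i] = i;
        c[i] = 0;
    }

    // Device buffers replace the cudaMallocManaged allocations
    int *d_a = NULL, *d_b = NULL, *d_c = NULL;
    CHECK(cudaMalloc((void **)&d_a, bytes));
    CHECK(cudaMalloc((void **)&d_b, bytes));
    CHECK(cudaMalloc((void **)&d_c, bytes));

    // Transfer inputs from host to device memory
    CHECK(cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice));

    // One block of SIZE threads, one element per thread
    VectorAdd<<<1, SIZE>>>(d_a, d_b, d_c, SIZE);
    CHECK(cudaGetLastError());

    // Copy the result back; this copy waits for the kernel to finish
    CHECK(cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost));

    for (int i = 0; i < 10; ++i)
        printf("c[%d] = %d\n", i, c[i]);

    CHECK(cudaFree(d_a));
    CHECK(cudaFree(d_b));
    CHECK(cudaFree(d_c));
    free(a);
    free(b);
    free(c);
    return 0;
}
```

If the launch succeeds, the first ten values should match the VectorAddCPU output (c[i] = 2 * i), since the inputs are identical. The cost compared with managed memory is the extra bookkeeping: every buffer exists twice, and each direction of transfer is an explicit call.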
Links