I'm trying to compile a DLL containing some C++ and CUDA functions to quickly process data that I receive in a Python program (160 MB/s from an acquisition card, which needs to be Fourier transformed). The DLL works fine with plain CUDA functions, but stops working whenever I add a function from the cuFFT library.
At the moment I have a .cu file with some simple CUDA functions, as the example below shows.
#include <stdint.h>
#include <iostream>
#include <stdio.h>
#include <stdbool.h>
#include <inttypes.h>
#include <cuda.h>
#include <cuFFT.h>

extern "C" void __declspec(dllexport) GPU_copyto(float *array, float *data_p, int offset, int ndata) // offset and ndata are in bytes
{
    offset/=2;
    ndata/=2;
    cudaMemcpy(data_p+offset, array+offset, ndata*sizeof(float), cudaMemcpyHostToDevice);
}
I can compile the example into a .dll with the following command:
nvcc -lcufft -o cuda_avg.dll -shared cuda_average.cu
and then import it in python with:
cuda_avg_dll=ctypes.CDLL('./cuda_avg.dll', mode=ctypes.RTLD_GLOBAL)
With this simple flow I was able to use some simple CUDA functions in Python to allocate memory on the GPU, transfer data, do arithmetic and copy the data back.
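For reference, here is a minimal sketch of how the signature of GPU_copyto could be declared on the Python side, so that ctypes does not have to guess the argument types. The helper name declare_gpu_copyto is hypothetical, and treating data_p as an opaque c_void_p is an assumption about how the device pointer is carried around in Python.

```python
import ctypes


def declare_gpu_copyto(lib):
    """Attach the C signature of GPU_copyto to a loaded ctypes library."""
    lib.GPU_copyto.argtypes = [
        ctypes.POINTER(ctypes.c_float),  # array: host buffer
        ctypes.c_void_p,                 # data_p: device pointer, opaque to Python (assumption)
        ctypes.c_int,                    # offset
        ctypes.c_int,                    # ndata
    ]
    lib.GPU_copyto.restype = None        # the C function returns void
    return lib.GPU_copyto
```

It would be called once after loading, e.g. declare_gpu_copyto(cuda_avg_dll).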
The problem starts when I want to use the cuFFT library to compute an FFT of the data. Adding a function from the cuFFT library, like cufftPlan1d(), produces no error from either the compiler or the linker. Instead, I get the following error when loading the DLL into Python.
Traceback (most recent call last):
  File "C:\Users\manip.batm\Desktop\shotnoise\20240123_SI_card_test\SI_cudaFFT\SI_cuda_TEST.py", line 21, in <module>
    cuda_avg_dll=ctypes.CDLL('./cuda_avg.dll', mode=ctypes.RTLD_GLOBAL)
  File "C:\Users\manip.batm\.conda\envs\qcodes\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\manip.batm\Desktop\shotnoise\20240123_SI_card_test\SI_cudaFFT\cuda_avg.dll' (or one of its dependencies). Try using the full path with constructor syntax.
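One possibility worth ruling out, given the "(or one of its dependencies)" part of the message: since Python 3.8 on Windows, the dependencies of a DLL loaded through ctypes are no longer looked up via PATH, only in the system directories, the directory of the DLL itself, and directories registered with os.add_dll_directory(). The sketch below registers the CUDA bin folder before loading. It assumes the toolkit installer set the CUDA_PATH environment variable and that the cuFFT DLL sits in its bin folder; the helper names cuda_bin_dirs and load_cuda_dll are hypothetical, and none of this has been verified on the machine above.

```python
import ctypes
import os


def cuda_bin_dirs(environ=None):
    """Return candidate folders that may contain the cuFFT runtime DLL.

    Assumption: CUDA_PATH points at the toolkit root and the DLLs are in
    its "bin" subfolder.
    """
    environ = os.environ if environ is None else environ
    root = environ.get("CUDA_PATH")
    if not root:
        return []
    return [os.path.join(root, "bin")]


def load_cuda_dll(dll_path, environ=None):
    """Register the CUDA folders for dependency lookup, then load dll_path."""
    # os.add_dll_directory only exists on Windows with Python 3.8 or newer.
    if hasattr(os, "add_dll_directory"):
        for directory in cuda_bin_dirs(environ):
            if os.path.isdir(directory):
                os.add_dll_directory(directory)
    # An absolute path removes any dependence on the current working directory.
    return ctypes.CDLL(os.path.abspath(dll_path))
```

With this in place the load would become cuda_avg_dll = load_cuda_dll('./cuda_avg.dll'). Copying the cuFFT DLL next to cuda_avg.dll would be another way to test the same hypothesis.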
I can use the following compiler command to generate a .exe file instead of a DLL.
nvcc -lcufft -o cuda_avg cuda_average.cu
It executes just fine and I can easily perform FFT on the GPU.
Can anyone tell me what I am doing wrong?
Is there any alternative to a DLL for quickly executing low-level functions from a Python program on Windows?
I've seen in the CUDA documentation that there are both dynamically linked and statically linked libraries. I don't know what the difference is. Can I generate a static library and load that into Python, and if so, how?