Channel: Active questions tagged python - Stack Overflow

How to run llama-cpp-python in a Docker Container?


I have a more conceptual question about running llama-cpp-python in a Docker container. After following a lot of different tutorials, I am more confused than I was at the beginning.

I have a Debian 12 server with an Intel Core i7-7700 CPU and a GeForce GTX 1080 GPU.

On the host I installed the following components from the default Debian APT repository and the NVIDIA APT repository:

  • linux-headers-amd64
  • nvidia-detect
  • nvidia-driver
  • nvidia-smi
  • linux-image-amd64
  • cuda

The NVIDIA driver is installed correctly, which I can verify with:

# nvidia-smi
Sun Mar 31 10:46:20 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: N/A      |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080        Off |   00000000:01:00.0 Off |                  N/A |
| 36%   42C    P0             39W /  180W |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Next I built a Docker image in which I installed the following:

  • jupyterlab
  • cuda-toolkit-12-3
  • llama-cpp-python

Then I run my container with my llama_cpp application:

$ docker run --gpus all my-docker-image 
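A quick sanity check for the `--gpus all` flag (which only works when the NVIDIA Container Toolkit, `nvidia-container-toolkit`, is installed on the host) is to run `nvidia-smi` inside the same image:

```shell
# If this prints the same GTX 1080 table as on the host, GPU passthrough
# into the container works; if it errors out, the host-side
# NVIDIA Container Toolkit (nvidia-container-toolkit) is missing.
docker run --rm --gpus all my-docker-image nvidia-smi
```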

It works, but the GPU has no effect, even though my log output shows that llama-cpp detected the GPU and CUDA:

....
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1080, compute capability 6.1, VMM: yes
llama_kv_cache_init:  CUDA_Host KV buffer size =   381.00 MiB
llama_new_context_with_model: KV self size  =  381.00 MiB, K (f16):  190.50 MiB, V (f16):  190.50 MiB
llama_new_context_with_model:  CUDA_Host  output buffer size =    62.50 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =   227.41 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =    13.96 MiB
llama_new_context_with_model: graph nodes  = 1060
llama_new_context_with_model: graph splits = 356
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
.....

There is no performance difference between running the container with or without the GPU.
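One hint is in the log itself: the KV and output buffers are all `CUDA_Host` buffers (pinned system RAM), and there is no `offloaded N/M layers to GPU` line, which llama.cpp would normally print while loading the model when layers are actually placed in VRAM. A minimal check along those lines (the sample log line is illustrative):

```shell
# A GPU run of llama.cpp normally logs "llm_load_tensors: offloaded N/M
# layers to GPU" during model load; hypothetical sample line for the check:
sample='llm_load_tensors: offloaded 33/33 layers to GPU'

# Succeed only if at least one layer was offloaded (N > 0).
if printf '%s\n' "$sample" | grep -qE 'offloaded [1-9][0-9]*/[0-9]+ layers to GPU'; then
    echo "GPU offload active"
else
    echo "no layers offloaded - running on CPU"
fi
```

With the sample line above this prints `GPU offload active`; pointed at a captured log from the container (e.g. via `docker logs`), it distinguishes a CPU-only run from a real GPU run.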

My first question is: is my environment set up correctly, or are there components missing on the host or container side? And my second question is: what is necessary to run llama-cpp-python inside a container using the GPU?

The installation of llama-cpp-python within my container looks like this:

...
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
...
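For comparison, a hypothetical Dockerfile sketch for a CUDA-enabled llama-cpp-python image; the base image tag is an assumption (pick one compatible with your host driver), and the build flags extend the `RUN` line above:

```dockerfile
# A *devel* CUDA base image gives pip/CMake the full CUDA toolchain,
# instead of installing cuda-toolkit into a plain image.
FROM nvidia/cuda:12.3.2-devel-ubuntu22.04

RUN apt-get update && apt-get install -y python3 python3-pip

# -DLLAMA_CUBLAS=on enables the CUDA backend (flag name as used by
# llama-cpp-python at the time of writing); CMAKE_CUDA_ARCHITECTURES=61
# targets the GTX 1080 (compute capability 6.1, as reported in the log).
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=61" \
    pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
```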
