I have a more conceptual question about running llama-cpp-python in a Docker container. After following a lot of different tutorials, I am more confused than when I started.
I have a Debian 12 server with an Intel Core i7-7700 CPU and a GeForce GTX 1080 GPU.
On the host I installed the following components from the default Debian APT repository and the NVIDIA APT repository:
- linux-headers-amd64
- nvidia-detect
- nvidia-driver
- nvidia-smi
- linux-image-amd64
- cuda
The NVIDIA driver is installed correctly, which I can verify with `nvidia-smi`:
```
# nvidia-smi
Sun Mar 31 10:46:20 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: N/A      |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080        Off |   00000000:01:00.0 Off |                  N/A |
| 36%   42C    P0             39W / 180W  |      0MiB /   8192MiB  |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```

Next I built a Docker image in which I installed the following libraries:
- jupyterlab
- cuda-toolkit-12-3
- llama-cpp-python
Then I run my container with my llama_cpp application:
```
$ docker run --gpus all my-docker-image
```

It works, but the GPU has no effect, even though my log output shows that llama-cpp detected the GPU and CUDA:
```
....
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1080, compute capability 6.1, VMM: yes
llama_kv_cache_init:  CUDA_Host KV buffer size =   381.00 MiB
llama_new_context_with_model: KV self size  =  381.00 MiB, K (f16):  190.50 MiB, V (f16):  190.50 MiB
llama_new_context_with_model:  CUDA_Host  output buffer size =    62.50 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =   227.41 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =    13.96 MiB
llama_new_context_with_model: graph nodes  = 1060
llama_new_context_with_model: graph splits = 356
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
.....
```

There is no performance difference between running the container with or without the GPU.
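One detail I noticed while comparing logs: llama.cpp reports where each buffer is allocated, and in my output the KV cache lands in a `CUDA_Host` buffer (pinned host memory) rather than a `CUDA0` buffer (GPU memory). A small sketch of the string check I use to spot this (the log lines are taken from the output above; the helper function name is my own):

```python
def kv_cache_on_gpu(log: str) -> bool:
    """Heuristic: return True if the llama.cpp startup log shows the
    KV cache allocated on a CUDA device rather than in host memory."""
    for line in log.splitlines():
        if "KV buffer size" in line:
            # "CUDA_Host" means pinned CPU memory, i.e. no offload;
            # an offloaded KV cache shows up as e.g. "CUDA0".
            return "CUDA_Host" not in line
    return False

my_log = "llama_kv_cache_init:  CUDA_Host KV buffer size =   381.00 MiB"
print(kv_cache_on_gpu(my_log))  # False: KV cache is in host memory
```

In my runs this always prints `False`, which matches the lack of any speedup.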
My first question is: Is my environment set up correctly, or are there components missing on the host or container side?

And my second question is: What is necessary to run llama-cpp-python inside a container using the GPU?
The llama-cpp-python installation step in my Dockerfile looks like this:

```
...
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
...
```
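For reference, here is a condensed sketch of the kind of image build described above (the base image tag is illustrative, not my exact one; a CUDA `devel` base already ships the toolkit, so it stands in for my manual `cuda-toolkit-12-3` install):

```dockerfile
# Illustrative base; any CUDA "devel" tag compatible with the host driver
# provides nvcc and the CUDA libraries needed to compile llama.cpp.
FROM nvidia/cuda:12.3.2-devel-ubuntu22.04

RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

RUN pip install jupyterlab

# Build llama-cpp-python from source against cuBLAS
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python \
    --force-reinstall --upgrade --no-cache-dir
```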