Channel: Active questions tagged python - Stack Overflow

FastAPI with SAM model: Only a few workers load properly due to CUDA memory error [closed]


I'm working on a project that uses FastAPI with the Segment Anything Model (SAM) for image processing. I have a CUDA setup with a GPU (NVIDIA RTX A4000) available for acceleration, and I'm running FastAPI with 15 workers. However, when I start the FastAPI server with the following command:

uvicorn server:app --port 5008 --host 0.0.0.0 --workers 15

only 3 to 5 workers load properly; the remaining workers fail with a CUDA out-of-memory error indicating that there isn't enough GPU memory. The specific error message I receive is:

"RuntimeError: CUDA out of memory. Tried to allocate X MiB"

I've tried loading the SAM model at worker startup, but the workers still run out of CUDA memory. Is there a way to configure FastAPI or the workers to use the available CUDA memory more efficiently, so that all 15 workers can run without hitting memory errors?
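For context on the numbers involved: each uvicorn worker is a separate OS process, so `--workers 15` runs the startup code 15 times and loads 15 independent copies of SAM onto the one GPU. A rough simulation of that (all figures assumed: ~2.5 GB of VRAM per SAM copy including weights and CUDA context, 16 GB total on the A4000; `load_model` and `simulate_startup` are hypothetical stand-ins, not part of the real app) shows why only a handful of workers survive:

```python
PER_WORKER_GB = 2.5  # assumed VRAM footprint of one SAM copy (weights + CUDA context)

def load_model(worker_id):
    # Stand-in for the real per-worker SAM checkpoint load; in the actual
    # app this is where several GB of weights land on the GPU.
    return {"worker": worker_id, "vram_gb": PER_WORKER_GB}

def simulate_startup(num_workers, gpu_capacity_gb):
    """Load one model copy per worker process until the GPU 'fills up'."""
    loaded, used_gb = [], 0.0
    for worker_id in range(num_workers):
        if used_gb + PER_WORKER_GB > gpu_capacity_gb:
            break  # this is the point where the real workers raise CUDA OOM
        loaded.append(load_model(worker_id))
        used_gb += PER_WORKER_GB
    return loaded

print(len(simulate_startup(15, 16.0)))  # 6 — only 6 of the 15 copies fit
```

With these assumed figures roughly 6 copies fit, which is in the same ballpark as the 3–5 workers I actually see succeed (the real per-copy footprint is presumably somewhat larger).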

I'd appreciate any insights or suggestions on optimizing the CUDA memory usage for all workers in FastAPI. Thank you!

