I have an async FastAPI endpoint. At a high level, it fetches some data, does some simple business calculations, and returns a JSON model.
It makes 3 database queries inside asyncio.gather(), using the SQLAlchemy 1.4 async ORM. The queries are as optimized as they can be (indexes being used, small result sets, only the columns needed, etc.). I also use FastAPI's Depends for the async engine/session.
That's the only I/O. After the 3 queries return, some non-intensive business logic runs (a few date calculations and match cases), then a new "api response" model is generated, cached in Redis, and returned to the user. Pretty straightforward, no frills.
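To show the shape of it, here's a stripped-down stand-in for the endpoint. The function and field names are made up, and the DB calls are replaced with sleeps so it runs standalone, but the concurrency pattern is the same as the real code:

```python
import asyncio

# Stand-ins for the three ORM queries; names and shapes are made up.
async def fetch_account(session, account_id):
    await asyncio.sleep(0.01)  # pretend DB round trip
    return {"id": account_id}

async def fetch_orders(session, account_id):
    await asyncio.sleep(0.01)
    return [{"order": 1}, {"order": 2}]

async def fetch_rates(session):
    await asyncio.sleep(0.01)
    return {"rate": 1.0}

async def get_summary(session, account_id):
    # The three queries run concurrently, like in the real endpoint.
    account, orders, rates = await asyncio.gather(
        fetch_account(session, account_id),
        fetch_orders(session, account_id),
        fetch_rates(session),
    )
    # Light business logic; in the real endpoint the result is then
    # cached in Redis before being returned.
    return {"account": account, "order_count": len(orders), "rate": rates["rate"]}

result = asyncio.run(get_summary(None, 42))
print(result)
```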
However, I don't know if my understanding of FastAPI's (or Python's) capabilities is mistaken, but I can't seem to reach a high RPS on the endpoint, and latency is also not what I expect.
It's deployed on Kubernetes, currently at 50 pods with 1.5 CPU and 2 GB of memory each. CPU is more than enough from what I can see in the pod monitoring, and memory is plenty. With this, load testing can only reach around 4.5k RPS max. Does that seem right? I know other Dropwizard and Spring Boot services of similar scale and functionality reach much higher RPS, around 15k, and their latency is 2-3x quicker.
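To put per-pod numbers on that comparison (just back-of-envelope math from the figures above):

```python
pods = 50
observed_rps = 4500       # what my FastAPI service hits under load
comparison_rps = 15000    # what the Dropwizard/Spring Boot services reach

per_pod = observed_rps / pods
comparison_per_pod = comparison_rps / pods
print(per_pod, comparison_per_pod)  # 90.0 vs 300.0 RPS per pod
```

So each pod is only serving ~90 RPS versus ~300 for the JVM services, which is the gap I'm trying to explain.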
Code-wise I know it's pretty much optimized. There are a few things I could still try, like dropping the ORM for raw queries, but I've tested that and it had little to no effect on performance.
I've also noticed "spikes" in latency: p90 will be ~40ms, but p95 will be ~400ms, and some responses even get as high as 1.2s. This happens even when traffic isn't high.
I'm running out of ideas. I just want to know: is this kind of RPS expected from FastAPI / Python 3.10, or am I missing something?