Gunicorn does not respond to more than 6 requests at a time
Solution 1:
Posting this as a Community Wiki for better visibility to the community.
Unfortunately, I don't have all the information needed to reproduce this scenario exactly (application design, how the tests were executed, environment, etc.). However, based on the OP's comment:
Turns out that with Kubernetes, multitasking is at the pod level. Instead of having one big pod with many threads, you can have many smaller pods running. You could experiment on that switch.
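To illustrate the "many smaller pods" idea from the comment, here is a minimal sketch of a `gunicorn.conf.py` for a small pod. The specific values (one worker, four threads, port 8080) are assumptions for illustration and are not taken from the OP's setup; only the setting names (`workers`, `threads`, `worker_class`, `bind`) are standard Gunicorn settings.

```python
# gunicorn.conf.py: hypothetical settings for one small pod.
# All values below are illustrative assumptions, not the OP's configuration.

# Listen on all interfaces so the Kubernetes Service can reach the pod.
bind = "0.0.0.0:8080"

# Threaded worker: each worker process handles requests in a thread pool.
worker_class = "gthread"

# Keep each pod small; let Kubernetes add pods instead of adding processes.
workers = 1
threads = 4

# With the gthread worker, one pod can process at most
# workers * threads requests concurrently.
max_concurrent_per_pod = workers * threads
```

With settings like these, total concurrency grows with the number of pod replicas rather than with the size of a single pod.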
It looks like the OP used HPA based on CPU together with Cluster Autoscaling in their GKE cluster, similar to the solution described in the App Engine Flex || Kubernetes Engine — ? article.
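As a rough illustration of how CPU-based HPA picks a replica count, the sketch below follows the formula from the Kubernetes documentation, `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with the default 10% tolerance. The function name, parameter names, and the min/max replica values are assumptions for this example; the real controller also handles details such as unready pods, missing metrics, and stabilization windows that are not modeled here.

```python
import math


def desired_replicas(current_replicas, current_cpu_percent, target_cpu_percent,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica calculation for a CPU utilization target.

    Hypothetical helper for illustration; not part of any Kubernetes client.
    """
    ratio = current_cpu_percent / target_cpu_percent

    # HPA skips scaling when the ratio is close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    desired = math.ceil(current_replicas * ratio)

    # The result is always clamped to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 2 pods averaging 90% CPU against a 50% target.
    print(desired_replicas(2, 90, 50))
```

Under this model, a pod count that stays low during a load test can simply mean that average CPU never moved far enough above the target to trigger a scale-up.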
It is worth mentioning that a lot depends on the type of scaling used.