I currently have an API service running on AWS Lambda, but it's getting expensive. My aim is to move 95% of the traffic to a cluster of servers and still handle peaks with Lambda. So I would need a hybrid setup where the load balancer sends requests to Lambda only when the cluster can't handle them.

I don't have much experience with this kind of setup, so I'm asking for suggestions. Something that comes to mind is a load balancer + Kubernetes + Lambda, but I'm not sure if that's possible.

That's where you come in. Can you suggest something that would make this possible? As said, the aim is to reduce costs, so I would use free software where possible (and pay for the hardware, i.e. servers).

Thanks.


Solution 1:

I don't think you can easily mix and match K8s and Lambda backends for the same service, with intelligent overflow to Lambda when K8s is overloaded.

It may be possible with some custom logic, e.g. have one ALB for the K8s backend and one ALB for the Lambda backend. Front both of them with a fleet of Nginx servers configured to prefer the K8s ALB, and to direct traffic to the Lambda ALB only once the K8s ALB starts to time out or throw errors. It may even need a custom Nginx module to achieve that, I'm not sure. Very likely it's not something you could do out of the box on AWS, though.
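To make the "prefer K8s, overflow to Lambda" rule concrete, here is a minimal Python sketch of the decision logic only. It is not an Nginx configuration, and the names `route`, `primary`, `overflow` and `RETRYABLE_STATUSES` are made up for illustration. It assumes each backend is a callable that returns a status code and a body, and that a timeout, a connection error, or a 502/503/504 response means the cluster is overloaded. For what it's worth, stock Nginx has a `backup` flag for upstream servers and a `proxy_next_upstream` directive that cover similar ground, so a custom module may turn out to be unnecessary.

```python
from typing import Callable, Tuple

Response = Tuple[int, str]          # (HTTP status code, body)
Backend = Callable[[str], Response]  # hypothetical backend interface

# Assumption: these statuses signal an overloaded or unavailable cluster.
RETRYABLE_STATUSES = {502, 503, 504}


def route(path: str, primary: Backend, overflow: Backend) -> Response:
    """Send the request to the primary backend; use overflow only on failure."""
    try:
        status, body = primary(path)
    except (TimeoutError, ConnectionError):
        # The primary did not answer in time or refused the connection.
        return overflow(path)
    if status in RETRYABLE_STATUSES:
        # The primary answered, but only to say it cannot serve the request.
        return overflow(path)
    return status, body
```

One caveat with this approach: a request that timed out may already have been processed by the cluster, so replaying it on Lambda is only safe for idempotent calls.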

What kind of API is it, by the way? If it's mostly read-only (i.e. reporting some values to consumers) and doesn't have to be real-time, you can save a lot of Lambda calls with caching. Since Lambda costs are your concern, I assume you're getting lots of calls per second. Maybe you don't have to serve each one from a Lambda, but can instead cache the response on CloudFront for a second, or five, or a minute, and let CloudFront serve the cached content. It's cheaper than calling the Lambda every single time.
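As a rough sketch of the caching idea, a Python Lambda handler could tell CloudFront how long to keep a response by setting a `Cache-Control` header. This assumes the function sits behind a proxy-style integration (API Gateway, ALB or a function URL) that accepts the `statusCode`/`headers`/`body` response shape, and that the CloudFront cache policy allows the origin's `max-age` to take effect. The 5-second TTL and the payload are placeholders.

```python
import json

# Assumption: a 5-second cache lifetime is acceptable for this API.
CACHE_SECONDS = 5


def handler(event, context):
    """Return a JSON response that CloudFront may cache for CACHE_SECONDS."""
    payload = {"value": 42}  # placeholder for the real API result
    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": "application/json",
            # Shared caches such as CloudFront may reuse this response
            # until max-age expires, without invoking the Lambda again.
            "Cache-Control": f"public, max-age={CACHE_SECONDS}",
        },
        "body": json.dumps(payload),
    }
```

With a setup like this, identical requests arriving within the TTL would be answered from CloudFront's cache at a given edge location rather than triggering a new invocation each time.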

Hope that helps :)