Recursive AWS Lambda function calls - Best Practice

I've been tasked to look at a service built on AWS Lambda that performs a long-running task of turning VMs on and off. Mind you, I come from the Azure team so I am not familair with the styling or best practices of AWS services.

The approach the original developer has taken is to send the entire workload to one Lambda function and then have that function take a section of the workload and then recursively call itself with the remaining workload until all items are gone (workload = 0).

Pseudo-ish Code:

// Assume this gets sent to a HTTP Lambda endpoint as a whole
let workload = [1, 2, 3, 4, 5, 6, 7, 8]

// The Lambda HTTP endpoint
function Lambda(workload) {
    if (!workload.length) {
        return "No more work!"
    }
    const toDo = workload.splice(0, 2) // get first two items
    doWork(toDo)

    // Then... except it builds a new HTTP request with aws sdk
    Lambda(workload) // 3, 4, 5, 6, 7, 8, etc.
}

This seems highly inefficient and unreliable (correct me if I am wrong). There is a lot of state being stored in this process and in my opinion that creates a lot of failure points.

My plan is to suggest we re-engineer the entire service to use a Queue/Worker type framework instead, where ideally the endpoint would handle one workload at a time, and be stateless.

The queue would be populated by a service (Jenkins? Lambda? Manually?), then a second service would read from the queue (and ideally scale-out as well, as needed).


UPDATE: AWS EventBridge now looks like the preferred solution.


It's "Coupling" that I was thinking of, see here: https://www.jeffersonfrank.com/insights/aws-lambda-design-considerations

Coupling
Coupling goes beyond Lambda design considerations—it’s more about the system as a whole. Lambdas within a microservice are sometimes tightly coupled, but this is nothing to worry about as long as the data passed between Lambdas within their little black box of a microservice is not over-pure HTTP and isn’t synchronous.

Lambdas shouldn’t be directly coupled to one another in a Request Response fashion, but asynchronously. Consider the scenario when an S3 Event invokes a Lambda function, then that Lambda also needs to call another Lambda within that same microservice and so on.

aws lambda coupling

enter image description here

You might be tempted to implement direct coupling, like allowing Lambda 1 to use the AWS SDK to call Lambda 2 and so on. This introduces some of the following problems:

  1. If Lambda 1 is invoking Lambda 2 synchronously, it needs to wait for the latter to be done first. Lambda 1 might not know that Lambda 2 also called Lambda 3 synchronously, and Lambda 1 may now need to wait for both Lambda 2 and 3 to finish successfully. Lambda 1 might timeout as it needs to wait for all the Lambdas to complete first, and you’re also paying for each Lambda while they wait.
  1. What if Lambda 3 has a concurrency limit set and is also called by another service? The call between Lambda 2 and 3 will fail until it has concurrency again. The error can be returned to all the way back to Lambda 1 but what does Lambda 1 then do with the error? It has to store that the S3 event was unsuccessful and that it needs to replay it.

This process can be redesigned to be event-driven: lambda coupling

enter image description here

Not only is this the solution to all the problems introduced by the direct coupling method, but it also provides a method of replaying the DLQ if an error occurred for each Lambda. No message will be lost or need to be stored externally, and the demand is decoupled from the processing.


AWS Step Functions is one way you can achieve this. Step Functions are used to orchestrate multiple Lambda functions in any manner you want - parallel executions, sequential executions or a mix of both. You can also put wait steps, condition checks, retries in between if you need.

Your overall step function might look something like this (say you want 1,2,3 to execute in parallel. Then when all these are complete, you want to execute 4, and then again 5 and 6 in parallel)

enter image description here

Configuring this is also pretty simple. It accepts a JSON like the following

{
  "Comment": "An example of the Amazon States Language using a parallel state to execute two branches at the same time.",
  "StartAt": "Parallel",
  "States": {
    "Parallel": {
      "Type": "Parallel",
      "Next": "Task4",
      "Branches": [
        {
          "StartAt": "Task1",
          "States": {
            "Task1": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:ap-south-1:XXX:function:XXX",
              "End": true
            }
          }
        },
        {
          "StartAt": "Task2",
          "States": {
            "Task2": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:ap-south-1:XXX:function:XXX",
              "End": true
            }
          }
        },
        {
          "StartAt": "Task3",
          "States": {
            "Task3": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:ap-south-1:XXX:function:XXX",
              "End": true
            }
          }
        }
      ]
    },
    "Task4": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:ap-south-1:XXX:function:XXX",
      "Next": "Parallel2"
    },
    "Parallel2": {
      "Type": "Parallel",
      "Next": "Final State",
      "Branches": [
        {
          "StartAt": "Task5",
          "States": {
            "Task5": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:ap-south-1:XXX:function:XXX",
              "End": true
            }
          }
        },
        {
          "StartAt": "Task6",
          "States": {
            "Task6": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:ap-south-1:XXX:function:XXX",
              "End": true
            }
          }
        }
      ]
    },
    "Final State": {
      "Type": "Pass",
      "End": true
    }
  }
}