Lambda timeouts with calls to external API when scaling up
Solution 1:
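As mentioned under "Using Performance Metrics" at https://docs.aws.amazon.com/lambda/latest/dg/monitoring-metrics.html, you should use the "Maximum" statistic, not "Sum", for the "ConcurrentExecutions" metric. That metric reports how many instances are running at the same moment, so adding the data points together with "Sum" does not give a meaningful number.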
From what you describe of your use case, the external API seems to limit how many requests you can make per second (maybe 50 requests per second, going by your information). When you exceed that limit, your Lambda retries, which results in longer execution times and eventually in timeouts under load. You can use Lambda's reserved concurrency feature to cap how many instances of your function run at once, so that it does not exceed the external API's limit. Also apply some backoff logic on throttling error codes to be graceful to your downstream services.
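As a rough illustration of the backoff part, here is a minimal Python sketch of exponential backoff with jitter. It is not tied to any particular client library: `ThrottledError` and `call_with_backoff` are hypothetical names, and you would need to raise `ThrottledError` yourself when the external API returns its throttling status (often HTTP 429, but check the API's documentation). The attempt count and delays are placeholder values.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical error: raise it when the external API reports throttling."""


def call_with_backoff(call, max_attempts=5, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep):
    """Run call(), retrying on ThrottledError with exponential backoff and jitter.

    The delay before retry number n is a random value between 0 and
    min(max_delay, base_delay * 2**n), so concurrent instances do not all
    retry at the same moment.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except ThrottledError:
            if attempt == max_attempts - 1:
                # Out of attempts: let the caller (or Lambda) see the failure.
                raise
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

Keep the worst-case total wait (the sum of the delay caps plus the time of each call) well below the function's configured timeout, otherwise the retries themselves will cause the timeouts you are trying to avoid.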