AWS solution for processing many API calls in parallel

We are building a solution that gathers data from one of our analytics web portals. A Python script pulls a large amount of JSON data via the portal's API and builds a CSV/XLSX report, and we need to do this for a large number of customers. We currently run the script on an AWS t2.medium VM (2 vCPUs, 4 GB RAM), and it takes around 2-3 hours to build the report depending on the query, which is too long for us. We want to speed up report generation, but we are not sure which AWS components to use. We are considering one main Lambda function (authentication, work distribution, triggering concurrent Lambda functions) plus many concurrent worker Lambdas, each responsible for a share of the work (e.g. 10% of the clients and their API calls). Is there a tool or pattern in AWS we can use to speed this up, such as Step Functions with parallel flows, or Lambda?

My sequence creator (main Lambda function) code looks like this:

import boto3
import json

client = boto3.client('lambda')

def lambda_handler(event, context):
    # Note: RequestResponse invocations are synchronous, so these four
    # calls run one after another, not in parallel.
    response1 = client.invoke(FunctionName="worker-00", InvocationType="RequestResponse", Payload=json.dumps(event))
    response2 = client.invoke(FunctionName="worker-01", InvocationType="RequestResponse", Payload=json.dumps(event))
    response3 = client.invoke(FunctionName="worker-02", InvocationType="RequestResponse", Payload=json.dumps(event))
    response4 = client.invoke(FunctionName="worker-03", InvocationType="RequestResponse", Payload=json.dumps(event))

Assuming the output file is big and this is not streaming data:

The fan-out pattern you already mentioned is a good fit for your approach. You can range from N customers per Lambda down to one customer per Lambda; a 1-N-S3-1 pattern (one creator Lambda, N workers writing partial results to S3, one merger) is the fastest option, but also the most expensive.
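A minimal sketch of the fan-out step (the `worker` function name, the payload shape, and the chunk count are illustrative assumptions, not part of your setup): the creator splits the customer list into chunks and invokes one worker per chunk asynchronously.

```python
import json

def chunk(items, n):
    """Split items into n roughly equal chunks (the last may be shorter)."""
    size = -(-len(items) // n)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def fan_out(customers, workers=10):
    import boto3  # imported here so chunk() stays testable without AWS deps
    lambda_client = boto3.client('lambda')
    for part in chunk(customers, workers):
        lambda_client.invoke(
            FunctionName='worker',    # hypothetical worker function name
            InvocationType='Event',   # async: returns immediately
            Payload=json.dumps({'customers': part}),
        )
```

Each worker then fetches data for its own customers and writes a partial file (a "sequence") to the sequence bucket in S3.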

The sequence merger Lambda is triggered by a CloudWatch (EventBridge) schedule every x minutes, where x is the average running time of the workers in minutes. As soon as all sequences are present, it merges them into a single file, saves that output to another S3 bucket, and removes the existing sequences from the sequence bucket.
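A minimal sketch of such a merger handler, assuming hypothetical bucket names, a known number of parts, and CSV parts that all share the same header row:

```python
def merge_csv_parts(parts):
    """Concatenate CSV texts, keeping the header row only once."""
    merged = []
    for i, text in enumerate(parts):
        lines = text.splitlines()
        merged.extend(lines if i == 0 else lines[1:])
    return '\n'.join(merged)

SEQ_BUCKET = 'sequence-bucket'   # hypothetical bucket names
OUT_BUCKET = 'report-bucket'
EXPECTED_PARTS = 10              # number of worker lambdas

def lambda_handler(event, context):
    import boto3  # inlined so merge_csv_parts stays importable without AWS deps
    s3 = boto3.client('s3')

    parts = s3.list_objects_v2(Bucket=SEQ_BUCKET).get('Contents', [])
    if len(parts) < EXPECTED_PARTS:
        return {'status': 'waiting', 'present': len(parts)}  # try again next tick

    keys = sorted(o['Key'] for o in parts)
    texts = [s3.get_object(Bucket=SEQ_BUCKET, Key=k)['Body'].read().decode('utf-8')
             for k in keys]
    s3.put_object(Bucket=OUT_BUCKET, Key='report.csv',
                  Body=merge_csv_parts(texts).encode('utf-8'))

    # Clean the sequence bucket so the next run starts fresh.
    s3.delete_objects(Bucket=SEQ_BUCKET,
                      Delete={'Objects': [{'Key': k} for k in keys]})
    return {'status': 'merged', 'parts': len(keys)}
```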

Note: adding AWS Step Functions to this design is a good alternative with improved control, but it adds noticeably to the cost.
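If you do go the Step Functions route, a Map state gives you the fan-out with bounded parallelism without hand-rolling the invocations. A sketch of such a state machine definition (the ARNs and state names are placeholders, not real resources):

```python
import json

# Hypothetical Amazon States Language definition: a Map state runs one
# worker iteration per customer chunk with capped concurrency, then a
# merge task combines the partial files.
definition = {
    "StartAt": "FanOut",
    "States": {
        "FanOut": {
            "Type": "Map",
            "ItemsPath": "$.chunks",
            "MaxConcurrency": 10,  # cap on parallel workers
            "Iterator": {
                "StartAt": "Worker",
                "States": {
                    "Worker": {
                        "Type": "Task",
                        "Resource": "arn:aws:lambda:...:function:worker",  # placeholder ARN
                        "End": True,
                    }
                },
            },
            "Next": "Merge",
        },
        "Merge": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:...:function:sequence-merger",  # placeholder ARN
            "End": True,
        },
    },
}

# The JSON form (json.dumps(definition)) is what you would pass to
# stepfunctions create-state-machine.
```

With this shape the merger no longer needs a polling schedule, because Step Functions only runs the Merge task after every Map iteration has finished.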

(diagram: 1-N-S3-1 fan-out architecture)

The above diagram behaves something like the diagram below when the Lambdas are invoked asynchronously:

(diagram: behavior with asynchronous Lambda invocations)

You can also tune the memory allocated to your functions to improve performance and cost using https://github.com/alexcasalboni/aws-lambda-power-tuning

Edit:

The sequence creator code should look something like the following sample:

import json
import boto3

lambda_client = boto3.client('lambda')
workers = 3
workers_payload = [{"payload1": "1"}, {"payload2": "2"}, {"payload3": "3"}]
for i in range(workers):
    # 'Event' invocations are asynchronous, so all workers start in parallel
    lambda_client.invoke(FunctionName='sequence', InvocationType='Event', Payload=json.dumps(workers_payload[i]))