How do I connect S3 bucket video files to an EC2 instance and run a Python script on EC2 automatically whenever new video files arrive in S3?

I have one folder in my S3 bucket, ingested, which holds about 100 video (mp4) files. Each time the S3 bucket ingests these 100 videos, a Python script should run automatically on EC2 to process them and generate a new set of processed videos. I then need to store the processed videos back in S3, in a new folder named generated_video.

To get started, can you suggest the right approach for this?


Solution 1:

Rather than triggering the script on the Amazon EC2 instance, the typical approach is to have the EC2 instance poll an Amazon SQS queue:

  • Configure a trigger on the Amazon S3 bucket to send a message to an Amazon SQS queue whenever a new object is created in the bucket
  • Run code on the Amazon EC2 instance that will continually:
    • Call ReceiveMessage() on the Amazon SQS queue
    • If a message is returned, process the object referenced in the message, then delete the message from the queue so it is not delivered again
    • Keep looping
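
The S3 trigger from the first bullet could be set up with boto3 roughly as follows. This is a minimal sketch, not a definitive setup: the `ingested/` prefix, the `.mp4` suffix, and the helper names `build_notification_config` and `apply_notification` are assumptions for illustration. The SQS queue's access policy must also allow the S3 service to send messages to it, which is not shown here.

```python
def build_notification_config(queue_arn, prefix="ingested/", suffix=".mp4"):
    """Build a notification config that sends object-created events to SQS.

    The prefix and suffix defaults are assumptions based on the question
    (mp4 files arriving in a folder named "ingested").
    """
    return {
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": prefix},
                            {"Name": "suffix", "Value": suffix},
                        ]
                    }
                },
            }
        ]
    }


def apply_notification(bucket_name, queue_arn):
    """Attach the configuration to the bucket (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    s3 = boto3.client("s3")
    # Note: this call replaces the bucket's existing notification configuration.
    s3.put_bucket_notification_configuration(
        Bucket=bucket_name,
        NotificationConfiguration=build_notification_config(queue_arn),
    )
```

Filtering on the input prefix also keeps the processed files written to generated_video from producing new messages.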

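When calling ReceiveMessage(), set WaitTimeSeconds = 20 (long polling) to tell SQS to wait up to 20 seconds for a message to arrive before returning. This reduces the number of calls the instance needs to make to SQS. The same behaviour can be set as a default on the queue itself through its ReceiveMessageWaitTimeSeconds attribute.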
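
The polling loop described above might look something like this with boto3. It is a sketch under stated assumptions: the queue URL is a made-up placeholder, `process_video` is a hypothetical stand-in for the real processing script, and S3 is assumed to publish directly to SQS (not through SNS), so the message body is the S3 event JSON itself.

```python
import json
from urllib.parse import unquote_plus


def extract_objects(message_body):
    """Return (bucket, key) pairs from an S3 event notification body."""
    event = json.loads(message_body)
    pairs = []
    # S3 sends a one-off test event without "Records" when the trigger is set up.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the notification (spaces arrive as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def poll_once(sqs, queue_url, handler):
    """Long-poll the queue once; process and then delete each message."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=20,  # wait up to 20 seconds for a message
    )
    messages = response.get("Messages", [])
    for message in messages:
        for bucket, key in extract_objects(message["Body"]):
            handler(bucket, key)
        # Delete only after processing succeeded; if the handler raises,
        # the message becomes visible again and is retried later.
        sqs.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
    return len(messages)


def process_video(bucket, key):
    """Hypothetical placeholder: download, process, upload to generated_video/."""
    print(f"would process s3://{bucket}/{key}")


def main():
    import boto3  # requires boto3 and AWS credentials on the instance

    sqs = boto3.client("sqs")
    # Placeholder URL (assumption); use the real queue URL here.
    queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/video-queue"
    while True:  # keep looping
        poll_once(sqs, queue_url, process_video)
```

Calling `main()` on the instance (for example from a service that starts at boot) keeps the loop running. If processing one video takes longer than the queue's visibility timeout, the message can be delivered a second time, so the timeout should be set comfortably above the expected processing time.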

This architecture scales very well. If the script on the EC2 instance is 'busy' when a new file is uploaded to S3, the message simply sits in the SQS queue until the instance is ready. Similarly, if something goes wrong on the EC2 instance, the messages queue until it is fixed, rather than missing out on processing an S3 object.

Depending upon the complexity of the processing step, you might also be able to process the objects in an AWS Lambda function. It would operate in the same way, except that S3 can directly trigger the Lambda function rather than needing an SQS queue.
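
For the Lambda variant, a handler receiving the S3 event directly could be sketched like this. The `generated_video/` output prefix comes from the question; `output_key_for` is a hypothetical helper, and the actual download, processing, and upload steps are left as a comment because they depend on the script. The trigger is assumed to be limited to the input folder, otherwise writing results to the same bucket would invoke the function again.

```python
from urllib.parse import unquote_plus

OUTPUT_PREFIX = "generated_video/"  # output folder named in the question


def output_key_for(input_key):
    """Map an input key such as 'ingested/a.mp4' to 'generated_video/a.mp4'."""
    filename = input_key.rsplit("/", 1)[-1]
    return OUTPUT_PREFIX + filename


def lambda_handler(event, context):
    """Entry point invoked by the S3 trigger with one or more records."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the event (spaces arrive as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        # Download the object, process it, and upload the result here.
        results.append(
            {"bucket": bucket, "source": key, "target": output_key_for(key)}
        )
    return results
```

Lambda functions have a maximum run time of 15 minutes, so this route only fits if each video can be processed within that limit.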