Run & scale simple python scripts on Google Cloud Platform

I have a simple Python script, and I would like to run thousands of instances of it on GCP (at the same time). This script is triggered by the $Universe scheduler, something like "python main.py --date '2022_01'".

What architecture and technology should I use to achieve this?

PS: I cannot drop $Universe, but I'm open to suggestions to use other technologies.

My solution:

  1. I already have a $Universe server running all the time.
  2. Create a Pub/Sub topic.
  3. Create a permanent Compute Engine instance that listens to Pub/Sub all the time (see the sketch after this list).
  4. $Universe sends thousands of events to Pub/Sub.
  5. The Compute Engine instance triggers the creation of a Python Docker image on another Compute Engine instance.
  6. Scale the creation of the Docker images (I don't know how to do this).
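For reference, a minimal sketch of what the listener in step 3 might do, assuming each Pub/Sub message carries a small JSON payload such as {"date": "2022_01"}. The project and subscription names are placeholders, and for simplicity it runs the script in place instead of spawning Docker images on other instances:

    import json
    import subprocess

    from google.cloud import pubsub_v1  # pip install google-cloud-pubsub

    # Placeholder names; replace with your real project and subscription.
    PROJECT_ID = "my-project"
    SUBSCRIPTION_ID = "run-script-sub"


    def callback(message: pubsub_v1.subscriber.message.Message) -> None:
        # Each message carries the parameters for one run of the script.
        params = json.loads(message.data.decode("utf-8"))
        subprocess.run(["python", "main.py", "--date", params["date"]], check=True)
        message.ack()  # acknowledge only after the run succeeded


    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)
    future = subscriber.subscribe(subscription_path, callback=callback)
    future.result()  # block forever, processing messages as they arrive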

Is it a good architecture?

How can I scale this kind of process?

Thank you :)


Solution 1:

It might be very difficult to discuss architecture and design questions, as they are usually heavily dependent on the context, scope, functional and non-functional requirements, cost, available skills and knowledge, and so on...

Personally, I would prefer to stay with an entirely serverless approach if possible.

For example, use Cloud Scheduler (serverless cron jobs) to send messages to a Pub/Sub topic, on the other side of which there is a Cloud Function (or something else) that is triggered by the message.

Whether it should be a Cloud Function or something else, and what exactly it should do and how, depends on your case.
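If it does end up being a Cloud Function triggered by Pub/Sub, a minimal sketch could look like the following, assuming the message body published by Cloud Scheduler is a small JSON payload such as {"date": "2022_01"}; the function and helper names are illustrative:

    import base64
    import json

    import functions_framework  # Python runtime helper for Cloud Functions (2nd gen)


    @functions_framework.cloud_event
    def handle_message(cloud_event):
        # Pub/Sub message data arrives base64-encoded inside the CloudEvent.
        payload = base64.b64decode(cloud_event.data["message"]["data"]).decode("utf-8")
        params = json.loads(payload)
        run_job(params["date"])


    def run_job(date: str) -> None:
        # Placeholder for the logic currently living in main.py.
        print(f"Processing {date}")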

Solution 2:

As I understand it, you will have a lot of simultaneous calls to custom Python code triggered by an orchestrator ($Universe), and you want to run it on GCP.

Like @al-dann, I would go with a serverless approach in order to reduce the cost.

As I also understand it, Pub/Sub does not seem necessary: you could easily trigger the function with an HTTP call and avoid Pub/Sub.

Pub/Sub is only necessary to get a delivery guarantee (at-least-once processing), but you can get the same behaviour if $Universe validates the HTTP response for every call (check the HTTP response code and body, and retry if they don't match the expected result).
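For instance, the caller ($Universe side) could do something along these lines; the endpoint URL, payload shape, and retry policy are only placeholders:

    import requests

    # Hypothetical HTTPS endpoint of the Cloud Run / Cloud Functions service.
    ENDPOINT = "https://my-job-xxxxx-ew.a.run.app/run"


    def trigger_run(date: str, max_attempts: int = 3) -> None:
        for attempt in range(1, max_attempts + 1):
            response = requests.post(ENDPOINT, json={"date": date}, timeout=300)
            # Retrying on anything other than a 2xx gives at-least-once behaviour.
            if response.ok:
                return
        raise RuntimeError(f"Job for {date} failed after {max_attempts} attempts")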

If you want exactly-once processing, you will need more tooling; you are close to event streaming (which, as I understand it, could also be a good fit for your use case). In that case, staying fully on GCP, I would go with Pub/Sub and Dataflow, which can guarantee exactly-once processing, or with Kafka and Kafka Streams or Flink.

If at-least-once processing is fine for you, I would go with the HTTP version, which I think will be simpler to maintain. You have 3 serverless options for that case:

  • App Engine standard: scales to 0, you pay for CPU usage; it can be more affordable than the functions below if the requests are constrained to a short period (a few hours per day, since the same hardware will process many requests).
  • Cloud Functions: you pay per request (+ CPU, memory, network, ...) and don't have to think about anything other than code, but the code runs on a proprietary solution.
  • Cloud Run: my preferred one, since the pricing is the same as Cloud Functions but you gain portability; the application is a simple Docker image that you can move easily (to Kubernetes, Compute Engine, ...), and you can change the execution engine depending on cost (if the load changes between the study and the real world). A minimal sketch is shown after this list.
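As a sketch of the Cloud Run option, the existing script could be wrapped in a small HTTP service; the route and payload shape below are just assumptions, and the container image would be built from a standard Python Dockerfile:

    import os

    from flask import Flask, request

    app = Flask(__name__)


    @app.route("/run", methods=["POST"])
    def run():
        # The scheduler posts {"date": "2022_01"} instead of passing --date on the CLI.
        date = request.get_json(force=True)["date"]
        process(date)
        return f"done {date}", 200


    def process(date: str) -> None:
        # Placeholder for the logic currently living in main.py.
        print(f"Processing {date}")


    if __name__ == "__main__":
        # Cloud Run injects the listening port via the PORT environment variable.
        app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))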