Dagster: async ops and jobs, and dynamic Docker ops

I have two questions.

  1. I need to run an aiohttp session that makes several requests to different URLs concurrently, downloads the files, and returns a list of absolute paths to those files on disk. That list is then passed to another async function (a sketch of the download step follows these two questions).

Is there a way to run an "async def" function within a Dagster job and build an async pipeline?

  2. The length of the list mentioned above varies from case to case. Each file requires long, heavy processing, and there is no way to make that processing async because it is blocking (unfortunately). So the only option is to run it in separate threads or processes, or (as we do) in separate Docker containers on different machines.

Can Dagster dynamically create Docker containers for ops, return output from them, and kill each container on exit?
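
Roughly, the download step I have in mind looks like the sketch below (the helper names, URLs, and output directory are placeholders):

```python
import asyncio
import os

import aiohttp


async def download_one(session: aiohttp.ClientSession, url: str, out_dir: str = ".") -> str:
    # Fetch one URL and write the body to disk, returning the absolute path.
    async with session.get(url) as resp:
        resp.raise_for_status()
        path = os.path.abspath(os.path.join(out_dir, os.path.basename(url)))
        with open(path, "wb") as f:
            f.write(await resp.read())
    return path


async def download_all(urls: list[str]) -> list[str]:
    # Issue all requests concurrently and collect the paths in input order.
    async with aiohttp.ClientSession() as session:
        return list(await asyncio.gather(*(download_one(session, url) for url in urls)))
```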


Solution 1:

Dagster supports both creating ops dynamically at runtime and running each op in its own container. You can read about dynamic mapping here: https://docs.dagster.io/concepts/ops-jobs-graphs/jobs-graphs#dynamic-mapping--collect
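
Dagster will also await an op whose compute function is declared with "async def", which covers the first question. A minimal sketch combining an async op with dynamic mapping might look like this (the op names and URL list are placeholders, not part of the original question):

```python
import asyncio
import os

import aiohttp
from dagster import DynamicOut, DynamicOutput, job, op


@op
async def download_files() -> list[str]:
    # Placeholder URL list; in practice this would come from config or an upstream op.
    urls = ["https://example.com/a.bin", "https://example.com/b.bin"]

    async def fetch(session: aiohttp.ClientSession, url: str) -> str:
        async with session.get(url) as resp:
            resp.raise_for_status()
            path = os.path.abspath(os.path.basename(url))
            with open(path, "wb") as f:
                f.write(await resp.read())
        return path

    async with aiohttp.ClientSession() as session:
        return list(await asyncio.gather(*(fetch(session, url) for url in urls)))


@op(out=DynamicOut(str))
def fan_out(paths: list[str]):
    # Emit one DynamicOutput per file; each mapped op below becomes its own step.
    for i, path in enumerate(paths):
        yield DynamicOutput(path, mapping_key=f"file_{i}")


@op
def process_file(path: str) -> str:
    # The heavy, blocking per-file processing goes here.
    return path


@op
def summarize(paths: list[str]) -> None:
    # Receives the full list of results after all mapped steps finish.
    print(f"processed {len(paths)} files")


@job
def download_and_process():
    results = fan_out(download_files()).map(process_file)
    summarize(results.collect())
```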

And you can configure your job with an executor that runs each op in its own container: https://docs.dagster.io/deployment/executors#executors (there's a docker_executor that would be a good fit if you're running on Docker, or a k8s_job_executor if you're on Kubernetes).
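
For instance, attaching the docker_executor to a job might look like the sketch below, assuming the dagster-docker package is installed (the image name is a placeholder, and the image must contain your job's code and dependencies):

```python
from dagster import job, op
from dagster_docker import docker_executor


@op
def process_file():
    # Heavy, blocking per-file work; with docker_executor it runs in its own container.
    ...


# "image" is the Docker image each step container is launched from (placeholder name).
@job(executor_def=docker_executor.configured({"image": "my-processing-image:latest"}))
def containerized_pipeline():
    process_file()
```

The executor launches one container per step and cleans it up when the step finishes, so outputs move between steps through Dagster's IO managers (e.g. a shared filesystem or object store) rather than through the containers themselves.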