Specifying Machine Type in Vertex AI Pipeline

I have a pipeline component defined like this:

data_task = run_ssd_data_op(
    labels_path=input_labels,
    data_config=config_task.outputs["output_data_config"],
    training_config=config_task.outputs["output_training_config"],
    assets_json=dump_conversion_task.outputs["output_ssd_query"],
)
data_task.execution_options.caching_strategy.max_cache_staleness = "P0D"
data_task.container.add_resource_request('cpu', cpu_request)
data_task.container.add_resource_request('memory', memory_request)

When I run the pipeline on Vertex AI, the above component runs on an E2 machine type, which matches the CPU and RAM requirements.

However, the component runs much more slowly on Vertex AI than on the Kubeflow pipeline I set up using AI Platform. I configured that cluster to use n1-highmem-32 machines for this job.

I would like to request that this component run on an n1-highmem-32 machine. How can I do that?

For the GPU component of the pipeline, I could use the line:

training_task.add_node_selector_constraint(
    'cloud.google.com/gke-accelerator', 'NVIDIA_TESLA_T4'
).set_gpu_limit(gpu_request)

What is the equivalent node_selector_constraint that I need to apply to my data_task?


The machine type is defined by the amount of CPU and memory it provides; for example, n1-highmem-32 has 32 vCPUs and 208 GB of RAM.

In general, Vertex AI allocates a machine automatically; by default it uses the e2-standard-4 machine type.

So you may try setting .set_memory_request('208G').set_cpu_request('32') so that Vertex AI places the component on the machine type you want.
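Applied to the data_task from the question, that would look something like this (a minimal sketch, assuming KFP v1, where set_cpu_request and set_memory_request are Container methods that wrap the same add_resource_request calls shown in the question and return the container, so they can be chained):

# Request 32 vCPUs and 208 GB so Vertex AI picks an
# n1-highmem-32-sized machine (not guaranteed; see caveat below)
data_task.container.set_cpu_request('32').set_memory_request('208G')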

However, in some of my tests it ignored the requested CPU and memory and used the default machine type anyway.

I recommend moving to a custom job instead, where you can set all of those parameters explicitly. Check this doc: Configure compute resources for custom training.
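For instance, with the google-cloud-pipeline-components library you can wrap the component in a Vertex AI CustomJob and name the machine type directly (a minimal sketch, assuming that library is installed; the display name is a hypothetical placeholder):

from google_cloud_pipeline_components.v1.custom_job import (
    create_custom_training_job_from_component,
)

# Wrap the existing component so it runs as a Vertex AI CustomJob
# on an explicitly chosen machine type.
run_ssd_data_custom_op = create_custom_training_job_from_component(
    run_ssd_data_op,
    display_name='run-ssd-data',   # hypothetical display name
    machine_type='n1-highmem-32',
)

# Inside the pipeline, call it with the same arguments as before:
data_task = run_ssd_data_custom_op(
    labels_path=input_labels,
    data_config=config_task.outputs['output_data_config'],
    training_config=config_task.outputs['output_training_config'],
    assets_json=dump_conversion_task.outputs['output_ssd_query'],
)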