Optional job parameter in AWS Glue?
How can I implement an optional parameter for an AWS Glue job?
I have created a job that currently has a string parameter (an ISO 8601 date string) as an input that is used in the ETL job. I would like to make this parameter optional, so that the job uses a default value if it is not provided (e.g. using datetime.now and datetime.isoformat in my case). I have tried using getResolvedOptions:
import sys
from awsglue.utils import getResolvedOptions
args = getResolvedOptions(sys.argv, ['ISO_8601_STRING'])
However, when I do not pass an --ISO_8601_STRING job parameter, I see the following error:
awsglue.utils.GlueArgumentError: argument --ISO_8601_STRING is required
matsev's and Yuriy's solutions are fine if you have only one optional field.
I wrote a wrapper function for Python that is more generic and handles different corner cases (mandatory fields and/or optional fields with default values).
import sys
from awsglue.utils import getResolvedOptions
def get_glue_args(mandatory_fields, default_optional_args):
    """
    Wrapper around the Glue function getResolvedOptions that takes care of the following cases:
    * Handling optional arguments and/or mandatory arguments
    * Optional arguments with a default value
    NOTE:
    * DO NOT USE '-' when defining args, as getResolvedOptions will replace it with '_'
    * All fields are returned as strings by getResolvedOptions
    Arguments:
        mandatory_fields {list} -- list of mandatory fields for the job
        default_optional_args {dict} -- dict of optional fields with their default values
    Returns:
        dict -- the given args, with default values for optional args that were not passed
    """
    # The Glue args are available in sys.argv with a leading '--';
    # only strip the prefix from entries that actually have it, so values are never mistaken for names
    given_names = {arg[2:] for arg in sys.argv if arg.startswith('--')}
    given_optional_fields_key = list(given_names.intersection(default_optional_args))
    args = getResolvedOptions(sys.argv,
                              mandatory_fields + given_optional_fields_key)
    # Overwrite the default values with the optional args that were provided
    default_optional_args.update(args)
    return default_optional_args
Usage:
# Defining mandatory/optional args
mandatory_fields = ['my_mandatory_field_1','my_mandatory_field_2']
default_optional_args = {'optional_field_1':'myvalue1', 'optional_field_2':'myvalue2'}
# Retrieve args
args = get_glue_args(mandatory_fields, default_optional_args)
# Access elements as in a dict, with args['key']
There is a workaround to have optional parameters. The idea is to examine arguments before resolving them (Scala):
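// String literals need double quotes in Scala
val argName = "ISO_8601_STRING"
// Declare the type explicitly so a String can be assigned later
var argValue: String = null
if (sysArgs.contains(s"--$argName"))
  argValue = GlueArgParser.getResolvedOptions(sysArgs, Array(argName))(argName)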
Porting Yuriy's answer to Python solved my problem:
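import datetime
import sys
from awsglue.utils import getResolvedOptions
# Only resolve the argument if it was actually passed to the job
if '--{}'.format('ISO_8601_STRING') in sys.argv:
    args = getResolvedOptions(sys.argv, ['ISO_8601_STRING'])
else:
    args = {'ISO_8601_STRING': datetime.datetime.now().isoformat()}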
I don't see a way to have optional parameters, but you can specify default parameters on the job itself, and then if you don't pass that parameter when you run the job, your job will receive the default value (note that the default value can't be blank).
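To illustrate the job-level defaults approach, here is a minimal sketch using boto3. It assumes the default value for ISO_8601_STRING has already been set on the job (for example as a job parameter in the console), and that a run-time value, when given, overrides it. The helper names to_glue_arguments and start_job and the job name in the usage comment are hypothetical and only for illustration.

```python
def to_glue_arguments(params):
    """Return a copy of params with each key prefixed by '--' and each value as a string.

    Glue job arguments are string-to-string mappings whose keys start with '--'.
    """
    return {f"--{key}": str(value) for key, value in params.items()}


def start_job(job_name, overrides=None):
    """Start a Glue job run, passing Arguments only when overrides are given.

    Without overrides, the run should fall back to the defaults configured on
    the job (assumption stated above).
    """
    # boto3 is imported here so that to_glue_arguments can be used without it installed.
    import boto3

    glue = boto3.client("glue")
    kwargs = {"JobName": job_name}
    if overrides:
        kwargs["Arguments"] = to_glue_arguments(overrides)
    return glue.start_job_run(**kwargs)["JobRunId"]


# Hypothetical usage:
# start_job("my-etl-job")                                         # uses the job default
# start_job("my-etl-job", {"ISO_8601_STRING": "2020-01-01T00:00:00"})  # overrides it
```

With this setup the script can keep calling getResolvedOptions(sys.argv, ['ISO_8601_STRING']) as in the question, since the argument should always be present. The drawback is that the default is a fixed string, so it cannot be "now" the way datetime.now() would be.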
Wrapping matsev's answer in a function:
def get_glue_env_var(key, default="none"):
    if f'--{key}' in sys.argv:
        return getResolvedOptions(sys.argv, [key])[key]
    else:
        return default
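A different technique from the answers above, in case you want something you can test locally without awsglue: parse the optional arguments with the standard library's argparse instead of getResolvedOptions. This is only a sketch. It assumes the job parameters show up in sys.argv as --NAME value pairs (which the answers above also rely on), and the function name get_optional_args is made up for this example.

```python
import argparse
import datetime
import sys


def get_optional_args(defaults, argv=None):
    """Return a dict with one entry per key in `defaults`.

    Each value comes from the command line if '--<key>' was passed,
    otherwise from `defaults`. All other arguments are ignored.
    """
    # add_help=False and allow_abbrev=False keep argparse from reacting to
    # '--help' or to prefixes of the names we register.
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    for name, default in defaults.items():
        parser.add_argument(f"--{name}", default=default)
    # parse_known_args does not fail on arguments it does not recognize,
    # such as the ones Glue adds on its own.
    known, _unknown = parser.parse_known_args(sys.argv[1:] if argv is None else argv)
    return vars(known)


# Example with an explicit argv; inside a job, omit argv to read sys.argv.
args = get_optional_args(
    {"ISO_8601_STRING": datetime.datetime.now().isoformat()},
    argv=["--JOB_NAME", "example", "--ISO_8601_STRING", "2020-01-01T00:00:00"],
)
```

As with getResolvedOptions, values passed on the command line come back as strings. Mandatory arguments can still be resolved with getResolvedOptions as usual.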