directory path types with argparse

My python script needs to read files from a directory passed on the command line. I have defined a readable_dir type as below to be used with argparse for validating that the directory passed on the command line is existent and readable. Additionally, a default value (/tmp/non_existent_dir in the example below) has also been specified for the directory argument. The problem here is that argparse invokes readable_dir() on the default value even in a situation where a directory argument is explicitly passed in on the command line. This causes the script to crap out as the default path /tmp/non_existent_dir does not exist in a context where a directory is explicitly passed in on the command line. I could get around this by not specifying a default value and making this argument mandatory, or by deferring the validation until later in the script but is a more elegant solution that anyone is aware of?

#!/usr/bin/python
import argparse
import os

def readable_dir(prospective_dir):
  if not os.path.isdir(prospective_dir):
    raise Exception("readable_dir:{0} is not a valid path".format(prospective_dir))
  if os.access(prospective_dir, os.R_OK):
    return prospective_dir
  else:
    raise Exception("readable_dir:{0} is not a readable dir".format(prospective_dir))

parser = argparse.ArgumentParser(description='test', fromfile_prefix_chars="@")
parser.add_argument('-l', '--launch_directory', type=readable_dir, default='/tmp/non_existent_dir')
args = parser.parse_args()

Solution 1:

I submitted a patch for "path arguments" to the Python standard library mailing list a few months ago.

With this PathType class, you can simply specify the following argument type to match only an existing directory--anything else will give an error message:

type = PathType(exists=True, type='dir')

Here's the code, which could be easily modified to require specific file/directory permissions as well:

from argparse import ArgumentTypeError as err
import os

class PathType(object):
    def __init__(self, exists=True, type='file', dash_ok=True):
        '''exists:
                True: a path that does exist
                False: a path that does not exist, in a valid parent directory
                None: don't care
           type: file, dir, symlink, None, or a function returning True for valid paths
                None: don't care
           dash_ok: whether to allow "-" as stdin/stdout'''

        assert exists in (True, False, None)
        assert type in ('file','dir','symlink',None) or hasattr(type,'__call__')

        self._exists = exists
        self._type = type
        self._dash_ok = dash_ok

    def __call__(self, string):
        if string=='-':
            # the special argument "-" means sys.std{in,out}
            if self._type == 'dir':
                raise err('standard input/output (-) not allowed as directory path')
            elif self._type == 'symlink':
                raise err('standard input/output (-) not allowed as symlink path')
            elif not self._dash_ok:
                raise err('standard input/output (-) not allowed')
        else:
            e = os.path.exists(string)
            if self._exists==True:
                if not e:
                    raise err("path does not exist: '%s'" % string)

                if self._type is None:
                    pass
                elif self._type=='file':
                    if not os.path.isfile(string):
                        raise err("path is not a file: '%s'" % string)
                elif self._type=='symlink':
                    if not os.path.symlink(string):
                        raise err("path is not a symlink: '%s'" % string)
                elif self._type=='dir':
                    if not os.path.isdir(string):
                        raise err("path is not a directory: '%s'" % string)
                elif not self._type(string):
                    raise err("path not valid: '%s'" % string)
            else:
                if self._exists==False and e:
                    raise err("path exists: '%s'" % string)

                p = os.path.dirname(os.path.normpath(string)) or '.'
                if not os.path.isdir(p):
                    raise err("parent path is not a directory: '%s'" % p)
                elif not os.path.exists(p):
                    raise err("parent directory does not exist: '%s'" % p)

        return string

Solution 2:

You can create a custom action instead of a type:

import argparse
import os
import tempfile
import shutil
import atexit

class readable_dir(argparse.Action):
    def __call__(self, parser, namespace, values, option_string=None):
        prospective_dir=values
        if not os.path.isdir(prospective_dir):
            raise argparse.ArgumentTypeError("readable_dir:{0} is not a valid path".format(prospective_dir))
        if os.access(prospective_dir, os.R_OK):
            setattr(namespace,self.dest,prospective_dir)
        else:
            raise argparse.ArgumentTypeError("readable_dir:{0} is not a readable dir".format(prospective_dir))

ldir = tempfile.mkdtemp()
atexit.register(lambda dir=ldir: shutil.rmtree(ldir))

parser = argparse.ArgumentParser(description='test', fromfile_prefix_chars="@")
parser.add_argument('-l', '--launch_directory', action=readable_dir, default=ldir)
args = parser.parse_args()
print (args)

But this seems a little fishy to me -- if no directory is given, it passes a non-readable directory which seems to defeat the purpose of checking if the directory is accessible in the first place.

Note that as pointed out in the comments, it might be nicer to
raise argparse.ArgumentError(self, ...) rather than argparse.ArgumentTypeError.

EDIT

As far as I'm aware, there is no way to validate the default argument. I suppose the argparse developers just assumed that if you're providing a default, then it should be valid. The quickest and easiest thing to do here is to simply validate the arguments immediately after you parse them. It looks like, you're just trying to get a temporary directory to do some work. If that's the case, you can use the tempfile module to get a new directory to work in. I updated my answer above to reflect this. I create a temporary directory, use that as the default argument (tempfile already guarantees the directory it creates will be writeable) and then I register it to be deleted when your program exits.