Create (sane/safe) filename from any (unsafe) string

Solution 1:

Python:

"".join([c for c in filename if c.isalpha() or c.isdigit() or c==' ']).rstrip()

this accepts Unicode characters but removes line breaks, etc.

example:

filename = u"ad\nbla'{-+\)(ç?"

gives: adblaç

edit str.isalnum() does alphanumeric on one step. – comment from queueoverflow below. danodonovan hinted on keeping a dot included.

    keepcharacters = (' ','.','_')
    "".join(c for c in filename if c.isalnum() or c in keepcharacters).rstrip()

Solution 2:

My requirements were conservative ( the generated filenames needed to be valid on multiple operating systems, including some ancient mobile OSs ). I ended up with:

    "".join([c for c in text if re.match(r'\w', c)])

That white lists the alphanumeric characters ( a-z, A-Z, 0-9 ) and the underscore. The regular expression can be compiled and cached for efficiency, if there are a lot of strings to be matched. For my case, it wouldn't have made any significant difference.