How to remove extra indentation of Python triple quoted multi-line strings?
I have a Python editor where the user enters a script or code, which is then put into a main method behind the scenes, with every line indented. The problem is that if the user has a multi-line string, the indentation applied to the whole script affects the string, by inserting a tab at the start of every line. A problem script could be something as simple as:
"""foo
bar
foo2"""
So when in the main method it would look like:
def main():
    """foo
    bar
    foo2"""
and the string would now have an extra tab at the beginning of every line.
textwrap.dedent from the standard library is there to automatically undo the wacky indentation.
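A minimal sketch of what that looks like, with illustrative variable names that are not from the question. One caveat worth knowing: textwrap.dedent only removes whitespace that is common to every line, so a first line that starts right after the opening quotes with no indentation leaves the rest untouched.

```python
import textwrap

# Every line carries the same leading tab, so dedent can strip it.
indented = "\tfoo\n\tbar\n\tfoo2"
print(repr(textwrap.dedent(indented)))  # 'foo\nbar\nfoo2'

# The first line has no indentation, so there is no common margin to remove.
mixed = "foo\n\tbar\n\tfoo2"
print(repr(textwrap.dedent(mixed)))  # 'foo\n\tbar\n\tfoo2'
```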
From what I see, a better answer here might be inspect.cleandoc, which does much of what textwrap.dedent does but also fixes the problems that textwrap.dedent has with the leading line.
The below example shows the differences:
>>> import textwrap
>>> import inspect
>>> x = """foo bar
    baz
    foobar
    foobaz
    """
>>> inspect.cleandoc(x)
'foo bar\nbaz\nfoobar\nfoobaz'
>>> textwrap.dedent(x)
'foo bar\n    baz\n    foobar\n    foobaz\n'
>>> y = """
...     foo
...     bar
... """
>>> inspect.cleandoc(y)
'foo\nbar'
>>> textwrap.dedent(y)
'\nfoo\nbar\n'
>>> z = """\tfoo
bar\tbaz
"""
>>> inspect.cleandoc(z)
'foo\nbar     baz'
>>> textwrap.dedent(z)
'\tfoo\nbar\tbaz\n'
Note that inspect.cleandoc also expands internal tabs to spaces.
This may be inappropriate for one's use case, but works fine for me.
What follows the first line of a multiline string is part of the string, and not treated as indentation by the parser. You may freely write:
def main():
    """foo
bar
foo2"""
    pass
and it will do the right thing.
On the other hand, that's not readable, and Python knows it. So if the lines after the first in a docstring share leading whitespace, that common indentation is stripped off when you use help() to view the docstring. Thus, help(main) and the below help(main2) produce the same help info.
def main2():
    """foo
    bar
    foo2"""
    pass
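One way to check this without reading help() output by eye is to compare what inspect.getdoc returns for each function. This is only a sketch, and it assumes help() cleans docstrings the same way inspect.getdoc does (as far as I know it goes through pydoc, which relies on inspect for this).

```python
import inspect

def main():
    """foo
bar
foo2"""

def main2():
    """foo
    bar
    foo2"""

# Both docstrings come out identical once the common indentation is cleaned.
print(inspect.getdoc(main) == inspect.getdoc(main2))  # True
```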
Showing the difference between textwrap.dedent and inspect.cleandoc with a little more clarity:
Behavior with the leading part not indented
import textwrap
import inspect
string1="""String
with
no indentation
    """
string2="""String
    with
    indentation
    """
print('string1 plain=' + repr(string1))
print('string1 inspect.cleandoc=' + repr(inspect.cleandoc(string1)))
print('string1 textwrap.dedent=' + repr(textwrap.dedent(string1)))
print('string2 plain=' + repr(string2))
print('string2 inspect.cleandoc=' + repr(inspect.cleandoc(string2)))
print('string2 textwrap.dedent=' + repr(textwrap.dedent(string2)))
Output
string1 plain='String\nwith\nno indentation\n    '
string1 inspect.cleandoc='String\nwith\nno indentation\n    '
string1 textwrap.dedent='String\nwith\nno indentation\n'
string2 plain='String\n    with\n    indentation\n    '
string2 inspect.cleandoc='String\nwith\nindentation'
string2 textwrap.dedent='String\n    with\n    indentation\n'
Behavior with the leading part indented
string1="""
String
with
no indentation
    """
string2="""
    String
    with
    indentation
    """
print('string1 plain=' + repr(string1))
print('string1 inspect.cleandoc=' + repr(inspect.cleandoc(string1)))
print('string1 textwrap.dedent=' + repr(textwrap.dedent(string1)))
print('string2 plain=' + repr(string2))
print('string2 inspect.cleandoc=' + repr(inspect.cleandoc(string2)))
print('string2 textwrap.dedent=' + repr(textwrap.dedent(string2)))
Output
string1 plain='\nString\nwith\nno indentation\n    '
string1 inspect.cleandoc='String\nwith\nno indentation\n    '
string1 textwrap.dedent='\nString\nwith\nno indentation\n'
string2 plain='\n    String\n    with\n    indentation\n    '
string2 inspect.cleandoc='String\nwith\nindentation'
string2 textwrap.dedent='\nString\nwith\nindentation\n'
The only way I see is to strip the first n tabs from each line, starting with the second, where n is the known indentation of the main method.
If that indentation is not known beforehand, you can add a trailing newline before inserting the string and then strip the number of tabs found on the last line.
A third solution is to parse the script, find where a multi-line string begins, and not add your indentation to any following line until the string is closed.
I think there is probably a better solution.
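A rough sketch of the tab-stripping idea, assuming tabs are used for the added indentation and n is known. The helper name strip_leading_tabs is hypothetical, and note that this would also eat tabs the user deliberately put at the start of a line.

```python
def strip_leading_tabs(text, n):
    """Remove up to n leading tabs from every line except the first."""
    first, *rest = text.split("\n")
    cleaned = []
    for line in rest:
        removed = 0
        # Strip at most n tabs so deeper, intentional indentation survives.
        while removed < n and line.startswith("\t"):
            line = line[1:]
            removed += 1
        cleaned.append(line)
    return "\n".join([first] + cleaned)

print(repr(strip_leading_tabs("foo\n\tbar\n\tfoo2", 1)))  # 'foo\nbar\nfoo2'
```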
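The parsing idea can be sketched with the standard tokenize module: find string tokens that span several lines and leave their continuation lines unindented. The names indent_script and prefix are hypothetical, and this assumes the user's script tokenizes cleanly. On Python 3.12 and later, multi-line f-strings are tokenized differently and are not handled by this sketch.

```python
import io
import tokenize

def indent_script(source, prefix="\t"):
    """Indent each line of source, except lines inside multi-line strings."""
    protected = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # A string token that ends on a later line than it starts is multi-line;
        # every line after its first belongs to the string's content.
        if tok.type == tokenize.STRING and tok.end[0] > tok.start[0]:
            protected.update(range(tok.start[0] + 1, tok.end[0] + 1))
    lines = source.splitlines(keepends=True)
    return "".join(
        line if (number in protected or not line.strip()) else prefix + line
        for number, line in enumerate(lines, start=1)
    )

script = 'x = """foo\nbar\nfoo2"""\nreturn x\n'
wrapped = "def main():\n" + indent_script(script)
namespace = {}
exec(wrapped, namespace)
print(repr(namespace["main"]()))  # 'foo\nbar\nfoo2'
```

Because the string's continuation lines are never touched, nothing has to be undone afterwards, which avoids guessing how many tabs were added.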