Python: How to ignore #comment lines when reading in a file

In Python, I have just read a line from a text file and I'd like to know how to write code that ignores comments with a hash # at the beginning of the line.

I think it should be something like this:

for 
   if line !contain #
      then ...process line
   else end for loop 

But I'm new to Python and I don't know the syntax.


You can use startswith(), e.g.:

for line in open("file"):
    li = line.strip()
    if not li.startswith("#"):
        print(line.rstrip())

I recommend you don't ignore the whole line when you see a # character; just ignore the rest of the line. You can do that easily with a string method called partition:

with open("filename") as f:
    for line in f:
        line = line.partition('#')[0]
        line = line.rstrip()
        # ... do something with line ...

partition returns a tuple: everything before the partition string, the partition string, and everything after the partition string. So, by indexing with [0] we take just the part before the partition string.
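
For example, here is what partition() returns for a line with a comment and for one without (illustrative strings):

>>> "x = 1  # set x".partition('#')
('x = 1  ', '#', ' set x')
>>> "no comment here".partition('#')
('no comment here', '', '')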

EDIT: If you are using a version of Python that doesn't have partition() (it was added in Python 2.5), here is code you could use:

with open("filename") as f:
    for line in f:
        line = line.split('#', 1)[0]
        line = line.rstrip()
        # ... do something with line ...

This splits the string on a '#' character, then keeps everything before the split. The 1 argument makes the .split() method stop after one split; since we are just grabbing the 0th substring (by indexing with [0]) you would get the same answer without the 1 argument, but this might be a little bit faster. (Simplified from my original code thanks to a comment from @gnr. My original code was messier for no good reason; thanks, @gnr.)
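
For comparison, here is what split() returns with and without the 1 argument (illustrative string):

>>> 'a # b # c'.split('#', 1)
['a ', ' b # c']
>>> 'a # b # c'.split('#')
['a ', ' b ', ' c']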

You could also just write your own version of partition(). Here is one called part():

def part(s, s_part):
    i0 = s.find(s_part)
    if i0 == -1:
        # not found: mimic str.partition() and return the whole string first
        return (s, '', '')
    i1 = i0 + len(s_part)
    return (s[:i0], s[i0:i1], s[i1:])
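
A quick check that part() agrees with partition() (made-up input):

>>> part('x = 1  # comment', '#')
('x = 1  ', '#', ' comment')
>>> 'x = 1  # comment'.partition('#')
('x = 1  ', '#', ' comment')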

@dalle noted that '#' can appear inside a string. It's not that easy to handle this case correctly, so I just ignored it, but I should have said something.

If your input file has simple enough rules for quoted strings, this isn't hard. It would be hard if you accepted any legal Python quoted string, because there are single-quoted and double-quoted strings, strings continued across lines with a backslash escaping the end-of-line, triple-quoted strings (using either single or double quotes), and even raw strings! The only way to correctly handle all that would be a complicated state machine.

But if we limit ourselves to just a simple quoted string, we can handle it with a simple state machine. We can even allow a backslash-quoted double quote inside the string.

c_backslash = '\\'
c_dquote = '"'
c_comment = '#'


def chop_comment(line):
    # a little state machine with two state variables:
    in_quote = False  # whether we are in a quoted string right now
    backslash_escape = False  # true if we just saw a backslash

    for i, ch in enumerate(line):
        if not in_quote and ch == c_comment:
            # not in a quote, saw a '#', it's a comment.  Chop it and return!
            return line[:i]
        elif backslash_escape:
            # we must have just seen a backslash; reset that flag and continue
            backslash_escape = False
        elif in_quote and ch == c_backslash:
            # we are in a quote and we see a backslash; escape next char
            backslash_escape = True
        elif ch == c_dquote:
            in_quote = not in_quote

    return line
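
For instance, chop_comment() leaves a '#' inside a quoted string alone but chops a real comment (illustrative lines):

>>> chop_comment('print("not # a comment")  # a real comment')
'print("not # a comment")  '
>>> chop_comment('no comment at all')
'no comment at all'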

I didn't really want to get this complicated in a question tagged "beginner", but this state machine is reasonably simple, and I hope it will be interesting.


I'm coming at this late, but the problem of handling shell-style (or Python-style) # comments is a very common one.

I've been using code like the following almost every time I read a text file. It doesn't handle quoted or escaped comments properly, but it works for simple cases and is easy:

for line in whatever:
    line = line.split('#',1)[0].strip()
    if not line:
        continue
    # process line

A more robust solution is to use shlex:

import shlex
for line in instream:
    lex = shlex.shlex(line)
    lex.whitespace = '' # if you want to strip newlines, use '\n'
    line = ''.join(list(lex))
    if not line:
        continue
    # process decommented line

This shlex approach not only handles quotes and escapes properly, it adds a lot of cool functionality (like the ability to have files source other files if you want). I haven't tested it for speed on large files, but it is zippy enough for small stuff.
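
For instance, with the lexer above a quoted '#' survives while the trailing comment is dropped (made-up input line):

>>> import shlex
>>> lex = shlex.shlex('echo "not # a comment" # real comment')
>>> lex.whitespace = ''
>>> ''.join(list(lex))
'echo "not # a comment" '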

The common case when you're also splitting each input line into fields (on whitespace) is even simpler:

import shlex
for line in instream:
    fields = shlex.split(line, comments=True)
    if not fields:
        continue
    # process list of fields
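
For example (made-up line):

>>> import shlex
>>> shlex.split('alpha "beta gamma"  # trailing comment', comments=True)
['alpha', 'beta gamma']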