Delete a Line from a file in C Language

I want to delete certain lines in a file and insert certain lines in the same file based on whether certain parts of the line match a specified string. Is there a way of doing this without using a temporary file to copy the contents to and so on?


Solution 1:

The problem is that a file is (essentially) an array of bytes on disk (or on whatever other physical substrate, but bytes anyway!), and "a line" can take a varying number of bytes. So inserting or removing a line (unless you're always rigorously replacing a line with another of exactly the same length in bytes) requires "shifting" all the rest of the file "up" or "down" by the difference in bytes. That can be an extremely onerous operation: the rest of the file can be gigabytes, even if you're only changing one line's length by 1 byte near the beginning of the file.

So such operations can be incredibly onerous, and are therefore almost never offered as primitives in ANY language supporting files with variable line length (C, Python, Java, C++, Ruby, or ANY other such language). It's extremely unlikely that you really need to pay such a potentially unbounded cost in performance AND risk (a system or disk crash during the "shift" of GB of data up or down can destroy the usability of your whole, huge file), when the perfectly simple, adequate, fast, safe, and reasonable technique you're trying to avoid has basically ZERO downsides (so it's far from obvious WHY you are trying to avoid it).

Use a result file different from the source file; when done, mv the result file over the source file (in C, call rename(); this is an atomic operation on most systems if you're within the same filesystem), and you really do have the best of all possible worlds.
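
The result-file approach described above can be sketched in C roughly as follows. This is a minimal sketch, not a definitive implementation: the function name delete_matching_lines, the ".tmp" suffix, and the 4096-byte buffers are all hypothetical choices, and it assumes a POSIX-like system (where rename() replaces an existing target) and lines shorter than the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy `path` to `path.tmp`, skipping every line that
 * contains `needle`, then rename the temporary file over the original.
 * Returns 0 on success, -1 on failure.
 * Assumption: every line fits in the 4096-byte buffer. */
int delete_matching_lines(const char *path, const char *needle)
{
    char tmp_path[4096];
    char line[4096];

    /* Build the temporary name next to the source so that both files
     * live on the same filesystem. */
    int n = snprintf(tmp_path, sizeof tmp_path, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp_path)
        return -1;

    FILE *in = fopen(path, "r");
    if (!in)
        return -1;
    FILE *out = fopen(tmp_path, "w");
    if (!out) {
        fclose(in);
        return -1;
    }

    int ok = 1;
    while (fgets(line, sizeof line, in)) {
        if (strstr(line, needle))
            continue;                 /* "delete" the line by not copying it */
        if (fputs(line, out) == EOF) {
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (fclose(out) == EOF)           /* flush errors show up here */
        ok = 0;

    /* Only replace the original once the copy is known to be complete. */
    if (!ok || rename(tmp_path, path) != 0) {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```

Calling delete_matching_lines("notes.txt", "TODO") would drop every line of notes.txt containing "TODO". Inserting lines works the same way: write the extra line to the result file at the point in the loop where it belongs.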

Solution 2:

You can't easily "cut" a section of a file out in-place. The usual approach is to make a temporary copy somewhere. This isn't a C thing; it's true for any language.

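You could mmap the file, and then when you find the line you want to erase, you can memmove everything after it to the location of the start of the line (memmove rather than memcpy, because the source and destination regions overlap), then shrink the file with ftruncate. I'd question how efficient that would be; the temporary file might be quicker.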
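
For illustration, the mmap approach might look something like the sketch below. It assumes a POSIX system; the function name delete_first_match_inplace and its return convention are hypothetical. Note that a crash between the memmove and the ftruncate can leave the file in an inconsistent state, which is the risk Solution 1 warns about.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: remove, in place, the first line of `path` that
 * contains `needle`. Returns 0 if a line was removed, 1 if nothing
 * matched, -1 on error. */
int delete_first_match_inplace(const char *path, const char *needle)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;
    if (size == 0) {                  /* an empty file cannot be mapped */
        close(fd);
        return 1;
    }

    char *data = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (data == MAP_FAILED) {
        close(fd);
        return -1;
    }

    size_t nlen = strlen(needle);
    size_t new_size = size;
    int result = 1;

    for (size_t start = 0; start < size; ) {
        /* `end` is one past the last byte of the current line. */
        char *nl = memchr(data + start, '\n', size - start);
        size_t end = nl ? (size_t)(nl - data) + 1 : size;

        int found = 0;
        for (size_t i = start; nlen > 0 && i + nlen <= end; i++) {
            if (memcmp(data + i, needle, nlen) == 0) {
                found = 1;
                break;
            }
        }
        if (found) {
            /* Regions overlap, so memmove is required, not memcpy. */
            memmove(data + start, data + end, size - end);
            new_size = size - (end - start);
            result = 0;
            break;
        }
        start = end;
    }

    munmap(data, size);
    /* Cut off the now-duplicated tail left behind by the shift. */
    if (result == 0 && ftruncate(fd, (off_t)new_size) != 0)
        result = -1;
    close(fd);
    return result;
}
```

Every byte after the deleted line still gets rewritten, so for a large file this does about as much I/O as the temporary-file version, without its safety.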