When a PC edits a file, does it delete the original file?
If code.txt (or whatever file) is edited and saved, I have two ideas of how a PC would handle the process:
Idea 1: The PC deletes code.txt completely and makes a new code.txt (edited version) from scratch.
Idea 2: The PC edits part of the hex of code.txt, so no delete happens.
Which idea represents how computers work?
Solution 1:
Could be either – it depends on the text editor that was used.
The concept of a 'text file' isn't built into computers – each operating system may manage files differently, and each text editor may use those files differently.
In practice, you'll find text editors which have both mechanisms. Practically all operating systems allow direct overwrite of an existing file's contents, so simple editors such as Notepad usually just ask the OS to write directly into the original file, as that's easiest to implement – but risky if you lose power mid-write. So for reliability reasons, many editors deliberately save the updated data to a new file and delete the original.
(I think in-place updates are more common among hex editors, where most edits don't insert/delete bytes but only change existing locations, so a full rewrite of the file is not needed.)
There's even a third mode of operation – the editor might first make a backup copy of the old file, then directly write new data into the file.
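The difference between the first two mechanisms can be observed from a program. The following is only a rough sketch, not how any particular editor is implemented: the function names are made up for illustration, and it assumes a filesystem that reports a stable file identifier (inode number) through os.stat. An in-place write keeps the same identifier, while writing a new file and swapping it in produces a different one.

```python
import os
import tempfile


def save_in_place(path: str, data: bytes) -> None:
    """Ask the OS to overwrite the existing file's contents directly."""
    # "r+b" opens the existing file without truncating or replacing it.
    with open(path, "r+b") as f:
        f.write(data)
        f.truncate()  # drop leftover bytes if the new data is shorter


def save_via_new_file(path: str, data: bytes) -> None:
    """Write the data to a new file, then swap it in under the old name."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the swap stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp_path, path)  # the old file is dropped at this point


def file_id(path: str) -> int:
    """Return the filesystem's identifier (inode number) for the file."""
    return os.stat(path).st_ino


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "code.txt")
        with open(path, "wb") as f:
            f.write(b"original\n")
        before = file_id(path)
        save_in_place(path, b"edited in place\n")
        print("same file after in-place save:", file_id(path) == before)
        save_via_new_file(path, b"edited via new file\n")
        print("same file after replace save:", file_id(path) == before)
```

Comparing the identifier before and after saving in a real editor is one way to guess which mechanism that editor uses.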
It also depends on the filesystem which keeps the file. With most traditional filesystems, if a program asks to write to an existing file, the filesystem will just overwrite old data in-place.
However, some filesystems do work in "copy-on-write" mode, where any new data is always written to a different location, whether the program wants it or not. Again, this has the possible advantage of increased reliability because an interrupted change can be fully reverted.
In some filesystems (such as Btrfs or ext4) this is an optional feature; in others (e.g. log-structured filesystems) it is part of the core design.
Solution 2:
Since you are talking about "saving the file", the file will not be edited in-place on disk.
With a file in a usual filesystem, there are two things to consider. There is the directory entry, and then there is the actual file data somewhere on the disk.
When you edit a file in a normal editor, it will load the file data into RAM, and any editing will just happen on that copy of the data. Then when you save the file, there are basically two options:
Option 1: the original file is renamed, so both the original directory entry and the original data will remain on the disk. The rename might, for example, change the file suffix to .bak (usually removing any previous .bak file). Then a new file is created and the data from memory is written there.
Option 2: the original directory entry is modified so the file is truncated to 0 length. The area on disk used for file data will be marked as unused, but the old file contents will remain on disk until they are overwritten. Then new data is written. In this case the directory entry remains, just the data it points to is changed.
There are a few possible variations. A common one is that the edited data is first stored in a temporary file, so if your computer crashes at this point, the original file will likely not be damaged. Then the original file is deleted and the new file is renamed to the correct name. Or, the original file could just be deleted before writing the new one.
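Option 1 above can be sketched in a few lines. This is a minimal illustration under assumptions, not what any specific editor does: the function names are hypothetical, the backup suffix is fixed to .bak, and the file to save is assumed not to end in .bak itself.

```python
import os


def backup_name(path: str) -> str:
    """Return the backup path, e.g. code.txt -> code.bak."""
    root, _ext = os.path.splitext(path)
    return root + ".bak"


def save_with_backup(path: str, data: bytes) -> None:
    """Rename the old file to a .bak name, then write a brand-new file."""
    if os.path.exists(path):
        # os.replace silently overwrites a previous .bak file, if any.
        os.replace(path, backup_name(path))
    # A new directory entry and new file data are created here.
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the drive
```

After a save, the old contents are still reachable under the .bak name, which matches the description that both the original directory entry and the original data remain on disk.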
So your theory 1 is close to what most editors do.
Then there are special cases. The most obvious one is a disk editor, which allows reading and overwriting bytes directly on disk. Another might be a database file, where records might be fixed size, so it's easy to just overwrite a record. But data can't be inserted into the middle of a file without rewriting everything after it, so these tricks can't really be used for text files or any other files where the length of the data in the middle of the file commonly changes.
So your theory 2 is possible in some cases, but normal text editors and such don't do it.
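The fixed-size record case, and why inserting in the middle is different, can be sketched as follows. This is an illustrative example only: RECORD_SIZE and both function names are hypothetical and not taken from any real database format.

```python
RECORD_SIZE = 16  # hypothetical fixed record length in bytes


def overwrite_record(path: str, index: int, record: bytes) -> None:
    """Overwrite one fixed-size record in place; the file size is unchanged."""
    if len(record) != RECORD_SIZE:
        raise ValueError("record must be exactly RECORD_SIZE bytes")
    with open(path, "r+b") as f:
        f.seek(index * RECORD_SIZE)  # jump straight to the record
        f.write(record)              # only these bytes are rewritten


def insert_bytes(path: str, offset: int, new: bytes) -> None:
    """Insert bytes in the middle; everything after offset must be rewritten."""
    with open(path, "r+b") as f:
        f.seek(offset)
        tail = f.read()      # read everything that follows the insert point
        f.seek(offset)
        f.write(new + tail)  # write the new bytes, then the shifted tail
```

The first function touches only RECORD_SIZE bytes, while the second has to read and rewrite the rest of the file, which is why editors that change the length of the data usually just write the whole file again.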
Solution 3:
Historically, drives were directly controlled by the OS, which was in turn controlled by the application. In that context, Theory 2 was the primary way PCs worked. The OS specified a physical location to put data, and it had full control over this process. As a result, early file systems had a "bad sector" table, so after your data was lost, the computer could tell you the data was lost and mark the sector as unusable to avoid more data loss. Disk scans and defragmentation were the order of the day.
However, after the turn of the century, we moved to LBA, so now the OS would simply reference the "logical" block it wanted to read or write to. The hard drive itself now had the intelligence to shuffle around data behind the OS's back without it noticing. This meant better reliability, since sectors that failed to verify could simply be moved to a new physical location without affecting the OS's knowledge of where that data was located.
In modern hardware, "platter" disk drives typically just overwrite whatever was there before with the new incoming data, and optionally remap the LBA if the sector looks like it might not retain the data (the sector is damaged or worn). "Flash" drives typically erase the old cells and then write data to new cells, a process known as wear-leveling.
In both cases, this is possible because there is always unused capacity beyond the reported value. This overprovisioning allows the drive to have a longer usable life than the rather unreliable technology of the previous century. The LBA mode enables the physical medium to be abstracted from the OS so that the drive itself can take whatever measures it thinks are necessary to prevent data loss.
At the application level, you typically open a file in "WRITE" mode, which tells the OS to clear the file ("delete" the contents, but not the file itself), then write new data. All of this is buffered at the OS level, then "flushed" to the drive, which makes the requested changes.
Given that information, Theory 1 is what technically happens at the application programming level, at least by default, as there is also a "write with append" mode to avoid clearing the file contents. The OS itself will present the changes to be made more like Theory 2, but abstracted via LBA. The drive itself will then probably do something that's a mix of Theory 1 and Theory 2.
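The difference between the default "WRITE" mode and the append mode mentioned above can be seen in a short example. This is just a sketch using Python's built-in open; the function name demo_modes is made up for illustration.

```python
def demo_modes(path: str) -> tuple[str, str]:
    """Show that "w" clears the file first while "a" keeps the old contents."""
    with open(path, "w") as f:   # creates the file
        f.write("first version\n")

    with open(path, "w") as f:   # "w" truncates the file to length 0 first
        f.write("second version\n")
    with open(path) as f:
        after_write = f.read()

    with open(path, "a") as f:   # "a" leaves the contents and adds to the end
        f.write("appended line\n")
    with open(path) as f:
        after_append = f.read()

    return after_write, after_append
```

In both cases the file itself (its name and directory entry) stays; only what happens to the existing contents differs.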
Yep. It's complicated, and very part-manufacturer/OS-developer/application-developer dependent. However, all of this complexity is aimed at making data storage more reliable while improving power usage/battery life.
Solution 4:
Depends. AFAIK Microsoft Word, when saving .doc (not .docx) files with the Fast save option enabled, appends the changes made to the document since the last save to the existing file.
Solution 5:
Though there are other answers, I will try to give a complete answer so that you could understand it at every level, from the issue to solutions, and how they work.
Short answer
Highly depends on your editor, underlying software/drivers, storage.
What to do: If you want to delete a file permanently, for most users searching for "secure delete file" solves the issue. If your case is not the "most users" case - there is no short answer ;)
Paranoiac's short answer
Is recoverable, unless you remove it permanently with a combination of specific tools on high settings.
Long answer
There is missing information in your question (software, hardware, etc), so instead of answering myself I will help you answer your question yourself.
It's not that easy and straightforward. The file may pass through multiple layers, and it can remain in any of them for some time or for a long time.
Editor <-> Memory <-> Backup/VCS/diff disk/etc if exists <-> OS <-> File System <-> Storage cache <-> Storage
Depending on your environment configuration, some of the layers above may be added or removed. So to give a complete answer I will give info about each of them; you may take the points which are relevant for your case.
- Editor:
- The editor may save the old (or new) version of the file temporarily in another place. Then after deleting the file, its old (or new) version may still remain somewhere. For example, MS Word creates such temporary files, so that it can recover the file after a crash.
- If the editor software replaces the bits of the same file when editing it, then the file can get rewritten when editing. So you may use such an editor to fill up the file with random bits and make it harder to recover.
But this may also depend on editor settings and file types.
Note the emphasis on the word "may". Even if the editor rewrites the file, the old data may still remain untouched and recoverable (read the next points).
- Underlying software/drivers/file system:
- File systems may mark the deleted file's space as "free" without actually cleaning it up. They do it for performance, since a full cleanup of the file requires more operations (by disk, CPU). So after removing the file it may still be recoverable.
The same applies to "formatting disk", especially for "fast formatting" options. You may have seen different file recovery tools which work after deleting the file or even formatting the disk - most of them can work for this reason.
- The file will remain untouched if there is other software or a driver underneath that protects the initial file from getting overwritten. Such software includes version control systems, virtual differential disks, and some backup software. An example is Git, which keeps the committed version of the file in its object store, so the old content survives after the working copy is edited.
- Storage:
- Storage itself can write changed blocks on a new sector, and mark old blocks as "free". Then the file will physically remain on the storage (and is recoverable), unless it gets overwritten by another file. Example is modern SSD storage, which may do it at hardware level.
- There are ways to recover data from a typical mechanical HDD's magnetic discs even when the data was overwritten, and there are companies that specialize in it.
- Memory: Yes, it can remain in memory too - RAM, video RAM, cache, etc. It depends on the editor/viewer you use. Most software doesn't clean up memory securely after using it, and for some time the file may remain recoverable from memory before the memory gets reused by other applications. For example, a typical text file may remain in RAM after editing and deleting it. An image (or the whole screen) may be recovered from video RAM after it has been closed and deleted. This applies not only to images: for example, a screen of your browser may remain in video memory even after an hour.
So, if you want to get a specific answer whether your file will be deleted or not, you should also tell what file, editor, backup/VCS software, file system, hardware, storage you use.
How to actually delete the file?
This is probably the next question that you will ask yourself. Well, there are many software/hardware solutions. Since SuperUser is not for promoting software/hardware, instead of telling names I will tell you how to find them.
- To remove from storage: search for keywords "securely delete file". For more exact matches add your OS, hard drive type, or other info you have into the query.
- To clean it up from memory/cache: search for "securely cleanup RAM | video RAM | hard drive cache".
As you may have noticed, the main keyword here is "securely cleanup".
Next questions may be:
- How does this software work?
It overwrites the space of the file/memory with 0s, 1s, or random bits for a few rounds, so that the old information cannot be recovered.
- Can I trust them?
Depends. If you want it for secure applications (such as banking), in order to get security certifications (which are required for financial institutions), then you can trust official commercial tools that have passed certifications. Be sure to check the certifications of the tool. E.g. if you are a bank admin, and regulations require that files are deleted securely after usage - then this is the case.
If you want it for yourself, and can read and understand the code and algorithms - you may go for open-source battle-tested solutions too.
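To give an idea of the overwrite-for-a-few-rounds approach described above, here is a very simplified sketch. It is not a replacement for a real secure-delete tool: the function names and the CHUNK size are made up for illustration, and, as explained in the storage and file system points, an SSD, a copy-on-write file system, or backup software may still keep the old blocks somewhere else.

```python
import os

CHUNK = 1024 * 1024  # hypothetical chunk size: overwrite 1 MiB at a time


def overwrite_with_random(path: str, passes: int = 3) -> None:
    """Overwrite the file's current contents in place with random bytes."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:  # open without truncating the file
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(CHUNK, remaining)
                f.write(os.urandom(n))
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push this pass to the drive


def overwrite_and_delete(path: str, passes: int = 3) -> None:
    """Overwrite the contents a few times, then remove the directory entry."""
    overwrite_with_random(path, passes)
    os.remove(path)
```

Whether the random bytes really land on the same physical location is decided by the layers below the program, so this only covers the editor/application level.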
Hope this helps.
If there is a missing point - feel free to comment!