Database: To delete or not to delete records

I don't think I am the only person wondering about this. What do you usually practice about database behavior? Do you prefer to delete a record from the database physically? Or is it better to just flag the record with a "deleted" flag or a boolean column to denote the record is active or inactive?


Solution 1:

It definitely depends on the actual content of your database. If you're using it to store session information, then by all means wipe it immediately when the session expires (or is closed), you don't want that garbage lying around. As it cannot really be used again for any practical purposes.

Basically, what you need to ask yourself, might I need to restore this information? Like deleted questions on SO, they should definitely just be marked 'deleted', as we're actively allowing an undelete. We also have the option to display it to select users as well, without much extra work.

If you're not actively seeking to fully restore the data, but you'd still like to keep it around for monitoring (or similar) purposes. I would suggest that you figure out (to the extent possible of course) an aggregation scheme, and shove that off to another table. This will keep your primary table clean of 'deleted' data, as well as keep your secondary table optimized for monitoring purposes (or whatever you had in mind).

For temporal data, see: http://talentedmonkeys.wordpress.com/2010/05/15/temporal-data-in-a-relational-database/

Solution 2:

Pros of using a delete flag:

  1. You can get the data back later if you need it,
  2. Delete operation (updating the flag) is probably quicker than really deleting it

Cons of using a delete flag:

  1. It is very easy to miss AND DeletedFlag = 'N' somewhere in your SQL
  2. Slower for the database to find the rows that you are interested in amongst all the crap
  3. Eventually, you'll probably want to really delete it anyway (assuming your system is successful. What about when that record is 10 years old and it was "deleted" 4 minutes after originally created)
  4. It can make it impossible to use a natural key. You may have one or more deleted rows with the natural key and a real row wanting to use that same natural key.
  5. There may be legal/compliance reasons why you are meant to actually delete data.

Solution 3:

As a complement to all posts...

However, if you plan to mark the record, its good to consider making a view, for active records. This would save you from writing or forgetting the flag in your SQL query. You might consider a view for non-active records too, if you think that also serve a purpose.

Solution 4:

I am glad to have found this thread. I too was wondering what people thought about this issue. I have implemented the 'marked as deleted' for about 15 years on many systems. Whenever a user would call to say something was accidentally deleted it was certainly a lot easier to mark it un-deleted than recreate it or restore from a backup.

We are using postgresql and Ruby on rails it looks like we could do this in 1 of two ways, modify rails or add an ondelete trigger and does instead a pl/pgsql function to mark as deleted. I am leaning toward the latter.

As for performance hits, it will be interesting to see the results of EXPLAIN-ANALYZE on large tables to few deleted items as well as many deleted items.

In systems used over time I have found, new users tend to do silly things like delete things accidentally. So when people are new in a position they have all the access rights of the person previously in that position except with zero experience. Accidentally deleting something and being able to quickly recover gets everyone back to work quickly.

But as someone said, sometimes you may need that particular key back for some reason, at that point you would need to really delete it, then re-create the records (on undelete it and modify the record).

Solution 5:

I mark them as deleted, and don't really delete. However every once in a while I sweep out all the junk and archive it, so it doesn't kill performance.