What is the best procedure to remove an user from MySQL table, but keep their data in foreign tables? [closed]

First of all, I found many discussions on this question around the web, but I'm really skeptical about the solutions proposed also mainly because most of them were from over 4 years ago when GDPR was novel, and also because they contradict each other a lot.

In order to be compliant with GDPR, we need to fully remove user's personal information (email, name, address, etc) when they request it.

In our system we have the table User and other tables like Training and Session for example.

Training has a ManyToMany relationship with User and Session table has an user_id field that is a foreign key to User table. Also Session has a training_id field which is also a foreign key to Training

I want to delete the users personal info that is contained in User, but I want to be able to keep their data in Session for statistics and log purposes.

I understand a hard delete is not the way to go, the fk constraints would be problematic and even though I could use Null, it's not really recommended. The suggestion I see more often is to just add a flag on User table to indicate if the user is deleted (is_delete) and use that appropriately in my business logic. IMO, that alone does not really resolve the problem with GDPR, as the user's info would still be in the database and this could get you in trouble with an audit and I was surprised this kind of answer was upvoted in most topics.

What I'm thinking of doing, it's to add that flag of is_delete, but also update the user row with fake or empty data to get rid of all user personal info. So I'd have something like this:

ID email name surname address is_deleted
1 "" Anonymous "" "" true

Would this be sufficient? Do you guys see any flaw with my plan?


Solution 1:

You're right. UPDATEs to rows with personally identifiable information is the way to do this without messing up your database constraints. You may want to change one of the PII columns to say "Redacted at (date-time)" in case some auditor wants to see evidence that you comply with these requests.

It takes a while to get your changes to filter through to all your backups. There's not much you can do about that operationally except making sure you don't retain old backups too far back in time.

If you use PII as a primary, unique, or constraint key, that's a problem. For example, if you use email address as a key to user information, you may have to replace the values in those columns with a randomly-generated text string to erase the data and preserve the uniqueness. You'll have to do that in a way that preserves constraints.

Don't forget that your web server logs very likely allow you to find users' IP addresses given their user IDs. Be sure you set up a log retention policy (ten days?) and enforce it by automatic deletion.

Make your policy say "it may take up to thirty days to completely delete..." to give your workflows time to affect your server logs and backups.

If a user asks you to remove all data, and then later returns to use your service again, handle them as a user you've never seen before. If you could figure out that you had seen them before, you obviously did not delete all their PII when requested.