We are using a MongoDB replica set to share session data and other (potentially sensitive) data across a web farm.

All the data we store uses TTL indexes to expire documents after a relatively short period (say, an hour), partly for security reasons.
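
To make that concrete, here's a minimal sketch of the setup using pymongo; the database, collection, and field names ("webfarm", "sessions", "createdAt") and the replica set name are illustrative:

    from datetime import datetime, timezone

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
    sessions = client["webfarm"]["sessions"]

    # The TTL monitor deletes a document once "createdAt" is more than
    # 3600 seconds in the past. It runs roughly once a minute, so
    # expiry is approximate rather than instantaneous.
    sessions.create_index("createdAt", expireAfterSeconds=3600)

    sessions.insert_one({
        "sessionId": "abc123",
        "data": {"user": "alice"},
        "createdAt": datetime.now(timezone.utc),
    })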

However, it has occurred to me that even after the data is deleted from a MongoDB collection, the oplog used for replication still contains every document that was created (and then deleted); all the expired data can then be easily read back out of the oplog.

Depending on the size allocated to the oplog, the data in it can be quite old.
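
As a rough illustration of the concern (same hypothetical names as above): anyone who can read the local database can replay old inserts, including documents the TTL index deleted long ago:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
    oplog = client["local"]["oplog.rs"]

    # "op": "i" marks insert entries and "ns" is the namespace they
    # targeted; the full inserted document is preserved under "o".
    for entry in oplog.find({"op": "i", "ns": "webfarm.sessions"}):
        print(entry["o"])  # expired session data, still readable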

My question is: what is best practice here? Is there anything we can do, other than severely reducing the oplog size, to prevent old data from being accessible?
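
For reference, this is what the oplog-shrinking option looks like on MongoDB 3.6+, which can resize the oplog live (size is in megabytes; 990 MB is the minimum the server accepts):

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")

    # A smaller oplog holds less history, but replication can only
    # recover as far back as the oplog reaches, so don't shrink it
    # below what your secondaries need to catch up.
    client.admin.command("replSetResizeOplog", 1, size=990.0)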


Sensitive data in logs is the same as sensitive data anywhere. Depending on the criticality, you'll want to -

  • only allow those authorized to view the data to have access to it (usually done via roles or group membership; see the sketch after this list)

  • if the data includes anything regulated or subject to external obligations (PCI, HIPAA, etc.), you must treat it accordingly and do the compliance dance

  • turn logging and monitoring (on both the network and the host) up to 11 (or whatever is appropriate)

  • especially log and audit data access and access attempts

  • keep the logs around only as long as needed

  • encrypt the data when it's not actively being used, if possible

  • don't keep it scattered around, consolidate and defend

  • if it rises to a certain level of worry, you can lock down the MongoDB server and treat it as a critical system (security hack: if you already deal with PCI or other compliance-covered data, you can pretend this server holds that data too and have ops apply the same standards/policies/etc. to it, rather than figuring out a new way of handling it all)
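
As a sketch of the roles point in the first bullet (assuming authorization is enabled via security.authorization: enabled, with hypothetical names throughout): give the application user readWrite on its own database only, so it can't read local.oplog.rs:

    from pymongo import MongoClient

    # Authenticate as an existing admin; credentials are illustrative.
    client = MongoClient(
        "mongodb://admin:adminpass@localhost:27017/?authSource=admin")

    # The app account can touch only the "webfarm" database; reading
    # the "local" database (and hence the oplog) requires a role it
    # doesn't have.
    client["webfarm"].command(
        "createUser",
        "sessionApp",
        pwd="app-secret",  # use a real secret store in practice
        roles=[{"role": "readWrite", "db": "webfarm"}],
    )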

In the case of mongo, you might -

  • send it (encrypted!) to a central log server, ideally tagged as sensitive or with a higher severity level

  • either don't store the logs locally, or, if needed for debugging/whatever, rotate them into oblivion (i.e. throw them away) after the shortest possible time (a day, a week, 30 days, whatever)

  • if they exist locally, ensure they're owned by root or another privileged user and are mode 0400 (or whatever works for your OS; see the sketch after this list)

  • if you're really paranoid you can use something akin to auditd(8) to see when someone tries to access the logs (and, again, send the auditd logs to the central log server!)

  • if you're really paranoid you'll want to use encrypted storage wherever the logs are kept, so that they can't be recovered from the disk after removal

  • some data might require long-term storage for compliance reasons, so ensure you don't nuke anything prematurely

  • access to the central log server should have the same restrictions as the local server, because... data
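
And a paranoid-but-simple sketch of the ownership/mode bullet above, with a hypothetical log path; something like this could run as root from cron:

    import os
    import stat

    LOG_PATH = "/var/log/mongodb/mongod.log"  # hypothetical location

    # Enforce root-owned, mode-0400 logs (readable only by root).
    st = os.stat(LOG_PATH)
    if st.st_uid != 0 or stat.S_IMODE(st.st_mode) != 0o400:
        os.chown(LOG_PATH, 0, 0)   # root:root
        os.chmod(LOG_PATH, 0o400)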

Nothing really exciting, same ol' same ol': don't miss anything, nail everything down, limit access, and monitor everything that moves.