Storing Images in DB - Yea or Nay?
So I'm using an app that stores images heavily in the DB. What's your outlook on this? I'm more the type to keep the images on the filesystem and store only their location in the DB, rather than storing the images directly in the DB.
What do you think are the pros/cons?
I'm in charge of some applications that manage many TB of images. We've found storing file paths in the database to be best.
There are a couple of issues:
- database storage is usually more expensive than file system storage
- you can super-accelerate file system access with standard off-the-shelf products
- for example, many web servers use the operating system's sendfile() system call to copy a file from the file system to the network socket inside the kernel, without passing the data through user space. Images stored in a database don't benefit from this optimization.
- things like web servers, etc., need no special coding or processing to access images in the file system
- databases win out where transactional integrity between the image and its metadata is important.
- it is more complex to manage integrity between db metadata and file system data
- it is difficult (within the context of a web application) to guarantee data has been flushed to disk on the filesystem
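To make the integrity and flush-to-disk points concrete, here is a minimal sketch of the "file on disk, path in the DB" pattern. It assumes Python's built-in sqlite3 as a stand-in for whatever database you use and a hypothetical `images(id, path)` table; the UUID file naming is also just an illustration. The file is written under a temporary name, fsync'd, and renamed into place before the row is committed, and the file is deleted again if the insert fails.

```python
import os
import sqlite3
import tempfile
import uuid


def save_image(conn: sqlite3.Connection, root: str, data: bytes) -> int:
    """Write image bytes to disk, then record the relative path in the DB.

    The file is flushed with fsync before the row is committed, so a
    committed row should never point at data that only lived in a cache.
    """
    rel_path = uuid.uuid4().hex + ".jpg"  # hypothetical naming scheme
    full_path = os.path.join(root, rel_path)
    tmp_path = full_path + ".tmp"

    # Write under a temporary name, flush to disk, then rename into place.
    # (Full POSIX durability would also fsync the directory after the rename.)
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, full_path)

    try:
        with conn:  # commits on success, rolls back on exception
            cur = conn.execute("INSERT INTO images (path) VALUES (?)", (rel_path,))
    except sqlite3.Error:
        # Keep the two stores consistent: remove the now-orphaned file.
        os.remove(full_path)
        raise
    return cur.lastrowid


def load_image(conn: sqlite3.Connection, root: str, image_id: int) -> bytes:
    """Look up the stored path and read the file from the filesystem."""
    row = conn.execute("SELECT path FROM images WHERE id = ?", (image_id,)).fetchone()
    if row is None:
        raise KeyError(image_id)
    with open(os.path.join(root, row[0]), "rb") as f:
        return f.read()


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, path TEXT NOT NULL)")
    image_id = save_image(conn, root, b"fake image bytes")
    print(load_image(conn, root, image_id))
```

Even with this ordering, a crash between the rename and the commit leaves an orphaned file, so a real system would likely also need a periodic cleanup job. That extra bookkeeping is exactly the integrity cost described above.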
As with most issues, it's not as simple as it sounds. There are cases where it would make sense to store the images in the database.
- You are storing images that change over time, such as invoices, and you want to retrieve an invoice as it was on 1 Jan 2007.
- The government wants you to maintain 6 years of history
- Images stored in the database do not require a separate backup strategy. Images stored on the filesystem do.
- It is easier to control access to the images if they are in a database. Idle admins can browse any folder on disk. It takes a really determined admin to go snooping in a database to extract the images.
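The invoice-history case can be sketched by never overwriting an image and instead inserting a new version with an effective date. This is only an illustration: it assumes SQLite through Python's sqlite3 module and a hypothetical `invoice_images` table keyed by invoice and ISO-8601 date string (ISO dates compare correctly as text). Because image and metadata live in the same database, they are backed up together and change in the same transaction.

```python
import sqlite3
from typing import Optional


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS invoice_images (
               invoice_id INTEGER NOT NULL,
               valid_from TEXT NOT NULL,   -- ISO-8601 date, e.g. '2007-01-01'
               data       BLOB NOT NULL,
               PRIMARY KEY (invoice_id, valid_from)
           )"""
    )


def add_version(conn: sqlite3.Connection, invoice_id: int, valid_from: str, data: bytes) -> None:
    # Never overwrite: each change is a new row, so history is preserved.
    with conn:
        conn.execute(
            "INSERT INTO invoice_images (invoice_id, valid_from, data) VALUES (?, ?, ?)",
            (invoice_id, valid_from, data),
        )


def image_as_of(conn: sqlite3.Connection, invoice_id: int, as_of: str) -> Optional[bytes]:
    """Return the newest version whose valid_from is on or before as_of."""
    row = conn.execute(
        """SELECT data FROM invoice_images
           WHERE invoice_id = ? AND valid_from <= ?
           ORDER BY valid_from DESC LIMIT 1""",
        (invoice_id, as_of),
    ).fetchone()
    return None if row is None else bytes(row[0])
```

For example, after storing versions dated 2006-06-01 and 2007-03-15, `image_as_of(conn, 1, "2007-01-01")` returns the 2006 version.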
On the other hand, there are associated problems:
- Additional code is required to extract and stream the images
- Latency may be higher than with direct file access
- Heavier load on the database server
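As a rough illustration of the "additional code to extract and stream" cost, the sketch below reads a BLOB in fixed-size chunks with SQLite's `substr()` (which indexes bytes, 1-based, when applied to a BLOB) so a large image need not be loaded into memory at once. It assumes Python's sqlite3 module and a hypothetical `images(id, data)` table; other databases have their own chunked-read APIs. Note that every chunk is a separate query against the database server, which also shows where the extra latency and load come from.

```python
import sqlite3
from typing import Iterator


def stream_image(conn: sqlite3.Connection, image_id: int,
                 chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield an image BLOB in chunk_size pieces instead of all at once."""
    row = conn.execute(
        "SELECT length(data) FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    if row is None:
        raise KeyError(image_id)
    total = row[0]  # length() of a BLOB is its size in bytes
    offset = 1      # SQLite substr() is 1-based
    while offset <= total:
        (chunk,) = conn.execute(
            "SELECT substr(data, ?, ?) FROM images WHERE id = ?",
            (offset, chunk_size, image_id),
        ).fetchone()
        yield bytes(chunk)
        offset += chunk_size
```

A web handler would typically write each yielded chunk to the response as it arrives, something a plain static file server does for free.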
File store. Facebook engineers gave a great talk about it. One takeaway was to know the practical limit on the number of files in a directory.
Needle in a Haystack: Efficient Storage of Billions of Photos
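One common way to stay under a per-directory file limit is to shard files into nested subdirectories named after a prefix of a hash. This is a simpler technique than Haystack itself, which packs many photos into large volume files; the sketch below only shows the directory-sharding idea. The SHA-256 content hash, the `levels`/`width` parameters and the helper names are hypothetical choices for illustration.

```python
import hashlib
import os


def sharded_path(root: str, data: bytes, ext: str = ".jpg",
                 levels: int = 2, width: int = 2) -> str:
    """Map image bytes to root/ab/cd/abcd....jpg using a content hash.

    With levels=2 and width=2 there are 256 * 256 = 65,536 leaf
    directories, so a million images average about 15 files each.
    """
    digest = hashlib.sha256(data).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, digest + ext)


def store_sharded(root: str, data: bytes, ext: str = ".jpg") -> str:
    """Write the image under its sharded path; return the path relative to root."""
    path = sharded_path(root, data, ext)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return os.path.relpath(path, root)
```

The relative path returned by `store_sharded` is what would go into the database. Naming files by content hash also means identical images map to the same file.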
This might be a bit of a long shot, but if you're using (or planning on using) SQL Server 2008 I'd recommend having a look at the new FileStream data type.
FileStream solves most of the problems around storing the files in the DB:
- The Blobs are actually stored as files in a folder.
- The Blobs can be accessed using either a database connection or over the filesystem.
- Backups are integrated.
- Migration "just works".
However, SQL Server's Transparent Data Encryption (TDE) does not encrypt FileStream data, so if that is a consideration, you may be better off just storing the images as varbinary.
From the MSDN Article:
Transact-SQL statements can insert, update, query, search, and back up FILESTREAM data. Win32 file system interfaces provide streaming access to the data.
FILESTREAM uses the NT system cache for caching file data. This helps reduce any effect that FILESTREAM data might have on Database Engine performance. The SQL Server buffer pool is not used; therefore, this memory is available for query processing.
Storing file paths in the DB is definitely the way to go. I've heard story after story from customers with TB of images that trying to store any significant number of images in a DB became a nightmare; the performance hit alone is too much.