Should I use MySQL blob field type?
I am struggling to decide if I should be using the MySQL blob field type in an upcoming project I have.
My basic requirements are, there will be certain database records that can be viewed and have multiple files uploaded and "attached" to those records. Seeing said records can be limited to certain people on a case by case basis. Any type of file can be uploaded with virtually no restriction.
So looking at it one way, if I go the MySQL route, I don't have to worry about virus's creeping up or random php files getting uploaded and somehow executed. I also have a much easier path for permissioning and keeping data tied close to a record.
The other obvious route is storing the data in a specific folder structure outside of the webroot. in this case I'd have to come up with a special naming convention for folders/files to keep track of what they reference inside the database.
Is there a performance hit with using MySQL blob field type? I'm concerned about choosing a solution that will hinder future growth of the website as well as choosing a solution that wont be easy to maintain.
Is there a performance hit with using MySQL blob field type?
Not inherently, but if you have big BLOBs clogging up your tables and memory cache that will certainly result in a performance hit.
The other obvious route is storing the data in a specific folder structure outside of the webroot. in this case I'd have to come up with a special naming convention for folders/files to keep track of what they reference inside the database.
Yes, this is a common approach. You'd usually do something like have folders named after each table they're associated with, containing filenames based only on the primary key (ideally a integer; certainly never anything user-submitted).
Is this a better idea? It depends. There are deployment-simplicity advantages to having only a single data store, and not having to worry about giving the web user write access to anything. Also if there might be multiple copies of the app running (eg active-active load balancing) then you need to synchronise the storage, which is much easier with a database than it is with a filesystem.
If you do use the filesystem rather than a blob, the question is then, do you get the web server to serve it by pointing an Alias at the folder?
- + is super fast
- + caches well
- - extra server config: virtual directory; needs appropriate file extension to return desired
Content-Type
- - extra server config: need to add
Content-Disposition: attachment
/X-Content-Type-Options
headers to stop IE sniffing for HTML as part of anti-XSS measures
or do you serve the file manually by having a server-side script spit it out, as you would have to serving from a MySQL blob?
- - is potentially slow
- - needs a fair bit of manual If-Modified-Since and ETag handling to cache properly
- + can use application's own access control methods
- + easy to add correct Content-Type and Content-Disposition headers from the serving script
This is a trade-off there's not one globally-accepted answer for.
If your web server will be serving these uploaded files over the web, the performance will almost certainly be better if they are stored on the filesystem. The web server will then be able to apply HTTP caching hints such as Last-Modified
and ETag
which will help performance for users accessing the same file multiple times. Additionally, the web server will automatically set the correct Content-Type
for the file when serving. If you store blobs in the database, you'll end up implementing the above mentioned features and more when you should be getting them for free from your web server.
Additionally, pulling large blob data out of your database may end up being a performance bottleneck on your database. Also, your database backups will probabaly be slower because they'll be backing up more data. If you're doing ad-hoc queries during development, it'll be inconvenient seeing large blobs in result sets for select
statements. If you want to simply inspect an uploaded file, it will be inconvenient and roundabout to do so because it'll be awkwardly stored in a database column.
I would stick with the common practice of storing the files on the filesystem and the path to the file in the database.