Is it faster to access data from files or a database server? [closed]

Solution 1:

I'll add to the it depends crowd.

This is the kind of question that has no generic answer but is heavily dependent on the situation at hand. I even recently moved some data from a SQL database to a flat file system because the overhead of the DB, combined with some DB connection reliability issues, made using flat files a better choice.

Some questions I would ask myself when making the choice include:

  1. How am I consuming the data? For example will I just be reading from the beginning to the end rows in the order entered? Or will I be searching for rows that match multiple criteria?

  2. How often will I be accessing the data during one program execution? Will I go once to get all books with Salinger as the author or will I go several times to get several different authors? Will I go more than once for several different criteria?

  3. How will I be adding data? Can I just append a row to the end and that's perfect for my retrieval or will it need to be resorted?

  4. How logical will the code look in six months? I emphasize this because I think this is too often forgotten in designing things (not just code, this hobby horse is actually from my days as a Navy mechanic cursing mechanical engineers). In six months when I have to maintain your code (or you do after working another project) which way of storing and retrieving data will make more sense. If going from flat files to a DB results in a 1% efficiency improvement but adds a week of figuring things out when you have to update the code have you really improved things.

Solution 2:

As a general rule, databases are slower than files.

If you require indexing of your files, a hard-coded access path on customised indexing structures will always have the potential to be faster if you do it correctly.

But 'performance' is not the goal when choosing a database over a file based solution.

You should ask yourself whether your system needs any of the benefits that a database would provide. If so, then the small performance overhead is quite acceptable.

So:

  1. Do you need to deal with multiple users and concurrent updates? (Well; you did say it's static.)
  2. Do you need flexibility in order to easily query the data from a variety of angles?
  3. Do you have multiple users, and could gain from making use of an existing security model?

Basically, the question is more of which would be easier to develop. The performance difference between the two is not worth wasting dev time.

Solution 3:

Depends on what your information is and what your access patterns and scale are. Two of the biggest benefits of a relational databases are:

  1. Caching. Unless you're very clever, you can't write a cache as good as that of a DB server

  2. Optimizer.

However, for certain specialized applications, neither of these 2 benefits manifest itself compared to files+folders data store - therefore the answer is a resounding "depends".

As for files/folders, the tricks are:

  • Cache the contents of frequently requested files
  • Have small directories (files in deeply nested small directories are much faster to access than in a flatter structure, due to the time it takes to read the contents of a big directory).
  • There are other, more advanced optimizations (slice across disks, placement on different places in a disk or different partition, etc..) - but if you have need of THAT level, you are better off with a database in the first place.