How to store 3 million records in key value format?

Because MySQL is so widely supported and this is really quite a trivial thing to do I would suggest going with it. Unless the server has at least a few GB of memory I would suggest sticking with MySQL rather than using an in-memory system.

Once you start putting your data into a database, whether it's MySQL or something else, you will quite likely find that you will find more uses for it. Right now you're only talking about key value pairs but the rest of the data relating to your products must be stored somewhere. If that's not in a database I can't imagine the data storage being very efficient.

Whatever you do, do not create those three million files. We have seen a number of questions here already resulting from the problems so many files create.


You can use dedicated Key-Value type of NoSQL database which is optimized for this kind of tasks. Have a look at:

  • Redis -- Redis is an open source, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
  • MemcacheDB -- MemcacheDB is a distributed key-value storage system designed for persistent.
  • others (one of such lists can be found here: http://nosql-database.org/)

Of course you may use MySQL or any other relational database, but solutions specially designed for key-value type of data supposed to be better (otherwise what is the point of designing them in first place, except possibly the fact that it will be much smaller (in terms of RAM and HDD) solution).


And now for something completely different:

Given:

  • 180MB/3M products = 62 bytes/product on average.
  • 30,000 queries per day = 0.34 queries per second
  • Updated quarterly = essentially static data

Outside of the box solution:

Dump each product as a TXT resource record and store it in the DNS, e.g.:

$origin products.example.com.

product_1_name IN TXT "product 1 description"
product_2_name IN TXT "product 2 description"
...
product_3000000_name IN TXT "product 3000000 description"

Benefits:

  • extremely reliable and trusted (you already depend on it every day)
  • can be built on pretty much any platform
  • pretty much every language has support for DNS queries in one form or another
  • open source and commercial servers support different kinds of backend databases
  • can be trivially replicated (just specify multiple name servers)
  • handles atomic updates, even when replicated across a dozen servers
  • can be cryptographically signed to ensure data integrity
  • can handle orders of magnitude higher query per second rates (10,000 queries per second are easily handled with commodity hardware)

Reasons why this might be a bad idea:

  • you need to search the data (DNS is purely key/value lookup)
  • you need to hide the data (DNS has no confidentiality)