DNS as distributed cache
We have a use case to cache 300+ million pieces of data, each with a unique key.
I know this is unorthodox, but it has been suggested at my company that DNS could be used as a fast distributed cache for small (<512-byte) pieces of data.
The DNS entry would be {Key}.{modulus of hashed key}.mycompany.local.
e.g. U5333145311.1.mycompany.local
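The key-to-name mapping could be sketched like this (the partition count and the choice of hash are assumptions, not from the original scheme):

```python
import hashlib

NUM_PARTITIONS = 16  # assumption: one partition per delegated zone


def key_to_name(key, domain="mycompany.local"):
    # Use a deterministic digest rather than Python's built-in hash(),
    # which is randomized per process and would break the partitioning.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    partition = int(digest, 16) % NUM_PARTITIONS
    return "{}.{}.{}".format(key, partition, domain)
```

Every client computes the same partition for a given key, so lookups always go to the zone that holds the record.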
We would be making requests at the rate of 5000 to 7500 per second from 10 to 15 servers.
We would update each DNS server via the zone files.
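A partition's zone file might look roughly like this (server names, TTLs, and the record value are illustrative, not from the original):

```
; hypothetical zone file for partition 1
$ORIGIN 1.mycompany.local.
$TTL 300
@           IN SOA ns1.mycompany.local. admin.mycompany.local. (
                2024010101 ; serial
                3600       ; refresh
                600        ; retry
                86400      ; expire
                300 )      ; negative-caching TTL
            IN NS  ns1.mycompany.local.
U5333145311 IN TXT "4,8,15,16,23,42"
```

Each cache update means bumping the serial and reloading the zone, which is worth keeping in mind when judging write throughput.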
As I am a programmer, this is all new to me:
- Is this even feasible?
- What are the pitfalls?
- How do I size the DNS servers?
Thanks
Update: The data is an array of 1 to 30 integers (not 512K, sorry), so it is very small. My CTO, who came from network ops, likes this solution because it is a known, mature system with built-in fault tolerance, and he can use network ops to manage it. I am very leery but open-minded.
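A payload that small fits comfortably in a TXT record. One possible encoding, sketched here with the comma-separated format as an assumption, has to respect the 255-byte limit on a single TXT character-string:

```python
def encode_values(values):
    """Pack the 1-30 integer array as comma-separated text, split into
    255-byte chunks (the limit on a single TXT character-string)."""
    s = ",".join(str(v) for v in values)
    return [s[i:i + 255] for i in range(0, len(s), 255)]


def decode_values(chunks):
    """Reassemble the chunks and parse the integers back out."""
    return [int(v) for v in "".join(chunks).split(",")]
```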
DNS makes sense in some cases:
- Widely distributed data: DNS blacklists, for example, are effective because the caching infrastructure already exists near the users (i.e. at their ISPs), and large users can easily run custom software like rbldnsd. The point is that it makes use of an existing, widely deployed protocol.
- Small responses (around 500 bytes max): most of the interesting things people distribute via DNS are small, and often the mere existence or absence of a record is enough (e.g. a DNSBL entry signals that an IP address is bad, or that the checksum of a file is known-bad).
I wrote https://dgl.cx/wikipedia-dns, and it pretty much pushes the limit of the response sizes you can safely serve over DNS. Not everyone implements or supports EDNS0, and as someone else said, once you've fallen back to TCP the benefit of being stateless disappears.
As others said it sounds to me like you'd be better off with something like memcached. Trying to constrain yourself to DNS seems silly when it's for internal use. If you have control over the client you can easily do a better job at failover and load balancing than DNS itself can do.
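If you do control the clients, client-side sharding with failover is not hard to build yourself. A minimal consistent-hash ring, with hypothetical node names, might look like:

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Map keys to cache nodes; removing a node only remaps its own keys."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (hash, node) virtual nodes
        for node in nodes:
            self.add(node)

    def _hash(self, s):
        return int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash("{}:{}".format(node, i)), node))

    def remove(self, node):
        # Filtering preserves sort order, so the ring stays valid.
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key):
        # First virtual node clockwise from the key's hash (with wrap-around).
        i = bisect.bisect(self._ring, (self._hash(key), ""))
        if i == len(self._ring):
            i = 0
        return self._ring[i][1]
```

On a node failure you call `remove()` and only that node's share of keys moves, which is better failover behavior than waiting out DNS TTLs.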
As I am the skeptic's CTO, I'd like to add some color to the discussion.
As Gary has mentioned, the application needs to be able to publish a very large manifest. The manifest will be partitioned into a few (30-100) groups. Each key will average about 55 bytes, but could be much larger.
Whichever product we choose needs to support the following:
1) Full redundancy and load balancing
2) High transaction volume (15k-20k reads/sec) with <500 ms response time
The nice-to-haves are:
1) Hierarchical structure
2) Record expiration
3) Self-describing topology
DNS popped into mind only after thinking about using UDP instead of TCP to retrieve data from an in-memory distributed cache; DNS is, after all, one of the oldest and largest UDP applications.
DNS is known to support zones with very large numbers of records. The .com. zone, for example, is ultimately a zone file (albeit distributed across many servers globally). It is also known to sustain very high levels of traffic.
We have run DNS through some preliminary tests. We loaded up a single zone file with 10M TXT records with a representative amount of data. From a different server on the same LAN, we then ran tests of 300,000 queries in a multi-threaded fashion and got about 5,000 requests per second. The server and client barely flinched during the test. We are either running into bottlenecks in the testing app itself or in the network stack on the client.
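A multithreaded test harness along those lines can be sketched as follows. The lookup function here is a placeholder (the real test would issue DNS TXT queries against the loaded server), so only the harness shape is shown, not our actual test app:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def lookup(key):
    # Placeholder: substitute a real DNS TXT query against the test
    # server; returning a canned value keeps the sketch runnable.
    return "4,8,15,16,23,42"


def benchmark(n_queries=10_000, n_threads=32):
    keys = ["U{}".format(i) for i in range(n_queries)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(lookup, keys))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed  # queries per second
```

With a stub this fast, the measured rate reflects the client's thread pool and network stack, which is consistent with the bottleneck we suspect in our own test.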
I am intrigued by DNS because it supports everything I want natively, and has for many years. The features I like are:
Zone Delegation - we can define which server(s) handle particular partitions. For example 1.mycompany.local is handled by servers 10.1.1.1 and 10.1.1.2.
Redundancy - DNS was built with resiliency and redundancy in mind. It can also be easily load balanced.
Performance - Proven to support high request volumes
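The delegation described above could be expressed in the parent zone roughly like this (the server names are assumptions; the addresses come from the example):

```
; delegation sketch in the parent zone
$ORIGIN mycompany.local.
1       IN NS  ns-a.mycompany.local.
1       IN NS  ns-b.mycompany.local.
ns-a    IN A   10.1.1.1
ns-b    IN A   10.1.1.2
```

Resolvers following the NS records automatically route queries for 1.mycompany.local to those two servers, so partition routing and redundancy come for free.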
With all that said, DNS does sound like a bizarre tool to use as a cache. If it does end up making it through the selection phase, we would absolutely use it only on the internal LAN, and it would not share infrastructure with any of our other internal or external DNS systems. One side note is that we might want to share the data we're storing with third-party partners, and DNS is a well-known entity from which anyone could easily receive zone transfers.
Thank you for your continued feedback.
1 - probably, though not recommended
2 - odd caching behavior, insecure, no support anywhere
3 - no idea
There is essentially a ready-made solution for this in memcached: http://www.danga.com/memcached/