Slow Memcached: Average 10ms memcached `get`
We're using New Relic to measure our Python/Django application performance. New Relic reports that across our system "Memcached" is taking an average of 12ms to respond to commands.
Drilling down into the top dozen or so web views (by number of requests), I can see that some Memcached `get` calls take up to 30ms; I can't find a single Memcached `get` that returns in less than 10ms.
More details on the system architecture:
- Currently we have four application servers, each of which has a memcached member; all four memcached members participate in a memcached cluster.
- We're running on a cloud hosting provider and all traffic is running across the "internal" network (via "internal" IPs)
- When I ping from one application server to another, the responses come back in ~0.5ms.
Isn't 10ms a slow response time for Memcached?
As far as I understand, if you think "Memcached is too slow" then "you're doing it wrong". So am I doing it wrong?
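A raw timing check along these lines, run from a Django shell (manage.py shell) against the same cache configuration, is one way to confirm the per-get latency independently of New Relic's instrumentation; it is only a sketch and the key name is arbitrary:

```python
import time
from django.core.cache import cache  # assumes CACHES points at the memcached cluster

cache.set("latency-probe", "x", 60)

samples = []
for _ in range(100):
    start = time.perf_counter()
    cache.get("latency-probe")
    samples.append((time.perf_counter() - start) * 1000.0)

samples.sort()
print("min %.2fms  median %.2fms  max %.2fms"
      % (samples[0], samples[len(samples) // 2], samples[-1]))
```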
Here's the output of the memcache-top command:
memcache-top v0.7       (default port: 11211, color: on, refresh: 3 seconds)

INSTANCE        USAGE   HIT %   CONN    TIME    EVICT/s GETS/s  SETS/s  READ/s  WRITE/s
cache1:11211    37.1%   62.7%   10      5.3ms   0.0     73      9       3958    84.6K
cache2:11211    42.4%   60.8%   11      4.4ms   0.0     46      12      3848    62.2K
cache3:11211    37.5%   66.5%   12      4.2ms   0.0     75      17      6056    170.4K

AVERAGE:        39.0%   63.3%   11      4.6ms   0.0     64      13      4620    105.7K

TOTAL:          0.1GB/ 0.4GB    33      13.9ms  0.0     193     38      13.5K   317.2K
(ctrl-c to quit.)
**Here is the output of the top command on one machine:**
(Roughly the same on all cluster machines. As you can see, there is very low CPU utilization, because these machines only run memcached.)
top - 21:48:56 up 1 day, 4:56, 1 user, load average: 0.01, 0.06, 0.05
Tasks: 70 total, 1 running, 69 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.3%st
Mem: 501392k total, 424940k used, 76452k free, 66416k buffers
Swap: 499996k total, 13064k used, 486932k free, 181168k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6519 nobody    20   0  384m  74m  880 S  1.0 15.3  18:22.97 memcached
    3 root      20   0     0    0    0 S  0.3  0.0   0:38.03 ksoftirqd/0
    1 root      20   0 24332 1552  776 S  0.0  0.3   0:00.56 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    4 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kworker/0:0
    5 root      20   0     0    0    0 S  0.0  0.0   0:00.02 kworker/u:0
    6 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
    7 root      RT   0     0    0    0 S  0.0  0.0   0:00.62 watchdog/0
    8 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 cpuset
    9 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 khelper
...output truncated...
Solution 1:
I found that writes are lightning fast, single reads are crazy slow, but buffered reads are very, very fast. You need to do a buffered read with get_many, passing in a list of the keys that you want to retrieve. Here is my test:

WRITE: 100000 written (one by one, but likely buffered behind the scenes) in 0.42605304718 seconds, 234712 per second
READ (many): 100000 read with bucket size 10000 via get_many in 0.651949167252 seconds, 153386 per second
READ (one): 100000 read one at a time in 86.2907109261 seconds, 1158 per second

In my case I am using Python with pymemcache.
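A minimal sketch of that comparison, assuming the pymemcache base client and a memcached instance reachable on 127.0.0.1:11211 (the key names and counts here are illustrative, not the original test harness):

```python
import time
from pymemcache.client.base import Client

# Assumed setup: a memcached instance reachable on 127.0.0.1:11211.
client = Client(("127.0.0.1", 11211))

N = 100000
keys = ["key-%d" % i for i in range(N)]

# WRITE: set keys one by one.
start = time.perf_counter()
for k in keys:
    client.set(k, b"value")
elapsed = time.perf_counter() - start
print("WRITE %d in %.2fs (%.0f/s)" % (N, elapsed, N / elapsed))

# READ (many): fetch in buckets of 10000 with get_many (one round trip per bucket).
start = time.perf_counter()
for i in range(0, N, 10000):
    client.get_many(keys[i:i + 10000])
elapsed = time.perf_counter() - start
print("READ many %d in %.2fs (%.0f/s)" % (N, elapsed, N / elapsed))

# READ (one): fetch keys one at a time (one round trip per key).
start = time.perf_counter()
for k in keys:
    client.get(k)
elapsed = time.perf_counter() - start
print("READ one %d in %.2fs (%.0f/s)" % (N, elapsed, N / elapsed))
```

The difference comes down to round trips: get_many fetches a whole bucket of keys in one request, while calling get in a loop pays the network round trip once per key.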
Solution 2:
So far my research indicates that 10ms is "slow". By this I mean the memcached docs themselves refer to sub-1ms times as being expected, so the response times we're seeing are an order of magnitude slower than that.
What to expect from performance: https://code.google.com/p/memcached/wiki/NewPerformance
The "likely culprits" for slow memcached responses seem to be (in rough order of likelihood):
- poor application design (or a bug)
- network congestion
- swapping
- high CPU load from co-tenant applications
- hitting some sort of connection max
- stateful firewalls
- bad TCP configurations
I've addressed almost all of these as follows:
- According to memcache-top (https://code.google.com/p/memcache-top/), we get roughly the same connection times from our application as we do when running brutis (https://code.google.com/p/brutis/) from those same app servers.
- The response time remains consistent throughout the day despite our system having a clear peak load time; the response times are never "spikey" as one might expect if this were a congestion issue. (Also, our hosting provider claims to provide far more Mbps to these instances than atop reports we are utilizing.)
- Observed that there's plenty of free memory (and no iowait) on the systems.
- Low CPU usage on the entire cache cluster.
- Only servicing 10-20 simultaneous connections per server.
- We were using a stateful firewall (via iptables), but when we removed all stateful entries there was no change in performance.
- Blindly set these configuration options and hoped that they would improve things:

/sbin/sysctl -w net.ipv4.tcp_tw_recycle=1
/sbin/sysctl -w net.ipv4.tcp_tw_reuse=1
/sbin/sysctl -w net.ipv4.tcp_fin_timeout=10

Results in no change.
We're basically at a loss in diagnosing this issue. We're approaching the point of "install Redis and hope it works better".
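One more data point that can help separate network latency from client-library or application overhead is timing a bare round trip of the memcached text protocol over a plain socket; this is only a sketch, using cache1 from the memcache-top output above as the target:

```python
import socket
import time

# One of the memcached members from the cluster; adjust as needed.
HOST, PORT = "cache1", 11211

sock = socket.create_connection((HOST, PORT), timeout=5)

samples = []
for _ in range(100):
    start = time.perf_counter()
    # "version" is the simplest memcached text-protocol command;
    # the server replies with a single "VERSION x.y.z\r\n" line.
    sock.sendall(b"version\r\n")
    sock.recv(1024)
    samples.append((time.perf_counter() - start) * 1000.0)

sock.close()
samples.sort()
print("min %.2fms  median %.2fms  max %.2fms"
      % (samples[0], samples[len(samples) // 2], samples[-1]))
```

If these round trips come back close to the ~0.5ms ping times while the application still sees ~10ms per get, the extra time is being spent in the client library or the application rather than on the network or in memcached itself.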
Solution 3:
The first thing that comes to mind is: are you using an FQDN/DNS name to connect to your memcached servers, or are you using an IP address or a local socket?
If you are using a hostname, you might be losing some time in name resolution.
Try either putting the FQDN in the clients' and servers' /etc/hosts files and restarting so that nothing gets cached, or changing the reference to be IP-address based, and see if you get an improvement.
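A quick way to check whether name resolution is contributing, sketched here with a placeholder hostname, is to time the lookup separately from the TCP connect:

```python
import socket
import time

# Placeholder hostname for one of the cache members; substitute your own.
HOSTNAME = "cache1.internal.example"
PORT = 11211

# Time the DNS lookup by itself.
start = time.perf_counter()
addrinfo = socket.getaddrinfo(HOSTNAME, PORT, socket.AF_INET, socket.SOCK_STREAM)
dns_ms = (time.perf_counter() - start) * 1000.0

ip = addrinfo[0][4][0]

# Time a TCP connect straight to the resolved IP.
start = time.perf_counter()
sock = socket.create_connection((ip, PORT), timeout=5)
connect_ms = (time.perf_counter() - start) * 1000.0
sock.close()

print("DNS lookup: %.2fms, TCP connect to %s: %.2fms" % (dns_ms, ip, connect_ms))
```

If the lookup takes several milliseconds while the connect itself is sub-millisecond, moving to /etc/hosts entries or IP addresses (as suggested above) should recover most of that time.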