How to debug memcached "SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY" errors?

I have a two-server memcached setup. When a memcached write fails, I receive an email notification. About once per day a "SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY" error occurs, and I have no idea how to find the cause.

I am using the PHP Memcached client.

My keys are not too long. I tried adding the -v flag, but it does not help; the log remains empty.

If I include the output of getStats in the error notification, I receive the following info:

Array
(
    [192.168.0.3:11211] => Array
        (
            [pid] => 28167
            [uptime] => 3671962
            [threads] => 4
            [time] => 1358714713
            [pointer_size] => 64
            [rusage_user_seconds] => 24516
            [rusage_user_microseconds] => 130981
            [rusage_system_seconds] => 86246
            [rusage_system_microseconds] => 675512
            [curr_items] => 1616352
            [total_items] => 118339822
            [limit_maxbytes] => 2684354560
            [curr_connections] => 8
            [total_connections] => 78108681
            [connection_structures] => 356
            [bytes] => 981522779
            [cmd_get] => 1561752945
            [cmd_set] => 158718324
            [get_hits] => 1383072575
            [get_misses] => 178680370
            [evictions] => 0
            [bytes_read] => 138113231690
            [bytes_written] => 1091741700765
            [version] => 1.4.15
        )

    [192.168.0.4:11211] => Array
        (
            [pid] => -1
            [uptime] => 0
            [threads] => 0
            [time] => 0
            [pointer_size] => 0
            [rusage_user_seconds] => 0
            [rusage_user_microseconds] => 0
            [rusage_system_seconds] => 0
            [rusage_system_microseconds] => 0
            [curr_items] => 0
            [total_items] => 0
            [limit_maxbytes] => 0
            [curr_connections] => 0
            [total_connections] => 0
            [connection_structures] => 0
            [bytes] => 0
            [cmd_get] => 0
            [cmd_set] => 0
            [get_hits] => 0
            [get_misses] => 0
            [evictions] => 0
            [bytes_read] => 0
            [bytes_written] => 0
            [version] => 
        )

)

MEMCACHED_SERVER_TEMPORARILY_DISABLED, or "SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY", is generated by the libmemcached library. I'm unable to confirm how pecl_memcached handles this error, but I imagine it is treated as a standard connectivity error. The message is issued when a connection exceeds its time-out and/or retry limit (see I/O Options).
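To narrow down which server and which result code are involved, it may help to log the client's result code and message at the moment a write fails. The following is a minimal sketch, assuming the PHP Memcached extension and the two hosts from the question; the key name and TTL are hypothetical.

```php
<?php
// Sketch only: log details of a failed write. Hosts are taken from the
// question; 'example_key' and the 60 second TTL are made up for illustration.
$m = new Memcached();
$m->addServers([
    ['192.168.0.3', 11211],
    ['192.168.0.4', 11211],
]);

$key = 'example_key';

if (!$m->set($key, 'value', 60)) {
    // Capture the result first: later calls on $m overwrite the result code.
    $code    = $m->getResultCode();
    $message = $m->getResultMessage();

    // getServerByKey() reports which server the key maps to (false on failure).
    $server = $m->getServerByKey($key);

    error_log(sprintf(
        'memcached set failed: code=%d message="%s" server=%s:%d',
        $code,
        $message,
        $server['host'] ?? 'unknown',
        $server['port'] ?? 0
    ));
}
```

Collecting these log lines over a few days should show whether the failures always involve the same server, which points toward a network or host problem rather than the application.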

Naturally, as this is a temporary issue, you would architect your PHP application to fail over to the next cache server, or pull the affected server out of the server list.
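One way to approach the failover described above is through client options. This is a rough sketch, assuming the PHP Memcached extension; the timeout and limit values are illustrative guesses rather than recommendations, and the availability of OPT_REMOVE_FAILED_SERVERS depends on the extension and libmemcached versions installed.

```php
<?php
// Sketch only: let the client route around a failed server.
// All numeric values below are assumptions for illustration.
$m = new Memcached();
$m->setOptions([
    // Consistent hashing limits how many keys move when a server drops out.
    Memcached::OPT_DISTRIBUTION         => Memcached::DISTRIBUTION_CONSISTENT,
    Memcached::OPT_LIBKETAMA_COMPATIBLE => true,
    Memcached::OPT_CONNECT_TIMEOUT      => 200,  // milliseconds
    Memcached::OPT_RETRY_TIMEOUT        => 5,    // seconds before a disabled server is retried
    Memcached::OPT_SERVER_FAILURE_LIMIT => 2,    // failures tolerated before the server is marked failed
    Memcached::OPT_REMOVE_FAILED_SERVERS => true,
]);
$m->addServers([
    ['192.168.0.3', 11211],
    ['192.168.0.4', 11211],
]);

$value = $m->get('example_key'); // hypothetical key
if ($value === false && $m->getResultCode() !== Memcached::RES_NOTFOUND) {
    // Not an ordinary miss: treat the cache as unavailable and
    // load the data from the primary source instead.
    error_log('memcached unavailable: ' . $m->getResultMessage());
}
```

Distinguishing RES_NOTFOUND from other result codes keeps ordinary cache misses out of the error notifications.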


In my case, turning off the tcp_nodelay behavior option made it work.

This seems to be an option of pylibmc, which is a Python wrapper around libmemcached, but the docs say that this option is one of those directly configurable in libmemcached.

For more info, see the pylibmc docs: http://sendapatch.se/projects/pylibmc/behaviors.html
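Since the question uses PHP rather than pylibmc, the equivalent setting there would presumably be the OPT_TCP_NODELAY option of the Memcached extension. A minimal sketch, assuming that option maps to the same libmemcached behavior; whether it helps in the asker's setup is untested.

```php
<?php
// Sketch only: disable TCP_NODELAY on the client's sockets.
$m = new Memcached();
$m->setOption(Memcached::OPT_TCP_NODELAY, false);
$m->addServer('192.168.0.3', 11211); // host taken from the question
```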


In my case, I changed localhost to '127.0.0.1' in my code, and that resolved the 'SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY' error.

Hope that helps.

'options' => array(
    'servers' => array('127.0.0.1', 11211),
),

In my case, the reason for this error message was an incorrect server name.

Somehow the code preparing the configuration string was messed up, and a space character ended up in front of one of the memcached server names. This results in the above-mentioned error message when reading from the server.

I am using PHP and the Memcached extension. Adding the server name with the space worked without any complaint, i.e. server names are not validated when they are added. They are only checked at the time a connection to the server has to be made.
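Because the extension accepts such names silently, a small guard in the application can catch the problem before the list is handed over. The helper below is a hypothetical sketch (normalizeServers is not part of the Memcached extension); it trims each host name and rejects empty ones.

```php
<?php
/**
 * Hypothetical helper: trim whitespace from host names and reject empty ones.
 *
 * @param array $servers List of [host, port] pairs.
 * @return array Cleaned list of [host, port] pairs.
 */
function normalizeServers(array $servers): array
{
    $clean = [];
    foreach ($servers as $server) {
        $host = trim((string) $server[0]);
        if ($host === '') {
            throw new InvalidArgumentException('Empty memcached host name');
        }
        $clean[] = [$host, (int) $server[1]];
    }
    return $clean;
}

// Example input with stray spaces around the host names.
$servers = normalizeServers([
    [' 192.168.0.3', 11211],
    ['192.168.0.4 ', '11211'],
]);
```

The cleaned list can then be passed to Memcached::addServers().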