Getting 503 error from Varnish but varnishlog doesn't tell much
Running Varnish 4 and I'm occasionally getting 503 errors which I can't put my finger in. I've tried curl
ing the backend directly and it's returning a response everytime so it doesn't look like the issue is there. Below is entry from my varnishlog but it doesn't tell me much. I obtained this log by running the command
varnishlog -q "RespStatus >= 500 or BerespStatus >= 500"
Here is the log entry
* << Request >> 524356
- 524356 Begin c req 524355 rxreq
- 524356 Timestamp c Start: 1421453220.095464 0.000000 0.000000
- 524356 Timestamp c Req: 1421453220.095464 0.000000 0.000000
- 524356 ReqStart c 173.255.199.52 38213
- 524356 ReqMethod c GET
- 524356 ReqURL c /asd/asdasd
- 524356 ReqProtocol c HTTP/1.0
- 524356 ReqHeader c X-Real-IP: 206.169.79.151
- 524356 ReqHeader c Host: www.CENSORED.com
- 524356 ReqHeader c X-Forwarded-For: 206.169.79.151
- 524356 ReqHeader c X-Time-Server-Received: 1421453197
- 524356 ReqHeader c Connection: close
- 524356 ReqHeader c Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
- 524356 ReqHeader c User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.99 Safari/537.36
- 524356 ReqHeader c Accept-Encoding: gzip, deflate, sdch
- 524356 ReqHeader c Accept-Language: en-US,en;q=0.8
- 524356 ReqHeader c Cookie: wp-settings-1=deleted%3Dundefined%26ed_size%3D1305%26hidetb%3D1%26libraryContent%3Dbrowse%26editor%3Dtinymce%26imgsize%3Dlarge%26wplink%3D1%26urlbutton%3Dfile%26pos
ts_list_mode%3Dlist; wp-settings-time-1=1421309968; wp-saving-post=5-saved; wordpr
- 524356 ReqUnset c X-Forwarded-For: 206.169.79.151
- 524356 ReqHeader c X-Forwarded-For: 206.169.79.151, 173.255.199.52
- 524356 VCL_call c RECV
- 524356 ReqUnset c Cookie: wp-settings-1=deleted%3Dundefined%26ed_size%3D1305%26hidetb%3D1%26libraryContent%3Dbrowse%26editor%3Dtinymce%26imgsize%3Dlarge%26wplink%3D1%26urlbutton%3Dfile%26pos
ts_list_mode%3Dlist; wp-settings-time-1=1421309968; wp-saving-post=5-saved; wordpr
- 524356 ReqHeader c Cookie: wp-settings-1=deleted%3Dundefined%26ed_size%3D1305%26hidetb%3D1%26libraryContent%3Dbrowse%26editor%3Dtinymce%26imgsize%3Dlarge%26wplink%3D1%26urlbutton%3Dfile%26pos
ts_list_mode%3Dlist; wp-settings-time-1=1421309968; wp-saving-post=5-saved; wordpr
- 524356 ReqUnset c Cookie: wp-settings-1=deleted%3Dundefined%26ed_size%3D1305%26hidetb%3D1%26libraryContent%3Dbrowse%26editor%3Dtinymce%26imgsize%3Dlarge%26wplink%3D1%26urlbutton%3Dfile%26pos
ts_list_mode%3Dlist; wp-settings-time-1=1421309968; wp-saving-post=5-saved; wordpr
- 524356 ReqHeader c Cookie: wp-settings-1=deleted%3Dundefined%26ed_size%3D1305%26hidetb%3D1%26libraryContent%3Dbrowse%26editor%3Dtinymce%26imgsize%3Dlarge%26wplink%3D1%26urlbutton%3Dfile%26pos
ts_list_mode%3Dlist; wp-settings-time-1=1421309968; wp-saving-post=5-saved; wordpr
- 524356 VCL_return c pass
- 524356 VCL_call c HASH
- 524356 VCL_return c lookup
- 524356 VCL_call c PASS
- 524356 VCL_return c fetch
- 524356 Link c bereq 524357 pass
- 524356 Timestamp c Fetch: 1421453222.583956 2.488493 2.488493
- 524356 Timestamp c Process: 1421453222.583983 2.488520 0.000027
- 524356 RespHeader c Date: Sat, 17 Jan 2015 00:07:02 GMT
- 524356 RespHeader c Server: Varnish
- 524356 RespHeader c X-Varnish: 524356
- 524356 RespProtocol c HTTP/1.1
- 524356 RespStatus c 503
- 524356 RespReason c Service Unavailable
- 524356 RespReason c Service Unavailable
- 524356 VCL_call c SYNTH
- 524356 RespHeader c Content-Type: text/html; charset=utf-8
- 524356 RespHeader c Retry-After: 5
- 524356 VCL_return c deliver
- 524356 RespHeader c Content-Length: 415
- 524356 Debug c "RES_MODE 2"
- 524356 RespHeader c Connection: close
- 524356 Timestamp c Resp: 1421453222.584031 2.488567 0.000048
- 524356 ReqAcct c 983 0 983 205 415 620
- 524356 End c
My default.vcl
vcl 4.0;
import std;
backend default {
.host = "localhost";
.port = "81";
.connect_timeout = 600s;
.first_byte_timeout = 600s;
.between_bytes_timeout = 600s;
}
acl purge {
# Only allow purges coming from localhost
"127.0.0.1";
"localhost";
}
EDIT: varnishstat while throwing 503
errors. You can see fetch_failed
has happened numerous times.
MAIN.uptime 787 1.00 1.00 1.00 1.00
MAIN.sess_conn 11076 10.99 14.00 8.30 8.17
MAIN.client_req 11089 11.98 14.00 8.40 8.23
MAIN.cache_hit 1488 1.00 1.00 1.32 1.41
MAIN.cache_hitpass 2 0.00 . 0.00 0.00
MAIN.cache_miss 3186 1.00 4.00 1.27 1.29
MAIN.backend_conn 9585 9.99 12.00 7.01 6.76
MAIN.backend_reuse 17 1.00 . 0.16 0.12
MAIN.backend_recycle 17 1.00 . 0.16 0.12
MAIN.fetch_head 2 0.00 . 0.06 0.06
MAIN.fetch_length 346 2.00 . 0.39 0.35
MAIN.fetch_chunked 4497 3.99 5.00 2.27 2.06
MAIN.fetch_close 4756 4.99 6.00 4.36 4.35
MAIN.fetch_failed 180 0.00 . 0.10 0.12
MAIN.pools 4 0.00 . 4.00 4.00
MAIN.threads 54 0.00 . 54.00 54.00
MAIN.threads_limited 1 0.00 . 0.00 0.00
MAIN.threads_created 102 0.00 . 0.00 0.00
MAIN.threads_destroyed 48 0.00 . 0.00 0.00
MAIN.busy_sleep 5 0.00 . 0.00 0.00
MAIN.busy_wakeup 5 0.00 . 0.00 0.00
MAIN.sess_queued 50 0.00 . 0.00 0.00
MAIN.n_object 2873 1.00 . 2866.32 2865.00
MAIN.n_objectcore 2891 1.00 . 2885.40 2884.22
MAIN.n_objecthead 2990 1.00 . 2981.80 2980.00
MAIN.n_backend 1 0.00 . 1.00 1.00
MAIN.n_expired 135 0.00 . 133.18 132.78
MAIN.s_sess 11076 10.99 14.00 8.30 8.17
MAIN.s_req 11089 11.98 14.00 8.40 8.23
MAIN.s_pass 6415 9.99 8.00 5.81 5.52
MAIN.s_fetch 9601 10.99 12.00 7.09 6.82
MAIN.s_synth 112 0.00 . 0.00 0.00
MAIN.s_req_hdrbytes 3346583 3583.50 4252.00 2308.12 2251.15
MAIN.s_req_bodybytes 804869 1072.65 1022.00 202.52 162.86
MAIN.s_resp_hdrbytes 2737544 3615.46 3478.00 1942.75 1857.93
MAIN.s_resp_bodybytes 66853226 14104.28 84946.00 37816.96 38189.12
MAIN.sess_closed 9442 6.99 11.00 6.96 7.05
MAIN.backend_req 9594 10.99 12.00 7.17 6.87
Daemon options
DAEMON_OPTS="-a :80 \
-f /etc/varnish/default.vcl \
-T 127.0.0.1:6082 \
-u varnish -g varnish \
-t 120 \
-n $INSTANCE \
-s malloc,10G \
-p thread_pools=4 \
-p thread_pool_min=5 \
-p thread_pool_max=500 \
-p thread_pool_timeout=300 \
-S /etc/varnish/secret"
Any suggestions?
How many requests/s and total connections does your varnish handle when the 503 errors occur?
I've run into a similar situation where it would return 503 errors to the users while the backends were running just fine.
varnishlog would show that the backends were sick when in fact were healthy.
So after some digging I found out that the problem was with the max file descriptors.
I had to increase my system's max file descriptors in order for varnish to be able to handle all the requests properly.
After that all 503 errors stopped and all backends constantly show as healthy.