DRBD terrible sync performance on 10GigE
In newer versions of DRBD (8.3.9 and newer) there is a dynamic resync controller that needs tuning. In older versions of DRBD, setting the syncer {rate;} was enough; now it serves only as a loosely suggested starting point for the dynamic resync speed.
The dynamic sync controller is tuned with the "c-settings" in the disk section of DRBD's configuration (see $ man drbd.conf for details on each of these settings).
With 10GbE between these nodes, and assuming low latency since protocol C is used, the following config should get things moving quicker:
resource rd0 {
    protocol C;
    disk {
        c-fill-target 10M;
        c-max-rate 700M;
        c-plan-ahead 7;
        c-min-rate 4M;
    }
    on cl1 {
        device /dev/drbd0;
        disk /dev/sda4;
        address 192.168.42.1:7788;
        meta-disk internal;
    }
    on cl2 {
        device /dev/drbd0;
        disk /dev/sda4;
        address 192.168.42.2:7788;
        meta-disk internal;
    }
}
If you're still not happy, try turning max-buffers up to 12k. If you're still not happy after that, you can try turning up c-fill-target in 2M increments.
Someone elsewhere suggested that I use these settings:
disk {
    on-io-error detach;
    c-plan-ahead 0;
}
net {
    max-epoch-size 20000;
    max-buffers 131072;
}
And the performance is excellent.
Edit: As per @Matt Kereczman's and others' suggestions, I've finally changed to this:
disk {
    on-io-error detach;
    no-disk-flushes;
    no-disk-barrier;
    c-plan-ahead 0;
    c-fill-target 24M;
    c-min-rate 80M;
    c-max-rate 720M;
}
net {
    # max-epoch-size 20000;
    max-buffers 36k;
    sndbuf-size 1024k;
    rcvbuf-size 2048k;
}
Resync speed is high:
cat /proc/drbd
version: 8.4.5 (api:1/proto:86-101)
srcversion: EDE19BAA3D4D4A0BEFD8CDE
0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r---n-
ns:133246146 nr:0 dw:2087494 dr:131187797 al:530 bm:0 lo:0 pe:5 ua:106 ap:0 ep:1 wo:d oos:4602377004
[>....................] sync'ed: 2.8% (4494508/4622592)M
finish: 1:52:27 speed: 682,064 (646,096) K/sec
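The finish estimate in that output can be checked by hand. This is a minimal sketch, assuming oos is reported in KiB and the first speed figure is in KiB/s, so the remaining time is roughly oos divided by speed. The function name resync_eta is made up for illustration and is not part of DRBD.

```python
def resync_eta(oos_kib: int, speed_kib_per_s: int) -> str:
    """Estimate remaining resync time as H:MM:SS, dropping fractional seconds.

    Assumes oos is in KiB and speed is in KiB/s.
    """
    seconds = oos_kib // speed_kib_per_s
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Values copied from the /proc/drbd output above (oos and current speed).
print(resync_eta(4602377004, 682064))  # 1:52:27
```

The result matches the finish field, which suggests the estimate is based on the current speed rather than the average shown in parentheses.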
Write speed is excellent during resync with these settings (80% of local write speed, full wire speed):
# dd if=/dev/zero of=./testdd bs=1M count=20k
20480+0 records in
20480+0 records out
21474836480 bytes (21 GB) copied, 29.3731 s, 731 MB/s
Read speed is OK:
# dd if=testdd bs=1M count=20k of=/dev/null
20480+0 records in
20480+0 records out
21474836480 bytes (21 GB) copied, 29.4538 s, 729 MB/s
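As a quick check on the two dd results, the reported rate is just bytes divided by elapsed seconds. This sketch assumes dd reports decimal megabytes (1 MB = 1,000,000 bytes), as GNU dd does; the helper name dd_rate_mb_s is hypothetical.

```python
def dd_rate_mb_s(total_bytes: int, seconds: float) -> int:
    """Throughput in decimal MB/s, rounded to the nearest integer."""
    return round(total_bytes / seconds / 1_000_000)


# 20k blocks of 1 MiB each, as in the dd commands above.
total = 20 * 1024 * 1024 * 1024

print(dd_rate_mb_s(total, 29.3731))  # write test: 731
print(dd_rate_mb_s(total, 29.4538))  # read test: 729
```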
Later edit:
After a full resync, the performance is very good (wire speed writing, local speed reading). Resync is quick (5 to 6 hours) and doesn't hurt performance too much (wire speed reading, wire speed writing). I'll definitely stay with c-plan-ahead at zero. With non-zero values, resync takes far too long.
c-plan-ahead has to be set to a positive value to enable the dynamic sync rate controller.
disk {
    c-plan-ahead 15;   # 5 * RTT, in units of 0.1 s; 15 in my case
    c-fill-target 24;
    c-max-rate 720M;
}
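The rule of thumb in that comment (five round trips, expressed in units of 0.1 s) can be sketched as below. The function name and the 300 ms round-trip time are assumptions for illustration: the answer does not state its RTT, and 300 ms is simply the value that yields 15. The result is clamped to at least 1, since 0 turns the dynamic controller off.

```python
import math


def c_plan_ahead(rtt_ms: float, factor: int = 5) -> int:
    """Suggest a c-plan-ahead value (units of 0.1 s) covering `factor` round trips.

    rtt_ms is the measured network round-trip time in milliseconds.
    The result is rounded up and never drops below 1.
    """
    tenths_of_second = factor * rtt_ms / 100  # 100 ms per 0.1 s unit
    return max(1, math.ceil(tenths_of_second))


print(c_plan_ahead(300))  # 15, the value used in the config above
print(c_plan_ahead(0.2))  # 1, the minimum for a typical sub-millisecond LAN
```

Measure the RTT between the two nodes (for example with ping) before picking a value; on a fast local link the formula gives the minimum, so larger values would need some other justification.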