Write buffers performance for write-back or write-through policy

A write buffer matters more for a write-through cache, since every store has to be written to the next level even on a cache hit. WB caches avoid having to write to outer levels of cache entirely on cache hits, which are hopefully the common case. (The #1 rule of thumb for caches for general-purpose CPU workloads is that caches work.)
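To make the hit-case difference concrete, here's a rough toy model that counts how many writes leave the cache under each policy. Everything in it is an assumption for illustration, not a model of any real CPU: the name `outer_writes`, the 64-byte line size, a fully-associative LRU cache, and write-allocate for both policies. It also ignores the write buffer itself, which in a real write-through design could merge adjacent stores before they reach the next level.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed for this sketch)


def outer_writes(store_addrs, num_lines, policy):
    """Count writes sent to the next level by a toy fully-associative LRU cache.

    policy is "WT" (write-through) or "WB" (write-back).
    Both policies allocate a line on a store miss (a simplifying assumption).
    WT counts one store-sized write per store; WB counts whole-line write-backs.
    """
    if policy not in ("WT", "WB"):
        raise ValueError("policy must be 'WT' or 'WB'")

    cache = OrderedDict()  # line number -> dirty flag, oldest entry first
    writes = 0
    for addr in store_addrs:
        line = addr // LINE_SIZE
        if line in cache:
            cache.move_to_end(line)  # hit: mark as most recently used
        else:
            if len(cache) >= num_lines:
                _, dirty = cache.popitem(last=False)  # evict the LRU line
                if dirty:
                    writes += 1  # write-back of a dirty line on eviction
            cache[line] = False  # allocate the line, clean so far
        if policy == "WT":
            writes += 1  # every store goes to the next level, hit or miss
        else:
            cache[line] = True  # WB: only mark the line dirty

    # Flush whatever is still dirty at the end (always zero for WT).
    writes += sum(cache.values())
    return writes


if __name__ == "__main__":
    # 100 passes of 8-byte stores over a 512-byte working set (8 lines).
    stores = [a for _ in range(100) for a in range(0, 512, 8)]
    for policy in ("WT", "WB"):
        print(policy, outer_writes(stores, num_lines=16, policy=policy))
```

With a working set that fits in the cache, the write-through count grows with the number of stores, while the write-back count is bounded by the number of distinct dirty lines. That traffic is what the write buffer has to absorb in the WT case.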

e.g. see "Cache size estimation on your system?" for an example where the size of the 4k "write-coalescing cache" between the write-through L1d and L2 in AMD Bulldozer-family CPUs (https://www.realworldtech.com/bulldozer/8/) is the cutoff point for a memcpy / memset microbenchmark that measures bandwidth vs. working-set size. By contrast, CPUs with a WB L1d cache can run memset at full speed up to L1d size.

Semi-related: "When use write-through cache policy for pages" discusses why modern CPUs generally use WB caches, with a few exceptions like AMD's failed experiment of Bulldozer, which they redesigned from the ground up to make the very nice Zen microarchitecture.