Any validation of disparate memory speed and timings?
Going by the theory (and assuming bit time, I/O bus clock, and cycle time are all part of the spec and must hold, identically, for every component claiming to be DDR4-xxx; I would like to be corrected if they are not), we have:
Type | Data rate | Bit time | I/O bus clock | Cycle time | CAS latency | First word (ns) | Fourth word (ns) | Eighth word (ns) |
---|---|---|---|---|---|---|---|---|
DDR4-3200 | 3200 MT/s | 0.313 ns | 1600 MHz | 0.625 ns | 21 | 13.13 | 14.06 | 15.31 |
DDR4-2666 | 2666 MT/s | 0.375 ns | 1333 MHz | 0.750 ns | 11 | 8.25 | 9.38 | 10.88 |
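For anyone who wants to check the arithmetic, here is a minimal Python sketch that reproduces the table from the data rate and CAS latency alone (the DDR4-3200 CL21 / DDR4-2666 CL11 pairings are my hypothetical numbers, not JEDEC-mandated combinations):

```python
# Minimal sketch: derive the table's latencies from data rate and CL.
# The module/CL pairings are the question's hypothetical values.

def word_latencies(data_rate_mts, cas_latency):
    """Return (first, fourth, eighth) word latencies in ns."""
    bit_time = 1000.0 / data_rate_mts   # ns per transfer (beat)
    cycle_time = 2 * bit_time           # ns per I/O clock cycle (DDR: 2 beats/cycle)
    first = cas_latency * cycle_time    # CL is counted in I/O clock cycles
    return first, first + 3 * bit_time, first + 7 * bit_time

for name, rate, cl in [("DDR4-3200", 3200, 21), ("DDR4-2666", 2666, 11)]:
    first, fourth, eighth = word_latencies(rate, cl)
    print(f"{name} CL{cl}: first {first:.2f} ns, "
          f"fourth {fourth:.2f} ns, eighth {eighth:.2f} ns")
```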
The table shows that 2666 CAS 11 can be roughly 1.6x faster to the first word (8.25 ns vs 13.13 ns) by raw ns timings.
Is this true? Am I leaving some part of the theory out? Has anyone observed this first-hand, or does anyone know of a credible source that measured something similar on a real-world application?
Solution 1:
You are comparing latency and overall module speed as though they measured the same thing; they are related, but not interchangeable.
Sure, the ns timing of an initial request might be faster with a lower CAS latency, but memory typically transfers data in larger blocks these days, which means the initial latency hit is negligible by comparison.
The selection of an address in RAM might be slower, but the higher frequency of the faster RAM means that the actual data transfers happen faster.
The "first word (ns)" time might be 5ns slower but at a rough calculation the faster module need to transfer only 80 bits in a row (using the bit times: 5ns ÷ (0.375 - 0.313) = 80.64) to have made up for that initial delay.
From the Wikipedia DDR4 article:

> the basic burst size is eight 64-bit words, and higher bandwidths are achieved by sending more read/write commands per second.
So a sustained stream built from multiple back-to-back read commands runs far past that ~80-beat break-even point, which is only ten 8-beat bursts (ten 64-byte cache lines).
The latency slows down that initial request and affects memory address selection speed, but actual bulk transfer rates are far higher on higher-speed modules.
Modules have been increasing in internal complexity in order to achieve higher bulk bandwidth and faster signalling. The downside is that this complexity adds latency, but that is almost always compensated for.
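To put rough numbers on that, here is a toy model that deliberately ignores tRCD, tRP, refresh, and controller overhead, and assumes bursts arrive perfectly back to back after the first CAS latency; under those (generous) assumptions the crossover arrives after about ten cache lines:

```python
# Toy streaming model: time to read N 64-byte cache lines, assuming
# perfectly back-to-back bursts after the first CAS latency.
# Deliberately ignores tRCD, tRP, refresh, and controller overhead.

def stream_time_ns(n_lines, first_word_ns, bit_time_ns):
    return first_word_ns + n_lines * 8 * bit_time_ns  # 8 beats per line

for n in (1, 5, 10, 20, 100):
    t_fast = stream_time_ns(n, 13.13, 0.3125)  # DDR4-3200 CL21
    t_slow = stream_time_ns(n, 8.25, 0.375)    # DDR4-2666 CL11
    winner = "DDR4-2666" if t_slow < t_fast else "DDR4-3200"
    print(f"{n:3d} lines: 3200 -> {t_fast:6.1f} ns, "
          f"2666 -> {t_slow:6.1f} ns  ({winner} wins)")
```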
Think of it this way:
The memory in your system has this initial delay in reading. Once you know that it is always going to transfer a particular block size and will essentially carry on that transfer regardless of changes to address lines, you can then change the address to set up the next set of bytes you want.
Depending on the bank layout and internal buffering, you can essentially set things up for your next transfer at near-zero cost in terms of latency. The data could, in theory, be there ready to go.
You can create an ongoing chain of "now you are doing that, do this next", with latencies mostly hidden behind the previous transfer. Smarter electronics can achieve higher transfer rates at higher frequencies, even with higher latencies.
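A toy version of that chaining, assuming an idealized controller that issues the next READ early enough for its CAS latency to overlap the current burst (no bank conflicts, no refresh):

```python
# Sketch of latency hiding through pipelined commands: if the next READ
# is issued at least CL before the current burst drains, the data bus
# never idles and throughput depends only on bit time, not on CL.
# (Idealized: a bank is always ready, no refresh, no conflicts.)

def pipelined_ns(n_bursts, first_word_ns, bit_time_ns):
    # Only the first burst pays the full CAS latency; every later
    # burst's latency overlaps the previous burst's data phase.
    return first_word_ns + n_bursts * 8 * bit_time_ns

def serial_ns(n_bursts, first_word_ns, bit_time_ns):
    # Pessimistic baseline: pay the full CAS latency for every burst.
    return n_bursts * (first_word_ns + 8 * bit_time_ns)

n = 100
print(f"DDR4-3200 CL21, {n} bursts: "
      f"pipelined {pipelined_ns(n, 13.13, 0.3125):.0f} ns, "
      f"serial {serial_ns(n, 13.13, 0.3125):.0f} ns")
```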
Not everything that is "slower" is actually slower.