Little's Law, Queueing Theory, and the Universal Scalability Law
Have you tried thinking of it this way? Fundamentally, X(N) is usually measured by a benchmark running at steady state, as close to 100% utilization as the SUT and load drivers allow. This is known as the "Internal Throughput Rate" or ITR. What we are really interested in is the External Throughput Rate or ETR, which is ITR times some function of utilization. Now if we think of the scalability law in hardware terms, there are two things to consider:
- If we measure ITR in terms of available cores, the USL curve follows the change in throughput from 1 to N cores. If we make the big assumption that each core is fully used before another core is added, then:
For each ITR measurement at m out of n cores, we are essentially also measuring ITR at m/n utilization of n cores. In other words, the scaling curve is a proxy for the saturation curve. Using this we can back our way into ETR as a function of utilization.
- The second thing to consider is that in the short run, utilization must be in one of N+1 states, from 0/N to N/N. All measured utilization values come from averaging these states over time. In other words, utilization is quantized and actually hops among states in which m of the N cores are 100% busy, where m is a random value. This means that our assumption is not all that wild.
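To make the two points above concrete, here is a minimal Python sketch. It assumes the usual three-parameter form of the USL, X(N) = lam * N / (1 + sigma * (N - 1) + kappa * N * (N - 1)); the function names and the parameter values in the usage lines are made up for illustration, not taken from any measurement.

```python
def usl_throughput(n, lam, sigma, kappa):
    """USL throughput at n cores (assumed three-parameter form)."""
    return lam * n / (1.0 + sigma * (n - 1) + kappa * n * (n - 1))


def saturation_curve(n_cores, lam, sigma, kappa):
    """Treat the ITR at m of n cores as the throughput at utilization m/n.

    Returns one (utilization, throughput) point per state, from 0/n to n/n.
    This relies on the big assumption that cores fill up one at a time.
    """
    return [(m / n_cores, usl_throughput(m, lam, sigma, kappa))
            for m in range(n_cores + 1)]


def time_averaged(curve, state_probs):
    """Average utilization and throughput over the time spent in each state.

    state_probs[m] is the fraction of time with exactly m cores busy.
    """
    if len(state_probs) != len(curve):
        raise ValueError("need one probability per state")
    if abs(sum(state_probs) - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")
    u = sum(p * point[0] for p, point in zip(state_probs, curve))
    x = sum(p * point[1] for p, point in zip(state_probs, curve))
    return u, x


# Hypothetical 4-core system; lam, sigma and kappa are illustrative only.
curve = saturation_curve(4, lam=100.0, sigma=0.05, kappa=0.001)
for utilization, throughput in curve:
    print(f"u = {utilization:.2f}  throughput = {throughput:.1f}")

# Half the time idle, half the time all four cores busy.
print(time_averaged(curve, [0.5, 0.0, 0.0, 0.0, 0.5]))
```

The averaging function is only there to show how a measured utilization like 50% can come from a mix of quantized states rather than from every core running half busy.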
Once we have ETR as a function of utilization, we can then proceed to find the response time, which will be between 1/TP(1) and 1/TP(m).
There is a metric called the TPI (TeamQuest Performance Indicator), which is the ratio of the service time to the response time. This "Key Performance Indicator" eliminates the need to understand service time, but still allows us to understand the queueing effects and relative response time of various solutions.
Using a queueing model we can come up with a Usage-based Performance Indicator (UPI), which tells us how much queueing is affecting the solutions being considered. We can plot this indicator against utilization and get a characteristic curve which yields insight into the system. Both UPI and utilization are bounded by 0 and 1.
The plot has four quadrants:
- Quadrant 1: Utilization < 0.5, UPI > 0.5. This is where UPI curves for viable systems start.
- Quadrant 2: Utilization > 0.5, UPI > 0.5. This is where a well-running system should be. In this quadrant response time is still near service time and ETR approaches ITR.
- Quadrant 3: Utilization > 0.5, UPI < 0.5. This is where UPI curves terminate. Response time >> service time as ETR approaches ITR.
- Quadrant 4: Utilization < 0.5, UPI < 0.5. This is the quadrant that systems need to avoid. ETR << ITR, and response time >> service time.
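If it helps, the quadrant rules can be written as a small Python function. This is just a sketch: the function name is made up, and since the description above does not say which side a value of exactly 0.5 belongs to, the code assumes it falls on the "low" side.

```python
def upi_quadrant(utilization, upi):
    """Return the quadrant number (1 to 4) for a (utilization, UPI) point.

    Assumption: a value of exactly 0.5 counts as "not above 0.5".
    """
    if not (0.0 <= utilization <= 1.0 and 0.0 <= upi <= 1.0):
        raise ValueError("utilization and UPI must both lie in [0, 1]")
    high_utilization = utilization > 0.5
    high_upi = upi > 0.5
    if high_upi:
        # Response time is still close to service time.
        return 2 if high_utilization else 1
    # Queueing dominates response time.
    return 3 if high_utilization else 4


# A lightly loaded system with little queueing sits in quadrant 1;
# a lightly loaded system that already queues heavily sits in quadrant 4.
print(upi_quadrant(0.3, 0.9))
print(upi_quadrant(0.3, 0.2))
```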
For the M/G/1 queueing model, UPI = 1 / (1 + c^2 * u / (1 - u)), where u is utilization and c is the "Index of Variability", or the Stdev/Mean of the utilization. Using UPI may eliminate the need to understand Tserv(u).
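Here is a minimal Python sketch of that formula, using the expression exactly as written above. The function name is made up and the c values in the loop are illustrative only.

```python
def upi_mg1(u, c):
    """UPI = 1 / (1 + c^2 * u / (1 - u)), as given above.

    u is utilization in [0, 1); c is the index of variability (stdev / mean).
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    if c < 0.0:
        raise ValueError("index of variability cannot be negative")
    return 1.0 / (1.0 + c ** 2 * u / (1.0 - u))


# Tabulate UPI against utilization for a few values of c.
for c in (0.5, 1.0, 2.0):
    row = ", ".join(f"{upi_mg1(u / 10, c):.2f}" for u in range(0, 10))
    print(f"c = {c}: {row}")
```

With c = 1 the expression reduces to UPI = 1 - u, so that curve crosses 0.5 at exactly 50% utilization. Larger c pulls the curve down sooner, and smaller c keeps it high for longer.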
Hope this helps.