Does anyone really understand how HFSC scheduling in Linux/BSD works?
I read the original SIGCOMM '97 PostScript paper about HFSC. It is very technical, but I understand the basic concept: instead of giving a linear service curve (as with pretty much every other scheduling algorithm), you can specify a convex or concave service curve, and thus it is possible to decouple bandwidth and delay. However, even though this paper mentions two kinds of scheduling algorithms being used (real-time and link-share), it always mentions only ONE curve per scheduling class (the decoupling is done by specifying this curve; only one curve is needed for that).
Now HFSC has been implemented for BSD (OpenBSD, FreeBSD, etc.) using the ALTQ scheduling framework, and it has been implemented for Linux using the TC scheduling framework (part of iproute2). Both implementations added two additional service curves that were NOT in the original paper: a real-time service curve and an upper-limit service curve. Again, please note that the original paper mentions two scheduling algorithms (real-time and link-share), but in that paper both work with one single service curve. There have never been two independent service curves for either one, as you currently find in BSD and Linux.
Even worse, some versions of ALTQ seem to add an additional queue priority to HFSC (there is no such thing as priority in the original paper either). I found several BSD HOWTOs mentioning this priority setting (even though the man page of the latest ALTQ release knows no such parameter for HFSC, so officially it does not even exist).
All this makes HFSC scheduling even more complex than the algorithm described in the original paper, and there are tons of tutorials on the Internet that often contradict each other. This is probably the main reason why nobody really seems to understand how HFSC scheduling actually works. Before I can ask my questions, we need a sample setup of some kind. I'll use a very simple one, as shown in the image below:
![HFSC test setup](http://f.imagehost.org/0177/hfsc-test-setup.png)
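For concreteness, this hierarchy could be written as a Linux tc configuration roughly as follows (a sketch only; the device name, class ids and the `default` class are my own choices, not part of the setup):

```sh
# Root HFSC qdisc; unclassified traffic falls into class 1:12 (arbitrary).
tc qdisc add dev eth0 root handle 1: hfsc default 12

# Root class: the whole 512 kbit/s line, capped by an upper limit.
tc class add dev eth0 parent 1:  classid 1:1  hfsc ls m2 512kbit ul m2 512kbit

# Inner classes A (1:10) and B (1:20), 256 kbit/s link-share each.
tc class add dev eth0 parent 1:1  classid 1:10 hfsc ls m2 256kbit
tc class add dev eth0 parent 1:1  classid 1:20 hfsc ls m2 256kbit

# Leaf classes A1, A2, B1, B2, 128 kbit/s link-share each.
tc class add dev eth0 parent 1:10 classid 1:11 hfsc ls m2 128kbit
tc class add dev eth0 parent 1:10 classid 1:12 hfsc ls m2 128kbit
tc class add dev eth0 parent 1:20 classid 1:21 hfsc ls m2 128kbit
tc class add dev eth0 parent 1:20 classid 1:22 hfsc ls m2 128kbit
```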
Here are some questions I cannot answer because the tutorials contradict each other:
What do I need a real-time curve for at all? Assuming A1, A2, B1 and B2 are all 128 kbit/s link-share (no real-time curve for either one), then each of those will get 128 kbit/s if the root has 512 kbit/s to distribute (and A and B are both 256 kbit/s, of course), right? Why would I additionally give A1 and B1 a real-time curve of 128 kbit/s? What would this be good for? To give those two a higher priority? According to the original paper I can give them a higher priority by using a curve; that's what HFSC is all about, after all. By giving both classes a curve of [256 kbit/s 20 ms 128 kbit/s], both automatically have twice the priority of A2 and B2 (while still only getting 128 kbit/s on average).
Does the real-time bandwidth count towards the link-share bandwidth? E.g. if A1 and B1 both only have 64 kbit/s real-time and 64 kbit/s link-share bandwidth, does that mean once they are served 64 kbit/s via real-time, their link-share requirement is satisfied as well (they might get excess bandwidth, but let's ignore that for a second), or does that mean they get another 64 kbit/s via link-share? So does each class have a bandwidth "requirement" of real-time plus link-share? Or does a class only have a higher requirement than the real-time curve if the link-share curve is higher than the real-time curve (the current link-share requirement being the specified link-share requirement minus the real-time bandwidth already provided to this class)?
Is the upper-limit curve applied to real-time as well, only to link-share, or maybe to both? Some tutorials say one way, some say the other. Some even claim upper-limit is the maximum of real-time bandwidth plus link-share bandwidth. What is the truth?
Assuming A2 and B2 are both 128 kbit/s, does it make any difference if A1 and B1 are 128 kbit/s link-share only, or 64 kbit/s real-time and 128 kbit/s link-share, and if so, what difference?
If I use the separate real-time curve to increase the priorities of classes, why would I need "curves" at all? Why isn't real-time a flat value and link-share another flat value? Why are both curves? The need for curves is clear in the original paper, because there is only one attribute of that kind per class. But now, having three attributes (real-time, link-share, and upper-limit), why do I still need curves on each one? Why would I want the curves' shapes (not their average bandwidth, but their slopes) to differ between real-time and link-share traffic?
According to the little documentation available, real-time curve values are totally ignored for inner classes (classes A and B); they are only applied to leaf classes (A1, A2, B1, B2). If that is true, why does the ALTQ HFSC sample configuration (search for "3.3 Sample configuration") set real-time curves on inner classes and claim that those set the guaranteed rate of those inner classes? Isn't that completely pointless? (Note: pshare sets the link-share curve in ALTQ and grate sets the real-time curve; you can see this in the paragraph above the sample configuration.)
Some tutorials say the sum of all real-time curves may not be higher than 80% of the line speed; others say it must not be higher than 70% of the line speed. Which one is right, or are they maybe both wrong?
One tutorial said you should forget all the theory. No matter how things really work (schedulers and bandwidth distribution), imagine the three curves according to the following "simplified mind model": real-time is the guaranteed bandwidth that this class will always get. Link-share is the bandwidth that this class wants in order to become fully satisfied, but satisfaction cannot be guaranteed. In case there is excess bandwidth, the class might even get offered more bandwidth than necessary to become satisfied, but it may never use more than upper-limit says. For all this to work, the sum of all real-time bandwidths may not be above xx% of the line speed (see the question above; the percentage varies). Question: is this more or less accurate, or a total misunderstanding of HFSC?
And if the assumption above is really accurate, where is prioritization in that model? E.g. every class might have a real-time bandwidth (guaranteed), a link-share bandwidth (not guaranteed) and maybe an upper limit, but still some classes have higher priority needs than other classes. In that case I must still prioritize somehow, even among the real-time traffic of those classes. Would I prioritize by the slope of the curves? And if so, which curve? The real-time curve? The link-share curve? The upper-limit curve? All of them? Would I give all of them the same slope, or each a different one, and how would I find out the right slope?
I still haven't lost hope that there exists at least a handful of people in this world who really understand HFSC and are able to answer all these questions accurately. And doing so without contradicting each other in the answers would be really nice ;-)
Solution 1:
Reading the papers on HFSC and its cousins is not a good way of understanding it. The primary goal of the HFSC paper is to provide a rigorous mathematical proof of its claims, not to explain how it works. Indeed, you can't understand how it works from the HFSC paper alone; you have to read the papers it references as well. If you have some problem with the claims or their proofs, then contacting the authors of the papers might be a good idea; otherwise I doubt they will be interested in hearing from you.
I have written a tutorial for HFSC. Read it if my explanations below aren't clear.
What do I need a real-time curve for at all? Assuming A1, A2, B1 and B2 are all 128 kbit/s link-share (no real-time curve for either one), then each of those will get 128 kbit/s if the root has 512 kbit/s to distribute (and A and B are both 256 kbit/s, of course), right? Why would I additionally give A1 and B1 a real-time curve of 128 kbit/s? What would this be good for? To give those two a higher priority? According to the original paper I can give them a higher priority by using a curve; that's what HFSC is all about, after all. By giving both classes a curve of [256 kbit/s 20 ms 128 kbit/s], both automatically have twice the priority of A2 and B2 (while still only getting 128 kbit/s on average).
The real-time curve and the link-share curve are evaluated in different ways. The real-time curve takes into account what a class has done throughout its entire history. It must do that to provide accurate bandwidth and latency allocation. The downside is that it isn't what most people intuitively think of as fair. Under real-time, if a class borrows bandwidth when nobody else wants it, it is penalised when someone else wants it back later. This means that under the real-time guarantee a class can get no bandwidth for a long period, because it used it earlier when nobody wanted it.
Link sharing is fair, in that it doesn't penalise a class for using spare bandwidth. However, this means it can't provide strong latency guarantees.
The separation of link sharing from providing latency guarantees is the new thing HFSC brings to the table, and the paper says as much in its opening sentence: "In this paper, we study hierarchical resource management models and algorithms that support both link-sharing and guaranteed real-time services with decoupled delay (priority) and bandwidth allocation." The key word in that sentence is "decoupled". Translated, it means you are expected to say how unused bandwidth is to be shared with ls, and to specify what real-time guarantees (aka latency guarantees) are needed with rt. The two are orthogonal.
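As a concrete (and entirely hypothetical) illustration of that decoupling in Linux tc terms, the class below is promised only a small share of the link under contention, yet independently gets a hard latency guarantee:

```sh
# ls: under contention, share bandwidth in proportion to 51 kbit/s (~10%).
# rt: independently guarantee that a waiting 1500-byte packet is sent
#     within 20 ms, and that the class always achieves 64 kbit/s.
tc class add dev eth0 parent 1:10 classid 1:13 hfsc \
    rt umax 1500b dmax 20ms rate 64kbit \
    ls m2 51kbit
```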
Does the real-time bandwidth count towards the link-share bandwidth?
Yes. In the HFSC paper they define bandwidth in terms of what the class has sent since the class became backlogged (i.e. has packets waiting to be sent). The only difference between rt and ls is that rt looks forward from every time the class became backlogged and computes the lowest guaranteed bandwidth from that set, whereas link sharing only looks from the last time the class became backlogged. So rt takes more bytes into consideration than ls, but every byte ls considers is also considered by rt.
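A simplified sketch of that difference, in my own notation rather than the paper's exact formulas: if class $i$ has real-time service curve $S_i$, became backlogged at times $t_1, t_2, \ldots$, and had received total service $w_i(t_k)$ at each of those instants, then the guarantee rt enforces is roughly

```latex
% rt anchors the service curve at EVERY backlog start t_k and keeps the
% minimum, i.e. the weakest guarantee consistent with the whole history:
D_i(t) = \min_{k:\, t_k \le t} \bigl[\, w_i(t_k) + S_i(t - t_k) \,\bigr]
% ls performs the analogous computation using only the most recent t_k.
```

so rt's deadlines reflect every backlog period, while link share effectively restarts the computation at the most recent one.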
Is the upper-limit curve applied to real-time as well, only to link-share, or maybe to both?
The upper limit does not stop packets from being sent to satisfy the real time condition. Packets sent under the real time condition still count towards the upper limit, and thus may well delay a packet being sent under the link sharing condition in the future. This is another difference between real time and link share.
Assuming A2 and B2 are both 128 kbit/s, does it make any difference if A1 and B1 are 128 kbit/s link-share only, or 64 kbit/s real-time and 128 kbit/s link-share, and if so, what difference?
Yes, it does make a difference. As explained above, if you use real time, the latencies are guaranteed but the link is not shared fairly (and in particular the class could suffer bandwidth starvation), and the upper limits are not enforced. If you use link share, the latency is not guaranteed, but link sharing is fair, there is no risk of starvation, and the upper limit is enforced. Real time is always checked before link share; however, this doesn't necessarily mean link share will be ignored. That's because packets are counted differently. Since history is considered by real time, a packet may not be needed to meet the real-time guarantee (because a packet from the history is already counted), yet it may still be needed to satisfy link share, because link share ignores that historical packet.
If I use the separate real-time curve to increase the priorities of classes, why would I need "curves" at all? Why isn't real-time a flat value and link-share another flat value? Why are both curves? The need for curves is clear in the original paper, because there is only one attribute of that kind per class. But now, having three attributes (real-time, link-share, and upper-limit), why do I still need curves on each one? Why would I want the curves' shapes (not their average bandwidth, but their slopes) to differ between real-time and link-share traffic?
The real-time curve allows you to trade tight latency for one particular traffic class (e.g. VOIP) against poor latency for other traffic classes (e.g. email). When deciding which packet to send under the real-time constraint, HFSC chooses the one that will complete sending first. However, it doesn't use the bandwidth of the link to compute this; it uses the bandwidth allocated by the real-time curve. Thus if we have VOIP and email packets of the same length, and the VOIP packet has a concave curve (steep first segment) that gives it a 10-fold boost over the curve for email, then 10 VOIP packets will be sent before the first email packet. But this only happens for the burst time, which should be at most the time it takes to send a few packets, i.e. milliseconds. Thereafter the VOIP curve should flatten out, and VOIP will get no future boost unless it backs off for a while (which, given VOIP is a low-bandwidth application, it should). The net result of all this is to ensure those first couple of VOIP packets are sent faster than anything else, thus giving VOIP low latency at the expense of email getting high latency.
As I said earlier, because link share does not look at history, it can't give latency guarantees. A rock-solid guarantee is what is needed for real-time traffic like VOIP. However, on average, a concave curve under link sharing will still provide a latency boost to its traffic. It's just not guaranteed. That's fine for most things, like web traffic versus email.
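In tc terms, that VOIP-versus-email trade-off might look like this (illustrative numbers of my own; the m1 segment is the short burst, m2 the long-term rate):

```sh
# VOIP: steep first slope for 20 ms (the latency "boost"), then settle
# at 128 kbit/s. Only the first packets of a backlog see the boost.
tc class add dev eth0 parent 1:10 classid 1:14 hfsc \
    rt m1 400kbit d 20ms m2 128kbit \
    ls m2 128kbit

# Email: link-share only; its packets simply wait out the VOIP burst.
tc class add dev eth0 parent 1:10 classid 1:15 hfsc ls m2 128kbit
```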
According to the little documentation available, real-time curve values are totally ignored for inner classes (classes A and B); they are only applied to leaf classes (A1, A2, B1, B2). If that is true, why does the ALTQ HFSC sample configuration (search for "3.3 Sample configuration") set real-time curves on inner classes and claim that those set the guaranteed rate of those inner classes? Isn't that completely pointless? (Note: pshare sets the link-share curve in ALTQ and grate sets the real-time curve; you can see this in the paragraph above the sample configuration.)
The documentation is correct. The hierarchy (and thus inner nodes) has no effect on the computation of real time whatsoever. I can't tell you why ALTQ evidently thinks it does.
Some tutorials say the sum of all real-time curves may not be higher than 80% of the line speed; others say it must not be higher than 70% of the line speed. Which one is right, or are they maybe both wrong?
Both are wrong. If 70% or 80% of your traffic has hard latency requirements that must be specified with real time, the reality is that you almost certainly can't satisfy them with the link you are using. You need a faster link.
The only way someone could think specifying 80% of the traffic should be real time is if they were treating real time as a priority boost. Yes, in order to provide latency guarantees you are in effect boosting the priority of some packets. But it should only be for milliseconds. That's all a link can cope with and still provide the latency guarantees.
There is very little traffic that needs latency guarantees. VOIP is one, NTP another. The rest should all be done with link sharing. If you want web to be fast, you make it fast by allocating it most of the link's capacity. That share is guaranteed over the long term. If you want it to be low latency (on average), give it a concave curve under link sharing.
One tutorial said you should forget all the theory. No matter how things really work (schedulers and bandwidth distribution), imagine the three curves according to the following "simplified mind model": real-time is the guaranteed bandwidth that this class will always get. Link-share is the bandwidth that this class wants in order to become fully satisfied, but satisfaction cannot be guaranteed. In case there is excess bandwidth, the class might even get offered more bandwidth than necessary to become satisfied, but it may never use more than upper-limit says. For all this to work, the sum of all real-time bandwidths may not be above xx% of the line speed (see the question above; the percentage varies). Question: is this more or less accurate, or a total misunderstanding of HFSC?
It is a good description of upper limit. While the link-share description is strictly accurate, it is misleading. While it's true link share doesn't give the hard latency guarantees that real time does, it does a better job of giving the class its bandwidth allocation than its competitors, like CBQ and HTB. So in saying link share "doesn't provide guarantees", it is holding link share to a standard higher than any other scheduler can provide.
Real time's description manages to be accurate, but so misleading I'd call it wrong. As the name implies, real time's purpose is not to give guaranteed bandwidth. It is to provide guaranteed latency, i.e. the packet is sent NOW, not some random amount of time later depending on how the link is used. Most traffic does not need guaranteed latency.
That said, real time doesn't give perfect latency guarantees either. It could, if the link weren't managed by link share as well, but most users want the additional flexibility of having both, and it doesn't come for free. Real time can miss its latency deadline by the time it takes to send one MTU-sized packet. (If that happens, it is because of an MTU-sized packet link share snuck in while real time was keeping the link idle in case it was handed a packet with a short deadline that had to be sent immediately. This is yet another difference between link share and real time. In order to provide its guarantees, real time may deliberately keep the line idle even though there are packets to send, thus providing less than 100% link utilisation. Link share always uses 100% of the link's available capacity. Unlike real time, it can be said to be "work-conserving".)
The reason real time can be said to offer hard latency guarantees is that the delay is bounded. So if you are trying to offer a 20 ms latency guarantee on a 1 Mbit/s link, and link share is sending MTU-sized (1500-byte) packets, you know one of those packets takes 12 ms to send. Thus if you tell real time you want an 8 ms latency, you can still meet the 20 ms deadline - with an absolute guarantee.
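Spelled out, the arithmetic behind that budget is:

```latex
% Serialization delay of one MTU-sized packet on a 1 Mbit/s link:
t_{\mathrm{MTU}} = \frac{1500 \cdot 8\ \mathrm{bit}}{10^6\ \mathrm{bit/s}} = 12\ \mathrm{ms}
% so to still meet a 20 ms end-to-end deadline, ask real time for at most:
d_{rt} \le 20\ \mathrm{ms} - t_{\mathrm{MTU}} = 8\ \mathrm{ms}
```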
And if the assumption above is really accurate, where is prioritization in that model? E.g. every class might have a real-time bandwidth (guaranteed), a link-share bandwidth (not guaranteed) and maybe an upper limit, but still some classes have higher priority needs than other classes. In that case I must still prioritize somehow, even among the real-time traffic of those classes. Would I prioritize by the slope of the curves? And if so, which curve? The real-time curve? The link-share curve? The upper-limit curve? All of them? Would I give all of them the same slope, or each a different one, and how would I find out the right slope?
There isn't a prioritisation model. Seriously. If you want to give traffic absolute priorities, use a strict priority scheduler (prio on Linux, PRIQ in ALTQ); that is what it is for. But also beware that if you give web traffic absolute priority over email traffic, that means letting web traffic saturate the link and thus no email packets getting through at all. All your email connections then die.
In reality, nobody wants that sort of prioritisation. What they want is what HFSC provides. If you actually have real-time traffic, HFSC provides for it, but it will be a tiny fraction of your traffic. For the rest, HFSC allows you to say: "on a congested link, allocate 90% to web and let email trickle along at 10%, and oh, send the odd DNS packet quickly, but don't let someone DoS me with it."
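That policy translates almost line-for-line into a tc configuration (a sketch with invented rates and class ids; the filters that classify traffic into the classes are omitted):

```sh
tc qdisc add dev eth0 root handle 1: hfsc default 10
tc class add dev eth0 parent 1:  classid 1:1  hfsc ls m2 1mbit ul m2 1mbit
tc class add dev eth0 parent 1:1 classid 1:10 hfsc ls m2 900kbit   # web: ~90%
tc class add dev eth0 parent 1:1 classid 1:20 hfsc ls m2 100kbit   # email: trickle
tc class add dev eth0 parent 1:1 classid 1:30 hfsc \
    rt umax 512b dmax 10ms rate 32kbit \
    ls m2 32kbit ul m2 64kbit                                      # DNS: fast but capped
```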
Solution 2:
You could define the curves with different names:
- rt, real-time curve, bandwidth/delay guarantee.
- ls, link-share curve, bandwidth/delay sharing (based on the configuration of neighbour leaves)
- ul, upper limit curve, maximum bandwidth/delay it may attain.
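In Linux tc terms, a single leaf class can carry all three curves at once (values are illustrative):

```sh
# rt: guarantee 64 kbit/s; first 1500 bytes of a backlog out within 30 ms.
# ls: compete for spare bandwidth in proportion to 128 kbit/s.
# ul: never exceed 256 kbit/s, even on an otherwise idle link.
tc class add dev eth0 parent 1:1 classid 1:40 hfsc \
    rt umax 1500b dmax 30ms rate 64kbit \
    ls m2 128kbit \
    ul m2 256kbit
```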
What do I need a real-time curve for at all? Assuming A1, A2, B1 and B2 are all 128 kbit/s link-share (no real-time curve for either one), then each of those will get 128 kbit/s if the root has 512 kbit/s to distribute (and A and B are both 256 kbit/s, of course), right? Why would I additionally give A1 and B1 a real-time curve of 128 kbit/s? What would this be good for? To give those two a higher priority? According to the original paper I can give them a higher priority by using a curve; that's what HFSC is all about, after all. By giving both classes a curve of [256 kbit/s 20 ms 128 kbit/s], both automatically have twice the priority of A2 and B2 (while still only getting 128 kbit/s on average).
When you make a definition in HFSC with rates only, it automatically sets 'dmax' to 0, which basically means it doesn't account for delay. A good HFSC configuration should include both the bandwidth AND the delay boundaries you want to use for your class; otherwise the algorithm can't figure out exactly how much priority a class should get.
Whenever you give packets priority, other packets will have to be decreased in priority. Based on the 'dmax' and 'rate' values, all classes will be multiplexed using virtual timers. Refer to tc-hfsc(7) for more information.
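tc accepts a service curve in two notations; per tc-hfsc(7), the (umax, dmax, rate) form is converted internally into a two-slope (m1, d, m2) curve. A sketch of the same intent in both spellings, reusing the example class from above:

```sh
# Slope form: send at m1 for the first d, then settle at m2.
# (1500 bytes in 20 ms requires 600 kbit/s, hence m1 below.)
tc class change dev eth0 parent 1:1 classid 1:40 hfsc \
    rt m1 600kbit d 20ms m2 64kbit ls m2 128kbit

# Delay form: first umax bytes of a backlog out within dmax, then 'rate'.
tc class change dev eth0 parent 1:1 classid 1:40 hfsc \
    rt umax 1500b dmax 20ms rate 64kbit ls m2 128kbit
```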
Does the real-time bandwidth count towards the link-share bandwidth? E.g. if A1 and B1 both only have 64 kbit/s real-time and 64 kbit/s link-share bandwidth, does that mean once they are served 64 kbit/s via real-time, their link-share requirement is satisfied as well (they might get excess bandwidth, but let's ignore that for a second), or does that mean they get another 64 kbit/s via link-share? So does each class have a bandwidth "requirement" of real-time plus link-share? Or does a class only have a higher requirement than the real-time curve if the link-share curve is higher than the real-time curve (the current link-share requirement being the specified link-share requirement minus the real-time bandwidth already provided to this class)?
If the flow does not exceed the boundaries of the class's link-share definition, then the real-time curve is never used. Defining a real-time curve in this case allows you, for example, to guarantee a certain 'dmax'.
If your link-share definitions were flawless, you wouldn't need real-time curves. You could just define service curves (sc), but that would make your configuration work harder.
Is the upper-limit curve applied to real-time as well, only to link-share, or maybe to both? Some tutorials say one way, some say the other. Some even claim upper-limit is the maximum of real-time bandwidth plus link-share bandwidth. What is the truth?
The upper-limit curve of your class is applied to link-share only; when you define an upper-limit curve, you MUST define a link-share curve. However, the upper-limit curves of parent classes are still applied.
Assuming A2 and B2 are both 128 kbit/s, does it make any difference if A1 and B1 are 128 kbit/s link-share only, or 64 kbit/s real-time and 128 kbit/s link-share, and if so, what difference?
There is a slight difference if, e.g., A2 = 0 kbit/s and B2 = 256 kbit/s. Then the virtual time for A2 will be at its maximum. Whenever packets are classified into A2, they will be instantly processed. However, the real-time curve of B2 will still ensure that it is able to transmit at least 64 kbit/s.
If I use the separate real-time curve to increase the priorities of classes, why would I need "curves" at all? Why isn't real-time a flat value and link-share another flat value? Why are both curves? The need for curves is clear in the original paper, because there is only one attribute of that kind per class. But now, having three attributes (real-time, link-share, and upper-limit), why do I still need curves on each one? Why would I want the curves' shapes (not their average bandwidth, but their slopes) to differ between real-time and link-share traffic?
Real-time curves don't share traffic between neighbour leaves; link-share curves do.
According to the little documentation available, real-time curve values are totally ignored for inner classes (classes A and B); they are only applied to leaf classes (A1, A2, B1, B2). If that is true, why does the ALTQ HFSC sample configuration (search for "3.3 Sample configuration") set real-time curves on inner classes and claim that those set the guaranteed rate of those inner classes? Isn't that completely pointless? (Note: pshare sets the link-share curve in ALTQ and grate sets the real-time curve; you can see this in the paragraph above the sample configuration.)
It is true that real-time curves are ignored for inner classes; they are only applied to leaf classes. However, the real-time curves defined on those inner classes are taken into account for calculations on the leaf classes.
Some tutorials say the sum of all real-time curves may not be higher than 80% of the line speed; others say it must not be higher than 70% of the line speed. Which one is right, or are they maybe both wrong?
What they mean is: you can't prioritise all traffic. Whenever you give packets priority, other packets will have to be decreased in priority. If you over-guarantee, the algorithm becomes pointless. Define classes that gain prioritisation, and define classes that may suffer.
One tutorial said you should forget all the theory. No matter how things really work (schedulers and bandwidth distribution), imagine the three curves according to the following "simplified mind model": real-time is the guaranteed bandwidth that this class will always get. Link-share is the bandwidth that this class wants in order to become fully satisfied, but satisfaction cannot be guaranteed. In case there is excess bandwidth, the class might even get offered more bandwidth than necessary to become satisfied, but it may never use more than upper-limit says. For all this to work, the sum of all real-time bandwidths may not be above xx% of the line speed (see the question above; the percentage varies). Question: is this more or less accurate, or a total misunderstanding of HFSC?
This is correct.
And if the assumption above is really accurate, where is prioritization in that model? E.g. every class might have a real-time bandwidth (guaranteed), a link-share bandwidth (not guaranteed) and maybe an upper limit, but still some classes have higher priority needs than other classes. In that case I must still prioritize somehow, even among the real-time traffic of those classes. Would I prioritize by the slope of the curves? And if so, which curve? The real-time curve? The link-share curve? The upper-limit curve? All of them? Would I give all of them the same slope, or each a different one, and how would I find out the right slope?
The difference between, e.g., HFSC and HTB is that HFSC allows you to define exactly how much prioritisation you want a class to have. You do this by defining minimum and maximum boundaries with the 'dmax' value.