How do stackless coroutines differ from stackful coroutines?

First, thank you for taking a look at CO2 :)

The Boost.Coroutine doc describes the advantage of stackful coroutine well:

stackfulness

In contrast to a stackless coroutine a stackful coroutine can be suspended from within a nested stackframe. Execution resumes at exactly the same point in the code where it was suspended before. With a stackless coroutine, only the top-level routine may be suspended. Any routine called by that top-level routine may not itself suspend. This prohibits providing suspend/resume operations in routines within a general-purpose library.

first-class continuation

A first-class continuation can be passed as an argument, returned by a function and stored in a data structure to be used later. In some implementations (for instance C# yield) the continuation can not be directly accessed or directly manipulated.

Without stackfulness and first-class semantics, some useful execution control flows cannot be supported (for instance cooperative multitasking or checkpointing).

What does that mean to you? for example, imagine you have a function that takes a visitor:

template<class Visitor>
void f(Visitor& v);

You want to transform it to iterator, with stackful coroutine, you can:

asymmetric_coroutine<T>::pull_type pull_from([](asymmetric_coroutine<T>::push_type& yield)
{
    f(yield);
});

But with stackless coroutine, there's no way to do so:

generator<T> pull_from()
{
    // yield can only be used here, cannot pass to f
    f(???);
}

In general, stackful coroutine is more powerful than stackless coroutine. So why do we want stackless coroutine? short answer: efficiency.

Stackful coroutine typically needs to allocate a certain amount of memory to accomodate its runtime-stack (must be large enough), and the context-switch is more expensive compared to the stackless one, e.g. Boost.Coroutine takes 40 cycles while CO2 takes just 7 cycles in average on my machine, because the only thing that a stackless coroutine needs to restore is the program counter.

That said, with language support, probably stackful coroutine can also take the advantage of the compiler-computed max-size for the stack as long as there's no recursion in the coroutine, so the memory usage can also be improved.

Speaking of stackless coroutine, bear in mind that it doesn't mean that there's no runtime-stack at all, it only means that it uses the same runtime-stack as the host side, so you can call recursive functions as well, just that all the recursions will happen on the host's runtime-stack. In contrast, with stackful coroutine, when you call recursive functions, the recursions will happen on the coroutine's own stack.

To answer the questions:

  • Are there any limitations on the use of automatic storage variables (i.e. variables "on the stack")?

No. It's the emulation limitation of CO2. With language support, the automatic storage variables visible to the coroutine will be placed on the coroutine's internal storage. Note my emphasis on "visible to the coroutine", if the coroutine calls a function that uses automatic storage variables internally, then those variables will be placed on the runtime-stack. More specifically, stackless coroutine only has to preserve the variables/temporaries that can be used after resumed.

To be clear, you can use automatic storage variables in CO2's coroutine body as well:

auto f() CO2_RET(co2::task<>, ())
{
    int a = 1; // not ok
    CO2_AWAIT(co2::suspend_always{});
    {
        int b = 2; // ok
        doSomething(b);
    }
    CO2_AWAIT(co2::suspend_always{});
    int c = 3; // ok
    doSomething(c);
} CO2_END

As long as the definition does not precede any await.

  • Are there any limitations on what functions I can call from a stackless coroutine?

No.

  • If there is no saving of stack context for a stackless coroutine, where do automatic storage variables go when the coroutine is running?

Answered above, a stackless coroutine doesn't care about the automatic storage variables used in the called functions, they'll just be placed on the normal runtime-stack.

If you have any doubt, just check CO2's source code, it may help you understand the mechanics under the hood ;)


What you want are user-land threads/fibers - usually you want to suspend the your code (running in fiber) in a deep nested call stack (for instance parsing messages from TCP-connection). In this case you can not use stackless context switching (application stack is shared between stackless coroutines -> stack frames of called subroutines would be overwritten).

You can use something like boost.fiber which implements user-land threads/fibers based on boost.context.