I have an implementation of nested coroutines for C in this file:
This uses the macros coro_block to introduce a new coroutine block, coro_await to invoke a nested coroutine, coro_yield to suspend the coroutine block and coro_break to exit it.
The implementation is fairly simple: about 17 lines of code (omitting file headers and comments). I use it extensively in the implementation of a Windows kernel driver that implements the FUSE protocol (work in progress).
EDIT: For an example of its use, see the implementation of READ:
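For readers who have not seen the trick, here is a minimal sketch of how stackless coroutine macros of this general kind can be built in C by switching on a saved line number. It is not the code from the file above: the names co_state, CO_BEGIN, CO_YIELD, CO_END and the counter example are hypothetical, and nesting (the coro_await part) is left out entirely.

```c
#include <assert.h>

/* Hypothetical names; NOT the macros from the file referenced above. */
typedef struct { int line; } co_state;   /* 0 = not started, -1 = finished */

/* Resume by jumping to the case label recorded at the last yield. */
#define CO_BEGIN(s)  switch ((s)->line) { case 0:
#define CO_YIELD(s)  do { (s)->line = __LINE__; return 1; case __LINE__:; } while (0)
#define CO_END(s)    } (s)->line = -1; return 0

typedef struct {
    co_state co;
    int i;       /* locals that must survive a yield live in the struct */
    int value;   /* last value produced */
} counter;

/* Returns 1 while suspended at a yield, 0 once finished. */
int counter_step(counter *c)
{
    CO_BEGIN(&c->co);
    for (c->i = 1; c->i <= 3; c->i++) {
        c->value = c->i * 10;
        CO_YIELD(&c->co);
    }
    CO_END(&c->co);
}

/* Drives the coroutine to completion and sums what it produced. */
int counter_sum(void)
{
    counter c = { { 0 }, 0, 0 };
    int sum = 0;
    while (counter_step(&c))
        sum += c.value;
    assert(c.co.line == -1);   /* the coroutine ran to its end */
    return sum;
}
```

The main cost of this approach is that ordinary local variables do not survive a yield, so anything that must persist has to live in the state struct, and each CO_YIELD has to sit on its own source line.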
I can't be the only one who finds it cleaner to just use structs with state and pointers to them than to rely on all this macro and typedef hackery.
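To make that alternative concrete, here is a minimal sketch of a resumable computation written as a plain struct plus a step function, with no macros. The word counter and all of its names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum wc_state { WC_BETWEEN_WORDS, WC_IN_WORD };

/* All resumable state is explicit and lives in one struct. */
struct word_counter {
    enum wc_state state;
    size_t words;
};

void wc_init(struct word_counter *wc)
{
    assert(wc != NULL);
    wc->state = WC_BETWEEN_WORDS;
    wc->words = 0;
}

/* Feed one chunk of input; may be called again whenever more data arrives. */
void wc_feed(struct word_counter *wc, const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        int is_space = (buf[i] == ' ' || buf[i] == '\n' || buf[i] == '\t');
        switch (wc->state) {
        case WC_BETWEEN_WORDS:
            if (!is_space) { wc->words++; wc->state = WC_IN_WORD; }
            break;
        case WC_IN_WORD:
            if (is_space) wc->state = WC_BETWEEN_WORDS;
            break;
        }
    }
}

/* Feeds two chunks, with one word split across the chunk boundary. */
size_t wc_demo(void)
{
    const char *first = "nested cor";
    const char *second = "outines in C";
    struct word_counter wc;
    wc_init(&wc);
    wc_feed(&wc, first, strlen(first));
    wc_feed(&wc, second, strlen(second));
    return wc.words;
}
```

The caller owns the struct and decides when to call wc_feed again, which is the same suspend/resume contract a coroutine offers, just spelled out by hand.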
Structs with state and pointers to them? Might as well just use C with classes.
The author should take a look at async.h, which is extremely simple. I don't think you need a scheduler, but you do need what Rust calls an "executor", which is a library of I/O functions and an event loop.
My real use case involves transferring that event loop from an existing thread to another spawned pthread, so having a "scheduler" / "executor" is easier.
I think async.h is very similar to Protothreads, both are very lightweight and not as opinionated.
async.h does not preclude threads, as long as you ensure that no more than one thread is running any given co-routine at any point in time.
async.h is indeed similar to protothreads, but simpler. I like its mechanism for co-routines calling co-routines.
As for scheduling, there's a lot of history of M:N scheduling. M:N scheduling hasn't worked out for, e.g., Solaris, Linux, Rust, and some others. Thread libraries tend to be 1:1 nowadays. Erlang uses M:N and I see claims that it works well enough, but I am not familiar enough with it to understand if that's true or why.
Here M is the number of user-land threads and N is the number of threads allocated in the OS kernel to run them. M:N means the two differ, with M>N and a user-land scheduler choosing which OS thread runs which user-land thread (which looks a lot like a co-routine). It's easy to end up with pathological conditions and to leave performance on the table. But I suppose a lot depends on just what exactly the workload looks like. An I/O bound workload with no long CPU runs (or lots of yielding during them) will probably work well enough with M:N scheduling, but 1:1 threading with as many threads as CPUs should work even better.
I think Erlang succeeds where others do not because it fully commits to the idea. It is not just "give me some way to do M:N"; everything is built around actors. And actors are not just "some way to schedule stuff"; they are a full paradigm.
Right, small actors == small (stack, code), preferably stackless coroutines that hopefully don't do much CPU hogging and just lots of cooperative behavior -- I'm ready to believe that M:N does well for that when the coroutines have small stacks or are stackless, and that you don't even need a scheduler for them if there's no yield operation as then you need only ever "schedule" coroutines whose pending I/O events have occurred.
I.e., coroutines are just a C10K method, and you must end up with more of them than you have OS threads and HW CPUs.
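A rough sketch of the "no scheduler needed" point: if tasks never yield voluntarily, the loop only has to resume whichever tasks are blocked on an event that just fired. Everything here is hypothetical and simulated: the event ids stand in for file descriptors, and the hard-coded batches stand in for what a real loop would get from something like poll().

```c
#include <assert.h>
#include <stddef.h>

/* A task is some resumable work plus the (simulated) event it waits on. */
struct task {
    int waiting_on;   /* hypothetical event id, stands in for an fd */
    int steps_left;   /* work remaining; 0 means finished */
    int resumed;      /* how many times the loop resumed this task */
};

/* Resume a task once: do one unit of work, then wait on the same event again. */
void task_resume(struct task *t)
{
    assert(t->steps_left > 0);
    t->steps_left--;
    t->resumed++;
}

/* Dispatch one batch of fired events. There is no run queue and no
   scheduling policy: a task runs only when its event has occurred. */
void dispatch(struct task *tasks, size_t ntasks, const int *fired, size_t nfired)
{
    for (size_t e = 0; e < nfired; e++)
        for (size_t i = 0; i < ntasks; i++)
            if (tasks[i].steps_left > 0 && tasks[i].waiting_on == fired[e])
                task_resume(&tasks[i]);
}

/* Encodes the three resume counts as one decimal number for easy checking. */
int executor_demo(void)
{
    struct task tasks[3] = { { 10, 2, 0 }, { 11, 1, 0 }, { 12, 1, 0 } };
    const int batch1[] = { 10, 11 };   /* simulated: events 10 and 11 fired */
    const int batch2[] = { 10 };       /* simulated: event 10 fired again */
    dispatch(tasks, 3, batch1, 2);
    dispatch(tasks, 3, batch2, 1);
    /* The third task is never resumed because event 12 never fires. */
    return tasks[0].resumed * 100 + tasks[1].resumed * 10 + tasks[2].resumed;
}
```

The linear scan over tasks is only for brevity; a real loop would map each event straight to its waiting task.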
If, e.g., Bryan Cantrill and others who claim M:N is bad are wrong, they're only wrong -I think- if they extend the claim to stackless / small stack coroutines. But Bryan Cantrill's seminal paper on the badness of M:N threading was not about stackless coroutines, but about very stackful coroutines (pthreads).
M:N is necessarily bad if the M things have large stacks (which was and is the case in, e.g., pthreads).
M:N is necessarily good (C10K) if the M things are extremely light-weight.
Everything we see in this space, from Scheme-style continuations, partial continuations, to hand-coded CPS, to stackless co-routines, async/await primitives that allow compilers to do partial CPS conversion / coroutines -- all these things are about solving C10K, which is about a) using async I/O, and b) compressing program state / reducing overhead to serve the most possible clients.
As a program state compression technique, nothing beats hand-coded CPS, but it's utterly not user-friendly. Scheme-style continuations mostly shift program state from the stack to the heap. The sweet spot is async/await.
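To show what "hand-coded CPS" looks like in C and why it compresses state but is unpleasant to write, here is a minimal sketch that computes (2 + 3) * 4 in continuation-passing style. All names (struct cont, add_cps, mul_cps, calc_env and so on) are hypothetical.

```c
#include <assert.h>

/* A continuation: what to do next with an int result, plus its saved state. */
struct cont {
    void (*fn)(int result, void *env);
    void *env;
};

/* CPS primitives: instead of returning, pass the result to the continuation. */
void add_cps(int a, int b, struct cont k) { k.fn(a + b, k.env); }
void mul_cps(int a, int b, struct cont k) { k.fn(a * b, k.env); }

/* The only state that must survive between steps, chosen by hand. */
struct calc_env {
    int c;      /* still needed after the addition */
    int *out;   /* where the final answer goes */
};

/* Final continuation: store the answer. */
void store_result(int result, void *env)
{
    struct calc_env *e = env;
    *e->out = result;
}

/* Continuation of the addition: multiply the sum by c, then store it. */
void after_add(int sum, void *env)
{
    struct calc_env *e = env;
    struct cont k = { store_result, e };
    mul_cps(sum, e->c, k);
}

int calc_demo(void)
{
    int out = 0;
    struct calc_env env = { 4, &out };
    struct cont k = { after_add, &env };
    add_cps(2, 3, k);   /* (2 + 3) * 4 */
    assert(out == 20);
    return out;
}
```

Here the continuations are called immediately, so the C stack still grows; in an async program each struct cont would instead be stored and invoked later by the event loop, and calc_env would be the entire per-client state. The price is that one expression becomes three functions and a struct.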
I understand the excitement of proving you _can_ do something.
Wouldn't a language suited to async, or a framework implementing that from the ground up, complete with all its natural ugliness, be ... better than trying to shoe-horn async into such a low-level language?
Sure, but this is hacker news. Talking about "I figured out how to do something" is core business :) Even if it doesn't necessarily make a lot of sense to do it in practice.
It totally would! I think that is where C++20's coroutine proposal (or Rust) make a lot of sense (and all these negative abstraction costs!). It was a lot of fun to work on, though, and the end result is not bad (I translated my old stateful coroutine based code to this one in an afternoon: https://github.com/liuliu/ccv/commit/03c84ee1e3344b8458d8502...)
Maybe. Doing something yourself is sometimes preferable to your compiler doing it for you, for example because you can fix it if it breaks, or because you don't have to wait for the next version of the compiler or use a different programming language. I mean, yes, it's nice when other people solve your problems for you with no effort on your part, but that doesn't always happen.
In many cases, the actual language itself is only one reason why a particular language gets used. Often the decision also involves tooling, culture, 3rd-party libs, workforce experience, etc.
Having said that, I'm sure D would shine in OP's case.