Simulating Recursion with Generators

I’ve recently had an idea to use generators to define and iteratively execute recursive functions.

Well, why would we need to change the recursion to iteration in the first place? Because when the recursion runs too deep, we get a stack overflow:

thread 'main' has overflowed its stack
fatal runtime error: stack overflow

In these cases we normally transform the recursive algorithm into an iterative one, using our own heap-allocated stack to simulate the recursion.

OK, but why would we need generators for that? The problem with the manual approach is that we have to define the state we keep on the stack ourselves, and the algorithm has to be modified to work with that state. By using generators, the solution can still be written recursively, and the compiler generates and tracks the state for us!
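To make the manual approach concrete, here is a minimal sketch of the classic transformation on a toy counting function (names and code are my own illustration, not from the repository):

```rust
// The recursive version: each level consumes one stack frame.
fn count_rec(n: u32) -> u32 {
    if n == 0 { 0 } else { count_rec(n - 1) + 1 }
}

// The hand-written iterative version: the state that recursion keeps in
// stack frames (here, one pending "+ 1" per level) lives in an explicit,
// heap-allocated Vec, so the depth is limited only by available memory.
fn count_iter(n: u32) -> u32 {
    let mut pending: Vec<u32> = Vec::new(); // our own "call stack" on the heap
    for _ in 0..n {
        pending.push(1); // "push a frame": remember the deferred work
    }
    pending.into_iter().sum() // "unwind": apply the deferred additions
}
```

Even on this trivial function you can see the cost of the manual approach: the deferred work has to be identified and represented by hand, which is exactly the bookkeeping the generator-based approach delegates to the compiler.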

Implementation

Please note that a nightly version of the compiler, rustc 1.30.0-nightly (4591a245c 2018-09-22), with the experimental Generator API was used, so this example could fail to compile with a future version of Rust.

A Google search for simulating recursion with generators turned up mostly posts about recursive generators (link1, link2). While they are a nice feature, they suffer from the same recursion limit as normal functions do, and they most probably allocate the generator state dynamically for each recursive call. Since I wanted to avoid these problems, I created a proof of concept of an executor that runs custom-defined “recursive” generators:

https://gitlab.com/blaztinn/recursion-simulator-example

Looking at the code in rec.rs (Playground), the algorithm that we want to simulate is the count function. It’s a simple function that just calls itself n times, adding one to the count with each call. Sorry, but no Fibonacci example today. :)

fn count(n: u32) -> u32 {
    match n {
        0 => 0,
        1 => 1, // This line is not needed but it seems to prevent compile-time evaluation
        _ => count(n - 1) + 1,
    }
}

This function is not really useful; it is just a minimal example that can easily trigger a stack overflow and can be benchmarked. For a more real-world example, imagine a function that computes a binary tree’s maximum depth.
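As a sketch of what such a function looks like (my own illustration, not code from the repository), a recursive depth function consumes one stack frame per tree level, so a degenerate, list-like tree a million nodes deep overflows the stack just like count does:

```rust
// A minimal binary tree; a long spine of children makes the recursion deep.
struct Node {
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

// Recursive maximum depth: each call adds a stack frame, so a tree that
// is a million nodes deep overflows the stack just like count(1_000_000).
fn depth(node: &Option<Box<Node>>) -> u32 {
    match node {
        None => 0,
        Some(n) => 1 + depth(&n.left).max(depth(&n.right)),
    }
}
```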

To use the function with our recursion executor, we make some slight changes to it: we convert it into a generator that is returned from a function. Input arguments are now captured by the generator, and instead of making a recursive call we yield the arguments for that call. After the generator is resumed, the return value of the recursive call is read from a captured reference to a RefCell. The rec_call! macro helps us with the “recursive call” part.

macro_rules! rec_call {
    ($ret_val:ident, $arg:expr) => {
        ({
            yield $arg;
            let r = *$ret_val.borrow(); // Bind to a variable so that the borrow ends here
            r
        })
    };
}

fn gen_count<'r>(
    ret_val: &'r RefCell<u32>,
    n: u32,
) -> impl Generator<Yield = u32, Return = u32> + 'r {
    move || match n {
        0 => 0,
        1 => 1,
        _ => rec_call!(ret_val, n - 1) + 1,
    }
}

If you compare the count and gen_count functions, you can see that we kept the recursive structure of the solution. There is some boilerplate in the transformed function, but it’s good enough to demonstrate a workable concept.

The rec_execute! macro and the rec_execute_inner function are the executor for these kinds of generators. The func argument (in our case gen_count) is used to create new generators, and the arg argument passes the initial arguments that start the recursion. Each generator can yield arguments for a recursive call, and those arguments get passed to a newly created generator. To pass return values back into the calling generators, the ret_val RefCell is used.
The executor creates new generators as needed and pushes them onto the callstack vector. When a generator completes its execution, it is popped off the stack and dropped. When the stack becomes empty, the execution of all the generators is complete, with the final result stored in ret_val.

macro_rules! rec_execute {
    ($func:expr, $arg:expr) => {{
        let ret_val = RefCell::new(Default::default());
        rec_execute_inner(&ret_val, $func, $arg);
        ret_val.into_inner()
    }};
}

fn rec_execute_inner<'r, A, R, F, G>(ret_val: &'r RefCell<R>, func: &F, arg: A)
where
    G: Generator<Yield = A, Return = R>,
    F: Fn(&'r RefCell<R>, A) -> G,
{
    let mut callstack = vec![];
    callstack.push(func(ret_val, arg));
    while !callstack.is_empty() {
        let gen = callstack.last_mut().unwrap();
        let ret = unsafe { gen.resume() };
        match ret {
            GeneratorState::Yielded(arg) => callstack.push(func(ret_val, arg)),
            GeneratorState::Complete(ret) => {
                ret_val.replace(ret);
                callstack.pop();
            }
        }
    }
}

Note that this executor is generic over the type of the Generator’s Yield and Return associated types, so it can be used to create and run any generator we define.

And finally, this is how we can call the executor to run our transformed recursive function with an input parameter of 20:

let result = rec_execute!(&gen_count, 20); // result = 20

Advantages

With our simulator there is no recursion limit or stack overflow, as there is with both function recursion and generator recursion, because the recursion is now simulated on the heap.

$ cargo run normal 1000000
thread 'main' has overflowed its stack
fatal runtime error: stack overflow

$ cargo run simulate 1000000
result: 1000000

There is no boxing or dynamic allocation of the generator state, because the generators don’t reference themselves and our executor knows the size of the state at compile time (thanks to the impl Trait in the return position of gen_count). That means the generator state is stored by value inside our stack, not as a pointer to a boxed generator state. The only dynamic allocations that happen are due to the allocation and reallocation of our callstack vector.
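Closures on stable Rust behave analogously, so the point can be illustrated without the nightly Generator API (an analogy of mine, not the generator code itself): the compiler knows the exact size of the captured state, and the caller stores it inline.

```rust
use std::mem::size_of_val;

// Analogous to gen_count returning `impl Generator`, but on stable Rust:
// a function returning `impl Fn`. The caller stores the closure's
// captured state by value, with a size known at compile time.
fn make_adder(n: u32) -> impl Fn(u32) -> u32 {
    move |x| x + n // captured state: exactly one u32
}

// The state lives inline, not behind a Box: its size is that of the
// single captured u32 (4 bytes), which a Vec of such values stores directly.
fn state_size() -> usize {
    size_of_val(&make_adder(7))
}
```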

Disadvantages

Of course our implementation has disadvantages too.

We can’t pass local references to recursive calls. While this is possible with normal recursion, we can’t do it here because the callstack vector can get reallocated, moving the generator states and invalidating any references into them.

Benchmarks of our executor show roughly a three- to four-fold slowdown compared to regular recursion (large: n = 500_000, medium: n = 50_000, small: n = 500).

test tests::bench_large_normal  ... bench:   1,317,526 ns/iter (+/- 318,654)
test tests::bench_large_rec     ... bench:   4,295,042 ns/iter (+/- 110,120)
test tests::bench_medium_normal ... bench:      73,901 ns/iter (+/- 6,572)
test tests::bench_medium_rec    ... bench:     206,235 ns/iter (+/- 31,714)
test tests::bench_small_normal  ... bench:         724 ns/iter (+/- 28)
test tests::bench_small_rec     ... bench:       2,437 ns/iter (+/- 134)

Most of the runtime penalty probably comes from the runtime checking of the borrowing rules on the RefCell for every recursive call, but profiling would be needed to confirm that. And although a custom iterative solution will almost certainly be faster than our concept, I think the concept is still good enough if we just want to avoid the stack overflow.
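For reference, this is the per-call RefCell pattern the executor relies on, isolated into a tiny function (my own sketch): every simulated return pays for a dynamic borrow check in replace, and every simulated call pays for another one in borrow.

```rust
use std::cell::RefCell;

// The executor's per-call pattern in isolation: a simulated return stores
// the result with replace(), and the simulated call reads it back with
// borrow(). Both operations check the borrow rules at runtime, which is
// the suspected source of most of the overhead.
fn pass_through_refcell(v: u32) -> u32 {
    let ret_val: RefCell<u32> = RefCell::new(0);
    ret_val.replace(v); // dynamic borrow check on the simulated "return"
    let r = *ret_val.borrow(); // another check on the simulated "call"
    r
}
```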

On the usability side, we are limited to a single argument passed to the recursive function, so functions with multiple arguments need to pack them into a tuple. I don’t see any way around this, because we can yield only one value.
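The same packing applies to ordinary recursive functions, for example a two-argument gcd rewritten to take one tuple argument (a sketch with names of my own):

```rust
// The natural two-argument form.
fn gcd(a: u64, b: u64) -> u64 {
    if b == 0 { a } else { gcd(b, a % b) }
}

// The tuple-packed form our executor would need: one value in per call,
// matching a single Yield type of (u64, u64).
fn gcd_packed((a, b): (u64, u64)) -> u64 {
    if b == 0 { a } else { gcd_packed((b, a % b)) }
}
```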

I also ran into some problems while implementing the concept. I just could not get a plain rec_execute function (the rec_execute! macro works fine) to accept the gen_count function as an argument. I have probably messed up the generic/lifetime definitions somewhere; I would really appreciate it if someone could point me in the right direction. :)

But the thing that bothered me the most is that I had to use a RefCell to pass the return value of the recursive calls back to the callers.

Extended Generator Trait

The problem with the RefCell could be solved if generators could receive values from yield, or to put it differently, if the resume function on a generator could accept a value to pass into it.

The Generator trait would need to be extended to something like:

pub trait Generator {
    type Resume;
    type Yield;
    type Return;
    unsafe fn resume(&mut self, value: Self::Resume) -> GeneratorState<Self::Yield, Self::Return>;
}

and the old Generator trait converted to:

pub trait StaticGenerator {
    type Yield;
    type Return;
    unsafe fn resume(&mut self) -> GeneratorState<Self::Yield, Self::Return>;
}

impl<T, Y, R> StaticGenerator for T
where
    T: Generator<Resume = (), Yield = Y, Return = R>,
{
    type Yield = Y;
    type Return = R;

    unsafe fn resume(&mut self) -> GeneratorState<Y, R> {
        Generator::resume(self, ())
    }
}

This is all hand-waved together and probably isn’t written quite right, since I wrote it from my head, but I hope you get the point. :)

The GeneratorState enum was not changed to include the Resume generic parameter from Generator, since I think it is not needed, but I could be wrong.

With this extended Generator trait the modified recursive function from our example could then be implemented like this:

fn gen_count() -> impl Generator<Resume = u32, Yield = u32, Return = u32> {
    |n| match n {
        0 => 0,
        1 => 1,
        _ => (yield (n - 1)) + 1,
    }
}

Now this is a lot more similar to the normal recursive function and we don’t need to use a RefCell anymore to pass back the return value.

Also notice that not only does yield now return a value, but the generator must also accept an argument of the same type. This is because the resume function is also used to start the generator, and we must pass a value in every time it is called.

But the extended generator trait raises another question: how should we declare the generator arguments when we want to pass a tuple or a () to a generator?

// For Generator<Resume = (u32, u32), Yield = (), Return = ()>
// Option A
|a,b| {
    let (ya, yb) = yield;
}
// Option B
|(a,b)| {
    let (ya, yb) = yield;
}

// For Generator<Resume = (), Yield = (), Return = ()>
// Option A
|| { yield; }
// Option B
|i| { yield; }

I like option A more, and I guess it is similar to unboxed closures, but it feels a little bit magic to me. :)

Conclusion

I think that a cleaned-up version of the presented concept could be a useful tool for those rare cases when a recursive algorithm hits a stack overflow. Adding some boilerplate to an already-written algorithm is much quicker than writing a new custom iterative version. But it will perform slower than the recursive version (3–4x for the example presented here) and almost certainly slower than a good custom iterative one.

It would be interesting to see benchmarks of our example using the extended generator trait. We could get closer to plain recursion’s performance by changing the current implementation of simulated recursion to use unsafe raw pointers instead of a RefCell, but I think we would still be some way off. I didn’t put too much thought into the extended generator trait, but I think it is a direction worth exploring, even though it could introduce more complexity than it solves.