operator++

The Generator Pattern: Lazy Lists in C/C++

Among the many pseudo-magical features of Haskell is something called lazy evaluation, which means that the values of variables aren’t computed until they’re actually needed.

Humour aside, though, this can be quite useful for data processing. It allows a programmer to define theoretically infinite arrays, then take only as many elements as he needs, or even stream the results to an output device (which can be modelled as an infinite buffer). The canonical example of a lazily-evaluated list in Haskell is this ‘infinite’ list of the Fibonacci numbers:

fibs :: [Integer]
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)

The first line, for those unfamiliar with Haskell, declares fibs to be a list of integers. The second line is its definition. The : operator, called ‘cons’, joins an element to the beginning of a (singly-linked) list. zipWith is a function that takes a binary function and two lists, evaluates the function on each pair of corresponding elements, and returns a list of the results. Finally, the tail function returns everything but the first element of a list.

If this seems difficult to understand, the best way to wrap your head around it is to observe the construction of this list in action. We start with just two elements:

fibs:       1 1 ...
tail fibs:  1 ...
zipped:

Now do the first iteration of zipWith:

fibs:       1 1 ...
tail fibs:  1 ...
zipped:     2

However, although we are imagining these three lists as separate, they’re actually all part of the same list! So when elements are added to zipped, they’re added to fibs (and tail fibs) at the same time. The last operation actually did this:

fibs:       1 1 2 ...
tail fibs:  1 2 ...
zipped:     2

Repeat the operation ad infinitum, and you have an ‘infinite’ list of Fibonacci numbers:

fibs:       1 1 2 3  5  8 13 ...
tail fibs:  1 2 3 5  8 13 ...
zipped:     2 3 5 8 13 ...

There are lots of real-world problems with this same fundamental structure, where such features could be useful. The reason Haskell is so rarely used in production is that the kinds of programmers who work on such problems don’t like magic—and really don’t like linked lists. Haskell is a compiled language; you can look at the assembly the Haskell compiler generates from this code. I can’t make heads or tails of it, but I do know it’s not an optimal implementation of a Fibonacci generator, because I create something much closer to that later on in this post.

We’ll return to a (probably easier, and definitely more efficient) implementation of this concept in C. First, however, let’s take a look at some more conventional C implementations of the Fibonacci sequence.

unsigned fib (unsigned n)
{
    if (n < 2) return 1;
    else return fib(n - 1) + fib(n - 2);
}

This is the canonical recursive algorithm, in fact the mathematical definition of the series, often used to teach the concepts of recursion to introductory computer science students. The problem is that it’s slow. Specifically, it takes exponential time to compute, because every iteration has to make two recursive calls. Almost all of this computation is redundant, but the compiler is not (yet) smart enough to recognise that fact:

fib: # @fib
  pushq %rbp
  pushq %rbx
  pushq %rax
  movl $1, %ebp
  cmpl $2, %edi
  jb .LBB0_3
  movl %edi, %ebx
  movl $1, %ebp
.LBB0_2: # =>This Inner Loop Header: Depth=1
  leal -1(%rbx), %edi
  callq fib
  addl $-2, %ebx
  addl %eax, %ebp
  cmpl $1, %ebx
  ja .LBB0_2
.LBB0_3:
  movl %ebp, %eax
  addq $8, %rsp
  popq %rbx
  popq %rbp
  retq

This code will probably cause a stack overflow for any significant values of n, and will certainly take far longer than it needs to. The trick to optimising it is realising that fib(n - 2) has already been computed by the call to fib(n - 1), which had to call fib((n - 1) - 1). If you can cache this value, you can get an algorithm that runs in linear time, albeit a much messier one:

unsigned fib_linear (unsigned n)
{
    unsigned a = 1;
    unsigned b = 0;
    while (n--)
    {
        unsigned temp = a;
        a += b;
        b = temp;
    }
    return a;
}

No longer can you see a simple definition of the series just from looking at the code. It’s also almost four times longer, in terms of the number of statements. But the assembly looks much better:

fib_linear: # @fib_linear
  testl %edi, %edi
  je .LBB1_1
  movl $1, %ecx
  xorl %edx, %edx
.LBB1_3: # =>This Inner Loop Header: Depth=1
  movl %edx, %eax
  addl %ecx, %eax
  movl %ecx, %edx
  movl %eax, %ecx
  decl %edi
  jne .LBB1_3
  retq
.LBB1_1:
  movl $1, %eax
  retq

No recursive calls, no stack overflows, no exponential time. But why did we have to break our clean, concise recursive function to get this? Haskell provides the best of both worlds, despite the overhead.

If we’re trying to generate all Fibonnaci numbers from 1 to fib(n), as we were in the Haskell program, we can actually do something in C which is even more concise and readable than the Haskell program. The key pattern we need is something I’ve termed a generator. A generator is a pure (stateless) function which takes a pointer to an item in an array (as well as, potentially, other parameters), and writes a value to that address.

A generator for the Fibonacci numbers would look like this:

void fib_gen (unsigned *a)
{
    a[0] = a[-1] + a[-2];
}

(We can only write to a[0]. If we were to write to, say a[-1], it would be legal C, but would no longer be functionally pure and thus break the pattern.)

The assembly looks like this; each iteration ought to run in constant time, making the entire algorithm linear:

fib_gen: # @fib_gen
  movl -8(%rdi), %eax
  addl -4(%rdi), %eax
  movl %eax, (%rdi)
  retq

Unfortunately, the generator cannot encode the base case; we must do that at the call site, which might look something like this:

void fib_gen_test ()
{
    unsigned fibs[100] = {1, 1};
    for (int *it = fibs + 2; it < fibs + 100; it++)
        fib_gen(it);
}

The syntax is a bit unwieldy compared with Haskell’s, but that’s the nature of C; a C++ template function with a lambda could clean it up quite nicely, so we could do something like this:

unsigned fibs[100];
generate(fibs, 100, {1, 1}, [](unsigned *a){ a[0] = a[-1] + a[-2]; });

(The implementation of generate is left as an exercise for the reader.)

More importantly, however, this is done with near-zero overhead—certainly no linked lists. Fibonacci numbers are a fairly trivial example, but in theory, any task for which Haskell’s lazy lists, JavaScript’s or Python’s yield, or Rust’s iterators are a good fit can be done this way with less overhead. For example, here’s a (unsafe, not production-worthy) generator for prime numbers using the Sieve of Eratosthenes:

// Requires initial input {1, 2} (yes, I know 1 isn't a prime number,
// but it might as well be).
void primes_gen (unsigned *a)
{
    for (unsigned trial = a[-1] + 1; trial <= UINT_MAX; trial++)
    {
        bool success = true;
        // Count backwards through the list of already tested primes,
        // and test if the trial number is  divisible by any of the
        // primes less than trial.
        for (int i = -2; a[i] > 1; i--);
        {
            if (trial % a[i] == 0)
            {
                // Is composite, try again.
                success = false;
                break;
            }
        }
        if (success)
        {
            a[0] = trial;
            return;
        }
    }
}

This, in fact, cannot be done with any iterator implementation (JavaScript, Python, Rust, or any other) that I’ve ever heard of. I believe Haskell lazy lists can do it, albeit with more complexity (Haskell, being a pure functional language, cannot iterate well).

Functional programming, and in particular functional array programming, is a powerful tool for data processing (and all programming is fundamentally data processing), but the languages that work well for high-efficiency computing aren’t designed for such abstractions. Patterns like the generator can help C and C++ programmers get the best of functional programming without sacrificing the efficiency and control that those languages provide.

Godbolt link for all C code

Reddit discussion


comments powered by Disqus