To Infinity
In the programming language Scheme the concept of infinite lists can be implemented using streams. A stream is really nothing more than a cons cell that contains a value in the cdr field and a pointer to some code to generate the cdr field. While simple in design, the concept is powerful, allowing the programmer to create infinite sequences by simply defining a function to generate the next value in the sequence.
Python supports a similar concept in the form of generator functions. The implementation here is a little more complex, but it can be thought of as a function that returns a result, but maintains its current state for later execution.
Both can be used to write rather elegant code. Take for instance the Sieve of Eratosthenes. This can be implemented in Scheme like so (borrowed from SICP):
(define (integers-starting-from n) (cons-stream n (integers-starting-from (+ n 1)))) (define (divisible? x y) (= (remainder x y) 0)) (define (sieve stream) (cons-stream (stream-car stream) (sieve (stream-filter (lambda (x) (not (divisible? x (stream-car stream)))) (stream-cdr stream))))) (define primes (sieve (integers-starting-from 2)))
In this code, a new stream is created to filter out the unwanted multiples from the previous stream.
Similarly, it can be expressed in Python using generators:
def integers(n): while 1: yield n n = n + 1 def filter_mul(n, s): for i in s: if (i % n): yield i def sieve(stream): while 1: head = stream.next() yield head stream = filter_mul(head, stream)
The expression is elegant and theoretically allows us to represent an infinite list that is evaluated lazily. However, the reality strays from the theory. While this code works just fine to find small primes, it breaks for larger values:
;Loading "sieve.scm"... done 1 ]=> (stream-ref primes 1000) ;Aborting!: out of memory
>>> from sieve import * >>> primes = sieve(integers(2)) >>> [primes.next() for x in range(1000)] RuntimeError: maximum recursion depth exceeded
In both cases, we’ve exhausted the language implementation’s ability to hold the data structures necessary to represent the lazy lists. The result is elegant code that fail rather quickly. Now, we could always buy ourselves a little more execution time by increasing the heap for Scheme and the recursion depth for Python, but that will only take us so far.
So, that leaves me wondering: is there a similarly elegant way to write code of this sort that won’t expire so quickly? To clarify, I’d like to keep using streams and generators, but find some way to limit the overhead incurred from the creation of so many filters (1 for each prime!) without resorting to a different algorithm.
Update:
I want to address the alternative primes calculation listed in SICP:
(define primes (cons-stream 2 (stream-filter prime? (integers-starting-from 3)))) (define (prime? n) (define (iter ps) (cond ((> (square (stream-car ps)) n) true) ((divisible? n (stream-car ps)) false) (else (iter (stream-cdr ps))))) (iter primes))
This procedure is still recursive (primes uses prime? which uses primes), but has much less memory overhead — I managed to generate primes much larger than the 1000th without hitting memory limits. However, this solution strikes me as not quite the same in spirit as the first. Here, a given integer is tested by comparing it to the current list of generated primes while the first definition is building up a function to generate the next prime without having to explicitly scan the list of the already generated primes.
Hey Doug,
It’s cool to see this stuff in Python. But just to be a spoiler, here’s my version. This is the kind of thing that Haskell excels at:
FILE:sieve.hs
divides x y = (y `rem` x) == 0
sieve (x:xs) = x:(sieve (filter (not . (x `divides`)) xs))
primes = sieve [2..]
main = print (take 10000 primes)
EOF
Interpreted:
$ ghci sieve.hs
*Main> take 1000 primes
[2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53 …
Compiled (note 10,000 to get some good run time):
$ ghc –make sieve
$ time ./sieve >/dev/null
real 0m13.462s
user 0m13.405s
sys 0m0.016s
I’m sure there are many other impressive ways to do this in Haskell as well. This was just what I could come up with quickly based on the SICP version posted here.
Thanks, Tim.
This Haskell definition is very concise and elegant.
I’m planning to follow up this post with an implementation in Stackless Python. I haven’t used it before, but from what I understand it might provide the language implementation necessary to support this sort of implementation without incurring too much overhead.
Here’s an implementation in C# 3.0:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
namespace Primes
{
static class Program
{
static void Main(string[] args)
{
var primes = Primes.TakeWhile((_, k) => k < 1000);
foreach (var p in primes)
{
Console.WriteLine(p);
}
}
static IEnumerable Primes
{
get
{
return RSieve(NaturalsFrom2);
}
}
static IEnumerable RSieve(IEnumerable stream)
{
while (true)
{
yield return stream.First();
stream = Sieve(stream);
}
}
static IEnumerable Sieve(IEnumerable stream)
{
var head = stream.First();
var tail = stream.Skip(1);
return from n in tail where n % head != 0 select n;
}
static IEnumerable NaturalsFrom2
{
get
{
int n = 2;
while (true) yield return n++;
}
}
}
}
I also wrote this version in Scheme which looks really clever since it finds primes without checking for divisibility. Unfortunately it exhausts available memory very quickly.
; Returns a stream containing the set difference between two monotonically-increasing streams of integers,
; i.e. all elements of a that don’t appear in b.
(define (except a b)
(if (=
(head a)
(head b))
(except
(tail a)
(tail b))
(cons-stream
(head a)
(except
(tail a)
b))))
(define (sieve stream)
(except
stream
(stream-map
(lambda (x) (* x (head stream)))
stream)))
(define (rsieve stream)
(cons-stream
(head stream)
(rsieve (tail (sieve stream)))))
(define naturals-from-2 (cons-stream 2 (stream-map 1+ naturals-from-2)))
(define primes (rsieve naturals-from-2))
Hi Ron, thanks for contributing.
I’m curious, how does the C# code compare in terms of memory consumption during execution? Does it pass the “generate the 1000th prime” test? I’d run it myself, but I don’t have a C# stack up and running on my machine.