Efficiency and scale through work avoidance
# We build a somewhat large list of integers
integers = range(10000000)
That's 10,000,000 numbers. Not huge, but not small either. We need to be careful that we don't do anything too costly on these integers.
Testing for primality is costly. Lets do that!
import sympy # don't worry if you don't have this
from toolz import filter
primes = filter(sympy.isprime, integers)
Q: I thought testing primality was moderately expensive. Why did that return so quickly?
A: The toolz.filter
is lazy. No actual work was performed.
primes # not actually a list of primes
<itertools.ifilter at 0x12bb1950>
And yet we can still treat the variable primes
like a list of prime numbers. Lets print the first few
next(primes)
2
next(primes)
3
next(primes)
5
# Note that primes is not a list like integers. We can't index into it
primes[1000] # 1000th prime please
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-7-619b345268b1> in <module>() 1 # Note that primes is not a list like integers. We can't index into it 2 ----> 3 primes[1000] # 1000th prime please TypeError: 'itertools.ifilter' object has no attribute '__getitem__'
# But we *can* still iterate over it
for p in primes:
print p
if p > 100: # stop us from going through the whole list
break
7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101
Note that the first few primes 2, 3, 5, aren't included in this list. Items in lazy iterators are consumed as you use them, never to be seen again. If you really want to store the iterator, then call the list
constructor on it
primes = list(primes) # fully evaluate the lazy iterator, store it in a list
Laziness lets us talk about doing work without actually doing it. This separation from reality opens up new possibilities. For example, here are all of the fibonacci numbers
def fib():
""" A generator for all Fibonacci numbers """
a, b = 0, 1
while True:
yield a
a, b = b, a + b
fibs = fib() # make a lazy iterator, fibs
print next(fibs)
print next(fibs)
print next(fibs)
print next(fibs)
0 1 1 2
Toolz provides functions to interact with lazy iterators.
from toolz import first, second, nth, drop, take
first(fib())
0
second(fib())
1
nth(10, fib())
55
take(20, fib()) # Yet another iterator
<itertools.islice at 0x12c3cf18>
list(take(20, fib())) # But this one is finite, so we can expand it with list
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181]
later_fibs = drop(100, fib())
next(later_fibs)
354224848179261915075L
# These all depend on itertools.islice
from itertools import islice
list(islice(fib(), 10, 20, 1)) # fib()[10:20:1]
[55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181]
Python's file
object is a lazy iterator. When I open a large text file Python doesn't actually read in all of the text at once
book = open('data/tale-of-two-cities.txt')
Each element of the book is a line of text.
print 1, next(book)
print 2, next(book)
print 3, next(book)
print 4, next(book)
1 The Project Gutenberg EBook of A Tale of Two Cities, by Charles Dickens 2 3 This eBook is for the use of anyone anywhere at no cost and with 4 almost no restrictions whatsoever. You may copy it, give it away or
This is the Gutenberg project header. It is roughly 112 lines long. Lets start the book over and drop the header.
book = open('data/tale-of-two-cities.txt')
book = drop(112, book)
next(book)
'It was the best of times,\r\n'
That looks familiar. Lets strip the '\r\n'
from all of the lines though
from toolz import map # toolz.map is lazy too
book = map(str.strip, book)
next(book)
'it was the worst of times,'
Semantically, it is as if we applied str.strip
to every line in the book
In reality we've done no work.
Instead map
returned a new lazy iterator that draws an element from the original book
, applies str.strip
to it, and then yields the result. It only does this when we ask it to.
We can chain lazy operations on top of one another. After stripping each line we'll focus on those lines that contain the word "Mr." or "Miss" or "Mrs."
def good_line(line):
return "Mr." in line or "Miss" in line or "Mrs." in line
book = filter(good_line, book)
next(book)
'as at this. Mrs. Southcott had recently attained her five-and-twentieth'
# Lets take 10 lines from this iterator and display them
for line in take(10, book):
print 1, line
1 "Mr. Jarvis Lorry." 1 "Yes, Mr. Lorry." 1 "I know this messenger, guard," said Mr. Lorry, getting down into the 1 like a larger dog-kennel. Mr. Lorry, the passenger, shaking himself out 1 Mr. Lorry dropped off to sleep. The arrival of his breakfast roused him, 1 time to-day. She may ask for Mr. Jarvis Lorry, or she may only ask for a 1 When Mr. Lorry had finished his breakfast, he went out for a stroll on 1 again charged with mist and vapour, Mr. Lorry's thoughts seemed to cloud 1 Mr. Lorry had been idle a long time, and had just poured out his last 1 In a very few minutes the waiter came in to announce that Miss Manette
There are two main reasons for laziness
In each case we stay close to traditional Python syntax. We often write lazy code without realizing it.