Get the presentation
On Github IO: http://dstuebe.github.io/generators/
On NbViewer: http://nbviewer.ipython.org/github/dstuebe/generators/blob/gh-pages/presentation.ipynb
March 2, 2015
Copyright(C) 2015, David Stuebe
A kind of function that can return an intermediate result ("the next
value") to its caller, but maintaining the function's local state so
that the function can be resumed again right where it left off.
PEP 255 introduced the generator object and the yield statement in version 2.2 of Python.
def fib():
a, b = 0, 1
while True:
yield b
a, b = b, a+b
function_result = fib()
print function_result
<generator object fib at 0x112da18c0>
The result of calling the gererator function is a generator object. The generator object can be used as an iterator.
function_result.__iter__() is function_result
True
zip(xrange(10),function_result)
[(0, 1), (1, 1), (2, 2), (3, 3), (4, 5), (5, 8), (6, 13), (7, 21), (8, 34), (9, 55)]
(10, function_result.next())
(10, 89)
Inside the generator we can see that execution is paused after the yield and state is maintained between calls to next
def noisy_generator():
print ' Initializing'
print ' first yield'
yield "A"
print ' generator running...'
print ' second yield'
yield "B"
print ' generator running...'
print ' now what?'
g = noisy_generator()
print 'Calling next...'
print 'Result 1: %s' % g.next()
print 'Result 2: %s' % g.next()
print 'Result 3: %s' % g.next()
Calling next... Initializing first yield Result 1: A generator running... second yield Result 2: B generator running... now what?
--------------------------------------------------------------------------- StopIteration Traceback (most recent call last) <ipython-input-118-a4dd887bc460> in <module>() 3 print 'Result 1: %s' % g.next() 4 print 'Result 2: %s' % g.next() ----> 5 print 'Result 3: %s' % g.next() StopIteration:
help(g)
Help on generator object: foo = class generator(object) | Methods defined here: | | __getattribute__(...) | x.__getattribute__('name') <==> x.name | | __iter__(...) | x.__iter__() <==> iter(x) | | __repr__(...) | x.__repr__() <==> repr(x) | | close(...) | close() -> raise GeneratorExit inside generator. | | next(...) | x.next() -> the next value, or raise StopIteration | | send(...) | send(arg) -> send 'arg' into generator, | return next yielded value or raise StopIteration. | | throw(...) | throw(typ[,val[,tb]]) -> raise exception in generator, | return next yielded value or raise StopIteration. | | ---------------------------------------------------------------------- | Data descriptors defined here: | | gi_code | | gi_frame | | gi_running
help(g.throw)
Help on built-in function throw: throw(...) throw(typ[,val[,tb]]) -> raise exception in generator, return next yielded value or raise StopIteration.
Looks fun - lets try it...
def accepting_generator():
print 'Initializing'
try:
yield 'Hello'
print 'generator running...'
except StandardError as e:
print "Error Message: %s" % e
yield 'World'
g = accepting_generator()
g.next()
Initializing
'Hello'
g.throw(StandardError, 'Foo bar baz')
Error Message: Foo bar baz
'World'
g.next()
--------------------------------------------------------------------------- StopIteration Traceback (most recent call last) <ipython-input-185-d7e53364a9a7> in <module>() ----> 1 g.next() StopIteration:
help(g.close)
Help on built-in function close: close(...) close() -> raise GeneratorExit inside generator.
def closeable():
try:
yield 1
yield 2
except GeneratorExit:
print 'closing'
g = closeable()
g.next()
1
g.close()
closing
g.next()
--------------------------------------------------------------------------- StopIteration Traceback (most recent call last) <ipython-input-450-d7e53364a9a7> in <module>() ----> 1 g.next() StopIteration:
help(g.send)
Help on built-in function send: send(...) send(arg) -> send 'arg' into generator, return next yielded value or raise StopIteration.
g = fib()
g.send('foo')
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-452-73713f4894f7> in <module>() 1 g = fib() ----> 2 g.send('foo') TypeError: can't send non-None value to a just-started generator
so next() is really just send(None)
g = fib()
g.send(None)
1
g.send(None)
1
What is send('foo') ?
g.send('foo')
2
Where did it go?
Coroutines are a natural way of expressing many algorithms, such as
simulations, games, asynchronous I/O, and other forms of event-
driven programming or co-operative multitasking. Python's generator
functions are almost coroutines -- but not quite -- in that they
allow pausing execution to produce a value, but do not provide for
values or exceptions to be passed in when execution resumes. They
also do not allow execution to be paused within the "try" portion of
try/finally blocks, and therefore make it difficult for an aborted
coroutine to clean up after itself.
PEP 342 added close, send and throw to the generator in version 2.5 of python and made yield an expresion rather than a statement.
def adder(val):
x = 0
while True:
x = yield val+x
g = adder(5)
g.send(None)
5
g.send(2)
7
g.send(6)
11
def noisy_coroutine():
print ' Initializing'
print ' first yield'
received = yield "A"
print ' generator running after receiving: %s' % received
print ' second yield'
received = yield "B"
print ' generator running after receiving: %s' % received
print ' now what?'
g = noisy_coroutine()
print 'Calling send...'
g.send(None)
Calling send... Initializing first yield
'A'
g.send(1)
generator running after receiving: 1 second yield
'B'
g.send(2)
generator running after receiving: 2 now what?
--------------------------------------------------------------------------- StopIteration Traceback (most recent call last) <ipython-input-149-7a89fd35d27c> in <module>() ----> 1 g.send(2) StopIteration:
def foo():
x = yield
y = yield
z = yield
yield x, y, z
g = foo()
g.next()
g.send(1)
g.send(2)
print g.send(3)
(1, 2, 3)
def consumer(func):
def wrapper(*args,**kw):
gen = func(*args, **kw)
gen.next()
return gen
wrapper.__name__ = func.__name__
wrapper.__doc__ = func.__doc__
return wrapper
g = consumer(adder)(4)
print g.send(11)
print g.send(2)
15 6
@consumer
def subtractor(val):
'''A generator that subtracts numbers from a value'''
x = 0
while True:
x = yield x - val
g = subtractor(8)
g.send(15)
7
help(subtractor)
Help on function subtractor in module __main__: subtractor(*args, **kw) A generator that subtracts numbers from a value
So far we have looked at simple examples but we can use generators for:
# From http://www.dabeaz.com/coroutines/pipeline.py
def grep(pattern,lines):
for line in lines:
if pattern in line:
yield line
import time
def follow(thefile):
thefile.seek(0,2) # Go to the end of the file
while True:
line = thefile.readline()
if not line:
time.sleep(0.1) # Sleep briefly
continue
yield line
# Set up a processing pipe : tail -f | grep python
with open("access-log") as logfile:
loglines = follow(logfile)
pylines = grep("python",loglines)
# Pull results out of the processing pipeline
for line in pylines:
print line,
217.237.150.206 - - [02/Apr/2015:00:47:42 -0600] "GET /python/tutorial/beazley_advanced_python/Slides/SLIDE060.HTM HTTP/1.0" 200 1686 129.106.32.126 - - [02/Apr/2015:00:47:43 -0600] "GET /python/tutorial/beazley_advanced_python/Slides/SLIDE006.HTM HTTP/1.0" 200 1254 66.91.239.214 - - [02/Apr/2015:00:47:44 -0600] "GET /python/tutorial/beazley_advanced_python/Slides/SLIDE014.HTM HTTP/1.0" 200 1232 217.219.18.80 - - [02/Apr/2015:00:47:46 -0600] "GET /python/tutorial/beazley_advanced_python/Slides/SLIDE090.HTM HTTP/1.0" 200 2001
--------------------------------------------------------------------------- KeyboardInterrupt Traceback (most recent call last) <ipython-input-418-8786720a03f2> in <module>() 21 22 # Pull results out of the processing pipeline ---> 23 for line in pylines: 24 print line, <ipython-input-418-8786720a03f2> in grep(pattern, lines) 1 # From http://www.dabeaz.com/coroutines/pipeline.py 2 def grep(pattern,lines): ----> 3 for line in lines: 4 if pattern in line: 5 yield line <ipython-input-418-8786720a03f2> in follow(thefile) 11 line = thefile.readline() 12 if not line: ---> 13 time.sleep(0.1) # Sleep briefly 14 continue 15 yield line KeyboardInterrupt:
import time
def follow(thefile, target):
thefile.seek(0,2) # Go to the end of the file
while True:
line = thefile.readline()
if not line:
time.sleep(0.1) # Sleep briefly
continue
target.send(line)
@consumer
def grep(pattern, target):
print "Looking for %s" % pattern
while True:
line = (yield)
if pattern in line:
target.send(line),
@consumer
def printer():
while True:
line = (yield)
print line,
with open("access-log") as logfile:
follow(logfile, grep('python',printer() ) )
Looking for python 66.249.65.37 - - [02/Apr/2015:00:51:57 -0600] "GET /papers/python.html HTTP/1.1" 404 133 128.135.125.245 - - [02/Apr/2015:00:51:58 -0600] "GET /python/tutorial/beazley_intro_python/intropy.pdf HTTP/1.0" 304 - 194.105.57.11 - - [02/Apr/2015:00:52:01 -0600] "GET /python/tutorial/beazley_advanced_python/Slides/SLIDE002.HTM HTTP/1.0" 200 1352 189.141.19.88 - - [02/Apr/2015:00:52:03 -0600] "GET /python/tutorial/beazley_advanced_python/Slides/SLIDE096.HTM HTTP/1.0" 200 1671 123.190.193.8 - - [02/Apr/2015:00:52:03 -0600] "GET /python/tutorial/beazley_advanced_python/Slides/SLIDE059.HTM HTTP/1.0" 200 1694
--------------------------------------------------------------------------- KeyboardInterrupt Traceback (most recent call last) <ipython-input-423-0ade070e4e58> in <module>() 24 25 with open("access-log") as logfile: ---> 26 follow(logfile,grep('python',printer() ) ) <ipython-input-423-0ade070e4e58> in follow(thefile, target) 5 line = thefile.readline() 6 if not line: ----> 7 time.sleep(0.1) # Sleep briefly 8 continue 9 target.send(line) KeyboardInterrupt:
# An object
class GrepHandler(object):
def __init__(self,pattern, target):
self.pattern = pattern
self.target = target
def send(self,line):
if self.pattern in line:
self.target.send(line)
# A coroutine
@consumer
def grep(pattern,target):
while True:
line = (yield)
if pattern in line:
target.send(line)
# A null-sink to send data
@consumer
def null():
while True: item = (yield)
# A benchmark
line = 'python is nice'
p1 = grep('python',null()) # Coroutine
p2 = GrepHandler('python',null()) # Object
from timeit import timeit
print "coroutine:", timeit("p1.send(line)", "from __main__ import line, p1")
print "object:", timeit("p2.send(line)", "from __main__ import line, p2")
coroutine: 0.326890945435 object: 0.381680011749
%timeit p1.send(line)
The slowest run took 9.04 times longer than the fastest. This could mean that an intermediate result is being cached 1000000 loops, best of 3: 317 ns per loop
Beware - ipython magic %timeit does not play nice with generator send calls!
The idea for this benchmark comes from a stack overflow question a-faster-nested-tuple-to-list-and-back
I'm trying to perform tuple to list and list to tuple conversions on nested sequences of unknown depth and shape. The calls are being made hundreds of thousands of times, which is why I'm trying to squeeze out as much speed as possible.
First, lets define a function to make some test data...
def make_test(m, n):
return [[range(m), make_test(m, n-1)] for i in range(n)]
make_test(2,3)
[[[0, 1], [[[0, 1], [[[0, 1], []]]], [[0, 1], [[[0, 1], []]]]]], [[0, 1], [[[0, 1], [[[0, 1], []]]], [[0, 1], [[[0, 1], []]]]]], [[0, 1], [[[0, 1], [[[0, 1], []]]], [[0, 1], [[[0, 1], []]]]]]]
def list2tuple(a):
return tuple((list2tuple(x) if isinstance(x, list) else x for x in a))
def tuple2list(a):
return list((tuple2list(x) if isinstance(x, tuple) else x for x in a))
list2tuple(make_test(2,3))
(((0, 1), (((0, 1), (((0, 1), ()),)), ((0, 1), (((0, 1), ()),)))), ((0, 1), (((0, 1), (((0, 1), ()),)), ((0, 1), (((0, 1), ()),)))), ((0, 1), (((0, 1), (((0, 1), ()),)), ((0, 1), (((0, 1), ()),)))))
make_test(2,3) == tuple2list(list2tuple(make_test(2,3)))
True
@consumer
def colist2tuple():
"""Convertes lists to tuples"""
result = []
while True:
lst, l2t = yield result
result = tuple((l2t.send((x, l2t)) if isinstance(x,list) else x for x in lst) )
l2t = colist2tuple()
l2t.send((make_test(2,3),l2t))
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-156-3030118d1c72> in <module>() 8 9 l2t = colist2tuple() ---> 10 l2t.send((make_test(2,3),l2t)) <ipython-input-156-3030118d1c72> in colist2tuple() 5 while True: 6 lst, l2t = yield result ----> 7 result = tuple((l2t.send((x, l2t)) if isinstance(x,list) else x for x in lst) ) 8 9 l2t = colist2tuple() <ipython-input-156-3030118d1c72> in <genexpr>((x,)) 5 while True: 6 lst, l2t = yield result ----> 7 result = tuple((l2t.send((x, l2t)) if isinstance(x,list) else x for x in lst) ) 8 9 l2t = colist2tuple() ValueError: generator already executing
Generators are not recursive...
@consumer
def colist2tuple():
"""Convertes lists to tuples"""
result = []
while True:
(lst, pool) = yield result
result = tuple((pool[0].send((x, pool[1:])) if isinstance(x,list) else x for x in lst) )
@consumer
def cotuple2list():
"""converts tuples to lists"""
result = None
while True:
(tup, pool) = (yield result)
result = list((pool[0].send((x, pool[1:])) if isinstance(x, tuple) else x for x in tup))
class GenPool:
def __init__(self, gen_func, depth):
self.pool = [gen_func() for i in xrange(depth) ]
def send(self,iterable):
return self.pool[0].send((iterable, self.pool[1:]))
l2t_pool = GenPool(colist2tuple,5)
t2l_pool = GenPool(cotuple2list,5)
make_test(3,2) == t2l_pool.send(l2t_pool.send(make_test(3,2)))
True
def inplace_list2tuple(a):
for (i,x) in enumerate(a):
if isinstance(x,list):
a[i] = inplace_list2tuple(x)
return tuple(a)
def inplace_tuple2list(a):
a = list(a) # can't really modify a tuple in place...
for (i,x) in enumerate(a):
if isinstance(x,tuple):
a[i] = inplace_tuple2list(x)
return a
make_test(2,3) == inplace_tuple2list(inplace_list2tuple(make_test(2,3)))
True
@consumer
def inplace_colist2tuple():
"""Convertes lists to tuples"""
result = None
while True:
(lst, co_pool) = (yield result)
for (i,x) in enumerate(lst):
if isinstance(x,list):
lst[i] = co_pool[0].send((x, co_pool[1:]))
result = tuple(lst)
@consumer
def inplace_cotuple2list():
"""converts tuples to lists"""
result = None
while True:
(tup, co_pool) = (yield result)
result = list(tup)
for (i,x) in enumerate(result):
if isinstance(x,tuple):
result[i] = co_pool[0].send((x, co_pool[1:]))
l2t_pool = GenPool(inplace_colist2tuple,5)
t2l_pool = GenPool(inplace_cotuple2list,5)
make_test(3,2) == t2l_pool.send(l2t_pool.send(make_test(3,2)))
True
import timeit
breadth,depth = 25,8
number,repeat = 2,3
l2t_pool = GenPool(colist2tuple,breadth)
t2l_pool = GenPool(cotuple2list,breadth)
inplace_l2t_pool = GenPool(inplace_colist2tuple,breadth)
inplace_t2l_pool = GenPool(inplace_cotuple2list,breadth)
# Compare round trip operations
print "Generator %s" % ["%0.5f" % (v/number) for v in timeit.repeat('t2l_pool.send(l2t_pool.send(test_data))', setup='from __main__ import t2l_pool, l2t_pool, make_test, depth, breadth; test_data = make_test(breadth, depth)', number=number, repeat=repeat)]
print "Recursive %s" % ["%0.5f" % (v/number) for v in timeit.repeat('tuple2list(list2tuple(test_data))', setup='from __main__ import tuple2list, list2tuple, make_test, depth, breadth; test_data = make_test(breadth, depth)', number=number, repeat=repeat)]
Generator ['3.18855', '3.21685', '2.94067'] Recursive ['2.46753', '2.82640', '3.27728']
# Compare round trip operations using inplace for list to tuple
print "Generator %s" % ["%0.5f" % (v/number) for v in timeit.repeat('inplace_t2l_pool.send(inplace_l2t_pool.send(test_data))', setup='from __main__ import inplace_l2t_pool, inplace_t2l_pool, make_test, depth, breadth; test_data = make_test(breadth, depth)', number=number, repeat=repeat)]
print "Recursive %s" % ["%0.5f" % (v/number) for v in timeit.repeat('inplace_tuple2list(inplace_list2tuple(test_data))', setup='from __main__ import inplace_tuple2list, inplace_list2tuple, make_test, depth, breadth; test_data = make_test(breadth, depth)', number=number, repeat=repeat)]
Generator ['1.54234', '1.54228', '1.51895'] Recursive ['1.38003', '1.37990', '1.37279']
# Compare just list2tuple using inplace
print "Generator %s" % ["%0.5f" % (v/number) for v in timeit.repeat('inplace_l2t_pool.send(test_data)', setup='from __main__ import inplace_l2t_pool, make_test, depth, breadth; test_data = make_test(breadth, depth)', number=number, repeat=repeat)]
print "Recursive %s" % ["%0.5f" % (v/number) for v in timeit.repeat('inplace_list2tuple(test_data)', setup='from __main__ import inplace_list2tuple, make_test, depth, breadth; test_data = make_test(breadth, depth)', number=number, repeat=repeat)]
Generator ['0.51324', '0.51948', '0.50737'] Recursive ['0.44689', '0.45245', '0.44493']
Generators are not magic - a function call is still a function call and the gil still serializes the process.
test_data = make_test(breadth,depth)
# Courtine
%timeit t2l_pool.send(l2t_pool.send(test_data))
# Recursive
%timeit tuple2list(list2tuple(test_data))
1 loops, best of 3: 305 ms per loop 1 loops, best of 3: 271 ms per loop
inplace_l2t_pool = GenPool(inplace_colist2tuple,breadth)
inplace_t2l_pool = GenPool(inplace_cotuple2list,breadth)
test_data = make_test(breadth,depth)
%timeit inplace_t2l_pool.send(inplace_l2t_pool.send(test_data))
test_data = make_test(breadth,depth)
%timeit inplace_tuple2list(inplace_list2tuple(test_data))
100 loops, best of 3: 15.3 ms per loop 100 loops, best of 3: 13.5 ms per loop
# Generator
test_data = make_test(breadth,depth)
%timeit inplace_l2t_pool.send(test_data)
# Recursive
test_data = make_test(breadth,depth)
%timeit inplace_list2tuple(test_data)
The slowest run took 51639.38 times longer than the fastest. This could mean that an intermediate result is being cached 100000 loops, best of 3: 2.63 µs per loop The slowest run took 56308.31 times longer than the fastest. This could mean that an intermediate result is being cached 100000 loops, best of 3: 2.17 µs per loop
import cython
cython_list2tuple, cython_tuple2list = cython.inline(
"""
@cython.profile(True)
def cython_list2tuple(a):
return tuple([cython_list2tuple(x) if type(x)==list else x for x in a])
@cython.profile(True)
def cython_tuple2list(a):
return [cython_tuple2list(x) if type(x)==tuple else x for x in a]
"""
).values() # it returns a dict of named functions
make_test(3,2) == cython_tuple2list(cython_list2tuple(make_test(3,2)))
True
print "Cython: %s" % ["%0.5f" % (v/number) for v in timeit.repeat('cython_tuple2list(cython_list2tuple(t))', setup='from __main__ import cython_list2tuple, cython_tuple2list, make_test, depth, breadth; t = make_test(breadth, depth)', number=number, repeat=repeat)]
Cython: ['0.40505', '0.38909', '0.40008']
As expected, cython is blazing fast
import cProfile
test_data = make_test(breadth, depth)
cProfile.run('tuple2list(list2tuple(test_data));')
13590406 function calls (6137622 primitive calls) in 5.433 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 328801/1 0.942 0.000 2.520 2.520 <ipython-input-154-19d1758e8e98>:1(list2tuple) 3397601/9 1.042 0.000 2.520 0.280 <ipython-input-154-19d1758e8e98>:2(<genexpr>) 328801/1 1.159 0.000 2.853 2.853 <ipython-input-154-19d1758e8e98>:4(tuple2list) 3397601/9 1.121 0.000 2.853 0.317 <ipython-input-154-19d1758e8e98>:5(<genexpr>) 1 0.060 0.060 5.433 5.433 <string>:1(<module>) 6137600 1.109 0.000 1.109 0.000 {isinstance} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
test_data = make_test(breadth, depth)
cProfile.run('inplace_tuple2list(inplace_list2tuple(test_data));')
6795204 function calls (6137604 primitive calls) in 3.892 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 328801/1 1.019 0.000 1.614 1.614 <ipython-input-158-935485dfecde>:1(inplace_list2tuple) 328801/1 1.684 0.000 2.228 2.228 <ipython-input-158-935485dfecde>:7(inplace_tuple2list) 1 0.050 0.050 3.892 3.892 <string>:1(<module>) 6137600 1.140 0.000 1.140 0.000 {isinstance} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
test_data = make_test(breadth, depth)
cProfile.run('cython_tuple2list(cython_list2tuple(test_data));')
657604 function calls (4 primitive calls) in 1.422 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.053 0.053 1.422 1.422 <string>:1(<module>) 328801/1 0.689 0.000 0.689 0.689 _cython_inline_2b6dbefcfcbecd965f1c99dd441375e0.pyx:12(cython_tuple2list) 328801/1 0.680 0.000 0.680 0.680 _cython_inline_2b6dbefcfcbecd965f1c99dd441375e0.pyx:8(cython_list2tuple) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
test_data = make_test(breadth, depth)
cProfile.run('t2l_pool.send(l2t_pool.send(test_data))')
14248010 function calls (6137626 primitive calls) in 5.771 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 328801/1 0.804 0.000 2.583 2.583 <ipython-input-157-d591fe3bcabe>:1(colist2tuple) 3397601/9 1.090 0.000 3.188 0.354 <ipython-input-157-d591fe3bcabe>:15(<genexpr>) 2 0.000 0.000 5.771 2.886 <ipython-input-157-d591fe3bcabe>:20(send) 3397601/9 1.147 0.000 2.583 0.287 <ipython-input-157-d591fe3bcabe>:7(<genexpr>) 328801/1 1.505 0.000 3.188 3.188 <ipython-input-157-d591fe3bcabe>:9(cotuple2list) 1 0.000 0.000 5.771 5.771 <string>:1(<module>) 6137600 1.098 0.000 1.098 0.000 {isinstance} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 657602/2 0.128 0.000 5.771 2.886 {method 'send' of 'generator' objects}
test_data = make_test(breadth, depth)
cProfile.run('inplace_t2l_pool.send(inplace_l2t_pool.send(test_data))')
7452808 function calls (6137608 primitive calls) in 3.864 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 2 0.000 0.000 3.864 1.932 <ipython-input-157-d591fe3bcabe>:20(send) 328801/1 1.060 0.000 1.665 1.665 <ipython-input-166-ce8d713e0c92>:1(inplace_colist2tuple) 328801/1 1.594 0.000 2.199 2.199 <ipython-input-166-ce8d713e0c92>:12(inplace_cotuple2list) 1 0.000 0.000 3.864 3.864 <string>:1(<module>) 6137600 1.102 0.000 1.102 0.000 {isinstance} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 657602/2 0.108 0.000 3.864 1.932 {method 'send' of 'generator' objects}
import numpy
import itertools
# make some data
n = 1000000
a = numpy.random.randn(n)
b = numpy.random.randn(n)
number,repeat = 2,3
print "Numpy Dot Product"
print "Value %s" % numpy.dot(a,b)
print "Time %s" % ["%0.5f" % (v/number) for v in timeit.repeat('numpy.dot(a,b)', setup='from __main__ import numpy, a, b', number=number, repeat=repeat)]
Numpy Dot Product Value 731.284297837 Time ['0.00087', '0.00082', '0.00081']
def naive_loop(it):
result = numpy.float64(0.0)
for (a_val,b_val) in it:
result += a_val*b_val
return result
print "Naive Dot Product"
print "Value: %s" % naive_loop(numpy.nditer([a, b]))
print "Time: %s" % ["%0.5f" % (v/number) for v in timeit.repeat('naive_loop(it)', setup='from __main__ import numpy, naive_loop, a, b; it = numpy.nditer([a, b])', number=number, repeat=repeat)]
Naive Dot Product Value: 731.284297837 Time: ['0.32970', '0.33121', '0.33032']
@consumer
def mult(target=None):
result = None
while True:
(a, b) = (yield result)
result = target.send(a*b)
@consumer
def add():
result = numpy.float64(0.0)
while True:
m = (yield result)
result += m
dot_product_process = mult(add())
def gen_loop(it):
dot_product = None
for a_val,b_val in it:
dot_product = dot_product_process.send((a_val, b_val))
return dot_product
print "Coroutinte Dot Product"
print "Value: %s" % gen_loop(numpy.nditer([a, b]))
print "Time %s" % ["%0.5f" % (v/number) for v in timeit.repeat('gen_loop(it)', setup='from __main__ import numpy, gen_loop, a, b; it = numpy.nditer([a, b])', number=number, repeat=repeat)]
Coroutinte Dot Product Value: 731.284297837 Time ['0.49758', '0.49486', '0.50892']
# Coroutine processes without using return...
@consumer
def mult_allocated_result(target):
while True:
(a, b) = (yield)
result = target.send(a*b)
@consumer
def add_allocated_result(result):
while True:
m = (yield)
result[0] += m
result = numpy.zeros((1,))
dot_product_process_noresult = mult_allocated_result(add_allocated_result(result))
def allocated_result_loop(it):
dot_product = None
for a_val,b_val in it:
dot_product_process_noresult.send((a_val, b_val))
print "Coroutine Dot Product:"
allocated_result_loop(numpy.nditer([a, b]))
print "Value: %s" % result
print "Time %s" % ["%0.5f" % (v/number) for v in timeit.repeat('allocated_result_loop(it)', setup='from __main__ import numpy, allocated_result_loop, a, b; it = numpy.nditer([a, b])', number=number, repeat=repeat)]
Coroutine Dot Product: Value: [ 731.28429784] Time ['0.59082', '0.57764', '0.61219']
n = 100000
a = numpy.random.randn(n)
b = numpy.random.randn(n)
cProfile.run('gen_loop(numpy.nditer([a, b]))')
400003 function calls (300003 primitive calls) in 0.205 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 100000 0.091 0.000 0.138 0.000 <ipython-input-65-4e20301a8ce3>:1(mult) 1 0.046 0.046 0.205 0.205 <ipython-input-65-4e20301a8ce3>:17(gen_loop) 100000 0.026 0.000 0.026 0.000 <ipython-input-65-4e20301a8ce3>:8(add) 1 0.000 0.000 0.205 0.205 <string>:1(<module>) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 200000/100000 0.042 0.000 0.159 0.000 {method 'send' of 'generator' objects}
cProfile.run('naive_loop(numpy.nditer([a, b]))')
3 function calls in 19.887 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 19.887 19.887 19.887 19.887 <ipython-input-66-21b9c16a1a7f>:1(naive_loop) 1 0.000 0.000 19.887 19.887 <string>:1(<module>) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Unfortunately, we don't get a lot of insight into why the naive loop is soo slow.
TARGETS = 'targets'
class AnalysisWindowComplete(Exception):
"""Used in coroutine control flow, this is not an error condition."""
@consumer
def myprinter(name):
while True:
p = (yield)
print " PrinterName: %s; says: %s" % (name, p)
@consumer
def average(targets):
try:
while True:
cnt, result = 0, 0.0
try:
while True:
result += (yield)
cnt += 1
except AnalysisWindowComplete as wc:
print ' In complete with:', wc
for target in targets:
target.send(result/cnt)
except (ValueError, IndexError) as e:
raise
avg_co = average([myprinter('#1'),] )
avg_co.send(5.)
avg_co.send(5.)
avg_co.send(5.)
avg_co.send(4.)
print "Call Complete"
avg_co.throw(AnalysisWindowComplete,'foobar')
Call Complete In complete with: foobar printer #1 says: 4.75
def get_targets(co):
'''Get the targets from the generators frame locals'''
try:
return co.gi_frame.f_locals[TARGETS]
except KeyError as ke:
raise KeyError('No target key found?')
def set_targets(co, targets):
"""Set new targets after the generator has started!"""
t = get_targets(co)
while len(t) > 0:
t.pop()
for target in targets:
t.append(target)
# Now continue using the same coroutine workflow for next analysis
avg_co.send(6)
avg_co.send(6)
avg_co.send(7)
# change where things go...
set_targets(avg_co,(myprinter('#2'),myprinter('#3')) )
avg_co.throw(AnalysisWindowComplete,'Print twice!')
In complete with: Print twice! printer #2 says: 6.33333333333 printer #3 says: 6.33333333333
Generators can be used to build some really powerful data processing flows
Pitfalls
With lots more pictures than this talk...
http://www.dabeaz.com/generators-uk/
http://www.dabeaz.com/coroutines/
http://www.dabeaz.com/finalgenerator/
Combining Context Managers, Decorators and Generators in Python 3.0 for control flow of inline thread execution.
Get the presentation On Github IO: http://dstuebe.github.io/generators/
On NbViewer: http://nbviewer.ipython.org/github/dstuebe/generators/blob/gh-pages/presentation.ipynb