This is a simple illustration of using ThreadPool
to parallelize downloads.
It assumes the downloads are limited by server latency rather than bandwidth; if bandwidth is the limiting factor, concurrency doesn't help.
import time
import requests
from multiprocessing.pool import ThreadPool
Test a simple request to my slow server
It just replies to any request for /NUMBER
with the number requested,
but the server is artificially slow in its handling of requests.
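The server itself is not shown here. As a rough stand-in, the sketch below uses Python 3's standard-library http.server to reply to /NUMBER with the number after a fixed delay. The names SlowHandler, start_slow_server and DELAY, and the 0.2 second delay, are assumptions made for illustration; this is not the server that produced the timings in this notebook.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

DELAY = 0.2  # assumed per-request delay in seconds (hypothetical value)


class SlowHandler(BaseHTTPRequestHandler):
    """Echo the requested path back as the body, after an artificial delay."""

    def do_GET(self):
        time.sleep(DELAY)  # simulate slow request handling
        body = self.path.lstrip("/").encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_slow_server(port=8888):
    """Start the server in a background thread and return the server object."""
    # ThreadingHTTPServer handles each request in its own thread, so
    # concurrent clients wait in parallel rather than one after another.
    server = ThreadingHTTPServer(("localhost", port), SlowHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling start_slow_server() makes http://localhost:8888/10 reply with 10 after the delay; call shutdown() on the returned object to stop it.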
%time r = requests.get("http://localhost:8888/10")
r.content
CPU times: user 18.1 ms, sys: 4.54 ms, total: 22.6 ms
Wall time: 224 ms
'10'
Our test function downloads the URL for a given ID, and parses the result (casts str of int to int).
def get_data(ID):
    """function for getting data from our slow server"""
    r = requests.get("http://localhost:8888/%i" % ID)
    return int(r.content)
Now test using a threadpool to get the data, using a varying number of concurrent threads
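IDs = range(128)
for nthreads in [1, 2, 4, 8, 16, 32]:
    pool = ThreadPool(nthreads)
    tic = time.time()
    result = pool.map(get_data, IDs)
    toc = time.time()
    pool.close()  # release the worker threads before the next run
    # the parenthesized form works as a Python 2 statement and a Python 3 call
    print("%i threads: %3.1f seconds" % (nthreads, toc - tic))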
1 threads: 26.2 seconds
2 threads: 13.3 seconds
4 threads: 6.7 seconds
8 threads: 3.4 seconds
16 threads: 1.8 seconds
32 threads: 1.1 seconds
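To see how close these timings come to perfect scaling, the sketch below compares them with an idealized model in which every request takes the same time and each thread works through its share of the 128 requests in turn. The per-request latency is estimated from the single-thread run; the helper names ideal_time and efficiency are made up for this illustration, and the model ignores thread startup and any server-side limits.

```python
import math

N_REQUESTS = 128

# Measured wall times in seconds, copied from the output above.
measured = {1: 26.2, 2: 13.3, 4: 6.7, 8: 3.4, 16: 1.8, 32: 1.1}

# Assumption: per-request latency estimated from the single-thread run.
latency = measured[1] / N_REQUESTS


def ideal_time(nthreads, n_requests=N_REQUESTS, per_request=latency):
    """Wall time if requests ran in perfectly parallel rounds of nthreads."""
    rounds = math.ceil(n_requests / nthreads)
    return rounds * per_request


def efficiency(nthreads):
    """Fraction of the ideal speedup actually achieved (1.0 is perfect)."""
    return ideal_time(nthreads) / measured[nthreads]


for n in sorted(measured):
    print("%2i threads: ideal %4.1f s, measured %4.1f s, efficiency %3.0f%%"
          % (n, ideal_time(n), measured[n], 100 * efficiency(n)))
```

Under this model the runs stay above 95% efficiency up to 8 threads and fall to roughly 75% at 32 threads, although the measured times are rounded to 0.1 s, so the figures for the shortest runs are only approximate.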