This is a simple illustration of using ThreadPool to parallelize downloads. It assumes that bandwidth is not the limiting factor; if it is, concurrency doesn't help.
import time

import requests
from multiprocessing.pool import ThreadPool
Test a simple request to my slow server. It just replies to any request for /NUMBER with the number requested, but the server is artificially slow in its handling of requests.
%time r = requests.get("http://localhost:8888/10")
r.content

CPU times: user 18.1 ms, sys: 4.54 ms, total: 22.6 ms
Wall time: 224 ms
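The server itself is not shown here. For anyone who wants to reproduce the experiment, the following is a minimal stand-in built on Python 3's standard library, not the original server. The names `SlowHandler`, `start_server`, and `DELAY` are hypothetical, and the 0.2 second delay is an assumption chosen only to be roughly in line with the wall time above.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

DELAY = 0.2  # assumed per-request delay in seconds; not taken from the original server


class SlowHandler(BaseHTTPRequestHandler):
    """Reply to GET /NUMBER with NUMBER, after an artificial delay."""

    def do_GET(self):
        number = self.path.lstrip("/")
        if not number.isdigit():
            self.send_error(404, "expected /NUMBER")
            return
        time.sleep(DELAY)  # simulate slow request handling
        body = number.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging


def start_server(port=8888):
    """Start the slow server in a daemon thread and return the server object."""
    server = ThreadingHTTPServer(("localhost", port), SlowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_server()` once before the requests makes `http://localhost:8888/NUMBER` available. Each request is handled in its own thread, so concurrent requests wait in parallel rather than queueing behind one another.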
Our test function downloads the URL for a given ID, and parses the result (casts str of int to int).
def get_data(ID):
    """function for getting data from our slow server"""
    r = requests.get("http://localhost:8888/%i" % ID)
    return int(r.content)
Now test using a ThreadPool to get the data, with a varying number of concurrent threads:
IDs = range(128)
for nthreads in [1, 2, 4, 8, 16, 32]:
    pool = ThreadPool(nthreads)
    tic = time.time()
    result = pool.map(get_data, IDs)
    toc = time.time()
    pool.close()  # no more work for this pool; lets its worker threads exit
    print("%i threads: %3.1f seconds" % (nthreads, toc - tic))
 1 threads: 26.2 seconds
 2 threads: 13.3 seconds
 4 threads: 6.7 seconds
 8 threads: 3.4 seconds
16 threads: 1.8 seconds
32 threads: 1.1 seconds
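To make the scaling easier to read, the timings above can be turned into speedup and parallel efficiency figures. This is a small sketch using only the numbers reported above; the helper name `speedup_table` is hypothetical, and efficiency is defined here as speedup divided by thread count.

```python
# Timings reported above: number of threads -> wall time in seconds
timings = {1: 26.2, 2: 13.3, 4: 6.7, 8: 3.4, 16: 1.8, 32: 1.1}


def speedup_table(timings):
    """Return (threads, speedup, efficiency) rows relative to the 1-thread run."""
    base = timings[1]
    rows = []
    for n in sorted(timings):
        speedup = base / timings[n]
        efficiency = speedup / n  # 1.0 would be perfect linear scaling
        rows.append((n, speedup, efficiency))
    return rows


for n, speedup, efficiency in speedup_table(timings):
    print("%2i threads: %4.1fx speedup, %3.0f%% efficiency"
          % (n, speedup, 100 * efficiency))
```

Scaling stays close to linear through 8 threads and begins to flatten at 32, where the speedup is about 24x rather than 32x.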