This is a simple illustration of using ThreadPool to parallelize downloads. Assumes that bandwidth is not the limiting factor, in which case concurrency doesn't help.

In [1]:
import requests


Test a simple request to my slow server It just replies to any request for /NUMBER with the number requested, but the server is artificially slow in its handling of requests.

In [2]:
%time r = requests.get("http://localhost:8888/10")
r.content

CPU times: user 18.1 ms, sys: 4.54 ms, total: 22.6 ms
Wall time: 224 ms

Out[2]:
'10'

Our test function downloads the URL for a given ID, and parses the result (casts str of int to int).

In [3]:
def get_data(ID):
"""function for getting data from our slow server"""
r = requests.get("http://localhost:8888/%i" % ID)
return int(r.content)


Now test using a threadpool to get the data, using a varying number of concurrent threads

In [4]:
IDs = range(128)
for nthreads in [1, 2, 4, 8, 16, 32]:

1 threads: 26.2 seconds