Multiprocessing

Basics

In [1]:
import time
import datetime

Define a function that represents some work.

In [3]:
def work(x):
    start_time = time.time()
    # Our work takes x seconds
    time.sleep(x)
    end_time = time.time()
    return {'start': start_time, 'end_time': end_time}

I want to call this several times.

In [4]:
import numpy as np
In [5]:
np.random.seed(1045)
job_times = np.random.uniform(0.08, 0.12, 12)
job_times
Out[5]:
array([ 0.11018225,  0.1014037 ,  0.09888686,  0.11039241,  0.11638768,
        0.09277434,  0.08017236,  0.10559411,  0.09655754,  0.09319494,
        0.08069176,  0.1141882 ])

Iterating

for

In [6]:
results = []
for t in job_times:
    results.append(work(t))

results
Out[6]:
[{'end_time': 1366293797.804116, 'start': 1366293797.693243},
 {'end_time': 1366293797.906109, 'start': 1366293797.804119},
 {'end_time': 1366293798.005102, 'start': 1366293797.906111},
 {'end_time': 1366293798.116095, 'start': 1366293798.005103},
 {'end_time': 1366293798.233087, 'start': 1366293798.116096},
 {'end_time': 1366293798.326081, 'start': 1366293798.233089},
 {'end_time': 1366293798.40708, 'start': 1366293798.326083},
 {'end_time': 1366293798.51307, 'start': 1366293798.407084},
 {'end_time': 1366293798.610063, 'start': 1366293798.513072},
 {'end_time': 1366293798.704059, 'start': 1366293798.610066},
 {'end_time': 1366293798.785053, 'start': 1366293798.704063},
 {'end_time': 1366293798.900046, 'start': 1366293798.785054}]

map

In [7]:
results = map(work, job_times)
results
Out[7]:
[{'end_time': 1366293846.201056, 'start': 1366293846.089965},
 {'end_time': 1366293846.303049, 'start': 1366293846.201058},
 {'end_time': 1366293846.402042, 'start': 1366293846.30305},
 {'end_time': 1366293846.513035, 'start': 1366293846.402043},
 {'end_time': 1366293846.630027, 'start': 1366293846.513037},
 {'end_time': 1366293846.723025, 'start': 1366293846.630029},
 {'end_time': 1366293846.804016, 'start': 1366293846.723028},
 {'end_time': 1366293846.910011, 'start': 1366293846.804018},
 {'end_time': 1366293847.007004, 'start': 1366293846.910013},
 {'end_time': 1366293847.100998, 'start': 1366293847.007006},
 {'end_time': 1366293847.181994, 'start': 1366293847.100999},
 {'end_time': 1366293847.296985, 'start': 1366293847.181995}]
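A note for Python 3 users: there, `map` returns a lazy iterator, so no work runs until you consume it. A minimal sketch (the `work` function is restated so the example is self-contained):

```python
import time

def work(x):
    # Simulate x seconds of work and record the timestamps.
    start_time = time.time()
    time.sleep(x)
    end_time = time.time()
    return {'start': start_time, 'end_time': end_time}

job_times = [0.01, 0.02, 0.03]

# In Python 3, wrap map() in list() to force the jobs to run.
results = list(map(work, job_times))
print(len(results))  # 3
```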

Entire program

In [8]:
tic = time.time()

# Create job list
np.random.seed(1045)
job_times = np.random.uniform(0.8, 1.2, 12)

# do something else
time.sleep(1)

# Call the map function
results = map(work, job_times)

# Wrap up
time.sleep(1)

print time.time() - tic
14.0075240135
The 12 jobs average about 1 second each, so the map takes about 12 seconds, and the two sleeps add 2 more.

Let's speed things up!

A quick note: We spent about 12 seconds in the map function and 2 seconds elsewhere.

In [9]:
import multiprocessing

Create a group of workers (Pool)

In [10]:
num_cores = multiprocessing.cpu_count()
num_cores
Out[10]:
12
In [11]:
pool = multiprocessing.Pool(num_cores)
In [12]:
tic = time.time()
results = pool.map(work, job_times)
print time.time() - tic
1.16631388664

We went from 12 seconds to 1.16 seconds.

How easy was that!

  • Zero installation
  • Added one import
  • Added two lines of code (which could be one)
  • Added a pool in front of my map (which could be a single character)

1 import, 1 line, 1 character

Speedup and Efficiency

In [13]:
import pandas as pd
In [14]:
num_processors = np.array([1,2,4,6,12])
In [15]:
def measure_time(num_procs):
    tic = time.time()
    # One second of sequential setup, the parallel map, one second of wrap-up
    time.sleep(1)
    pool = multiprocessing.Pool(num_procs)
    results = pool.map(work, job_times)
    time.sleep(1)
    return time.time() - tic
In [16]:
times = map(measure_time, num_processors)
In [24]:
data = pd.DataFrame(times, index=num_processors, columns=['time'])
data['time'].plot(figsize=(8.0, 6.0))
Out[24]:
<matplotlib.axes.AxesSubplot at 0x41e6ad0>
In [18]:
data
Out[18]:
         time
1   14.009648
2    8.139494
4    5.309660
6    4.317720
12   3.185784
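These measurements match a simple model (my own back-of-the-envelope check, not part of the timing run): 12 roughly one-second jobs run on n workers in about ceil(12/n) one-second waves, plus the two one-second sleeps outside the pool.

```python
import math

def predicted_time(n, n_jobs=12, job_len=1.0, overhead=2.0):
    # n_jobs jobs of job_len seconds each run in ceil(n_jobs / n) waves,
    # plus overhead seconds of sequential work (the two sleeps).
    return math.ceil(n_jobs / n) * job_len + overhead

print([predicted_time(n) for n in (1, 2, 4, 6, 12)])
# [14.0, 8.0, 5.0, 4.0, 3.0] -- close to the measured 14.01, 8.14, 5.31, 4.32, 3.19
```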

Speedup

$\text{speedup} = \frac{\text{sequential time}}{\text{parallel time}}$

Sequential time

In [19]:
data.ix[1]['time']
Out[19]:
14.009648084640503

Speedup

In [33]:
data['speedup'] = float(data.ix[1]['time'])/data['time']
In [34]:
data['speedup'].plot(figsize=(8.0, 6.0))
Out[34]:
<matplotlib.axes.AxesSubplot at 0x4925d50>
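The same speedup numbers can be checked by hand from the measured times in the table above:

```python
# Measured times from the table above (seconds).
times = {1: 14.009648, 2: 8.139494, 4: 5.309660, 6: 4.317720, 12: 3.185784}

# speedup = sequential time / parallel time
speedup = {p: times[1] / t for p, t in times.items()}
print(round(speedup[12], 2))  # 4.4
```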

Efficiency

Measures processor utilization

$\text{efficiency} = \frac{\text{sequential time}}{\text{number of processors} \cdot \text{parallel time}}$

In [35]:
data['efficiency'] = float(data.ix[1]['time'])/(data['time'] * data.index)
In [36]:
data['efficiency'].plot(figsize=(8.0, 6.0))
Out[36]:
<matplotlib.axes.AxesSubplot at 0x4c36c90>

Using 12 processors, we are only about 37% efficient.

  • 1 processor is used 100% of the time
  • the other 11 are used only about 30% of the time
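The efficiency figure can be reproduced directly from the measured times:

```python
# Measured times from the table above (seconds).
times = {1: 14.009648, 12: 3.185784}

# efficiency = sequential time / (number of processors * parallel time)
efficiency = times[1] / (12 * times[12])
print(round(efficiency, 2))  # 0.37
```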

Is there anything we can do about it?

Karp-Flatt Metric

Performance Categories

  • Sequential computation
  • Parallel computation
  • Parallel overhead

The Karp-Flatt metric helps you understand your performance. It estimates the experimentally determined serial fraction $e$ from the measured speedup $s$ on $p$ processors:

$\text{serial fraction} = e = \frac{1/s - 1/p}{1 - 1/p}$

In [37]:
data['karpflatt'] = ( 1/data['speedup'] - 1.0/data.index )/(1 - 1.0/data.index)
In [38]:
data['karpflatt']
Out[38]:
1          NaN
2     0.161984
4     0.172000
6     0.169835
12    0.157163
Name: karpflatt

If the serial fraction stays roughly constant (or decreases) as processors are added, then the poor performance is due to the sequential work the program is doing.

If the metric increases, then the poor performance is due to parallel overhead.
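The metric is easy to compute directly; here it is as a standalone function, checked against the $p = 2$ row above:

```python
def karp_flatt(speedup, p):
    # Experimentally determined serial fraction: e = (1/s - 1/p) / (1 - 1/p)
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)

s = 14.009648 / 8.139494   # measured speedup on 2 processors
print(round(karp_flatt(s, 2), 6))  # 0.161984
```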

Create some poor parallel performance

Add a penalty for a larger number of cores

In [40]:
data.index*0.1
Out[40]:
Int64Index([0.1, 0.2, 0.4, 0.6, 1.2], dtype=int64)
In [41]:
data['time2'] = data['time'] + data.index*0.1
In [42]:
data['speedup2'] = float(data.ix[1]['time2'])/data['time2']
data['karpflatt2'] = ( 1/data['speedup2'] - 1.0/data.index )/(1 - 1.0/data.index)
In [49]:
print data['time'], '\n', data['time2']
1     14.009648
2      8.139494
4      5.309660
6      4.317720
12     3.185784
Name: time 
1     14.109648
2      8.339494
4      5.709660
6      4.917720
12     4.385784
Name: time2

In [50]:
data['karpflatt2']
Out[50]:
1          NaN
2     0.182098
4     0.206218
6     0.218243
12    0.248185
Name: karpflatt2

Amdahl's Law

In [99]:
num_processors = np.arange(1,100)
In [100]:
num_processors
Out[100]:
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
       35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
       52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68,
       69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85,
       86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
In [101]:
ideal = num_processors
In [102]:
data = pd.DataFrame(ideal, index=num_processors, columns=['ideal'])
data['ideal'].plot(figsize=(8.0, 6.0))
Out[102]:
<matplotlib.axes.AxesSubplot at 0x7527a50>
In [103]:
serial_fraction = 0.01
data['t1'] = serial_fraction + (1-serial_fraction)/data.index
In [104]:
data['t1'].plot()
Out[104]:
<matplotlib.axes.AxesSubplot at 0x760f850>
In [105]:
serial_fraction = 0.1
data['t2'] = serial_fraction + (1-serial_fraction)/data.index
data[['t1','t2']].plot()
Out[105]:
<matplotlib.axes.AxesSubplot at 0x7969d50>

Speedup

In [109]:
data['speedup1'] = float(data.ix[1]['t1'])/data['t1']
data['speedup2'] = float(data.ix[1]['t2'])/data['t2']
In [111]:
data['speedup2']
Out[111]:
1     1.000000
2     1.818182
3     2.500000
4     3.076923
5     3.571429
...
95    9.134615
96    9.142857
97    9.150943
98    9.158879
99    9.166667
Name: speedup2, Length: 99
In [107]:
data[['ideal','speedup1','speedup2']].plot()
Out[107]:
<matplotlib.axes.AxesSubplot at 0x7a29d50>

$T = $ Total time

$S = $ Serial time (equal to the serial fraction when $T = 1$)

$n = $ Number of processors

$P = $ Parallel Time $ = (T - S)/n$

$Speedup = T / (S + P)$

But, as $n$ gets large, P gets small.

$Speedup \approx T/S$

In our example, T = 1.0 and S = 0.01 and 0.1

s1 = 1/0.01 = 100x

s2 = 1/0.1 = 10x
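Putting Amdahl's law into a helper makes the bound easy to explore (a sketch; `amdahl_speedup` is my own name, not from the notebook):

```python
def amdahl_speedup(serial_fraction, n):
    # Amdahl's law with total time normalized to 1:
    # time on n processors = S + (1 - S)/n, so speedup = 1 / that.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

print(round(amdahl_speedup(0.1, 12), 6))  # 5.714286, matching the table above
print(amdahl_speedup(0.01, 10**6))        # approaches the 1/0.01 = 100x limit
```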
