Low-level code written in Python, such as loops over big arrays, can be slow, mainly because Python is dynamically typed and interpreted. However, in the scientific computing environment described above, this is rarely a problem: the OpenCV interface just accesses optimized C/C++ code, and most of the software in the SciPy Stack relies on a base of numerical software implemented in C, C++ and Fortran, including the efficient NumPy arrays.
But in the few situations where low-level looping must be implemented (because the task cannot be expressed using NumPy capabilities) or the functionality of an external library is needed, Cython arises as an alternative. Cython is a static compiler for a superset of the Python language that supports C-like static type declarations. It compiles Python code to C, then produces a Python module that can be imported and used from the interpreter. As noted by Behnel et al., the key idea behind Cython is the Pareto Principle, also known as the "80/20 rule": 80% of the run-time is spent in 20% of the source code. Cython's goal is to speed up the critical parts of the code while avoiding too much coding overhead for the programmer.
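As an illustration, the hypothetical function below (a minimal sketch, not part of the original example; the file name and function name are assumptions) shows how C-like declarations are added to an otherwise ordinary Python loop. Compiled by Cython (for example, with cythonize), the loop over the typed memoryview runs at C speed:
# sum_sq.pyx - hypothetical Cython example of static type declarations
def sum_of_squares(double[:] values):
    cdef double total = 0.0
    cdef Py_ssize_t i
    # the loop below is translated to plain C iteration over a typed memoryview
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total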
Other performance issues can be addressed by parallelization. IPython.parallel is a powerful architecture for parallel and distributed computing, supporting different styles of parallelism such as single program, multiple data (SPMD) and multiple program, multiple data (MPMD). Parallel applications can be easily developed, executed and monitored interactively from the IPython shell. Computer vision tasks can involve large sets of images or big point clouds, but the parallelization of these tasks is often trivial and, using IPython.parallel, implemented in a few lines of code. The dynamic load balancing feature allows the use of all the processing threads available in the computer, or of all the processing power available in a cluster, while keeping the interactive computing environment free from large amounts of code specific to parallel computing.
In this example, SIFT descriptors of a reference image $I_1$ are computed. Then, descriptors are extracted for every image $I_n$ in a list, and the matches to $I_1$ descriptors are computed. The processing of the list is done in parallel, using all the available cores in the user’s machine.
Let $D_1$ be an array containing the descriptors of $I_1$.
import cv2

# Read the reference image and compute its SIFT descriptors
T1 = cv2.imread('data/templeRing/templeR0001.png', cv2.IMREAD_GRAYSCALE)
sift = cv2.xfeatures2d.SIFT_create(nfeatures=5000)
_, D_1 = sift.detectAndCompute(T1, mask=None)
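As a quick sanity check (not part of the original listing), the shape of the descriptor array can be inspected; SIFT produces one 128-dimensional descriptor per detected keypoint:
print(D_1.shape)   # expected: (number of keypoints, 128)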
In a system shell, an IPython cluster for parallel computing is started using:
ipcluster start --n=8
Eight nodes are started (in this example, the number of nodes is chosen based on the number of cores available in the user's machine).
Note: if ipcluster is not available, it can be installed using, for example, pip:
$ pip install ipyparallel
Back in the IPython shell, the next step is the creation of a Client object. A LoadBalancedView object is then created to provide load-balanced parallel execution:
from ipyparallel import Client
rc = Client()
lview = rc.load_balanced_view()
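At this point, the identifiers of the running engines can be inspected to confirm that all eight nodes are available (a small check, not part of the original example):
print(rc.ids)   # e.g. [0, 1, 2, 3, 4, 5, 6, 7]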
Next, a Python decorator is used to define a parallel, load-balanced function that computes the descriptors and the matches (the decorator starts with a "@" symbol). The function below takes a path to an image in the file system, computes the SIFT features and uses OpenCV's BFMatcher to get the matches to $D_1$, returning the image's path and the number of matches found:
@lview.parallel()
def get_num_matches(arg):
    fname, D_src = arg
    # cv2 is imported inside the function because its body runs on the engines
    import cv2
    frame = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    print(frame.shape)
    sift = cv2.xfeatures2d.SIFT_create(nfeatures=5000)
    _, D = sift.detectAndCompute(frame, mask=None)
    # Brute-force matching with cross-checking against the reference descriptors
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(D_src, D)
    return fname, len(matches)
The IPython capability to access the system's shell is employed to list all the files in a directory and store their paths in a list of strings, fnames. Finally, the map function starts the parallelized call, applying get_num_matches to every element of a list of arguments (each a pair of a file path and $D_1$) and automatically performing the load balancing on the nodes:
fnames = !ls data/templeRing/temple*.png
args = [(fname, D_1) for fname in fnames]
async_res = get_num_matches.map(args)
for f, n in async_res:
    print(f, n)
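Iterating over async_res prints each result as it becomes available. Alternatively, the results can be gathered explicitly once all tasks have finished (a small usage sketch relying on the standard AsyncResult interface):
async_res.wait()            # block until every task has completed
results = async_res.get()   # list of (file name, number of matches) tuples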
This simple example is able to exploit all the available cores in the local machine while requiring just a few extra lines of code. But the parallel computing capabilities in IPython go far beyond, supporting SPMD and MPMD parallelism and the use of StarCluster for execution on Amazon's Elastic Compute Cloud (EC2). The interested reader is referred to the section Using IPython for parallel computing in the IPython documentation.