#!/usr/bin/env python
# coding: utf-8

# In[1]:

import cv2


# # Performance, HPC and IPython Parallel

# ## Performance issues
#
# - Python looping over big arrays can be **slow**
# - However
#     - OpenCV operations run as efficient machine code
#     - NumPy operations on arrays run as efficient machine code
#     - the SciPy Stack relies on C, C++ and Fortran implementations for numerical software

# Low-level code written in Python, such as looping over big arrays, can be slow, mainly because Python is dynamically typed and interpreted. However, in the scientific computing environment described above, this is rarely a problem: the OpenCV interface just accesses optimized C/C++ code, and most of the software in the SciPy Stack relies on a base of numerical software implemented in C, C++ and Fortran, including the efficient NumPy arrays (a short timing sketch is given just before Example 9 below).

# ### Cython
#
# - Cython is a **static compiler**
# - It works on a **super-set of the Python language** that supports **C-like static type declarations**
# - Compiles Python code to C
# - Produces a module that can be imported by the Python interpreter
# - Useful to
#     - speed up low-level looping over arrays;
#     - access external C/C++ libraries
# - **Pareto Principle**
#     - 80% of the run-time is spent in 20% of the source code

# But in the few situations where low-level looping must be implemented (because the task cannot be expressed with NumPy capabilities), or where the functionality of an external library is needed, [Cython](http://cython.org) arises as an alternative. Cython is a static compiler that works on a super-set of the Python language supporting C-like static type declarations. It compiles Python code to C, producing a Python module that can be imported and used from the interpreter. As noted by [Behnel *et al.*](http://dx.doi.org/10.1109/MCSE.2010.118), the key idea behind Cython is the **Pareto Principle**, also known as the "80/20 rule": 80% of the run-time is spent in 20% of the source code. Cython's goal is to speed up the critical parts of the code while avoiding too much coding overhead for the programmer (a minimal Cython sketch is also given just before Example 9 below).

# ## IPython.parallel
#
# - Other performance issues can be addressed by **parallelization**
# - IPython.parallel allows parallel and distributed computing
#     - Single Program, Multiple Data (SPMD)
#     - Multiple Program, Multiple Data (MPMD)
# - Parallel applications can be developed, executed and monitored from the IPython shell
# - Computer vision tasks can involve large sets of images or big point clouds
#     - However, the parallelization of these tasks is often trivial
#     - In IPython.parallel, they can be implemented in a few lines of code

# Other performance issues can be addressed by parallelization. IPython.parallel is a powerful architecture for parallel and distributed computing, supporting different styles of parallelism such as single program, multiple data (SPMD) and multiple program, multiple data (MPMD). Parallel applications can be developed, executed and monitored interactively from the IPython shell. Computer vision tasks can involve large sets of images or big point clouds, but the parallelization of these tasks is often trivial and, using IPython.parallel, can be implemented in a few lines of code. The dynamic load balancing feature allows the use of all the processing threads available in the computer, or all the processing power available in a cluster, while keeping the interactive computing environment free from large amounts of parallel-computing-specific code.
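# The next cell is an added illustrative sketch (not part of the original material) of the point above about Python-level loops versus vectorized NumPy calls: the same reduction is computed with an explicit double loop and with a single NumPy expression that runs entirely in compiled code.

# In[ ]:

import time

import numpy as np

# illustrative test data: a large random grayscale-like array
img = np.random.randint(0, 256, size=(1000, 1000)).astype(np.float64)

t0 = time.time()
total = 0.0
for row in img:                  # explicit Python-level looping
    for value in row:
        total += value * value
t_loop = time.time() - t0

t0 = time.time()
total_np = np.sum(img * img)     # the whole loop runs inside NumPy's compiled code
t_numpy = time.time() - t0

print(total, total_np)           # same result
print('loop: %.3f s   numpy: %.3f s' % (t_loop, t_numpy))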
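# The following cell is also an added sketch, not part of the original material. It assumes the Cython package is installed, so the `%%cython` cell magic (loaded with `%load_ext Cython`) can compile the cell to C on the fly; in this script export the magics appear through `get_ipython()`, as elsewhere in the notebook. The `cdef` static type declarations are what allow the loop to run as machine code, and `csum_squares` is just a hypothetical toy function.

# In[ ]:

get_ipython().run_line_magic('load_ext', 'Cython')

get_ipython().run_cell_magic('cython', '', '''
def csum_squares(int n):
    # C-typed loop index and accumulator: the loop is compiled to plain C
    cdef int i
    cdef double total = 0.0
    for i in range(n):
        total += i * i
    return total
''')

print(csum_squares(1000))  # the compiled function is injected into the user namespace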
# ## Example 9 - Process a bundle of images in parallel
#
# In this example, SIFT descriptors of a reference image $I_1$ are computed. Then, descriptors are extracted for every image $I_n$ in a list, and the matches to the $I_1$ descriptors are computed. The processing of the list is done in parallel, using all the available cores in the user's machine.

# Let $D_1$ be an array containing the descriptors of $I_1$.

# In[4]:

# Read the reference image and compute its SIFT descriptors
T1 = cv2.imread('data/templeRing/templeR0001.png', cv2.IMREAD_GRAYSCALE)
sift = cv2.xfeatures2d.SIFT_create(nfeatures=5000)
_, D_1 = sift.detectAndCompute(T1, mask=None)


# In a system shell, an IPython cluster for parallel computing is started using:
#
#     ipcluster start --n=8
#
# Eight *nodes* are started (in this example, the number of engines is chosen to match the number of cores available in the user's machine).

# Note: if `ipcluster` is not available, it can be installed using, for example, `pip`:
#
#     $ pip install ipyparallel

# Back to the IPython shell, the next step is the creation of a `Client` object. A `LoadBalancedView` object is then created to provide load-balanced parallel execution:

# In[6]:

from ipyparallel import Client

rc = Client()
lview = rc.load_balanced_view()


# - The decorator `@lview.parallel` defines a **parallel, load-balanced** function
# - The arguments are:
#     - the image file's absolute path in the filesystem
#     - the reference descriptor set, $D_1$
# - `get_num_matches` will:
#     - read the image;
#     - compute SIFT features and their descriptors;
#     - perform matching using OpenCV's *brute force* matcher, `BFMatcher`; and
#     - return the number of matches found

# Next, a Python *decorator* is used to define a parallel function that computes the descriptors and the matches (the decorator starts with a "`@`" symbol). The function below takes a path to an image in the file system, computes the SIFT features and uses OpenCV's `BFMatcher` to get the matches to $D_1$, returning the image's path and the number of matches found:

# In[7]:

@lview.parallel()
def get_num_matches(arg):
    fname, D_src = arg
    import cv2
    frame = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    print(frame.shape)
    sift = cv2.xfeatures2d.SIFT_create(nfeatures=5000)
    _, D = sift.detectAndCompute(frame, mask=None)
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(D_src, D)
    return fname, len(matches)


# - File paths and $D_1$ are assembled in an **arguments list**
# - The `map` function starts the parallelized call
#     - Load balancing is automatically performed

# IPython's capability to access the system shell is employed to list all the files in a directory and store the file paths in a list of strings, `fnames`. Finally, the `map` function applies `get_num_matches` to every element of the `args` list (one tuple per image), automatically performing the load balancing on the nodes:

# In[10]:

fnames = get_ipython().getoutput('ls data/templeRing/temple*.png')
args = [(fname, D_1) for fname in fnames]
async_res = get_num_matches.map(args)


# In[33]:

for f, n in async_res:
    print(f, n)


# This simple example is able to exploit all the available cores in the local machine while asking for just a few extra lines of code. But the parallel computing capabilities of IPython go far beyond that, supporting SPMD and MPMD parallelism and the use of StarCluster for execution on Amazon's Elastic Compute Cloud (EC2). The interested reader is referred to the [Using IPython for parallel computing](http://ipyparallel.readthedocs.io/en/latest/) section of the IPython documentation.
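# To illustrate the SPMD style mentioned above, the closing cell is an added sketch (not part of the original material) that assumes the same cluster is still running: a `DirectView` addresses all engines at once, so the same statements execute on every node, each one working on its own piece of the scattered data.

# In[ ]:

dview = rc[:]        # DirectView over all the engines started by ipcluster
dview.block = True   # make the calls below synchronous

dview.scatter('chunk', list(range(16)))               # split the list across the engines
dview.execute('partial = sum(x * x for x in chunk)')  # same code, different data on each engine
print(sum(dview['partial']))                          # combine one partial result per engine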