This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.
Let's import PyCUDA.
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy as np
Now, we initialize the NumPy array that will contain the fractal. We use np.zeros rather than np.empty because the kernel never writes to pixels that escape on the very first iteration; with np.zeros they hold 0 instead of uninitialized memory.
size = 200
iterations = 100
col = np.zeros((size, size), dtype=np.int32)
We allocate memory for this array on the GPU.
col_gpu = cuda.mem_alloc(col.nbytes)
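The buffer size follows directly from the array's shape and dtype. As a quick arithmetic check (our own addition, runnable without a GPU):

```python
import numpy as np

size = 200
col = np.zeros((size, size), dtype=np.int32)
# Each int32 occupies 4 bytes, so the GPU buffer must hold size*size*4 bytes.
assert col.nbytes == size * size * 4  # 160,000 bytes
```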
We write the CUDA kernel in a string. The mandelbrot function accepts the image size, the number of iterations, and a pointer to the memory buffer as arguments. For each pixel, it updates the col buffer with the escape value in the fractal.
code = """
__global__ void mandelbrot(int size,
                           int iterations,
                           int *col)
{
    // Get the row and column index of the current thread.
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    int index = i * size + j;

    // Declare and initialize the variables.
    double cx, cy;
    double z0, z1, z0_tmp, z0_2, z1_2;
    cx = -2.0 + (double)j / size * 3;
    cy = -1.5 + (double)i / size * 3;

    // Main loop.
    z0 = z1 = 0.0;
    for (int n = 0; n < iterations; n++)
    {
        z0_2 = z0 * z0;
        z1_2 = z1 * z1;
        if (z0_2 + z1_2 <= 100)
        {
            // z0 and z1 must be updated simultaneously,
            // hence the temporary variable.
            z0_tmp = z0_2 - z1_2 + cx;
            z1 = 2 * z0 * z1 + cy;
            z0 = z0_tmp;
            col[index] = n;
        }
        else break;
    }
}
"""
Now, we compile the CUDA program.
prg = SourceModule(code)
mandelbrot = prg.get_function("mandelbrot")
We define the block size and the grid size, specifying how the threads will be mapped to the data. Note that size must be divisible by block_size here; otherwise some pixels would not be covered by any thread.
block_size = 10
block = (block_size, block_size, 1)
grid = (size // block_size, size // block_size, 1)
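To see why these values cover the whole image, here is a quick check (our own addition, runnable without a GPU):

```python
block_size = 10
size = 200
# Each block holds block_size x block_size threads; the grid tiles the image.
threads_per_block = block_size * block_size      # 100
blocks = (size // block_size) ** 2               # 20 * 20 = 400
# One thread per pixel, with no leftover pixels because
# size is an exact multiple of block_size.
assert threads_per_block * blocks == size * size  # 40,000 threads
```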
We call the compiled function.
mandelbrot(np.int32(size), np.int32(iterations), col_gpu,
block=block, grid=grid)
Once the function has completed, we copy the contents of the CUDA buffer back to the NumPy array col.
cuda.memcpy_dtoh(col, col_gpu)
Let's display the fractal. We take the logarithm of the escape counts, shifted by 1 to avoid log(0), to compress the dynamic range.
import matplotlib.pyplot as plt
%matplotlib inline
plt.imshow(np.log(col + 1), cmap=plt.cm.hot);
plt.xticks([]);
plt.yticks([]);
Let's evaluate the time taken by this function (reallocating and refilling the GPU buffer in the setup line so each run starts from the same state).
%%timeit col_gpu = cuda.mem_alloc(col.nbytes); cuda.memcpy_htod(col_gpu, col)
mandelbrot(np.int32(size), np.int32(iterations), col_gpu,
block=block, grid=grid)
cuda.memcpy_dtoh(col, col_gpu)
You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).
IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014 (500 pages).