# Understanding Kernel Convolution (Part 2)

Wed Oct 05 2016

In my previous post, I included a simple kernel convolution sandbox that uses VisionJS (opens new window) to generate kernels and apply them to the classic Lenna test image.

Note: this is done on the CPU of your machine; it does not use server-side processing. At some point, I may do a WebGL GPU implementation with a speed comparison.

The mathematical definition of a convolution is f(t)g(t)=f(τ)g(tτ)dτf(t) * g(t) = \int_{-\infty}^{\infty}f(\tau)g(t-\tau)d\tau. This is a formal way of describing sliding one signal across another and computing their overlapping area at each position. If this isn't clear, Wolfram MathWorld has a terrific article on convolution (opens new window) that is probably worth the time to look at.

So, how do we apply an abstract mathematical concept to something like an image? First we need to put the image in terms that can be used for computation. A very common way of storing image data is in a 24-bit RGB format. That means that we have split up a composite image into its red, green, and blue light components. When combined, we get the whole image.

lenna-split

Each component is stored as 8 bits, or [0-255]. For computation, we can treat the image like a set of three separate m x n matrices all with values on the range [0-255]. We'll refer to these separate matrices as channels.

The convolution definition above describes the theoretical process for one-dimensional continous signals. In the case of image processing, our input data is two-dimensional and discrete. Because of this, it may be helpful to think of convolution as a way of relating pixels spatially with their neighbors. Let K be our convolution kernel, a matrix of size 3 x 3. At each pixel, we take the 3 x 3 of pixels on each channel around our selection, I(x,y)I(x,y) where I(x,y)={IR(x,y),IG(x,y),IB(x,y)}I(x,y) = \{I_R(x,y), I_G(x,y), I_B(x,y)\}.

kernel-convolution

Each pixel value is multiplied by its corresponding kernel value, and the sum is stored on the selected pixel. The kernel values map to mirrored pixel values. (e.g. K(0,0)K(0,0) gets multiplied by I(x,y)(3,3)I(x,y)(3,3).

This operation can be written mathematically as I(x,y)K=i=03j=03[I(x,y)(i,j)K(2i)(2j)]I(x,y) * K = \sum_{i=0}^3 \sum_{j=0}^3 [ I(x,y)(i,j) \cdot K(2-i)(2-j) ].