Secrets of Simple Image Filtering
This article will explain the theory behind image filtering. Only a few basic filters are discussed; more complex filters will not be dealt with due to their different nature. We will discuss filter design and cover much of the necessary terminology one might come across when reading related literature. Source code is supplied with the article that implements a greyscale edge detection filter.
1: The Terminology
The filter: its properties, characteristics and functionality
First off, what is a filter? A filter is any kind of processing that has one characteristic: it takes a signal as the input and produces a relevant output signal after processing it in some way. A filter can always be described mathematically; think of it as a function, for instance:
A filter can take any number of arguments that further define its behaviour. In fact, when designing a filter we usually need to give it a few guidelines according to which it will be built. In this article we will only be dealing with filters that are designed using one parameter, the dimensions; the contents of the filter we will choose carefully ourselves (no mathematical methods are discussed). More on this later.
Notice that we defined the dimensions of a filter as one parameter. This hints that most filters have a uniform side length. For a 1D filter this parameter defines the length, while for a 2D filter it defines the length of the sides. In essence one could also create 3D filters, but there aren't that many everyday uses for them.
An example of a 1D filter:
Notice one very important, although subtle, characteristic in figure 1.2: the filter has an odd number of samples. This is a must for filters. The sample distribution doesn't have to be symmetrical across the centre sample (although usually it is), but it must be possible to centre the filter over any sample in an input signal.
An example of a 2D filter:
A 2D filter, too, has to have an odd number of samples in both the vertical and the horizontal orientation. The filters in figures 1.2 and 1.3 are not "real"; they were created randomly.
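The odd-size requirement can be expressed in a few lines of code. The following Python sketch is not taken from the source code supplied with the article; the helper name `kernel_radius` and the sample kernels are made up for illustration. It checks that every side of a filter is odd and reports how many samples lie on each side of the centre sample.

```python
def kernel_radius(kernel):
    """Return the number of samples on each side of the centre sample.

    `kernel` is a list of numbers (1D) or a list of equally long rows (2D).
    Raises ValueError if any side has an even length, because such a
    filter cannot be centred over a single input sample.
    """
    if kernel and isinstance(kernel[0], (list, tuple)):
        sides = [len(kernel), len(kernel[0])]   # 2D: rows, columns
    else:
        sides = [len(kernel)]                   # 1D: length only
    for side in sides:
        if side % 2 == 0:
            raise ValueError("filter side length must be odd")
    return [side // 2 for side in sides]


# Hypothetical filters, chosen only to exercise the check.
print(kernel_radius([1, 2, 3, 2, 1]))                     # [2]
print(kernel_radius([[0, 1, 0], [1, 2, 1], [0, 1, 0]]))   # [1, 1]
```

A radius of 2 means the centre sample has two neighbours on either side, which is exactly what makes it possible to centre the filter over an input sample.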
The spatial, frequency and time domains
The above terms actually define two different "spaces". The catch is that while the time domain only exists for 1D signals (such as an audio signal), the spatial domain describes a 2D signal (an image). Both of these domains can be converted into an alternate domain, the frequency domain. The process used to do the conversion is called the Fourier Transform and will not be discussed in this article because it has very little use in the kind of image filtering covered here. That is, while most audio filters (high pass, low pass, band pass and others) function in the frequency domain, image information is encoded mostly in the spatial domain, requiring no conversion at all. The moral here: if you look at an image, think of the spatial domain; in the case of an audio signal, think of the time domain; and when looking at a frequency spectrum (as in WinAMP), think of the frequency domain. For now, let's forget all about everything but the spatial domain.
The kernel and the PSF
Again, these two terms mean the exact same thing with the difference that a kernel generally denotes the use of a 1D filter while the PSF, which stands for point spread function, denotes the use of a 2D filter. There is yet another name that is used: the impulse response, which is arguably the most relevant for both cases.
Looking back at figures 1.2 and 1.3, one can see that the explanations below them are flawed. The figures do not show what a filter looks like, but rather the impulse responses of the respective 1D and 2D filters. We will be using all of the terms interchangeably for all filter types.
Convolution as a mathematical operation
Applying a simple filter to an input signal requires some math to be done. The method used for kernel-type filters is called convolution. It is a binary mathematical operation (taking two arguments and producing one output value), just like addition or multiplication. Compared to the two, however, it is more like multiplication. This brings us to one very important characteristic in signal processing: linearity.
There are two types of systems that surround us: a system can either be linear or non-linear. Most real-life systems (such as temperature changes) can be characterized as non-linear, which means that there is a lot of uncertainty involved. Mathematics offers some very well known non-linear operations and functions: multiplying or dividing one signal by another, exponentiation, taking the logarithm, etc. With such operations it is somewhat challenging to know the source data by analyzing the output data. Fortunately mathematics also introduces inverse functions and operations for most of the commonly used ones: division for multiplication, logarithm for exponentiation, etc.
The other type is the linear system, about which we can, and most of the time do, know a great deal. It isn't uncommon for a linear system to be perfectly defined; that is, we know everything about it. Two linear mathematical operations are addition and subtraction.
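One common way to make "linear" precise is the superposition test: a system is linear if feeding it a weighted sum of two signals gives the same result as taking the weighted sum of its outputs for each signal alone. The Python sketch below is an illustration only; the two example systems (`difference`, built from subtraction, and `logarithm`) and the test signals are assumptions picked for this demonstration.

```python
import math


def difference(signal):
    """System built from subtraction: y[n] = x[n] - x[n-1] (x[-1] taken as 0)."""
    return [signal[n] - (signal[n - 1] if n > 0 else 0.0)
            for n in range(len(signal))]


def logarithm(signal):
    """System that takes the natural logarithm of every sample."""
    return [math.log(v) for v in signal]


def obeys_superposition(system, x1, x2, a=2.0, b=3.0):
    """Check whether system(a*x1 + b*x2) equals a*system(x1) + b*system(x2)."""
    mixed = [a * u + b * v for u, v in zip(x1, x2)]
    left = system(mixed)
    right = [a * u + b * v for u, v in zip(system(x1), system(x2))]
    return all(math.isclose(l, r, abs_tol=1e-9) for l, r in zip(left, right))


x1 = [1.0, 4.0, 2.0, 8.0]   # made-up test signals
x2 = [3.0, 1.0, 5.0, 2.0]
print(obeys_superposition(difference, x1, x2))  # True
print(obeys_superposition(logarithm, x1, x2))   # False
```

Passing the test for one pair of signals does not prove linearity in general, but failing it once is enough to show that a system is non-linear.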
Convolution with a fixed filter kernel is itself a linear operation. Even so, once a signal is convolved, it is far from trivial to reproduce the original signal. Although the inverse of convolution, deconvolution, exists, it is not discussed in this article.
In mathematics, convolution is generally denoted with the asterisk (*). This notation is what we will be using so do not confuse it with multiplication! Again, mathematically speaking, here's how the whole thing works:
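For reference, the standard definition of discrete convolution can be written as follows. The symbols are assumptions chosen to match the rest of this article: x is the input signal, h the impulse response and y the output signal. The impulse response is assumed to be indexed from -k to k, so that index 0 is its centre sample.

```latex
% 1D: impulse response of length 2k + 1
y[n] = (x * h)[n] = \sum_{i=-k}^{k} x[n - i] \, h[i]

% 2D: impulse response of size (2k + 1) \times (2k + 1)
y[n, m] = (x * h)[n, m] = \sum_{i=-k}^{k} \sum_{j=-k}^{k} x[n - i, m - j] \, h[i, j]
```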
Let's explain the above math for two-dimensional signals. First off, we need to view both the input signal and the output signal as well as the filter's impulse response as two dimensional matrices that can be accessed through respective coordinates n and m, i and j. To produce an output sample y[n, m], we multiply a sample and its surrounding samples in the input signal x with corresponding samples in the filter's impulse response h and add the products. Visually, this means that:
By the way, figure 1.4 uses the actual PSF used for edge detection from the code example provided with this article, so it is an impulse response that is really in use.
Two very important things can be seen in figure 1.4. Firstly, convolution with a filter kernel that includes negative values may produce negative output values. For instance, consider the following:
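The multiply-and-add procedure described above can be sketched in Python as follows. This is a minimal illustration, not the code shipped with the article: the function name `convolve2d`, the test image and the 3 x 3 PSF (centre 1, all eight neighbours -1/8) are assumptions, and the PSF is not necessarily the one shown in figure 1.4. To keep the sketch short, output samples are only computed where the PSF fits entirely inside the image.

```python
def convolve2d(image, psf):
    """Convolve `image` with an odd-sized `psf`; both are lists of rows.

    Only positions where the PSF fits entirely inside the image are
    computed, so the result is smaller than the input.
    """
    kr, kc = len(psf) // 2, len(psf[0]) // 2      # PSF radius per axis
    rows, cols = len(image), len(image[0])
    output = []
    for n in range(kr, rows - kr):
        out_row = []
        for m in range(kc, cols - kc):
            acc = 0.0
            for i in range(-kr, kr + 1):
                for j in range(-kc, kc + 1):
                    # y[n, m] += x[n - i, m - j] * h[i, j], with h centred at 0
                    acc += image[n - i][m - j] * psf[i + kr][j + kc]
            out_row.append(acc)
        output.append(out_row)
    return output


c = -1 / 8
edge_psf = [[c, c, c],
            [c, 1.0, c],
            [c, c, c]]           # hypothetical edge detection PSF

image = [[10, 10, 10, 10],
         [10, 10, 10, 10],
         [10, 10, 90, 90],
         [10, 10, 90, 90]]       # dark area with a bright corner

print(convolve2d(image, edge_psf))   # [[-10.0, -20.0], [-20.0, 50.0]]
```

Because this PSF sums to zero, a perfectly flat region produces 0, while samples near the brightness change produce nonzero values, some of them negative.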
The product of the above convolution for c = -1/8 is negative, which brings us to the question: what should we do if we encounter negative values? There are two feasible solutions to this problem: we could simply discard any values below 0, or we could include a correction factor and shift the output signal by adding some constant to all of its samples.
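Both ways of dealing with negative output values are easy to sketch in Python. The function names and the sample row below are invented for this illustration, and the offset of 128 (mid-grey for an 8-bit image) is an assumption; the article does not prescribe a specific constant.

```python
def clamp_negative(samples):
    """Option 1: discard negative values by replacing them with 0."""
    return [max(0.0, v) for v in samples]


def shift_by_offset(samples, offset=128.0):
    """Option 2: add the same constant to every sample.

    The default offset of 128 is an assumption made for this sketch.
    """
    return [v + offset for v in samples]


row = [-10.0, -20.0, 0.0, 50.0]     # made-up filter output
print(clamp_negative(row))          # [0.0, 0.0, 0.0, 50.0]
print(shift_by_offset(row))         # [118.0, 108.0, 128.0, 178.0]
```

Clamping throws away the information carried by the negative samples, whereas shifting preserves it at the cost of changing the overall brightness of the output.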
The other important thing to note is the change in data types. Namely, the edge detection PSF used here contains floating point values, which generally make the output sample a floating point value as well. The simplest remedy is rounding: any sample in the output signal that has a nonzero fractional part is converted back to an integer. It's as simple as that.
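As a small sketch of that conversion, the hypothetical helper below rounds a floating point sample to the nearest integer and keeps it inside the valid range. The 0..255 range assumes 8-bit greyscale data, which is an assumption of this example rather than something stated above.

```python
def to_pixel(value, max_value=255):
    """Round a floating point sample and keep it inside 0..max_value.

    Note: Python's round() resolves exact .5 ties to the nearest even
    integer; use int(value) instead to simply cut off the fraction.
    """
    rounded = round(value)
    return min(max_value, max(0, rounded))


print(to_pixel(49.6))     # 50
print(to_pixel(-3.2))     # 0
print(to_pixel(300.4))    # 255
print(int(49.6))          # 49 (cropping the fraction instead of rounding)
```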
Yet another thing worth noting is the fact that convolution is commutative (it doesn't matter whether you convolve the input signal with the filter kernel or vice versa, the outcome is the same), associative (if you have three signals, it doesn't matter which two you convolve first) as well as distributive over addition (a * (b + c) = a * b + a * c).
Convolving a 1D signal is almost identical to convolving a 2D signal (just lose the extra dimension).
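The 1D case and the three properties mentioned above can be checked with a short Python sketch. The function `convolve1d` and the signals `x`, `h` and `g` are assumptions made up for this example. It computes the "full" convolution, meaning that samples outside either signal are treated as zero, and uses integers so that the comparisons are exact.

```python
def convolve1d(a, b):
    """Full discrete convolution of two lists of numbers.

    The result has len(a) + len(b) - 1 samples; anything outside the
    signals is treated as zero.
    """
    out = [0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out


def add(a, b):
    """Sample-wise sum of two equally long signals."""
    return [p + q for p, q in zip(a, b)]


x = [1, 2, 3, 4]      # made-up input signal
h = [1, 0, -1]        # made-up kernels of equal length
g = [2, 1, 2]

print(convolve1d(x, h))                                   # [1, 2, 2, 2, -3, -4]

# Commutative: x * h == h * x
print(convolve1d(x, h) == convolve1d(h, x))               # True
# Associative: (x * h) * g == x * (h * g)
print(convolve1d(convolve1d(x, h), g)
      == convolve1d(x, convolve1d(h, g)))                 # True
# Distributive: x * (h + g) == x * h + x * g
print(convolve1d(x, add(h, g))
      == add(convolve1d(x, h), convolve1d(x, g)))         # True
```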
We will next look at how to put all of the above together to produce a functioning filter, what problems it will introduce and how to battle them.