DEEP LEARNING SERIES - PART 6

The previous article covered the procedure for developing a deep learning network and introduced the CNN. This article concentrates on convolution, the process of combining an input image with a small filter (kernel) to produce a transformed output image. The same operation is common in mathematics and signal analysis. CNNs are mainly used to work with images.

A CNN uses partial connections: not every neuron is connected to every neuron in the next layer. This reduces the number of parameters and therefore the amount of computation.
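
To get a feel for the saving, here is a rough count for a single layer. The sizes (a 7*7 single-channel input, a 5*5 output and a 3*3 kernel) are assumptions chosen to match the example used later in this article, and biases are ignored.

```python
# Rough weight counts for one layer; sizes are illustrative assumptions.
input_pixels = 7 * 7    # 49 input values
output_units = 5 * 5    # 25 output values
kernel_cells = 3 * 3    # each output looks at only 9 inputs

# Fully connected: every output has its own weight for every input.
dense_weights = input_pixels * output_units

# Partial connection only: each output has 9 connections of its own.
local_connections = output_units * kernel_cells

# Convolution: the same 3*3 kernel is reused at every output position.
conv_weights = kernel_cells

print(dense_weights, local_connections, conv_weights)
```

The gap grows quickly with image size, because the kernel stays at nine values however large the image becomes.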

A sample of the connections seen in a CNN.

In mathematics, convolution refers to the process of combining two functions to produce a third. In a CNN, convolution takes place between the image and the filter, also called the kernel. It is one of the operations performed on the image.

Here too the operation is mathematical: it acts on two arrays of numbers. The input image is converted into a vector (strictly, a grid of numbers) based on its colour and dimensions. The kernel or filter is a small vector with fixed values, chosen to perform a particular function on the image.
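
As an illustration, a few commonly used 3*3 kernels are written out below as nested lists. The variable names are my own, and these particular kernels are examples of the idea rather than the ones used in this series.

```python
# Each kernel is a 3*3 grid of fixed values, stored as a list of rows.

# Identity: leaves the image unchanged (only the centre pixel counts).
identity = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]

# Box blur: replaces each pixel with the average of its 3*3 neighbourhood.
box_blur = [[1 / 9] * 3 for _ in range(3)]

# Sharpen: boosts the centre pixel and subtracts its four neighbours.
sharpen = [[0, -1, 0],
           [-1, 5, -1],
           [0, -1, 0]]

for name, kernel in [("identity", identity), ("box_blur", box_blur), ("sharpen", sharpen)]:
    print(name, kernel)
```

All three have values that add up to 1, so they keep the overall brightness of the image roughly the same.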

Process of convolution

The kernel or filter is usually a small square of odd size, such as 1*1, 3*3, 5*5 or 7*7. The filter slides over the image and, at each position, takes the dot product with the part of the image it covers. Each result becomes one value in the output vector. The example used here is a 3*3 filter sliding over a 7*7 image vector.
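
A single step of this sliding can be sketched as follows. The function name `patch_dot` and the sample numbers are made up for illustration; the point is only that matching cells are multiplied and the products are added.

```python
def patch_dot(patch, kernel):
    """Multiply matching cells of two equal-sized grids and sum the products."""
    return sum(
        p * k
        for patch_row, kernel_row in zip(patch, kernel)
        for p, k in zip(patch_row, kernel_row)
    )

# A 3*3 piece of an image (illustrative values) ...
patch = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

# ... and a 3*3 kernel that compares the left column with the right column.
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]

result = patch_dot(patch, kernel)
print(result)  # (1-3) + (4-6) + (7-9) = -6
```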

A 3*3 kernel slides over the 7*7 input vector to produce a 5*5 output vector. The dimension shrinks because the dot product can only be taken with a patch of the input that has the same dimension as the kernel. In other words, the kernel covers three of the seven rows at a time. The kernel must fit entirely inside the input vector: every cell of the kernel must lie on a cell of the input, with none hanging over the edge. There are only 5 ways to place a 3-row filter in a 7-row vector (starting at row 1, 2, 3, 4 or 5), and the same holds for the columns.
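
The counting argument can be written as a one-line rule. This is a small sketch that assumes the kernel moves one cell at a time and nothing is added around the border of the image; the function name `output_size` is my own.

```python
def output_size(input_size, kernel_size):
    """Number of positions where the kernel fits along one dimension.

    Assumes the kernel moves one cell at a time and must stay fully
    inside the input (no padding around the border).
    """
    if kernel_size > input_size:
        raise ValueError("kernel is larger than the input")
    return input_size - kernel_size + 1

for k in (1, 3, 5, 7):
    print(f"{k}*{k} kernel on a 7*7 input -> output side {output_size(7, k)}")
```

For the example in this article, 7 - 3 + 1 = 5, which is why the output is 5*5.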

The pictorial representation helps to understand this better. The colours might seem confusing at first, but follow these steps to analyse them.

  1. Look at the first row.
  2. Identify and count the different colours used in that row.
  3. Each colour represents one position of the 3*3 kernel.
  4. In the first row the colours are red, orange, light green, dark green and blue.
  5. That makes five.
  6. Hence there are five ways to place a filter 3 cells wide across a vector 7 cells wide.
  7. Repeat this analysis for every row on which the kernel can start.
  8. Count the positions. There are 5 positions across and, by the same argument, 5 positions down, because a 3-row kernel can only start on rows 1 to 5. That gives 5 × 5 = 25 positions, not 5 × 7 = 35: a kernel starting on row 6 or 7 would hang over the bottom edge.
  9. The colours never go beyond the 7 rows, signifying that the kernel cannot go beyond the dimensions of the input vector.

These are the 25 different ways to place a 3*3 filter over a 7*7 image vector, one for each cell of the 5*5 output. All nine cells of the kernel must fit inside the vector. This is the reason for the reduction in the dimension of the output vector.
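
A quick way to check the count is to list every position where the top-left cell of the kernel can sit. This is a sketch with a made-up helper name, `placements`, and it counts rows and columns from 0, as Python does.

```python
def placements(input_size, kernel_size):
    """List the (row, column) of the kernel's top-left cell for every valid fit."""
    last_start = input_size - kernel_size  # furthest start that still fits
    return [
        (row, col)
        for row in range(last_start + 1)
        for col in range(last_start + 1)
    ]

positions = placements(7, 3)
print(len(positions))   # 25
print(positions[0])     # (0, 0): kernel in the top-left corner
print(positions[-1])    # (4, 4): kernel in the bottom-right corner
```

A start row of 5 or 6 (the sixth or seventh row) never appears in the list, because the 3-row kernel would run past the bottom of the image.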

Procedure to implement convolution

  1. Take the input image with given dimensions.
  2. Convert it into a vector of numbers in which each value represents the colour of a pixel in the image. Keep the rows and columns of the image, because the filter has to slide over it in two dimensions.
  3. Decide the dimension, quantity and values of the filters. The values in a filter depend on the function needed, such as blurring, fading or sharpening. The quantity and dimension are chosen by the user.
  4. Place the filter over the input vector, starting from the first cell. Assume a 3*3 filter placed over a 7*7 vector.
  5. Perform the following computations on them.
     5a. Take the values in the first cell of the filter and the matching cell of the vector.
     5b. Multiply them.
     5c. Take the values in the second cell of the filter and the matching cell of the vector.
     5d. Multiply them.
     5e. Repeat the procedure till the last cell of the filter.
     5f. Take the sum of all nine products.
  6. Place this value in the output vector.
  7. Slide the filter to the next position and repeat steps 5 and 6 until every position has been covered.
  8. Using the formula mentioned later, find the dimensions of the output vector.
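
The whole procedure can be put together in a few lines of plain Python. Treat this as a minimal sketch rather than a reference implementation: the function name `convolve2d_valid` is my own, the image is kept as a grid of rows, the filter moves one cell at a time, and nothing is added around the border. Like most deep learning libraries, it does not flip the kernel before multiplying.

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image and return the grid of dot products."""
    image_rows, image_cols = len(image), len(image[0])
    kernel_rows, kernel_cols = len(kernel), len(kernel[0])

    # Number of positions where the kernel fits entirely inside the image.
    out_rows = image_rows - kernel_rows + 1
    out_cols = image_cols - kernel_cols + 1

    output = []
    for r in range(out_rows):              # top row of the current position
        output_row = []
        for c in range(out_cols):          # left column of the current position
            total = 0
            for i in range(kernel_rows):   # multiply matching cells ...
                for j in range(kernel_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            output_row.append(total)       # ... and store the sum of products
        output.append(output_row)
    return output

# A 7*7 test image whose pixel values are simply 0, 1, 2, ... 48.
image = [[row * 7 + col for col in range(7)] for row in range(7)]

# The identity kernel keeps only the centre pixel of each 3*3 patch.
identity = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]

out = convolve2d_valid(image, identity)
print(len(out), len(out[0]))  # 5 5
for output_row in out:
    print(output_row)
```

With the identity kernel the output is just the inner 5*5 part of the image, which makes it easy to see that the border pixels are the ones lost when the dimension shrinks.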

HAPPY LEARNING!!