DEEP LEARNING SERIES- PART 9

This article covers VGG-16, one of the widely used pre-trained CNN models. Reusing a pre-trained CNN for a new task is known as transfer learning. With it, we need not build and train a CNN from scratch; instead, we can reuse the existing model with a few modifications. The modifications are:

  • Removing the original input and output layers
  • Adding an input layer whose size equals the dimensions of the image
  • Adding an output layer whose size equals the number of classes
  • Adding additional layers (if needed)

The pre-trained model explained in this article is called VGGNet. It was developed by researchers at Oxford University (the Visual Geometry Group, hence the name) as a solution to the ImageNet classification task. The ImageNet challenge data consists of 1000 classes, with roughly 1.28 million training images in total.

VGGNet

[Figure: VGGNet architecture. The blocks are numbered below the figure: I/p, 1 to 13, O/p.]

Credit: Nshafiei, neural network in Machine learning, Creative Commons Attribution-ShareAlike 4.0 License.

This is the architecture of VGGNet. It was trained on the ImageNet dataset, a standard dataset containing 1000 classes, and was used for multiclass classification. Some modifications are made before using it to detect osteoarthritis (OA). The output dimension is changed to 1*1*2, one value per class, and the input images must be resized to 224*224, since this is the input size VGGNet expects. The layer dimensions and other settings such as padding, stride, number of filters and filter size were chosen by the researchers and found to work well. In general, other values could be used when designing a network from scratch.

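The numbers below the figure correspond to the layer numbers. Drawn this way, VGGNet has 13 blocks: blocks 1 to 10 form the CNN and the rest form the FNN (feed-forward network). The 16 in VGG-16 counts the layers with weights in the published network: 13 convolution layers, grouped into blocks in the figure, and 3 fully connected layers.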

Colour index    Name
Grey            Convolution
Red             Pooling
Blue            FFN

Computations and parameters for each layer

Input

Each 224*224 image is represented as an array of dimension 224*224*3, with one channel each for the red, green and blue (RGB) values.

Layer 1-C1

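This is the first convolutional layer. Here 64 filters are used. The published VGG-16 applies a second 64-filter convolution at this stage as well, which adds 3*3*64*64 = 36,864 weights.

Wi = 224, P=1, S=1, K=64, f=3*3

Wo = (Wi - f + 2P)/S + 1 = (224 - 3 + 2)/1 + 1 = 224 (this is the input Wi for the next layer)

Dim = 224*224*64

Parameter = 3*3*3*64 = 1728 weights, since each of the 64 filters spans all 3 input channels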
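
The arithmetic for this layer can be checked with a few lines of Python. This is only a minimal sketch: the function names `conv_output_size` and `conv_weights` are illustrative, and the weight count assumes that every filter spans all input channels, with biases left out.

```python
def conv_output_size(w_in, f, padding, stride):
    """Output width of a convolution: floor((Wi - f + 2P) / S) + 1."""
    return (w_in - f + 2 * padding) // stride + 1


def conv_weights(f, channels_in, filters):
    """Weights in one convolution layer (biases not counted)."""
    return f * f * channels_in * filters


w_out = conv_output_size(224, f=3, padding=1, stride=1)
weights = conv_weights(f=3, channels_in=3, filters=64)
print(w_out, weights)  # 224 1728
```

The input depth enters the weight count because a 3*3 filter applied to an RGB image is really a 3*3*3 volume.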

Layer 2-P1

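This is the first pooling layer. VGGNet uses 2*2 max pooling with a stride of 2 and no padding.

Wi = 224, S=2, P=0, f=2*2

Wo = 112 (this is the input Wi for the next layer)

Dim = 112*112*64, since pooling changes only the width and height and keeps the depth

Parameter = 0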
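
A pooling layer can be sketched the same way. The helper names below are illustrative, and the sketch assumes that partial windows are rounded down. It tries two settings: the 2*2 window with stride 2 and no padding used in the published VGG, and a 3*3 window with padding 1 and stride 2. Both halve 224 to 112 and leave the depth alone.

```python
def pool_output_size(w_in, f, padding, stride):
    """Output width of a pooling layer, rounding down partial windows."""
    return (w_in - f + 2 * padding) // stride + 1


def pool_output_shape(shape, f, padding, stride):
    """Pooling shrinks width and height but leaves the depth untouched."""
    height, width, depth = shape
    return (pool_output_size(height, f, padding, stride),
            pool_output_size(width, f, padding, stride),
            depth)


# 2*2 window, stride 2, no padding
print(pool_output_shape((224, 224, 64), f=2, padding=0, stride=2))  # (112, 112, 64)
# 3*3 window, stride 2, padding 1: the same size once rounded down
print(pool_output_shape((224, 224, 64), f=3, padding=1, stride=2))  # (112, 112, 64)
```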

Layer 3-C2C3

Here two convolutions are applied, each with 128 filters.

Wi = 112, P=1, S=1, K=128, f=3*3

Wo = 112 (this is the input Wi for the next layer)

Dim = 112*112*128

Parameter = 3*3*64*128 + 3*3*128*128 = 73,728 + 147,456 = 221,184 weights
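
When convolutions are stacked, only the first one sees the depth coming from the previous layer; every later one sees the depth produced by the convolution before it. A small sketch of that bookkeeping follows (the name `block_weights` is illustrative and biases are not counted):

```python
def block_weights(channels_in, filters, n_convs, f=3):
    """Total weights in a stack of n_convs convolutions (biases not counted)."""
    total = 0
    for _ in range(n_convs):
        total += f * f * channels_in * filters
        channels_in = filters  # the next convolution sees `filters` channels
    return total


print(block_weights(channels_in=64, filters=128, n_convs=2))  # 221184
```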

Layer 4-P2

Second pooling layer

Wi = 112, P=0, S=2, f=2*2

Wo = 56 (this is the input Wi for the next layer)

Dim = 56*56*128

Parameter = 0

Layer 5-C4C5C6

Combination of three convolutions, each with 256 filters

Wi = 56, P=1, S=1, K=256, f=3*3

Wo = 56 (this is the input Wi for the next layer)

Dim = 56*56*256

Parameter = 3*3*128*256 + 2*(3*3*256*256) = 294,912 + 1,179,648 = 1,474,560 weights

Layer 6-P3

Third pooling layer

Wi = 56, P=0, S=2, f=2*2

Wo = 28 (this is the input Wi for the next layer)

Dim = 28*28*256

Parameter = 0

Layer 7-C7C8C9

Combination of three convolutions, each with 512 filters

Wi = 28, P=1, S=1, K=512, f=3*3

Wo = 28 (this is the input Wi for the next layer)

Dim = 28*28*512

Parameter = 3*3*256*512 + 2*(3*3*512*512) = 1,179,648 + 4,718,592 = 5,898,240 weights

Layer 8-P4

Fourth pooling layer

Wi = 28, P=0, S=2, f=2*2

Wo = 14 (this is the input Wi for the next layer)

Dim = 14*14*512

Parameter = 0

Layer 9-C10C11C12

Last convolution block, a combination of three convolutions, each with 512 filters

Wi = 14, P=1, S=1, K=512, f=3*3

Wo = 14 (this is the input Wi for the next layer)

Dim = 14*14*512

Parameter = 3*(3*3*512*512) = 7,077,888 weights

Layer 10-P5

Last pooling layer and last layer in the CNN

Wi = 14, P=0, S=2, f=2*2

Wo = 7 (this is the input Wi for the next layer)

Dim = 7*7*512

Parameter = 0

With this, the CNN part ends. A 224*224*3 input has been boiled down to 7*7*512.
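
The whole walk from the input to the end of the CNN can be traced in a short loop. This is a sketch under two assumptions: each convolution block keeps the width and height and sets the depth to its number of filters, and each pooling layer halves the width and height. The function name `trace_shapes` is illustrative.

```python
def trace_shapes(input_shape=(224, 224, 3),
                 block_filters=(64, 128, 256, 512, 512)):
    """Return the shape after every convolution block and pooling layer."""
    height, width, depth = input_shape
    shapes = []
    for filters in block_filters:
        # 3*3 convolutions with P=1, S=1 keep width and height; depth becomes K.
        depth = filters
        shapes.append(("conv", (height, width, depth)))
        # Pooling with stride 2 halves width and height; depth is unchanged.
        height, width = height // 2, width // 2
        shapes.append(("pool", (height, width, depth)))
    return shapes


for kind, shape in trace_shapes():
    print(kind, shape)
```

The ten printed rows correspond to layers 1 to 10 above, ending with a pooling output of (7, 7, 512).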

Trends in CNN

As the layer number increases,

  1. The width and height decrease.
  2. The number of filters increases (from 64 up to 512).
  3. The filter dimension stays constant (3*3).

In convolution

A padding of 1 and a stride of 1 are used with the 3*3 filters so that the output keeps the width and height of the input.

In pooling

A 2*2 window with a stride of 2 and no padding is used in order to halve the width and height.
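
Both rules follow from the usual output-size formula, written out here as a worked check. The symbols are the ones used in the layer listings, and the pooling line assumes the 2*2, stride 2, no padding setting of the published VGG:

```latex
\begin{align*}
W_o &= \left\lfloor \frac{W_i - f + 2P}{S} \right\rfloor + 1 \\
\text{convolution } (f=3,\ P=1,\ S=1):\quad W_o &= \frac{W_i - 3 + 2}{1} + 1 = W_i \\
\text{pooling } (f=2,\ P=0,\ S=2):\quad W_o &= \frac{W_i - 2}{2} + 1 = \frac{W_i}{2}
\end{align*}
```

For an even Wi, a 3*3 pooling window with P=1 and S=2 gives the same result once the division is rounded down.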

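Layer 11-FF1

4096 neurons

Parameter = 512*7*7*4096 = 102,760,448, about 102.8M (the 7*7*512 output of the CNN is flattened into 25,088 values, each connected to all 4096 neurons)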

Wo= 4096

Layer 12-FF2

4096 neurons

Wo = 4096

Parameter = 4096*4096 = 16,777,216, about 16.8M

Output layer

2 classes

  • non-osteoarthritic
  • osteoarthritic

Parameter= 4096*2= 8192

Parameters

Layer          Number of parameters
Convolution    about 14.7M
FF1            about 102.8M
FF2            about 16.8M
Total          about 134M
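
The totals can be reproduced with a short script. This is a sketch rather than the author's own calculation. It assumes the published VGG-16 layout (2, 2, 3, 3, 3 convolutions in the five blocks) and it counts one bias per filter or neuron, so the figures come out slightly above the weight-only counts. All names in it are illustrative.

```python
def conv_params(channels_in, filters, f=3):
    """Weights plus one bias per filter for a single convolution."""
    return f * f * channels_in * filters + filters


def dense_params(inputs, outputs):
    """Weights plus one bias per neuron for a fully connected layer."""
    return inputs * outputs + outputs


# Assumption: the published VGG-16 layout, as (number of convolutions, filters).
BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

conv_total = 0
channels = 3  # RGB input
for n_convs, filters in BLOCKS:
    for _ in range(n_convs):
        conv_total += conv_params(channels, filters)
        channels = filters

ff1 = dense_params(7 * 7 * 512, 4096)
ff2 = dense_params(4096, 4096)
output = dense_params(4096, 2)  # two classes: non-osteoarthritic, osteoarthritic
total = conv_total + ff1 + ff2 + output

print(f"Convolution: {conv_total:,}")  # 14,714,688
print(f"FF1:         {ff1:,}")         # 102,764,544
print(f"FF2:         {ff2:,}")         # 16,781,312
print(f"Output:      {output:,}")      # 8,194
print(f"Total:       {total:,}")       # 134,268,738
```

More than three quarters of the parameters sit in FF1 alone, which is why the fully connected part dominates the size of the model.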

It takes a very long time, often hours, for a machine running on a CPU to learn all these parameters. Hence faster processors known as GPUs (Graphics Processing Units) are used to speed up training; they may finish the work up to 85% faster than a CPU.

HAPPY LEARNING!!