DEEP LEARNING SERIES- PART 10

This is the last article in this series. It is about another pre-trained CNN known as ResNet, along with an evaluation tool known as the confusion matrix.

ResNet

This is also known as a residual network. Its common deep variants are ResNet-50, ResNet-101, and ResNet-152, named after their number of layers. A simple technique makes this high number of layers possible.

Credit – Xiaozhu0429/ Wikimedia Commons / CC-BY-SA-4.0

The problem with stacking many layers is that the input information gets transformed by each layer, and after enough layers the original information becomes completely morphed. To prevent this, the input is re-injected (added back) every couple of layers through a skip connection, so the deeper layers do not forget the original information. Using this simple technique, networks of 100+ layers became trainable.

These are the three stem layers used at the start of the network:

  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3))

  (relu): ReLU

  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1)

These are the layers found within the bottleneck blocks of the first stage of the ResNet:

  (0): Bottleneck
    (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
    (relu): ReLU(inplace=True)
    (downsample): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))

  (1): Bottleneck
    (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
    (relu): ReLU(inplace=True)

  (2): Bottleneck
    (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
    (relu): ReLU(inplace=True)

There are many bottlenecks like these throughout the network. Through them, ResNet is able to train very deep and produce good accuracy. As a matter of fact, ResNet won the 2015 ImageNet (ILSVRC) classification challenge.

There are 4 stages in this architecture, each made of bottleneck blocks comprising convolutions followed by ReLU activations. In ResNet-50 there are 49 convolutions plus 1 fully connected layer, with a max-pooling layer at the start and an average-pooling layer at the end.

Type | No. of layers
7*7 convolution | 1
1*1, k=64 + 3*3, k=64 + 1*1, k=256 convolutions | 9
1*1, k=128 + 3*3, k=128 + 1*1, k=512 convolutions | 12
1*1, k=256 + 3*3, k=256 + 1*1, k=1024 convolutions | 18
1*1, k=512 + 3*3, k=512 + 1*1, k=2048 convolutions | 9
Fully connected | 1
Total | 50

Apart from accuracy, there is another tool used to evaluate a model, especially in research papers: the confusion matrix. It is seen in a lot of places; in the medical field it appears in test results, and its terms became popularized through RT-PCR testing for COVID.

The four terms used in a confusion matrix are true positive, true negative, false positive, and false negative.

True positive- both the truth and prediction are positive

True negative- both the truth and prediction are negative

False-positive- the truth is negative but the prediction is positive

False-negative- the truth is positive but the prediction is negative

Of these, the false negative is often the dangerous one in medical screening (a sick patient reported as healthy), so care must be taken to keep it minimal; in other applications the false positive may be the costlier error.

We have now come to the end of the series. I hope you have gained some knowledge of this field of science. Deep learning is a very interesting field, since we can do a variety of projects using an artificial brain of our own, and the technology available nowadays makes these implementations easy. So I recommend everyone to study these concepts and do projects with them. Till then,

HAPPY LEARNING!!!

DEEP LEARNING SERIES- PART 9

This article is about one of the pre-trained CNN models, known as VGG-16. The process of reusing a pre-trained CNN is known as transfer learning. In this case we need not build a CNN from scratch; we can use an existing one with a few modifications (a minimal code sketch follows the list):

  • Removing the top (input) and bottom (output) layers
  • Adding input layer with size equal to the dimension of the image
  • Adding output layer with size equal to number of classes
  • Adding additional layers (if needed)

The pre-trained model explained in this article is called VGGNet. This model was developed by Oxford University researchers as a solution to the ImageNet task. The ImageNet data consists of 1000 classes with roughly 1000 training images each, over a million images in total.

VGGNet

[Figure: VGGNet architecture, with layers numbered 1 to 13 from input (I/p) to output (O/p)]

Credit: Nshafiei / Wikimedia Commons / CC-BY-SA-4.0

This is the architecture of VGGNet. It was designed for the ImageNet dataset, a standard dataset containing 1000 classes, and was used for multiclass classification. Some modifications are made before using it for detecting OA: the output dimension is changed to 1*1*2, and the given images must be reshaped to 224*224, since that is the input size VGGNet expects. The padding, stride, number of filters, and filter dimensions were chosen by the original researchers and found to work well; in principle, other values could be used in their place.

The numbers given below the figure correspond to the layer numbers. In this counting, VGGNet has 13 layers: it is a CNN up to layer 10, and the rest are fully connected (FNN) layers.

Colour index | Name
Grey | Convolution
Red | Pooling
Blue | FFN

Computations and parameters for each layer

Input

224*224 images are converted into a vector whose dimension is 224*224*3 based on the RGB value.

Layer 1-C1

This is the first convolutional layer. Here 64 filters are used.

Wi =224, P=1, S=1, K=64, f=3*3

Wo =224 (this is the input Wi for the next layer)

Dim= 224*224*64

Parameter= 3*3*3*64= 1728 (each 3*3 filter spans the 3 input channels)

Layer 2-P1

This is the first pooling layer

 Wi =224, S=2, P=1, f=3

Wo=112 (this is the input Wi for the next layer)

Dim= 112*112*64 (pooling keeps the depth unchanged)

Parameter= 0

Layer 3-C2C3

Here two convolutions are applied. 128 filters are used.

Wi =112, P=1, S=1, K=128, f=3

Wo=112 (this is the input Wi for the next layer)

Dim= 112*112*128

Parameter= 64*3*3*128 + 128*3*3*128= 221184 (two convolutions)

Layer 4- P2

Second pooling layer

Wi =112, P=1, S=2, f=3*3

Wo =56 (this is the input Wi for the next layer)

Dim= 56*56*128

Parameter= 0

Layer 5- C4C5C6

Combination of three convolutions

Wi =56, P=1, S=1, K=256, f=3*3

Wo = 56 (this is the input Wi for the next layer)

Dim= 56*56*256

Parameter= 128*3*3*256 + 2*(256*3*3*256)= 1474560 (three convolutions)

Layer 6-P3

Third pooling layer

Wi =56, P=1, S=2, f=3*3

Wo =28 (this is the input Wi for the next layer)

Dim= 28*28*256

Parameter= 0

Layer 7-C7C8C9

Combination of three convolutions

Wi =28, P=1, S=1, K=512, f=3*3

Wo =28 (this is the input Wi for the next layer)

Dim= 28*28*512

Parameter= 256*3*3*512 + 2*(512*3*3*512)= 5898240 (three convolutions)

Layer 8-P4

Fourth pooling layer

Wi =28, P=1, S=2, f=3*3

Wo =14 (this is the input Wi for the next layer)

Dim= 14*14*512

Parameter= 0

Layer 9-C10C11C12

Last convolution layer, Combination of three convolutions

Wi =14, P=1, S=1, K=512, f=3*3

Wo =14 (this is the input Wi for the next layer)

Dim= 14*14*512

Parameter= 3*(512*3*3*512)= 7077888 (three convolutions)

Layer 10-P5

Last pooling layer and last layer in CNN

Wi =14, P=1, S=2, f=3*3

Wo =7 (this is the input Wi for the next layer)

Dim= 7*7*512

Parameter= 0 (pooling has no parameters)

Here the CNN part ends: the complex 224*224*3 input has boiled down to a compact 7*7*512 feature map.

Trends in CNN

As the layer number increases,

  1. The dimension decreases.
  2. The filter number increases.
  3. Filter dimension is constant.

In convolution

Padding of 1 and stride of 1 are used to preserve the original dimensions in the output.

In pooling

Padding of 1 and stride of 2 are used in order to halve the dimensions.

Layer 11- FF1

4096 neurons

Parameter= 512*7*7*4096=102M

Wo= 4096

Layer 12- FF2

4096 neurons

Wo= 4096

Parameter= 4096*4096 ≈ 16.8M

Output layer

2 classes

  • non-osteoarthritic
  • osteoarthritic

Parameter= 4096*2= 8192

Parameters

Layer | Parameters
Convolution layers | ~14.7M
FF1 | ~102.8M
FF2 | ~16.8M
Output | ~8K
Total | ~134M

It takes a very long time, often hours, for a machine to learn all these parameters on a CPU. Hence faster processors known as GPUs (Graphics Processing Units) are used, which can finish the work many times faster than a CPU.

HAPPY LEARNING!!

DEEP LEARNING SERIES- PART 8


The previous article was about the padding, stride, and parameters of CNN. This article is about the pooling and the procedure to build an image classifier.

Pooling

This is another aspect of CNN. There are different types of pooling: min pooling, max pooling, average pooling, etc. The sliding process is the same as in convolution, i.e. the kernel slides over the input vector, but instead of a dot product the pooling operation picks a single summary value from each region. If a 3*3 kernel is considered, it is applied over each 3*3 region inside the vector; convolution would compute the dot product there, while pooling finds a particular value and substitutes it in the output vector. The type of pooling decides which value is taken. The following table shows the operation done by each type of pooling.

Type of pooling | Value seen in the output layer
Max pooling | Maximum of all considered cells
Min pooling | Minimum of all considered cells
Avg pooling | Average of all considered cells



The considered cells are bounded within the kernel dimensions.


The pictorial representation of average pooling is shown above. The number of parameters in pooling is zero.

Convolution and pooling are the basis for feature extraction. The vector obtained from this step is fed into an FFN which then does the required task on the image.

Features of CNN

  1. Sparse connectivity
  2. Weight sharing.




    

[Figure: feature extraction (CNN) followed by the classifier (FNN)]

In general, the CNN comes first and the FFN later, but the order, number, and types of convolution and pooling layers can vary based on the complexity of the task and the choice of the user.

There are already many standard models such as VGGNet, AlexNet, GoogLeNet, and ResNet, whose architectures have been defined by researchers. We only have to reshape our images to match the input dimensions of the chosen model.

General procedure to build an image classifier using CNN (a minimal code sketch follows the list)

  1. Obtain the data in the form of image datasets.
  2. Set the output classes for the model to perform the classification on.
  3. Transform, or more specifically reshape, the images to dimensions compatible with the model. If the image size is 20*20 but the model accepts only 200*200 images, then we must reshape them to that size.
  4. Split the given data into training data and evaluation data by creating separate datasets for training and validation. More images are required for training.
  5. Define the model used for this task.
  6. Roughly sketch the architecture of the network.
  7. Determine the number of convolutions, poolings, etc. and their order.
  8. Determine the dimensions for the first layer, along with padding, stride, number of filters, and filter dimensions.
  9. Apply the formula and find the output dimensions for the next layer.
  10. Repeat the previous step till the last layer in the CNN.
  11. Determine the number of layers, the number of neurons per layer, and the parameters in the FNN.
  12. Sketch the architecture with the parameters and dimensions.
  13. Incorporate these details into the machine.
  14. Or import a predefined model. In that case the number of classes in the last FNN layer must be replaced with our number of classes (or with 1 for binary classification). This is known as transfer learning.
  15. Train the model using the training dataset and compute the loss function at periodic steps during training.
  16. Check whether the machine has performed correctly by comparing the true output with the model prediction, and hence compute the training accuracy.
  17. Test the machine with the evaluation data, verify the performance on that data, and compute the validation accuracy.
  18. If both accuracies are satisfactory, then the machine is complete.

HAPPY LEARNING!!



DEEP LEARNING SERIES- PART 7

The previous article was about the process of convolution and its implementation. This article is about the padding, stride and the parameters involved in a CNN.

We have seen that there is a reduction of dimension in the output vector. A technique known as padding is used to preserve the original dimensions in the output vector. The only change in the process is that we add a boundary of zeros around the input vector and then do the convolution.

Procedure to implement padding

  1. To get an n*n output, use an (n+2)*(n+2) input.
  2. To get 7*7 output use 9*9 input
  3. In that 9*9 input fill the first row, first column, last row and last column with zero.
  4. Now do the convolution operation on it using a filter.
  5. Observe that the output has the same dimensions as of the input.

Zeros are used because they are insignificant: they preserve the output dimension without affecting the results.

Here all the elements in the input vector have been transferred to the output. Hence using padding we can preserve the originality of the input. Padding is denoted using P. If P=1 then one layer of zeroes is added and so on.

It is not necessary that the filter or kernel be applied to every cell. The pattern of applying the kernel onto the input vector is determined by the stride, which sets the shift, or gap, between the cells where the filter is applied.

S=1 means no gap is created. The filter is applied to all the cells.

S=2 means a gap of 1: the filter is applied to alternate cells. This roughly halves the dimensions of the output vector.

This diagram shows the movement of the filter over a vector with strides of 1 and 2. With a stride of 2, alternate columns are accessed, so the number of computations per row roughly halves. Hence the output dimensions reduce when a larger stride is used.

The padding and stride are some features used in CNN.

Parameters in a convolution layer

The following are the terms needed for calculating the parameter for a convolution layer.

Input layer

Width Wi – width of input image

Height Hi – height of input image

Depth Di – 3 since they follow RGB

The output width is given by the formula Wo = (Wi - f + 2P)/S + 1. We saw that a 7*7 input without padding (P=0), with stride S=1 and a 3*3 kernel, gave a 5*5 output; this is verified by (7 - 3 + 0)/1 + 1 = 5.

The role of padding can also be verified using this calculation.

Here f is known as the filter size. Filters are square, so a single number describes them: f=1 for 1*1, f=3 for 3*3, and so on. There is another term, K, which refers to the number of kernels used; this value is fixed by the user.

These filter values are learnable parameters, just like w and b: the machine learns their ideal values for high efficiency. The significance of partial connection in CNN can be easily understood through the parameter counts.

Consider the same example of a (30*30*3) input vector. A fully connected layer with, say, 1000 neurons would already need 30*30*3*1000 = 2.7 million weights, and a full FNN for images can easily run to 100 million parameters or more. A convolutional layer with ten 3*3 kernels, by contrast, needs only 3*3*3*10 = 270 shared weights. The reason for the FNN's enormous count is its full connectivity; weight sharing is what keeps the CNN small (a small sketch of this comparison follows).

HAPPY READING!!

DEEP LEARNING SERIES- PART 6

The previous article was about the procedure to develop a deep learning network and introduction to CNN. This article concentrates on the process of convolution which is the process of taking in two images and doing a transformation to produce an output image. This process is common in mathematics and signals analysis also. The CNN’s are mainly used to work with images.

In the CNN, partial connection is observed: not all neurons are connected to those in the next layer. So the number of parameters reduces, leading to fewer computations.

[Figure: a sample of the partial connections seen in a CNN]

Convolution in mathematics refers to the process of combining two different functions. With respect to CNN, convolution occurs between the image and the filter or kernel. Convolution itself is one of the processes done on the image.

Here also the operation is mathematical. It is a kind of operation on two vectors. The input image gets converted into a vector-based on color and dimension. The kernel or filter is a predefined vector with fixed values to perform various functions onto the image.

Process of convolution

The kernel or filter is chosen with sizes of 1*1, 3*3, 5*5, 7*7, and so on. The filter slides over the image vector and performs a dot product at each position, producing an output vector that holds the result of each 3*3 dot product over the 7*7 input.

A 3*3 kernel sliding over a 7*7 input vector produces a 5*5 output vector. The reason for the reduction in dimension is that the kernel can only take dot products where it fits entirely within the input: all the cells of the kernel must superimpose onto the vector, and no cells may hang outside. There are only 5 ways to place a 3-row filter along the 7 rows of the vector.

This pictorial representation can help to understand even better. These colors might seem confusing, but follow these steps to analyze them.

  1. Look at the first row.
  2. Identify and count the different colours used in that row.
  3. Each colour represents one 3*3 kernel placement.
  4. In the first row the different colours are red, orange, light green, dark green, and blue.
  5. They count up to five.
  6. Hence there are five ways to place a 3-column filter across a 7-column vector.
  7. Repeat this analysis for all rows of placements.
  8. 25 different colours will be used in total: each row of placements has 5 positions, and there are only 5 valid row positions for a 3-row kernel, so 5*5 = 25 combinations.
  9. The colours do not go beyond the 7 rows, signifying that the kernel cannot go beyond the dimensions of the input vector.

These are the 25 different ways to place a 3*3 filter over a 7*7 image vector. From this diagram we can see that each row of placements has five different colours. All nine cells of the kernel must fit inside the vector; this is the reason for the reduction in the dimension of the output vector.

Procedure to implement convolution

  1. Take the input image with given dimensions.
  2. Convert it into a vector of pixel values. This is the input vector, whose values represent the colours of the pixels in the image.
  3. Decide the dimension, quantity, and values for the filter. The values in a filter are based on the function needed, like blurring, fading, or sharpening; the quantity and dimension are determined by the user.
  4. Take the filter and keep it over the input vector from the first cell. Assume a 3*3 filter kept over a 7*7 vector.
  5. Perform the following computations on them.

5a. Take the values in the first cell of the filter and the corresponding cell of the vector.

5b. Multiply them.

5c. Take the values in the second cell of the filter and the vector.

5d. Multiply them.

5e. Repeat the procedure till the last cell.

5f. Take the sum of all the nine products.

  6. Place this sum in the output vector.
  7. Using the formula mentioned later, find the dimensions of the output vector. (A small code sketch of this sliding dot product follows.)

HAPPY LEARNING!!

DEEP LEARNING SERIES- PART 5

The previous article was on algorithm and hyper-parameter tuning. This article is about the general steps for building a deep learning model and also the steps to improve its accuracy along with the second type of network known as CNN.

General procedure to build an AI machine

  1. Obtain the data in the form of excel sheets, CSV (comma-separated values) files, or image datasets.
  2. Perform some pre-processing onto the data like normalisation, binarisation etc. (apply principles of statistics)
  3. Split the given data into training data and testing data. Give more preference to training data, since more training data can give better accuracy. The standard train-test split ratio is 75:25.
  4. Define the class for the model. Class includes the initialisation, network architecture, regularisation, activation functions, loss function, learning algorithm and prediction.
  5. Plot the loss function and interpret the results.
  6. Compute the accuracy for both training and testing data and check onto the steps to improve it.

Steps to improve the accuracy

  1. Increase the training and testing data. More data can increase the accuracy, since the machine learns better.
  2. Reduce the learning rate. A high learning rate often disturbs the loss plot and hurts accuracy.
  3. Increase the number of iterations (epochs). Training for more epochs can increase the accuracy.
  4. Hyper-parameter tuning. One of the most efficient methods to improve the accuracy.
  5. Pre-process the data. It is hard for the machine to work on data with widely different ranges, so it is recommended to scale the data into the range 0 to 1 (see the sketch below).

These are some of the processes used to construct a network. Only basics have been provided on the concepts and it is recommended to learn more about these concepts. 

Implementation of FFN in detecting OSTEOARTHRITIS (OA)

Advancements in the detection of OA have come through AI: machines have been created to detect OA using X-ray images from patients. Since the input is in the form of images, optimum performance can be obtained using CNNs. Since the output is binary, the task is binary classification. A combination of CNN and FFN is used: the CNN handles feature extraction, i.e. converting the image into a form that is accepted by the FFN without losing information, and the FFN classifies the image into the two classes.

CNN-convolutional neural network

The convolutional neural network mainly works on image data and is used for feature extraction from the image. It is a partially connected neural network. Images can be interpreted by us but not directly by machines; machines interpret an image as a vector whose values represent the colour intensities of the image. Every colour can be expressed as a 3-D vector known as RGB (Red, Green, Blue). The size of the vector matches the dimensions of the image.

                                                  

This type of input is fed into the CNN. Several processing steps are applied to the image before it is classified. The combination of CNN and FNN together serves the purpose of image classification.

Problems in using FFN for images

  • We have seen earlier that the gradients are chain-rule products of gradients at different layers. Image data may require a large number of layers, which can result in millions of parameters, and it is very tedious to find the gradient for millions of parameters.
  • Using an FFN for image data can often overfit the data, due to the large number of layers and parameters.

The CNN can overcome the problems seen in FFN.

HAPPY LEARNING!!!

DEEP LEARNING SERIES- PART 4

The previous article dealt with the networks and the backpropagation algorithm. This article is about the mathematical implementation of the algorithm in FFN followed by an important concept called hyper-parameter tuning.

In this FFN, we apply backpropagation to find the partial derivative of the loss function with respect to w1, so as to update w1.

Hence using backpropagation the algorithm determines the update required in the parameters so as to match the predicted output with the true output. The algorithm which performs this is known as Vanilla Gradient Descent.

The way of reading the input is determined using the strategy.

Strategy | Meaning
Stochastic | Inputs are read one by one
Batch | The entire input is processed at once
Mini-batch | The input is split into small batches

The sigmoid here is one type of activation function: the function that transforms the input of a particular neuron into its output. Differentiating the activation function gives the corresponding terms in the gradients.

There are two common phenomena seen in training networks. They are

  1. Underfitting
  2. Overfitting

If the model is too simple to learn the data then the model can underfit the data. In that case, complex models and algorithms must be used.

If the model is too complex for the data, then the model can overfit. Overfitting and underfitting can be visualized by plotting the training and testing loss or accuracy curves over the iterations; a good fit shows both curves close together, while a growing gap between them signals overfitting. The method adopted to counter this is known as regularisation.

Regularisation is the procedure to prevent the overfitting of data. Indirectly, it helps in increasing the accuracy of the model. It is either done by

  1. Adding noise to the input, which makes the model less able to memorize exact values.
  2. Finding the optimum number of iterations by early stopping.
  3. Normalising the data (applying a normal distribution to the input).
  4. Forming subsets of the network and training them using dropout.

So far we have seen a lot of examples for a lot of procedures. There will be confusion arising at this point on what combination of items to use in the network for maximum optimization. There is a process known as hyper-parameter tuning. With the help of this, we can find the combination of items for maximum efficiency. The following items can be selected using this method.

  1. Network architecture
     • Number of layers
     • Number of neurons in each layer
  2. Learning algorithm
     • Vanilla Gradient Descent
     • Momentum-based GD
     • Nesterov accelerated gradient
     • AdaGrad
     • RMSProp
     • Adam
  3. Initialisation
     • Zero
     • He
     • Xavier
  4. Activation functions
     • Sigmoid
     • Tanh
     • ReLU
     • Leaky ReLU
     • Softmax
  5. Strategy
     • Batch
     • Mini-batch
     • Stochastic
  6. Regularisation
     • L2 norm
     • Early stopping
     • Addition of noise
     • Normalisation
     • Drop-out

All these six categories are essential in building a network and improving its accuracy. Hyperparameter tuning can be done in two ways:

  1. Based on the knowledge of task
  2. Random combination

The first method involves determining the items based on the knowledge of the task to be performed. For example, if classification is considered then

  • Activation function- softmax in o/p and sigmoid for rest
  • Initialisation- zero or Xavier
  • Strategy- stochastic
  • Algorithm- vanilla GD

The second method involves the random combination of these items and finding the best combination for which the loss function is minimum and accuracy is high.

Hyperparameter tuning would already be done by researchers who finally report the correct combination of items for maximum accuracy.

HAPPY READING!!!

DEEP LEARNING SERIES- PART 3

The previous article gave some introduction to the networks used in deep learning. This article provides more information on the different types of neural networks.

In a feed-forward neural network (FFN), all the neurons in one layer are connected to the next layer. The advantage is that all the information processed by the previous neurons is fed to the next layer, giving clarity to the process. But the number of weights and biases increases significantly when there are a large number of inputs. This method is best suited for text data.

In a convolutional neural network (CNN), some of the neurons are only connected to the next layer i.e. connection is partial. Batch-wise information is fed into the next layer. The advantage is that the number of parameters significantly reduces when compared to FFN. This method is best used for image data since there will be thousands of inputs.

In recurrent neural networks, the output of a neuron is fed back as an input at the next step, so both feed-forward and feedback connections are established between the neurons. The advantage is that earlier neurons can update based on feedback from later outputs. This concept is similar to reinforcement learning in the brain: the brain learns an action based on punishment or reward given as feedback to the neurons corresponding to that action.

Once the final output is computed by the network, it is compared with the original value, and their difference is taken in some form, such as the squared difference. This term is known as the loss function.

It will be better to explain the role of the learning algorithms here. The learning algorithm is the one that tries to find the relation between the input and output. In the case of neural networks, the output is indirectly related to the input, since there are hidden layers between them. The learning algorithm works to find the optimum w and b values for which the loss function is minimum, or ideally zero.

Neural networks do this using a method called backpropagation. The algorithm starts tracing from the output: it computes the updates for the parameters of the neurons in that layer, then goes back to the previous layer and does the computations for the parameters of the neurons there. This procedure continues until it reaches the inputs. In this way, the optimum values for the parameters are found.

The computations made by the algorithm depend on the type of algorithm. Most algorithms find the derivative of the loss function with respect to a parameter in some layer using backpropagation, and this derivative is then subtracted from the current value:

w ← w − lr × (∂L/∂w)

where lr is the learning rate, provided by the user. The smaller the learning rate, the better the results tend to be, but the more time is taken. The starting values for w and b are determined by the initialization.

Method | Meaning
Zero | w and b are set to zero
Xavier | w and b are inversely proportional to √n
He | w and b are inversely proportional to √(n/2)

Here n refers to the number of neurons in a layer. The choice of method depends on the activation function used.

The derivative of the loss function determines the updating of the parameters.

Value of derivative | Consequence for the parameter
-ve | Increases
0 | No change
+ve | Decreases

The derivative of the loss function with respect to the weight or bias in a particular layer can be determined using the chain rule used in calculus.

HAPPY READING!!

DEEP LEARNING- PART 2


The previous article gave a brief introduction to deep learning. This article deals with the networks used in deep learning, known as neural networks. As the name suggests, the network is made up of neurons.

The networks used in artificial intelligence are a combination of blocks arranged in layers. These blocks are called artificial neurons. They mimic the properties of natural neurons. One such neuron is the sigmoid neuron.

In general, the formula for the sigmoid function is y = 1/(1 + exp(-(wx + b))). Every neural network consists of weights and biases.

Weights- The scalar quantities which get multiplied to the input

Biases- the threshold quantity above which a neuron fires

Notation | Meaning
x | Input
y | Output
w | Weight
b | Bias

Working of a neuron

This is the simple representation of a neuron, and it is similar to the biological neuron. The inputs are given along with priorities known as weights: the higher the value of a weight, the more prioritized that input is. This is the reason our brain chooses one activity over another. An activity is done only if the neuron fires; a similar situation is seen here. The result is forwarded to the next layer only if this particular neuron fires, that is, only if an output is produced by the neuron.

Condition for the neuron to fire

The neuron will produce an output only if the inputs satisfy the condition w1x1 + w2x2 + ... + wnxn ≥ b.

As mentioned before, the bias is the threshold value and the neuron will fire only when the value crosses this bias. Thus the weighted sum for all the inputs must be greater than the bias in order to produce an output.

Classification of networks

Every neural network consists of three main layers:

  1. Input layer
  2. Hidden layer
  3. Output layer

Input layer

The input layer consists of inputs in the form of vectors. Images are converted into 1-D vectors. Input can be of any form like audio, text, video, image, etc. which get converted into vectors.

Hidden layer

This is the layer in which all the computations occur. This is generally not visible to the user hence termed as a hidden layer. This layer may be single or multiple based on the complexity of the task to be performed. Each layer processes a part of the task and it is sent to the next layer. Vectors get multiplied with the weight matrix of correct dimensions and this vector gets passed onto the next layer.

Output layer

The output layer gets information from the last hidden layer. This is the last stage in the network, and it depends upon the task given by the user. The output will be a 1-D vector. In the case of classification, the vector will have a high value for the predicted class. In the case of regression, the output vector will contain numbers representing the answers to the questions posed by the user.

The next article is about the feed-forward neural network.

HAPPY LEARNING!!

DEEP LEARNING SERIES- PART 1

Have you ever wondered how the brain works? One way of understanding it is by cutting open the brain and analyzing the structures inside it; this, however, can be done only by researchers and doctors. Another method is to use electricity to stimulate several regions of the brain. But what if I said it is possible to analyze and mimic the brain in our computers? Sounds quite interesting, right? This particular technology is known as deep learning.

Deep learning is the technique of producing networks that process unstructured data and give an output. With its help, it is possible to build and use brain-like networks for various tasks in our systems; it is like using the brain without taking it out. Deep learning is more advanced than machine learning and imitates the brain more closely, and the networks built with it consist of parts known as neurons, similar to biological neurons. Artificial intelligence has attracted researchers in every domain for the past two decades, especially in the medical field, where AI is used to detect several diseases in healthcare.

Sl. no | Name | Description | Examples
1 | Data | The type of data provided as input | Binary (0, 1); Real
2 | Task | The operation to perform on the input | Classification (binary or multi-class); Regression (prediction)
3 | Model | The mathematical relation between input and output; varies with the task and complexity | MP neuron (y = x + b); Perceptron (y = wx + b); Sigmoid or logistic (y = 1/(1 + exp(-(wx + b)))); w and b are the parameters of the model
4 | Loss function | Measures the error between the predicted and actual output (how much the o/p leads or lags the true value) | Squared error: the square of the difference between predicted and actual output
5 | Algorithm | The learning procedure that tries to reduce the error computed before | Gradient descent; NAG; AdaGrad; Adam; RMSProp
6 | Evaluation | Finding how good the model has performed | Accuracy; Mean accuracy

Every model in this deep learning can be easily understood through these six domains. Or in other words, these six domains play an important role in the construction of any model. As we require cement, sand, pebbles, and bricks to construct a house we require these six domains to construct a network.

 Now it will be more understandable to tell about the general procedure for networks.

  1. Take in the data (inputs and their corresponding outputs) from the user.
  2. Perform the task as specified by the user.
  3. Apply the relation declared by the user in the form of the model to the input, assigning values to the parameters, to compute the predicted output.
  4. Find the loss the model has made by computing the difference between the predicted and actual output.
  5. Use a suitable learning algorithm to minimize the loss by finding the optimum values for the parameters in the network.
  6. Run the model and evaluate its performance to find its efficiency, and enhance it if found lacking. (A tiny sketch of this loop follows the list.)

By following these steps correctly, one can develop their own machine. In order to learn better on this, pursuing AI either through courses or opting as a major is highly recommended. The reason is that understanding those concepts requires various divisions in mathematics like statistics, probability, calculus, vectors and matrices apart from programming. 

       

HAPPY READING!!

Everything you need to know about Artificial Intelligence (AI)

Artificial Intelligence (AI)

AI is well known for its superiority in image and speech recognition, smartphone personal assistants, map navigation, songs, movies or series recommendations, etc. The scope of AI is so much more and expandable that, it can be used in self-driving cars, health care sectors, defense sectors, and financial industries. It is predicted that the AI market will grow to a $190 billion industry by 2025 creating new job opportunities in programming, development, testing, support, and maintenance.

What is AI?

Artificial Intelligence can be described as a set of tools or software that enables a machine to mimic the perception, learning, problem-solving, and decision-making capabilities of the human mind. The ideal characteristic of artificial intelligence is its ability to rationalize and take actions that have the best chance of achieving a specific goal. The two main subsets of AI are machine learning (the ability of the machine to learn through experience) and deep learning (networks capable of learning unsupervised from data that is unstructured or unlabelled). We have to note here that, deep learning is also a subset of machine learning.

History of AI

In 1943, Warren McCulloch and Walter Pitts published "A Logical Calculus of the Ideas Immanent in Nervous Activity", which proposed the first mathematical model for building a neural network. In 1950, Alan Turing published "Computing Machinery and Intelligence", proposing what is now known as the Turing Test, a method for determining whether a machine is intelligent. A self-learning program to play checkers was developed by Arthur Samuel in 1952. In 1956, the phrase "artificial intelligence" was coined at the Dartmouth Summer Research Project on Artificial Intelligence. In 1963, John McCarthy started the AI Lab at Stanford. During 1982-83 there was a competition between Japan and the US to develop supercomputer-like performance and a platform for AI development. In 1997, IBM's Deep Blue beat world chess champion Garry Kasparov. In 2005, STANLEY, a self-driving car, won the DARPA Grand Challenge. In 2008, Google introduced speech recognition. In 2016, DeepMind's AlphaGo beat world champion Go player Lee Sedol.

How does AI work?

In 1950, Alan Turing asked, "Can machines think?" The ultimate goal of AI is to answer this very question. In the groundbreaking textbook "Artificial Intelligence: A Modern Approach", authors Stuart Russell and Peter Norvig approach this question by unifying their work around the theme of intelligent agents in machines. They put forth 4 different approaches: thinking humanly, thinking rationally, acting humanly, and acting rationally.

AI works by combining large amounts of data with fast, iterative processing and intelligent algorithms, allowing the software to learn automatically from patterns or features in the data. AI is a broad field of study that includes many theories, methods, and technologies, as well as several major subfields.

Stages of AI

There are 3 different stages of AI. The first stage is Artificial Narrow Intelligence (ANI) and as the name suggests, the scope of AI is limited and restricted to only one area. Amazon’s Alexa is one such example. The second stage is Artificial General Intelligence (AGI) which is very advanced. It covers more than one field like the power of reasoning, problem-solving, and abstract thinking. Self-driving cars come under this category. The final stage of AI is Artificial Super Intelligence (ASI) and this AI surpasses human intelligence across all fields.

Examples of AI

  • Smart assistants (like Siri and Alexa)
  • Disease mapping and prediction tools
  • Manufacturing and drone robots
  • Optimized, personalized healthcare treatment recommendations
  • Conversational bots for marketing and customer service
  • Robo-advisors for stock trading
  • Spam filters on email
  • Social media monitoring tools for dangerous content or false news
  • Song or TV show recommendations from Spotify and Netflix

Risk factors of AI

There is always a downside to technology. Though scientists assure us that machines will not show feelings such as anger or love, there are many risk factors associated with intelligent machines. An AI may be designed in such a manner that it is very difficult to turn off, and in the wrong hands things could become devastating. An AI does the job it is asked to do, but it could take dangerous paths to do so: for example, if we tell an automated car to reach the destination quickly, it may take rash and risky routes or exceed the speed limit, putting us in danger. Therefore, a key role of AI research is to develop good technology without such devastating effects.


Deep Learning AI Image Recognition

It seems like everyone these days is implementing some form of image recognition: Google, Facebook, car companies, and so on. How exactly does a machine learn what a Siberian cat looks like? That is what we will look at today.

Now, with the help of artificial intelligence, we are able to do meaningful things with all of this visual data, boosting our productivity and making our overall lives much easier.

How image recognition works

Machine learning is a subset of artificial intelligence that strives to complete specific tasks by making predictions from inputs and algorithms. If we go even deeper, we reach deep learning, a subset of machine learning which attempts to mimic our own brain's network of neurons in a machine.

Learn every day we’re getting image recognition more involved in order to help us with our personal daily lives. For example, if you see some strange-looking plant in the living room simply point google as its image and it will tell you what it is.

If your Discord friend uploads a photo of their new cat and you want to know what breed it is, just run a Google reverse image search and you will find out. Self-driving vehicles need to know where they can drive, which part is the road, where the lanes are, where they can make a turn, what the difference is between a red light and a green light, and so on.

Image recognition is a huge part of deep learning. The basic explanation is that in order for a car to know what a stop sign looks like, it must be given an image of a stop sign, which the machine will read. Through a variety of algorithms it will then study the stop sign, section by section, and analyze how the image looks: what color the stop sign is, what shape it is, what is written on it, and where it usually appears in a driver's peripheral vision.

If there are any errors, scientists can simply correct them once the image has been completely read; it can then be labeled and categorized. But why stop with one image? From our perspective, we don't really need to think for half a second about what a stop sign is and what we must do when we see it.

We have seen so many stop signs in our lives that they are pretty much embedded in our brains. The machine must read many different stop signs for better accuracy. That way it doesn't matter whether the stop sign is seen during foggy or rainy conditions, during the night, or during the day: the machine has seen a stop sign many times and can recognize one just by its shape and color alone.

If you upload and back up your photos, go and check them: if you haven't sorted anything, you will notice that Google has done it for you. There are categories for places, things, videos, and animations, and Google has sorted the photos into albums based on where it thinks they belong.

The photos are labeled as food, beaches, trains, buses, and whatever else you may have photographed in the past. This is the work of Google's image recognition analysis, which has analyzed over a million photos on the internet. It's not just Google that uses image recognition, either: if someone uploads a photo and Facebook recognizes the people in it,

it will automatically tag them. It's kind of creepy considering the privacy concern, but some people may appreciate the convenience, since it saves some time. No matter how cool or scary it is, image recognition plays a huge role in society and will continue to be developed, as many companies keep implementing image recognition and other AI technologies.

The more we can automate certain tasks with machines the more productive we can be as a society.