Neural Network was one of the electives available during my 5th semester, and once I knew it was on offer, I was certain that I wanted to study it over any other elective. But, unfortunately (or fortunately), my college, like many other colleges in Nepal, made Cryptography the compulsory elective. I obviously wanted to study neural networks, and I could not convince the college to teach ANN instead of Cryptography. I was left with no choice but to study both, which I did, along with a few classmates. Since the teacher was not available on weekdays, we took classes every weekend for 3 hours each. So there were no holidays for us for about 4 months, I think.
When it came time to study, I enjoyed Cryptography more than I enjoyed Neural Network. I’d say this was partly due to the course content and partly due to the teacher’s ability to make the class interesting. Anyway, as the teacher put it, we had to implement the “hello world” problem of neural networks as a class project, meaning handwritten digit recognition using the MNIST dataset. I followed a great practical introductory book that helped me understand and implement it, which I will share in just a moment.
Having tried out the MNIST dataset, I decided I should give Nepali handwritten digit recognition a shot too. Looking around, I found a dataset provided by Computer Vision Research Group, Nepal. It contains both the Devanagari letters and digits. I chose to work on just the digits. Since reading the images one by one and doing the computation would have been restrictive, I extracted the pixels from all the images and stored them in a CSV file. That way, it's easier to feed the pixels into the neural net, and also to share the dataset with enthusiastic readers who want to try it out themselves.
All the digit images in the original dataset were used. In total, 1700 examples of each digit were used to create the training set and 300 examples of each digit were used to create the testing set.
In the CSV file, the first value in each row is the label of the image, i.e. the digit it represents, and the remaining values are its pixels. All the images in the dataset are 32×32, so there are 1025 values (or columns) for each image: 1 label followed by 1024 pixels. The new dataset was then shuffled so that examples of the same digit do not appear one after another, which could lead to inefficient learning.
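To make the row layout concrete, here is a minimal sketch of how one row could be split into its label and pixels. It is not the code from my repository: the sample row is made up, the `parse_row` name is hypothetical, and it assumes the pixels are stored as 0 to 255 grayscale values. The rescaling to the 0.01 to 1.0 range is also an assumption, chosen so that no input is exactly zero.

```python
import csv
import io

IMAGE_SIDE = 32
PIXELS_PER_IMAGE = IMAGE_SIDE * IMAGE_SIDE  # 1024 pixel columns per row

# Made-up stand-in for one line of the CSV: a label, then 1024 pixel values.
# With the real file you would pass an open file object to csv.reader instead.
sample_row = ["7"] + ["0"] * 1000 + ["255"] * 24
sample_csv = io.StringIO(",".join(sample_row) + "\n")


def parse_row(row):
    """Split one CSV row into (label, list of scaled pixel values)."""
    if len(row) != PIXELS_PER_IMAGE + 1:
        raise ValueError(f"expected {PIXELS_PER_IMAGE + 1} values, got {len(row)}")
    label = int(row[0])
    # Assumption: pixels are 0..255; map them into 0.01..1.0.
    pixels = [int(value) / 255.0 * 0.99 + 0.01 for value in row[1:]]
    return label, pixels


for row in csv.reader(sample_csv):
    label, pixels = parse_row(row)
    print(label, len(pixels))
```

The length check is there because a single malformed row would otherwise silently shift every pixel by one position.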
Dataset
You can learn more about the dataset and download it from here.
Neural Network
Now that we have pre-processed the existing raw images in order to create a dataset that is easy to work with, we have a training set and a testing set.
Our neural network will consist of 3 layers: an input layer, a hidden layer and an output layer. The number of input and output nodes is governed by our data. Since we have 1024 pixel values for each image, there will be 1024 input nodes. Similarly, since our output needs to be one of the 10 digits (0 to 9), we will have 10 output nodes. Whichever output node has the highest value will be the output of the network. However, the science of sizing the hidden layer is not quite clear, and what is done here is a rather simple guess at the number of hidden nodes. It is expected to be smaller than the input layer, and a value of 300 has been chosen. The learning rate, 0.3, is also chosen without any scientific reason whatsoever.
By Offnfopt [CC BY-SA 3.0]
Signals are transferred from the input layer to the hidden layer by multiplying the matrix of weight values by the vector of inputs. The initial weights are chosen randomly between -0.5 and +0.5. The output of this multiplication is the combined moderated signal into the hidden layer.
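As a rough illustration of this step, the sketch below builds the two weight matrices with the layer sizes used in this post and pushes one input through the first of them. The variable names, the random seed and the random stand-in image are my own assumptions for the example.

```python
import numpy as np

INPUT_NODES, HIDDEN_NODES, OUTPUT_NODES = 1024, 300, 10

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# One row per receiving node, one column per sending node,
# with every weight drawn uniformly from [-0.5, 0.5).
w_input_hidden = rng.uniform(-0.5, 0.5, size=(HIDDEN_NODES, INPUT_NODES))
w_hidden_output = rng.uniform(-0.5, 0.5, size=(OUTPUT_NODES, HIDDEN_NODES))

# Stand-in for one image: 1024 scaled pixel values as a column vector.
inputs = rng.uniform(0.01, 1.0, size=(INPUT_NODES, 1))

# Combined moderated signal arriving at each of the 300 hidden nodes.
hidden_inputs = w_input_hidden @ inputs
print(hidden_inputs.shape)  # (300, 1)
```

Storing the weights as (receiving nodes × sending nodes) means a whole layer is handled by a single matrix product instead of a loop over links.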
We have used the sigmoid function as the activation function. The combined moderated signal is passed to the sigmoid function, $1 / (1 + e^{-x})$, which squashes the input into the range 0 to 1 and gives the output of the layer. The same process is repeated between the hidden and output layers.
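Putting the two layers together, a forward pass might look like the sketch below. The `forward` helper, the seed and the random stand-in image are assumptions made for illustration. The predicted digit is simply the index of the output node with the highest value.

```python
import numpy as np


def sigmoid(x):
    """Squash any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def forward(inputs, w_input_hidden, w_hidden_output):
    """Return the hidden-layer and output-layer activations for one input."""
    hidden_outputs = sigmoid(w_input_hidden @ inputs)
    final_outputs = sigmoid(w_hidden_output @ hidden_outputs)
    return hidden_outputs, final_outputs


rng = np.random.default_rng(1)  # arbitrary seed
w_ih = rng.uniform(-0.5, 0.5, size=(300, 1024))
w_ho = rng.uniform(-0.5, 0.5, size=(10, 300))
image = rng.uniform(0.01, 1.0, size=(1024, 1))  # stand-in for real pixels

hidden_outputs, final_outputs = forward(image, w_ih, w_ho)
predicted_digit = int(np.argmax(final_outputs))  # node with the highest value
print(predicted_digit)
```

With untrained random weights the prediction is of course meaningless; the point is only the shape of the computation.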
The final output of the network is compared with the expected output in order to calculate the error. We can compute the error in the output layer by simply subtracting its output from the expected output. But to learn the error in the hidden layer, we have to propagate the error backwards, i.e. we need to split the output errors according to the weights of the contributing links. This can be obtained by multiplying the transpose of the hidden-to-output weight matrix by the error of the output layer.
Once we have the errors, we need to update the link weights accordingly. We use the gradient descent algorithm to refine the weights so that we minimize our cost function.
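Here is a toy-sized sketch of that error calculation, with 3 hidden nodes and 2 output nodes so the numbers can be checked by hand. All the values are made up, and the 0.99 and 0.01 targets are an assumption (they keep the targets inside the range a sigmoid can actually reach). The weight matrix is transposed so that each hidden node collects the errors of the output nodes it feeds.

```python
import numpy as np

# Toy hidden-to-output weights: 2 output nodes (rows), 3 hidden nodes (columns).
w_hidden_output = np.array([[0.2, -0.4, 0.1],
                            [0.5, 0.3, -0.2]])

targets = np.array([[0.99], [0.01]])        # expected output (made up)
final_outputs = np.array([[0.60], [0.30]])  # what the network produced (made up)

# Output-layer error: expected minus actual.
output_errors = targets - final_outputs

# Hidden-layer error: output errors split back along the link weights.
hidden_errors = w_hidden_output.T @ output_errors

print(output_errors.ravel())
print(hidden_errors.ravel())
```

For example, the first hidden node receives 0.2 × 0.39 + 0.5 × (-0.29) = -0.067.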
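A single training step could be sketched as follows, in the style of the update rule from Tariq Rashid's book. It assumes a squared-error cost and sigmoid activations, for which the slope of the sigmoid is output × (1 - output). The hidden-layer update uses the weight-split error described above. The `train_step` name and the tiny 4-3-2 network are assumptions to keep the example small.

```python
import numpy as np

LEARNING_RATE = 0.3


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_step(inputs, targets, w_ih, w_ho, lr=LEARNING_RATE):
    """Update both weight matrices in place; return the squared error before the update."""
    hidden_outputs = sigmoid(w_ih @ inputs)
    final_outputs = sigmoid(w_ho @ hidden_outputs)

    output_errors = targets - final_outputs
    hidden_errors = w_ho.T @ output_errors  # computed before w_ho changes

    # Step size = learning rate * error * sigmoid slope * incoming signal.
    w_ho += lr * (output_errors * final_outputs * (1.0 - final_outputs)) @ hidden_outputs.T
    w_ih += lr * (hidden_errors * hidden_outputs * (1.0 - hidden_outputs)) @ inputs.T
    return float(np.sum(output_errors ** 2))


rng = np.random.default_rng(2)  # arbitrary seed
w_ih = rng.uniform(-0.5, 0.5, size=(3, 4))  # tiny network: 4 inputs, 3 hidden
w_ho = rng.uniform(-0.5, 0.5, size=(2, 3))  # 2 outputs
inputs = rng.uniform(0.01, 1.0, size=(4, 1))
targets = np.array([[0.99], [0.01]])

errors = [train_step(inputs, targets, w_ih, w_ho) for _ in range(200)]
print(errors[0], errors[-1])
```

Repeating the step on the same example should drive the error down, which is a quick sanity check that the signs in the update are right.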
Output
An accuracy of about 95%.
Code
Available on my GitHub: https://github.com/sknepal/devanagari_digit_recognition
If all of this seemed foreign to you, I strongly recommend reading the book Make Your Own Neural Network by Tariq Rashid. The Python implementation of the neural network on GitHub is based on the book.
Next Challenge
Breaking a simple text captcha.