I used this dataset while doing a project for my undergraduate Neural Networks coursework. We had to implement a neural network that recognizes handwritten digits, using the MNIST dataset. After finishing it, I looked around for a Devanagari dataset and found one hosted at CVResearchNepal.com, whose domain seems to have expired as of this writing.
If you want the whole character dataset, you can download it from Google Drive. According to the authors, it consists of 92 thousand images covering 46 different classes of Devanagari characters. You can learn more about the dataset from this archive of the original website.
Since I needed only the handwritten Devanagari digits, I went ahead and converted the digits dataset into an easier-to-work-with CSV format. It was a hell of a lot easier and faster to feed the CSV content into the neural network as input.
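As a rough illustration of what feeding the CSV rows into a network can involve, here is a minimal NumPy sketch that scales the raw pixel values to the 0 to 1 range and one-hot encodes the digit labels. This is a common approach, not necessarily the preprocessing used in this project. It assumes 8-bit grayscale pixels (0 to 255) and ten digit classes; the function name `preprocess` is hypothetical.

```python
import numpy as np


def preprocess(labels, pixels, num_classes=10):
    """Hypothetical helper: scale pixels to [0, 1] and one-hot encode labels.

    labels: shape (n,), integer digits 0 to num_classes - 1
    pixels: shape (n, 1024), raw grayscale values assumed to be 0 to 255
    """
    # Dividing by 255 maps the assumed 8-bit range onto [0, 1].
    x = np.asarray(pixels, dtype=np.float32) / 255.0

    labels = np.asarray(labels, dtype=np.int64)
    y = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    # Put a single 1.0 in each row, at the column given by that row's digit.
    y[np.arange(labels.shape[0]), labels] = 1.0
    return x, y
```

For example, a row labelled 3 becomes the target vector `[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]`, which matches a 10-unit softmax output layer.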
Each image is 32×32 pixels, so the CSV file has 1025 columns in total: 1024 columns for the image pixels and 1 column for the actual digit value. The first column contains the digit represented by the next 1024 pixels/columns.
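Given that layout, splitting a row back into its label and its image is just slicing and reshaping. A minimal sketch, assuming pandas and NumPy, a CSV without a header row (as written by the conversion code below), and row-major pixel order (what NumPy's `flatten()` produces); both function names are hypothetical.

```python
import numpy as np
import pandas as pd


def split_labels_and_images(data):
    """Split rows of [digit, p1, ..., p1024] into labels and 32x32 images."""
    data = np.asarray(data)
    if data.ndim != 2 or data.shape[1] != 1025:
        raise ValueError(f"expected rows of 1025 values, got shape {data.shape}")
    labels = data[:, 0].astype(np.int64)      # first column: the digit itself
    images = data[:, 1:].reshape(-1, 32, 32)  # next 1024 columns: pixels, row by row
    return labels, images


def load_digits_csv(path):
    """Read a header-less CSV laid out as described above."""
    data = pd.read_csv(path, header=None).to_numpy()
    return split_labels_and_images(data)
```

If you feed the pixels to a fully connected network, you can skip the reshape and use `data[:, 1:]` directly as the 1024-value input vector.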
You can download the CSV file of the handwritten Nepali or Devanagari digits from my GitHub repo here: https://github.com/sknepal/DHDD_CSV. The training set consists of 17000 images (1700 images of each digit), whereas the testing set consists of 3000 images.
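To check that a downloaded copy matches these numbers, counting the labels per digit is enough. A minimal sketch, assuming the labels have already been read from the first column of the CSV; `split_summary` is a hypothetical helper name.

```python
import numpy as np


def split_summary(labels, num_classes=10):
    """Return (total number of examples, examples per digit) for labels 0-9."""
    labels = np.asarray(labels, dtype=np.int64)
    # counts[d] is the number of rows whose label is digit d.
    counts = np.bincount(labels, minlength=num_classes)
    return int(labels.size), counts
```

For example, `pd.read_csv(path, header=None)[0]` gives the label column of a header-less CSV; for the training set, you would expect a total of 17000 with 1700 in every entry of `counts`.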
Dataset Conversion
Here’s the code for converting the image dataset to CSV, in case you need it. (It’s also available on GitHub.)
```python
import os

import numpy as np
import pandas as pd
from PIL import Image  # scipy.misc.imread was removed in SciPy 1.2; Pillow reads the images instead

root = './train'      # or './test', depending on which CSV is being created
output = 'train.csv'  # or 'test.csv'

rows = []
# go through each directory in the root folder given above
for directory, subdirectories, files in os.walk(root):
    # go through each file in that directory
    for file in files:
        # read the image file as single-channel grayscale and flatten its 32x32 pixels
        im = np.asarray(Image.open(os.path.join(directory, file)).convert('L'))
        value = im.flatten()
        # I renamed the folders containing digits to the contained digit itself.
        # For example, the digit_0 folder was renamed to 0, so the folder name
        # (e.g. "./train/8" -> "8") gives the digit, which goes into the first column.
        label = int(os.path.basename(directory))
        rows.append(np.hstack(([label], value)))

df = pd.DataFrame(rows)
df = df.sample(frac=1)  # shuffle the whole dataset once, after all rows are collected
df.to_csv(output, header=False, index=False)
```
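Before training on the generated file, it can be worth sanity-checking the rows, since a mistake in parsing the folder name or reading the images would silently produce bad labels or pixels. A minimal sketch, assuming the CSV has been loaded into a 2-D array (for example with `pd.read_csv(path, header=None).to_numpy()`) and 8-bit grayscale pixels; `validate_rows` is a hypothetical helper name.

```python
import numpy as np


def validate_rows(data):
    """Check converted rows for common mistakes; an empty list means no problems found."""
    data = np.asarray(data)
    problems = []
    if data.ndim != 2 or data.shape[1] != 1025:
        problems.append(f"expected 1025 columns, got shape {data.shape}")
        return problems
    labels, pixels = data[:, 0], data[:, 1:]
    # NaN labels (e.g. an empty first column) also fail this check,
    # because any comparison with NaN is False.
    if not np.all((labels >= 0) & (labels <= 9)):
        problems.append("labels outside 0-9 (check how the folder name is parsed)")
    if not np.all((pixels >= 0) & (pixels <= 255)):
        problems.append("pixel values outside the assumed 0-255 range")
    return problems
```

Returning a list of messages rather than raising on the first failure lets you see every problem in one pass.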
All credit for the dataset, obviously, goes to the Computer Vision Research Group, Nepal, specifically Shailesh Acharya, Ashok Kumar Pant, and Prashnna Kumar Gyawali. If you’d like to cite the authors, you can find the proper citation mentioned here (scroll to the end).
Continue to the next post, where I talk about the implementation.
Subigya Nepal (@SkNepal)
November 1, 2016 @ 2:40 pm
#Devanagari Handwritten Character Dataset: https://t.co/ZGYacEQJnu
Srijal
November 11, 2016 @ 3:55 pm
Interesting. I am a recent graduate and a beginner in Python, and I have found machine learning to be quite interesting.
Your project looks very nice. I was wondering about a few things:
1. What classification accuracy did you achieve?
2. Which modules did you use in the project?
Also, any tips on how I could get better at machine learning or data science would be really appreciated. :)
Subigya Nepal
December 14, 2016 @ 5:11 pm
Hi,
Sorry for the delay in response.
I’m a beginner too. You can read my latest post on the implementation here: https://www.thelacunablog.com/devnagari-digit-recognition-neural-network.html
Hopefully, you’ll find your answers there. :)