I used this dataset for a project in my undergraduate Neural Networks course. We had to implement a handwritten digit recognition neural net using the MNIST dataset. Once that was done, I looked around for a Devanagari dataset and found one at CVResearchNepal.com, a domain that seems to have expired at the time of writing.

If you want the whole character dataset, you can download it from Google Drive. According to the authors, it consists of 92 thousand images across 46 different classes of Devanagari characters. You can learn more about the dataset from this archive of the original website.

Since I needed only the handwritten Devanagari digits, I went ahead and converted the digits dataset to an easier-to-work-with CSV format. Feeding the CSV content into the neural network as input was a hell of a lot easier and faster than working with the image files directly.

Each image is 32×32 pixels, so the CSV file has 1025 columns in total: 1 column for the actual digit value and 1024 columns for the image pixels. The first column contains the digit that is represented by the 1024 pixel columns that follow it.
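To make the layout concrete, here is a minimal sketch of reading such a file back into labels and 32×32 images with NumPy. The function name `load_digits` is my own, and the sketch assumes the CSV has no header row, is comma-separated, and stores each image's pixels row by row; check those assumptions against the actual file before relying on it. A tiny synthetic two-row CSV stands in for the real data.

```python
import io

import numpy as np

IMAGE_SIDE = 32                      # each image is 32x32
N_PIXELS = IMAGE_SIDE * IMAGE_SIDE   # 1024 pixel columns


def load_digits(source):
    """Read a 1025-column CSV: digit label first, then 1024 pixel values."""
    # Assumption: no header row, comma-separated integer values.
    data = np.loadtxt(source, delimiter=",", dtype=np.int64, ndmin=2)
    if data.shape[1] != N_PIXELS + 1:
        raise ValueError(f"expected {N_PIXELS + 1} columns, got {data.shape[1]}")
    labels = data[:, 0]
    # Assumption: pixels were flattened row by row when the CSV was written.
    images = data[:, 1:].reshape(-1, IMAGE_SIDE, IMAGE_SIDE)
    return labels, images


# Synthetic stand-in for the real file: two fake "images" labelled 3 and 7.
rng = np.random.default_rng(0)
fake = np.hstack([np.array([[3], [7]]), rng.integers(0, 256, size=(2, N_PIXELS))])
buffer = io.StringIO()
np.savetxt(buffer, fake, fmt="%d", delimiter=",")
buffer.seek(0)

labels, images = load_digits(buffer)
print(labels.shape, images.shape)
```

With the real data you would pass the path of the downloaded CSV file to `load_digits` instead of the in-memory buffer.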

You can download the CSV file of the handwritten Nepali (Devanagari) digits from my GitHub repo here: https://github.com/sknepal/DHDD_CSV. The training set consists of 17,000 images (1,700 of each digit), whereas the testing set consists of 3,000 images.

Dataset Conversion

Here’s the code for converting the image dataset to CSV, in case you need it. (It is also available on GitHub.)
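If you only want the general idea, a conversion along these lines can be sketched as below. This is not the script from the repo, just an illustration under stated assumptions: the folder layout (one subfolder per digit, named "0" to "9", holding PNG files), the helper names `image_to_row`, `read_grayscale`, and `folder_to_csv`, and the use of Pillow to read images are all hypothetical choices of mine.

```python
import csv
from pathlib import Path

import numpy as np


def image_to_row(label, pixels):
    """Return [label, p0, ..., p1023] for one 32x32 grayscale image."""
    pixels = np.asarray(pixels)
    if pixels.shape != (32, 32):
        raise ValueError(f"expected a 32x32 image, got {pixels.shape}")
    # Flatten row by row so the label is followed by 1024 pixel values.
    return [int(label)] + pixels.reshape(-1).astype(int).tolist()


def read_grayscale(path):
    """Read an image file as a 2-D grayscale array."""
    # Assumption: Pillow is installed; any other image reader would do.
    from PIL import Image
    return np.asarray(Image.open(path).convert("L"))


def folder_to_csv(root, out_path, read_image=read_grayscale):
    """Write one CSV row per image found under root/<digit>/*.png.

    Hypothetical layout: each subfolder is named after its digit ("0".."9").
    """
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
            for image_path in sorted(folder.glob("*.png")):
                writer.writerow(image_to_row(folder.name, read_image(image_path)))


# Quick check of the row layout using a synthetic image.
row = image_to_row(5, np.arange(1024).reshape(32, 32) % 256)
print(len(row), row[:4])
```

Keeping the row-building step separate from the file reading makes it easy to swap in a different image library or folder structure.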

All credit for the dataset, obviously, goes to the Computer Vision Research Group, Nepal, specifically Shailesh Acharya, Ashok Kumar Pant, and Prashnna Kumar Gyawali. If you’d like to cite the authors, you can find the proper citation here (scroll to the end).

Continue to the next post, where I talk about the implementation.