Handwritten digit database


This training dataset is derived from the original MNIST database available at http://yann.lecun.com/exdb/mnist/


We have processed the database and provide a separate training data file for each class, 0 to 9. Use the following links to download them: right-click a link and save the target to a folder of your choice.


data0 data1 data2 data3 data4 data5 data6 data7 data8 data9


File format:

Each file has 1000 training examples. Each training example is a 28x28-pixel image. The pixels are stored as unsigned chars (1 byte each) and take values from 0 to 255, so each example occupies 28x28 = 784 bytes. The first 784 bytes of the file correspond to the first training example, the next 784 bytes correspond to the second example, and so on.
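The layout described above means that example k (counting from 0) starts at byte offset k x 784. A minimal Python sketch of reading a single example this way follows. The function name read_example and the constant names are hypothetical, and the code assumes that each run of 28 bytes is one image row, as in the original MNIST files.

```python
EXAMPLE_SIDE = 28                      # image width and height in pixels
EXAMPLE_BYTES = EXAMPLE_SIDE ** 2      # 784 bytes per example, one byte per pixel


def read_example(path, k):
    """Return example k (0-based) as a list of 28 lists of 28 ints in 0..255."""
    with open(path, "rb") as f:        # binary mode: the file holds raw bytes
        f.seek(k * EXAMPLE_BYTES)      # example k starts at byte offset k * 784
        raw = f.read(EXAMPLE_BYTES)
    if len(raw) != EXAMPLE_BYTES:
        raise ValueError(f"example {k} is missing or truncated in {path}")
    # Split the 784 bytes into 28 consecutive runs of 28 pixels each.
    # Assumption: each run is one image row (row-major order).
    return [list(raw[r * EXAMPLE_SIDE:(r + 1) * EXAMPLE_SIDE])
            for r in range(EXAMPLE_SIDE)]
```

For instance, read_example("data8", 1) would return the second example for digit 8, assuming the downloaded file was saved as data8 in the working directory.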


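For example, in Matlab, I would use the following to read the files:

fid = fopen('data8', 'r'); % open the file corresponding to digit 8

[t1, N] = fread(fid, [28 28], 'uchar'); % read the first training example into a 28x28 matrix t1

[t2, N] = fread(fid, [28 28], 'uchar'); % read the second example into t2

and so on.

To display the image, use imshow(uint8(t1)). fread returns doubles by default, and imshow expects doubles to lie in the range 0 to 1, so convert to uint8 first. fread also fills the matrix column by column, so if a digit appears transposed, display uint8(t1') instead.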


Note: Make sure you are reading the files correctly. Find a way to display the first few and the last few images in each class. In C/C++, you have to open each file in binary mode (for example, with mode "rb" in fopen).
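One possible way to run this check is sketched below in Python. It first confirms that the file is exactly 1000 x 784 = 784000 bytes, which assumes the file contains nothing beyond the 1000 examples, and then prints the first and last few images as text. The function names, the threshold of 128, and the '#' and '.' characters are illustrative choices, not part of the dataset.

```python
import os

SIDE = 28                      # image width and height in pixels
EXAMPLE_BYTES = SIDE * SIDE    # 784 bytes per example
NUM_EXAMPLES = 1000            # examples per file, as stated in the file format


def load_examples(path):
    """Read a whole class file and return a list of 784-byte examples."""
    size = os.path.getsize(path)
    expected = NUM_EXAMPLES * EXAMPLE_BYTES   # 784000 bytes
    if size != expected:
        raise ValueError(f"{path}: expected {expected} bytes, found {size}")
    with open(path, "rb") as f:               # binary mode: raw bytes, no text decoding
        raw = f.read()
    return [raw[i:i + EXAMPLE_BYTES] for i in range(0, len(raw), EXAMPLE_BYTES)]


def render(example, threshold=128):
    """Draw one example as 28 lines of text: '#' for bright pixels, '.' for dark."""
    rows = []
    for r in range(SIDE):
        row = example[r * SIDE:(r + 1) * SIDE]
        rows.append("".join("#" if p >= threshold else "." for p in row))
    return "\n".join(rows)


def show_first_and_last(path, count=3):
    """Print the first and last `count` examples of a class file."""
    examples = load_examples(path)
    indices = list(range(count)) + list(range(NUM_EXAMPLES - count, NUM_EXAMPLES))
    for idx in indices:
        print(f"example {idx} of {path}")
        print(render(examples[idx]))
        print()
```

Calling show_first_and_last("data8") should print six recognizable eights if the file is being read correctly. A wrong file size, or digits that look sheared or scrambled, usually points to a read that is misaligned or not in binary mode.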


If you have difficulty reading the files, contact sachin@jhu.edu