Friday, May 19, 2017

Image Recognition (MNIST Dataset) - 98% Accuracy - Under 30 Lines of Code

Convolutional_MNIST - TensorFlow

Introduction:

In this exercise, we will use the TensorFlow library for image classification of MNIST digits. The MNIST dataset is widely used for benchmarking image classification algorithms. The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.

The main focus of this exercise is to show the power of the TensorFlow library, so I will not cover the fundamentals of convnets and neural networks in detail. Instead, I will post the URLs for anyone who wants to learn about them. For those who want to learn the TensorFlow basics, here is a URL that I found very useful: http://learningtensorflow.com/

For those who want to learn more about neural networks and convnets in general, here is the tutorials page from the TensorFlow website: https://www.tensorflow.org/tutorials/

Let's get started by importing the dataset and the required libraries.

In [1]:
import tensorflow as tf
import math
from tensorflow.contrib.learn.python.learn.datasets.mnist import read_data_sets  # MNIST download/load helper
tf.set_random_seed(0)  # make weight initialization reproducible

Read the data set

In [2]:
mnist=read_data_sets("data",one_hot=True,reshape=False,validation_size=0)
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/t10k-labels-idx1-ubyte.gz
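
If you want to verify what read_data_sets returned, a quick sanity check (a minimal sketch, assuming the download above succeeded) is to print the array shapes: with reshape=False the images keep their 28x28x1 layout, and with one_hot=True each label is a length-10 vector.

In [ ]:
# Optional sanity check on the loaded arrays
print(mnist.train.images.shape)   # (60000, 28, 28, 1) because reshape=False
print(mnist.train.labels.shape)   # (60000, 10) because one_hot=True
print(mnist.test.images.shape)    # (10000, 28, 28, 1)
print(mnist.test.labels.shape)    # (10000, 10)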

Architecture of the network we are going to build in this exercise

In [3]:
# neural network structure for this sample:
#
# · · · · · · · · · ·      (input data, 1-deep)                 X [batch, 28, 28, 1]
# @ @ @ @ @ @ @ @ @ @   -- conv. layer 5x5x1=>4 stride 1        W1 [5, 5, 1, 4]        B1 [4]
# ∶∶∶∶∶∶∶∶∶∶∶∶∶∶∶∶∶∶∶                                             Y1 [batch, 28, 28, 4]
#   @ @ @ @ @ @ @ @     -- conv. layer 5x5x4=>8 stride 2        W2 [5, 5, 4, 8]        B2 [8]
#   ∶∶∶∶∶∶∶∶∶∶∶∶∶∶∶                                             Y2 [batch, 14, 14, 8]
#     @ @ @ @ @ @       -- conv. layer 5x5x8=>12 stride 2       W3 [5, 5, 8, 12]       B3 [12]
#     ∶∶∶∶∶∶∶∶∶∶∶                                               Y3 [batch, 7, 7, 12] => reshaped to YY [batch, 7*7*12]
#      \x/x\x\x/        -- fully connected layer (relu)         W4 [7*7*12, 200]       B4 [200]
#       · · · ·                                                 Y4 [batch, 200]
#       \x/x\x/         -- fully connected layer (softmax)      W5 [200, 10]           B5 [10]
#        · · ·                                                  Y [batch, 10]
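
The spatial sizes in the diagram follow from the strides: with 'SAME' padding, a stride-s convolution produces an output of ceil(input_size / s), so the 28x28 input stays 28x28 after the stride-1 layer and shrinks to 14x14 and then 7x7 after the two stride-2 layers. A minimal sketch of that arithmetic:

In [ ]:
# Output size of a 'SAME'-padded convolution is ceil(input_size / stride)
size = 28
for stride in [1, 2, 2]:                   # strides of the three conv layers
    size = (size + stride - 1) // stride   # integer ceil(size / stride)
    print(size)                            # 28, 14, 7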

Create TensorFlow placeholders for the input images, output labels, and learning rate

For those who don't know what TensorFlow placeholders are, here is a good tutorial: http://learningtensorflow.com/lesson4/

In [4]:
X=tf.placeholder(tf.float32,[None,28,28,1]) # Placeholder for input images. 
Y_=tf.placeholder(tf.float32,[None,10]) # Placeholder for output labels
lr=tf.placeholder(tf.float32) # Learning rate placeholder
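
If placeholders are new to you, the short version is that they are graph inputs whose values you supply at run time through feed_dict. A minimal, self-contained sketch (separate from the model above; the names are just for illustration):

In [ ]:
# Minimal placeholder example: values are supplied at run time via feed_dict
a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
total = a + b
with tf.Session() as demo_sess:
    print(demo_sess.run(total, feed_dict={a: 2.0, b: 3.0}))  # 5.0
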
In [5]:
K,L,M,N=4,8,12,200 # K, L, M are the output channels of the three conv layers; N is the size of the fully connected layer

Initialize the weights and biases for our conv layers and fully connected layers

In [6]:
W1=tf.Variable(tf.truncated_normal([5,5,1,K],stddev=0.1),name="W1")
B1=tf.Variable(tf.ones([K]),name="B1")
W2=tf.Variable(tf.truncated_normal([5,5,K,L],stddev=0.1),name="W2")
B2=tf.Variable(tf.ones([L]),name="B2")
W3=tf.Variable(tf.truncated_normal([5,5,L,M],stddev=0.1),name="W3")
B3=tf.Variable(tf.ones([M]),name="B3")

# fully connected layers
W4=tf.Variable(tf.truncated_normal([7*7*M,N],stddev=0.1),name="W4")
B4=tf.Variable(tf.ones([N]),name="B4")
W5=tf.Variable(tf.truncated_normal([N,10],stddev=0.1),name="W5")
B5=tf.Variable(tf.ones([10]),name="B5")
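
To get a feel for the model size, you can count the parameters directly from the variable shapes defined above (a quick sketch; the total is dominated by the first fully connected layer):

In [ ]:
# Count trainable parameters per variable (shapes come from the definitions above)
total = 0
for v in [W1, B1, W2, B2, W3, B3, W4, B4, W5, B5]:
    n = 1
    for d in v.get_shape().as_list():
        n *= d
    print(v.name, v.get_shape().as_list(), n)
    total += n
print("total parameters:", total)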

Build the conv layers, applying the ReLU activation function, with strides of 1, 2, and 2 respectively

In [7]:
Y1=tf.nn.relu(tf.nn.conv2d(X,W1,strides=[1,1,1,1],padding='SAME',name="conv1")+B1)
Y2=tf.nn.relu(tf.nn.conv2d(Y1,W2,strides=[1,2,2,1],padding='SAME',name="conv2")+B2)
Y3=tf.nn.relu(tf.nn.conv2d(Y2,W3,strides=[1,2,2,1],padding='SAME',name="conv3")+B3)
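
You can confirm that each layer produces the shapes shown in the architecture diagram by inspecting the static shapes of the tensors (the first dimension is the unknown batch size):

In [ ]:
# Static shapes should match the diagram: 28x28x4, 14x14x8, 7x7x12
print(Y1.get_shape())   # (?, 28, 28, 4)
print(Y2.get_shape())   # (?, 14, 14, 8)
print(Y3.get_shape())   # (?, 7, 7, 12)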

Reshape the output of the last convolution layer so it can be fed into the fully connected layer

In [8]:
YY=tf.reshape(Y3,shape=[-1,7*7*M])
Y4=tf.nn.relu(tf.matmul(YY,W4,name="Fc1")+B4)

Build the final fully connected (output) layer and apply the softmax activation function

In [9]:
Ylogits=tf.matmul(Y4,W5,name="Fc2")+B5
Y=tf.nn.softmax(Ylogits)
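
If softmax is unfamiliar, it simply rescales a vector of logits into a probability distribution over the 10 classes. A toy example (the values here are made up for illustration):

In [ ]:
# Softmax turns arbitrary scores (logits) into probabilities that sum to 1
demo_logits = tf.constant([[2.0, 1.0, 0.1]])
with tf.Session() as demo_sess:
    print(demo_sess.run(tf.nn.softmax(demo_logits)))  # approx [[0.659 0.242 0.099]]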

Let's define our loss function

In [10]:
cross_entropy=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits,labels=Y_))*100
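
tf.nn.softmax_cross_entropy_with_logits computes, in a numerically stable way, the same quantity as the textbook formula -sum(Y_ * log(Y)); the *100 factor only rescales the reported loss value and does not change the optimum. A sketch of the naive equivalent, for intuition only:

In [ ]:
# Naive (numerically unstable) form of the same loss, shown only for intuition
naive_cross_entropy = -tf.reduce_mean(tf.reduce_sum(Y_ * tf.log(Y), axis=1)) * 100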

Formula to calculate the accuracy

In [11]:
correct_prediction=tf.equal(tf.argmax(Y,1),tf.argmax(Y_,1))
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
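
The accuracy formula just compares the index of the largest predicted probability with the index of the true one-hot label. A tiny illustration with made-up values:

In [ ]:
# Toy example: 2 of 3 predictions match the true class => accuracy 0.6667
demo_pred   = tf.constant([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
demo_labels = tf.constant([[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]])
demo_correct = tf.equal(tf.argmax(demo_pred, 1), tf.argmax(demo_labels, 1))
with tf.Session() as demo_sess:
    print(demo_sess.run(tf.reduce_mean(tf.cast(demo_correct, tf.float32))))  # 0.6666667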

Define the number of training steps, the batch size, and the learning rate (the epochs variable actually counts mini-batch iterations, not passes over the dataset)

In [16]:
epochs,batch,learning_rate=30000,256,0.0001 # 30,000 mini-batch steps, 256 images per batch

Define our optimizer and the training step

In [17]:
train_step=tf.train.AdamOptimizer(lr).minimize(cross_entropy)
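
The lr placeholder makes it easy to vary the learning rate during training. We feed a constant 0.0001 below, but a common refinement is to decay the rate over time. Here is a hedged sketch of how the fed value could be computed inside the training loop (the decay constants are illustrative, not tuned for this model):

In [ ]:
# Optional: exponentially decaying learning rate fed through the lr placeholder.
# max_lr, min_lr and decay_speed are illustrative values only.
max_lr, min_lr, decay_speed = 0.003, 0.0001, 2000.0
def decayed_lr(step):
    return min_lr + (max_lr - min_lr) * math.exp(-step / decay_speed)
# inside the training loop you would then feed:
#   sess.run(train_step, {X: batch_X, Y_: batch_Y, lr: decayed_lr(i)})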

Let's train the network

In [18]:
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(epochs):
    # training on batches of 256 images with 256 labels
    batch_X, batch_Y = mnist.train.next_batch(batch)
    if i%5000==0:        
        train_accuracy= sess.run(accuracy,feed_dict= {X: batch_X, Y_: batch_Y})
        print("Step %d, Training Accuracy %g"%(i,train_accuracy)) 
    sess.run(train_step, {X: batch_X, Y_: batch_Y, lr: learning_rate})
Step 0, Training Accuracy 0.113281
Step 5000, Training Accuracy 0.886719
Step 10000, Training Accuracy 0.933594
Step 15000, Training Accuracy 0.945312
Step 20000, Training Accuracy 0.980469
Step 25000, Training Accuracy 0.992188

Time to see our model's performance on the test data.

In [19]:
print(sess.run(accuracy, feed_dict={X: mnist.test.images, Y_: mnist.test.labels}))
sess.close()
0.9792
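
Evaluating all 10,000 test images in a single feed works here, but for bigger models or limited GPU memory it can be safer to evaluate the test set in smaller batches and average the results. A sketch (this would run before sess.close()):

In [ ]:
# Optional: evaluate the test set in batches of equal size and average the accuracies
test_batch = 1000
accs = []
for start in range(0, len(mnist.test.images), test_batch):
    accs.append(sess.run(accuracy,
                         feed_dict={X: mnist.test.images[start:start+test_batch],
                                    Y_: mnist.test.labels[start:start+test_batch]}))
print(sum(accs) / len(accs))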

We can see how easily we were able to achieve ~98% accuracy. It is possible to tweak the network further and get better performance, but I would not worry about that here, because the purpose of this article is to demonstrate the power of the TensorFlow library. With more training steps and some additional tuning, it should be possible to go over 99% accuracy.

Computing Options:

The dataset used here is relatively small and the network is relatively straightforward, but you will still see that it takes about an hour to train the network if you run it on your local machine. In this exercise, I used an AWS EC2 instance to train the network, and it took me less than 5 minutes. So make sure you don't waste time training your network on a local machine. There are a lot of cloud providers with free tiers that you can use for learning purposes; Google has also recently announced Google Cloud, which can be used as well.
