Training Neural Networks with TensorFlow


Birds inspired us to fly, and nature has inspired many other inventions. Artificial neural networks (ANNs), which are inspired by the neurons in the human brain, are one of them. This tutorial covers ways of creating and training ANNs using TensorFlow.
Note: This tutorial assumes a conceptual understanding of ANNs.
Note: The terms ANN (artificial neural network) and NN (neural network) are generally used interchangeably in machine learning.
A DNN (deep neural network), on the other hand, is a special type of ANN (or NN) with one or more hidden layers.
We'll discuss multiple ways of training a neural network:
  1. Using TensorFlow's high-level TF.Learn API
  2. Using plain TensorFlow (low-level)
  3. Using TensorFlow's pre-built functions

1. Training a deep neural network using TF.Learn API

Code example: The DNNClassifier class makes it easy to train a deep neural network with any number of layers and neurons.
import tensorflow as tf
import numpy as np

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load data
iris = load_iris()
X = iris.data[:, (2, 3)]  # petal length and petal width
y = (iris.target == 0).astype(int)  # binary target: 1 for Iris setosa, 0 otherwise

# Split data into train (70%) and test (30%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

# Automatically find columns
feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input(X_train)

# Create and train a deep neural network classifier
# (two hidden layers of 30 and 10 neurons; n_classes=2 because y is binary)
dnn_classifier = tf.contrib.learn.DNNClassifier(hidden_units=[30, 10], n_classes=2, feature_columns=feature_columns)
dnn_classifier.fit(x=X_train, y=y_train, batch_size=50, steps=5000)

# Predict on test data
y_predicted = dnn_classifier.predict(X_test)

# Measure accuracy on test data
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, list(y_predicted)))

By default, the DNNClassifier uses the ReLU activation function, which can be changed through the activation_fn hyperparameter.
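
To make the choice of activation concrete, here is a plain-Python sketch of what ReLU computes, next to ELU as one possible alternative. The helpers relu and elu are illustrative stand-ins written for this example, not TensorFlow functions; in TensorFlow the corresponding callables are tf.nn.relu and tf.nn.elu, and choosing ELU here is only an assumption about what one might pass as activation_fn.

```python
import math

def relu(z):
    """ReLU: pass positive values through, clamp negatives to zero."""
    return max(0.0, z)

def elu(z, alpha=1.0):
    """ELU: identity for positive z, a smooth exponential curve for z <= 0."""
    return z if z > 0 else alpha * (math.exp(z) - 1.0)

# Compare both activations on a negative, a zero and a positive input.
for z in (-2.0, 0.0, 3.0):
    print(z, relu(z), round(elu(z), 4))
```

The two functions agree for positive inputs and differ only in how they treat negative ones, which is why swapping the activation is a one-argument change.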


DNNClassifier is great for quickly creating deep nets, but it offers limited flexibility. The next section covers TensorFlow's lower-level APIs, which allow much finer customization.

2. Training a deep neural net using plain TensorFlow (lower-level APIs)

We'll implement mini-batch gradient descent to train a network on the MNIST dataset.
import tensorflow as tf
import numpy as np
n_inputs = 28*28
n_outputs = 10

# Add placeholders for input data
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name='X')
y = tf.placeholder(tf.int64, shape=(None,), name='y')  # 1-D vector of class labels

# Create a utility to produce one hidden layer at a time
def neuron_layer(X, n_neurons, name, activation=None):
    with tf.name_scope(name):
        n_inputs = int(X.get_shape()[1]) # shape is (samples, features); index 1 is the feature count

        # Create a variable initializer (the particular method is discussed below)
        stddev = 2 / np.sqrt(n_inputs)
        init = tf.truncated_normal((n_inputs, n_neurons), stddev=stddev)

        # Weights, biases and output
        W = tf.Variable(init, name='weights')
        b = tf.Variable(tf.zeros([n_neurons]), name='biases')

        z = tf.matmul(X, W) + b
        
        if activation == 'relu':
            return tf.nn.relu(z)
        else:
            return z

# Create the deep neural network (DNN)
with tf.name_scope('dnn'):
    # Create a hidden layer with 300 neurons
    hidden1 = neuron_layer(X, 300, 'hidden1', activation='relu')

    # Create another hidden layer with 100 neurons
    hidden2 = neuron_layer(hidden1, 100, 'hidden2', activation='relu')

    logits = neuron_layer(hidden2, n_outputs, 'outputs')

# Define loss function (cross-entropy)
with tf.name_scope('loss'):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name='loss')

# Define the optimizer, i.e. loss minimizer (GradientDescent) for training
learning_rate = 0.01
with tf.name_scope('train'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)

# Define metric to compute while training (accuracy)
with tf.name_scope('eval'):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

# Load input data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('/tmp/data')

# Train the model
init = tf.global_variables_initializer()
saver = tf.train.Saver()

n_epochs = 10
batch_size = 100
iterations = mnist.train.num_examples // batch_size

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(iterations):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict = {X: X_batch, y: y_batch})

        # Compute accuracy
        acc_train = accuracy.eval(feed_dict = {X: mnist.train.images, y: mnist.train.labels})
        acc_test = accuracy.eval(feed_dict = {X: mnist.test.images, y: mnist.test.labels})

        print('epoch:', epoch)
        print('train accuracy:', acc_train)
        print('test accuracy:', acc_test)
        
        save_path = saver.save(sess, './dnn_model.ckpt')
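
The loss used above can be illustrated without TensorFlow. The following is a minimal sketch, in plain Python, of the quantity that sparse softmax cross-entropy computes for each sample: the negative log of the softmax probability assigned to the true class. The function name and the sample logits are made up for illustration; this is not TensorFlow's implementation.

```python
import math

def sparse_softmax_cross_entropy(logits, label):
    """Cross-entropy for one sample: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]

# Two made-up samples with three classes each.
batch_logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
batch_labels = [0, 2]

losses = [sparse_softmax_cross_entropy(l, t)
          for l, t in zip(batch_logits, batch_labels)]
mean_loss = sum(losses) / len(losses)  # the analogue of tf.reduce_mean
print(losses, mean_loss)
```

The first sample has its highest logit on the true class and gets a small loss; the second puts its highest logit on the wrong class and gets a larger one.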

A few things to note:
  • For a given training/test sample, the number of inputs to the neural network equals the number of features in that sample, i.e. the second dimension of X (X.get_shape()[1]).
  • The bias b is initialized to 0 (no symmetry issue in this case).
  • The way the weight initializer is built helps the algorithm converge much faster:
    stddev = 2 / np.sqrt(n_inputs)
    init = tf.truncated_normal((n_inputs, n_neurons), stddev=stddev)
    It is one of those small tweaks to DNNs that have had a great impact on their efficiency.
  • It's important to initialize the connection weights randomly for all hidden layers, to avoid symmetries that the gradient descent algorithm would be unable to break.
  • Large weights can slow down training. Using a truncated normal distribution (instead of a regular normal distribution) ensures that there won't be any large weights, because values more than two standard deviations from the mean are redrawn.
  • in_top_k helps determine whether the neural network's prediction is correct by checking whether the highest logit corresponds to the target class.
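
The initialization notes above can be sketched in plain Python. This is only an illustration under stated assumptions: the helpers truncated_normal and init_weights are hypothetical names written for this example, the redraw rule (reject values beyond two standard deviations) mirrors the documented behavior of tf.truncated_normal, and the layer size of 30 neurons is chosen only to keep the example fast.

```python
import math
import random

def truncated_normal(stddev, rng):
    """Draw from N(0, stddev^2), redrawing any value beyond 2 standard deviations."""
    while True:
        value = rng.gauss(0.0, stddev)
        if abs(value) <= 2.0 * stddev:
            return value

def init_weights(n_inputs, n_neurons, seed=0):
    """Build an n_inputs x n_neurons weight matrix using the stddev from the text."""
    rng = random.Random(seed)
    stddev = 2 / math.sqrt(n_inputs)
    return [[truncated_normal(stddev, rng) for _ in range(n_neurons)]
            for _ in range(n_inputs)]

W = init_weights(n_inputs=28 * 28, n_neurons=30)
largest = max(abs(w) for row in W for w in row)
bound = 2 * (2 / math.sqrt(28 * 28))  # 2 standard deviations = 4/28
print(largest, bound)
```

Every weight stays within two standard deviations of zero, so no single connection starts out with a disproportionately large value.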

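The in_top_k check described above can also be mimicked in a few lines of plain Python. This is a sketch of the idea for illustration, not TensorFlow's implementation; the function below is a hypothetical stand-in, the logits are made up, and counting ties in favor of the target is an assumption about how ties are treated.

```python
def in_top_k(logits, target, k=1):
    """True if fewer than k classes score strictly higher than the target class."""
    target_score = logits[target]
    higher = sum(1 for v in logits if v > target_score)
    return higher < k

# Three made-up samples with three classes each.
batch_logits = [[0.1, 2.3, 0.5], [1.9, 0.2, 0.4], [0.3, 0.1, 0.2]]
batch_labels = [1, 2, 0]

correct = [in_top_k(l, t) for l, t in zip(batch_logits, batch_labels)]
accuracy = sum(correct) / len(correct)  # fraction of correct predictions
print(correct, accuracy)
```

With k=1 the check reduces to asking whether the target class has the highest logit, and averaging the resulting booleans gives the accuracy.
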
This works well, but we built the hidden layers manually with the neuron_layer function, which is quite cumbersome. The next section shows a way to avoid that.

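3. Training a deep neural network using TensorFlow's pre-built fully_connected layer

Instead of constructing the layers manually, we can use the fully_connected function from tensorflow.contrib.layers. It applies ReLU by default, so the output layer passes activation_fn=None to return raw logits.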
# Create the deep neural network (DNN)
from tensorflow.contrib.layers import fully_connected

with tf.name_scope('dnn'):
    hidden1 = fully_connected(X, 300, scope='hidden1')
    hidden2 = fully_connected(hidden1, 100, scope='hidden2')

    logits = fully_connected(hidden2, n_outputs, scope='outputs', activation_fn=None)
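
As a rough sanity check on the size of this network, the sketch below counts its trainable parameters in plain Python. It assumes every fully connected layer holds one weight per input-output pair plus one bias per output neuron; count_parameters is a hypothetical helper written for this example.

```python
def count_parameters(layer_sizes):
    """Sum weights (n_in * n_out) and biases (n_out) over consecutive layer pairs."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

# Inputs, hidden1, hidden2, outputs, as used in this tutorial.
layer_sizes = [28 * 28, 300, 100, 10]
print(count_parameters(layer_sizes))  # 266610
```

Most of the parameters (235,500 of them) sit in the first hidden layer, because it connects all 784 pixels to 300 neurons.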
    

Generating predictions

with tf.Session() as sess:
    # Restore the model from the checkpoint saved during training
    saver.restore(sess, './dnn_model.ckpt')

    # Get images to predict labels
    X_new_scaled = [..] # Some images

    # Evaluate the logits and pick the class with the highest score
    Z = logits.eval(feed_dict = {X: X_new_scaled})
    y_pred = np.argmax(Z, axis=1)
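
The final step above, turning logits into class predictions, can be shown on made-up numbers without restoring a model. This is a minimal plain-Python sketch: the logits in Z are invented for illustration, and softmax and predict_class are hypothetical helpers. The softmax step is optional and only shows how logits could be turned into class probabilities.

```python
import math

def softmax(logits):
    """Convert logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(scores):
    """Index of the highest score, i.e. a row-wise argmax."""
    return max(range(len(scores)), key=lambda i: scores[i])

Z = [[0.2, 3.1, -1.0], [2.4, 0.3, 0.9]]  # made-up logits for two samples
y_pred = [predict_class(row) for row in Z]
probs = [softmax(row) for row in Z]
print(y_pred)
print(probs)
```

Because softmax preserves the ordering of the logits, taking the argmax of the logits directly gives the same class as taking the argmax of the probabilities.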