Keras is a compact, easy-to-learn, high-level Python library for deep learning that runs on top of TensorFlow (or Theano or CNTK). It lets developers focus on the main concepts of deep learning, such as creating layers for neural networks, while it takes care of the nitty-gritty details of tensors, their shapes, and the underlying mathematics. Keras always needs one of these libraries as its back end, but you can build deep learning applications with it without interacting directly with the relatively complex TensorFlow (or Theano or CNTK). Keras offers two major kinds of API: the sequential API and the functional API. The sequential API is based on the idea of a sequence of layers; it is the most common and the easiest way to use Keras. A sequential model can be thought of as a linear stack of layers.

In short, you create a sequential model to which you can easily add layers, and each layer can involve convolution, max pooling, activation, dropout, and batch normalization. Let’s go through the major steps to develop deep learning models in Keras.

Major Steps to Deep Learning Models

The four core parts of deep learning models in Keras are as follows:

  1. Define the model. Here you create a sequential model and add layers. Each layer can contain one or more convolution, pooling, batch normalization, and activation functions.

  2. Compile the model. Here you specify the loss function and the optimizer when calling the compile() function on the model.

  3. Fit the model with training data. Here you train the model on the training data by calling the fit() function on the model.

  4. Make predictions. Here you use the model to score or generate predictions on new data by calling functions such as evaluate() and predict().
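The four parts above can be sketched end to end on a tiny synthetic data set. This is a minimal sketch, not the book's listing: the data, layer sizes, activations, loss, and optimizer are assumptions chosen for illustration, and it assumes Keras is available through the `tensorflow.keras` import path.

```python
import numpy as np
from tensorflow import keras

# Hypothetical stand-in data: 200 samples, 8 features, 3 classes.
rng = np.random.default_rng(0)
x_train = rng.random((200, 8)).astype("float32")
y_train = keras.utils.to_categorical(rng.integers(0, 3, size=200), num_classes=3)

# 1. Define the model as a linear stack of layers.
model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),
])

# 2. Compile the model: choose a loss, an optimizer, and metrics.
model.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])

# 3. Fit the model on the training data.
history = model.fit(x_train, y_train, epochs=2, batch_size=50, verbose=0)

# 4. Make predictions on new data.
x_new = rng.random((5, 8)).astype("float32")
probabilities = model.predict(x_new, verbose=0)   # one row of class scores per sample
predicted_labels = probabilities.argmax(axis=1)   # index of the highest score
```

Because the data here is random, the predictions themselves are meaningless; the point is only the order of the calls.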

There are eight steps to the deep learning process in Keras:

  1. Load the data.

  2. Preprocess the data.

  3. Define the model.

  4. Compile the model.

  5. Fit the model.

  6. Evaluate the model.

  7. Make the predictions.

  8. Save the model.

Load Data

Here is how you load the data:

figure a
figure b

Preprocess the Data

Here is how you preprocess the data:

figure c
figure d
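As an illustration of what preprocessing typically involves for image data of this shape, here is a minimal NumPy sketch. It is an assumption about common practice, not a reproduction of the listing above: the random images and the label values are hypothetical stand-ins for a real data set.

```python
import numpy as np

# Hypothetical stand-in for loaded data: 4 RGB images of 32x32 pixels
# with integer class labels in the range 0..9.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
labels = np.array([0, 3, 9, 3])

# Flatten each image to a vector of 32 * 32 * 3 = 3072 features
# and scale the pixel values from 0..255 to 0..1.
x = images.reshape(len(images), -1).astype("float32") / 255.0

# One-hot encode the labels: one column per class, a single 1 per row.
num_classes = 10
y = np.eye(num_classes, dtype="float32")[labels]

print(x.shape, y.shape)  # (4, 3072) (4, 10)
```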

Define the Model

Sequential models in Keras are defined as a sequence of layers. You create a sequential model and then add layers. You need to ensure the input layer has the right number of inputs. Assume that you have 3,072 input variables. In this example, the first hidden layer has 512 nodes/neurons, the second hidden layer has 120 nodes/neurons, and the output layer has ten nodes. For example, an image maps onto ten nodes that show the probability of being label1 (airplane), label2 (automobile), label3 (bird), …, label10 (truck). The node with the highest probability is the predicted class/label.

figure e

One image has three channels (RGB), and in each channel, the image has 32×32 = 1,024 pixels. So, each image has 3×1,024 = 3,072 values (features/X/inputs).

With the help of 3,072 features, you need to predict the probability of label1, label2, and so on. This means the model predicts ten outputs (label1 through label10), where each output represents the probability of the corresponding label. The last activation function (sigmoid, as shown earlier) gives each output a value between 0 and 1; ideally, nine outputs are close to 0 and one is close to 1. The label with the highest output is the predicted class for the image (Figure 2-1).

For example, 3,072 features ➤ 512 nodes ➤ 120 nodes ➤ 10 nodes.

Figure 2-1. Defining the model
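To get a feel for the size of the 3,072 ➤ 512 ➤ 120 ➤ 10 network, the number of trainable parameters can be worked out by hand. The sketch below assumes every layer is fully connected with one bias per node, so a layer with n_in inputs and n_out nodes has n_in × n_out weights plus n_out biases.

```python
# Layer widths: input features, two hidden layers, output layer.
layer_sizes = [3072, 512, 120, 10]

# Weights plus biases for each pair of consecutive layers.
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer)  # [1573376, 61560, 1210]
print(total_params)      # 1636146
```

Almost all of the parameters sit in the first layer, which is typical when a wide input is fed straight into a fully connected layer.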

The next question is, how do you know the number of layers to use and their types? No one has the exact answer. In practice, you choose the number of layers, and the parameters of each layer, that give the best evaluation metrics. Heuristics are also used. The best network structure is found through a process of trial-and-error experimentation. Generally, you need a network large enough to capture the structure of the problem.

In this example, you will use a fully connected network structure with three layers. Fully connected layers are defined using the Dense class.

In this case, you initialize the network weights to small random numbers drawn from a uniform distribution (uniform), by default between -0.05 and 0.05 in Keras. Another traditional alternative would be normal, for small random numbers drawn from a Gaussian distribution. With a sigmoid on the output layer, each output lies between 0 and 1, so it is easy to read it as a probability or snap it to a hard classification with a default threshold of 0.5. You can piece it all together by adding each layer.
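A minimal sketch of how such a three-layer network might be defined, assuming the `tensorflow.keras` import path. The layer widths, the uniform initializer, and the sigmoid output follow the text; the ReLU activations on the hidden layers are an assumption, since the original listing is not reproduced here.

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(3072,)),  # 32 x 32 x 3 flattened features
    keras.layers.Dense(512, activation="relu", kernel_initializer="random_uniform"),
    keras.layers.Dense(120, activation="relu", kernel_initializer="random_uniform"),
    # One output node per label; sigmoid keeps each output between 0 and 1.
    keras.layers.Dense(10, activation="sigmoid", kernel_initializer="random_uniform"),
])

print(model.output_shape)  # (None, 10)
```

For mutually exclusive classes, a softmax output is a common alternative to sigmoid because its outputs sum to 1.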

Compile the Model

Having defined the model in terms of layers, you need to declare the loss function, the optimizer, and the evaluation metrics. When the model is first defined, the initial weight and bias values are set to 0 or 1, to random normally distributed numbers, or to any other convenient numbers. These initial values are not the best values for the model; they cannot yet explain the target/label in terms of the predictors (Xs). So, you want to find the optimal values for the model. The journey from initial values to optimal values needs a goal, which is to minimize the cost function/loss function. The journey needs a path (the change in each iteration), which is suggested by the optimizer. The journey also needs an evaluation measurement, or evaluation metrics.

figure f

Popular loss functions are binary cross entropy, categorical cross entropy, mean squared logarithmic error, and hinge loss. Popular optimizers are stochastic gradient descent (SGD), RMSProp, Adam, Adagrad, and Adadelta. Popular evaluation metrics are accuracy, recall, and F1 score.

In short, this step configures how the weights and biases will be tuned: the loss function defines what to minimize, the optimizer defines how the values change in each iteration, and metrics such as accuracy measure the result.
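The roles of the loss function and the optimizer can be illustrated without Keras. The following is a hedged NumPy sketch of plain gradient descent on a one-weight, one-bias model with a mean squared error loss; the data, the learning rate, and the number of steps are hypothetical, and Keras optimizers are more elaborate than this loop.

```python
import numpy as np

# Hypothetical data generated from a known weight (3.0) and bias (2.0).
rng = np.random.default_rng(0)
x = rng.random(100)
y = 3.0 * x + 2.0

w, b = 0.0, 0.0          # initial values, far from the optimum
learning_rate = 0.5
losses = []

for step in range(500):
    error = (w * x + b) - y
    losses.append(np.mean(error ** 2))     # loss: what the journey minimizes
    grad_w = 2.0 * np.mean(error * x)      # gradient of the loss w.r.t. w
    grad_b = 2.0 * np.mean(error)          # gradient of the loss w.r.t. b
    w -= learning_rate * grad_w            # optimizer: the path taken
    b -= learning_rate * grad_b

print(round(w, 2), round(b, 2))
```

The loss shrinks from its starting value toward zero as w and b approach the values used to generate the data.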

Fit the Model

Having defined and compiled the model, you need to train it by executing the model on some data. Here you need to specify the number of epochs, which is the number of passes the training process makes through the data set, and the batch size, which is the number of instances that are evaluated before a weight update. For this problem, the program will run for a small number of epochs (10), and in each epoch, it will complete 50 (= 50,000/1,000) iterations, where the batch size is 1,000 and the training data set has 50,000 instances/images. Again, there is no hard rule for selecting the batch size, but it should not be very small, and it should be much smaller than the training data set so that it consumes less memory.

figure g
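The arithmetic relating epochs, batch size, and weight updates can be checked with a few lines of Python. The variable names are illustrative; the numbers are the ones used in this section.

```python
import math

n_train = 50_000     # training instances/images
batch_size = 1_000   # instances evaluated before each weight update
epochs = 10          # passes through the training data set

# A final partial batch still counts as one update, hence the ceiling.
updates_per_epoch = math.ceil(n_train / batch_size)
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, total_updates)  # 50 500
```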

Evaluate Model

Having trained the neural network on the training data set, you need to evaluate its performance. You can evaluate your model on the training data set by calling the evaluate() function on the model and passing it the same input and output used to train the model. This generates a prediction for each input and output pair and collects scores, including the average loss and any metrics you have configured, such as accuracy. Note that this only gives you an idea of how well you have modeled the training data (e.g., the train accuracy); you won’t know how well the algorithm might perform on new data. This is done for simplicity; ideally, you should separate your data into train and test data sets for the training and evaluation of your model.

figure h
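The train/test separation recommended above can be sketched in NumPy as a simple hold-out split. The data, the 80/20 ratio, and the accuracy helper are assumptions for illustration; they are not part of the Keras API.

```python
import numpy as np

# Hypothetical data set: 1,000 samples with 32 features and 10 classes.
rng = np.random.default_rng(42)
x = rng.random((1000, 32)).astype("float32")
y = rng.integers(0, 10, size=1000)

# Shuffle the row indices, then hold out 20% of the rows for testing.
indices = rng.permutation(len(x))
n_test = int(0.2 * len(x))
test_idx, train_idx = indices[:n_test], indices[n_test:]
x_train, y_train = x[train_idx], y[train_idx]
x_test, y_test = x[test_idx], y[test_idx]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

print(x_train.shape, x_test.shape)  # (800, 32) (200, 32)
```

Training on x_train and scoring on x_test gives an estimate of how the model behaves on data it has not seen.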

Prediction

Once you have built and evaluated the model, you can use it to predict labels for new, unseen data.

figure i
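A model with ten output nodes returns one row of ten scores per input, and the predicted label is the position of the highest score. The sketch below shows that final step in NumPy; the score values and the label names are made up for illustration.

```python
import numpy as np

label_names = [f"label{i}" for i in range(1, 11)]  # label1 .. label10

# Hypothetical model output for two images: one row of ten scores each.
scores = np.array([
    [0.02, 0.01, 0.05, 0.60, 0.05, 0.10, 0.07, 0.04, 0.03, 0.03],
    [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.02, 0.90],
])

# The index of the highest score in each row is the predicted class.
predicted_indices = scores.argmax(axis=1)
predicted = [label_names[i] for i in predicted_indices]

print(predicted)  # ['label4', 'label10']
```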

Save and Reload the Model

Here is the final step:

figure j
figure k

Optional: Summarize the Model

Now let’s see how to summarize the model.

figure l

Additional Steps to Improve Keras Models

Here are some more steps to improve your models:

  1. Sometimes, the model building process does not complete because of a vanishing or exploding gradient. If this is the case, you should do the following:

    figure n

  2. Inspect the model’s output shape.

    # Shape of the n-dim array (output of the model at the current position)
    model.output_shape

  3. Print the model’s summary representation.

    model.summary()

  4. Get the model’s configuration.

    model.get_config()

  5. List all the weight tensors in the model.

    model.get_weights()

Here I am sharing the complete code for the Keras model. Can you attempt to explain it?

figure m

Keras with TensorFlow

Keras provides a high-level neural network API that sits on top of a powerful deep learning library such as TensorFlow or Theano. Keras is a great addition to TensorFlow because its layers and models are compatible with pure-TensorFlow tensors. Moreover, it can be used alongside other TensorFlow libraries.

Here are the steps involved in using Keras for TensorFlow:

  1. Start by creating a TensorFlow session and registering it with Keras. This means Keras will use the session you registered to initialize all the variables that it creates internally.

    # tf.Session() is part of the TensorFlow 1.x API
    import tensorflow as tf
    sess = tf.Session()
    from keras import backend as K
    K.set_session(sess)

  2. Keras modules such as the model, layers, and activation are used to build models. The Keras engine automatically converts these modules into the TensorFlow-equivalent script.

  3. Other than TensorFlow, Theano and CNTK can be used as back ends to Keras.

  4. The TensorFlow back end has the convention of making the input shape (to the first layer of your network) in height, width, depth order ("channels_last"), where depth can mean the number of channels. Theano uses depth, height, width order ("channels_first").

  5. You need to configure the keras.json file correctly so that it uses the TensorFlow back end. It should look something like this:

    {
        "backend": "tensorflow",
        "epsilon": 1e-07,
        "image_data_format": "channels_last",
        "floatx": "float32"
    }
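The image_data_format setting names one of two array layouts: "channels_first" (depth, height, width) or "channels_last" (height, width, depth). If your data is stored in one layout and the back end expects the other, you can reorder the axes. Here is a minimal NumPy sketch, assuming a hypothetical batch of eight 32×32 RGB images.

```python
import numpy as np

# Hypothetical batch stored as (batch, depth, height, width).
batch_channels_first = np.zeros((8, 3, 32, 32), dtype="float32")

# Move the depth axis to the end: (batch, height, width, depth).
batch_channels_last = np.transpose(batch_channels_first, (0, 2, 3, 1))

# The reverse permutation restores the original layout.
restored = np.transpose(batch_channels_last, (0, 3, 1, 2))

print(batch_channels_last.shape)  # (8, 32, 32, 3)
```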

In the next chapters, you will learn how to leverage Keras for working on CNN, RNN, LSTM, and other deep learning activities.