Tools

# if you're using colab, then install the required modules
import sys

IN_COLAB = "google.colab" in sys.modules
if IN_COLAB:
    %pip install --quiet --upgrade pytorch-lightning

Overview

There is a huge variety of machine learning and deep learning tools.

In this course, we’ll focus on scikit-learn, TensorFlow, and PyTorch.

The tool you choose depends on many considerations, for example:

  • Your research problem

  • Model availability (e.g., pre-trained, state-of-the-art)

  • Ecosystem (e.g., compatibility with other tools)

  • Personal preferences

  • Deployment (e.g., hardware)

There are many discussions on the different choices e.g., 1, 2.

scikit-learn

Scikit-learn has a wide range of simple and efficient classic machine learning tools.

There are ones for:

  • Linear Models (examples)

    • A set of methods where the output is a linear combination of the inputs.

    • For example, fitting a straight line to the data using Linear Regression (also known as ordinary least squares).

  • Nearest Neighbours (examples)

    • Find a (pre-defined) number of training samples closest in distance to the new point, and predict the label from these.

    • The number of samples can be defined in different ways.

    • There are various measures of distance.

    • For example, classifying labels based on their closeness to other samples in Nearest Neighbor Classification.

  • Support Vector Machines (examples)

    • Place a decision boundary between data points to classify, regress, or find outliers. The support vectors are the training samples closest to this boundary.

    • For example, find the two hardest to categorise samples and place a decision boundary between them in Support Vector Classification.

  • Decision trees (examples)

    • Predict the value of a target variable by learning simple decision rules inferred from the data features.

    • Many decisions are grouped together into a tree.

    • For example, many decision trees together in an ensemble is a Random Forest.

  • And many more.
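
To make the nearest-neighbours idea above a little more concrete, here is a minimal NumPy sketch. It is not scikit-learn's implementation: the function name `knn_predict`, the toy data, and the choice of Euclidean distance with a majority vote are all assumptions made for illustration.

```python
import numpy as np


def knn_predict(x_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest training samples."""
    # Euclidean distance from the new point to every training sample
    distances = np.linalg.norm(x_train - x_new, axis=1)
    # indices of the k closest training samples
    nearest = np.argsort(distances)[:k]
    # majority vote over the labels of those neighbours
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]


# toy data: two well-separated clusters (hypothetical values)
x_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(x_toy, y_toy, np.array([0.5, 0.5])))  # a point near the first cluster
print(knn_predict(x_toy, y_toy, np.array([5.5, 5.5])))  # a point near the second cluster
```

Changing `k` or the distance measure changes which neighbours get a vote, which is what the bullet points on nearest neighbours refer to.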

TensorFlow

TensorFlow is an end-to-end open source machine learning platform.

TensorFlow has a user-friendly, high-level API (Application Programming Interface) called Keras.

Keras includes a wide range of high-level objects (tutorials) including:

You can always go lower level when required (e.g., custom objects).

Through Keras and TensorFlow you create models and layers using any of the following APIs:

| | Sequential | Functional | Subclassing |
|---|---|---|---|
| Data structure | Graph: Linear stack of layers. | Graph: Non-linear DAG (directed acyclic graph) of layers. | Object-orientated. Write the forward pass (backward pass is automatic). |
| Shared layers and multiple inputs/outputs | No. Each layer has one input and one output. | Yes. Each layer can have multiple inputs and outputs. | Yes. |
| Main benefits and drawbacks | Simplest, (re)usability (easily saved), model checks to catch errors early, static. | Similar to sequential, but more flexible. | Maximum flexibility, no model checks, more complex, dynamic. |
| Show model graph? | Yes. | Yes. | Can add via the guidance here. |

There are many libraries and extensions including:

PyTorch

PyTorch is an end-to-end open source machine learning platform.

PyTorch has user-friendly APIs:

PyTorch (and its extensions) include a wide range of high-level objects including:

You can always go lower level when required (e.g., custom objects).

Similar to TensorFlow/Keras, you can create models and layers in PyTorch using either Sequential or Subclassing APIs (or in combination). These have similar features to the table above, where the Sequential API is simpler and the Subclassing API enables flexibility.

There are many libraries and extensions including:

Example - Linear regression

Let’s start with an introductory example fitting a straight line to data.

Don’t worry too much about some of the details as we’ll cover them in later lessons.

For now, focus on the general workflow.

We’ll see how this is done in each of the three key tools we cover here: scikit-learn, TensorFlow, and PyTorch.

Let’s create some (noisy) data to train on:

import numpy as np
def create_noisy_linear_data(num_points):
    x = np.arange(num_points)
    noise = np.random.normal(0, 1, num_points)
    y = 2 * x + noise
    # convert to 2D arrays
    x, y = x.reshape(-1, 1), y.reshape(-1, 1)
    return x, y
x_train, y_train = create_noisy_linear_data(10)

Caution

Input arrays to models need to be 2 dimensional (2D) i.e., a column of rows (one row per sample).

For example, instead of one row:

>>> np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Convert this to a column of rows using .reshape(-1, 1):

>>> np.arange(10).reshape(-1, 1)
array([[0],
       [1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8],
       [9]])

scikit-learn

First, let’s try with scikit-learn:

from sklearn import linear_model
model_sklearn = linear_model.LinearRegression()

When fit is called for Linear Regression, the loss being minimised is the mean squared error between the predictions and the actual values.

This determines what parameters the model learns.

model_sklearn.fit(x_train, y_train)
LinearRegression()

The data was from the line y = 2x, so the gradient was 2.

Let’s see what the model estimated it to be:

model_sklearn.coef_[0]
array([2.01038254])

Pretty close, considering there were only 10 training data points.
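
To see what minimising the mean squared error amounts to, the same kind of fit can be reproduced with NumPy's least-squares solver. This is a sketch rather than what scikit-learn does internally: the fixed random seed and the names `x_demo`, `y_demo`, and `design` are assumptions made for this example.

```python
import numpy as np

# rebuild the same kind of data as above (the fixed seed is an assumption, for reproducibility)
rng = np.random.default_rng(0)
x_demo = np.arange(10, dtype=float)
y_demo = 2 * x_demo + rng.normal(0, 1, 10)

# design matrix with a column of ones, so that the intercept is fitted too
design = np.column_stack([x_demo, np.ones_like(x_demo)])

# least-squares solution: minimises the sum (and so the mean) of the squared errors
(slope, intercept), *_ = np.linalg.lstsq(design, y_demo, rcond=None)

mse = np.mean((design @ np.array([slope, intercept]) - y_demo) ** 2)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, MSE = {mse:.3f}")
```

Any other choice of slope and intercept gives a larger mean squared error on these points, which is the sense in which the loss determines the learned parameters.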

TensorFlow

Now, for TensorFlow:

import tensorflow as tf

Create the model (using the simpler sequential API).

Note, it’s helpful to name the layers in the model.

model_tf = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(1,), name="inputs"),
        tf.keras.layers.Dense(units=1, name="outputs"),
    ],
    name="sequential",
)

For reference, here’s what this would have looked like using the functional and subclassing APIs:

inputs = tf.keras.Input(shape=(1,), name="inputs")
outputs = tf.keras.layers.Dense(units=1, name="outputs")(inputs)
model_tf_functional = tf.keras.Model(inputs, outputs, name="functional")
class MyModel(tf.keras.Model):
    def __init__(self, **kwargs):
        super(MyModel, self).__init__(**kwargs)  # handles standard arguments e.g., name
        self.outputs = tf.keras.layers.Dense(units=1, name="outputs")

    def call(self, inputs):  # have inputs as argument to call, rather than define
        x = self.outputs(inputs)
        return x


model_tf_subclassing = MyModel(name="subclassing")

You can now show the model summary.

Note, this only shows layers (not the Input object).

model_tf.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 outputs (Dense)             (None, 1)                 2         
                                                                 
=================================================================
Total params: 2
Trainable params: 2
Non-trainable params: 0
_________________________________________________________________

You can also show the model graph:

tf.keras.utils.plot_model(model_tf, show_shapes=True)
_images/02_tools_33_0.png

Now, compile the model.

The keyword arguments to optimizer, loss, and metrics can either be strings (e.g., mean_squared_error) or TensorFlow objects (e.g., tf.keras.losses.MeanSquaredError()).

model_tf.compile(
    optimizer="sgd",
    loss="mean_squared_error",
    metrics=["mean_absolute_error"],  # accuracy is not meaningful for regression
)

And, train the model.

Epochs are how many passes over the whole training set.

model_tf.fit(
    x_train,
    y_train,
    epochs=10,
    verbose=False,  # set to True to print out the metrics per epoch
);

And, let’s see what this model thought the gradient was:

model_tf.weights[0].numpy()
array([[1.9333118]], dtype=float32)
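
To give a feel for what the optimiser does during each epoch, here is a minimal NumPy sketch of gradient descent on the mean squared error. It is not the Keras implementation: the zero starting values, the learning rate, the fixed seed, and the variable names are assumptions made for this example. It makes one update per epoch using all 10 points, which is comparable to the fit above because the whole training set fits in a single batch of the default size.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (an assumption, for reproducibility)
x_demo = np.arange(10, dtype=float)
y_demo = 2 * x_demo + rng.normal(0, 1, 10)

weight, bias = 0.0, 0.0  # start from zero (Keras initialises the weight randomly)
learning_rate = 0.01  # hypothetical value chosen for this sketch
losses = []

for epoch in range(10):  # one epoch = one pass over the whole training set
    error = weight * x_demo + bias - y_demo
    losses.append(np.mean(error**2))  # mean squared error before the update
    # gradients of the mean squared error with respect to the weight and the bias
    grad_weight = 2 * np.mean(error * x_demo)
    grad_bias = 2 * np.mean(error)
    # move the parameters a small step against the gradient
    weight -= learning_rate * grad_weight
    bias -= learning_rate * grad_bias

print(f"weight = {weight:.3f}, bias = {bias:.3f}")
print(f"loss: {losses[0]:.2f} -> {losses[-1]:.2f}")
```

The weight moves towards 2 within a few epochs, while the bias converges much more slowly, which is one reason the estimates after only 10 epochs differ slightly from the scikit-learn result.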

PyTorch

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

Create the dataset and dataloader:

x_train_tensor = torch.from_numpy(x_train).type(torch.float32)
y_train_tensor = torch.from_numpy(y_train).type(torch.float32)
ds_train = TensorDataset(x_train_tensor, y_train_tensor)
dataloader_train = DataLoader(ds_train)

Create the model (using the simpler sequential API):

model_torch = nn.Sequential(nn.Linear(in_features=1, out_features=1))
print(model_torch)
Sequential(
  (0): Linear(in_features=1, out_features=1, bias=True)
)

For reference, here’s what this would have looked like using the subclassing API:

class NeuralNetwork(nn.Module):
    def __init__(self):  # model definition
        super(NeuralNetwork, self).__init__()  # instantiate the nn.Module
        self.outputs = nn.Linear(in_features=1, out_features=1)

    def forward(self, x):  # the computations for the forward pass; call model(x) rather than this method directly
        logits = self.outputs(x)
        return logits


model_torch_subclassing = NeuralNetwork()
print(model_torch_subclassing)
NeuralNetwork(
  (outputs): Linear(in_features=1, out_features=1, bias=True)
)

Note

The backward propagation is calculated automatically, though you can do it manually if you like.
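
As an illustration of what doing it manually means for this one-layer model, the sketch below derives the gradients of the mean squared error by hand and checks them against finite differences. It uses NumPy rather than PyTorch, and the function names, the noise-free data, and the starting parameters are assumptions made for this example.

```python
import numpy as np


def mse_loss(weight, bias, x, y):
    """Mean squared error of the line weight * x + bias."""
    return np.mean((weight * x + bias - y) ** 2)


def mse_gradients(weight, bias, x, y):
    """Gradients of the loss with respect to weight and bias, derived with the chain rule."""
    error = weight * x + bias - y
    return 2 * np.mean(error * x), 2 * np.mean(error)


x_demo = np.arange(10, dtype=float)
y_demo = 2 * x_demo  # noise-free line, for simplicity
weight, bias = 0.5, 0.1  # arbitrary starting parameters

grad_weight, grad_bias = mse_gradients(weight, bias, x_demo, y_demo)

# numerical check using central finite differences
eps = 1e-6
num_grad_weight = (
    mse_loss(weight + eps, bias, x_demo, y_demo) - mse_loss(weight - eps, bias, x_demo, y_demo)
) / (2 * eps)
num_grad_bias = (
    mse_loss(weight, bias + eps, x_demo, y_demo) - mse_loss(weight, bias - eps, x_demo, y_demo)
) / (2 * eps)

print(np.isclose(grad_weight, num_grad_weight), np.isclose(grad_bias, num_grad_bias))
```

Automatic differentiation produces these same gradients for every parameter without the derivation having to be written out, which matters once a model has more than a couple of parameters.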

Define the loss and optimiser:

loss_function = nn.MSELoss()
optimiser = torch.optim.SGD(model_torch.parameters(), lr=1e-3)

Define a single training step:

def train(dataloader, model, loss_function, optimiser):
    size = len(dataloader.dataset)
    model.train()  # set the model in training mode, rather than in evaluation mode i.e., `model.eval()`

    # for each batch of data
    for batch, (X, y) in enumerate(dataloader):

        # step 1: make a prediction for these inputs
        prediction = model(X)

        # step 2: compute the loss for that prediction
        loss = loss_function(prediction, y)

        # step 3: first, clear the gradients left over from the previous batch
        optimiser.zero_grad()

        # step 4: backpropagate the gradients for that loss
        loss.backward()

        # step 5: update the parameters accordingly
        optimiser.step()

Note, that testing doesn’t need the gradients (i.e., steps 3-5).

Hence, the test function would look something like:

def test(dataloader, model, loss_function):
    size = len(dataloader.dataset)
    model.eval()  # set the model in evaluation mode
    ...
    
    with torch.no_grad():  # don't track gradients
        for batch, (X, y) in enumerate(dataloader):
            # step 1: make a prediction for these inputs
            prediction = model(X)

            # step 2: compute the loss for that prediction
            loss = loss_function(prediction, y)
            ...

We’ll see more examples of testing later.

Run the training step over multiple epochs:

NUM_EPOCHS = 5

for epoch in range(NUM_EPOCHS):
    train(dataloader_train, model_torch, loss_function, optimiser)

And, let’s see what this model thought the gradient was:

# to check parameter names
for name, parameter in model_torch.named_parameters():
    print(name)
0.weight
0.bias
model_torch[0].weight
Parameter containing:
tensor([[1.7857]], requires_grad=True)

Now, we can see how well these models fit a line to the data.

First, grab the predictions of each model (from the training data for plotting purposes).

y_pred_sklearn = model_sklearn.predict(x_train)
y_pred_tf = model_tf.predict(x_train)
y_pred_torch = model_torch(x_train_tensor).detach().numpy()

Then, show these lines on a plot:

import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt
colors = {"data": "#1b9e77", "sklearn": "#d95f02", "tf": "#7570b3", "torch": "#66a61e"}
def make_plot(ax, y_pred, label, title):
    ax.scatter(x_train, y_train, color=colors["data"])
    ax.plot(x_train, y_pred, color=colors[label], linewidth=3)
    ax.set_title(title)
    ax.set_ylim([0, 18])
    ax.set_xlim([0, 9])
    ax.set_facecolor("whitesmoke")
fig = plt.figure(1, figsize=(12, 4))
ax1, ax2, ax3 = fig.subplots(1, 3)

make_plot(ax1, y_pred_sklearn, "sklearn", "scikit-learn")
make_plot(ax2, y_pred_tf, "tf", "TensorFlow")
make_plot(ax3, y_pred_torch, "torch", "PyTorch")

plt.show()
_images/02_tools_70_0.png

They all did a good job of fitting a function to the data.

In other words, they found the association in the data.

However, this was a very simple example that probably didn’t require machine learning (let alone deep learning).

Though it demonstrates what they do.

Now, let’s look at something a little more suitable.

Example - Digit classification

Let’s train a model to recognise handwritten digits using the classic MNIST dataset.

This is a classification task.

scikit-learn

First, with scikit-learn:

from sklearn import datasets, linear_model, metrics, svm
from sklearn.model_selection import train_test_split

Load the data

digits = datasets.load_digits()

Take a look at the labelled data:

_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, label in zip(axes, digits.images, digits.target):
    ax.set_axis_off()
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title(f"Label: {label}")
_images/02_tools_78_0.png

Preprocess and split the data

def preprocess_data(digits):
    # the data comes as 2D 8x8 pixels
    # flatten the images to 1D 64 pixels
    n_samples = len(digits.images)
    data = digits.images.reshape((n_samples, -1))
    return n_samples, data
n_samples, data = preprocess_data(digits)
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.5, shuffle=False
)
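
The flattening step can be checked on a small stand-in array. This is a sketch only: the `images` array below is made-up data rather than the real digits, and the slicing at the end assumes that an unshuffled, even split simply keeps the original order.

```python
import numpy as np

# stand-in for digits.images: 6 hypothetical "images" of 8x8 pixels
images = np.arange(6 * 8 * 8, dtype=float).reshape(6, 8, 8)

# -1 asks NumPy to work out the second dimension itself (8 * 8 = 64)
flat = images.reshape((len(images), -1))
print(images.shape, "->", flat.shape)

# each row of `flat` is one image read row by row
print(np.array_equal(flat[0], images[0].ravel()))

# without shuffling, an even split keeps the original order:
# the first half for training, the second half for testing
half = len(flat) // 2
flat_train, flat_test = flat[:half], flat[half:]
print(flat_train.shape, flat_test.shape)
```

Keeping the order like this means the test samples all come from the end of the dataset, so a shuffled split is usually preferable unless the ordering carries meaning.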

Create a model

Here, we will use a Support Vector Classifier.

Don’t worry about what gamma is for now (if you’re interested, read the documentation).

model = svm.SVC(gamma=0.001)

Fit the model to the training data

model.fit(X_train, y_train)
SVC(gamma=0.001)

Use the model to predict the test data

y_pred = model.predict(X_test)

Take a look at the predictions for these test digits:

_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, prediction in zip(axes, X_test, y_pred):
    ax.set_axis_off()
    image = image.reshape(8, 8)  # 1D 64 pixels to 2D 8*8 pixels for plotting
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title(f"Prediction: {prediction:.0f}")
_images/02_tools_91_0.png

Looking good. The predicted labels match the ground truth images.

How well did our model do overall?

overall_accuracy = metrics.accuracy_score(y_test, y_pred)
overall_accuracy
0.9688542825361512

97% accuracy is good.

Let’s do some quick error analysis using a confusion matrix.

This shows how well the classification model did for each category.

The predictions are on the x-axis and the true labels from the test data are on the y-axis.

A perfect score would be where the predictions always match the true labels (i.e., all values are on the diagonal line).

confusion_matrix = metrics.ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
confusion_matrix.figure_.suptitle("Confusion Matrix")
plt.show()
_images/02_tools_97_0.png

We can see that although the model did well, it struggled with 3’s by confusing them with 5’s, 7’s, and 8’s.

This points us in the direction of how we might improve the model.
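
For reference, a confusion matrix is just a table of counts, and the overall accuracy can be read off its diagonal. The sketch below builds one by hand with NumPy. The function name `confusion_counts` and the small 3-class labels are assumptions made for this example, not part of scikit-learn.

```python
import numpy as np


def confusion_counts(y_true, y_pred, n_classes):
    """Count the samples for each (true label, predicted label) pair."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label, predicted_label] += 1  # rows: true labels, columns: predictions
    return matrix


# hypothetical labels for a 3-class problem
y_true_demo = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred_demo = np.array([0, 1, 1, 1, 2, 0, 2])

matrix = confusion_counts(y_true_demo, y_pred_demo, n_classes=3)
print(matrix)

# correct predictions sit on the diagonal, so accuracy = diagonal sum / total
accuracy = np.trace(matrix) / matrix.sum()
# recall per class: the fraction of each true class that was predicted correctly
recall_per_class = np.diag(matrix) / matrix.sum(axis=1)
print(accuracy, recall_per_class)
```

The per-class recall is what highlights a weak category, such as the 3’s above, even when the overall accuracy looks good.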

We could also use cross-validation to find the variation in the training score:

from sklearn.model_selection import KFold, cross_val_score
cv = KFold(n_splits=5, shuffle=False)
test_scores = cross_val_score(model, X_train, y_train, cv=cv)
test_scores
array([0.93333333, 0.99444444, 0.90555556, 0.98882682, 0.95530726])
print(f"CV accuracy = {test_scores.mean():0.2f} (+/- {test_scores.std():0.2f})")
CV accuracy = 0.96 (+/- 0.03)
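
To show what the folds look like, here is a minimal sketch of an unshuffled K-fold split written with NumPy. The helper `kfold_indices` and the example scores are hypothetical. The sketch assumes the folds are consecutive blocks of samples, each held out once while the rest are used for training.

```python
import numpy as np


def kfold_indices(n_samples, n_splits):
    """Yield (train_indices, test_indices) for consecutive, unshuffled folds."""
    indices = np.arange(n_samples)
    for fold in np.array_split(indices, n_splits):
        train = np.setdiff1d(indices, fold)  # everything not in the held-out fold
        yield train, fold


for train_idx, test_idx in kfold_indices(n_samples=10, n_splits=5):
    print("train:", train_idx, "test:", test_idx)

# summarising per-fold scores (hypothetical values) as mean +/- standard deviation
scores_demo = np.array([0.90, 0.95, 0.92, 0.97, 0.96])
print(f"CV accuracy = {scores_demo.mean():0.2f} (+/- {scores_demo.std():0.2f})")
```

Every sample is used for testing exactly once, so the spread of the fold scores indicates how sensitive the model is to the particular data it was trained on.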

Save the model

You can save models using joblib. First, create a directory to store them:

from joblib import dump
import os
from pathlib import Path

path_models = f"{os.getcwd()}/models"
Path(path_models).mkdir(parents=True, exist_ok=True)

You can then save the model using:

dump(model, f"{path_models}/mnist_model_sklearn.joblib")

You could then load this model back using:

from joblib import load

reloaded_model = load(f'{path_models}/mnist_model_sklearn.joblib')

TensorFlow

Now, with TensorFlow.

Check whether there are any GPUs (Graphics Processing Units) available.

Note, the device is the hardware that TensorFlow runs on (e.g., CPUs (Central Processing Units), GPUs).

print("Num GPUs Available: ", len(tf.config.list_physical_devices("GPU")))
Num GPUs Available:  0

Load and split the data

(train_images, train_labels), (
    test_images,
    test_labels,
) = tf.keras.datasets.mnist.load_data()

Take a look at some of the training data:

_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, label in zip(axes, train_images, train_labels):
    ax.set_axis_off()
    image = image.reshape(28, 28)  # already 2D 28*28 pixels, so this reshape is a no-op
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title(f"Label: {label}")
_images/02_tools_116_0.png

Create the model

You can use any of the Sequential, Functional, or Subclassing APIs.

Let’s use the simpler Sequential API for now.

You could also use many .add() calls instead of the list.

Note

You could make the final layer a softmax (to output probabilities directly), though this is discouraged for numerical stability reasons.

Tip

It’s often useful to place pre-processing steps into the model pipeline too.

For example, here we flatten the 2D image to a 1D tensor and normalise the pixel values (i.e., rescale them from 0-255 to between 0 and 1).

model = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(28, 28), name="inputs"),
        tf.keras.layers.Flatten(name="flatten"),
        tf.keras.layers.Rescaling(1.0 / 255, name="normalise"),
        tf.keras.layers.Dense(128, activation="relu", name="layer1"),
        tf.keras.layers.Dense(128, activation="relu", name="layer2"),
        tf.keras.layers.Dense(10, name="outputs"),  # 1 unit per class
    ]
)

model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten (Flatten)           (None, 784)               0         
                                                                 
 normalise (Rescaling)       (None, 784)               0         
                                                                 
 layer1 (Dense)              (None, 128)               100480    
                                                                 
 layer2 (Dense)              (None, 128)               16512     
                                                                 
 outputs (Dense)             (None, 10)                1290      
                                                                 
=================================================================
Total params: 118,282
Trainable params: 118,282
Non-trainable params: 0
_________________________________________________________________
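
The Param # column can be reproduced by hand: a Dense layer has one weight for each input-unit pair plus one bias per unit, while Flatten and Rescaling have no trainable parameters. The short sketch below redoes that arithmetic (the dictionary `dense_layers` is just a convenient way to write the layer sizes down for this example).

```python
# (number of inputs, number of units) for each Dense layer in the model above
dense_layers = {"layer1": (784, 128), "layer2": (128, 128), "outputs": (128, 10)}

total = 0
for name, (n_inputs, n_units) in dense_layers.items():
    # one weight per input-unit pair, plus one bias per unit
    n_params = n_inputs * n_units + n_units
    total += n_params
    print(f"{name}: {n_params:,}")

print(f"Total params: {total:,}")
```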

We can now also visualise the architecture:

tf.keras.utils.plot_model(model, show_shapes=True)
_images/02_tools_122_0.png

Compile the model

It’s useful to name the metrics, especially if there’s more than one.

Here, we’ll use the Adam optimiser, sparse categorical crossentropy loss, and a metric of accuracy.

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True
    ),  # from_logits=True because the model outputs logits, not probabilities
    metrics=["accuracy"],
)
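
To illustrate what a sparse categorical crossentropy computed from logits involves, here is a minimal NumPy sketch. It is not the TensorFlow implementation: the function name and the example logits are assumptions, and the log-sum-exp trick shown is one common way to keep the calculation numerically stable.

```python
import numpy as np


def sparse_categorical_crossentropy_from_logits(logits, labels):
    """Mean cross-entropy for integer labels, computed directly from logits."""
    # subtract the row maximum so that the exponentials cannot overflow
    shifted = logits - logits.max(axis=1, keepdims=True)
    # log-softmax via the log-sum-exp trick
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # pick out the log-probability of the true class for each sample
    true_class_log_probs = log_probs[np.arange(len(labels)), labels]
    return -true_class_log_probs.mean()


# hypothetical logits for 2 samples and 3 classes
logits_demo = np.array([[2.0, 1.0, 0.1], [1000.0, 0.0, -1000.0]])
labels_demo = np.array([0, 0])  # integer ("sparse") labels, not one-hot vectors

loss_demo = sparse_categorical_crossentropy_from_logits(logits_demo, labels_demo)
print(loss_demo)
```

A naive softmax followed by a logarithm would overflow on the large logits in the second row, which is the kind of numerical problem that working from logits avoids.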

Fit the model to the training data

The fit() call returns a history object.

Note

The validation_split keyword argument can only be used for NumPy training data.

BATCH_SIZE = 32

history = model.fit(
    train_images,
    train_labels,
    epochs=2,
    batch_size=BATCH_SIZE,
    verbose=False,  # set to True to print the output from each epoch
    validation_split=0.2,  # automatically set apart a validation set: 0.2 means 20% for validation
);
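
As a rough check on what these settings mean in numbers, the sketch below works out the sizes involved. It assumes the validation samples are the last 20% of the training arrays, set aside before any shuffling, and the variable names are made up for this example.

```python
import math

n_mnist_train = 60_000  # number of images in the MNIST training set
validation_fraction = 0.2  # same value as validation_split above
batch_size = 32

# assumption: the validation samples are the last 20% of the arrays,
# set aside before any shuffling of the remaining training samples
n_validation = round(n_mnist_train * validation_fraction)
n_training = n_mnist_train - n_validation

# each epoch then makes one parameter update per batch of training samples
steps_per_epoch = math.ceil(n_training / batch_size)

print(f"training: {n_training}, validation: {n_validation}, steps per epoch: {steps_per_epoch}")
```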

The history.history dictionary then contains the loss and metrics per epoch:

history.history
{'loss': [0.2573760449886322, 0.10243479162454605],
 'accuracy': [0.9240000247955322, 0.9691666960716248],
 'val_loss': [0.12908461689949036, 0.11556709557771683],
 'val_accuracy': [0.9612500071525574, 0.9665833115577698]}

Predictions

Use the model for predictions with model.predict() (i.e., inference).

This model returns logits (i.e., log-odds). If you’d like these to be probabilities, add a softmax layer:

probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
y_pred = probability_model.predict(test_images)

Each prediction has a probability per category:

y_pred[0]
array([1.94486702e-07, 2.15670184e-06, 3.55272961e-04, 8.12521976e-05,
       1.30339366e-08, 5.19397645e-08, 6.03369993e-11, 9.99556482e-01,
       1.00925604e-07, 4.41959219e-06], dtype=float32)

The most likely category can be found by finding the maximum of these (using np.argmax):

np.argmax(y_pred[0])
7

So, the model thinks the first digit is a 7.
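
The softmax step itself is short enough to write out. The sketch below is a plain NumPy version, not the Keras layer: the function and the example logits are assumptions made for illustration.

```python
import numpy as np


def softmax(logits):
    """Convert a vector of logits into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # shifting leaves the result unchanged but avoids overflow
    exponentials = np.exp(shifted)
    return exponentials / exponentials.sum()


# hypothetical logits for the 10 digit classes
digit_logits = np.array([-3.0, -1.0, 4.0, 2.5, -6.0, -4.5, -11.0, 12.0, -3.5, 0.0])

probabilities = softmax(digit_logits)
print(probabilities.sum())  # probabilities sum to 1 (up to floating-point rounding)
print(np.argmax(probabilities), np.argmax(digit_logits))  # the same index either way
```

Because softmax preserves the ordering of the logits, the predicted class is the same whether np.argmax is applied to the logits or to the probabilities. The softmax is only needed when the probabilities themselves are of interest.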

Let’s see if that’s right by plotting the first four test digits with their predictions:

_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, prediction in zip(axes, test_images, y_pred):
    ax.set_axis_off()
    image = tf.reshape(image, (28, 28))  # already 2D 28*28 pixels, so this reshape is a no-op
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title(f"Prediction: {np.argmax(prediction):.0f}")
_images/02_tools_138_0.png

Let’s now evaluate the model overall

test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc}")
313/313 [==============================] - 1s 1ms/step - loss: 0.1056 - accuracy: 0.9663
Test accuracy: 0.9663000106811523

Similar to scikit-learn, an overall test accuracy of 97% is good.

Note that the training accuracy and validation accuracy were both 97% too.

As before, let’s have a look at a confusion matrix for some quick error analysis.

Note, TensorFlow does have its own confusion matrix function (tf.math.confusion_matrix), though I’ll use the scikit-learn one here again as it has a nice plot feature.

confusion_matrix = metrics.ConfusionMatrixDisplay.from_predictions(
    test_labels, np.argmax(y_pred, axis=1)
)
confusion_matrix.figure_.suptitle("Confusion Matrix")
plt.show()
_images/02_tools_143_0.png

This model did well for most digits, though struggled a bit with 5’s.

Save the model

A model includes:

  • Architecture

  • Weights (i.e., state)

  • Configuration (e.g., optimiser, loss, metrics)

You can save the whole or parts.

The different formats are:

  • TensorFlow SavedModel: single archive (recommended)

    • Save: model.save() or tf.keras.models.save_model()

    • Load: tf.keras.models.load_model()

    • Note, Keras H5 was the older format.

  • Architecture only (JSON)

    • Save: get_config() and tf.keras.models.model_to_json()

    • Load: from_config() and tf.keras.models.model_from_json()

  • Weights only

model.save(f"{path_models}/model_tf_mnist")
INFO:tensorflow:Assets written to: /home/runner/work/intro_ml/intro_ml/docs/models/model_tf_mnist/assets
!ls {path_models}/model_tf_mnist
assets	keras_metadata.pb  saved_model.pb  variables

Load the model

Reload the saved model and evaluate it on the test data.

new_model = tf.keras.models.load_model(f"{path_models}/model_tf_mnist")
new_model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten (Flatten)           (None, 784)               0         
                                                                 
 normalise (Rescaling)       (None, 784)               0         
                                                                 
 layer1 (Dense)              (None, 128)               100480    
                                                                 
 layer2 (Dense)              (None, 128)               16512     
                                                                 
 outputs (Dense)             (None, 10)                1290      
                                                                 
=================================================================
Total params: 118,282
Trainable params: 118,282
Non-trainable params: 0
_________________________________________________________________
loss, acc = new_model.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100 * acc))
313/313 - 0s - loss: 0.1056 - accuracy: 0.9663 - 380ms/epoch - 1ms/step
Restored model, accuracy: 96.63%

PyTorch (Lightning)

Here, we’ll do a simple example using PyTorch Lightning.

This avoids creating some of the boilerplate code needed for pure PyTorch.

This will just include training for now (i.e., no validation or testing).

import os

import pytorch_lightning as pl
import torch
import torch.nn.functional as F
from pytorch_lightning.callbacks.progress import TQDMProgressBar
from torch import nn
from torch.utils.data import DataLoader, random_split
from torchmetrics import Accuracy
from torchvision import transforms
from torchvision.datasets import MNIST

Note

torch.nn.functional contains functions for neural networks, while torch.nn defines them as modules.

BATCH_SIZE = 32
PATH_DATASETS = f"{os.getcwd()}/data"

Prepare the data

train_dataloader = DataLoader(
    MNIST(PATH_DATASETS, train=True, download=True, transform=transforms.ToTensor()),
    batch_size=BATCH_SIZE,
)

Create the model

This includes the loss, optimiser, and training steps.

pl.LightningModule is an nn.Module with more features.

For more information on how to convert a PyTorch model to a PyTorch Lightning model see:

class MNISTModel(pl.LightningModule):
    def __init__(self):
        super(MNISTModel, self).__init__()
        self.layer1 = torch.nn.Linear(in_features=28 * 28, out_features=10)

    def forward(self, x):
        x = x.view(x.size(0), -1)  # flatten inputs
        x = self.layer1(x)  # pass inputs through hidden layer
        output = torch.relu(x)  # run activation function for layer
        return output

    def training_step(self, batch, batch_index):
        x, y = batch
        y_hat = self(x)  # predicted y output
        loss = F.cross_entropy(y_hat, y)
        tensorboard_logs = {"train_loss": loss}
        return {"loss": loss, "log": tensorboard_logs}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)
mnist_model = MNISTModel()
print(mnist_model)
MNISTModel(
  (layer1): Linear(in_features=784, out_features=10, bias=True)
)

Create the trainer

Warning

The progress bar can be too fast for Colab / Kaggle. If developing in these platforms, be sure to slow the refresh rate by increasing the value in: callbacks=TQDMProgressBar(refresh_rate=20).

trainer = pl.Trainer(gpus=0, callbacks=TQDMProgressBar(refresh_rate=20), max_epochs=5)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

Fit the model

if IN_COLAB:
    trainer.fit(mnist_model, train_dataloader)

We can see the loss reduce at the right of the progress bar.

You can change what is logged by editing the training_step method.

(Optional) Adding in validation and testing to the model creation

Note, DataLoaders are now incorporated into the model creation.

class MNISTModel(pl.LightningModule):
    def __init__(self):
        super(MNISTModel, self).__init__()
        self.layer1 = torch.nn.Linear(in_features=28 * 28, out_features=10)

    def forward(self, x):
        x = x.view(x.size(0), -1)  # flatten x
        x = self.layer1(x)  # pass inputs through hidden layer
        output = torch.relu(x)  # run activation function for layer
        return output

    def training_step(self, batch, batch_index):
        x, y = batch
        y_hat = self(x)  # predicted y output
        loss = F.cross_entropy(y_hat, y)
        self.log("train_loss", loss, prog_bar=True)  # record the loss with the logger and show it in the progress bar
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)

    # -------------------------
    # same as above up to here
    # new stuff below

    def validation_step(self, batch, batch_index):
        x, y = batch
        y_hat = self(x)
        val_loss = F.cross_entropy(y_hat, y)
        # values logged in validation_step are averaged over the epoch by default
        self.log("val_loss", val_loss)
        return val_loss

    def test_step(self, batch, batch_index):
        x, y = batch
        y_hat = self(x)
        test_loss = F.cross_entropy(y_hat, y)
        # values logged in test_step are averaged over the epoch by default
        self.log("test_loss", test_loss, prog_bar=True)
        return test_loss

    # also added in the dataloaders below

    def train_dataloader(self):
        return DataLoader(
            MNIST(
                PATH_DATASETS,
                train=True,
                download=True,
                transform=transforms.ToTensor(),
            ),
            batch_size=BATCH_SIZE,
        )

    def val_dataloader(self):
        return DataLoader(
            MNIST(
                PATH_DATASETS,
                train=True,
                download=True,
                transform=transforms.ToTensor(),
            ),
            batch_size=BATCH_SIZE,
        )

    def test_dataloader(self):
        return DataLoader(
            MNIST(
                PATH_DATASETS,
                train=False,
                download=True,
                transform=transforms.ToTensor(),
            ),
            batch_size=BATCH_SIZE,
        )
mnist_model = MNISTModel()
print(mnist_model)
MNISTModel(
  (layer1): Linear(in_features=784, out_features=10, bias=True)
)

trainer = pl.Trainer(accelerator="cpu", callbacks=TQDMProgressBar(refresh_rate=20), max_epochs=5)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

Note that trainer.fit now only needs the model as input, because the DataLoaders are part of the model.

if IN_COLAB:
    trainer.fit(mnist_model)
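
In the class above, val_dataloader reads the same train=True split as train_dataloader, so the validation loss is measured on data the model has already seen. One common alternative, sketched here under assumptions, is to hold out part of the training set with torch.utils.data.random_split. The random tensors stand in for MNIST so that the example runs without a download, and the dataset size, the 550 / 50 split, the batch size, and the seed are all arbitrary choices.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# stand-in for the MNIST training set (random images and labels), so no download is needed
images = torch.rand(600, 1, 28, 28)
labels = torch.randint(low=0, high=10, size=(600,))
full_train_dataset = TensorDataset(images, labels)

# hold out 50 samples for validation; the fixed seed makes the split reproducible
generator = torch.Generator().manual_seed(42)
train_subset, val_subset = random_split(full_train_dataset, [550, 50], generator=generator)

train_loader = DataLoader(train_subset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_subset, batch_size=64)

print(len(train_subset), len(val_subset))
```

The two loaders could then be returned from train_dataloader and val_dataloader, or passed directly as trainer.fit(mnist_model, train_loader, val_loader).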

Evaluation

The model can now be tested by running:

if IN_COLAB:
    trainer.test(mnist_model)

Save the model

Lightning saves the model automatically to lightning_logs/.

Each run is stored in its own incrementally numbered version directory e.g., version_0.

Within it, a checkpoint is saved at the end of each epoch, overwriting the previous one so that only the latest epoch is kept.

To save a model in PyTorch (without Lightning):

state_dict = model.state_dict()  # extract the parameters
torch.save(state_dict, "my_model_weights.pth")  # save the parameters

Load the model

path_checkpoints = f"{os.getcwd()}/lightning_logs/version_0/checkpoints"
path_model = f"{path_checkpoints}/{os.listdir(path_checkpoints)[0]}"

reloaded_model = MNISTModel.load_from_checkpoint(path_model)
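
The path above assumes the run was saved under version_0 and takes whichever file os.listdir returns first. A slightly more robust sketch, assuming the lightning_logs/version_N/checkpoints/ layout and the .ckpt file extension, picks the most recently modified checkpoint across all versions. The helper name find_latest_checkpoint is hypothetical, and the demonstration uses a throwaway directory with empty files rather than real checkpoints.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def find_latest_checkpoint(log_dir: str = "lightning_logs") -> Optional[Path]:
    """Return the most recently modified .ckpt file under log_dir, or None if there is none."""
    checkpoints = list(Path(log_dir).glob("version_*/checkpoints/*.ckpt"))
    if not checkpoints:
        return None
    return max(checkpoints, key=lambda path: path.stat().st_mtime)


# demonstration on a temporary directory that mimics the layout
with tempfile.TemporaryDirectory() as tmp_dir:
    for version, modified_time in [("version_0", 1_000), ("version_1", 2_000)]:
        checkpoint_dir = Path(tmp_dir) / version / "checkpoints"
        checkpoint_dir.mkdir(parents=True)
        checkpoint = checkpoint_dir / "example.ckpt"
        checkpoint.touch()  # empty placeholder file
        os.utime(checkpoint, (modified_time, modified_time))  # fake the save time
    latest = find_latest_checkpoint(tmp_dir)
    latest_version = latest.parent.parent.name

print(latest_version)
```

With a real run, the returned path could be passed to MNISTModel.load_from_checkpoint in place of path_model.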

To load a model in PyTorch (without Lightning):

new_state_dict = torch.load("my_model_weights.pth")  # load the parameters
new_model = MNISTModel()  # instantiate a model with the same architecture
new_model.load_state_dict(new_state_dict)  # set up the new model with these parameters
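
The saving and loading snippets can be checked together as a round trip. This sketch uses a bare torch.nn.Linear layer as a stand-in for the model and a temporary directory for the file, both of which are assumptions made to keep the example self-contained.

```python
import os
import tempfile

import torch

model = torch.nn.Linear(in_features=28 * 28, out_features=10)  # stand-in model

with tempfile.TemporaryDirectory() as tmp_dir:
    path_weights = os.path.join(tmp_dir, "my_model_weights.pth")
    torch.save(model.state_dict(), path_weights)  # save only the parameters

    # same architecture, but freshly (randomly) initialised weights
    new_model = torch.nn.Linear(in_features=28 * 28, out_features=10)
    new_model.load_state_dict(torch.load(path_weights))  # overwrite them with the saved ones

new_model.eval()  # switch to evaluation mode before making predictions

# every parameter tensor in the reloaded model should equal the original
weights_match = all(
    torch.equal(saved, loaded)
    for saved, loaded in zip(model.state_dict().values(), new_model.state_dict().values())
)
print(weights_match)
```

Only the parameters are stored this way, so the class definition has to be available when the weights are loaded again.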

Questions

Question 1

If you were looking to do classic machine learning, what tool is a good choice?

Question 2

If you were looking to do deep learning using a high-level API, which tools are good choices?

Question 3

What are good reasons for choosing a high-level or a low-level API?

Question 4

When creating a model, which API is simpler to use?

  • Sequential

  • Subclassing

Question 5

Put these general steps in order:

  • Compile the model

  • Preprocess the data

  • Test the model

  • Fit the model to the training data

  • Create the model

  • Download the data

Question 6

Which machine learning library is the best?

Key Points

Important

  • scikit-learn is great for classic machine learning problems.

  • TensorFlow and PyTorch are both great for deep learning problems.

  • Keras (high-level API for TensorFlow) and PyTorch Lightning (high-level API for PyTorch) have many high-level objects to help you create deep learning models.

  • You can use low-level APIs for any custom objects.

  • Explore your data before using it.

  • Check your model before fitting it to the training data.

  • Evaluate your model and analyse the errors it makes.

Further information

Good practices

  • Many decisions around model architecture are based on previous work, literature, and trial-and-error.

  • Debugging:

    • Test each part individually, before testing the whole.

    • Check the model summary and visualise the architecture.

    • Use debug modes:

      • Pass run_eagerly=True to the call to compile() in Keras.

      • Use Trainer(fast_dev_run=True) in PyTorch Lightning.

    • Tips for Keras and PyTorch Lightning.

  • Offloading computations to a GPU may not be beneficial for small models.

  • Tips for optimising GPU performance from TensorFlow, NVIDIA.
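
As one way of testing a part individually, a dummy batch can be passed through a layer to check the output shape before any training. The sketch assumes MNIST-sized inputs (28 x 28) and 10 classes, as in the PyTorch Lightning model earlier on this page. The batch size of 4 and the random pixel values are arbitrary.

```python
import torch

layer = torch.nn.Linear(in_features=28 * 28, out_features=10)

dummy_images = torch.rand(4, 1, 28, 28)  # fake batch: (batch, channels, height, width)
flattened = dummy_images.view(dummy_images.size(0), -1)  # flatten each image to a vector

with torch.no_grad():  # gradients are not needed for a shape check
    outputs = layer(flattened)

assert flattened.shape == (4, 784)
assert outputs.shape == (4, 10)  # one score per class for each sample
print(outputs.shape)
```

A shape mismatch here is much quicker to diagnose than the same error raised part-way through trainer.fit.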

Other options

There are many other tools for machine learning, including:

  • JAX

    • A library for GPU-accelerated NumPy with automatic differentiation.

  • Flax

    • A neural network library and ecosystem for JAX that is designed for flexibility.

  • Haiku

    • Built on top of JAX to provide simple, composable abstractions for machine learning research.

  • XGBoost

    • Gradient boosting library.

  • Caffe

    • Deep learning framework.

  • Sonnet

    • High-level API for TensorFlow.

  • fastai

    • High-level API for PyTorch.