
PyTorch Cheat Sheet

PyTorch stands out for its dynamic computation graph, intuitive syntax, and Pythonic integration, making it a preferred library for researchers and developers working on cutting-edge AI projects.


Neural Networks torch.nn

In PyTorch, a neural network model is typically defined by subclassing nn.Module to specify its layers and activation functions (loss functions and optimizers are defined separately, outside the model). Your model class should include two main components:

  • The __init__ method where you'll define the layers of your model.
  • The forward method where you'll specify the forward pass of the network.
import torch.nn as nn
import torch.nn.functional as F

Defining a Neural Network Model

In PyTorch, models are usually defined by subclassing nn.Module and defining the layers in the __init__ method. The data flow is specified in the forward method.

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)  # Convolutional layer
        self.fc1 = nn.Linear(20 * 20 * 20, 10)  # Fully connected layer (20*20*20 assumes 24x24 single-channel inputs)

    def forward(self, x):
        x = F.relu(self.conv1(x))  # Activation function
        x = x.view(-1, 20 * 20 * 20)
        x = self.fc1(x)
        return x

net = Net()

Essential Components

PyTorch offers a wide array of layers, functions, loss functions, and optimizers to build and train neural networks.

Common Layers

PyTorch offers a wide variety of layers to use in your models, including:

  • Convolutional Layers nn.Conv2d: Apply a convolution operation to your data, ideal for image processing tasks.
  • Pooling Layers nn.MaxPool2d: Reduce the spatial dimensions of the input, useful for reducing the computational load and overfitting.
  • Fully Connected Layers nn.Linear: Apply a linear transformation to the incoming data, often used in the final layers of a network.
  • Dropout Layers nn.Dropout: Randomly zeroes some of the elements of the input tensor with probability p during training, which helps prevent overfitting.
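
As a quick illustration of how these layers fit together, here is a minimal sketch of a small image classifier. The channel counts, kernel size, dropout probability, and the assumed 28x28 input size are arbitrary example values, not recommendations for any particular task.

import torch.nn as nn

# A small, hypothetical image classifier combining the layers above
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolution over 1-channel images
    nn.ReLU(),
    nn.MaxPool2d(2),                             # halve the spatial dimensions
    nn.Flatten(),                                # flatten before the linear layer
    nn.Dropout(p=0.5),                           # randomly zero activations during training
    nn.Linear(16 * 14 * 14, 10),                 # assumes 28x28 inputs -> 14x14 after pooling
)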

Recurrent Layers

nn.RNN

  • Overview: The basic RNN module that can process inputs with temporal dependencies, suitable for simpler sequences.
  • Usage:

    import torch.nn as nn
    rnn_layer = nn.RNN(input_size=10, hidden_size=20, num_layers=2)
    
    • input_size: The number of expected features in the input x.
    • hidden_size: The number of features in the hidden state h.
    • num_layers: Number of recurrent layers.

nn.LSTM

  • Overview: Long Short-Term Memory (LSTM) module, an extension of the basic RNN that can learn long-term dependencies. LSTMs are widely used due to their effectiveness in avoiding the vanishing gradient problem.
  • Usage:

    lstm_layer = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
    

nn.GRU

  • Overview: Gated Recurrent Unit (GRU) module, similar to LSTM but uses a simplified gating mechanism. GRUs are known for their efficiency and performance on par with LSTMs for many tasks.
  • Usage:

    gru_layer = nn.GRU(input_size=10, hidden_size=20, num_layers=2)
    

Building a Recurrent Neural Network

Constructing a recurrent neural network in PyTorch typically involves defining a model class that inherits from nn.Module and includes your chosen RNN layer. Here's a basic outline for creating an RNN model:

class RNNModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(RNNModel, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        # Initialize an RNN/LSTM/GRU layer
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        # You can switch the above line to nn.LSTM or nn.GRU depending on your needs
        # Add a fully connected layer for output
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Initialize the hidden state (an LSTM would also need a cell state c0)
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)

        # Forward propagate the RNN
        out, _ = self.rnn(x, h0)

        # Pass the output of the last time step to the classifier
        out = self.fc(out[:, -1, :])
        return out

In this example:

  • input_size: The number of expected features in the input x.
  • hidden_size: The number of features in the hidden state h.
  • num_layers: Number of recurrent layers.
  • num_classes: Number of classes for the output layer.
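
A quick usage sketch (the sizes below are arbitrary example values): instantiate the model and pass it a batch of sequences shaped (batch, seq_len, input_size), since batch_first=True was set.

import torch

# Hypothetical example sizes
model = RNNModel(input_size=10, hidden_size=20, num_layers=2, num_classes=5)
x = torch.randn(32, 15, 10)   # (batch=32, seq_len=15, features=10)
out = model(x)                # shape: (32, 5), one score per class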

Training and Using RNNs

Training an RNN in PyTorch follows the same fundamental steps as training any other type of model in PyTorch, involving a training loop where you perform forward passes, calculate the loss, perform backpropagation, and update the model's weights.

Activation Functions

Activation functions are crucial for introducing non-linearities into the network, allowing it to learn complex patterns. Common activation functions include:

  • ReLU F.relu: Applies the rectified linear unit function element-wise.
  • Softmax F.softmax: Applies the Softmax function to an n-dimensional input Tensor.
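
A minimal sketch of applying both functions to a tensor (the values are arbitrary; note that F.softmax needs an explicit dim argument):

import torch
import torch.nn.functional as F

x = torch.tensor([[-1.0, 0.0, 2.0]])
relu_out = F.relu(x)           # negative entries become 0: tensor([[0., 0., 2.]])
probs = F.softmax(x, dim=1)    # each row now sums to 1.0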

Loss Functions

PyTorch provides several loss functions under the torch.nn module, which you can use depending on your specific problem, such as:

  • CrossEntropyLoss nn.CrossEntropyLoss: Used for multi-class classification problems, computes cross-entropy loss between input and target.
  • MSELoss nn.MSELoss: Used for regression tasks, creates a criterion that measures the mean squared error.
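
A minimal sketch of what each loss expects (the shapes and values are illustrative): nn.CrossEntropyLoss takes raw logits of shape (batch, num_classes) plus integer class labels, while nn.MSELoss compares two tensors of the same shape.

import torch
import torch.nn as nn

# Classification: logits (batch, num_classes) vs. integer class labels
logits = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 0])
ce_loss = nn.CrossEntropyLoss()(logits, labels)

# Regression: predictions and targets with matching shapes
preds = torch.randn(4, 1)
targets = torch.randn(4, 1)
mse_loss = nn.MSELoss()(preds, targets)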

Optimizers

Optimizers are used to update the weights and biases of your network based on the gradients computed during backpropagation. Common optimizers include:

  • SGD torch.optim.SGD: Implements stochastic gradient descent.
  • Adam torch.optim.Adam: Implements the Adam algorithm.
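
A minimal sketch of constructing each optimizer, reusing the net defined above (the learning rates are typical example values, not prescriptions). The actual update step, optimizer.step(), is shown in the training loop below.

import torch

# Pass the model's parameters to the optimizer of your choice
sgd_optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
adam_optimizer = torch.optim.Adam(net.parameters(), lr=0.001)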

Model Inputs and Data Preparation

Neural networks require properly prepared input data for optimal performance. This section covers how to format and preprocess your data for PyTorch models.

Input Data Format

  • PyTorch Tensors: PyTorch models operate on PyTorch tensors, not NumPy arrays or pandas DataFrames directly.
  • Converting from NumPy: Use torch.from_numpy(np_array) to convert NumPy arrays to PyTorch tensors.
  • Converting from DataFrames: Convert pandas DataFrames to NumPy first, then to PyTorch tensors.
import torch
import numpy as np
import pandas as pd

# From NumPy array
np_array = np.array([[1, 2], [3, 4]])
tensor = torch.from_numpy(np_array).float()  # Convert to float tensor

# From pandas DataFrame
df = pd.DataFrame({'feature1': [1, 2, 3], 'feature2': [4, 5, 6]})
tensor = torch.tensor(df.values).float()

Normalization and Standardization

Proper scaling of input data is crucial for neural network training:

  1. Standardization (zero mean, unit variance):

    # Using scikit-learn with PyTorch
    from sklearn.preprocessing import StandardScaler
    import numpy as np
    
    # Standardize NumPy array
    scaler = StandardScaler()
    X_np = df.values
    X_scaled = scaler.fit_transform(X_np)
    
    # Convert to PyTorch tensor
    X_tensor = torch.tensor(X_scaled, dtype=torch.float32)
    
    # PyTorch native approach (if data is already a tensor)
    def standardize(x):
        mean = x.mean(dim=0, keepdim=True)
        std = x.std(dim=0, keepdim=True)
        return (x - mean) / std
    
    X_standardized = standardize(X_tensor)
    
  2. Normalization (scaling to a range, typically [0,1]):

    # Using scikit-learn
    from sklearn.preprocessing import MinMaxScaler
    
    # Normalize NumPy array
    scaler = MinMaxScaler()
    X_normalized = scaler.fit_transform(X_np)
    X_tensor = torch.tensor(X_normalized, dtype=torch.float32)
    
    # PyTorch native approach
    def normalize(x):
        min_vals = x.min(dim=0, keepdim=True)[0]
        max_vals = x.max(dim=0, keepdim=True)[0]
        return (x - min_vals) / (max_vals - min_vals)
    
    X_normalized = normalize(X_tensor)
    
  3. Important Considerations:

    • Always fit scalers on training data only, then apply them to validation/test data (see the sketch after this list)
    • For image data, normalization with mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225] is common (ImageNet statistics)
    • Save your preprocessing parameters for consistent application during inference

  4. Using PyTorch Transforms:

    from torchvision import transforms
    
    # For image data
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ])
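
To illustrate the first consideration above, here is a minimal sketch (the arrays are placeholder data) of fitting a scaler on the training split only and reusing its statistics for the test split:

import numpy as np
import torch
from sklearn.preprocessing import StandardScaler

# Placeholder train/test splits (in practice these come from your own data)
X_train = np.random.rand(80, 4)
X_test = np.random.rand(20, 4)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # fit statistics on the training data only
X_test_scaled = scaler.transform(X_test)         # reuse the same statistics for the test data

X_train_tensor = torch.tensor(X_train_scaled, dtype=torch.float32)
X_test_tensor = torch.tensor(X_test_scaled, dtype=torch.float32)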
    

Training a Model

Training a model in PyTorch involves running a forward pass, computing the loss, performing a backward pass, and updating the model parameters. Unlike Keras, we have direct control over the epochs and forward/backward flow of model training.

A training loop for a classification model might look like this:

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()

for epoch in range(num_epochs):
    for batch_idx, (data, targets) in enumerate(train_loader):
        # Forward pass
        predicted = model(data)
        loss = criterion(predicted, targets)

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Training a Model: Step-by-Step

Training a neural network model in PyTorch is a multi-step process that involves several stages.

Here's a breakdown of each step within the training loop example:

  1. Initialize Optimizer and Loss Function:
    • Optimizer: Responsible for updating the model parameters using the gradients computed during backpropagation. Example: torch.optim.Adam.
    • Loss Function: Measures the difference between the output of the model and the actual target values. Example: nn.CrossEntropyLoss.
  2. Iterate Over Epochs:
    • An epoch represents a full pass through the entire training dataset.
  3. Iterate Over Batches:
    • Data is processed in batches, allowing for more efficient computation and the introduction of stochasticity in the training process.
  4. Forward Pass:
    • The model processes each batch of data to produce predictions.
  5. Compute Loss:
    • The loss function calculates how far the model's predictions are from the actual targets.
  6. Backward Pass:
    • Compute the gradient of the loss function with respect to each parameter of the model.
  7. Update Model Parameters:
    • The optimizer adjusts the parameters based on the computed gradients to minimize the loss function.

Step 1: Initialize the Optimizer and Loss Function

Before entering the training loop, you initialize the optimizer and the loss function, which are essential components for the training process.

  • Optimizer (torch.optim.Adam): This line creates an optimizer object, which is responsible for adjusting the model's parameters (weights and biases) based on the computed gradients. The learning rate (lr) controls how much we adjust the parameters by with respect to the loss gradient.
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
  • Loss Function (nn.CrossEntropyLoss): Here, we define the criterion that measures the difference between the model's predictions and the actual labels. CrossEntropyLoss is commonly used for classification tasks.
criterion = nn.CrossEntropyLoss()

Step 2: Iterate Over Epochs

The outer loop iterates over each epoch. An epoch represents one complete pass through the entire training dataset.

for epoch in range(num_epochs):

Step 3: Iterate Over Batches

Within each epoch, the training data is divided into batches. This loop iterates over each batch of data. enumerate(train_loader) provides both the index of the batch (batch_idx) and the data and targets for the current batch (data, targets).

for batch_idx, (data, targets) in enumerate(train_loader):
  • batch_idx: The index of the current batch within the epoch.
  • (data, targets): A tuple where data contains the input features for the batch, and targets contains the corresponding labels.
  • enumerate(train_loader): Iterates over the training data loader, yielding the batch index and data (input features and targets).

Step 4: Forward Pass

For each batch, perform a forward pass through the model by passing the input data (data) through the model (model(data)). This step computes the predicted outputs (predicted) for the input data.

predicted = model(data)

Step 5: Compute Loss

Calculate the loss by comparing the model's predictions (predicted) with the actual labels (targets). The criterion object uses the specified loss function to compute this.

loss = criterion(predicted, targets)

The choice of the loss function should align with the type of task and the nature of the model's output.

  • For classification tasks, nn.CrossEntropyLoss is a common choice. It expects the model to output raw scores (logits) for each class, and it internally applies a log-softmax layer.
  • For regression tasks, loss functions like nn.MSELoss (Mean Squared Error Loss) are suitable. These functions typically expect the model's output to directly correspond to the target values.

Step 6: Backward Pass

Perform a backward pass through the network. This step involves computing the gradient of the loss function with respect to each parameter (weight and bias) in the model.

  • Clear existing gradients (optimizer.zero_grad()): Clearing gradients is crucial because it prevents the accumulation of gradients from multiple backward passes. Without clearing, gradients from previous batches would influence the updates made based on the current batch, leading to incorrect parameter updates. This clearing step ensures that each update is based solely on the latest batch, maintaining the integrity of the training process.
    • Why clear the gradients? Should I always do this?

      In PyTorch, gradients accumulate by default whenever .backward() is called on the loss tensor. This feature is particularly useful when you need to compute the gradients over several forward passes before updating the model parameters, which can be the case in more complex models or training regimes.

      However, in most standard training loops, we update the model parameters after each batch. If we don't clear the gradients at the beginning of each iteration, the gradients computed from the backward pass of the current batch will add up to the gradients computed from the previous batches, leading to incorrect updates to the model parameters. This is why we use optimizer.zero_grad() to reset the gradients to zero before computing the gradients for the current batch.

      Effect of Clearing Gradients:

      • Ensures that the model's parameter updates are based only on the most recent batch of data and not influenced by the data from previous batches.
      • Prevents the gradients from accumulating and growing excessively, which could lead to unstable training or exploding gradients.

      When Accumulating Gradients is Desired:

      There are scenarios where accumulating gradients across multiple forward passes is beneficial:

      • Gradient Accumulation: In cases where the available hardware cannot handle a large batch size due to memory constraints, gradients can be accumulated over multiple smaller batches. After several forward-backward passes, the accumulated gradients are used to update the model parameters, effectively simulating a larger batch size. A minimal sketch of this pattern appears at the end of this step.
      • Multi-GPU Training: When training on multiple GPUs, it's common to divide a batch across GPUs. Each GPU computes the gradients for its subset of the batch, and these gradients are then accumulated (summed) across GPUs before updating the model parameters.
optimizer.zero_grad()
  • Compute gradients (loss.backward()): This function computes the gradient of the loss with respect to all parameters in the model that have requires_grad=True.
loss.backward()
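
As mentioned above, here is a minimal sketch of gradient accumulation, reusing model, criterion, optimizer, and train_loader from the training loop (accumulation_steps is a hypothetical value):

accumulation_steps = 4  # hypothetical: simulate a batch 4x larger than the loader's batch size

optimizer.zero_grad()
for batch_idx, (data, targets) in enumerate(train_loader):
    predicted = model(data)
    loss = criterion(predicted, targets) / accumulation_steps  # scale so the summed gradients average out
    loss.backward()                                            # gradients accumulate across batches

    if (batch_idx + 1) % accumulation_steps == 0:
        optimizer.step()        # update once every accumulation_steps batches
        optimizer.zero_grad()   # then clear for the next accumulation window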

Step 7: Update Model Parameters

Update the model parameters based on the gradients computed during the backward pass. The optimizer step (optimizer.step()) applies the update to the parameters.

optimizer.step()

Conclusion

This process repeats for each batch in every epoch, gradually improving the model's performance on the training dataset. By iterating through this loop, the model learns to make more accurate predictions by minimizing the loss function.

Aside: Understanding Batching with enumerate

In PyTorch, data is typically processed in batches using a DataLoader. The DataLoader creates an iterable over the dataset and automatically batches the data for you. When iterating over the DataLoader, enumerate is often used to keep track of both the batch index and the data within each batch.

Here's a breakdown of the batching step in the training loop:

for epoch in range(num_epochs):
    for batch_idx, (data, targets) in enumerate(train_loader):
        # Your training code here
  • enumerate(train_loader): This function iterates over the train_loader iterable, providing a count (batch_idx) and the values returned by the train_loader (a tuple containing data and targets for each batch).
    • batch_idx: A counter starting from 0 that indicates the index of the batch within the current epoch.
    • (data, targets): A tuple where data contains a batch of input features, and targets contains the corresponding labels for the batch. The size of data and targets is determined by the batch_size parameter specified when creating the DataLoader.
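
Here is a minimal sketch of creating such a DataLoader from in-memory tensors (the tensor sizes and batch_size=32 are illustrative assumptions):

import torch
from torch.utils.data import TensorDataset, DataLoader

# Wrap feature and label tensors in a dataset, then batch them
features = torch.randn(100, 10)         # 100 samples, 10 features each
labels = torch.randint(0, 2, (100,))    # 100 binary labels
dataset = TensorDataset(features, labels)
train_loader = DataLoader(dataset, batch_size=32, shuffle=True)

for batch_idx, (data, targets) in enumerate(train_loader):
    pass  # data: (32, 10), targets: (32,), except for a possibly smaller final batch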

Batching Explained

  • Batching: The process of dividing the dataset into smaller, manageable groups called batches. This technique is essential for efficient training, especially with large datasets.
    • Advantages:
      • Memory Efficiency: Processing smaller batches requires less memory, making it feasible to train on large datasets.
      • Stochastic Training: Using batches introduces randomness in the optimization process, which can help the model escape local minima and generalize better.
    • Batch Size: A hyperparameter that defines the number of samples to work through before updating the internal model parameters. Choosing the right batch size is crucial for model performance and training efficiency.

In summary, enumerate with DataLoader in PyTorch simplifies the batching process, allowing for efficient and effective model training by iterating over batches of data and their corresponding indices.

Evaluating the Model

Model evaluation in PyTorch typically involves disabling gradient computation, running the model in evaluation mode, and calculating the performance metrics.

with torch.no_grad():
    net.eval()  # Set the model to evaluation mode
    output = net(input)
    # Calculate metrics such as accuracy (details below)

Step-by-Step Model Evaluation Process

Evaluating a neural network model in PyTorch involves assessing its performance on a dataset, typically the validation or test set. Here's a detailed explanation of each step in the model evaluation example provided:

  1. Disable Gradient Calculation: Use torch.no_grad() to improve performance and reduce memory usage during evaluation.
  2. Set Model to Evaluation Mode: net.eval() adjusts the behavior of certain layers (e.g., dropout, batch normalization) appropriate for inference.
  3. Forward Pass: Generate predictions using the model.
  4. Calculate Performance Metrics: Assess the model's performance using metrics like accuracy, F1 score, etc.

Step 1: Disable Gradient Computation

with torch.no_grad():
  • torch.no_grad(): This context manager disables gradient computation, making the code run faster and reducing memory usage. During evaluation, gradients are not needed since the model's parameters are not being updated.

Step 2: Set the Model to Evaluation Mode

net.eval()
  • net.eval(): Switches the model to evaluation mode. This is necessary because some layers, like dropout and batch normalization, behave differently during training and evaluation. For instance, dropout layers will not drop any units during evaluation, ensuring that the full network is used to make predictions.

Step 3: Make Predictions on the Input Data

output = net(input)
  • output = net(input): Passes the input data (input) through the model (net) to obtain the output predictions (output). In evaluation mode, the model uses its learned parameters to make predictions based on the input data.

Step 4: Calculate Performance Metrics

After obtaining the predictions, you would typically calculate performance metrics to evaluate the model's performance. Common metrics include accuracy, precision, recall, F1 score, etc. The specific metrics you choose depend on the nature of your problem (e.g., classification, regression).

import sklearn.metrics

# Convert output scores (logits) to predicted class indices
_, predicted = torch.max(output, 1)

# Calculate accuracy (labels holds the ground-truth targets)
correct = (predicted == labels).sum().item()
accuracy = correct / labels.size(0)

# Calculate F1 score; move tensors to CPU in case they live on a GPU
f1 = sklearn.metrics.f1_score(labels.cpu(), predicted.cpu(), average='weighted')

Pre-trained Models

PyTorch, like Keras, offers a collection of pre-trained models that can be used for various tasks such as image classification, object detection, and more. These models have been trained on large datasets such as ImageNet and can be fine-tuned for specific tasks. Leveraging these pre-trained models can significantly accelerate the development process, especially in the domain of computer vision.

Torchvision Models

PyTorch provides these models through the torchvision.models submodule. Here's an overview of some of the most popular pre-defined models available in PyTorch:

ResNet

  • Overview: ResNet models, known for their "residual" connections which help in training very deep networks, are widely used for image classification tasks.
  • Usage:

    import torchvision.models as models
    resnet = models.resnet50(pretrained=True)
    
    • pretrained=True loads weights pre-trained on the ImageNet dataset. In newer torchvision releases this argument is deprecated in favor of weights, e.g. models.resnet50(weights=models.ResNet50_Weights.DEFAULT).

VGG

  • Overview: The VGG models, characterized by their simplicity (using only 3x3 convolutional layers), are another popular choice for image classification.
  • Usage:

    vgg = models.vgg16(pretrained=True)
    

Inception

  • Overview: Inception models, or GoogLeNet models, use inception blocks that allow the model to choose from multiple kernel sizes for each convolutional layer.
  • Usage:

    inception = models.inception_v3(pretrained=True)
    

Fine-tuning Pre-trained Models

When using these models for your specific tasks, you might want to fine-tune them. Fine-tuning involves a few steps:

  1. Load the Pre-trained Model: Start by loading the model with pre-trained weights.
  2. Modify the Output Layer: Adjust the final layer(s) to match the number of classes in your specific task.
  3. Freeze the Parameters: Optionally, you can freeze the parameters (weights) of earlier layers, so only the last few layers are trained.
  4. Train the Model: Train the model on your dataset, which adjusts the weights of the unfrozen layers to your specific task.

Example: Fine-tuning ResNet for a New Task

Here's a simple example of how you might fine-tune a ResNet model for a classification task with a different number of classes:

# Load a pre-trained ResNet model
resnet = models.resnet50(pretrained=True)

# Freeze all the parameters in the network
for param in resnet.parameters():
    param.requires_grad = False

# Replace the final layer with a new one (adjust for your number of classes)
num_classes = 100  # Example: 100 classes
resnet.fc = nn.Linear(resnet.fc.in_features, num_classes)

# Now you can train the model on your dataset
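
Continuing the sketch (the optimizer and learning rate here are illustrative choices, not the only option), only the parameters of the new final layer need to be passed to the optimizer, since everything else is frozen:

import torch

# Only resnet.fc has requires_grad=True, so optimize just those parameters
optimizer = torch.optim.Adam(resnet.fc.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Then train with the usual loop: forward pass, loss, backward pass, optimizer.step()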

Leveraging Pre-trained Models

These pre-trained models are not only a great way to achieve high performance with less data but also serve as a powerful starting point for experimentation and learning. By understanding the architectures and training processes of these models, you can gain insights into deep learning best practices and apply these principles to your projects.

Remember, while pre-trained models offer a significant head start, fine-tuning and customization are key to adapting these models to your specific needs and data.