5. ResNet Model

1. Introduction to ResNet

ResNet (Residual Network) is a deep convolutional neural network architecture introduced by He et al. in the 2015 paper “Deep Residual Learning for Image Recognition”. ResNet was a breakthrough in training deep neural networks because it addressed the problem of vanishing and exploding gradients, which had prevented effective training of very deep networks. With ResNet, models could be trained with hundreds or even thousands of layers, far beyond the depths that earlier plain architectures could reach.

The key innovation of ResNet is the introduction of residual connections, which allow gradients to flow more easily during backpropagation, solving the degradation problem that arises in very deep networks. ResNet has been widely adopted in computer vision tasks like image classification, object detection, and segmentation, and it also serves as a backbone for other models such as Faster R-CNN and Mask R-CNN.

2. Key Concepts of ResNet

  1. Degradation Problem:

    • As the depth of a neural network increases, accuracy saturates and then degrades, even if the network has more parameters. This phenomenon occurs because deep networks have difficulty propagating gradients backward during training, leading to vanishing or exploding gradients.
  2. Residual Learning:

    • ResNet solves this problem by introducing shortcut (skip) connections that skip one or more layers. Instead of learning a direct mapping H(x) from the input x to the output, ResNet forces the network to learn a residual function F(x) = H(x) - x, and the original mapping becomes H(x) = F(x) + x.

    • The motivation behind this is that it is easier for the network to learn the residual function F(x), which represents small adjustments to the input x, rather than learning the direct mapping H(x) from scratch.

    • Mathematically, the residual block is represented as y = F(x) + x, where x is the input, F(x) is the residual function (a small modification learned by the network), and the addition of x is the skip connection.
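
To make the additive skip connection concrete, the snippet below is a minimal PyTorch sketch (the framework choice, the name ResidualAdd, and the 64-channel example are assumptions made for illustration, not part of the original text) of y = F(x) + x for an arbitrary sub-network F:

```python
import torch
import torch.nn as nn

class ResidualAdd(nn.Module):
    """Wraps any sub-network F and returns F(x) + x (the skip connection)."""
    def __init__(self, fn: nn.Module):
        super().__init__()
        self.fn = fn  # the residual function F, learned by the network

    def forward(self, x):
        return self.fn(x) + x  # y = F(x) + x

# Example: F is two 3x3 convolutions that preserve shape, so the addition is valid.
block = ResidualAdd(nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
))
y = block(torch.randn(1, 64, 56, 56))
print(y.shape)  # torch.Size([1, 64, 56, 56]), same shape as the input
```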

3. Components of ResNet

  1. Residual Block:

    • The basic building block of ResNet is the residual block, which consists of two or three convolutional layers. The key difference from standard convolutional layers is the addition of the skip connection that bypasses these layers and adds the input directly to the output of the block.

    • A typical residual block looks like this:

      • Two 3×3 convolutional layers, each followed by batch normalization (BN); a ReLU activation is applied after the first BN.
      • A skip connection that adds the input directly to the output of the second BN, after which a final ReLU is applied.

    The mathematical formulation for a residual block with two convolutional layers is y = ReLU( BN(Conv2D( ReLU(BN(Conv2D(x))) )) + x ), where:

    • x is the input,
    • Conv2D represents a 2D convolution operation,
    • BN represents batch normalization.
  2. Bottleneck Block:

    • In deeper versions of ResNet (e.g., ResNet-50, ResNet-101), the architecture uses a bottleneck block to reduce the computational complexity of the network while maintaining model accuracy. A bottleneck block consists of three convolutional layers instead of two:

      1. A 1×1 convolution (used for reducing the dimensionality).
      2. A 3×3 convolution (used for feature extraction).
      3. A 1×1 convolution (used for restoring the dimensionality).
    • This arrangement reduces the number of parameters while preserving the model’s expressiveness. The residual connection still skips over these three layers: y = F(x) + x, where F(x) is the 1×1 → 3×3 → 1×1 stack (a runnable sketch of both block types follows this list).

  3. Identity Mapping:

    • The identity mapping refers to the simple addition of the input to the output of the residual block. This direct path allows the gradient to flow backward through the network more easily, solving the vanishing gradient problem.
  4. Downsampling with Projection Shortcut:

    • In cases where the input and output dimensions of the residual block differ (for example, when downsampling occurs), a projection shortcut is used. This shortcut employs a 1×1 convolution to match the dimensions of the input and output: y = F(x) + W_s·x, where W_s is the projection matrix (implemented as the 1×1 convolution) that adjusts the dimensions of the input.
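
The following is a minimal PyTorch sketch of the two block types described above, including the 1×1 projection shortcut used when the spatial size or channel count changes. PyTorch, the class names, and the example channel counts are assumptions made for illustration; the expansion factor of 4 follows the original paper.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Two 3x3 convolutions with an identity (or projected) skip connection."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # Projection shortcut (1x1 convolution) when dimensions change, identity otherwise.
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))  # first conv + BN + ReLU
        out = self.bn2(self.conv2(out))           # second conv + BN
        return self.relu(out + self.shortcut(x))  # y = F(x) + x (or W_s x)

class Bottleneck(nn.Module):
    """1x1 (reduce) -> 3x3 -> 1x1 (restore) with a skip connection."""
    expansion = 4  # output channels = mid_ch * 4, as in ResNet-50/101/152

    def __init__(self, in_ch, mid_ch, stride=1):
        super().__init__()
        out_ch = mid_ch * self.expansion
        self.f = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.f(x) + self.shortcut(x))  # y = F(x) + W_s x when shapes differ

x = torch.randn(1, 64, 56, 56)
print(BasicBlock(64, 64)(x).shape)            # torch.Size([1, 64, 56, 56])
print(Bottleneck(64, 64, stride=2)(x).shape)  # torch.Size([1, 256, 28, 28])
```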

4. ResNet Architectures

ResNet comes in various depths, with the most commonly used versions being ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152. The numbers indicate the total number of weight layers (convolutional and fully connected layers; pooling layers are not counted).

  1. ResNet-18 and ResNet-34:

    • These are shallower versions, and they use simple residual blocks with two 3×3 convolutions in each block.
  2. ResNet-50, ResNet-101, and ResNet-152:

    • These deeper models use the bottleneck block architecture. For instance, ResNet-50 has 50 weight layers organized into 16 bottleneck blocks, while ResNet-101 and ResNet-152 have 33 and 50 bottleneck blocks, respectively.
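
As a quick way to compare these depths, the sketch below instantiates several of them with torchvision's reference implementations (the use of torchvision is an assumption; it is not mentioned in the original text) and prints their parameter counts:

```python
import torch
from torchvision import models  # assumes torchvision is installed

# Instantiate a few standard depths with random weights (no downloads needed).
for builder in (models.resnet18, models.resnet34, models.resnet50, models.resnet101):
    net = builder(weights=None).eval()
    n_params = sum(p.numel() for p in net.parameters())
    logits = net(torch.randn(1, 3, 224, 224))  # 1000-way ImageNet logits
    print(builder.__name__, f"{n_params / 1e6:.1f}M parameters", tuple(logits.shape))
```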

5. Mathematical Operations Behind ResNet

  1. Convolution:

    • ResNet uses standard 2D convolution layers to process image data. A 2D convolution operation is defined as (I * K)(i, j) = Σ_m Σ_n I(i + m, j + n) · K(m, n), where:
    • I is the input image,
    • K is the convolution kernel,
    • k_h and k_w are the dimensions of the kernel, i.e., the ranges of m and n.
  2. Batch Normalization:

    • Batch normalization (BN) is applied after each convolution to normalize the activations and improve training stability. BN normalizes the input by adjusting the mean and variance for each mini-batch: BN(x) = (x - μ_B) / sqrt(σ_B² + ε), followed by a learned scale and shift, where:
    • x is the activation,
    • μ_B and σ_B² are the mean and variance of the batch,
    • ε is a small constant for numerical stability.
  3. ReLU Activation:

    • After each convolution and batch normalization, a ReLU (Rectified Linear Unit) activation function is applied. ReLU introduces non-linearity, which allows the model to learn complex patterns: ReLU(x) = max(0, x).
  4. Residual Function:

    • The key component of ResNet is the residual function F(x), which typically consists of multiple convolutions. Instead of learning the mapping H(x), ResNet models learn the residual F(x) = H(x) - x, so that the output of the block is H(x) = F(x) + x.
    • The shortcut connection ensures that if the residual function F(x) is zero, the network still passes the input x unchanged (an identity mapping), simplifying the learning process.
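
The formulas above can be verified numerically; the short NumPy sketch below (NumPy is an assumption made for brevity) implements the naive 2D convolution, the batch-normalization expression, and ReLU literally:

```python
import numpy as np

def conv2d(I, K):
    """(I * K)(i, j) = sum over m, n of I(i + m, j + n) * K(m, n)."""
    k_h, k_w = K.shape
    H, W = I.shape
    out = np.zeros((H - k_h + 1, W - k_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(I[i:i + k_h, j:j + k_w] * K)
    return out

def batch_norm(x, eps=1e-5):
    """BN(x) = (x - mu_B) / sqrt(sigma_B^2 + eps), statistics taken over the batch axis."""
    mu, var = x.mean(axis=0), x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

def relu(x):
    """ReLU(x) = max(0, x)."""
    return np.maximum(0, x)

# Conv -> BN -> ReLU on a small batch of random single-channel "images".
images = np.random.randn(4, 8, 8)
K = np.random.randn(3, 3)
feats = np.stack([conv2d(img, K) for img in images])  # shape (4, 6, 6)
print(relu(batch_norm(feats)).shape)                  # (4, 6, 6)
```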

6. Advantages of ResNet

  1. Deep Networks without Degradation:

    • ResNet allows training of very deep networks without suffering from the degradation problem. With skip connections, gradients can flow more freely, and deeper networks can be trained effectively.
  2. Efficient Gradient Flow:

    • Residual connections ensure that gradients during backpropagation can pass through the network efficiently, avoiding the vanishing gradient problem. This is especially important when training networks with hundreds or thousands of layers.
  3. Modularity and Flexibility:

    • The residual block can be used as a building block for other tasks and models. For example, ResNet serves as the backbone for many modern object detection models (e.g., Faster R-CNN, Mask R-CNN).
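
As an illustration of this modularity, the sketch below strips the classification head from a ResNet-50 so the remaining convolutional trunk can serve as a feature-extraction backbone (torchvision is assumed; detection frameworks typically do something similar internally):

```python
import torch
import torch.nn as nn
from torchvision import models

resnet = models.resnet50(weights=None)  # pretrained weights could be loaded instead
# Drop the final average-pooling and fully connected layers, keeping the convolutional trunk.
backbone = nn.Sequential(*list(resnet.children())[:-2])

features = backbone(torch.randn(1, 3, 224, 224))
print(features.shape)  # torch.Size([1, 2048, 7, 7]), fed to a detection or segmentation head
```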

7. Applications of ResNet

  1. Image Classification:

    • ResNet architectures have been widely used for image classification tasks, achieving state-of-the-art performance on datasets like ImageNet.
  2. Object Detection:

    • ResNet is often used as the backbone for object detection models, where its feature extraction capabilities are leveraged to detect and localize objects within images.
  3. Semantic Segmentation:

    • In tasks like semantic segmentation, where every pixel in an image is classified, ResNet is used to extract multi-scale features, which are then refined for pixel-level classification.
  4. Transfer Learning:

    • Pretrained ResNet weights (typically learned on ImageNet) are frequently reused for new tasks: the convolutional layers are kept as a feature extractor, and only the final classification layer is replaced and fine-tuned on the target dataset.
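
A minimal transfer-learning sketch, assuming torchvision (a recent version that exposes the weights enum used below) and a hypothetical 10-class target task:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 pretrained on ImageNet (torchvision downloads the weights).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the convolutional trunk so only the new head is trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the 1000-way ImageNet classifier with a head for 10 hypothetical classes.
model.fc = nn.Linear(model.fc.in_features, 10)

# Only the new head's parameters are passed to the optimizer.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 10])
```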