
1.6 Feature Detectors

6.1 SIFT

6.1.1 Introduction

SIFT, which stands for Scale-Invariant Feature Transform, is a computer vision algorithm used to detect and describe local features in images. It was developed by David Lowe in 1999 and is widely used in tasks such as object recognition, image stitching, and 3D reconstruction. SIFT is particularly valued for its robustness to changes in scale, rotation, and illumination.

6.1.2 Key Components of SIFT

  1. Scale-Space Extrema Detection:

    • The first step involves detecting key points in the image at multiple scales. This is done by constructing a scale-space using a Gaussian function to blur the image at different levels.
    • The difference of Gaussians (DoG) is then computed between successive blurred images to identify potential key points, called extrema, which are points where the intensity changes sharply.
  2. Keypoint Localization:

    • Once key points are detected, SIFT refines their positions by fitting a 3D quadratic function to the local sample points. This process eliminates low-contrast key points and edge responses that are less stable, leaving only strong and stable key points.
  3. Orientation Assignment:

    • Each key point is assigned a consistent orientation based on the local image gradient directions. This step makes the feature descriptor invariant to image rotation.
  4. Keypoint Descriptor:

    • A descriptor is generated for each key point. This descriptor is a vector representing the local image gradient information around the key point. The image region is divided into smaller sub-regions, and histograms of gradient directions are computed for each sub-region. These histograms are then concatenated to form a 128-dimensional vector that describes the key point’s appearance.
  5. Feature Matching:

    • Once descriptors are generated for key points in different images, they can be matched to find corresponding points between images. This matching is usually done using the Euclidean distance between descriptors; a minimal end-to-end sketch follows this list.
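
The whole pipeline is available off the shelf in OpenCV. Below is a minimal sketch, assuming OpenCV 4.4 or later (where SIFT is included in the main cv2 module); the image file names are placeholders, and the 0.75 ratio-test threshold follows Lowe's paper.

```python
# Minimal SIFT detect-describe-match sketch using OpenCV.
# Assumes OpenCV >= 4.4; "scene1.jpg"/"scene2.jpg" are placeholder files.
import cv2

img1 = cv2.imread("scene1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene2.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()

# Steps 1-4: detect keypoints and compute 128-dimensional descriptors.
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Step 5: brute-force matching by Euclidean (L2) distance, followed by
# Lowe's ratio test to drop ambiguous matches.
matcher = cv2.BFMatcher(cv2.NORM_L2)
candidates = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in candidates if m.distance < 0.75 * n.distance]
print(f"{len(good)} confident matches")
```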

6.1.3 Applications of SIFT

  • Object Recognition: SIFT can recognize objects even if they are partially occluded or viewed from different angles.
  • Image Stitching: It is used to align and stitch images together, particularly in panoramic photography.
  • 3D Reconstruction: SIFT helps in matching points across multiple images taken from different viewpoints, enabling the reconstruction of 3D structures.

6.1.4 Advantages

  • Scale and Rotation Invariance: SIFT features are invariant to changes in scale and rotation, making them highly reliable across viewing conditions.
  • Robustness: It performs well under various lighting conditions and against image noise and distortion.

6.1.5 Limitations

  • Computationally Expensive: The process of detecting key points and computing descriptors is resource-intensive.
  • Patented: SIFT was patented (the patent expired in 2020), which limited its usage in open-source projects for some time, leading to the development of alternative algorithms like SURF and ORB.

SIFT remains a foundational algorithm in computer vision due to its robustness and effectiveness in feature detection and matching.

6.2 Detecting Keypoints

Detecting keypoints in the SIFT algorithm involves identifying distinctive points in an image that are invariant to scale and rotation. This process is broken down into several steps:

1. Constructing the Scale-Space

  • The scale-space is created by repeatedly smoothing the image with a Gaussian filter at different scales. For an image $I(x, y)$, the scale-space $L(x, y, \sigma)$ is defined as
    $$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$$
    where $G(x, y, \sigma)$ is the Gaussian function
    $$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}$$
    and $*$ denotes convolution.

  • The Gaussian filter is applied with varying standard deviations $\sigma$ to produce blurred images at different scales.
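
As a concrete illustration, here is a small sketch of the blurring step using SciPy. The base sigma of 1.6 and the factor $k = \sqrt{2}$ match values suggested in Lowe's paper, but the function name and parameters are illustrative choices, not a reference implementation.

```python
# Sketch of scale-space construction: progressively stronger Gaussian blurs.
import numpy as np
from scipy.ndimage import gaussian_filter

def build_scale_space(image, sigma0=1.6, k=np.sqrt(2), num_scales=5):
    """Return a list of copies of `image` blurred with sigma0 * k**i."""
    image = image.astype(np.float32)
    return [gaussian_filter(image, sigma=sigma0 * k ** i)
            for i in range(num_scales)]
```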

2. Difference of Gaussians (DoG)

  • To detect keypoints efficiently, SIFT uses the Difference of Gaussians (DoG) method. The DoG is computed by subtracting two successive blurred images:
    $$D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma)$$
    where $k$ is a constant multiplicative factor.

  • The DoG highlights areas in the image where there is a significant change in intensity, which often corresponds to edges or corners.
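
Continuing the sketch above, each DoG level is simply a pixelwise subtraction of adjacent scale-space levels (build_scale_space is the illustrative helper from the previous block):

```python
# Sketch: difference of Gaussians from the scale-space built above.
def difference_of_gaussians(scale_space):
    """Subtract each blurred image from the next, more blurred one."""
    return [scale_space[i + 1] - scale_space[i]
            for i in range(len(scale_space) - 1)]
```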

3. Finding Scale-Space Extrema

  • The keypoints are identified as local extrema in the DoG images across both space and scale. Each pixel in a DoG image is compared to its 26 neighbors: 8 in the same scale (the surrounding 3×3 region, excluding the pixel itself), 9 in the scale above, and 9 in the scale below.

  • If a pixel is either greater than all of its neighbors (a maximum) or less than all of its neighbors (a minimum), it is considered a candidate keypoint.
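
The 26-neighbor comparison can be sketched as follows, where `dog` is assumed to be the list of DoG images from the previous block, indexed by scale; ties are counted as extrema to keep the sketch short.

```python
import numpy as np

def is_candidate_keypoint(dog, s, y, x):
    """True if pixel (y, x) at DoG level s is >= or <= all 26 neighbours
    in the 3x3x3 cube spanning the scale below, the same scale, and the
    scale above (ties pass, for simplicity)."""
    cube = np.stack([lvl[y - 1:y + 2, x - 1:x + 2]
                     for lvl in (dog[s - 1], dog[s], dog[s + 1])])
    center = dog[s][y, x]
    return center == cube.max() or center == cube.min()
```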

4. Keypoint Localization

  • Once candidate keypoints are detected, they are refined to ensure stability. This involves fitting a 3D quadratic function to the surrounding points to determine the exact position, scale, and contrast of the keypoint.

  • Low-contrast keypoints (which are sensitive to noise) and edge-like points (which are less stable) are discarded. This is done by analyzing the Hessian matrix at the keypoint and rejecting points where the ratio of principal curvatures is high (indicating an edge).
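
The edge test can be written directly from finite differences of a DoG level. In this sketch, `dog_level` is assumed to be a single 2-D DoG image, and the curvature-ratio threshold r = 10 is the value Lowe suggests.

```python
def passes_edge_test(dog_level, y, x, r=10.0):
    """Reject edge-like keypoints via the 2x2 Hessian of the DoG level.
    Accepts the point only if the ratio of principal curvatures is
    below r."""
    d = dog_level
    dxx = d[y, x + 1] + d[y, x - 1] - 2.0 * d[y, x]
    dyy = d[y + 1, x] + d[y - 1, x] - 2.0 * d[y, x]
    dxy = (d[y + 1, x + 1] - d[y + 1, x - 1]
           - d[y - 1, x + 1] + d[y - 1, x - 1]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:                 # curvatures differ in sign: unstable point
        return False
    return trace * trace / det < (r + 1) ** 2 / r
```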

5. Orientation Assignment

  • For each keypoint, an orientation is assigned based on the direction of the image gradients around the keypoint. This is done by creating a histogram of gradient directions within a region around the keypoint, weighted by gradient magnitude and Gaussian distance from the keypoint.

  • The highest peak in the histogram determines the dominant orientation, which makes the keypoint descriptor rotation invariant.
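
A bare-bones version of the orientation histogram might look like the following; `patch` is assumed to be a small window cut out around the keypoint, and the Gaussian distance weighting mentioned above is omitted to keep the sketch short.

```python
import numpy as np

def dominant_orientation(patch, num_bins=36):
    """Magnitude-weighted histogram of gradient directions over `patch`;
    returns the centre angle (degrees) of the tallest bin."""
    gy, gx = np.gradient(patch.astype(np.float32))
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360.0
    hist, _ = np.histogram(angle, bins=num_bins, range=(0.0, 360.0),
                           weights=magnitude)
    return (np.argmax(hist) + 0.5) * (360.0 / num_bins)
```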

These steps allow SIFT to detect keypoints that are distinctive, stable, and invariant to scale and rotation, making them suitable for tasks such as image matching and object recognition.