1.7 Keypoint Detectors
1. SIFT
In this exercise, we will delve into the properties of the Scale-Invariant Feature Transform (SIFT), particularly its invariance characteristics.
Steps:
- Keypoint Detection on the Checkerboard Image:
  - Use the SIFT algorithm to detect keypoints on the checkerboard image.
  - Count the number of keypoints identified.
  - Visualize these keypoints using the drawKeypoints method.
  - Observation: Pay attention to the locations of the keypoints. How do they compare to the Harris corners from Practical 3? What can you infer about their main orientation?
- SIFT Descriptors:
  - Extract SIFT descriptors for the detected keypoints.
  - Analyze the number of descriptors and the dimensionality of each.
  - Compare your findings with your lecture notes on SIFT. Are they consistent?
  - Convert the descriptors to an intensity image and inspect it.
  - Observation: Look for any discernible patterns in the intensity image. What do the descriptors convey?
- Invariance Test with Rotated and Scaled Image:
  - Rotate the input image by 10 degrees and scale it slightly (factor 1.2).
  - Repeat the SIFT keypoint detection and descriptor extraction on this transformed image.
  - Convert the new set of descriptors to an intensity image.
  - Observation: Compare the intensity images of the original and transformed checkerboard. Can you validate the scale invariance of the SIFT descriptors?
  - Choose a few keypoint pairs from both images and delve deeper into their descriptors. What similarities or differences can you spot?
  - Extra: perform correspondence matching between the original image and its rotated and scaled version. Draw a line between each pair of matched keypoints in the two images.
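For the extra matching task, a minimal sketch using OpenCV's brute-force matcher and drawMatches. It assumes the variable names used later in this practical (img for the original grayscale image and rotated_scaled_img for its transformed version); it is one possible approach, not the prescribed solution.

```python
import cv2

# Detect keypoints and compute descriptors in both images.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img, None)
kp2, des2 = sift.detectAndCompute(rotated_scaled_img, None)

# Brute-force matching with L2 distance (appropriate for SIFT) and cross-checking.
bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)

# Draw lines between the 30 best matches.
matched_img = cv2.drawMatches(img, kp1, rotated_scaled_img, kp2,
                              matches[:30], None,
                              flags=cv2.DRAW_MATCHES_FLAGS_NOT_DRAW_SINGLE_POINTS)
cv2.imwrite('sift_matches.png', matched_img)
```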
Objective:
Through this exercise, you’ll gain hands-on experience with the SIFT algorithm, understanding its robustness against transformations and its ability to capture distinctive features in images. By comparing it with other methods like Harris corners, you’ll appreciate the nuances and strengths of each approach.
SIFT (Scale-Invariant Feature Transform)
SIFT is a method in computer vision to detect and describe local features in images. The algorithm provides key advantages when it comes to scale, rotation, and translation invariances.
Keypoint Detection
Scale-space Extrema Detection
- The image is progressively blurred using Gaussian filters, creating a series of scaled images.
- The Difference of Gaussians (DoG) is found between successive Gaussian blurred images.
- Extrema (maxima and minima) in the DoG images are potential keypoints.
Keypoint Localization
- Refines keypoints to eliminate less stable ones.
- Uses a method similar to the Harris corner detection to discard keypoints that have low contrast or lie along an edge.
Orientation Assignment
- Each keypoint is given one or more orientations based on local image gradient directions.
- This ensures the keypoint descriptor is rotation invariant.
SIFT Descriptor
Descriptor Representation
- For each keypoint, a descriptor is computed.
- A 16x16 neighborhood around the keypoint is considered, divided into a 4x4 grid of sub-blocks (each 4x4 pixels).
- An 8-bin orientation histogram is computed for each sub-block, giving 16 × 8 = 128 values per keypoint descriptor.
Descriptor Normalization
- The descriptor is normalized to ensure invariance to illumination changes.
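As a minimal numpy sketch of what this normalization typically looks like: the descriptor is scaled to unit length, large entries are clipped, and the vector is renormalized. The 0.2 clipping threshold is an assumption taken from Lowe's original SIFT paper, not a value stated in this practical.

```python
import numpy as np

def normalize_sift_descriptor(desc, clip=0.2):
    """Normalize a 128-D SIFT descriptor for illumination invariance.

    Unit-length normalization cancels multiplicative illumination changes;
    clipping large values and renormalizing reduces the influence of
    non-linear effects such as saturation. The 0.2 clip value follows
    Lowe's paper and is an assumption here.
    """
    desc = desc.astype(np.float64)
    desc /= (np.linalg.norm(desc) + 1e-12)   # unit length
    desc = np.minimum(desc, clip)            # suppress dominant gradients
    desc /= (np.linalg.norm(desc) + 1e-12)   # renormalize
    return desc
```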
Summary
SIFT is powerful for detecting and describing local features in images. It’s invariant to image scale, rotation, and partially invariant to affine transformations and illumination changes. This makes it suitable for tasks like object recognition, panorama stitching, and 3D scene reconstruction.
SIFT Feature Extraction on Checkerboard Image
This code demonstrates the process of detecting and extracting SIFT (Scale-Invariant Feature Transform) keypoints and descriptors from a checkerboard image and its rotated and scaled version.
Step-by-Step Explanation:
- Loading Image Files:

      filenames = glob.glob(os.path.join(path, '*.png'))
      filename = filenames[0]

  - The code first fetches all the image filenames with .png extension from the specified directory.
  - It then selects the first image from this list for further processing.
- Reading the Image and Initializing SIFT:

      img = cv2.imread(filename, cv2.IMREAD_GRAYSCALE)
      sift = cv2.SIFT_create()

  - The selected image is read in grayscale mode.
  - A SIFT detector object is initialized.
- Detecting SIFT Keypoints:

      keypoints = sift.detect(img, None)

  - The SIFT keypoints are detected for the grayscale image.
- Extracting SIFT Descriptors:

      keypoints, descriptors = sift.compute(img, keypoints)
      plt.figure(figsize=(5, 5))
      plt.imshow(descriptors, cmap='gray')
      plt.title('SIFT Descriptors')
      plt.axis('off')
      plt.show()

  - SIFT descriptors are computed for the detected keypoints.
  - The descriptors are visualized as an intensity image.
- Rotating and Scaling the Image:

      rows, cols = img.shape
      M = cv2.getRotationMatrix2D((cols/2, rows/2), 2, 1.2)
      rotated_scaled_img = cv2.warpAffine(img, M, (cols, rows))

  - The original image is rotated by 2 degrees and scaled by a factor of 1.2.
  - The transformed image is stored in rotated_scaled_img.
OpenCV’s SIFT Function Explanation:
Initialization:
To initialize a SIFT detector object, use the following code:
sift = cv2.SIFT_create()
SIFT stands for Scale-Invariant Feature Transform. It’s an algorithm in computer vision to detect and describe local features in images.
Key Point Detection:
To detect the SIFT keypoints of an image, use the following code:
keypoints = sift.detect(img, None)
Parameters:
- img: The input image where keypoints are to be detected.
- None (the mask): Mask of the image. It’s an optional parameter. If provided, the function will look for keypoints only in the specified region.
Descriptor Extraction:
To compute the descriptors from the detected keypoints, use the following code:
keypoints, descriptors = sift.compute(img, keypoints)
Parameters:
- img: The input image.
- keypoints: The detected keypoints for which descriptors are to be computed.
Return:
- keypoints: List of keypoints.
- descriptors: The SIFT descriptors of the keypoints. Each keypoint is represented by a vector of 128 values.
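OpenCV also provides a combined call that detects keypoints and computes descriptors in one step. A short sketch, assuming the same grayscale image img as above:

```python
import cv2

# Detect keypoints and compute descriptors in a single call.
# descriptors has shape (num_keypoints, 128): one 128-D vector per keypoint.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints), descriptors.shape)
```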
Explanation of cv2.drawKeypoints()
The cv2.drawKeypoints()
function is a utility provided by OpenCV to visualize the keypoints detected in an image.
Parameters:
- input_image:
  - Description: The original image on which keypoints were detected.
  - Type: 2D or 3D array (typically a grayscale or color image).
- keypoints:
  - Description: A list of detected keypoints on the input_image. These keypoints encapsulate information about the location, scale, orientation, and other characteristics of local features in the image.
  - Type: List of cv2.KeyPoint objects. These objects are typically obtained from feature detection methods like SIFT, SURF, etc.
- output_image (optional):
  - Description: Image on which the keypoints will be drawn. If not provided, a new image is created to draw the keypoints. In most scenarios, this is the same as the input_image.
  - Type: 2D or 3D array.
- flags (optional):
  - Description: Determines the drawing characteristics of the keypoints.
  - Type: Integer or combination of flag values.
  - Notable Flag: cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS ensures that the size of the keypoint is visualized along with its orientation, providing a richer representation of the keypoint.
Output:
- img_keypoints:
  - Description: Image with the keypoints drawn on it.
  - Type: 2D or 3D array, stored in the img_keypoints variable.
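A minimal usage sketch, assuming the grayscale image img and the SIFT keypoints detected earlier; the output filename is arbitrary:

```python
import cv2

# Draw the keypoints with their scale (circle radius) and orientation (line),
# as enabled by the rich-keypoints flag.
img_keypoints = cv2.drawKeypoints(
    img, keypoints, None,
    flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

cv2.imwrite('sift_keypoints.png', img_keypoints)
```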
Observations
1. SIFT Keypoints on Original Image:
When we detect SIFT keypoints on the original checkerboard image, we observe that the keypoints are not just located at the corners of the squares (as we might expect with Harris corners). Instead, SIFT keypoints are distributed across the image, capturing more intricate details and patterns.
Why is this the case?
SIFT keypoints are scale-invariant and rotation-invariant. This means that they are designed to capture features that remain consistent across different scales and orientations. The keypoints are detected based on the difference of Gaussian functions applied at different scales, allowing them to capture features at various levels of detail.
2. SIFT Descriptors:
The SIFT descriptors provide a unique fingerprint for each keypoint, capturing the local image gradient information around the keypoint. When visualized as an intensity image, the descriptors might not show a clear pattern to the human eye, but they contain rich information about the local image gradients.
What can we infer from the descriptors?
The intensity image of the descriptors might seem like random patterns, but these patterns encode the gradient orientations in the keypoint’s neighborhood. This makes the descriptors robust to transformations like rotation and scaling.
3. SIFT Keypoints on Rotated and Scaled Image:
Upon rotating and scaling the checkerboard image, and then detecting SIFT keypoints, we observe that some of the keypoints are still consistently detected in similar regions as in the original image. This demonstrates the scale and rotation invariance of SIFT keypoints.
4. SIFT Descriptors on Rotated and Scaled Image:
When we extract the SIFT descriptors for the keypoints detected on the rotated and scaled image and visualize them as an intensity image, we notice that the patterns are quite similar to the descriptors of the original image. This is because the descriptors capture the local gradient information, which remains consistent even after transformations.
Scale Invariance Verification:
By comparing the SIFT descriptors of the original image and the rotated and scaled image, we can verify the scale invariance property of SIFT. Even though the image underwent transformations, the descriptors remain consistent, proving the robustness of SIFT features.
Note
The performance of SIFT might not be optimal on chessboard images. This is primarily because chessboard patterns lack the intricate textures and features that SIFT excels at detecting. For more pronounced results, consider applying SIFT on images with richer details, such as car license plates.
2. Binary Shape Analysis
https://pyimagesearch.com/2021/02/22/opencv-connected-component-labeling-and-analysis/
In this exercise, you’ll be working with binary shape analysis to extract blob features from a gray-scale input image. The main goal is to separate individual characters from the image and then extract several binary features from them.
Steps:
- Image Thresholding:
  - Convert the input gray-scale image into a binary image.
  - Use Otsu’s thresholding method (available in OpenCV) to achieve this.
  - Examine the resulting binary image to ensure the thresholding is effective.
- Connected Component Labeling (CCL):
  - Implement a CCL algorithm to separate the blobs, which in this context refer to the characters in the image.
  - Refer to lecture notes or documentation for guidance on implementing this algorithm.
- Blob Extraction:
  - Apply the CCL algorithm to the binary image.
  - For each detected blob, identify the tightest bounding box.
  - Extract the character within each bounding box.
  - Verify the accuracy of the bounding boxes by either:
    - Saving individual characters as image files, or
    - Drawing the bounding boxes on the original image.
- Feature Computation: For each extracted character, compute the following features:
  - Area: Total number of foreground pixels.
  - Height: Height of the bounding box.
  - Width: Width of the bounding box.
  - Fraction of Foreground Pixels: Calculated as area / (height × width), i.e. the ratio of foreground pixels to bounding-box pixels.
  - Distribution in X-direction: Analyze the spread of foreground pixels horizontally.
  - Distribution in Y-direction: Analyze the spread of foreground pixels vertically.
- Feature Analysis:
  - Compare the features obtained for different pairs of characters.
  - Identify which features help in distinguishing different characters.
  - Determine which features consistently describe characters that appear the same.
By the end of this exercise, you should be able to effectively separate characters from an image and analyze their binary features to understand their distinct characteristics.
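Since the exercise asks you to implement CCL yourself (the walkthrough below uses OpenCV's built-in function instead), here is a minimal sketch of a BFS-based flood-fill labelling. It assumes the input is a binary numpy array with non-zero foreground pixels; it is a reference implementation, not the practical's prescribed solution.

```python
import numpy as np
from collections import deque

def label_connected_components(binary, connectivity=4):
    """Label foreground (non-zero) pixels with a BFS flood fill.

    Returns an int array of the same shape: 0 = background, 1..N = blob labels.
    cv2.connectedComponentsWithStats is much faster in practice.
    """
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=np.int32)
    if connectivity == 4:
        neighbours = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:  # 8-connectivity also includes diagonal neighbours
        neighbours = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                      if (dy, dx) != (0, 0)]
    current = 0
    for y in range(h):
        for x in range(w):
            if binary[y, x] and labels[y, x] == 0:
                current += 1                      # start a new blob
                labels[y, x] = current
                queue = deque([(y, x)])
                while queue:                      # grow the blob breadth-first
                    cy, cx = queue.popleft()
                    for dy, dx in neighbours:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = current
                            queue.append((ny, nx))
    return labels
```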
Observations from the Binary Shape Analysis Task
1. Image Thresholding:
After loading the image, we first converted it to grayscale. This simplifies the image and helps in thresholding. We applied three types of thresholding:
- Basic Binary Thresholding: Pixels with intensity above 127 are set to 255 (white), and those below are set to 0 (black).
- Otsu’s Thresholding: This method automatically calculates an optimal threshold value based on the image histogram.
- Gaussian Blur + Otsu’s Thresholding: Before applying Otsu’s thresholding, we smoothed the image using Gaussian blur. This can help in removing noise and improving the thresholding result.
From the displayed images, we can observe that the combination of Gaussian blur and Otsu’s thresholding provides a cleaner binary image, especially if the original image has noise.
2. Connected Component Analysis:
After thresholding, we performed connected component labeling to identify individual blobs or regions in the binary image. Each unique blob is assigned a unique label.
- The number of labels gives us the number of unique regions detected, including the background.
- The maximum label value provides an idea of the labeling range.
The colored components image visually represents each unique region with a different color. This helps in understanding the separation of different blobs and verifying the accuracy of the connected component analysis.
3. Blob Statistics:
The connectedComponentsWithStats
function provides statistics for each detected blob:
- Leftmost coordinate: The x-coordinate of the top-left corner of the bounding box.
- Topmost coordinate: The y-coordinate of the top-left corner of the bounding box.
- Width: Width of the bounding box.
- Height: Height of the bounding box.
- Area: Total number of pixels in the blob.
From the displayed statistics, we can analyze the size and position of each blob. This information can be crucial for further processing, such as feature extraction or classification tasks.
Before diving into the main steps, let’s understand the utility functions that will be used throughout the process.
Function: display_image(img, title="")
This function displays an image using the matplotlib
library.
- Parameters:
  - img: The image to be displayed.
  - title (optional): A title for the image display.
- Functionality:
- The function creates a figure with a specified size.
- It then displays the image in grayscale format.
- The title is set, and the axis is turned off for better visualization.
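A minimal sketch of such a helper, assuming matplotlib is imported as plt; the figure size is an arbitrary choice, and the exact helper used in the practical may differ.

```python
import matplotlib.pyplot as plt

def display_image(img, title=""):
    """Show a (grayscale) image with matplotlib, hiding the axes."""
    plt.figure(figsize=(6, 6))
    plt.imshow(img, cmap='gray')
    plt.title(title)
    plt.axis('off')
    plt.show()
```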
Function: imshow_components(labels)
This function visualizes the connected components (or blobs) in an image with different colors.
- Parameters:
  - labels: The labeled image obtained from the connected component analysis.
- Functionality:
  - Maps each label to a unique hue.
  - Merges it with blank channels to create an HSV image.
  - Converts the HSV image to BGR format for visualization.
  - Sets the background label (usually 0) to black.
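A minimal sketch consistent with that description; the exact helper used in the practical may differ.

```python
import cv2
import numpy as np

def imshow_components(labels):
    """Colour each connected-component label with a distinct hue.

    The labels are spread over the hue channel of an HSV image, which is
    converted to BGR for display; the background (label 0) is set to black.
    """
    hue = np.uint8(179 * labels / np.max(labels))   # map labels onto the hue range
    blank = 255 * np.ones_like(hue)                 # full saturation and value
    hsv = cv2.merge([hue, blank, blank])
    bgr = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
    bgr[labels == 0] = 0                            # background to black
    return bgr
```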
Function: extract_features(labels_im, stats)
This function extracts specific features from each detected blob or character in the image.
- Parameters:
  - labels_im: The labeled image from the connected component analysis.
  - stats: Statistics of each blob, usually obtained alongside the labeled image.
- Functionality:
  - For each blob, the function computes:
    - Area: Total number of foreground pixels.
    - Height: Height of the blob’s bounding box.
    - Width: Width of the blob’s bounding box.
    - Fraction of Foreground Pixels: Ratio of the area to the bounding box’s area.
    - X Distribution: Distribution of foreground pixels along the x-axis.
    - Y Distribution: Distribution of foreground pixels along the y-axis.
  - The function returns a list of dictionaries, where each dictionary contains the features for a specific blob.
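The exact implementation is not shown in this handout; the following is a minimal sketch consistent with the description above. It assumes stats comes from cv2.connectedComponentsWithStats (each row is x, y, width, height, area), and it keeps the background (label 0) at index 0 so that blob_features[k] corresponds to label k, matching the indexing used in the next section.

```python
def extract_features(labels_im, stats):
    """Compute simple binary features for every label (index == label, 0 = background)."""
    features = []
    for label in range(stats.shape[0]):
        x, y, w, h, area = stats[label]
        # Boolean mask of this blob's pixels inside its bounding box.
        mask = (labels_im[y:y + h, x:x + w] == label)
        features.append({
            "Area": int(area),
            "Height": int(h),
            "Width": int(w),
            "Fraction of Foreground Pixels": area / float(w * h),
            "X Distribution": mask.sum(axis=0),  # column-wise foreground counts
            "Y Distribution": mask.sum(axis=1),  # row-wise foreground counts
        })
    return features
```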
Image Loading and Preprocessing
In this section, we load an image and convert it to grayscale, and apply different thresholding techniques to binarize the image.
Step-by-Step Explanation:
- Loading Image Files: We first fetch all the image filenames with .png extension from the specified directory and then select one of the images for further processing.

      path = "images"  # Replace with your path
      filenames = glob.glob(os.path.join(path, '*.png'))
      filename = filenames[2]
      img = cv2.imread(filename, cv2.IMREAD_COLOR)

- Displaying the Original Image: We use the display_image function to visualize the original image.

      display_image(img, "Original Image")

- Grayscale Conversion: The color image is converted to grayscale. This is done to simplify the image and to prepare it for thresholding.

      gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

- Thresholding: Thresholding is a technique to segment an image by setting a pixel to a foreground value if it’s above a certain threshold and to a background value if it’s below that threshold.

  - Basic Binary Thresholding: Pixels with intensity above 127 are set to 255 (white), and those below are set to 0 (black).

        _, th1 = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
        display_image(th1, "Basic Binary Thresholding")

  - Otsu’s Thresholding: Otsu’s method calculates an “optimal” threshold by maximizing the variance between two classes of pixels (foreground and background). It’s more adaptive than basic thresholding.

        _, th2 = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        display_image(th2, "Otsu's Thresholding")

  - Gaussian Blur + Otsu’s Thresholding: Before applying Otsu’s thresholding, we smooth the image using a Gaussian blur. This helps in removing noise and can lead to better thresholding results.

        blur = cv2.GaussianBlur(gray, (5, 5), 0)
        _, th3 = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        display_image(th3, "Gaussian Blur + Otsu's Thresholding")
By the end of these steps, we have three binarized versions of the original image using different thresholding techniques. This allows us to compare and choose the best method for further processing.
Connected Component Analysis and Feature Extraction
In this section, we perform connected component analysis on the thresholded image to identify and label individual blobs (connected components). After labeling, we will extract specific features for each blob and compare them.
Step-by-Step Explanation:
- Inverting the Image: The connected component function in OpenCV treats white as the foreground and black as the background. Since our image has black characters on a white background, we invert the image colors.

      th = cv2.bitwise_not(th3)

- Connected Component Analysis: We perform connected component labeling on the inverted image. This function labels each connected component with a unique label.

      num_labels, labels_im, stats, centroids = cv2.connectedComponentsWithStats(th, connectivity=4)

- Visualizing the Components: Using the imshow_components function, we color each connected component differently to visualize them distinctly.

      colored_components_img = imshow_components(labels_im)
      display_image(colored_components_img, "Colored Components")

- Extracting Features for Each Blob: For each labeled component (blob), we extract specific features like area, height, width, and distributions using the extract_features function.

      blob_features = extract_features(labels_im, stats)

- Comparing Features of Blobs: As a demonstration, we compare the features of the first two blobs.

      blob1_features = blob_features[1]
      blob2_features = blob_features[2]

  - Displaying features for the first blob:

        print("Features for Blob 1:")
        for key, value in blob1_features.items():
            if key not in ["X Distribution", "Y Distribution"]:
                print(f"{key}: {value}")

  - Displaying features for the second blob:

        print("\nFeatures for Blob 2:")
        for key, value in blob2_features.items():
            if key not in ["X Distribution", "Y Distribution"]:
                print(f"{key}: {value}")

- Demonstrating Feature Comparison: As an example, we demonstrate how to compare the area of the first two blobs.

      difference_in_area = blob1_features["Area"] - blob2_features["Area"]
      print(f"\nDifference in Area between Blob 1 and Blob 2: {difference_in_area}")
By the end of these steps, we have labeled each connected component in the image, extracted features for each blob, and demonstrated how to compare these features. This process can be useful in various image processing tasks, such as character recognition, where distinguishing between different characters based on their features is essential.
Explanation of cv2.connectedComponentsWithStats
Input:
- th: This is the binary image on which connected component labeling is performed. In our case, it’s the inverted thresholded image.
- connectivity: This parameter determines how a pixel is connected to its neighbors. A value of 4 means a pixel is connected to its top, bottom, left, and right neighbors. A value of 8 would also include diagonal neighbors.
Outputs:
- num_labels: This is the total number of unique labels assigned. It includes the background as one label.
- labels_im: This is an image of the same size as the input where each pixel’s value is its label. Pixels belonging to the same connected component have the same label.
- stats: This is a matrix where each row corresponds to a label and contains statistics related to that label. The columns represent:
- The x-coordinate of the top-left point of the bounding box.
- The y-coordinate of the top-left point of the bounding box.
- The width of the bounding box.
- The height of the bounding box.
- The total area (in pixels) of the connected component.
- centroids: This is a matrix where each row corresponds to a label, and the columns represent the x and y coordinates of the centroid of the connected component.
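A short sketch of how these outputs can be used together, assuming the variables from the code above:

```python
# Iterate over the blobs (label 0 is the background) and print their statistics.
for label in range(1, num_labels):
    x, y, w, h, area = stats[label]
    cx, cy = centroids[label]
    print(f"Blob {label}: bbox=({x}, {y}, {w}, {h}), area={area}, "
          f"centroid=({cx:.1f}, {cy:.1f})")
```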
Exercise 3 - Histogram Feature Extraction
Objective:
Develop a program to compute the histogram of a given input gray-scale image patch. Utilize this program to analyze the characters segmented in Exercise 2.
Steps:
- Histogram Computation:
  - Write a function or program that calculates the histogram of an input gray-scale image patch.
  - Decide on the number of bins for the histogram. This choice will affect the resolution and the details captured by the histogram.
- Application on Exercise 2:
  - Revisit the results from Exercise 2 where individual characters were segmented using bounding boxes.
  - For each segmented character, compute its histogram using the program developed in the first step.
- Analysis:
  - Compare the histograms of different characters. Observe the differences and similarities.
  - Compare the histograms of characters that are identical. Note the variations, if any, and the consistencies.
  - Discuss the effectiveness of the histogram as a feature for character differentiation.
- Resolution Dependency:
  - Analyze how the histogram feature’s effectiveness changes with the resolution (i.e., the number of bins).
  - Does increasing the number of bins provide more discriminative power, or does it introduce noise? Conversely, does reducing the number of bins oversimplify the feature?
Deliverables:
- A program or function for histogram computation.
- Histograms of segmented characters from Exercise 2.
- Analysis and comments on the utility of the histogram feature for character differentiation and its dependency on resolution.
Utility Functions Explanation
Function: compute_histogram(image, bins=256)
This function calculates the histogram of a given grayscale image.
- Parameters:
  - image: The input grayscale image for which the histogram is to be computed.
  - bins (optional): The number of bins for the histogram. Default is set to 256, which represents each pixel intensity value from 0 to 255.
- Functionality:
  - The function uses OpenCV’s calcHist method to compute the histogram of the input image.
  - It returns the histogram as a list of bin counts (frequencies of pixel intensities).
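A minimal sketch of such a function; the exact helper used in the practical may differ.

```python
import cv2

def compute_histogram(image, bins=256):
    """Compute the intensity histogram of a grayscale image patch.

    Returns a flat array of `bins` counts covering the intensity range [0, 256).
    """
    hist = cv2.calcHist([image], [0], None, [bins], [0, 256])
    return hist.flatten()
```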
Function: display_histogram(hist, title="Histogram")
This function visualizes the computed histogram using the matplotlib
library.
- Parameters:
  - hist: The histogram values that need to be plotted.
  - title (optional): The title for the histogram plot. Default is set to “Histogram”.
- Functionality:
- The function creates a figure with a specified size.
- It then plots the histogram values with pixel values on the x-axis and their frequencies on the y-axis.
- A grid is added for better visualization.
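A matching sketch for the plotting helper, assuming matplotlib is imported as plt; figure size and axis labels are arbitrary choices.

```python
import matplotlib.pyplot as plt

def display_histogram(hist, title="Histogram"):
    """Plot histogram bin counts against bin index (pixel intensity)."""
    plt.figure(figsize=(6, 4))
    plt.plot(hist)
    plt.title(title)
    plt.xlabel("Pixel intensity bin")
    plt.ylabel("Frequency")
    plt.grid(True)
    plt.show()
```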
Function: display_image(img, title="")
This function displays a grayscale image using the matplotlib
library.
- Parameters:
  - img: The grayscale image that needs to be displayed.
  - title (optional): A title for the image display.
- Functionality:
- The function creates a figure with a specified size.
- It then displays the image in grayscale format.
- The title is set, and the axis is turned on for better visualization.
Histogram Computation and Visualization for Each Character
In the provided code, we’re iterating through each detected character (or blob) in the image, extracting its bounding box, computing its histogram, and then visualizing both the character with its bounding box and its histogram.
Loop: for k in range(1, num_labels)
This loop iterates over each detected character. We start from 1 because the label 0 is reserved for the background.
- Inside the Loop:
  - Bounding Box Extraction:
    - x, y, w, h, area = stats[k]: For each character, we extract its bounding box’s top-left coordinates (x, y), its width w, height h, and the total area area from the stats array.
    - character_patch = gray[y:y+h, x:x+w]: Using the bounding box coordinates and dimensions, we extract the character’s region from the grayscale image.
  - Histogram Computation:
    - hist = compute_histogram(character_patch, bins=32): We compute the histogram of the extracted character patch using 32 bins. The number of bins can be adjusted based on the desired resolution.
  - Histogram Visualization:
    - display_histogram(hist, title=f"Histogram for Character {k}"): We visualize the computed histogram using the display_histogram function.
  - Character Visualization with Bounding Box:
    - output = img.copy(): We create a copy of the original image to draw the bounding box.
    - cv2.rectangle(output, (x, y), (x + w, y + h), (0, 255, 0), 3): We draw a green bounding box around the detected character.
    - display_image(output, title=f"Character {k} with Bounding Box"): We display the character with its bounding box using the display_image function.
- Post Loop:
  After processing all characters, you can compare the histograms of different characters using various methods like correlation, chi-square, etc. This step is essential to understand the similarity or difference between characters based on their histograms.
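For instance, OpenCV's compareHist can quantify the similarity between two character histograms. A small sketch, where hist1 and hist2 are hypothetical histograms computed with compute_histogram for two characters:

```python
import cv2
import numpy as np

# cv2.compareHist expects float32 histograms.
h1 = np.float32(hist1)
h2 = np.float32(hist2)

correlation = cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL)   # 1.0 = identical shape
chi_square = cv2.compareHist(h1, h2, cv2.HISTCMP_CHISQR)    # 0.0 = identical
print(f"Correlation: {correlation:.3f}, Chi-square: {chi_square:.3f}")
```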
Comparing Histograms of Characters:
When we analyze the histograms of characters, several observations can be made:
-
Same Characters:
- Histograms of the same characters tend to be very similar. This is because the distribution of pixel intensities for the same character will closely match.
- When comparing the histograms of the same characters, the correlation value will be close to 1, indicating a high degree of similarity.
-
Different Characters:
- For different characters, the histograms might vary significantly. This is especially true if the characters have distinct shapes or structures.
- The correlation between the histograms of different characters will be lower. In some cases, if the intensity distributions contrast sharply, the correlation might even be negative.
Insights on the Histogram Feature:
The histogram, which represents the distribution of pixel intensities in an image, provides both advantages and limitations when used for character recognition:
-
Advantages:
- Simplicity: Histograms are straightforward to compute and understand.
- Robustness: They can be robust against minor variations or noise in the image.
- Profile Capture: Histograms can capture the general profile of a character, such as whether it’s generally bright or dark, which can be useful in differentiating certain characters.
-
Limitations:
- Loss of Spatial Information: While histograms capture the intensity distribution, they lose the spatial arrangement of these intensities. This means two very different characters might have the same histogram if they have a similar count of dark and light pixels.
- Discrimination: For characters that have similar pixel intensity distributions, histograms might not be able to distinguish them effectively.
Resolution of the Histogram:
The number of bins in the histogram, which determines its resolution, plays a crucial role in its effectiveness:
-
High Resolution (Many Bins):
- Can capture intricate details of the pixel intensity distribution.
- Might be overly sensitive to minor variations or noise in the image.
- Requires more memory and might be slower when used for comparisons.
-
Low Resolution (Fewer Bins):
- Provides a more generalized view of the pixel intensity distribution, which can make it more robust against noise.
- However, it might miss out on capturing subtle differences between characters.
- Computationally more efficient due to fewer bins.
In Conclusion: The resolution of the histogram is a crucial parameter. A very high-resolution histogram might be too sensitive to noise, while a very low-resolution histogram might miss out on important character details. It’s always a good idea to experiment with different resolutions to find the one that works best for the specific dataset and task at hand.