1. Image Segmentation
1. Watershed Image Segmentation
Watershed image segmentation is a method used in image processing to separate different objects in an image. It’s based on the concept of topography, treating the grayscale image as a surface where the brightness values represent elevation. The algorithm works as follows:
- Treat the image like a landscape: Brighter areas are “hills” and darker areas are “valleys.”
- Flooding: Imagine water starting to fill the valleys of the landscape. As the water level rises, the water from different valleys starts to merge.
- Barriers: To keep water from different valleys (which represent different objects in the image) from merging, barriers are built where the rising water would meet. These barriers define the boundaries of the different segments.
- Result: Once the entire image is “flooded,” the barriers that remain are used to segment the image into different regions, effectively separating distinct objects.
This technique is useful for segmenting images with distinct object boundaries but may require preprocessing (such as smoothing or noise removal) to avoid over-segmentation.
Example
import cv2
import numpy as np
from matplotlib import pyplot as plt

# Load the image
image = cv2.imread('image.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Apply thresholding
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Remove noise with morphological operations
kernel = np.ones((3, 3), np.uint8)
opening = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel, iterations=2)

# Sure background area
sure_bg = cv2.dilate(opening, kernel, iterations=3)

# Sure foreground area
dist_transform = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
_, sure_fg = cv2.threshold(dist_transform, 0.7 * dist_transform.max(), 255, 0)

# Find unknown region
sure_fg = np.uint8(sure_fg)
unknown = cv2.subtract(sure_bg, sure_fg)

# Marker labeling
_, markers = cv2.connectedComponents(sure_fg)

# Add one to all labels to ensure the background is not 0
markers = markers + 1

# Mark the unknown region with 0
markers[unknown == 255] = 0

# Apply watershed algorithm
markers = cv2.watershed(image, markers)
image[markers == -1] = [0, 0, 255]  # Mark boundaries in red

# Display the result
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.title('Watershed Segmentation')
plt.show()
Explanation:
- Thresholding: Converts the grayscale image to binary (black and white) to prepare for segmentation.
- Morphological operations: Clean the binary image by removing small specks of noise before the foreground and background are estimated.
- Distance transform: Calculates the distance to the nearest background pixel, which helps identify the foreground.
- Watershed: Floods the image from the markers and builds barriers where regions meet; the boundary pixels (labelled -1) are drawn in red in the example above.
Distance Transformation
In the watershed algorithm, the Distance Transform helps create the sure foreground:
- Initial binary mask: After thresholding, you may have a rough segmentation of the image, but the boundaries between objects may not be clear.
- Distance Transform: By calculating the distance transform, you identify the regions that are the “core” of each object (the sure foreground), as they are the farthest from the background.
- Thresholding: You threshold the distance map to create a clear separation between the sure foreground (the center of objects) and areas that are uncertain or closer to the background.
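The effect is easiest to see on a small synthetic example. The sketch below is illustrative only: the two circles, their radius, and the 0.7 threshold factor (mirroring the example above) are arbitrary choices, but they show how two touching objects that form a single blob in the binary mask are separated into two "cores" by thresholding the distance map.

import cv2
import numpy as np

# Synthetic binary mask: two overlapping circles (255 = object, 0 = background)
mask = np.zeros((200, 300), np.uint8)
cv2.circle(mask, (100, 100), 60, 255, -1)
cv2.circle(mask, (200, 100), 60, 255, -1)

# Distance from each object pixel to the nearest background pixel;
# it peaks at the circle centres and dips where the circles touch
dist = cv2.distanceTransform(mask, cv2.DIST_L2, 5)

# Keep only pixels far from the background: the "core" of each object
_, sure_fg = cv2.threshold(dist, 0.7 * dist.max(), 255, cv2.THRESH_BINARY)
sure_fg = np.uint8(sure_fg)

# The touching circles are one connected blob in the mask,
# but two separate sure-foreground cores after the distance transform
n_blobs, _ = cv2.connectedComponents(mask)
n_cores, _ = cv2.connectedComponents(sure_fg)
print(n_blobs - 1, "blob in the mask,", n_cores - 1, "sure-foreground cores")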
2. U-Net for Image Segmentation
In a U-Net architecture, the goal is to regress from an input image to a corresponding segmentation mask.
How it works:
- Input: The U-Net takes an image (often RGB) as input.
- Output: It outputs a segmentation mask, where each pixel represents a class (e.g., object or background). In binary segmentation, the output mask will typically contain values between 0 and 1, where each value indicates the likelihood of that pixel belonging to the target class (foreground) or not (background).
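To make the input and output shapes concrete, here is a minimal, deliberately small U-Net-style model written in PyTorch (an assumption; the original U-Net uses more levels, wider layers, and unpadded convolutions). The class name TinyUNet and the channel widths are made up for illustration; the point is the encoder-decoder structure with skip connections and a one-logit-per-pixel output.

import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU: the basic U-Net building block
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_channels=3, out_channels=1):
        super().__init__()
        # Encoder (contracting path)
        self.enc1 = double_conv(in_channels, 16)
        self.enc2 = double_conv(16, 32)
        self.pool = nn.MaxPool2d(2)
        # Bottleneck
        self.bottleneck = double_conv(32, 64)
        # Decoder (expanding path) with skip connections
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec2 = double_conv(64, 32)   # 32 upsampled + 32 from the skip
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = double_conv(32, 16)   # 16 upsampled + 16 from the skip
        # 1x1 convolution to one logit per pixel
        self.head = nn.Conv2d(16, out_channels, 1)

    def forward(self, x):
        e1 = self.enc1(x)                  # full resolution features
        e2 = self.enc2(self.pool(e1))      # 1/2 resolution
        b = self.bottleneck(self.pool(e2)) # 1/4 resolution
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)               # raw logits, one per pixel

# Shape check: a batch of two 3-channel 128x128 images -> 1-channel masks
model = TinyUNet()
logits = model(torch.randn(2, 3, 128, 128))
print(logits.shape)  # torch.Size([2, 1, 128, 128])

A sigmoid on these logits gives the per-pixel probabilities described above.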
Regression in U-Net:
- During training, U-Net learns to map the input image to the mask by minimizing a loss function (like binary cross-entropy for binary segmentation or categorical cross-entropy for multi-class segmentation).
- The process is described as regression because U-Net predicts continuous probability values (between 0 and 1 for each pixel) in the mask rather than hard class labels.
After the U-Net outputs the continuous mask, a threshold is typically applied to convert it into discrete classes (e.g., assigning a pixel to class 1 if its probability is greater than 0.5).
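As a rough sketch of these last two steps (the tensors below are random stand-ins for a real U-Net output and ground-truth mask; the shapes and the 0.5 threshold are just illustrative defaults), training computes a per-pixel binary cross-entropy on the logits, while inference converts the probabilities into a hard mask:

import torch
import torch.nn as nn

# Stand-ins for the network output and the ground-truth mask
logits = torch.randn(2, 1, 128, 128, requires_grad=True)  # one raw logit per pixel
target = torch.randint(0, 2, (2, 1, 128, 128)).float()    # binary ground truth

# Training: per-pixel binary cross-entropy
# (BCEWithLogitsLoss applies the sigmoid internally for numerical stability)
loss = nn.BCEWithLogitsLoss()(logits, target)
loss.backward()  # gradients would drive the optimizer step during training

# Inference: continuous probabilities, then a threshold to get discrete classes
probs = torch.sigmoid(logits)       # values between 0 and 1
pred_mask = (probs > 0.5).long()    # 1 = foreground, 0 = background
print(loss.item(), pred_mask.unique())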