
2.3 Hierarchical Clustering

1. Introduction

Hierarchical clustering is a method of cluster analysis that builds a hierarchy of clusters. It groups objects by their pairwise proximity or similarity, producing a tree-like structure known as a dendrogram. Cutting the dendrogram at different levels yields clusterings of varying granularity, which makes hierarchical clustering a versatile tool for exploratory data analysis.

2. Components

  1. Algorithm Overview

    • Agglomerative Clustering:

      • Start with each data point as a separate cluster.
      • Iteratively merge the closest pair of clusters until all data points belong to a single cluster.
      • Construct a dendrogram to visualize the merging process (see the code sketch after this overview).
    • Divisive Clustering:

      • Start with all data points in one cluster.
      • Recursively split clusters into smaller ones until each data point forms its own cluster.
      • Construct a dendrogram showing the splitting process.
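
To make the agglomerative procedure concrete, here is a minimal sketch using SciPy's scipy.cluster.hierarchy module; the toy 2-D points are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical 2-D points forming three loose groups.
X = np.array([[1.0, 1.0], [1.5, 1.2],
              [5.0, 5.0], [5.2, 4.8],
              [9.0, 9.0], [8.8, 9.3]])

# Agglomerative clustering: each row of Z records one merge step --
# the two clusters joined, the distance between them, and the size
# of the newly formed cluster.
Z = linkage(X, method="average", metric="euclidean")

# Visualize the merge order and merge heights as a dendrogram.
dendrogram(Z, labels=[f"p{i}" for i in range(len(X))])
plt.ylabel("merge distance")
plt.show()
```
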
  2. Mathematical Formulation

    • Distance Metric: Hierarchical clustering uses a distance measure (e.g., Euclidean distance, Manhattan distance) to determine the similarity between data points or clusters.

    • Linkage Criteria: Defines how the distance between two clusters A and B is computed during merging or splitting (the sketch after this list evaluates each criterion by hand):

      • Single Linkage: Minimum distance between points in different clusters: $d(A, B) = \min_{a \in A,\, b \in B} d(a, b)$
      • Complete Linkage: Maximum distance between points in different clusters: $d(A, B) = \max_{a \in A,\, b \in B} d(a, b)$
      • Average Linkage: Average distance between all pairs of points from different clusters: $d(A, B) = \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} d(a, b)$
    • Dendrogram Construction: A dendrogram visually represents the hierarchical clustering process, showing the order and distances at which clusters are merged or split.
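
The three linkage criteria are easy to verify by hand. A small sketch, using two made-up clusters A and B and SciPy's pairwise-distance helper:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two hypothetical clusters of 2-D points.
A = np.array([[0.0, 0.0], [0.0, 1.0]])
B = np.array([[3.0, 0.0], [4.0, 1.0]])

# All pairwise Euclidean distances between points of A and points of B.
D = cdist(A, B)

print("single  :", D.min())   # min over cross-cluster pairs
print("complete:", D.max())   # max over cross-cluster pairs
print("average :", D.mean())  # mean over all |A|*|B| pairs
```
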

3. Key Concepts

  • Cluster Fusion: Agglomerative clustering merges clusters based on similarity until a single cluster is formed.

  • Cluster Splitting: Divisive clustering recursively splits clusters into smaller clusters or individual data points.

  • Cutting the Dendrogram: Determining the number of clusters by cutting the dendrogram at a chosen height or similarity level (see the sketch after this list).

  • Cluster Similarity Measures: Choosing an appropriate distance metric and linkage criterion influences the resulting clusters’ shape and compactness.
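
As a sketch of cutting the dendrogram in code (reusing the hypothetical toy data from the earlier sketch), SciPy's fcluster turns a linkage matrix into flat cluster labels, either at a fixed height or for a target cluster count:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1.0, 1.0], [1.5, 1.2],
              [5.0, 5.0], [5.2, 4.8],
              [9.0, 9.0], [8.8, 9.3]])
Z = linkage(X, method="average")

# Cut at a fixed height: all merges above distance 3.0 are undone,
# which leaves three clusters for this toy data.
print(fcluster(Z, t=3.0, criterion="distance"))

# Or request a target number of flat clusters directly.
print(fcluster(Z, t=2, criterion="maxclust"))
```
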

4. Advantages and Limitations

  • Advantages:

    • No need to specify the number of clusters beforehand.
    • Provides a hierarchy of clusters, aiding in exploratory data analysis and interpretation.
    • Can capture non-convex clusters and clusters of varying size, depending on the linkage criterion.
  • Limitations:

    • Computationally intensive for large datasets: standard agglomerative algorithms require O(n²) memory and up to O(n³) time.
    • Dendrogram interpretation can be subjective and typically requires domain knowledge.
    • Results are sensitive to the choice of distance metric and linkage criterion.

5. Applications

  • Biology: Clustering genes based on expression patterns.
  • Marketing: Segmenting customers based on purchasing behavior.
  • Image Processing: Grouping similar regions in image segmentation.
  • Anomaly Detection: Identifying unusual patterns or outliers in data.

6. Variants and Extensions

  • Ward’s Method: At each merge, joins the pair of clusters whose union produces the smallest increase in total within-cluster variance (see the sketch after this list).
  • Centroid-based Hierarchical Clustering: Uses centroids of clusters instead of individual data points.
  • Hierarchical Density-Based Clustering: Combines hierarchical and density-based methods (e.g., HDBSCAN) for more robust clustering.
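
As a closing sketch, Ward's method is available directly in scikit-learn's AgglomerativeClustering; the data and parameter choices below are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical toy points, as in the earlier sketches.
X = np.array([[1.0, 1.0], [1.5, 1.2],
              [5.0, 5.0], [5.2, 4.8],
              [9.0, 9.0], [8.8, 9.3]])

# linkage="ward" joins, at each step, the pair of clusters whose
# merge increases the total within-cluster variance the least.
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X)
print(labels)  # one flat cluster label per input point
```
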