cluster
This module initializes the cluster functions of the hsi-wizard package.
Module Overview
Functions
- wizard._processing.cluster.kmeans(dc, n_clusters=5, n_init=10)
Perform KMeans clustering on a hyperspectral DataCube without spatial smoothing.
This function reshapes the spectral data into pixel vectors and applies KMeans to segment the data into the specified number of clusters.
- Parameters:
dc (DataCube) – Hyperspectral data cube with shape (v, x, y), where v is the number of spectral bands.
n_clusters (int) – Number of clusters to form. Default is 5.
n_init (int) – Number of times the k-means algorithm is run with different centroid seeds. Default is 10.
- Returns:
2D array of shape (x, y) containing cluster labels for each pixel.
- Return type:
np.ndarray
- Raises:
ValueError – If the input DataCube is malformed, or if n_clusters or n_init is invalid.
Notes
This function does not perform any spatial regularization or smoothing.
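The reshape-then-cluster step described above can be sketched as follows. This is a minimal illustration that works on a plain NumPy array rather than a DataCube; the helper name kmeans_labels and the random_state argument are assumptions made for the example and are not part of the package.

```python
import numpy as np
from sklearn.cluster import KMeans


def kmeans_labels(cube, n_clusters=5, n_init=10, random_state=0):
    """Cluster a (v, x, y) array per pixel and return an (x, y) label map."""
    v, x, y = cube.shape
    # Each pixel becomes one sample with v spectral features: (x*y, v).
    pixels = cube.reshape(v, x * y).T
    model = KMeans(n_clusters=n_clusters, n_init=n_init, random_state=random_state)
    # Labels come back flat, in the same pixel order, so reshape to the image grid.
    return model.fit_predict(pixels).reshape(x, y)


# Synthetic cube: the first four rows are spectrally offset from the rest.
rng = np.random.default_rng(0)
cube = rng.random((20, 8, 6))
cube[:, :4, :] += 5.0
labels = kmeans_labels(cube, n_clusters=2)
```

Because no spatial information enters the feature vectors, two pixels with the same spectrum receive the same label regardless of where they sit in the image.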
- wizard._processing.cluster.smooth_kmeans(dc, n_clusters=5, threshold=0.1, mrf_iterations=5, kernel_size=12, sigma=1.0)
Segment a hyperspectral DataCube using KMeans clustering with MRF-based spatial smoothing.
Performs initial clustering on spectral data using KMeans, followed by iterative Markov Random Field (MRF)-based smoothing using Gaussian kernel convolution to enforce spatial consistency.
- Parameters:
dc (DataCube) – Hyperspectral data cube with shape (v, x, y), where v is the number of spectral bands.
n_clusters (int, optional) – Maximum number of clusters for initial KMeans clustering. Default is 5.
threshold (float, optional) – Minimum allowable distance between KMeans centroids to stop increasing cluster count. Default is 0.1.
mrf_iterations (int, optional) – Number of spatial smoothing iterations using MRF regularization. Default is 5.
kernel_size (int, optional) – Size of the Gaussian kernel used for smoothing. Must be an odd integer. Default is 12.
sigma (float, optional) – Standard deviation of the Gaussian kernel. Default is 1.0.
- Returns:
2D array (x, y) of cluster labels after smoothing.
- Return type:
np.ndarray
- Raises:
ValueError – If input DataCube is malformed or smoothing parameters are invalid.
Notes
The function uses optimal_clusters to determine a suitable number of clusters before applying KMeans. MRF smoothing is implemented by convolving binary masks of each cluster with a Gaussian kernel and reassigning pixels based on weighted responses.
Examples
>>> labels = smooth_kmeans(dc, n_clusters=6, mrf_iterations=3)
>>> plt.imshow(labels, cmap='viridis')
>>> plt.show()
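The smoothing step described in the Notes (binary mask per cluster, Gaussian convolution, reassignment to the strongest response) can be sketched as below. This is an illustration under stated assumptions, not the package's implementation: the helper name smooth_labels is hypothetical, and scipy.ndimage.gaussian_filter stands in for an explicit kernel, so kernel_size is not modelled here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def smooth_labels(labels, n_clusters, iterations=5, sigma=1.0):
    """Iteratively reassign each pixel to the cluster with the strongest local support."""
    labels = np.asarray(labels)
    for _ in range(iterations):
        # One blurred binary mask per cluster: how strongly the
        # neighbourhood of each pixel belongs to that cluster.
        support = np.stack([
            gaussian_filter((labels == k).astype(float), sigma=sigma)
            for k in range(n_clusters)
        ])
        # Each pixel takes the label with the largest weighted response.
        labels = support.argmax(axis=0)
    return labels


# Two clean regions plus one isolated mislabelled pixel.
labels = np.zeros((7, 7), dtype=int)
labels[:, 4:] = 1
labels[2, 1] = 1
smoothed = smooth_labels(labels, n_clusters=2, iterations=1)
```

In this toy input the isolated pixel is absorbed by its surroundings, while the straight boundary between the two regions stays where it was.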
- wizard._processing.cluster.isodata(dc, k=10, it=10, p=2, theta_m=10, theta_s=0.1, theta_c=2, theta_o=0.05, k_=None)
Classify hyperspectral image data using the ISODATA clustering algorithm.
Performs iterative clustering on a hyperspectral DataCube, adapting the number of clusters by splitting and merging based on data distribution, until convergence or iteration limit is reached.
- Parameters:
dc (DataCube) – DataCube object containing the hyperspectral image with shape (v, x, y).
k (int, optional) – Initial number of clusters to begin clustering. Default is 10.
it (int, optional) – Maximum number of iterations. Default is 10.
p (int, optional) – Maximum number of cluster pairs allowed to merge. Default is 2.
theta_m (int, optional) – Minimum number of pixels required per cluster. Default is 10.
theta_s (float, optional) – Threshold for standard deviation to trigger cluster splitting. Default is 0.1.
theta_c (int, optional) – Distance threshold for merging clusters. Default is 2.
theta_o (float, optional) – Threshold for minimum change in cluster centers to stop iteration. Default is 0.05.
k_ (int | None, optional) – Alternative number of clusters to override initial k. Default is None.
- Returns:
2D array (x, y) with cluster labels assigned to each pixel.
- Return type:
np.ndarray
- Raises:
ValueError – If intermediate clustering steps produce inconsistent dimensions or invalid data.
Notes
This implementation adapts clustering during iterations by merging or splitting clusters, based on the intra-cluster statistics and spatial constraints. The algorithm stops early if cluster centers converge according to theta_o.
Examples
>>> labels = isodata(dc, k=5, it=15)
Note
The ISODATA code was inspired by pyRadar (https://github.com/PyRadar/pyradar/).
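To make the roles of theta_c and p more concrete, the merge step of a textbook ISODATA iteration can be sketched as follows. This is a simplified illustration, not the code used by isodata: the helper name merge_close_centers and its counts argument (pixels per cluster) are assumptions made for the example.

```python
import numpy as np


def merge_close_centers(centers, counts, theta_c=2.0, p=2):
    """Merge up to p pairs of cluster centers that lie closer than theta_c."""
    centers = np.asarray(centers, dtype=float)
    counts = np.asarray(counts, dtype=float)
    k = len(centers)
    # All pairwise Euclidean distances; candidate pairs sorted closest first.
    dist = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)
    pairs = sorted(
        (dist[i, j], i, j)
        for i in range(k) for j in range(i + 1, k)
        if dist[i, j] < theta_c
    )

    used, merged = set(), []
    for _, i, j in pairs:
        if len(merged) == p:
            break
        if i in used or j in used:
            continue  # each cluster takes part in at most one merge
        n = counts[i] + counts[j]
        # The size-weighted mean places the new center at the joint centroid.
        merged.append(((counts[i] * centers[i] + counts[j] * centers[j]) / n, n))
        used.update((i, j))

    kept = [(centers[i], counts[i]) for i in range(k) if i not in used]
    new_centers, new_counts = zip(*(kept + merged))
    return np.array(new_centers), np.array(new_counts)


centers = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
counts = [30, 10, 20]
new_centers, new_counts = merge_close_centers(centers, counts)
```

Here only the first two centers are closer than theta_c, so they collapse into one center weighted towards the larger cluster, and the distant third center is left untouched.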
- wizard._processing.cluster.spectral_spatial_kmeans(dc, n_clusters, spatial_radius)
Spectral–spatial K-Means clustering for hyperspectral images.
Applies K-Means to pixel spectra augmented by the mean spectrum of their local neighborhood.
- Parameters:
dc (DataCube) – Hyperspectral data cube with shape (v, x, y), where v is the number of bands.
n_clusters (int) – Number of clusters to form.
spatial_radius (int) – Radius (in pixels) of the square neighborhood for local averaging.
- Returns:
labels – Integer cluster label for each pixel.
- Return type:
np.ndarray of shape (x, y)
- Raises:
ValueError – If spatial_radius < 0 or n_clusters <= 0.
Notes
Uses a uniform filter to compute the local mean spectrum for each pixel.
Flattens the augmented spectral features and applies scikit-learn’s KMeans.
Examples
>>> labels = spectral_spatial_kmeans(dc, n_clusters=5, spatial_radius=1)
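The feature augmentation described in the Notes can be sketched on a plain NumPy array as follows. The helper name spectral_spatial_labels is hypothetical, and the window side length of 2 * spatial_radius + 1 is an assumption about how the radius maps to a square neighborhood.

```python
import numpy as np
from scipy.ndimage import uniform_filter
from sklearn.cluster import KMeans


def spectral_spatial_labels(cube, n_clusters, spatial_radius, random_state=0):
    """K-Means on each pixel's spectrum stacked with its local mean spectrum."""
    v, x, y = cube.shape
    size = 2 * spatial_radius + 1  # assumed side length of the square window
    # Average over the two spatial axes only; size 1 leaves the band axis untouched.
    local_mean = uniform_filter(cube.astype(float), size=(1, size, size))
    # Stack original and neighbourhood spectra: (2v, x, y) -> (x*y, 2v).
    features = np.concatenate([cube, local_mean], axis=0).reshape(2 * v, x * y).T
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    return model.fit_predict(features).reshape(x, y)


# Synthetic cube: columns 3 and onwards are spectrally offset from the rest.
rng = np.random.default_rng(0)
cube = rng.random((10, 8, 6))
cube[:, :, 3:] += 5.0
labels = spectral_spatial_labels(cube, n_clusters=2, spatial_radius=1)
```

With spatial_radius=0 the window covers a single pixel, so the local mean equals the pixel's own spectrum and the result reduces to ordinary spectral K-Means on duplicated features.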
- wizard._processing.cluster.spatial_agglomerative_clustering(dc, n_clusters)
Agglomerative clustering with a 4-connected grid graph enforcing spatial contiguity.
Flattens the spectral vectors and uses grid_to_graph for pixel connectivity, so only spatial neighbors can merge.
- Parameters:
dc (DataCube) – Hyperspectral data cube (v, x, y).
n_clusters (int) – Desired number of clusters.
- Returns:
labels – Connected clusters that respect spatial adjacency.
- Return type:
np.ndarray of shape (x, y)
Notes
Uses sklearn.feature_extraction.image.grid_to_graph to build a sparse connectivity matrix over the x×y grid.
May be more memory-intensive for large images.
Examples
>>> labels = spatial_agglomerative_clustering(dc, n_clusters=8)
Warning
Agglomerative clustering with spatial connectivity is conceptually elegant, but it does not scale well to large 2D grids, so very large datasets can take a long time to process.
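The combination of grid_to_graph and agglomerative clustering described above can be sketched as follows. The helper name spatial_agglomerative_labels and the choice of Ward linkage are assumptions made for the example; the documentation does not state which linkage the package uses.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.image import grid_to_graph


def spatial_agglomerative_labels(cube, n_clusters):
    """Agglomerative clustering where only neighbouring pixels may merge."""
    v, x, y = cube.shape
    # Pixel i*y + j in the flattened array matches node i*y + j in the grid graph.
    pixels = cube.reshape(v, x * y).T
    # Sparse (x*y, x*y) adjacency matrix of the 4-connected pixel grid.
    connectivity = grid_to_graph(n_x=x, n_y=y)
    model = AgglomerativeClustering(
        n_clusters=n_clusters,
        connectivity=connectivity,
        linkage="ward",  # assumed linkage, chosen for illustration
    )
    return model.fit_predict(pixels).reshape(x, y)


# Synthetic cube: the first four rows are spectrally offset from the rest.
rng = np.random.default_rng(0)
cube = rng.random((20, 8, 6))
cube[:, :4, :] += 5.0
labels = spatial_agglomerative_labels(cube, n_clusters=2)
```

The connectivity matrix has one row and column per pixel, which is why memory use and run time grow quickly with image size.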
- wizard._processing.cluster.smooth_cluster(img, sigma=1.0, n_iter=1)
Smooth a cluster label image to remove mislabelled pixels.
Apply a Gaussian filter to the input cluster label image to reduce spurious mislabelled pixels by smoothing label intensities, then round back to the nearest integer labels.
- Parameters:
img (numpy.ndarray) – Integer label image of shape (H, W) or (H, W, …).
sigma (float, optional) – Standard deviation for Gaussian kernel. Default is 1.0.
n_iter (int, optional) – Number of iterations to apply the Gaussian filter. Default is 1.
- Returns:
Smoothed label image with same shape and dtype as input.
- Return type:
numpy.ndarray
- Raises:
TypeError – If img is not a numpy.ndarray.
ValueError – If img is empty.
Notes
Internally, the image labels are converted to float, smoothed, then rounded back to integer labels. This may remove small isolated noisy pixels.
Examples
>>> import numpy as np
>>> img = np.array([[1,1,2],[1,2,2],[2,2,2]])
>>> smooth_cluster(img, sigma=0.5)
array([[1,1,2],
       [1,2,2],
       [2,2,2]])
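The float-smooth-round procedure from the Notes can be sketched as follows. The helper name smooth_label_image is hypothetical, and the use of scipy.ndimage.gaussian_filter is an assumption about how the Gaussian filter is applied.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def smooth_label_image(img, sigma=1.0, n_iter=1):
    """Blur a label image as floats, then round back to integer labels."""
    if not isinstance(img, np.ndarray):
        raise TypeError("img must be a numpy.ndarray")
    if img.size == 0:
        raise ValueError("img must not be empty")
    smoothed = img.astype(float)
    for _ in range(n_iter):
        smoothed = gaussian_filter(smoothed, sigma=sigma)
    # Round to the nearest integer and restore the input dtype.
    return np.rint(smoothed).astype(img.dtype)


# A uniform region with a single outlier label in the centre.
img = np.ones((5, 5), dtype=int)
img[2, 2] = 2
result = smooth_label_image(img, sigma=1.0)
```

Because this approach averages label values numerically, a boundary between non-adjacent labels such as 1 and 3 can round to the intermediate label 2, so it is best suited to removing isolated pixels rather than reshaping region borders.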
- wizard._processing.cluster.pca(dc, n_components=25)
Perform principal component analysis to reduce the spectral dimensionality of a DataCube.
This function reshapes the DataCube’s underlying array from shape (v, x, y) to (x*y, v), applies PCA to reduce the number of spectral bands to n_components, then reshapes the result back to (n_components, x, y). It also updates the DataCube’s wavelengths to a simple integer index for each new component.
- Parameters:
dc (DataCube) – The DataCube instance whose cube attribute (a numpy array of shape (v, x, y)) will be reduced in its spectral dimension.
n_components (int, optional) – The number of principal components to retain. Defaults to 25.
- Returns:
The same DataCube instance, with its cube attribute replaced by the reduced cube of shape (n_components, x, y) and its wavelengths attribute set to integer indices from 0 to n_components-1.
- Return type:
DataCube
- Raises:
ValueError – If n_components is greater than the original number of spectral bands (v).
Notes
Uses sklearn.decomposition.PCA under the hood.
The new wavelengths are not actual physical wavelengths but simple indices.
The transformation is done in-place on the provided DataCube.
Examples
>>> import wizard
>>> dc = wizard.read("hyperspectral_data.cube")
>>> dc = pca(dc, n_components=10)
>>> print(dc.cube.shape)
(10, 512, 512)
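The reshape, reduce, and reshape-back sequence described above can be sketched on a plain NumPy array as follows. The helper name reduce_bands is hypothetical, and unlike pca it returns new arrays instead of modifying a DataCube in place.

```python
import numpy as np
from sklearn.decomposition import PCA


def reduce_bands(cube, n_components):
    """Project a (v, x, y) array onto its first n_components principal components."""
    v, x, y = cube.shape
    if n_components > v:
        raise ValueError("n_components exceeds the number of spectral bands")
    # (v, x, y) -> (x*y, v): one row per pixel, one column per band.
    pixels = cube.reshape(v, x * y).T
    scores = PCA(n_components=n_components).fit_transform(pixels)
    # (x*y, n_components) -> (n_components, x, y).
    reduced = scores.T.reshape(n_components, x, y)
    # The component axis is indexed 0..n_components-1 instead of physical wavelengths.
    wavelengths = np.arange(n_components)
    return reduced, wavelengths


rng = np.random.default_rng(0)
cube = rng.random((30, 16, 12))
reduced, wavelengths = reduce_bands(cube, n_components=10)
```

The spatial layout is preserved because the same pixel ordering is used when flattening and when reshaping back.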