cluster

This module initializes the cluster functions of the hsi-wizard package.

Module Overview

Functions

wizard._processing.cluster.kmeans(dc, n_clusters=5, n_init=10)[source]

Perform KMeans clustering on a hyperspectral DataCube without spatial smoothing.

This function reshapes the spectral data into pixel vectors and applies KMeans to segment the data into the specified number of clusters.

Parameters:
  • dc (DataCube) – Hyperspectral data cube with shape (v, x, y), where v is the number of spectral bands.

  • n_clusters (int) – Number of clusters to form. Default is 5.

  • n_init (int) – Number of times the k-means algorithm is run with different centroid seeds. Default is 10.

Returns:

2D array of shape (x, y) containing cluster labels for each pixel.

Return type:

np.ndarray

Raises:

ValueError – If the input DataCube is malformed or n_clusters or n_init are invalid.

Notes

This function does not perform any spatial regularization or smoothing.
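The reshape-then-cluster flow described above can be sketched with NumPy alone. This is only an illustration under stated assumptions, not the package's implementation: `cube_to_pixels`, `simple_kmeans`, and the deterministic farthest-point start are hypothetical choices made here so the example always yields the same result, and a plain array stands in for the DataCube.

```python
import numpy as np

def cube_to_pixels(cube):
    """Reshape a (v, x, y) array into (x*y, v) pixel vectors."""
    v = cube.shape[0]
    return cube.reshape(v, -1).T

def simple_kmeans(pixels, n_clusters, n_iter=20):
    """Plain Lloyd iterations with a farthest-point start (an assumption)."""
    centers = [pixels[0]]
    for _ in range(1, n_clusters):
        # Distance of every pixel to its nearest already-chosen center.
        dists = np.min([np.linalg.norm(pixels - c, axis=1) for c in centers], axis=0)
        centers.append(pixels[dists.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(n_clusters):
            members = pixels[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels

# Synthetic cube: 3 bands, 4 x 6 pixels, dark left half and bright right half.
cube = np.zeros((3, 4, 6))
cube[:, :, 3:] = 10.0

pixels = cube_to_pixels(cube)                       # shape (24, 3)
label_map = simple_kmeans(pixels, n_clusters=2).reshape(cube.shape[1:])
```

Because the labels are reshaped with the same row-major order used to flatten the cube, `label_map[i, j]` belongs to the spectrum `cube[:, i, j]`.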

wizard._processing.cluster.smooth_kmeans(dc, n_clusters=5, threshold=0.1, mrf_iterations=5, kernel_size=12, sigma=1.0)[source]

Segment a hyperspectral DataCube using KMeans clustering with MRF-based spatial smoothing.

Performs initial clustering on spectral data using KMeans, followed by iterative Markov Random Field (MRF)-based smoothing using Gaussian kernel convolution to enforce spatial consistency.

Parameters:
  • dc (DataCube) – Hyperspectral data cube with shape (v, x, y), where v is the number of spectral bands.

  • n_clusters (int, optional) – Maximum number of clusters for initial KMeans clustering. Default is 5.

  • threshold (float, optional) – Minimum allowable distance between KMeans centroids to stop increasing cluster count. Default is 0.1.

  • mrf_iterations (int, optional) – Number of spatial smoothing iterations using MRF regularization. Default is 5.

  • kernel_size (int, optional) – Size of the Gaussian kernel used for smoothing. Must be an odd integer. Default is 12.

  • sigma (float, optional) – Standard deviation of the Gaussian kernel. Default is 1.0.

Returns:

2D array (x, y) of cluster labels after smoothing.

Return type:

np.ndarray

Raises:

ValueError – If input DataCube is malformed or smoothing parameters are invalid.

Notes

The function uses optimal_clusters to determine a suitable number of clusters before applying KMeans. MRF smoothing is implemented by convolving binary masks of each cluster with a Gaussian kernel and reassigning pixels based on weighted responses.
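The smoothing step described in the note can be sketched as follows. This is a minimal NumPy sketch, assuming an odd kernel size and edge padding; `gaussian_kernel_1d` and `smooth_labels` are hypothetical names, and the package's own boundary handling and weighting may differ.

```python
import numpy as np

def gaussian_kernel_1d(size, sigma):
    """Normalised 1D Gaussian; applied along rows and columns (separable)."""
    offsets = np.arange(size) - (size - 1) / 2.0
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    return kernel / kernel.sum()

def smooth_labels(labels, n_clusters, kernel_size=5, sigma=1.0, iterations=1):
    """Convolve each cluster's binary mask and reassign pixels by strongest response."""
    kernel = gaussian_kernel_1d(kernel_size, sigma)
    pad = kernel_size // 2          # assumes kernel_size is odd
    labels = labels.copy()
    for _ in range(iterations):
        responses = np.empty((n_clusters,) + labels.shape)
        for c in range(n_clusters):
            mask = (labels == c).astype(float)
            padded = np.pad(mask, pad, mode="edge")
            rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
            responses[c] = np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")
        labels = responses.argmax(axis=0)
    return labels

# A single mislabelled pixel inside a uniform region.
noisy = np.zeros((7, 7), dtype=int)
noisy[3, 3] = 1
cleaned = smooth_labels(noisy, n_clusters=2, kernel_size=5, sigma=1.0)
```

In this example the isolated pixel is outvoted by its neighbours and takes their label, while large uniform regions are left unchanged.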

Examples

>>> import matplotlib.pyplot as plt
>>> labels = smooth_kmeans(dc, n_clusters=6, mrf_iterations=3)
>>> plt.imshow(labels, cmap='viridis')
>>> plt.show()

wizard._processing.cluster.isodata(dc, k=10, it=10, p=2, theta_m=10, theta_s=0.1, theta_c=2, theta_o=0.05, k_=None)[source]

Classify hyperspectral image data using the ISODATA clustering algorithm.

Performs iterative clustering on a hyperspectral DataCube, adapting the number of clusters by splitting and merging based on data distribution, until convergence or iteration limit is reached.

Parameters:
  • dc (DataCube) – DataCube object containing the hyperspectral image with shape (v, x, y).

  • k (int, optional) – Initial number of clusters to begin clustering. Default is 10.

  • it (int, optional) – Maximum number of iterations. Default is 10.

  • p (int, optional) – Maximum number of cluster pairs allowed to merge. Default is 2.

  • theta_m (int, optional) – Minimum number of pixels required per cluster. Default is 10.

  • theta_s (float, optional) – Threshold for standard deviation to trigger cluster splitting. Default is 0.1.

  • theta_c (int, optional) – Distance threshold for merging clusters. Default is 2.

  • theta_o (float, optional) – Threshold for minimum change in cluster centers to stop iteration. Default is 0.05.

  • k_ (int | None, optional) – Alternative number of clusters that overrides the initial k. Default is None.

Returns:

2D array (x, y) with cluster labels assigned to each pixel.

Return type:

np.ndarray

Raises:

ValueError – If intermediate clustering steps produce inconsistent dimensions or invalid data.

Notes

This implementation adapts clustering during iterations by merging or splitting clusters, based on the intra-cluster statistics and spatial constraints. The algorithm stops early if cluster centers converge according to theta_o.
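The merge and split decisions can be illustrated with the textbook ISODATA rules. The sketch below is an assumption about how theta_c, p, and theta_s are typically used, not a copy of the package's code; `merge_close_centers` and `split_wide_cluster` are hypothetical helpers.

```python
import numpy as np

def merge_close_centers(centers, counts, theta_c, p):
    """Merge up to p pairs of centers closer than theta_c (size-weighted mean)."""
    k = len(centers)
    candidates = []
    for i in range(k):
        for j in range(i + 1, k):
            dist = np.linalg.norm(centers[i] - centers[j])
            if dist < theta_c:
                candidates.append((dist, i, j))
    candidates.sort()                      # closest pairs are merged first
    used, merged = set(), []
    for dist, i, j in candidates:
        if len(merged) >= p:
            break
        if i in used or j in used:
            continue                       # each center merges at most once
        total = counts[i] + counts[j]
        merged.append((counts[i] * centers[i] + counts[j] * centers[j]) / total)
        used.update((i, j))
    kept = [centers[i] for i in range(k) if i not in used]
    return np.array(kept + merged)

def split_wide_cluster(center, std, theta_s):
    """Split along the band with the largest spread if it exceeds theta_s."""
    if std.max() <= theta_s:
        return center[np.newaxis, :]
    offset = np.zeros_like(center)
    axis = std.argmax()
    offset[axis] = std[axis]
    return np.stack([center + offset, center - offset])

centers = np.array([[0.0, 0.0], [0.5, 0.0], [10.0, 10.0]])
counts = np.array([1, 3, 5])
after_merge = merge_close_centers(centers, counts, theta_c=2, p=2)
after_split = split_wide_cluster(np.array([1.0, 5.0]), np.array([0.05, 0.4]), theta_s=0.1)
```

Here the two nearby centers collapse into one weighted center, and the cluster whose second band has a standard deviation of 0.4 is split into two centers offset by that amount.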

Examples

>>> labels = isodata(dc, k=5, it=15)

Note

The ISODATA code was inspired by pyRadar (https://github.com/PyRadar/pyradar/) from the PyRadar project.

wizard._processing.cluster.spectral_spatial_kmeans(dc, n_clusters, spatial_radius)[source]

Spectral–spatial K-Means clustering for hyperspectral images.

Applies K-Means to pixel spectra augmented by the mean spectrum of their local neighborhood.

Parameters:
  • dc (DataCube) – Hyperspectral data cube with shape (v, x, y), where v is number of bands.

  • n_clusters (int) – Number of clusters to form.

  • spatial_radius (int) – Radius (in pixels) of the square neighborhood for local averaging.

Returns:

labels – Integer cluster label for each pixel.

Return type:

np.ndarray of shape (x, y)

Raises:

ValueError – If spatial_radius < 0 or n_clusters <= 0.

Notes

  • Uses a uniform filter to compute the local mean spectrum for each pixel.

  • Flattens the augmented spectral features and applies scikit-learn’s KMeans.
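The feature construction in these notes can be sketched without the clustering step. The sketch assumes the neighborhood is a (2 * spatial_radius + 1)-pixel square and uses reflect padding at the borders; `augment_with_local_mean` is a hypothetical name, and NumPy's sliding windows stand in for the uniform filter.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def augment_with_local_mean(cube, spatial_radius):
    """Return (x*y, 2v) features: each spectrum plus its neighborhood mean spectrum."""
    if spatial_radius < 0:
        raise ValueError("spatial_radius must be non-negative")
    r = spatial_radius
    width = 2 * r + 1                                   # assumed window size
    padded = np.pad(cube, ((0, 0), (r, r), (r, r)), mode="reflect")
    windows = sliding_window_view(padded, (width, width), axis=(1, 2))
    local_mean = windows.mean(axis=(-2, -1))            # shape (v, x, y)
    stacked = np.concatenate([cube, local_mean], axis=0)
    return stacked.reshape(stacked.shape[0], -1).T      # one row per pixel

cube = np.arange(2 * 4 * 5, dtype=float).reshape(2, 4, 5)
features = augment_with_local_mean(cube, spatial_radius=1)
```

The resulting rows are what a k-means routine would receive: the first v columns hold the pixel's own spectrum and the last v columns hold the local mean.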

Examples

>>> labels = spectral_spatial_kmeans(dc, n_clusters=5, spatial_radius=1)

wizard._processing.cluster.spatial_agglomerative_clustering(dc, n_clusters)[source]

Agglomerative clustering with a 4-connected grid graph enforcing spatial contiguity.

Flattens the spectral vectors and uses grid_to_graph for pixel connectivity, so only spatial neighbors can merge.

Parameters:
  • dc (DataCube) – Hyperspectral data cube (v, x, y).

  • n_clusters (int) – Desired number of clusters.

Returns:

labels – Connected clusters that respect spatial adjacency.

Return type:

np.ndarray of shape (x, y)

Notes

  • Uses sklearn.feature_extraction.image.grid_to_graph to build a sparse connectivity matrix over the x×y grid.

  • Can be memory-intensive for large images.
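To make the 4-connected grid concrete, the sketch below lists the neighbor pairs such a connectivity graph links. It is a NumPy-only illustration with a hypothetical helper name (`four_connected_edges`); it does not call grid_to_graph and only shows the structure that a 4-connected graph encodes.

```python
import numpy as np

def four_connected_edges(n_x, n_y):
    """List (pixel, neighbor) index pairs for vertical and horizontal neighbors."""
    idx = np.arange(n_x * n_y).reshape(n_x, n_y)        # row-major pixel indices
    down = np.stack([idx[:-1, :].ravel(), idx[1:, :].ravel()], axis=1)
    right = np.stack([idx[:, :-1].ravel(), idx[:, 1:].ravel()], axis=1)
    return np.concatenate([down, right], axis=0)

edges = four_connected_edges(2, 3)
edge_set = {tuple(int(i) for i in pair) for pair in edges}
```

A grid of x by y pixels has x*(y-1) + (x-1)*y such pairs, so the graph itself grows only linearly with the number of pixels; diagonal neighbors are never linked.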

Examples

>>> labels = spatial_agglomerative_clustering(dc, n_clusters=8)

Warning

Agglomerative clustering with spatial connectivity is conceptually elegant, but it does not scale well to large 2D grids: processing very large datasets can lead to long computation times.

wizard._processing.cluster.smooth_cluster(img, sigma=1.0, n_iter=1)[source]

Smooth a cluster label image to remove mislabelled pixels.

Apply a Gaussian filter to the input cluster label image to reduce spurious mislabelled pixels by smoothing label intensities, then round back to the nearest integer labels.

Parameters:
  • img (numpy.ndarray) – Integer label image of shape (H, W) or (H, W, …).

  • sigma (float, optional) – Standard deviation for Gaussian kernel. Default is 1.0.

  • n_iter (int, optional) – Number of iterations to apply the Gaussian filter. Default is 1.

Returns:

Smoothed label image with same shape and dtype as input.

Return type:

numpy.ndarray

Raises:
  • TypeError – If img is not a numpy.ndarray.

  • ValueError – If img is empty.

Notes

Internally, the image labels are converted to float, smoothed, then rounded back to integer labels. This may remove small isolated noisy pixels.
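The float, smooth, round sequence can be sketched in NumPy as below. This is a sketch under assumptions: `gaussian_smooth_labels` is a hypothetical name, the kernel is truncated at four standard deviations with mirrored borders, and rounding happens once after all iterations; the package may differ on each point.

```python
import numpy as np

def gaussian_smooth_labels(img, sigma=1.0, n_iter=1):
    """Blur integer labels as floats, then round back to the input dtype."""
    radius = int(4.0 * sigma + 0.5)                     # assumed truncation
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    out = img.astype(float)
    for _ in range(n_iter):
        padded = np.pad(out, radius, mode="symmetric")  # mirror the borders
        rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
        out = np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")
    return np.rint(out).astype(img.dtype)

# Case 1: one stray pixel labelled 2 inside a region labelled 1.
speck = np.ones((6, 6), dtype=int)
speck[2, 3] = 2
speck_smoothed = gaussian_smooth_labels(speck)

# Case 2: a boundary between labels 0 and 2.
border = np.zeros((6, 6), dtype=int)
border[:, 3:] = 2
border_smoothed = gaussian_smooth_labels(border)
```

The first case shows the intended effect: the stray pixel disappears. The second case shows a side effect of averaging label values: pixels along the boundary between labels 0 and 2 round to 1, a label that was not present in the input, so this approach is best suited to labels whose numeric order is meaningful.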

Examples

>>> import numpy as np
>>> img = np.array([[1, 1, 2], [1, 2, 2], [2, 2, 2]])
>>> smooth_cluster(img, sigma=0.5)
array([[1, 1, 2],
       [1, 2, 2],
       [2, 2, 2]])

wizard._processing.cluster.pca(dc, n_components=25)[source]

Perform principal component analysis to reduce the spectral dimensionality of a DataCube.

This function reshapes the DataCube’s underlying array from shape (v, x, y) to (x*y, v), applies PCA to reduce the number of spectral bands to n_components, then reshapes the result back to (n_components, x, y). It also updates the DataCube’s wavelengths to a simple integer index for each new component.

Parameters:
  • dc (DataCube) – The DataCube instance whose cube attribute (a numpy array of shape (v, x, y)) will be reduced in its spectral dimension.

  • n_components (int, optional) – The number of principal components to retain. Defaults to 25.

Returns:

The same DataCube instance, with its cube attribute replaced by the reduced cube of shape (n_components, x, y) and its wavelengths attribute set to integer indices from 0 to n_components-1.

Return type:

DataCube

Raises:

ValueError – If n_components is greater than the original number of spectral bands (v).

Notes

  • Uses sklearn.decomposition.PCA under the hood.

  • The new wavelengths are not actual physical wavelengths but simple indices.

  • The transformation is done in-place on the provided DataCube.
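The reshape, project, reshape sequence can be sketched with NumPy's SVD. This is an illustrative stand-in for the sklearn-based implementation, assuming mean-centered data; `pca_reduce` is a hypothetical name and a plain array stands in for the DataCube's cube attribute.

```python
import numpy as np

def pca_reduce(cube, n_components):
    """Project a (v, x, y) array onto its first n_components principal axes."""
    v, x, y = cube.shape
    if n_components > v:
        raise ValueError("n_components cannot exceed the number of bands")
    pixels = cube.reshape(v, -1).T                      # (x*y, v)
    centered = pixels - pixels.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T             # (x*y, n_components)
    return scores.T.reshape(n_components, x, y)

rng = np.random.default_rng(0)
cube = rng.normal(size=(6, 8, 5))
reduced = pca_reduce(cube, n_components=3)
wavelengths = np.arange(reduced.shape[0])               # index stand-ins: 0, 1, 2
```

The spatial layout is preserved, so `reduced[k, i, j]` is the k-th component score of the pixel at position (i, j).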

Examples

>>> import wizard
>>> dc = wizard.read("hyperspectral_data.cube")
>>> dc = pca(dc, n_components=10)
>>> print(dc.cube.shape)
(10, 512, 512)