SciPy − optimal_leaf_ordering() Method



The optimal_leaf_ordering() method in SciPy's cluster.hierarchy module improves the arrangement of leaf nodes in hierarchical clustering dendrograms. It reorders the leaves so that the sum of distances between adjacent leaves is minimized, which makes the dendrogram easier to interpret. The clustering itself is not changed; only the left-to-right order of the branches at each merge may be flipped.

This method is typically used together with other functions from the same module, such as linkage(), which builds the hierarchical clustering, and leaves_list(), which returns the order of the leaves in a linkage matrix.
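As a rough sketch of how these functions fit together, the snippet below builds a linkage matrix, reorders it, and reads the leaf order back. It also uses the optimal_ordering=True argument of linkage(), which applies the same reordering in a single call. The distance values are made up for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, optimal_leaf_ordering, leaves_list

# Hypothetical condensed distance matrix for 4 points (illustrative values)
y = np.array([0.5, 1.2, 0.9, 1.8, 1.1, 0.7])

# Two-step approach: build the tree, then reorder its leaves
Z = linkage(y, method='average')
two_step = leaves_list(optimal_leaf_ordering(Z, y))

# One-step approach: let linkage() reorder the leaves itself
one_step = leaves_list(linkage(y, method='average', optimal_ordering=True))

print("Two-step leaf order:", two_step)
print("One-step leaf order:", one_step)
```

Both approaches should give the same leaf order, so the one-step form is a convenient shortcut when the unordered linkage matrix is not needed.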

This method is helpful in applications such as gene expression analysis and image segmentation, where understanding the hierarchical relationships within a dataset is essential. Note that computing the optimal ordering can be slow on large datasets.

Syntax

Following is the syntax of the SciPy optimal_leaf_ordering() method −

scipy.cluster.hierarchy.optimal_leaf_ordering(Z, y, metric='euclidean')

Parameters

This method accepts the following parameters −

  • Z − (ndarray) The linkage matrix produced by the linkage() method.

  • y − (ndarray) The condensed distance matrix from which the linkage matrix Z was generated. Alternatively, the original observations can be passed as a 2D array with one row per observation.

  • metric − (optional, string or function) The distance metric used to compute pairwise distances when y is an array of observations. It is ignored when y is a condensed distance matrix. Default is 'euclidean'.

Return Value

The method returns a copy of the linkage matrix Z in which the leaves are reordered to minimize the distance between adjacent leaves. The clustering described by the matrix stays the same; only the order in which the branches are drawn changes.
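The following sketch checks these properties of the return value on a small random dataset. It assumes that the reordering only swaps the two child columns of a merge row, which is what one would expect given that the tree itself is unchanged. The seed and dataset size are arbitrary choices.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, optimal_leaf_ordering

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
X = rng.standard_normal((8, 2))

Z = linkage(X, method='ward')
Z_opt = optimal_leaf_ordering(Z, X)

# The result has the same shape as the input linkage matrix
same_shape = Z_opt.shape == Z.shape

# Columns 2 and 3 hold the merge distance and the cluster size
same_heights = np.allclose(Z[:, 2:], Z_opt[:, 2:])

# Each merge joins the same two clusters, possibly in swapped order
same_pairs = np.array_equal(np.sort(Z[:, :2], axis=1),
                            np.sort(Z_opt[:, :2], axis=1))

print(same_shape, same_heights, same_pairs)
```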

Example 1

This example demonstrates how the optimal_leaf_ordering() method reorders the leaves of a hierarchical clustering. In the output below, you can compare the leaf order before and after optimal leaf ordering is applied.

The reordered leaves minimize the distance between adjacent leaves in the dendrogram, which makes the clustering structure easier to read.

import numpy as np
from scipy.cluster import hierarchy

# Generating a random 2D dataset with 6 points
rng = np.random.default_rng()
X = rng.standard_normal((6, 2))

# Perform hierarchical clustering using 'ward' method
Z = hierarchy.linkage(X, method='ward')

original_leaves = hierarchy.leaves_list(Z)
print("Original Leaf Order:", original_leaves)

# Apply optimal leaf ordering to reorder the linkage matrix
optimal_Z = hierarchy.optimal_leaf_ordering(Z, X)

# Displaying the leaf order after applying optimal leaf ordering
optimal_leaves = hierarchy.leaves_list(optimal_Z)
print("Optimized Leaf Order:", optimal_leaves)

When we run the above program, it produces a result like the following. The exact output varies from run to run because the data is random −

Original Leaf Order: [3 1 4 2 0 5]
Optimized Leaf Order: [3 4 1 5 0 2]
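To see what the reordering actually optimizes, the sketch below measures the sum of distances between neighbouring leaves before and after the call. The helper adjacent_leaf_cost() is a hypothetical function written for this illustration and is not part of SciPy; the seed is arbitrary.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, optimal_leaf_ordering, leaves_list
from scipy.spatial.distance import pdist, squareform

def adjacent_leaf_cost(Z, D):
    """Sum of distances between neighbouring leaves (hypothetical helper)."""
    order = leaves_list(Z)
    return D[order[:-1], order[1:]].sum()

rng = np.random.default_rng(42)  # fixed seed for repeatability
X = rng.standard_normal((6, 2))

y = pdist(X)        # condensed Euclidean distances
D = squareform(y)   # square form for easy pairwise lookup

Z = linkage(y, method='ward')
Z_opt = optimal_leaf_ordering(Z, y)

cost_before = adjacent_leaf_cost(Z, D)
cost_after = adjacent_leaf_cost(Z_opt, D)
print("Cost before:", cost_before)
print("Cost after: ", cost_after)
```

The cost after reordering should never be larger than the cost before, since the original order is one of the candidate orderings.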

Example 2

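Let us see how optimal_leaf_ordering() reorders the leaves when y is given as a condensed distance matrix. The six values below are the pairwise distances between four points. Because the distances are already computed, the metric argument has no effect here; 'cityblock' only takes effect when y contains the original observations.

The cityblock metric calculates the distance between two points as the sum of the absolute differences of their coordinates. It is also called the Manhattan distance.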

import numpy as np
from scipy.cluster.hierarchy import linkage, optimal_leaf_ordering, leaves_list

# Condensed distance matrix for 4 points:
# d(0,1), d(0,2), d(0,3), d(1,2), d(1,3), d(2,3)
distance_matrix = np.array([0.5, 1.2, 0.9, 1.8, 1.1, 0.7])
Z = linkage(distance_matrix, method='average')
original_order = leaves_list(Z)
print("Original Leaf Order:", original_order)

# metric is ignored here because y is already a condensed distance matrix
optimal_Z = optimal_leaf_ordering(Z, distance_matrix, metric='cityblock')
optimal_order = leaves_list(optimal_Z)
print("Optimized Leaf Order:", optimal_order)

When we run the above program, it produces the following result −

Original Leaf Order: [0 1 2 3]
Optimized Leaf Order: [1 0 3 2]
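For the metric argument to take effect, y has to hold the original observations rather than precomputed distances. The sketch below, which uses made-up sample points, passes the observations with metric='cityblock' and compares the result with passing precomputed cityblock distances from pdist().

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, optimal_leaf_ordering, leaves_list
from scipy.spatial.distance import pdist

# Hypothetical observations: 5 points in 2 dimensions
X = np.array([[0.0, 0.0], [1.0, 0.5], [4.0, 4.0], [0.5, 3.0], [3.5, 0.5]])

# Build the tree with the cityblock metric
Z = linkage(X, method='average', metric='cityblock')

# Option 1: pass the observations and let SciPy compute cityblock distances
Z_from_obs = optimal_leaf_ordering(Z, X, metric='cityblock')

# Option 2: pass precomputed cityblock distances; metric is not needed
Z_from_dist = optimal_leaf_ordering(Z, pdist(X, metric='cityblock'))

print("From observations:", leaves_list(Z_from_obs))
print("From distances:   ", leaves_list(Z_from_dist))
```

Both options should produce the same linkage matrix. In either case, the metric should match the one used to build Z.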

Example 3

This example shows how optimal_leaf_ordering() can improve the visualization by placing similar observations next to each other. It uses the cosine distance, computed with pdist(), instead of the default Euclidean distance.

In the optimized order, observations with a small cosine distance are placed adjacent to each other wherever the tree structure allows.

import numpy as np
from scipy.cluster.hierarchy import linkage, optimal_leaf_ordering, leaves_list
from scipy.spatial.distance import pdist

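data = np.random.rand(5, 3)

# Compute the condensed cosine distance matrix once and use it for both steps,
# so that the ordering is based on the same distances as the clustering
cosine_dist = pdist(data, metric='cosine')
Z = linkage(cosine_dist, method='single')
original_order = leaves_list(Z)
print("Original Leaf Order:", original_order)

# Apply optimal_leaf_ordering with the same cosine distances
optimal_Z = optimal_leaf_ordering(Z, cosine_dist)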
optimal_order = leaves_list(optimal_Z)
print("Optimized Leaf Order:", optimal_order)

When we run the above program, it produces a result like the following. The exact output varies from run to run because the data is random −

Original Leaf Order: [3 1 4 0 2]
Optimized Leaf Order: [3 4 1 2 0]
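To connect the leaf order with the dendrogram itself, the sketch below calls dendrogram() with no_plot=True, which returns the layout information without drawing anything. The labels and the seed are hypothetical values chosen for this illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import (linkage, optimal_leaf_ordering,
                                     leaves_list, dendrogram)
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)  # fixed seed for repeatability
data = rng.random((5, 3))
labels = ['a', 'b', 'c', 'd', 'e']  # hypothetical observation names

dist = pdist(data, metric='cosine')
Z_opt = optimal_leaf_ordering(linkage(dist, method='single'), dist)

# no_plot=True returns the layout as a dictionary instead of drawing it
info = dendrogram(Z_opt, labels=labels, no_plot=True)

print("Leaf indices, left to right:", info['leaves'])
print("Leaf labels, left to right: ", info['ivl'])
```

With the default sorting options of dendrogram(), the 'leaves' entry should match the order returned by leaves_list(). Dropping no_plot=True draws the dendrogram, which requires matplotlib.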