SciPy - ClusterNode() Function



ClusterNode() is the constructor of the ClusterNode class, the building block SciPy uses to represent a hierarchical clustering tree. Each object represents one node of the dendrogram: either a single data point (a leaf) or a cluster formed by merging two smaller clusters. It stores the node's identifier, its child nodes, the merge distance, and the number of original data points it contains.

It is particularly useful when you want to explore hierarchical clusters programmatically and access the detailed attributes of each cluster.

ClusterNode() works alongside other functions in scipy.cluster.hierarchy: linkage() computes the linkage matrix, to_tree() converts that matrix into a tree of ClusterNode objects, and fcluster() extracts flat clusters. Together they let you evaluate inter-cluster distances, inspect the cluster hierarchy, and interpret relationships between data points.

Syntax

Following is the syntax of the SciPy ClusterNode() method −

scipy.cluster.hierarchy.ClusterNode(id, left=None, right=None, dist=0.0, count=1)

Parameters

This method accepts the following parameters −

  • id − (int) The unique identifier for the node.

  • left − (optional) The left child of the node in the dendrogram. It is None for leaf nodes.

  • right − (optional) The right child of the node in the dendrogram. It is None for leaf nodes.

  • dist − (optional) The distance between the two clusters merged at this node.

  • count − (optional) The number of original data points in the cluster.

For a leaf node, the left and right attributes are None because there are no child nodes. The dist attribute keeps its default value of zero, and the count attribute is 1, representing a single data point.
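The difference between leaf and internal nodes can be checked directly. The following is a minimal sketch that uses the accessor methods of the ClusterNode class (is_leaf(), get_id(), get_count(), get_left() and get_right()). The variable names leaf_a, leaf_b and parent are chosen only for illustration.

```python
from scipy.cluster.hierarchy import ClusterNode

# Two leaf nodes: no children, default distance, count of 1
leaf_a = ClusterNode(0)
leaf_b = ClusterNode(1)

# An internal node that merges the two leaves at distance 1.5
parent = ClusterNode(2, left=leaf_a, right=leaf_b, dist=1.5)

# is_leaf() is True only when the node has no children
print("leaf_a is a leaf:", leaf_a.is_leaf())
print("parent is a leaf:", parent.is_leaf())

# The accessor methods return the same values as the attributes
print("Parent ID:", parent.get_id())
print("Parent count:", parent.get_count())
print("Child IDs:", parent.get_left().get_id(), parent.get_right().get_id())
```

The accessor methods and the plain attributes (id, count, left, right) expose the same information, so either style can be used.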

Return Value

ClusterNode() returns an object of type ClusterNode, which represents one node of a hierarchical clustering tree and exposes the following attributes, grouped by type:

  • Integer: id, count.

  • Float: dist.

  • ClusterNode or None: left, right.

Example 1

This basic example of the ClusterNode() method shows the attributes of a leaf node. Because the node has no children, its left and right attributes are None. Following is the code −

from scipy.cluster.hierarchy import ClusterNode

# Create a leaf node
leaf_node = ClusterNode(id=0)

# Access and print attributes of the leaf node
print("Node ID:", leaf_node.id)           
print("Left Child:", leaf_node.left)      
print("Right Child:", leaf_node.right)    
print("Distance:", leaf_node.dist)       
print("Count:", leaf_node.count)          

When we run the above program, it produces the following result −

Node ID: 0
Left Child: None
Right Child: None
Distance: 0
Count: 1

Example 2: Creating and Accessing ClusterNode in Hierarchical Clustering

This code demonstrates how to manually create ClusterNode objects that represent leaf and internal nodes in a hierarchical clustering tree using the ClusterNode() method.

The output shows the internal node's ID, the IDs of its children, the merge distance, and the total number of points in the cluster.

from scipy.cluster.hierarchy import ClusterNode

# Create leaf nodes
leaf1 = ClusterNode(id=0)  
leaf2 = ClusterNode(id=1)  

# Create an internal node by merging two leaf nodes
internal_node = ClusterNode(id=2, left=leaf1, right=leaf2, dist=1.5, count=2)

# Accessing attributes of the internal node
print("Node ID:", internal_node.id)  
print("Left Child ID:", internal_node.left.id)  
print("Right Child ID:", internal_node.right.id)  
print("Distance:", internal_node.dist)  
print("Count:", internal_node.count)  

When we run the above program, it produces the following result −

Node ID: 2
Left Child ID: 0
Right Child ID: 1
Distance: 1.5
Count: 2
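A detail of Example 2 worth checking is the count argument. The sketch below assumes the behaviour of the SciPy implementation as we understand it: for an internal node, count is computed from the two children and the value passed to the constructor is ignored, and a node with only one child is rejected with a ValueError. The deliberately wrong value 99 and the variable names are used only for illustration.

```python
from scipy.cluster.hierarchy import ClusterNode

leaf1 = ClusterNode(id=0)
leaf2 = ClusterNode(id=1)

# count=99 is deliberately wrong; the node is expected to derive
# its count from the children instead (1 + 1)
node = ClusterNode(id=2, left=leaf1, right=leaf2, dist=1.5, count=99)
print("Count:", node.count)

# A node with exactly one child is not a valid binary tree node
rejected = False
try:
    ClusterNode(id=3, left=leaf1)
except ValueError as exc:
    rejected = True
    print("Rejected:", exc)
```

In practice this means count only needs to be supplied for leaf nodes that stand for more than one observation.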

Example 3: Accessing ClusterNode Objects from linkage Using to_tree() method

The code below performs hierarchical clustering on a set of data points with the linkage() method, then converts the resulting linkage matrix into a tree of ClusterNode objects. It prints the root node's ID, the IDs of its children, the merge distance, and the cluster size.

The ClusterNode objects are generated automatically when to_tree() converts a linkage matrix into a tree structure.

from scipy.cluster.hierarchy import linkage, to_tree
data = [[1, 2], [3, 4], [5, 6], [7, 8]]

# Perform hierarchical clustering
Z = linkage(data, method='ward')

# Convert the linkage matrix into a tree of ClusterNode objects
root_node = to_tree(Z)

print("Root Node ID:", root_node.id)  
print("Left Child ID:", root_node.left.id)  
print("Right Child ID:", root_node.right.id)  
print("Distance at Root Node:", root_node.dist) 
print("Number of Points in Cluster:", root_node.count)  

Following is the output of the above code −

Root Node ID: 6
Left Child ID: 4
Right Child ID: 5
Distance at Root Node: 7.999999999999999
Number of Points in Cluster: 4
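The tree returned by to_tree() can be explored beyond the root. The following sketch reuses the data from Example 3 and relies on two features of SciPy: the rd=True option of to_tree(), which also returns a list of all nodes indexed by node ID, and the pre_order() method of ClusterNode, which applies a function to every leaf below a node. The names node_list and leaf_ids are chosen only for illustration.

```python
from scipy.cluster.hierarchy import linkage, to_tree

data = [[1, 2], [3, 4], [5, 6], [7, 8]]
Z = linkage(data, method='ward')

# rd=True additionally returns a list holding every node of the tree,
# where node_list[i] is the node whose id is i
root_node, node_list = to_tree(Z, rd=True)

# pre_order() visits the leaves below a node and collects func(leaf)
leaf_ids = root_node.pre_order(lambda leaf: leaf.id)

print("Total nodes in the tree:", len(node_list))
print("Leaf IDs under the root:", leaf_ids)
print("Node 0 is a leaf:", node_list[0].is_leaf())
print("Node 4 is a leaf:", node_list[4].is_leaf())
```

With n original observations the tree holds 2n - 1 nodes: IDs 0 to n - 1 are the leaves and the higher IDs are the merged clusters.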

Example 4: Combining ClusterNode and fcluster

This example shows how to use the fcluster() method to create flat clusters based on a distance threshold, alongside inspecting the root of a tree of ClusterNode objects built from the same linkage matrix.

from scipy.cluster.hierarchy import linkage, to_tree, fcluster

data = [[1, 2], [3, 4], [5, 6], [7, 8]]

Z = linkage(data, method='ward')
root_node = to_tree(Z)

# Function to print a ClusterNode's attributes and the IDs of its children
def print_cluster_node(node):
    print(f"Node ID: {node.id}, Distance: {node.dist}, Count: {node.count}")
    if node.left is not None:
        print(f"  Left Child ID: {node.left.id}")
    if node.right is not None:
        print(f"  Right Child ID: {node.right.id}")

# Print the details of the root node
print("Root Node:")
print_cluster_node(root_node)

# Extract flat clusters using fcluster
clusters = fcluster(Z, t=5, criterion='distance')

# Print flat clusters
print("\nFlat Clusters:", clusters)

Output of the above code is as follows −

Root Node:
Node ID: 6, Distance: 7.999999999999999, Count: 4
  Left Child ID: 4
  Right Child ID: 5

Flat Clusters: [1 1 2 2]

The node IDs and results may differ depending on the input data.
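The link between the tree and the flat clusters can be made explicit by walking the ClusterNode objects. The following is a rough sketch, not how fcluster() is implemented: the helper groups_below_threshold() is hypothetical, and it assumes merge distances never decrease towards the root, which holds for the ward method used here. It descends from the root and stops at the first node whose merge distance is within the threshold.

```python
from scipy.cluster.hierarchy import linkage, to_tree, fcluster

data = [[1, 2], [3, 4], [5, 6], [7, 8]]
Z = linkage(data, method='ward')
root_node = to_tree(Z)

def groups_below_threshold(node, threshold):
    """Hypothetical helper: split the tree into groups of leaf IDs
    whose merge distance does not exceed the threshold."""
    if node.is_leaf() or node.dist <= threshold:
        # The whole subtree forms one group; pre_order() lists its leaf IDs
        return [sorted(node.pre_order())]
    # Otherwise the merge is too distant, so descend into both children
    return (groups_below_threshold(node.left, threshold)
            + groups_below_threshold(node.right, threshold))

groups = groups_below_threshold(root_node, 5)
labels = fcluster(Z, t=5, criterion='distance')

print("Groups from the tree:", groups)
print("Labels from fcluster:", labels)
```

For this data set, points 0 and 1 land in one group and points 2 and 3 in the other, which agrees with the labels returned by fcluster().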
