SciPy - to_tree() Function



The SciPy to_tree() method converts a linkage matrix (Z) into a tree representation.

This method builds a hierarchical tree of ClusterNode objects. Each node has the attributes id (unique identifier), count (number of original data points in the cluster), left and right (child nodes), and dist (distance at which the two children were merged). Because it gives a tree-like structure instead of just a matrix, it helps you understand and manipulate hierarchical clusterings more flexibly.
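To see these attributes side by side, the sketch below builds a tree from three made-up 1-D points and prints the attributes of the root and of one leaf. The data and the helper describe() are illustrative assumptions, not part of SciPy; is_leaf() is a ClusterNode method.

```python
from scipy.cluster.hierarchy import linkage, to_tree

# Three made-up 1-D observations: 0 and 1 are close together, 2 is far away
data = [[0.0], [1.0], [5.0]]
root = to_tree(linkage(data, method="single"))

def describe(node):
    """Hypothetical helper: collect the five ClusterNode attributes, children shown by id."""
    return {
        "id": node.id,
        "count": node.count,
        "dist": node.dist,
        "left": None if node.left is None else node.left.id,
        "right": None if node.right is None else node.right.id,
    }

print("root:", describe(root))

# Pick whichever child of the root is a leaf (an original observation)
leaf = root.left if root.left.is_leaf() else root.right
print("leaf:", describe(leaf))
```

A leaf has no children, a count of 1 and a dist of 0, while an internal node's count is the sum of its children's counts.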

In hierarchical clustering, the tree is a data structure that represents how data points are grouped together. Each node in the tree represents either a single data point (a leaf) or a larger cluster formed by merging smaller clusters.
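The leaf/cluster distinction can be made visible with a short recursive walk. This is a sketch under assumptions: the four 1-D points and the helper render() are invented for illustration, while is_leaf() and pre_order() (which by default returns the ids of the leaves under a node) are ClusterNode methods.

```python
from scipy.cluster.hierarchy import linkage, to_tree

# Four made-up 1-D points forming two tight groups: {0, 1} and {2, 3}
data = [[0.0], [1.0], [5.0], [7.0]]
root = to_tree(linkage(data, method="single"))

def render(node, depth=0):
    """Hypothetical helper: one text line per node, children indented under their parent."""
    indent = "  " * depth
    if node.is_leaf():
        # A leaf stands for a single original data point
        return [f"{indent}leaf {node.id}"]
    # An internal node is a cluster; pre_order() lists the leaf ids beneath it
    lines = [f"{indent}cluster {node.id} dist={node.dist} leaves={node.pre_order()}"]
    return lines + render(node.left, depth + 1) + render(node.right, depth + 1)

print("\n".join(render(root)))
```

The recursion mirrors the definition of the tree: every node is either a leaf or a cluster with exactly two children.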

Syntax

Following is the syntax of the SciPy to_tree() method −

scipy.cluster.hierarchy.to_tree(Z, rd=False)

Parameters

This method accepts two parameters −

  • Z: The linkage matrix returned by linkage() with a method such as 'single', 'complete', 'average' or 'ward'. Each row records one merge of two clusters.

  • rd (optional): If rd = False (the default), only the root node of the tree is returned. If rd = True, a tuple containing the root node of the tree and a list of all ClusterNode objects is returned.
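As a rough illustration of both parameters, the sketch below prints each row of Z (for the data used in Example 1) as a merge, then calls to_tree() with and without rd. It assumes the row layout documented for linkage(): the two merged cluster ids, the merge distance and the size of the new cluster, where row i creates the cluster whose id is n + i.

```python
import numpy as np
from scipy.cluster.hierarchy import ClusterNode, linkage, to_tree

data = np.array([[30, 34], [67, 66], [31, 32], [70, 68]])
n = len(data)

# Z has n - 1 rows; row i is [cluster a, cluster b, merge distance, new cluster size]
Z = linkage(data, method="single")
for i, (a, b, dist, size) in enumerate(Z):
    # The cluster formed by row i gets id n + i
    print(f"node {n + i} = {int(a)} + {int(b)} at dist {dist:.3f} (size {int(size)})")

root = to_tree(Z)             # rd=False (default): a single ClusterNode
result = to_tree(Z, rd=True)  # rd=True: a tuple (root, list of nodes)
print(type(root).__name__, type(result).__name__)
```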

Return Value

This method returns a ClusterNode when rd = False and a tuple (r, d) when rd = True, where r is the root node (the top-most node of the tree) and d is a list of the ClusterNode objects in the tree. For n original observations, d holds 2n - 1 nodes, and d[i] is the node whose id is i.
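A short sketch of the rd = True result, reusing the data from Example 2. It checks the list layout that follows from how to_tree() fills it: 2n - 1 entries (n leaves followed by n - 1 merged clusters), position i holding the node with id i, and the last entry being the root itself.

```python
from scipy.cluster.hierarchy import linkage, to_tree

data = [[2, 3], [3, 4], [5, 8], [10, 12], [15, 18], [20, 25]]
n = len(data)
root, nodes = to_tree(linkage(data, method="average"), rd=True)

# 2n - 1 entries: n leaves, then n - 1 merged clusters
print("list length:", len(nodes), "expected:", 2 * n - 1)
# Position i holds the node whose id is i
print("ids match positions:", all(node.id == i for i, node in enumerate(nodes)))
# The last entry is the very object returned as the root
print("last entry is root:", nodes[-1] is root)

leaves, merged = nodes[:n], nodes[n:]
print("leaf counts:", [node.count for node in leaves])
print("merged cluster sizes:", [node.count for node in merged])
```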

Example 1

Following is an example that shows the usage of the SciPy to_tree(Z[, rd]) method with rd left at its default value, False.

This is useful when you only need high-level attributes of the clustering, such as the root cluster's size and structure.

import numpy as np
from scipy.cluster import hierarchy
from scipy.cluster.hierarchy import to_tree
#sample data
data = [[30, 34], [67, 66], [31, 32], [70, 68]]

#hierarchical clustering
Z = hierarchy.linkage(data)

#matrix to tree
root_node = to_tree(Z)

# Access root node attributes
print("Root Node ID:", root_node.id)
print("Number of points in root cluster:", root_node.count)
print("Distance at root:", root_node.dist)

# Access children of root node
print("Left child ID:", root_node.left.id)
print("Right child ID:", root_node.right.id)

Following is the output of the above code −

Root Node ID: 6
Number of points in root cluster: 4
Distance at root: 48.91829923454004
Left child ID: 4
Right child ID: 5
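Because the children of the root are ClusterNode objects too, you can look one level further down. The sketch below repeats the setup of Example 1 and uses the ClusterNode method pre_order(), which by default returns the ids of the leaves under a node, to map each child back to the original data points.

```python
from scipy.cluster import hierarchy
from scipy.cluster.hierarchy import to_tree

data = [[30, 34], [67, 66], [31, 32], [70, 68]]
Z = hierarchy.linkage(data)
root_node = to_tree(Z)

# pre_order() collects the ids of the leaves under a node, left to right
for name, child in (("Left", root_node.left), ("Right", root_node.right)):
    members = child.pre_order()
    points = [data[i] for i in members]
    print(f"{name} child {child.id}: {child.count} points {points}, "
          f"merged at dist {child.dist:.3f}")
```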

Example 2

Following is an example that shows the usage of the SciPy to_tree(Z[, rd]) method with rd set to True.

This is useful when you need detailed access to every node in the clustering tree, enabling in-depth exploration of the hierarchical structure.

import numpy as np
from scipy.cluster import hierarchy
from scipy.cluster.hierarchy import to_tree

# Data
data = [[2, 3], [3, 4], [5, 8],
    [10, 12], [15, 18], [20, 25]]

# Hierarchical clustering
Z = hierarchy.linkage(data, method='average')

# Converting to tree structure with rd=True
root_node, node_list = to_tree(Z, rd=True)

# Root node information
print("Root node ID:", root_node.id)
print("Number of nodes in the tree:", len(node_list))  # the second return value is a list

# Listing node IDs
print("Node IDs in the tree:", [node.id for node in node_list])
print("Cluster count in root node:", root_node.count)

Output of the above code is as follows −

Root node ID: 10
Number of nodes in the tree: 11
Node IDs in the tree: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Cluster count in root node: 6
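The node list returned with rd=True can also be filtered. As a sketch reusing the Example 2 data, is_leaf() separates the original observations from the merged clusters, and each merged node reports which two nodes it joined, at what distance, and how many points it holds.

```python
from scipy.cluster import hierarchy
from scipy.cluster.hierarchy import to_tree

data = [[2, 3], [3, 4], [5, 8],
        [10, 12], [15, 18], [20, 25]]
Z = hierarchy.linkage(data, method='average')
root_node, node_list = to_tree(Z, rd=True)

# is_leaf() separates original observations from merged clusters
leaf_ids = [node.id for node in node_list if node.is_leaf()]
merges = [node for node in node_list if not node.is_leaf()]

print("Leaf IDs:", leaf_ids)
for node in merges:
    print(f"Node {node.id}: {node.left.id} + {node.right.id}, "
          f"dist={node.dist:.3f}, count={node.count}")
```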