SciPy - cut_tree() Function



The cut_tree() function in SciPy cuts a hierarchical clustering tree (dendrogram) at a chosen level to produce flat clusters.

It takes a linkage matrix, usually generated by the linkage() function, and cuts it either at a requested number of clusters or at a given distance. The resulting flat clusters can be used for further analysis or plotting, for example to group related data points.

This is helpful because it shows which cluster each data point belongs to and how the data points are grouped together at different levels of similarity or distance.

Syntax

Following is the syntax of the SciPy cut_tree() function −

scipy.cluster.hierarchy.cut_tree(Z, n_clusters=None, height=None)

Parameters

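This function accepts three parameters, two of which are optional. Specify either n_clusters or height, not both −

  • Z: The linkage matrix, typically returned by linkage().

  • n_clusters (optional): The number of clusters at the cut point, given as a single integer or an array-like of integers (e.g. 2, 3, 4).

  • height (optional): The distance at which to cut the tree, given as a single value or an array-like of values.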
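As a minimal sketch of the height parameter, the following code cuts the same tree at two different distances. The four sample points and the cut heights 5 and 1.5 are assumptions chosen for illustration, so that the merge distances (1, 2 and 9) are easy to follow.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cut_tree

# Illustrative points on a line: two close pairs that are far from each other
data = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [12.0, 0.0]])

# Single linkage merges these points at distances 1, 2 and 9
Z = linkage(data, method='single')

# A cut at 5 lies above both pair merges but below the final merge
two_groups = cut_tree(Z, height=5)

# A cut at 1.5 lies above only the first merge
three_groups = cut_tree(Z, height=1.5)

print("height=5:", two_groups.ravel())
print("height=1.5:", three_groups.ravel())
```

The higher cut should leave two clusters (the two pairs), while the lower cut should leave three, because only the closest pair has been merged at that distance.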

Return Value

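A 2D array of cluster labels in which each row corresponds to a data point and each column corresponds to one requested cut, that is, one value of n_clusters or height. When a single value is passed, the array has one column.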
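To illustrate the shape of the returned array, the sketch below requests two cuts at once by passing a list to n_clusters. The sample points and the variable name labels are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cut_tree

# Illustrative points: two close pairs that are far from each other
data = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
Z = linkage(data, method='single')

# Request a 2-cluster cut and a 3-cluster cut in a single call
labels = cut_tree(Z, n_clusters=[2, 3])

print("Shape:", labels.shape)   # one row per point, one column per requested cut
print("2 clusters:", labels[:, 0])
print("3 clusters:", labels[:, 1])
```

Each column can then be read independently, which makes it easy to compare how the same points are grouped at different levels of the tree.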

Example 1

The following basic example uses cut_tree() to cut a hierarchical clustering tree into flat clusters.

The code performs single-linkage hierarchical clustering on a small dataset of four points and then splits the resulting tree into two clusters.

import numpy as np
from scipy.cluster.hierarchy import linkage, cut_tree

data = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
# Perform hierarchical clustering
Z = linkage(data, method='single')
# Cut the tree to form 2 clusters
clusters = cut_tree(Z, n_clusters=2)
print("Clusters:\n", clusters)

Output of the above code is as follows. The first three points fall into cluster 0 and the last point forms cluster 1 on its own −

Clusters:
 [[0]
 [0]
 [0]
 [1]]

Example 2

In the following code, linkage() uses the Ward method to perform hierarchical clustering on randomly generated data. cut_tree() then splits the tree into three clusters, and the code prints the cluster labels assigned to the first ten data points. Following is the code −

import numpy as np
from scipy.cluster.hierarchy import linkage, cut_tree
from numpy.random import default_rng

# Generate random data
rng = default_rng()
data = rng.random((60, 3))

# Perform hierarchical clustering using the 'ward' method
Z = linkage(data, method='ward')

# Cut the tree to form 3 clusters
clusters = cut_tree(Z, n_clusters=3)
print("Clusters:\n", clusters[:10])

Output of the above code is as follows. Because the random generator is not seeded, the exact labels differ from run to run −

Clusters:
 [[0]
 [1]
 [1]
 [2]
 [0]
 [1]
 [0]
 [2]
 [0]
 [2]]
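Since the data in this example is random, the labels change on every run. As a rough sketch, passing a seed to default_rng() makes the run repeatable, and np.bincount() summarizes how many points fall into each cluster. The seed value 0 is an arbitrary assumption, and the names labels and sizes are introduced here only for illustration.

```python
import numpy as np
from numpy.random import default_rng
from scipy.cluster.hierarchy import linkage, cut_tree

# Seeded generator so that repeated runs produce the same data (seed is arbitrary)
rng = default_rng(0)
data = rng.random((60, 3))

# Ward linkage followed by a cut into 3 clusters
Z = linkage(data, method='ward')

# Flatten the (60, 1) result into a 1D array of labels
labels = cut_tree(Z, n_clusters=3).ravel()

# Count how many points received each label (0, 1 and 2)
sizes = np.bincount(labels)
print("Cluster sizes:", sizes)
```

The three counts always add up to the number of data points, which is a quick sanity check that every point received exactly one label.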