Is there a way to enforce that a set of points is assigned to the same class when clustering in sklearn or another clustering library?

I would like to use one of sklearn's clustering algorithms, but with the restriction that certain sets of points must belong to the same class. For instance, given the set of points below, I would like to enforce that all red points belong to the same class and all blue points belong to the same class. Red and blue points should still be allowed to end up in the same class as each other. If this is not possible in sklearn, I am also open to using other libraries.

[Figure: clustering with some points prespecified]

Solution 1:

The name for this is "constrained clustering": a family of semi-supervised clustering approaches in which the user also supplies pairwise constraints of two kinds:

Must Link - two points must belong to the same cluster
Cannot Link - two points cannot belong to the same cluster

There's an implementation of the COP-KMeans algorithm, which provides an API like this:

import numpy
from copkmeans.cop_kmeans import cop_kmeans

# 100 random points with 500 features each
input_matrix = numpy.random.rand(100, 500)
# Constraints are pairs of row indices into input_matrix
must_link = [(0, 10), (0, 20), (0, 30)]
cannot_link = [(1, 10), (2, 10), (3, 10)]
clusters, centers = cop_kmeans(dataset=input_matrix, k=5, ml=must_link, cl=cannot_link)
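
For the red/blue case in the question, the must-link list can be generated from the index sets rather than written by hand. The following is a minimal sketch; the helper name `must_link_from_groups` and the example index lists are hypothetical and not part of any library. It emits every pair within each group, so it does not rely on the clustering implementation inferring transitive links.

```python
from itertools import combinations


def must_link_from_groups(groups):
    """Return all within-group index pairs as must-link constraints.

    `groups` is an iterable of index collections, e.g. [red_idx, blue_idx].
    """
    pairs = []
    for group in groups:
        members = sorted(set(group))          # drop duplicates, fix the order
        pairs.extend(combinations(members, 2))
    return pairs


# Hypothetical row indices of the red and blue points
red_idx = [0, 10, 20]
blue_idx = [5, 15]
must_link = must_link_from_groups([red_idx, blue_idx])
```

No cannot-link pairs are generated between the two groups, so red and blue points remain free to share a cluster. The resulting list can be passed as the `ml` argument shown above.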

Solution 2:

One possible solution which should work for any library is to define a "superpoint" for the blue cluster and another for the red cluster.

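So just define the blue superpoint to be the average / median of all the blue points, and similarly for the red. Then run the clustering on these two superpoints plus the remaining points. Afterwards, give every blue point the cluster label of the blue superpoint, and every red point the label of the red superpoint.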
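
A minimal sketch of the superpoint idea with sklearn's `KMeans`, assuming the groups are disjoint lists of row indices and using the mean as the superpoint. The function name `cluster_with_superpoints` and its parameters are illustrative assumptions, not an sklearn API.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_with_superpoints(X, groups, n_clusters, random_state=0):
    """Cluster X so that all points in each index group share one label."""
    X = np.asarray(X, dtype=float)

    # Mark which rows belong to some prespecified group
    grouped = np.zeros(len(X), dtype=bool)
    for g in groups:
        grouped[g] = True
    free_idx = np.flatnonzero(~grouped)

    # One superpoint (the group mean) per group, followed by the free points
    superpoints = np.array([X[g].mean(axis=0) for g in groups])
    reduced = np.vstack([superpoints, X[free_idx]])

    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    reduced_labels = km.fit_predict(reduced)

    # Copy each superpoint's label back to all members of its group
    labels = np.empty(len(X), dtype=int)
    for i, g in enumerate(groups):
        labels[g] = reduced_labels[i]
    labels[free_idx] = reduced_labels[len(groups):]
    return labels


rng = np.random.default_rng(0)
X = rng.random((30, 2))
red_idx, blue_idx = [0, 1, 2], [3, 4]   # hypothetical group indices
labels = cluster_with_superpoints(X, [red_idx, blue_idx], n_clusters=3)
```

One thing to keep in mind: each superpoint counts as a single point here, so a large group influences the cluster centers no more than one free point does. If that matters, `KMeans.fit_predict` accepts a `sample_weight` argument that could carry the group sizes.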
