Disjoint Sets

A disjoint-set data structure, also known as union-find, keeps track of a collection of elements partitioned into disjoint (non-overlapping) subsets. This article covers the concepts, use cases, and implementation of disjoint sets.

Definition

A disjoint set is a collection of elements, each of which belongs to exactly one subset. The subsets are disjoint, meaning that no two of them share an element. Each subset is stored as a tree: every element holds a reference to a parent element, and following parent references leads to the root, which serves as the representative of the subset. The root is its own parent.

Use Cases

Disjoint sets are used in a variety of computer science applications, including:

Kruskal's algorithm for finding the minimum spanning tree of a graph
Dynamic connectivity problems, such as checking if two elements are connected or finding the connected components of a graph

These applications typically rely on two optimizations, union by rank and path compression, which together bring the amortized cost of each disjoint set operation close to constant.
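
As an illustration of the first use case, Kruskal's algorithm can be sketched with a small union-find embedded in the function. This is a minimal sketch, not a reference implementation: the function name `kruskal`, the `(weight, u, v)` edge format, and the 0-based vertex numbering are assumptions made for the example, and `find` uses path halving, a simple variant of path compression.

```python
def kruskal(num_vertices, edges):
    """Return (total_weight, chosen_edges) for a minimum spanning forest.

    `edges` is a list of (weight, u, v) tuples; vertices are assumed to be
    numbered 0 .. num_vertices - 1 (an assumption of this sketch).
    """
    parent = list(range(num_vertices))  # every vertex starts as its own root

    def find(x):
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total_weight = 0
    chosen = []
    for weight, u, v in sorted(edges):   # cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:             # edge joins two different components
            parent[root_u] = root_v      # union: link one root under the other
            total_weight += weight
            chosen.append((u, v, weight))
    return total_weight, chosen


edges = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (7, 1, 3), (3, 2, 3)]
total, tree = kruskal(4, edges)
print(total, tree)
```

An edge is kept only when its endpoints have different roots, which is exactly the "are these two elements connected?" query that a disjoint set answers.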

Operations

The following are the basic operations that can be performed on a disjoint set:

MakeSet: Creates a new set with a single element.
Find: Returns the representative (root) element of the set that contains a given element.
Union: Merges two sets into one by making the representative of one set the parent of the representative of the other set.
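
The three operations above can be sketched as plain functions over a dictionary of parent links. This is a deliberately unoptimized sketch; the names `parent`, `make_set`, `find`, and `union` are chosen for illustration.

```python
parent = {}  # maps each element to its parent; a root maps to itself


def make_set(x):
    """MakeSet: create a new set whose only element is x."""
    parent[x] = x


def find(x):
    """Find: follow parent links from x until reaching the root."""
    while parent[x] != x:
        x = parent[x]
    return x


def union(x, y):
    """Union: make the root of x's set the parent of the root of y's set."""
    root_x, root_y = find(x), find(y)
    if root_x != root_y:
        parent[root_y] = root_x


for name in ["a", "b", "c", "d"]:
    make_set(name)

union("a", "b")
union("c", "d")
same_ab = find("a") == find("b")        # True: a and b were merged
same_ac = find("a") == find("c")        # False: still in different sets
union("b", "d")
same_ac_after = find("a") == find("c")  # True: the two sets are now one
```

Two elements are in the same set exactly when Find returns the same root for both.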

Implementations

There are several ways to implement disjoint sets, including:

Array-based implementation
Linked-list implementation
Forest-based implementation

Array-Based Implementation

In an array-based implementation, each element is represented by its index in an array. The parent of each element is stored in a second array, and a root stores its own index as its parent. The Find operation follows the chain of parents until it reaches the root. The Union operation finds the roots of both elements and sets the parent of one root to the other root.
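
A minimal sketch of the array-based form, assuming the elements are the integers 0 to n - 1. The class name `ArrayDisjointSet` is hypothetical, and no balancing or path compression is applied here.

```python
class ArrayDisjointSet:
    def __init__(self, n):
        # parent[i] == i means that i is a root.
        self.parent = list(range(n))

    def find(self, i):
        # Follow the chain of parents until the root is found.
        while self.parent[i] != i:
            i = self.parent[i]
        return i

    def union(self, i, j):
        # Link the root of j's set under the root of i's set.
        root_i, root_j = self.find(i), self.find(j)
        if root_i != root_j:
            self.parent[root_j] = root_i


ds = ArrayDisjointSet(6)
ds.union(0, 1)
ds.union(2, 3)
ds.union(1, 3)
print(ds.parent)  # [0, 0, 0, 2, 4, 5]

# Each root corresponds to one set, so counting roots counts the sets.
num_sets = sum(1 for i, p in enumerate(ds.parent) if i == p)  # 3
```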

Linked-List Implementation

In a linked-list implementation, each element is represented by a node that holds a pointer to its parent node, so the path from any node to its root forms a linked list. The root node points to itself. The Find operation follows the chain of pointers until it reaches the root. The Union operation finds the roots of both nodes and points one root at the other.
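
The pointer-based variant described here can be sketched with node objects in place of array indices. The `Node` class and the free functions `find` and `union` are illustrative names, and the sketch again omits any optimization.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.parent = self  # a new node is the root of its own set


def find(node):
    # Follow parent pointers until reaching the node that points to itself.
    while node.parent is not node:
        node = node.parent
    return node


def union(a, b):
    # Point the root of b's set at the root of a's set.
    root_a, root_b = find(a), find(b)
    if root_a is not root_b:
        root_b.parent = root_a


nodes = {name: Node(name) for name in ("x", "y", "z")}
union(nodes["x"], nodes["y"])

root_of_y = find(nodes["y"]).value  # "x"
root_of_z = find(nodes["z"]).value  # "z": untouched, still its own root
```

Because nodes are compared by identity, the elements do not need to be integers or even hashable values.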

Forest-Based Implementation

In a forest-based implementation, each set is represented by a tree whose root is the set's representative element. The Find operation returns the root of the tree containing the given element. The Union operation merges two trees by making the root of one tree the parent of the root of the other; attaching the shallower (or smaller) tree under the other root keeps the merged tree short.
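
A forest with both standard optimizations, union by rank and path compression, might look like the following sketch. The class name `Forest` and the integer element numbering are assumptions, and `rank` is an upper bound on tree height rather than an exact height.

```python
class Forest:
    def __init__(self, n):
        self.parent = list(range(n))  # each element starts as its own tree
        self.rank = [0] * n           # upper bound on the height of each tree

    def find(self, x):
        # First pass: locate the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): point every visited node at the root.
        while self.parent[x] != root:
            next_x = self.parent[x]
            self.parent[x] = root
            x = next_x
        return root

    def union(self, x, y):
        """Merge the trees of x and y; return False if already merged."""
        root_x, root_y = self.find(x), self.find(y)
        if root_x == root_y:
            return False
        # Union by rank: the root with the lower rank goes underneath.
        if self.rank[root_x] < self.rank[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x
        if self.rank[root_x] == self.rank[root_y]:
            self.rank[root_x] += 1
        return True


forest = Forest(8)
forest.union(0, 1)
forest.union(2, 3)
forest.union(0, 2)
root = forest.find(3)            # 0, and element 3 now points directly at 0
merged = forest.union(1, 3)      # False: both are already in the same tree
```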

Python Implementation

class DisjointSet:
    """Disjoint set over hashable elements, with union by size and path compression."""

    def __init__(self, elements):
        # Every element starts as the root of its own one-element set.
        self.parent = {element: element for element in elements}
        self.size = {element: 1 for element in elements}

    def find(self, element):
        # Path compression: point each visited element directly at the root.
        if self.parent[element] != element:
            self.parent[element] = self.find(self.parent[element])
        return self.parent[element]

    def union(self, element1, element2):
        root1 = self.find(element1)
        root2 = self.find(element2)

        if root1 != root2:
            # Union by size: attach the smaller tree under the larger root.
            if self.size[root1] < self.size[root2]:
                self.parent[root1] = root2
                self.size[root2] += self.size[root1]
            else:
                self.parent[root2] = root1
                self.size[root1] += self.size[root2]
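
A short usage sketch for the class above, answering connectivity queries and counting connected components. The class is repeated in condensed form only so that the snippet runs on its own; the city and road data are made up for the example.

```python
class DisjointSet:
    def __init__(self, elements):
        self.parent = {e: e for e in elements}
        self.size = {e: 1 for e in elements}

    def find(self, element):
        if self.parent[element] != element:
            self.parent[element] = self.find(self.parent[element])
        return self.parent[element]

    def union(self, element1, element2):
        root1, root2 = self.find(element1), self.find(element2)
        if root1 != root2:
            if self.size[root1] < self.size[root2]:
                root1, root2 = root2, root1  # keep root1 as the larger tree
            self.parent[root2] = root1
            self.size[root1] += self.size[root2]


cities = ["Oslo", "Bergen", "Paris", "Lyon", "Rome"]
roads = [("Oslo", "Bergen"), ("Paris", "Lyon")]

ds = DisjointSet(cities)
for a, b in roads:
    ds.union(a, b)

connected = ds.find("Oslo") == ds.find("Bergen")      # True
separate = ds.find("Oslo") == ds.find("Paris")        # False
components = len({ds.find(city) for city in cities})  # 3
```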

Conclusion

Disjoint sets are a useful data structure for solving dynamic connectivity problems and for finding the minimum spanning tree of a graph with Kruskal's algorithm. They can be implemented with arrays, linked nodes, or a forest of trees. The array-based form is compact when the elements are integer indices, while pointer-based and dictionary-based forms handle arbitrary elements. The best implementation for a given problem depends on its specific requirements and constraints.