Basic Concept of Union Find Algorithm
Prerequisites
Before reading this article, you should first study:
Summary in One Sentence
The Union-Find (or Disjoint Set) structure is a derivative of the binary tree structure and is used to efficiently solve connectivity problems in undirected graphs. It can merge two connected components in time, check if two nodes are connected in time, and determine the number of connected components in time.
There are several optimization methods for the Union-Find algorithm, which are supported by visualization panels. The following shows an unoptimized implementation of Union-Find, where the resulting multi-branch tree almost degenerates into a singly linked list, reducing algorithm efficiency:
The optimization strategies and visual demonstrations for this problem will be detailed in the subsequent sections.
This article will introduce the concept of the dynamic connectivity problem in graphs and explain why the Union Find algorithm is an efficient solution for this problem.
We will use a visualization panel to intuitively demonstrate the core principles of the Union Find algorithm and the effects of several optimization strategies.
As this is a foundational chapter, we will not delve into the implementation details of the algorithm code. The specific code implementation and application in algorithm problems will be covered in later sections: Union Find Algorithm Implementation and Application and Classic Union Find Exercises. It is recommended for beginners to follow the sequence in the directory for a step-by-step learning process.
Dynamic Connectivity and Terminology
Graph theory algorithms often involve numerous technical terms. Here, a simple example will be used to introduce some of these terms.
Consider the example below, where there are 10 nodes labeled from 0 to 9. Although there are no edges, it still represents a graph structure:
We can say that within this graph structure, there are 10 "connected components", with each node being a connected component on its own, as they do not connect with other nodes.
Now, let's perform some "union operations" on certain nodes:
At this point, nodes 0, 1, 2
have become connected, forming a single connected component. We can say these three nodes are "connected".
Additionally, the number of connected components in this graph structure has decreased from 10 to 8, as the union operations merged several connected components into one.
Properties of Connectivity
Reflexivity: A node
p
is connected to itself.Symmetry: If node
p
is connected to nodeq
, thenq
is connected top
.Transitivity: If node
p
is connected to nodeq
, andq
is connected tor
, thenp
is connected tor
.
Determining this type of "equivalence relation" is very practical, such as when compilers determine different variable references to the same memory object, or when calculating social network friend circles, and so on.
The dynamic connectivity problem involves providing a graph structure as input, performing several "union operations", and possibly querying whether any two nodes are "connected", or finding out how many "connected components" exist in the current graph.
Our goal is to design a data structure that performs union operations and queries with minimal time complexity.
Why We Need the Union-Find Algorithm
The Union-Find data structure provides the following API: