Trie, Digital Tree, Prefix Tree Basics and Visualization
Prerequisite Knowledge
Before reading this article, you should first learn:
Summary in One Sentence
A Trie tree, also known as a prefix tree, is an extension of the multi-way tree structure. It is a data structure specially optimized for handling strings.
The Trie tree offers several advantages in string-related operations, such as saving memory space for common string prefixes, facilitating prefix operations, and supporting wildcard matching.
The following visualization panel demonstrates the structure and main APIs of a Trie tree. You can click through the code line by line to observe changes in the console output and the Trie tree structure on the right:
This article is only an introduction to the principles of Trie trees. The practical implementation of TrieMap/TrieSet is placed in the data structure design section after the binary tree series exercises. The reason is similar to the previous article on TreeMap/TreeSet principles. I do not plan to explain the specific implementation of such complex structures in the basic knowledge section, as beginners do not need to understand the code implementation of Trie trees at this stage.
However, I still include the explanation of Trie tree principles here for two reasons:
To give you an intuitive sense of the various transformations of binary tree structures, which may help you understand why my tutorials emphasize binary tree structures.
To introduce you to this data structure at the beginning, familiarize you with its APIs, and understand the scenarios where it is applicable. In the future, when you encounter related problems, you might consider using a Trie tree to solve them. At the very least, you'll have a starting point and can come back to copy the code template if needed. The implementation of such data structures is standardized, and you won't be asked to manually code a Trie tree from scratch in exams or interviews; you can simply copy and paste the code.
Alright, without further ado, let's begin.
In explaining the Trie tree, I will guide you in implementing a TrieMap
and TrieSet
. Let's first review the Map/Set types we have already implemented:
The standard HashMap uses a hash function to store key-value pairs in a
table
array, with two methods for resolving hash collisions. Its characteristic is speed, with basic operations like add, delete, search, and update having a time complexity of . HashSet is a simple wrapper aroundHashMap
.LinkedHashMap is an enhancement of the standard hash table using a doubly linked list structure. It inherits the operational complexity of hash tables and allows all keys in the hash table to maintain "insertion order."
LinkedHashSet
is a simple wrapper aroundLinkedHashMap
.ArrayHashMap is an enhancement of the standard hash table using an array structure. It inherits the operational complexity of hash tables and provides an extra
randomKey
function that can return a random key in time.ArrayHashSet
is a simple wrapper aroundArrayHashMap
.TreeMap is a map based on a binary search tree (the standard library uses an improved self-balancing red-black tree). Basic operations such as add, delete, search, and update have a time complexity of , and it can dynamically maintain the size relationship of key-value pairs with many additional APIs.
TreeSet
is a simple wrapper aroundTreeMap
.
TrieSet
is also a simple wrapper around TrieMap
, so we will focus on the implementation principles of TrieMap
.
Main Application Scenarios of Trie Trees
Trie trees are a data structure specifically optimized for strings, which is likely why they are also known as prefix trees. Trie trees offer several advantages when processing strings, which are outlined below.
Saving Storage Space
Let's compare with HashMap
, for instance, when storing several key-value pairs:
Map<String, Integer> map = new HashMap<>();
map.put("apple", 1);
map.put("app", 2);
map.put("appl", 3);
Think back to the implementation principle of hash tables: key-value pairs are stored in the table
array. This means it actually creates the strings "apple"
, "app"
, and "appl"
, occupying 12 characters of memory space.
However, note that these three strings share a common prefix. The prefix "app"
is stored three times, and "l"
is stored twice.
If we use a TrieMap for storage:
// The key type of the Trie tree is fixed as String type, and the value type can be generic
TrieMap<Integer> map = new TrieMap<>();
map.put("apple", 1);
map.put("app", 2);
map.put("appl", 3);
A Trie tree does not store common prefixes multiple times, so you only need the memory space for the 5 characters in "apple"
to store the key.
In this example, the data size is small, and you might think that storing duplicates a few times is no big deal. However, if there are many keys, each very long, and with many common prefixes (which is often the case in real-world scenarios, such as with ID numbers), a Trie tree can save a lot of memory space.
Convenient for Handling Prefix Operations
Here's an example to make it clear:
// The key type of Trie tree is fixed as String, and the value type can be generic
TrieMap<Integer> map = new TrieMap<>();
map.put("that", 1);
map.put("the", 2);
map.put("them", 3);
map.put("apple", 4);
// "the" is the shortest prefix of "themxyz"
System.out.println(map.shortestPrefixOf("themxyz")); // "the"
// "them" is the longest prefix of "themxyz"
System.out.println(map.longestPrefixOf("themxyz")); // "them"
// "tha" is a prefix of "that"
System.out.println(map.hasKeyWithPrefix("tha")); // true
// There is no key with the prefix "thz"
System.out.println(map.hasKeyWithPrefix("thz")); // false
// "that", "the", and "them" are all prefixes of "th"
System.out.println(map.keysWithPrefix("th")); // ["that", "the", "them"]
Except for the keysWithPrefix
method, whose complexity depends on the length of the returned result, other prefix operations have a complexity of , where L
is the length of the prefix string.
Consider these operations; can they be done using a HashMap or TreeMap? You would likely have to forcibly traverse all keys and compare each string prefix one by one, resulting in very high complexity.
By the way, doesn't the keysWithPrefix
method seem well-suited for implementing an auto-complete feature?