Trie, Digital Tree, Prefix Tree Basics and Visualization
Prerequisites
Before reading this article, you should first study:
Summary in One Sentence
A Trie tree is an extension of a multi-way tree structure and is a data structure specifically optimized for strings.
Trie trees offer several advantages when dealing with string operations, such as saving memory space for common string prefixes, facilitating prefix operations, and supporting wildcard matching.
The following visualization panel demonstrates the structure and main API of a Trie tree. You can click through the code line by line to observe changes in the console output and the Trie tree structure on the right:
This article only introduces the principles of Trie trees (also known as dictionary trees or prefix trees). The hands-on implementation of TrieMap/TrieSet is placed in the data structure design section following the binary tree series exercises. The reason is similar to the previous article TreeMap/TreeSet Principles. In the basics section, I do not intend to explain the detailed implementation of such complex structures, and beginners do not need to understand the code implementation of Trie trees at this stage.
However, I still explain the principles of Trie trees here for two purposes:
To give you a direct insight into the various transformations of binary tree structures, which might help you understand why my tutorials particularly emphasize binary tree structures.
To introduce you to this data structure and its API, as well as applicable scenarios. In the future, when you encounter related problems, you might think of using a Trie tree to solve them, or at least have an idea. At worst, you can come back to copy the code template. The implementation of such data structures is fixed, and you won't be asked to manually write a Trie tree from scratch in written exams or interviews. Simply copy and paste to use it.
Enough preamble, let's begin.
In explaining the Trie tree, I will guide you to implement a TrieMap
and TrieSet
. First, let's review the Map/Set types we have already implemented:
The standard Hash Table
HashMap
uses a hash function to store key-value pairs in atable
array, with two methods to resolve hash collisions. Its characteristic is speed, with basic operations like addition, deletion, search, and update having a time complexity of . The Hash SetHashSet
is a simple encapsulation ofHashMap
.Linked Hash Map
LinkedHashMap
is an enhancement of the standard hash table using a doubly linked list structure. It inherits the operational complexity of hash tables and allows all keys in the hash table to maintain the "insertion order".LinkedHashSet
is a simple encapsulation ofLinkedHashMap
.Array Hash Map
ArrayHashMap
is an enhancement of the standard hash table using an array structure. It inherits the operational complexity of hash tables and provides an additionalrandomKey
function, which can return a random key in time.ArrayHashSet
is a simple encapsulation ofArrayHashMap
.TreeMap
Mapping, which is based on a binary search tree (the standard library uses an improved self-balancing Red-Black Tree), has a basic operational complexity of for addition, deletion, search, and update. Its characteristic is the dynamic maintenance of the size relationship of key-value pairs, with many additional APIs to operate on key-value pairs. TheTreeSet
collection is a simple encapsulation of theTreeMap
mapping.
TrieSet
is also a simple encapsulation of TrieMap
, so below we will focus on the implementation principles of TrieMap
.
Main Application Scenarios of Trie Trees
Trie trees are a data structure specifically optimized for strings, which is likely why they are also known as prefix trees. Trie trees offer several advantages when processing strings, which are outlined below.
Saving Storage Space
Let's compare with HashMap
, for instance, when storing several key-value pairs:
Map<String, Integer> map = new HashMap<>();
map.put("apple", 1);
map.put("app", 2);
map.put("appl", 3);
Think back to the implementation principle of hash tables: key-value pairs are stored in the table
array. This means it actually creates the strings "apple"
, "app"
, and "appl"
, occupying 12 characters of memory space.
However, note that these three strings share a common prefix. The prefix "app"
is stored three times, and "l"
is stored twice.
If we use a TrieMap for storage:
// The key type of the Trie tree is fixed as String type, and the value type can be generic
TrieMap<Integer> map = new TrieMap<>();
map.put("apple", 1);
map.put("app", 2);
map.put("appl", 3);
A Trie tree does not store common prefixes multiple times, so you only need the memory space for the 5 characters in "apple"
to store the key.
In this example, the data size is small, and you might think that storing duplicates a few times is no big deal. However, if there are many keys, each very long, and with many common prefixes (which is often the case in real-world scenarios, such as with ID numbers), a Trie tree can save a lot of memory space.
Convenient for Handling Prefix Operations
Here's an example to make it clear:
// The key type of Trie tree is fixed as String, and the value type can be generic
TrieMap<Integer> map = new TrieMap<>();
map.put("that", 1);
map.put("the", 2);
map.put("them", 3);
map.put("apple", 4);
// "the" is the shortest prefix of "themxyz"
System.out.println(map.shortestPrefixOf("themxyz")); // "the"
// "them" is the longest prefix of "themxyz"
System.out.println(map.longestPrefixOf("themxyz")); // "them"
// "tha" is a prefix of "that"
System.out.println(map.hasKeyWithPrefix("tha")); // true
// There is no key with the prefix "thz"
System.out.println(map.hasKeyWithPrefix("thz")); // false
// "that", "the", and "them" are all prefixes of "th"
System.out.println(map.keysWithPrefix("th")); // ["that", "the", "them"]
Except for the keysWithPrefix
method, whose complexity depends on the length of the returned result, other prefix operations have a complexity of , where L
is the length of the prefix string.
Consider these operations; can they be done using a HashMap or TreeMap? You would likely have to forcibly traverse all keys and compare each string prefix one by one, resulting in very high complexity.
By the way, doesn't the keysWithPrefix
method seem well-suited for implementing an auto-complete feature?