Trie, Digital Tree, Prefix Tree Basics and Visualization
Prerequisite Knowledge
Before reading this article, you need to first learn:
Summary in One Sentence
A Trie tree is an extension of a multi-way tree structure and is a specially optimized data structure for handling strings.
Trie trees offer many advantages in string operations, such as saving memory space for common string prefixes, facilitating prefix operations, and supporting wildcard matching.
The visual panel below illustrates the structure and main API of a Trie tree. You can click through the code line by line to observe the changes in the console output and the Trie tree structure on the right:
Algorithm Visualization
This article is only an introduction to the principles of Trie trees (also known as dictionary trees or prefix trees). The hands-on implementation of TrieMap/TrieSet is placed in the data structure design section following the Introduction to Binary Tree Exercises. The reason is the same as in the previous Principles of TreeMap/TreeSet; I do not plan to explain the specific implementation of such complex structures in the basic knowledge section, and beginners do not need to understand the code implementation of Trie trees at this stage.
However, I am still explaining the principles of Trie trees here for two purposes:
To give you an intuitive sense of the various transformations of binary tree structures, and perhaps you will understand why my tutorial emphasizes binary tree structures.
To let you know about this data structure, understand its API and applicable scenarios. In the future, if you encounter related problems, you might think of using a Trie tree to solve them, at least having an idea. If not, you can come back and copy the code template. The implementation of such data structures is fixed, and in written or oral exams, you won't be required to manually construct a Trie tree from scratch. You can simply copy and paste it for use.
Alright, let's get started without further ado.
This site will guide you in implementing a TrieMap
and TrieSet
. Let's first review the Map/Set types we have implemented:
The standard Hash Table
HashMap
, which uses a hash function to store key-value pairs in atable
array, offers two methods for resolving hash collisions. Its main feature is speed, as basic operations like addition, deletion, query, and update have a time complexity of . The Hash SetHashSet
is a simple wrapper around theHashMap
.Linked Hash Map
LinkedHashMap
is an enhancement of the standard hash table using a doubly linked list structure. It inherits the operation complexity of a hash table and can maintain all keys in "insertion order." TheLinkedHashSet
is a simple wrapper around theLinkedHashMap
.Array Hash Map
ArrayHashMap
enhances the standard hash table using an array structure. It inherits the operation complexity of a hash table and provides an additionalrandomKey
function, which can return a random key in time. TheArrayHashSet
is a simple wrapper around theArrayHashMap
.TreeMap
, whose underlying structure is a binary search tree (typically a self-balancing red-black tree in standard libraries), has a basic operation complexity of . Its features include dynamically maintaining the size relationship of key-value pairs and offering many additional API operations on key-value pairs. TheTreeSet
is a simple wrapper around theTreeMap
.
The TrieSet
is also a simple wrapper around the TrieMap
, so we will focus on understanding the implementation principles of the TrieMap
below.
Main Application Scenarios of Trie Trees
Trie trees are a data structure specifically optimized for strings, which is likely why they are also known as prefix trees. Trie trees offer several advantages when processing strings, which are outlined below.
Saving Storage Space
Let's compare with HashMap
, for instance, when storing several key-value pairs:
Map<String, Integer> map = new HashMap<>();
map.put("apple", 1);
map.put("app", 2);
map.put("appl", 3);
Think back to the implementation principle of hash tables: key-value pairs are stored in the table
array. This means it actually creates the strings "apple"
, "app"
, and "appl"
, occupying 12 characters of memory space.
However, note that these three strings share a common prefix. The prefix "app"
is stored three times, and "l"
is stored twice.
If we use a TrieMap for storage:
// The key type of the Trie tree is fixed as String type, and the value type can be generic
TrieMap<Integer> map = new TrieMap<>();
map.put("apple", 1);
map.put("app", 2);
map.put("appl", 3);
A Trie tree does not store common prefixes multiple times, so you only need the memory space for the 5 characters in "apple"
to store the key.
In this example, the data size is small, and you might think that storing duplicates a few times is no big deal. However, if there are many keys, each very long, and with many common prefixes (which is often the case in real-world scenarios, such as with ID numbers), a Trie tree can save a lot of memory space.
Convenient for Handling Prefix Operations
Here's an example to make it clear:
// The key type of Trie tree is fixed as String, and the value type can be generic
TrieMap<Integer> map = new TrieMap<>();
map.put("that", 1);
map.put("the", 2);
map.put("them", 3);
map.put("apple", 4);
// "the" is the shortest prefix of "themxyz"
System.out.println(map.shortestPrefixOf("themxyz")); // "the"
// "them" is the longest prefix of "themxyz"
System.out.println(map.longestPrefixOf("themxyz")); // "them"
// "tha" is a prefix of "that"
System.out.println(map.hasKeyWithPrefix("tha")); // true
// There is no key with the prefix "thz"
System.out.println(map.hasKeyWithPrefix("thz")); // false
// "that", "the", and "them" are all prefixes of "th"
System.out.println(map.keysWithPrefix("th")); // ["that", "the", "them"]
Except for the keysWithPrefix
method whose complexity depends on the length of the return result, the complexity of other prefix operations is , where is the length of the prefix string.
Consider these operations. Can you achieve them with a HashMap or TreeMap? You would likely have to forcefully traverse all keys and compare string prefixes one by one, which is highly complex.
By the way, isn't this keysWithPrefix
method quite suitable for implementing auto-complete functionality?