Use Array to Enhance Hash Table (ArrayHashMap)
Prerequisite Knowledge
Before reading this article, you should first learn:
In the previous chapter Enhancing Hash Table with Linked List, we used a doubly linked list to enhance the hash table, implementing a data structure like LinkedHashMap
that maintains the insertion order of keys.
While linked lists can enhance hash tables, arrays, being closely related to linked lists, can also enhance hash tables.
Adding a randomKey()
API
Now, let me challenge you: based on the standard hash table API, add a new randomKey()
API that can return a random key in O(1) time complexity:
interface Map<K, V> {
// Get the value corresponding to the key, time complexity O(1)
V get(K key);
// Add/modify key-value pair, time complexity O(1)
void put(K key, V value);
// Remove key-value pair, time complexity O(1)
void remove(K key);
// Check if the key is contained, time complexity O(1)
boolean containsKey(K key);
// Return all keys, time complexity O(N)
List<K> keys();
// New API: Randomly return a key, required time complexity O(1)
K randomKey();
}
Uniform Random
Note that when we generally talk about randomness, we refer to uniform random, meaning each element has an equal probability of being selected. For example, if you have n
elements, your random algorithm must ensure that each element has a probability of 1/n
of being selected to be considered uniform random.
So, can you do it? Don't underestimate this simple requirement; the implementation is quite ingenious.
From previous lessons, you should know that the essence of a hash table is a table
array. Now, if you want to randomly return a key from a hash table, it's easy to think of randomly selecting an element from an array.
In a standard array, randomly getting an element is straightforward. You just use a random number generator to produce a random index in the range [0, size)
, which effectively means you've found a random element:
int randomeElement(int[] arr) {
Random r = new Random();
// generate a random index in the range [0, arr.length)
return arr[r.nextInt(arr.length)];
}
int randomeElement(vector<int>& arr) {
// generate a random index in the range [0, arr.size())
return arr[rand() % arr.size()];
}
import random
def randomeElement(arr: List[int]) -> int:
# generate a random index in the range [0, len(arr))
return arr[random.randint(0, len(arr) - 1)]
import "math/rand"
func randomeElement(arr []int) int {
// generate a random index in the range [0, len(arr))
return arr[rand.Intn(len(arr))]
}
function randomeElement(arr) {
// generate a random index in the range [0, arr.length)
return arr[Math.floor(Math.random() * arr.length)];
}
This algorithm is correct, with a complexity of O(1), and the probability of each element being selected is 1/n
, where n
is the total number of elements in the arr
array.
However, note that this function has a prerequisite: the elements in the array must be stored compactly without any gaps, such as arr = [1, 2, 3, 4]
. This ensures that any random index corresponds to a valid element.
If there are gaps in the array, problems arise. For example, if arr = [1, 2, null, 4]
, where arr[2] = null
represents a gap with no stored element. If you happen to generate a random number that is 2, what should you do?
You might think of doing a linear search to the left or right to find a non-null element to return, like this:
// return a non-null random element (pseudo code)
int randomeElement(int[] arr) {
Random r = new Random();
// generate a random index in the range [0, arr.length)
int i = r.nextInt(arr.length);
while (arr[i] == null) {
// the randomly generated index i happens to be null
// use the circular array technique to probe to the right
// until a non-null element is found
i = (i + 1) % arr.length;
}
return arr[i];
}
This approach won't work, as there are two issues with this algorithm:
There's a loop, causing the worst-case time complexity to rise to O(N), which does not meet the O(1) requirement.
This algorithm is not uniformly random because your search direction is fixed, making the elements on the right side of the gap more likely to be selected. For example, with
arr = [1, 2, null, 4]
, the probabilities of selecting elements1
,2
, and4
are1/4
,1/4
, and2/4
, respectively.
There might be another way: if you have bad luck once, try randomizing a few more times until you find a non-empty element:
// return a non-empty random element (pseudo code)
int randomeElement(int[] arr) {
Random r = new Random();
// generate a random index in the range [0, arr.length)
int i = r.nextInt(arr.length);
while (arr[i] == null) {
// the randomly generated index i happens to be null
// regenerate a random index
i = r.nextInt(arr.length);
}
return arr[i];
}
Currently, this algorithm is uniformly random, but the problem is obvious: its time complexity depends on random numbers! It certainly isn't O(1), which doesn't meet the requirements.
So, are you stumped by the problem of randomly returning an element from an array with holes?
Don't forget, our current goal is to randomly return a key from a hash table. The table
array underlying the hash table not only contains holes, but the situation is even more complex:
If your hash table resolves collisions using open addressing, it's not too bad—it's the scenario with an array containing holes.
But if your hash table uses chaining, things get tricky. Each element in the array is a linked list, so simply randomizing an index isn't enough; you also need to randomly select a node within the linked list.
And consider the probabilities: with chaining, even if you uniformly randomize an array index and then uniformly randomize a node in the linked list at that index, is the key you get uniformly random?
Actually, it's not. In the image above, the probabilities of selecting k1, k2, k3
are 1/2 * 1/3 = 1/6
, while the probabilities of selecting k4, k5
are 1/2 * 1/2 = 1/4
. This is not uniform randomness.
About Probabilistic Algorithms
Probabilistic algorithms are also a very interesting type of problem. Both in algorithmic problems and real-world applications, some classic random algorithms are often used. I will explain them in detail in later sections Random Algorithms in Games and Weighted Random Selection. For now, you don't need to master them.
The only way to achieve uniform randomness is to use the keys
method to traverse the entire table
array, store all keys in another array, and then randomly return a key. But that makes the complexity O(N), which still doesn't meet the requirements.
Feeling stuck? This is why it's important to accumulate experience with classic data structure designs. If you encounter a similar problem in an interview or test, it's hard to come up with a solution on the spot. Next, I'll introduce how to enhance a hash table with an array to easily implement the randomKey()
API.