A Practical Guide to Time and Space Complexity Analysis
In my previous articles, I mainly focused on explaining the principles of algorithms and the thought process for solving problems, often glossing over the analysis of time and space complexity. This was primarily due to the following two reasons:
For readers who are relatively new to the field, I wanted you to concentrate on understanding the principles of algorithms. Adding too much mathematical content can easily discourage people.
Correctly understanding the underlying principles of common algorithms is a prerequisite for analyzing their complexity. Especially for recursive algorithms, you need to think and analyze from the perspective of trees to correctly assess their complexity.
Given that my previous articles have already covered the core principles of all common algorithms, I have decided to write a dedicated guide on analyzing time and space complexity. Teaching you how to fish is better than giving you a fish, so I will provide you with a set of universal methods to analyze the time and space complexity of any algorithm.
This article will be quite lengthy and will cover the following points:
Using time complexity to infer problem-solving approaches, reducing trial and error time.
Where does the time go? Common coding mistakes that can cause algorithms to timeout.
Basic characteristics of Big O notation.
Time complexity analysis in non-recursive algorithms.
Methods for measuring the efficiency of data structure APIs (amortized analysis).
Analysis methods for the time/space complexity of recursive algorithms, which is the focus. I will illustrate this with examples from dynamic programming and backtracking algorithms.
Before diving into the concepts and specific calculation methods of complexity, I will first share some common techniques and pitfalls often encountered in practical scenarios.
Inferring Solution Strategies from Complexity
Pay Attention to Data Scale
Complexity analysis is not intended to make things difficult for you; rather, it serves as a guide to help you solve problems.
You should pay attention to the data scale provided in the problem before you start coding, as complexity analysis can help you avoid wasting time on incorrect approaches. Sometimes, it even directly suggests which algorithm to use.
Why is this important? Typically, problems indicate the scale of the test cases. We can use this information to deduce the allowable time complexity for the problem, which further helps us determine the appropriate algorithm to apply.
For example, if a problem provides an input array with a length up to 10^6, we can infer that the time complexity must be below O(N^2); it needs to be optimized to O(N log N) or O(N). An O(N^2) algorithm would perform about (10^6)^2 = 10^12 basic operations, which most judging systems cannot finish in the allotted time.
To keep the complexity within O(N log N) or O(N), our options become limited. Suitable approaches might include sorting the array, prefix sums, two-pointer techniques, or one-dimensional dynamic programming. Starting with these strategies is more reliable. Nested for loops, two-dimensional dynamic programming, and backtracking can typically be ruled out immediately.
Another clear example is when the data scale is very small, such as an array length N not exceeding 20. In this case, we can be fairly certain that a brute-force exhaustive search is likely required.
Judging platforms tend to challenge you with larger data scales, so if they provide small data scales, it's likely because the optimal solution involves exponential or factorial complexity. You can confidently use backtracking algorithms without considering other approaches.
Therefore, it's important to consider all the information given in the problem, particularly the data scale, before starting to code. Ignoring this and diving directly into coding can lead to unnecessary detours.
Complexity Issues Due to Coding Errors
This section summarizes common coding errors, especially among beginners, that can lead to unexpected time consumption, slow down your algorithm, or even cause timeouts.
Using Standard Output
While solving algorithm problems, we might use print/console.log functions to print some states for debugging purposes.
However, remember to comment out these output statements before submitting your code, as standard output involves I/O operations that can significantly slow down your algorithm.
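One common way to make this less error-prone is to route all debugging output through a single switch. The following is a minimal sketch of that pattern; the DEBUG flag and the debug helper are illustrative names I made up, not part of any judging platform:

class DebugPrintExample {
    // illustrative flag: flip to true while debugging, back to false before submitting
    static final boolean DEBUG = false;

    // route all debugging output through one helper so the costly I/O
    // can be silenced in a single place
    static void debug(String msg) {
        if (DEBUG) {
            System.out.println(msg);
        }
    }
}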
Incorrectly Passing by Value Instead of Reference
For example, in C++, function parameters are passed by value by default. If you pass a vector as a parameter, the entire data structure is copied, leading to additional time and space consumption. This is particularly problematic for recursive functions, where passing by value almost guarantees a timeout or memory overflow.
The correct approach is to pass by reference, such as vector<int>&, to avoid unnecessary copying. Readers using other languages should verify whether similar issues exist and understand their programming language's parameter passing mechanism.
Unclear Underlying Implementation of Interface Objects
This is a rather tricky problem that comes up in object-oriented languages like Java. It is rare, but it merits attention.
In Java, List is an interface with many implementing classes, such as ArrayList and LinkedList. If you have learned from the basic knowledge sections Implementing Dynamic Arrays Step by Step and Implementing Doubly Linked Lists Step by Step, you know that the methods of ArrayList and LinkedList differ in complexity; for example, the get method is O(1) on an ArrayList but O(N) on a LinkedList.
So, if a function parameter is passed to you as a List type, would you dare to perform list.get(i) for random access? Probably not.
In such cases, it is generally safer to create an ArrayList yourself, copy all elements from the passed List, and then access them by index.
These are the main non-algorithmic issues to watch out for. Pay attention to them, and you should be fine. Let's move on to the formal explanation of algorithm complexity analysis.
Big O Notation
Mathematical Definition of Big O Notation:
O(g(n)) = { f(n) : there exist positive constants c and n_0 such that 0 ≤ f(n) ≤ c*g(n) for all n ≥ n_0 }
The symbol O (the uppercase letter 'O', not the number 0) denotes a set of functions. For example, O(n^2) represents the set of functions generated by g(n) = n^2. When we say an algorithm has a time complexity of O(n^2), we mean that the function describing the algorithm's running time lies within this set. As a concrete check against the definition, f(n) = 2n + 100 is in O(n): choosing c = 3 and n_0 = 100 gives 0 ≤ 2n + 100 ≤ 3n for all n ≥ n_0.
In theory, understanding this abstract mathematical definition can answer all your questions about Big O notation.
However, since many people feel dizzy when they see mathematical definitions, I will list two properties used in complexity analysis. Remembering these two is sufficient.
1. Retain only the term with the highest growth rate; other terms can be omitted.
First, multiplicative constant factors and additive constants can be ignored, as the following examples show:
O(2N + 100) = O(N)
O(2^(N+1)) = O(2 * 2^N) = O(2^N)
O(M + 3N + 99) = O(M + N)
Of course, don't eliminate constants blindly. Some constants cannot be removed, as they might involve exponential rules we learned in high school:
O(2^(2N)) = O(4^N)
Besides constant factors, terms with slower growth rates can be ignored in the presence of terms with faster growth rates:
O(N^3 + 999 * N^2 + 999 * N) = O(N^3)
O((N + 1) * 2^N) = O(N * 2^N + 2^N) = O(N * 2^N)
All the examples listed above are the simplest and most common ones, which can be correctly explained by the definition of Big O notation. If you encounter more complex complexity scenarios, you can also judge whether your complexity expression is correct based on the definition.
2. Big O Notation Represents the "Upper Bound" of Complexity.
In other words, as long as you provide an upper bound, using Big O notation to represent it is correct.
For example, consider the following code:
for (int i = 0; i < N; i++) {
    System.out.println("hello world");
}
If we consider this as an algorithm, then clearly its time complexity is O(N). However, if you insist on saying its time complexity is O(N^2), theoretically that's acceptable, because the O notation represents an upper bound, and the running time of this algorithm indeed does not exceed the bound of N^2. Although this bound is not "tight," it still fits the definition, so theoretically it's not incorrect.
The above example is too simple, and unnecessarily expanding its time complexity upper bound seems meaningless. However, some algorithms have complexities that depend on the input data, making it difficult to provide a very precise time complexity in advance. In such cases, using Big O notation to expand the upper bound of time complexity becomes meaningful.
For instance, in the previous article The Core Framework of Dynamic Programming, we discussed the brute-force recursive solution to the coin change problem. The core code framework is as follows:
// definition: to make up the amount n, at least dp(coins, n) coins are needed.
// this simplified skeleton keeps only the recursive branching, which is
// what determines the shape of the recursion tree, so it returns nothing
void dp(int[] coins, int amount) {
    // base case
    if (amount <= 0) return;
    // state transition: branch once per coin denomination
    for (int coin : coins) {
        dp(coins, amount - coin);
    }
}
When amount = 11 and coins = [1,2,5], the recursion unfolds into a tree: the root is amount = 11, and each node has one child per coin denomination, obtained by subtracting that coin's value from the remaining amount.
We will discuss the method for calculating the time complexity of recursive algorithms later. For now, let's determine the number of nodes in this recursive tree.
Suppose the value of amount is N and the number of elements in the coins list is K. In this case, the recursive tree is a K-ary tree. However, since the growth of this tree is directly related to the denominations in the coins list, its structure is irregular, making it difficult to accurately calculate the total number of nodes in the tree.
For this situation, a simpler approach is to approximate using the worst-case scenario:
How tall is the tree? We do not know, so we take the worst case and assume all coins have a denomination of 1; in that scenario, the tree's height is N.
What is the structure of the tree? We do not know either, so we again take the worst case and assume it is a complete K-ary tree.
So, how many nodes does this tree have in total? In the worst case, level i of a complete K-ary tree contains K^i nodes, so a complete K-ary tree of height N has 1 + K + K^2 + ... + K^(N-1) = (K^N - 1)/(K - 1) nodes by the geometric series sum formula. Using Big O notation, this is O(K^N).
Of course, we know the actual number of nodes in this tree is smaller, but O(K^N) provides a rough upper bound, which is enough to tell us that this is an exponential-time algorithm and that we need to find ways to optimize its efficiency.
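If you want to sanity-check this exponential growth empirically, one option is to add a call counter to the skeleton above and watch how the count grows with the input. The calls variable, the class name, and the main driver below are my own illustrative additions, not part of the original solution:

class CountCalls {
    static long calls = 0; // number of nodes visited in the recursion tree

    static void dp(int[] coins, int amount) {
        calls++; // count every node, including the base-case leaves
        if (amount <= 0) return;
        for (int coin : coins) {
            dp(coins, amount - coin);
        }
    }

    public static void main(String[] args) {
        int[] coins = {1, 2, 5};
        // the call count should grow roughly exponentially with amount
        for (int amount = 5; amount <= 25; amount += 5) {
            calls = 0;
            dp(coins, amount);
            System.out.println("amount = " + amount + ": " + calls + " calls");
        }
    }
}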
Therefore, sometimes when your estimated time complexity differs from someone else's, it does not necessarily mean one of you is wrong. Both could be correct, just differing in the precision of the estimation.
Theoretically, we aim for a more accurate and "tight" upper bound, but obtaining an exact upper bound requires a certain level of mathematical skill. For the purpose of interview preparation, it is acceptable to settle for the correct order of magnitude (linear/exponential/logarithmic/quadratic, etc.).
In the field of algorithms, besides using Big O notation for asymptotic upper bounds, there are also methods for representing asymptotic lower bounds and tight bounds, among others. Interested readers can explore these further. However, from a practical standpoint, the explanation provided here regarding Big O notation is sufficient.
Non-Recursive Algorithm Analysis
The space complexity of non-recursive algorithms is generally easy to determine. You simply check if it involves allocating any array or similar storage space. Therefore, I will mainly discuss the analysis of time complexity.
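As a simple illustration of that check, compare two ways of reversing an array; both functions are toy examples of mine:

class SpaceComplexityExample {
    // O(N) extra space: allocates a second array as large as the input
    static int[] reversedCopy(int[] nums) {
        int[] res = new int[nums.length];
        for (int i = 0; i < nums.length; i++) {
            res[i] = nums[nums.length - 1 - i];
        }
        return res;
    }

    // O(1) extra space: only a few scalar variables, nothing that
    // grows with the size of the input
    static void reverseInPlace(int[] nums) {
        int left = 0, right = nums.length - 1;
        while (left < right) {
            int tmp = nums[left];
            nums[left] = nums[right];
            nums[right] = tmp;
            left++;
            right--;
        }
    }
}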