Learn C++
Search textbook...
Associative containers organize their elements in terms of unique keys.
Associative containers are collections that organize their elements in terms of unique keys. Conceptually, these resemble Python dictionaries and sets. Use map
and unordered_map
anytime you need to map from a unique key to a value, or set
and unordered_set
to store a collection of unique elements.
Ordered containers, map
and set
, impose an ordering constraint on their element types, allowing them to achieve key lookups in logarithmic time. Internally, these containers sort their elements using a comparison function. As a result, iterating through the key-value pairs of a map (or the elements of a set) will always produce them in order of smallest to largest key (or set element), as the following example demonstrates:
To use an std::map<K, V>
or a std::set<T>
, K
or T
must have an operator<
. For example,
would be valid because it is valid to compare two std::string
using the <
operator whereas:
would not be because there is no operator<
defined for std::ifstream
. If a type MyType
lacks an operator<
, you can still use it with a map
or set
as long as you can define one, which can be done in a number of ways:
Define an operator<
. For example, one could write a non-member function operator<
, shown below. See the chapter on operator overloading for more details on how to overload the operator<
function.
Define a functor. This technique is recommended if you don't want to overload the global operator<
for MyType
. For example:
This works by explicitly passing a comparison template argument to map
and set
, which if not provided as a template argument defaults to std::less<T>
, a built-in functor in the standard library that compares keys using operator<
!
Use a lambda function. Similar to the above, this is useful if you don't want to override the global operator<
, and also avoids creating a new functor type. Functionally, it is the same as 2:
decltype(comp)
infers the compile-time type of the lambda function comp
, which is ordinarily hidden through the use of auto
. Note that we also pass the lambda function to the constructor of map
and set
.
std::map<K, V>
A map
is the standard way to associate a key with a value in C++. It works exactly like a Python dictionary or a JavaScript object, with a few exceptions noted below.
Expression | Result |
---|---|
std::map<K, V> m | Creates an empty map. See above for initializing a map with a custom comparison function |
std::map<K, V> m { { k1, v1 }, /* ... */ } | Uniform initializes a map with key-value pairs {k1, v1} , {k2, v2} , etc. |
auto v = m[k] | Gets the value for a key k . **If k is not in the map, it will be inserted with the default value for V |
m[k] = v | Sets or updates the value for a key k |
auto v = m.at(k) | Gets the value for a key k . Will throw an error if k is not present in the map |
m.insert({ k, v }) m.insert(p) | Inserts an std::pair p (or the equivalent uniformly initialized pair using k and v ) representing a key-value pair into the map if it doesn't already exist |
m.erase(k) | Removes the key k from the map. k does not need to be present in the map |
if (m.count(k)) ... if (m.contains(k)) ... (since C++20) | Checks if k is in the map |
m.empty() | Checks if m is empty |
Pay special attention to the fact that
m[k]
will insert a default initialized value intom
ifk
does not exist! The default value corresponds to the parameterless constructor ofV
. For example,false
forbool
,0
forint
,size_t
,float
, anddouble
, and an empty container for most container types. This can lead to strange behaviour, such as keys appearing in a map even if all you did was try to read a value. For example, the following snippet:prints
1
even though this code appears to only read the value of"Bjarne"
from the map. The reason for this behaviour is that it makes certain algorithms trivial to implement. For example, consider the following code which counts characters in a string:Compare this to the equivalent Python code, which might need to check for the existence of each character in the map before attempting to increment its count:
In C++, the default-initialization of the value obviates this check!
std::set<T>
A set
is the standard way of storing a collection of unique elements in C++. No matter how many times you add the same element to a set
, it's as if you only added it once.
Expression | Result |
---|---|
std::set<T> s | Creates an empty set |
s.insert(e) | Adds e to s . Calling insert(e) more than once has the same effect as calling it once |
s.erase(e) | Removes e from s . e does not need to be s |
if (s.count(e)) ... if (s.contains(e)) ... (since C++20) | Checks if e is in the set |
s.empty() | Checks if s is empty |
You may be wondering: why do std::map<K, V>
and std::set<T>
impose the requirement that K
and T
have a operator<
? The reason has to do with how these data structures are implemented behind the scenes. The ability to compare two elements allows us to build an efficient data structure that can quickly determine whether a key or value exists in a map
or set
.
The C++ standard does not enforce any particular implementation for the map
and set
, but compilers almost always implement these data structures with a red-black tree, allowing for efficient traversal during key lookup. Let's see how this works for a map
(set
will behave similarly). Consider the following code snippet:
L1
The map organizes its elements in memory as a binary search tree. Each TreeNode
has a std::pair
containing the key and value for the node, and left
and right
pointers to the next node in the sequence. The tree is laid out in such a way as to make finding the node associated with a given key fast, as shown on the next line.
m | map<string, size_t>
|
TreeNode
|
TreeNode
| ||||||||||
TreeNode
|
TreeNode
| ||||||||||
TreeNode
|
L2
To find the value associated with a given key, the map
traverses its tree using a binary search algorithm. Starting with the root node, it compares the desired key using the provided comparison function (operator<
by default) to see which direction traversal should continue:
"Gustav" < "Johann"
, so we go left"Gustav" >= "Dmitri"
, so we go right"Gustav"
is not less than itself, so we have found the node we were looking for!This is a simple example with five mappings, but imagine a tree with hundreds or thousands of mappings. Assuming our tree is well-balanced, each comparison we make would eliminates have of the nodes from consideration.
As you might expect, there is something special about the tree that allows this algorithm to work. Namely, all the keys in left subtree of a parent are less than the parent's key, and all of the keys in the right subtree are greater than the parent's key (this is what a binary search tree is). A map
will perform better if its tree is roughly balanced—this is exactly what a red black tree (a specific kind of binary search tree) aims to accomplish.
m | map<string, size_t>
| ||||
gustav | 64 |
TreeNode
|
TreeNode
| ||||||||||
TreeNode
|
TreeNode
| ||||||||||
TreeNode
|
A set
uses the same red-black tree data structure under the hood to determine quickly determine if an element exists inside a set. set
has no concept of a key-value pair, so its TreeNode
behind the scenes is somewhat simpler: rather than storing a pair
of key and value, it directly stores the set element in the node. Otherwise, set
and map
work exactly the same: conceptually, you can think of a set
being somewhat like a map
without any values.
Unordered containers, unordered_map
and unordered_set
, perhaps more commonly known in other langauges as "hash tables", make use of hash and equality functions to achieve near constant-time key lookups. These containers are called unordered because, unlike map
and set
, they do not rely on an ordering comparison (e.g. operator<
) to order their elements, and you cannot depend on their elements having any specific order when iterating through them.
To use std::unordered_map<K, V>
or std::unordered_set<T>
, K
or T
must have an associated hash and equality function.
A hash function "randomly" scrambles up an element of type K
into a value of type size_t
. When formally defined, hash functions are deterministic—the same K
will always produce the same size_t
—but they are discontinuous—small changes in the input K
will produce large, unpredictable changes in the output size_t
. These properties are key to how unordered data structures accelerate their element lookups.
Many types, e.g. int
, double
, std::string
, have built-in hash functions (you can see the full list here). Other seemingly basic types, such as std::pair
and std::tuple
, do not. To add a hash function to an unsupported type, do one of the following:
Specialize the std::hash
functor. This is the most common way to add a hash function to a type. It involves creating a template specialization for the std::hash<T>
functor, which is the default strategy unordered containers use to hash their elements. The syntax is as follows:
Define a custom functor. This is an alternative to the above syntax if you do not want to change the default hash function for all consumers of a type. For example:
Use a lambda function. Similar to the above, this is useful if you don't want to override the global operator<
, and also avoids creating a new functor type. Functionally, it is the same as 2:
When writing your own hash functions, make sure that the output size_t
are well distributed—as we will see, the performance of the container you use depends on how "random" its hash function outputs are. It is worthwhile to read this Wikiedia article for more information on how to design a good hash function. A good way to combine hash values, for example, is to utilize bit-shift, XOR, and multiplication by prime numbers, as the following example of a hash function for an std::vector<T>
demonstrates. Preferrably, use a third-party library like boost::hash_combine
to combine hash values in a way that yields a good distribution over the integers.
The equality function (or key-equality function for unordered_map
) checks if two elements/keys are equal. By default, unordered_map
and unordered_set
will try to use operator==
. If your type is missing an operator==
, you have a few options:
Define an operator==
. See the chapter on operator overloading for more information.
Define a functor. This is recommended if you don't want to overload the global operator==
for MyType
. For example:
Use a lambda function. Similar to the above, this is useful if you don't want to override the global operator<
, and also avoids creating a new functor type. Functionally, it is the same as 2:
Here, we are passing the default value for the hash function, std::hash<T> {}
, to the container constructor, but any hash function can be used in practice.
std::unordered_map<K, V>
unordered_map
is an accelerated version of map
that allows looking up the value for a key in constant time, at the cost of using additional memory.
unordered_map
supports all of the operations of std::map
(i.e. it can serve as a drop-in replacement for a map
), as well as a few more operations specific to how it works behind the scenes. Note that like map
, the operator[]
on unordered_map
will default initialize values for keys which do not exist!
Expression | Result |
---|---|
m.load_factor() | Returns the current load factor (explained below) of the map |
m.max_load_factor(lf) | Sets the maximum allowed load factor to lf |
m.rehash(b) | Ensures m has at least b buckets and reassigns elements to buckets given the new bucket count |
std::unordered_set<T>
Similar to unordered_map
, unordered_set
is an accelerated version of set
that allows checking set-membership in constant time.
unordered_set
supports all of the operations of std::set
, as well as the additional operations listed for unordered_map
.
One of the performance bottlenecks of map
and set
is that to lookup an element, they must perform a binary search on their internal trees, taking time. In most applications, this is acceptable, as even for very large is managable—even a map
with one billion entries will only need to traverse nodes to find a match in the worst case. However, if the overhead of this search is unacceptable, we can achieve faster lookup at the expense of higher memory usage by rearranging the way container elements are stored: this is exactly what unordered_map
and unordered_set
do.
Let's take a look at how unordered_map
works (as we will see, unordered_set
works identically). Like std::map
, the C++ standard enforces no restrictions on what data structure unordered_map
uses to organize its elements, but compilers almost always use a separate chaining hash table[1]. The map picks some value (known as the bucket count) and keeps track of an array of linked lists, known as the buckets. On insertion, each key-value pair is placed into one of the buckets depending on the value of and the map's hash function. Specifically, if is the hash function, then a key will go into the bucket with index . Note that hash functions are not injective: there can exist keys for which [2], so more than one key can end up in the same bucket. After taking the modulo by , the chance of such a hash collision occuring increases even more as the output of is squashed to fit inside the range .
To determine the value of a key , the unordered_map
first determines the bucket index of and then loops through the key-value pairs of that bucket, applying the key-equality function to each one until a matching key (and corresponding value) is found. Assuming that evaluating and is (an assumption that is often made in practice), then the time complexity of a key lookup will depend on the number of elements the map must search through within each bucket before finding a matching key—in other words, the number of items in the bucket. On average, an unordered_map
with mappings and buckets will have items per bucket. This value, , is called the load factor of the unordered_map
.
By keeping small, the unordered_map
will on average have to examine fewer items to find a key's value. In fact, unordered_map
will keep this number bounded by some constant threshold: by default, the max_load_factor
is set to 1.0
. Should the load factor exceed this threshold, the map will allocate additional buckets and rehash. Because the map must search at most a constant number of items during key lookup, key lookup in an unordered_map
is considered . Let's take a look at an example and see what the unordered_map
memory looks like:
L1
Here we can see the map has chosen , and maintains 5 buckets. By random chance, two key-value pairs have landed in bucket 0 and none in bucket 2. The value was chosen arbitrarily—in fact, it can be customized in the unordered_map
constructor. The only requirement on is that it meets the load factor requirement: . In this case, , or exactly .
m | unordered_map<string, size_t>
|
|
ListNode
| ||||||||
ListNode
| ||||||||
ListNode
| ||||||||
ListNode
|
ListNode
|
L2
To find the value for "Amadeus"
, lookup proceeds as follows. First we compute to find the bucket index (0). Then we loop through all key-value pairs in bucket 0 until we find the one which equals "Amadeus"
according to the key-equality function. "Gustav"
is skipped.
Note: Even if bucket 0 had only a single element, we would still need to loop through its elements and apply the key-equality function, since it might happen to have a single key that coincidentally has the same bucket index as "Amadeus"
.
m | unordered_map<string, size_t>
| ||||
amadeus | 626 |
|
ListNode
| ||||||||
ListNode
| ||||||||
ListNode
| ||||||||
ListNode
|
ListNode
|
L3
After inserting the key-value pair { "Franz", 600 }
to the map, the resulting load factor was . Because exceeded the threshold of , the map had to allocate more buckets and rehash. To choose the new bucket count, the old bucket count is typically doubled. Some compilers, such as G++ in this example, will further stipulate that the new bucket count always be a prime number, as taking the modulo by this count gives a better distribution of items over the buckets (11 is used in this example). Notice that after rehashing, the relative arrangement of items has completely changed as a result of the new bucket count.
m | unordered_map<string, size_t>
| ||||
amadeus | 626 |
|
ListNode
| ||||||||
ListNode
| ||||||||
ListNode
| ||||||||
ListNode
| ||||||||
ListNode
|
ListNode
|
As was the case for set
, unordered_set
uses the same hash-table data structure as unordered_map
. You can think of unordered_set
as being implemented as a hash-table whose nodes contain only the elements (as opposed to key-value pairs).
A perennial question among C++ developers is: given the choice between a map
and an unordered_map
, which one should I choose? The answer to this question will depend on the specific problem you are solving and what your requirements are. As a general rule of thumb, unordered_map
will offer superior performance over map
while using far more memory (at the very least, a lot of empty buckets are needed to keep the load factor small). In general, as memory tends to be less of a concern in software development, unordered_map
seems to be the more promising choice. However, in select circumstances map
can still outperform unordered_map
. Below is a (far from an exhaustive) table of where each container type shines:
Scenario | Ordered Container (map /set ) | Unordered Container (unordered_map /unordered_set ) |
---|---|---|
Ordering | ✔️ Allows iteration of keys/element in sorted order | ❌ Has no dependable order. Elements are iterated in an essentially random order |
Speed | ❌ Finding/removing an element requires a binary tree traversal, average | ✔️ Hash functions allows near constant-time element lookup and removal, average |
Memory | ✔️ Small memory footprint, including only the element data and bookkeeping data for binary search tree | ❌ Large memory footprint, must allocate many empty buckets in order to keep load factor small |
Element operations | ✔️ Comparing elements using operator< is usually faster, since operator< often only needs to examine part of its operands to determine that one is smaller | ❌ Must invoke hash function, which must inspect the entire key object, and equality function |
Worst-case lookup | ✔️ Looking up an element takes at most , requiring a tree traversal | ❌ Looking up an element takes at most , e.g. if poor choice of hash function places all elements in the same bucket |
Worst-case resize | ✔️ Adding elements is always in the worst case, requiring a tree traversal | ❌ Adding elements may trigger a rehash, taking in the worst case |
Working set[3] | ✔️ Tends to preserve its working set, i.e. the same set of nodes are frequently accessed during lookups, even as the container resizes | ❌ Rehashing essentially produces a new working set. After rehashing, the buckets are completely rearranged, causing cache lines that were previously hot to become cold and vice-versa. |
Other hashing strategies exist, such as open addressing, but separate chaining is used in practice due to certain time complexity restrictions guaranteed by methods of unordered_map
and unordered_set
in the C++ standard. ↩
One intuitive way to make sense of this fact is to consider the hash function for a std::string
. Intuitively, there are many more possible std::string
instances than size_t
, so there must be at least two std::string
instances with the same hash value. ↩
The working set of a program is the set of locations in memory it frequently accesses. For reasons that are beyond the scope of this textbook, low-latency applications benefit from having a working set that doesn't change much. Accessing the same addresses frequently is usually faster than accessing many different locations, which can happen, for example, when an unordered_map
has to rehash. ↩