Learn C++

Search textbook... 

⌘K

Associative Containers

Associative containers organize their elements in terms of unique keys.

Associative containers are collections that organize their elements in terms of unique keys. Conceptually, these resemble Python dictionaries and sets. Use map and unordered_map anytime you need to map from a unique key to a value, or set and unordered_set to store a collection of unique elements.

Ordered Containers

Ordered containers, map and set, impose an ordering constraint on their element types, allowing them to achieve key lookups in logarithmic time. Internally, these containers sort their elements using a comparison function. As a result, iterating through the key-value pairs of a map (or the elements of a set) will always produce them in order of smallest to largest key (or set element), as the following example demonstrates:

Requirements

To use an std::map<K, V> or a std::set<T>, K or T must have an operator<. For example,

would be valid because it is valid to compare two std::string using the < operator whereas:

would not be because there is no operator< defined for std::ifstream. If a type MyType lacks an operator<, you can still use it with a map or set as long as you can define one, which can be done in a number of ways:

  1. Define an operator<. For example, one could write a non-member function operator<, shown below. See the chapter on operator overloading for more details on how to overload the operator< function.

  2. Define a functor. This technique is recommended if you don't want to overload the global operator< for MyType. For example:

    This works by explicitly passing a comparison template argument to map and set, which if not provided as a template argument defaults to std::less<T>, a built-in functor in the standard library that compares keys using operator<!

  3. Use a lambda function. Similar to the above, this is useful if you don't want to override the global operator<, and also avoids creating a new functor type. Functionally, it is the same as 2:

    decltype(comp) infers the compile-time type of the lambda function comp, which is ordinarily hidden through the use of auto. Note that we also pass the lambda function to the constructor of map and set.

std::map<K, V>

A map is the standard way to associate a key with a value in C++. It works exactly like a Python dictionary or a JavaScript object, with a few exceptions noted below.

ExpressionResult
std::map<K, V> mCreates an empty map. See above for initializing a map with a custom comparison function
std::map<K, V> m { { k1, v1 }, /* ... */ }Uniform initializes a map with key-value pairs {k1, v1}, {k2, v2}, etc.
auto v = m[k]Gets the value for a key k. **If k is not in the map, it will be inserted with the default value for V
m[k] = vSets or updates the value for a key k
auto v = m.at(k)Gets the value for a key k. Will throw an error if k is not present in the map
m.insert({ k, v })
m.insert(p)
Inserts an std::pair p (or the equivalent uniformly initialized pair using k and v) representing a key-value pair into the map if it doesn't already exist
m.erase(k)Removes the key k from the map. k does not need to be present in the map
if (m.count(k)) ...
if (m.contains(k)) ... (since C++20)
Checks if k is in the map
m.empty()Checks if m is empty

Pay special attention to the fact that m[k] will insert a default initialized value into m if k does not exist! The default value corresponds to the parameterless constructor of V. For example, false for bool, 0 for int, size_t, float, and double, and an empty container for most container types. This can lead to strange behaviour, such as keys appearing in a map even if all you did was try to read a value. For example, the following snippet:

prints 1 even though this code appears to only read the value of "Bjarne" from the map. The reason for this behaviour is that it makes certain algorithms trivial to implement. For example, consider the following code which counts characters in a string:

Compare this to the equivalent Python code, which might need to check for the existence of each character in the map before attempting to increment its count:

In C++, the default-initialization of the value obviates this check!

std::set<T>

A set is the standard way of storing a collection of unique elements in C++. No matter how many times you add the same element to a set, it's as if you only added it once.

ExpressionResult
std::set<T> sCreates an empty set
s.insert(e)Adds e to s. Calling insert(e) more than once has the same effect as calling it once
s.erase(e)Removes e from s. e does not need to be s
if (s.count(e)) ...
if (s.contains(e)) ... (since C++20)
Checks if e is in the set
s.empty()Checks if s is empty

Behind the scenes

You may be wondering: why do std::map<K, V> and std::set<T> impose the requirement that K and T have a operator<? The reason has to do with how these data structures are implemented behind the scenes. The ability to compare two elements allows us to build an efficient data structure that can quickly determine whether a key or value exists in a map or set.

The C++ standard does not enforce any particular implementation for the map and set, but compilers almost always implement these data structures with a red-black tree, allowing for efficient traversal during key lookup. Let's see how this works for a map (set will behave similarly). Consider the following code snippet:

L1

The map organizes its elements in memory as a binary search tree. Each TreeNode has a std::pair containing the key and value for the node, and left and right pointers to the next node in the sequence. The tree is laid out in such a way as to make finding the node associated with a given key fast, as shown on the next line.

Stack
m
map<string, size_t>
size5
root
Heap
TreeNode
value
firstJohann
second1128
left
right
Heap
TreeNode
value
firstDmitri
second147
left
right
TreeNode
value
firstLudwig
second722
left
right
Heap
TreeNode
value
firstAmadeus
second626
left
right
TreeNode
value
firstGustav
second64
left
right

L2

To find the value associated with a given key, the map traverses its tree using a binary search algorithm. Starting with the root node, it compares the desired key using the provided comparison function (operator< by default) to see which direction traversal should continue:

  • "Gustav" < "Johann", so we go left
  • "Gustav" >= "Dmitri", so we go right
  • "Gustav" is not less than itself, so we have found the node we were looking for!

This is a simple example with five mappings, but imagine a tree with hundreds or thousands of mappings. Assuming our tree is well-balanced, each comparison we make would eliminates have of the nodes from consideration.

As you might expect, there is something special about the tree that allows this algorithm to work. Namely, all the keys in left subtree of a parent are less than the parent's key, and all of the keys in the right subtree are greater than the parent's key (this is what a binary search tree is). A map will perform better if its tree is roughly balanced—this is exactly what a red black tree (a specific kind of binary search tree) aims to accomplish.

Stack
m
map<string, size_t>
size5
root
gustav64
Heap
TreeNode
value
firstJohann
second1128
left
right
Heap
TreeNode
value
firstDmitri
second147
left
right
TreeNode
value
firstLudwig
second722
left
right
Heap
TreeNode
value
firstAmadeus
second626
left
right
TreeNode
value
firstGustav
second64
left
right

A set uses the same red-black tree data structure under the hood to determine quickly determine if an element exists inside a set. set has no concept of a key-value pair, so its TreeNode behind the scenes is somewhat simpler: rather than storing a pair of key and value, it directly stores the set element in the node. Otherwise, set and map work exactly the same: conceptually, you can think of a set being somewhat like a map without any values.

Unordered Containers

Unordered containers, unordered_map and unordered_set, perhaps more commonly known in other langauges as "hash tables", make use of hash and equality functions to achieve near constant-time key lookups. These containers are called unordered because, unlike map and set, they do not rely on an ordering comparison (e.g. operator<) to order their elements, and you cannot depend on their elements having any specific order when iterating through them.

Requirements

To use std::unordered_map<K, V> or std::unordered_set<T>, K or T must have an associated hash and equality function.

Hash function

A hash function "randomly" scrambles up an element of type K into a value of type size_t. When formally defined, hash functions are deterministic—the same K will always produce the same size_t—but they are discontinuous—small changes in the input K will produce large, unpredictable changes in the output size_t. These properties are key to how unordered data structures accelerate their element lookups.

Many types, e.g. int, double, std::string, have built-in hash functions (you can see the full list here). Other seemingly basic types, such as std::pair and std::tuple, do not. To add a hash function to an unsupported type, do one of the following:

  1. Specialize the std::hash functor. This is the most common way to add a hash function to a type. It involves creating a template specialization for the std::hash<T> functor, which is the default strategy unordered containers use to hash their elements. The syntax is as follows:

  2. Define a custom functor. This is an alternative to the above syntax if you do not want to change the default hash function for all consumers of a type. For example:

  3. Use a lambda function. Similar to the above, this is useful if you don't want to override the global operator<, and also avoids creating a new functor type. Functionally, it is the same as 2:

When writing your own hash functions, make sure that the output size_t are well distributed—as we will see, the performance of the container you use depends on how "random" its hash function outputs are. It is worthwhile to read this Wikiedia article for more information on how to design a good hash function. A good way to combine hash values, for example, is to utilize bit-shift, XOR, and multiplication by prime numbers, as the following example of a hash function for an std::vector<T> demonstrates. Preferrably, use a third-party library like boost::hash_combine to combine hash values in a way that yields a good distribution over the integers.

Equality function

The equality function (or key-equality function for unordered_map) checks if two elements/keys are equal. By default, unordered_map and unordered_set will try to use operator==. If your type is missing an operator==, you have a few options:

  1. Define an operator==. See the chapter on operator overloading for more information.

  2. Define a functor. This is recommended if you don't want to overload the global operator== for MyType. For example:

  3. Use a lambda function. Similar to the above, this is useful if you don't want to override the global operator<, and also avoids creating a new functor type. Functionally, it is the same as 2:

    Here, we are passing the default value for the hash function, std::hash<T> {}, to the container constructor, but any hash function can be used in practice.

std::unordered_map<K, V>

unordered_map is an accelerated version of map that allows looking up the value for a key in constant time, at the cost of using additional memory.

unordered_map supports all of the operations of std::map (i.e. it can serve as a drop-in replacement for a map), as well as a few more operations specific to how it works behind the scenes. Note that like map, the operator[] on unordered_map will default initialize values for keys which do not exist!

ExpressionResult
m.load_factor()Returns the current load factor (explained below) of the map
m.max_load_factor(lf)Sets the maximum allowed load factor to lf
m.rehash(b)Ensures m has at least b buckets and reassigns elements to buckets given the new bucket count

std::unordered_set<T>

Similar to unordered_map, unordered_set is an accelerated version of set that allows checking set-membership in constant time.

unordered_set supports all of the operations of std::set, as well as the additional operations listed for unordered_map.

Behind the scenes

One of the performance bottlenecks of map and set is that to lookup an element, they must perform a binary search on their internal trees, taking O(logn)O(\log n) time. In most applications, this is acceptable, as logn\log n even for very large nn is managable—even a map with one billion entries will only need to traverse log210930\log_2 10^9 \approx 30 nodes to find a match in the worst case. However, if the overhead of this search is unacceptable, we can achieve faster lookup at the expense of higher memory usage by rearranging the way container elements are stored: this is exactly what unordered_map and unordered_set do.

Let's take a look at how unordered_map works (as we will see, unordered_set works identically). Like std::map, the C++ standard enforces no restrictions on what data structure unordered_map uses to organize its elements, but compilers almost always use a separate chaining hash table[1]. The map picks some value bb (known as the bucket count) and keeps track of an array of bb linked lists, known as the buckets. On insertion, each key-value pair is placed into one of the bb buckets depending on the value of bb and the map's hash function. Specifically, if ff is the hash function, then a key kk will go into the bucket with index f(k)modbf(k) \mod b. Note that hash functions are not injective: there can exist keys k1k2k_1\neq k_2 for which f(k1)=f(k2)f(k_1)=f(k_2)[2], so more than one key can end up in the same bucket. After taking the modulo by bb, the chance of such a hash collision occuring increases even more as the output of ff is squashed to fit inside the range [0,b)[0, b).

To determine the value of a key kk, the unordered_map first determines the bucket index of kk and then loops through the key-value pairs of that bucket, applying the key-equality function ee to each one until a matching key (and corresponding value) is found. Assuming that evaluating ff and ee is O(1)O(1) (an assumption that is often made in practice), then the time complexity of a key lookup will depend on the number of elements the map must search through within each bucket before finding a matching key—in other words, the number of items in the bucket. On average, an unordered_map with nn mappings and bb buckets will have α=nb\alpha=\frac{n}{b} items per bucket. This value, α\alpha, is called the load factor of the unordered_map.

By keeping α\alpha small, the unordered_map will on average have to examine fewer items to find a key's value. In fact, unordered_map will keep this number bounded by some constant threshold: by default, the max_load_factor is set to 1.0. Should the load factor exceed this threshold, the map will allocate additional buckets and rehash. Because the map must search at most a constant α\alpha number of items during key lookup, key lookup in an unordered_map is considered O(1)O(1). Let's take a look at an example and see what the unordered_map memory looks like:

L1

Here we can see the map has chosen b=5b=5, and maintains 5 buckets. By random chance, two key-value pairs have landed in bucket 0 and none in bucket 2. The value b=5b=5 was chosen arbitrarily—in fact, it can be customized in the unordered_map constructor. The only requirement on bb is that it meets the load factor requirement: nb=α1.0\frac{n}{b}=\alpha\leq 1.0. In this case, α=55\alpha=\frac{5}{5}, or exactly 1.01.0.

Stack
m
unordered_map<string, size_t>
b5
buckets
Heap
Heap
ListNode
value
firstGustav
second64
next
ListNode
value
firstDmitri
second147
next
ListNode
value
firstJohann
second1128
next
ListNode
value
firstLudwig
second722
next
Heap
ListNode
value
firstAmadeus
second626
next

L2

To find the value for "Amadeus", lookup proceeds as follows. First we compute f("Amadeus")mod5f(\texttt{"Amadeus"}) \mod 5 to find the bucket index (0). Then we loop through all key-value pairs in bucket 0 until we find the one which equals "Amadeus" according to the key-equality function. "Gustav" is skipped.

Note: Even if bucket 0 had only a single element, we would still need to loop through its elements and apply the key-equality function, since it might happen to have a single key that coincidentally has the same bucket index as "Amadeus".

Stack
m
unordered_map<string, size_t>
b5
buckets
amadeus626
Heap
Heap
ListNode
value
firstGustav
second64
next
ListNode
value
firstDmitri
second147
next
ListNode
value
firstJohann
second1128
next
ListNode
value
firstLudwig
second722
next
Heap
ListNode
value
firstAmadeus
second626
next

L3

After inserting the key-value pair { "Franz", 600 } to the map, the resulting load factor was α=65=1.2≰1.0\alpha=\frac{6}{5}=1.2 \not \leq 1.0. Because α\alpha exceeded the threshold of 1.01.0, the map had to allocate more buckets and rehash. To choose the new bucket count, the old bucket count is typically doubled. Some compilers, such as G++ in this example, will further stipulate that the new bucket count always be a prime number, as taking the modulo by this count gives a better distribution of items over the buckets (11 is used in this example). Notice that after rehashing, the relative arrangement of items has completely changed as a result of the new bucket count.

Stack
m
unordered_map<string, size_t>
b11
buckets
amadeus626
Heap
Heap
ListNode
value
firstDmitri
second147
next
ListNode
value
firstLudwig
second722
next
ListNode
value
firstAmadeus
second626
next
ListNode
value
firstGustav
second64
next
ListNode
value
firstFranz
second600
next
Heap
ListNode
value
firstJohann
second1128
next

As was the case for set, unordered_set uses the same hash-table data structure as unordered_map. You can think of unordered_set as being implemented as a hash-table whose nodes contain only the elements (as opposed to key-value pairs).

Which should I use?

A perennial question among C++ developers is: given the choice between a map and an unordered_map, which one should I choose? The answer to this question will depend on the specific problem you are solving and what your requirements are. As a general rule of thumb, unordered_map will offer superior performance over map while using far more memory (at the very least, a lot of empty buckets are needed to keep the load factor small). In general, as memory tends to be less of a concern in software development, unordered_map seems to be the more promising choice. However, in select circumstances map can still outperform unordered_map. Below is a (far from an exhaustive) table of where each container type shines:

ScenarioOrdered Container (map/set)Unordered Container (unordered_map/unordered_set)
Ordering✔️ Allows iteration of keys/element in sorted order❌ Has no dependable order. Elements are iterated in an essentially random order
Speed❌ Finding/removing an element requires a binary tree traversal, average O(logn)O(\log n)✔️ Hash functions allows near constant-time element lookup and removal, average O(1)O(1)
Memory✔️ Small memory footprint, including only the element data and bookkeeping data for binary search tree❌ Large memory footprint, must allocate many empty buckets in order to keep load factor small
Element operations✔️ Comparing elements using operator< is usually faster, since operator< often only needs to examine part of its operands to determine that one is smaller❌ Must invoke hash function, which must inspect the entire key object, and equality function
Worst-case lookup✔️ Looking up an element takes at most O(logn)O(\log n), requiring a tree traversal❌ Looking up an element takes at most O(n)O(n), e.g. if poor choice of hash function places all elements in the same bucket
Worst-case resize✔️ Adding elements is always O(logn)O(\log n) in the worst case, requiring a tree traversal❌ Adding elements may trigger a rehash, taking O(n2)O(n^2) in the worst case
Working set[3]✔️ Tends to preserve its working set, i.e. the same set of nodes are frequently accessed during lookups, even as the container resizes❌ Rehashing essentially produces a new working set. After rehashing, the buckets are completely rearranged, causing cache lines that were previously hot to become cold and vice-versa.

Footnotes

  1. Other hashing strategies exist, such as open addressing, but separate chaining is used in practice due to certain time complexity restrictions guaranteed by methods of unordered_map and unordered_set in the C++ standard.

  2. One intuitive way to make sense of this fact is to consider the hash function for a std::string. Intuitively, there are many more possible std::string instances than size_t, so there must be at least two std::string instances with the same hash value.

  3. The working set of a program is the set of locations in memory it frequently accesses. For reasons that are beyond the scope of this textbook, low-latency applications benefit from having a working set that doesn't change much. Accessing the same addresses frequently is usually faster than accessing many different locations, which can happen, for example, when an unordered_map has to rehash.