Learn C++
Search textbook...
Iterators provide a uniform interface for traversing container data structures.
So far, we have covered a variety of C++ data structures—std::map
, std::vector
, std::set
, std::deque
, std::unordered_map
, std::unordered_set
—all of which have a variety of different implementations behind the scenes—binary search tree, contiguous allocation, array of arrays, hash table. Despite this internal complexity, we are still able to do certain operations on these containers without worrying about how they are implemented behind the scenes. For example, consider this code[1]:
How does this syntax work? Under the hood, the compiler does a syntactic transformation of the above code into the following:
What is std::begin(container)
? What is it
? As we will see, these are iterators, data types that share the same semantics as pointers and allow generic iteration over a data structure, independent of how that data structure was implemented.
Imagine you want to iterate through every element in an std::vector
. You might have code that looks like this:
This code uses an index i
to keep track of which element the for loop is currently at. You have likely seen this code many times before, or something like it. Once i
is no longer a valid index, we stop looping—otherwise, we increment i
at the end of every iteration.
What if we wanted to apply the same kind of loop to an std::set
? Recall from the previous chapter that unlike std::vector
, std::set
does not store its elements sequentially in memory. We also cannot use an integer index to get an element from a set, since it has no operator[]
. What we'd like is some kind of construct that allows us to keep track of where we are in a container while iterating, but which can handle more powerful data structures than std::vector
.
Roughly speaking, an iterator functions as a pointer to an element in a container. The iterators of a container define a sequence of elements—each iterator knows how to get to the next iterator in the sequence. Every C++ container has built-in methods that return iterators to the beginning and end of its container. These methods are guaranteed to run in constant-time.
Container Method | Description |
---|---|
std::begin(c) c.begin() | Gets an iterator to the first element of a container |
std::end(c) c.end() | Gets an iterator to the past-the-end element of a container |
Note that c.end()
is a past-the-end iterator—it doesn't actually point to any element of the container, but is instead one past the last element of the container. This helps us to build looping logic and represent empty containers. Consider the following set and its iterators:
s | ● |
begin | ● |
end | ● |
Elements
|
Note that an empty container has c.begin()
and c.end()
pointing to the same (non-existent) element:
s | ● |
begin | ● |
end | ● |
Elements
Iterators, on the other hand, all provide a simple set of operations to allow traversing through a container. Like the container methods, these are also required to run in constant-time.
Iterator Method | Description |
---|---|
auto it = c.begin() | Copy Construction: operator= creates a copy of an existing iterator that points to the same element. |
++it OR it++ | Increment: operator++ moves the iterator forward onto the next element. |
it == c.end() | Comparison: operator== determines whether two iterators point to the same element. |
*it | Indirection: operator* returns a reference to the underlying element. Whether this reference can be read or written to depends on whether it is an input- or an output iterator (described below). |
Be careful not to increment an iterator past the end (e.g.
++c.end()
) or try to dereference the end iterator (*c.end()
). Both of these are undefined behaviour in C++!
Putting both the container and iterator interfaces together, try reasoning through what this code to iterate through a container is doing now:
If you have read and are familiar with the chapter on pointers, you might notice that semantics of iterators are very similar to that of pointers. Iterators, like pointers, can be dereferenced, incremented, compared, etc. In some sense, iterators are a generalization of pointers to memory that is not necessarily sequential. This allows more complex data types, such as unordered_map
, to seem as if their elements are represented sequentially in memory.
Note: You may notice in this chapter that we use
++it
instead ofit++
when incrementing iterators. To see why, it can help to look at the call signatures of these two versions and how they differ:In particular, the prefix form (
++it
) updates an iterator in-place, returning a reference to the same iterator. The postfix form (it++
) still updates the iterator, but returns a copy of the old value of the iterator before it was incremented. For this reason,it++
can be a bit slower than++it
because an extra copy is created[2].
So far, we have used auto
to let the compiler deduce the type of, for example, c.begin()
. What actually is the type of an iterator? This will depend on the container, but generally speaking, given a container type C
, its iterator type will be C::iterator
. For example, std::string::iterator
and std::unordered_map<std:string, double>::iterator
are both iterator types.
Under the hood, these iterator types are implemented by the compiler internally for each data structure and type-aliased with a using
definition inside of the container class. For example, consider one possible definition for a std::vector
:
This code works because std::vector
actually does store its elements sequentially in memory (through a heap allocated buffer) and pointers, as previously mentioned, share the basic semantics of iterators. A look behind the scenes for a more complex type like std::unordered_map
might reveal a different implementation:
where _unordered_map_iterator
is some internal, compiler-dependent type representing an iterator to an unordered map and which overloads all the necessary operations to meet the interface criteria for an iterator. _unordered_map_iterator
is just an example—the actual type used will depend on the compiler.
Not all iterators support the same set of operations. As a consequence of how their container is defined internally, some are more "powerful" than others. This is primarily a result of the restriction in the C++ Standard that all iterator methods be in runtime complexity. For example, it is trivial to jump ahead by elements with a vector iterator in time since a vector's memory is laid out sequentially, but not so straightforward to do so for an std::unordered_map
whose elements are arranged in a distributed fashion. This gives rise to a variety of different iterator categories: each container's iterators fall into one of the following categories depending on what operations it supports.
An iterator is an output iterator if it supports overwriting the pointed to element via operator=
, e.g.:
Some container iterators will not be output if modifying their element would require restructuring the container. For example, changing the key of an element in a std::map
might change where that element lives in its binary-search tree, so std::map<K, V>::iterator
is not output.
Supported Operation | Description |
---|---|
*it = elem | Overwriting element: operator* returns a reference whose value can be overwritten, changing the value of the element the iterator points to. |
An iterator is an input iterator if it supports reading the pointed to element, e.g.:
Almost all iterators are input iterators[3]—in fact, all of the following iterator categories are specializations of input iterators.
Supported Operation | Description |
---|---|
auto elem = *it | Reading element: operator* returns a reference which represents the element pointed to by the iterator. |
Up to this point, we have not actually specified whether it is valid to make multiple passes over the iterators of a container. For example, given a range of elements between begin
and end
iterator, would it be valid to traverse these elements multiple times?
Forward iterators guarantee that multiple passes are valid. Every STL container's iterators are forward—intuitively, this should make sense: simply iterating over a container's elements doesn't change them in a way that would prevent iterating again. Outside of the STL containers, however, this is not generally true[4].
More formally, forward iterators must satisfy the multi-pass guarantee. That is, given iterators a
and b
which point to the same element (a == b
), it must hold that ++a == ++b
. In plain English, moving both iterators forward (in either order) should land them at the same element.
Bidirectional iterators are a kind of forward iterator that can be moved backwards as well as forwards, e.g.
A container's iterators will be bidirectional if there is some way to identify the previous element. This may be the case if the container is sequential (like a vector) or its elements are sorted (like a std::map
). Some containers have no easy way to go backwards from an iterator (or choose not provide this behaviour) like std::unordered_map
.
Be careful not to decrement before the
begin
iterator! Writing--c.begin()
is undefined behaviour, just like++c.end()
.
Supported Operation | Description |
---|---|
--it OR it-- | Decrement: operator-- moves the iterator backwards to the previous element. |
A random-access iterator is a bidirectional iterator that supports jumping forwards or backwards multiple elements at a time.
Negative values will move the iterator backwards. These iterators closely resemble pointers in their syntax. For example, we can use operator[]
to combine an increment/dereference together:
Once again, be careful not to go out of bounds or dereference the end
iterator!
Supported Operation | Description |
---|---|
it += n | Random Access: If is positive, operator+= moves it forward n hops—this has the equivalent effect as calling it++ times except the operation is done in constant time. For negative , operator+= moves it backwards. |
it -= n | Random Access: Identical to it += -n |
it + n | Random Access: Creates a new iterator n hops forward |
it - n | Random Access: Creates a new iterator n hops backward |
it1 < it2 it1 <= it2 it1 > it2 it1 >= it2 | Ordered Comparison: Checks if it1 comes before or after, respectively, it2 in the sequence |
Contiguous iterators are a subset of random-access iterators that further stipulate that their elements are stored contiguously in memory. For example, an std::deque
is random-access, but not contiguous (recall its implementation details).
Functionally, there is not much difference between contiguous and random-access iterators. However, taking the address of the elements pointed to by these iterators (&*it
) will reveal that they are stored contiguously in memory.
What happens to an iterator if we modify its underlying container? Iterators, much like pointers, point to a fixed location in memory where their element is stored, plus some bookkeeping data to derive the location of the next element. As a result, operations which restructure a container may invalidate previously obtained iterators. This can lead to undefined behaviour if we are not careful.
Here's a table sumarizing which operations will and will not invalidate iterators for the containers discussed in this textbook:
Method | Iterators Valid? | Precondition/Notes |
---|---|---|
std::vector | ||
push_back insert | ❌ | capacity() changed. If the vector had to reallocate its internal buffer, then the elements will be copied over to a new buffer, invalidating all existing iterators. |
push_back insert | ❌ | Iterators after modified element. These iterators will be pushed forwards, so they will no longer refer to the same elements. |
push_back insert | ✅ | All other cases |
pop_back erase | ❌ | Iterators after modified element. These iterators will be pushed backwards, so they will no longer refer to the same elements. |
pop_back erase | ✅ | All other cases |
std::deque | ||
push_front push_back insert | ❌ | All iterators are invalidated |
pop_front pop_back | ❌ | Iterators to front/back elements |
pop_front pop_back | ✅ | All other cases |
erase | ❌ | If middle elements were erased, all iterators are invalidated |
std::map , std::set | ||
insert operator[] | ✅ | |
erase | ✅ | Except for iterators to erased element |
std::unordered_map , std::unordered_set | ||
insert operator[] | ❌ | Insertion caused rehash[5] |
insert operator[] | ✅ | |
erase | ✅ | Except for iterators to erased element |
Occasionally, you will discover some variants of the above iterator concepts when working with C++ in practice. These iterator "flavors" allow us to handle const
containers more appropriately, as well as work with bidirectional iterators in reverse order.
const
IteratorsGiven a const
container, we would not want to allow elements of that container to be modified through their iterators. This is the idea of const correctness—objects that are marked const
should not be allowed to be modified through any part of their interface.
const
iterators allow for container types to conform to this principle. A container type C
with elements of type T
will in practice have two iterator types:
C::iterator
which points to elements of type T
(e.g. std::string::iterator
points to char
)C::const_iterator
which points to elements of type const T
(e.g. std::string::const_iterator
points to const char
)Namely, this means that you cannot modify the elements that a const_iterator
points to. As a result, every const_iterator
is necessarily not an output iterator, since you can't write to the underlying element (however, you can still have non-output containers which have a meaningful distinction between their iterator
and const_iterator
types![6]).
Practically speaking, const_iterator
can be used anywhere an iterator
for the same container was expected, so long as you do not require modification. For example, iterator algorithms (discussed in the next chapter) which do not modify their range (e.g. std::count_if
counts the number of elements between two iterators) will work just as well on const_iterator
s as they do on regular iterator
s. However, algorithms which do modify their range (e.g. std::sort
sorts the elements between two iterators in-place) will not compile for const_iterator
.
Given a const
container c
, calling c.begin()
or c.end()
will automatically return const_iterator
s through the provided const
overloads for these methods. However, if you require a const
iterator for a non-const
container, you can request these through the following convenience methods defined on every standard library container:
Container Method | Description |
---|---|
std::cbegin(c) c.cbegin() | Gets a const_iterator to the first element of a container |
std::cend(c) c.cend() | Gets a const_iterator to the past-the-end element of a container |
Bidirectional iterators provide an easy way for us to create a reverse ordering of the elements in a container. This is useful if we wanted to iterate through a container in reverse, or if we want to apply an iterator algorithm in reverse order (e.g. std::find(c.begin(), c.end(), v)
searches for a value v
starting from the left—what if we wanted to search from the right?). This is precisely what reverse iterators accomplish.
Every container whose iterators are bidirectional provides a reverse iterator interface for iterating backwards through a sequence. These methods are analogous to their ordinary counterparts:
Container Method | Description |
---|---|
std::rbegin(c) c.rbegin() | Gets an iterator to the first element in the container's reversed sequence. Dereferencing this iterator yields the last element in the container's ordinary (unreversed) sequence. |
std::rend(c) c.rend() | Gets an iterator to the past-the-end element in the container's reversed sequence. It is invalid to dereference this iterator—conceptually, it points to a non-existent element one before the first element in the container's ordinary (unreversed) sequence. |
std::crbegin(c) c.crbegin() | Returns the const -iterator version of rbegin() |
std::crend(c) c.crend() | Returns the const -iterator version of rend() |
Reverse iterators can only exist for bidirectional iterators: behind the scenes, a reverse iterator stores a regular iterator and moving forward (operator++
) on the reverse iterator decrements the stored iterator (operator--
). Reverse iterators are implemented as a templated wrapper (std::reverse_iterator
) around a bidirectional iterator, so containers do not need to implement any additional functionality to achieve this result. In effect, a reverse iterator stores an iterator to the element one ahead its dereferenced value, as explained by the diagram below.
Conceptually, we think of reverse iterators as pointing to elements in a reversed sequence, with semantics identical to ordinary iterators.
v | ● |
rbegin | ● |
rend | ● |
Elements
|
Behind the scenes, reverse iterators actually store an iterator that is one ahead (i.e. in the ordinary sequence) of where they conceptually point to.
v | ● |
rbegin | ● |
rend | ● |
Elements
|
The motivation behind this implementation detail is that a true "before-the-start" iterator (what rend
represents) would be undefined behaviour in C++—recall that --v.begin()
is not valid in C++.
We can get the actual iterator a reverse iterator stores by calling the member function base()
on it, e.g.
Notice that v.rbegin()
actually stores v.end()
as its base
iterator, while v.rend()
stores v.begin()
.
v | ● |
rb | ● |
re | ● |
rb_base | ● |
re_base | ● |
Elements
|
Reverse iterators generally inherit the category of their underlying iterator (e.g. random access iterators are random access in reverse). The only exception to this is contiguous iterators—since the elements are ordered in reverse, they are no longer strictly contiguous as the addresses decrease instead of increase as you move forward in the reverse sequence.
std::deque::iterator
In this section, we will see how an iterator might actually be implemented for a real STL container data structure—in this case, an std::deque
. Compilers are free to implement iterators however they choose, so long as the iterator operators are constant time and respect the invalidation rules above. In this section, our implementation will roughly mirror the g++
compiler's implementation, whose source code can be found here, with a few simplifications. std::deque<T>::iterator
is a random-access iterator, but for brevity, we will only implement the bidirectional iterator operations, namely operator*
, operator++
, and operator--
. The remaining random-access operations are left as an exercise to the reader.
Recall from the chapter on sequence containers that a deque organizes its elements as an array of fixed-sized blocks of elements:
d | deque<int>
|
blocks
|
| ||||
|
Contrary to what was presented in the sequence containers chapter, start
and finish
are actually iterators to the elements, not mere indexes. These are represented here in the diagram above as dashed arrows. The implementation of these iterators is what will be discussed in the following section. Notice that finish
is a past-the-end iterator.
Accordingly, we can imagine that a std::deque
, behind the scenes, might look something like this:
_deque_iterator
is the name of the iterator type for a deque which we will implement. Before we do so, take note of a few details:
iterator
and const_iterator
are type aliases for an underlying _deque_iterator
to a T
or const T
, respectively.BLOCK_SIZE
is the fixed size of the individual blocks. In actual practice, g++
's definition of this value is a bit more involved, with the actual size depending on the type T
.begin()
and end()
have const
overloads. This will pose a problem for the const-versions of these methods, since _deque_iterator<T>
cannot be converted to _deque_iterator<const T>
by default. As a result, we will need to make sure our _deque_iterator
provides a constructor for converting between these as a result.rbegin()
and rend()
(and their const
overloads) call std::make_reverse_iterator
, which given a bidirectional iterator, produces a new iterator to the reversed range. We use auto
to let the compiler deduce the return type, which will be std::reverse_iterator<iterator>
(or std::reverse_iterator<iterator>
for the const
versions).So what goes into implementing _deque_iterator
? For starters, we must decide what data the iterator needs to track in order to increment and decrement the iterator. This poses an interesting challenge: if an iterator points to the end of one block, operator++
needs to somehow know to move on to the start of the next block. operator--
must likewise reposition iterators to the final element of the previous block. To solve this, compilers typically keep track of four pointers: the element the iterator points to, the element's block, and the first and last element in that block. In code, we might have a _deque_iterator
that looks like this:
Below is an example of what the d.begin()
iterator might look like for the deque shown at the beginning of this section. The animated arrow is the actual pointer to the element in the deque that the iterator points to.
d | deque<int>
| ||||||||
d.begin() | _deque_iterator
|
blocks
|
| ||||
|
In the above implementation, our _deque_iterator
class has the four pointers as described, with a constructor that initializes them and a copy constructor that will handle conversion from an iterator
to a const_iterator
(both constructors are constant-time, as required). We have hidden the implementation of the core iterator operators—the prefix operator++()
and operator--()
—which we will cover shortly. Notice that the postfix forms—operator++(int)
and operator--(int)
—are implemented in terms of their prefix equivalents.
To implement operator++
, we will attempt to move the current
pointer to the next contiguous element within its block. If this increment would cause current
to exceed its block, we will reposition it to the next block pointer, setting first
and last
accordingly. This could be implemented as:
Notice that while the individual blocks of a deque are not all contiguous in memory, the blocks
array which contains all the pointers-to-blocks is itself contiguous! This allows us to write ++block
in the above implementation. Also notice that operator++
is constant-time, as required.
Implementing operator--
will take a similar approach:
Voila! Putting everything together, we have built a bidirectional deque iterator. To complete it, we would need to add the random-access operators, in particular operator+=
and operator-=
. One approach might be to call operator++
and operator--
above n
times, except that this would violate the restriction that iterator operations are constant-time. Instead, the actual random-access operators will precompute how many blocks they need to skip, and skip them in one fell swoop (e.g. blocks += block_offset
). We will not implement this here, but we encourage you to consider how you might implement it yourself or take a look at the g++
source for these methods.
In this snippet, we use const auto&
for the container element. This is a recommended standard practice, as it (1) avoids making a copy of each element as you iterate (particularly important if the element type is large) and (2) prevents us from modifying the container element as we iterate (it is const
). You can, however, write auto elem : container
to copy each element if necessary or for primitive types where copying is not likely to be a performance concern (e.g. int
, float
, double
, etc.). ↩
In actual practice, the extra copy created by it++
may be optimized out by the compiler if it is not used, so these two versions may end up exhibiting the same performance after all optimizations are applied. We encourage you to check out this FAQ post on the isocpp site (co-authored by the original creator of C++) which states:
↩So if you’re writing
i++
as a statement rather than as part of a larger expression, why not just write++i
instead? You never lose anything, and you sometimes gain something.
One example of an iterator which is not an input iterator is std::ostream_iterator
. This iterator represents a position inside an std::ostream
, and allows inserting elements into an ostream
(but not reading them). For example, the following code writes comma-separated double
values to std::out
:
This code allows us to write to an std::ostream
through an iterator interface. To do this, it employs some operator overloading trickery. For one, operator*
returns a reference to the same iterator. That is, operator*
is implemented as:
Meanwhile, operator=
is overloaded to write elements to the underlying stream, e.g.
where std::ostream& out_stream
and const char* delim
are private fields of the iterator, assigned on construction. Together, these overloads allow code like *it = 3.14
to write to the underlying stream. Notice that reading the value of *it
is meaningless—std::ostream_iterator
is not an input iterator. *it
is simply a reference to the same iterator and doesn't actually refer to any "element" in the stream.
This is also the way that the commonly used std::back_inserter
, std::front_inserter
and std::inserter
work as well. ↩
For example, consider std::istream_iterator
, which allows us to iterate over the elements of an istream
, as shown in the following example:
The above code reads doubles from the underlying str
stream. Each time the iterator it
is advanced, a double
value is read from str
and stored into it
. We should not expect that running this for loop again (starting at begin
) would yield the same output, since the stream has been modified! Hence, std::istream_iterator
is not a forward iterator. ↩
This shows another benefit to using std::map
over std::unordered_map
if iterators to elements are employed extensively: std::map
has more stable iterators that are less likely to be invalidated. ↩
As an example of this, consider std::map::iterator
, which points to a std::pair<const K, V>
representing a key-value pair in the map. This iterator is not output, since modifying the entire key-value pair might change the key, which would change where that entry is stored inside of the map (see the chapter on associative containers as to why)—this is precisely why the key is a const K
. That said, we can still modify the value through a std::map::iterator
. For example, the following code is valid:
Now consider what would happen if it
was a const_iterator
:
In this updated example, it
is a const_iterator
since m
is marked const
(const-correctness). Since it
is a const_iterator
, the entire element is const
and we cannot update the value of "Fabio"
to 106
as we were able to before. ↩