Learn C++

Search textbook... 

⌘K

Iterators

Iterators provide a uniform interface for traversing container data structures.

So far, we have covered a variety of C++ data structures—std::map, std::vector, std::set, std::deque, std::unordered_map, std::unordered_set—all of which have a variety of different implementations behind the scenes—binary search tree, contiguous allocation, array of arrays, hash table. Despite this internal complexity, we are still able to do certain operations on these containers without worrying about how they are implemented behind the scenes. For example, consider this code[1]:

How does this syntax work? Under the hood, the compiler does a syntactic transformation of the above code into the following:

What is std::begin(container)? What is it? As we will see, these are iterators, data types that share the same semantics as pointers and allow generic iteration over a data structure, independent of how that data structure was implemented.

Iterator Basics

Imagine you want to iterate through every element in an std::vector. You might have code that looks like this:

This code uses an index i to keep track of which element the for loop is currently at. You have likely seen this code many times before, or something like it. Once i is no longer a valid index, we stop looping—otherwise, we increment i at the end of every iteration.

What if we wanted to apply the same kind of loop to an std::set? Recall from the previous chapter that unlike std::vector, std::set does not store its elements sequentially in memory. We also cannot use an integer index to get an element from a set, since it has no operator[]. What we'd like is some kind of construct that allows us to keep track of where we are in a container while iterating, but which can handle more powerful data structures than std::vector.

Container Interface

Roughly speaking, an iterator functions as a pointer to an element in a container. The iterators of a container define a sequence of elements—each iterator knows how to get to the next iterator in the sequence. Every C++ container has built-in methods that return iterators to the beginning and end of its container. These methods are guaranteed to run in constant-time.

Container MethodDescription
std::begin(c)
c.begin()
Gets an iterator to the first element of a container
std::end(c)
c.end()
Gets an iterator to the past-the-end element of a container

Note that c.end() is a past-the-end iterator—it doesn't actually point to any element of the container, but is instead one past the last element of the container. This helps us to build looping logic and represent empty containers. Consider the following set and its iterators:

Stack
s
begin
end

Elements

12345

Note that an empty container has c.begin() and c.end() pointing to the same (non-existent) element:

Stack
s
begin
end

Elements

Iterator Interface

Iterators, on the other hand, all provide a simple set of operations to allow traversing through a container. Like the container methods, these are also required to run in constant-time.

Iterator MethodDescription
auto it = c.begin()Copy Construction: operator= creates a copy of an existing iterator that points to the same element.
++it OR it++Increment: operator++ moves the iterator forward onto the next element.
it == c.end()Comparison: operator== determines whether two iterators point to the same element.
*itIndirection: operator* returns a reference to the underlying element. Whether this reference can be read or written to depends on whether it is an input- or an output iterator (described below).

Be careful not to increment an iterator past the end (e.g. ++c.end()) or try to dereference the end iterator (*c.end()). Both of these are undefined behaviour in C++!

Putting both the container and iterator interfaces together, try reasoning through what this code to iterate through a container is doing now:

If you have read and are familiar with the chapter on pointers, you might notice that semantics of iterators are very similar to that of pointers. Iterators, like pointers, can be dereferenced, incremented, compared, etc. In some sense, iterators are a generalization of pointers to memory that is not necessarily sequential. This allows more complex data types, such as unordered_map, to seem as if their elements are represented sequentially in memory.

Note: You may notice in this chapter that we use ++it instead of it++ when incrementing iterators. To see why, it can help to look at the call signatures of these two versions and how they differ:

In particular, the prefix form (++it) updates an iterator in-place, returning a reference to the same iterator. The postfix form (it++) still updates the iterator, but returns a copy of the old value of the iterator before it was incremented. For this reason, it++ can be a bit slower than ++it because an extra copy is created[2].

Iterator Types

So far, we have used auto to let the compiler deduce the type of, for example, c.begin(). What actually is the type of an iterator? This will depend on the container, but generally speaking, given a container type C, its iterator type will be C::iterator. For example, std::string::iterator and std::unordered_map<std:string, double>::iterator are both iterator types.

Under the hood, these iterator types are implemented by the compiler internally for each data structure and type-aliased with a using definition inside of the container class. For example, consider one possible definition for a std::vector:

This code works because std::vector actually does store its elements sequentially in memory (through a heap allocated buffer) and pointers, as previously mentioned, share the basic semantics of iterators. A look behind the scenes for a more complex type like std::unordered_map might reveal a different implementation:

where _unordered_map_iterator is some internal, compiler-dependent type representing an iterator to an unordered map and which overloads all the necessary operations to meet the interface criteria for an iterator. _unordered_map_iterator is just an example—the actual type used will depend on the compiler.

Iterator Categories

Not all iterators support the same set of operations. As a consequence of how their container is defined internally, some are more "powerful" than others. This is primarily a result of the restriction in the C++ Standard that all iterator methods be O(1)O(1) in runtime complexity. For example, it is trivial to jump ahead by nn elements with a vector iterator in O(1)O(1) time since a vector's memory is laid out sequentially, but not so straightforward to do so for an std::unordered_map whose elements are arranged in a distributed fashion. This gives rise to a variety of different iterator categories: each container's iterators fall into one of the following categories depending on what operations it supports.

A diagram showing the difference between iterator categoriesA diagram showing the difference between iterator categories

Output

An iterator is an output iterator if it supports overwriting the pointed to element via operator=, e.g.:

Some container iterators will not be output if modifying their element would require restructuring the container. For example, changing the key of an element in a std::map might change where that element lives in its binary-search tree, so std::map<K, V>::iterator is not output.

Supported OperationDescription
*it = elemOverwriting element: operator* returns a reference whose value can be overwritten, changing the value of the element the iterator points to.

Input

An iterator is an input iterator if it supports reading the pointed to element, e.g.:

Almost all iterators are input iterators[3]—in fact, all of the following iterator categories are specializations of input iterators.

Supported OperationDescription
auto elem = *itReading element: operator* returns a reference which represents the element pointed to by the iterator.

Forward

Up to this point, we have not actually specified whether it is valid to make multiple passes over the iterators of a container. For example, given a range of elements between begin and end iterator, would it be valid to traverse these elements multiple times?

Forward iterators guarantee that multiple passes are valid. Every STL container's iterators are forward—intuitively, this should make sense: simply iterating over a container's elements doesn't change them in a way that would prevent iterating again. Outside of the STL containers, however, this is not generally true[4].

More formally, forward iterators must satisfy the multi-pass guarantee. That is, given iterators a and b which point to the same element (a == b), it must hold that ++a == ++b. In plain English, moving both iterators forward (in either order) should land them at the same element.

Bidirectional

Bidirectional iterators are a kind of forward iterator that can be moved backwards as well as forwards, e.g.

A container's iterators will be bidirectional if there is some way to identify the previous element. This may be the case if the container is sequential (like a vector) or its elements are sorted (like a std::map). Some containers have no easy way to go backwards from an iterator (or choose not provide this behaviour) like std::unordered_map.

Be careful not to decrement before the begin iterator! Writing --c.begin() is undefined behaviour, just like ++c.end().

Supported OperationDescription
--it OR it--Decrement: operator-- moves the iterator backwards to the previous element.

Random Access

A random-access iterator is a bidirectional iterator that supports jumping forwards or backwards multiple elements at a time.

Negative values will move the iterator backwards. These iterators closely resemble pointers in their syntax. For example, we can use operator[] to combine an increment/dereference together:

Once again, be careful not to go out of bounds or dereference the end iterator!

Supported OperationDescription
it += nRandom Access: If nn is positive, operator+= moves it forward n hops—this has the equivalent effect as calling it++ nn times except the operation is done in constant time. For negative nn, operator+= moves it backwards.
it -= nRandom Access: Identical to it += -n
it + nRandom Access: Creates a new iterator n hops forward
it - nRandom Access: Creates a new iterator n hops backward
it1 < it2
it1 <= it2
it1 > it2
it1 >= it2
Ordered Comparison: Checks if it1 comes before or after, respectively, it2 in the sequence

Contiguous

Contiguous iterators are a subset of random-access iterators that further stipulate that their elements are stored contiguously in memory. For example, an std::deque is random-access, but not contiguous (recall its implementation details).

Functionally, there is not much difference between contiguous and random-access iterators. However, taking the address of the elements pointed to by these iterators (&*it) will reveal that they are stored contiguously in memory.

Iterator Invalidation

What happens to an iterator if we modify its underlying container? Iterators, much like pointers, point to a fixed location in memory where their element is stored, plus some bookkeeping data to derive the location of the next element. As a result, operations which restructure a container may invalidate previously obtained iterators. This can lead to undefined behaviour if we are not careful.

Here's a table sumarizing which operations will and will not invalidate iterators for the containers discussed in this textbook:

MethodIterators Valid?Precondition/Notes
std::vector
push_back
insert
capacity() changed. If the vector had to reallocate its internal buffer, then the elements will be copied over to a new buffer, invalidating all existing iterators.
push_back
insert
Iterators after modified element. These iterators will be pushed forwards, so they will no longer refer to the same elements.
push_back
insert
All other cases
pop_back
erase
Iterators after modified element. These iterators will be pushed backwards, so they will no longer refer to the same elements.
pop_back
erase
All other cases
std::deque
push_front
push_back
insert
All iterators are invalidated
pop_front
pop_back
Iterators to front/back elements
pop_front
pop_back
All other cases
eraseIf middle elements were erased, all iterators are invalidated
std::map, std::set
insert
operator[]
erase
Except for iterators to erased element
std::unordered_map, std::unordered_set
insert
operator[]
Insertion caused rehash[5]
insert
operator[]
erase
Except for iterators to erased element

Iterator Flavors

Occasionally, you will discover some variants of the above iterator concepts when working with C++ in practice. These iterator "flavors" allow us to handle const containers more appropriately, as well as work with bidirectional iterators in reverse order.

const Iterators

Given a const container, we would not want to allow elements of that container to be modified through their iterators. This is the idea of const correctness—objects that are marked const should not be allowed to be modified through any part of their interface.

const iterators allow for container types to conform to this principle. A container type C with elements of type T will in practice have two iterator types:

  • C::iterator which points to elements of type T (e.g. std::string::iterator points to char)
  • C::const_iterator which points to elements of type const T (e.g. std::string::const_iterator points to const char)

Namely, this means that you cannot modify the elements that a const_iterator points to. As a result, every const_iterator is necessarily not an output iterator, since you can't write to the underlying element (however, you can still have non-output containers which have a meaningful distinction between their iterator and const_iterator types![6]).

Practically speaking, const_iterator can be used anywhere an iterator for the same container was expected, so long as you do not require modification. For example, iterator algorithms (discussed in the next chapter) which do not modify their range (e.g. std::count_if counts the number of elements between two iterators) will work just as well on const_iterators as they do on regular iterators. However, algorithms which do modify their range (e.g. std::sort sorts the elements between two iterators in-place) will not compile for const_iterator.

Given a const container c, calling c.begin() or c.end() will automatically return const_iterators through the provided const overloads for these methods. However, if you require a const iterator for a non-const container, you can request these through the following convenience methods defined on every standard library container:

Container MethodDescription
std::cbegin(c)
c.cbegin()
Gets a const_iterator to the first element of a container
std::cend(c)
c.cend()
Gets a const_iterator to the past-the-end element of a container

Reverse Iterators

Bidirectional iterators provide an easy way for us to create a reverse ordering of the elements in a container. This is useful if we wanted to iterate through a container in reverse, or if we want to apply an iterator algorithm in reverse order (e.g. std::find(c.begin(), c.end(), v) searches for a value v starting from the left—what if we wanted to search from the right?). This is precisely what reverse iterators accomplish.

Every container whose iterators are bidirectional provides a reverse iterator interface for iterating backwards through a sequence. These methods are analogous to their ordinary counterparts:

Container MethodDescription
std::rbegin(c)
c.rbegin()
Gets an iterator to the first element in the container's reversed sequence. Dereferencing this iterator yields the last element in the container's ordinary (unreversed) sequence.
std::rend(c)
c.rend()
Gets an iterator to the past-the-end element in the container's reversed sequence. It is invalid to dereference this iterator—conceptually, it points to a non-existent element one before the first element in the container's ordinary (unreversed) sequence.
std::crbegin(c)
c.crbegin()
Returns the const-iterator version of rbegin()
std::crend(c)
c.crend()
Returns the const-iterator version of rend()

Reverse iterators can only exist for bidirectional iterators: behind the scenes, a reverse iterator stores a regular iterator and moving forward (operator++) on the reverse iterator decrements the stored iterator (operator--). Reverse iterators are implemented as a templated wrapper (std::reverse_iterator) around a bidirectional iterator, so containers do not need to implement any additional functionality to achieve this result. In effect, a reverse iterator stores an iterator to the element one ahead its dereferenced value, as explained by the diagram below.

Conceptually, we think of reverse iterators as pointing to elements in a reversed sequence, with semantics identical to ordinary iterators.

Stack
v
rbegin
rend

Elements

12345

Behind the scenes, reverse iterators actually store an iterator that is one ahead (i.e. in the ordinary sequence) of where they conceptually point to.

Stack
v
rbegin
rend

Elements

12345

The motivation behind this implementation detail is that a true "before-the-start" iterator (what rend represents) would be undefined behaviour in C++—recall that --v.begin() is not valid in C++.

We can get the actual iterator a reverse iterator stores by calling the member function base() on it, e.g.

Notice that v.rbegin() actually stores v.end() as its base iterator, while v.rend() stores v.begin().

Stack
v
rb
re
rb_base
re_base

Elements

12345

Reverse iterators generally inherit the category of their underlying iterator (e.g. random access iterators are random access in reverse). The only exception to this is contiguous iterators—since the elements are ordered in reverse, they are no longer strictly contiguous as the addresses decrease instead of increase as you move forward in the reverse sequence.

Deep Dive: std::deque::iterator

In this section, we will see how an iterator might actually be implemented for a real STL container data structure—in this case, an std::deque. Compilers are free to implement iterators however they choose, so long as the iterator operators are constant time and respect the invalidation rules above. In this section, our implementation will roughly mirror the g++ compiler's implementation, whose source code can be found here, with a few simplifications. std::deque<T>::iterator is a random-access iterator, but for brevity, we will only implement the bidirectional iterator operations, namely operator*, operator++, and operator--. The remaining random-access operations are left as an exercise to the reader.

Recall from the chapter on sequence containers that a deque organizes its elements as an array of fixed-sized blocks of elements:

Stack
d
deque<int>
start
finish
blocks
capacity2
Heap
blocks
Heap
_456
789_

Contrary to what was presented in the sequence containers chapter, start and finish are actually iterators to the elements, not mere indexes. These are represented here in the diagram above as dashed arrows. The implementation of these iterators is what will be discussed in the following section. Notice that finish is a past-the-end iterator.

Accordingly, we can imagine that a std::deque, behind the scenes, might look something like this:

_deque_iterator is the name of the iterator type for a deque which we will implement. Before we do so, take note of a few details:

  • iterator and const_iterator are type aliases for an underlying _deque_iterator to a T or const T, respectively.
  • BLOCK_SIZE is the fixed size of the individual blocks. In actual practice, g++'s definition of this value is a bit more involved, with the actual size depending on the type T.
  • begin() and end() have const overloads. This will pose a problem for the const-versions of these methods, since _deque_iterator<T> cannot be converted to _deque_iterator<const T> by default. As a result, we will need to make sure our _deque_iterator provides a constructor for converting between these as a result.
  • rbegin() and rend() (and their const overloads) call std::make_reverse_iterator, which given a bidirectional iterator, produces a new iterator to the reversed range. We use auto to let the compiler deduce the return type, which will be std::reverse_iterator<iterator> (or std::reverse_iterator<iterator> for the const versions).

So what goes into implementing _deque_iterator? For starters, we must decide what data the iterator needs to track in order to increment and decrement the iterator. This poses an interesting challenge: if an iterator points to the end of one block, operator++ needs to somehow know to move on to the start of the next block. operator-- must likewise reposition iterators to the final element of the previous block. To solve this, compilers typically keep track of four pointers: the element the iterator points to, the element's block, and the first and last element in that block. In code, we might have a _deque_iterator that looks like this:

Below is an example of what the d.begin() iterator might look like for the deque shown at the beginning of this section. The animated arrow is the actual pointer to the element in the deque that the iterator points to.

Stack
d
deque<int>
blocks
capacity2
d.begin()
_deque_iterator
block
current
first
last
Heap
blocks
Heap
_456
789_

In the above implementation, our _deque_iterator class has the four pointers as described, with a constructor that initializes them and a copy constructor that will handle conversion from an iterator to a const_iterator (both constructors are constant-time, as required). We have hidden the implementation of the core iterator operators—the prefix operator++() and operator--()—which we will cover shortly. Notice that the postfix forms—operator++(int) and operator--(int)—are implemented in terms of their prefix equivalents.

To implement operator++, we will attempt to move the current pointer to the next contiguous element within its block. If this increment would cause current to exceed its block, we will reposition it to the next block pointer, setting first and last accordingly. This could be implemented as:

Notice that while the individual blocks of a deque are not all contiguous in memory, the blocks array which contains all the pointers-to-blocks is itself contiguous! This allows us to write ++block in the above implementation. Also notice that operator++ is constant-time, as required.

Implementing operator-- will take a similar approach:

Voila! Putting everything together, we have built a bidirectional deque iterator. To complete it, we would need to add the random-access operators, in particular operator+= and operator-=. One approach might be to call operator++ and operator-- above n times, except that this would violate the restriction that iterator operations are constant-time. Instead, the actual random-access operators will precompute how many blocks they need to skip, and skip them in one fell swoop (e.g. blocks += block_offset). We will not implement this here, but we encourage you to consider how you might implement it yourself or take a look at the g++ source for these methods.


Footnotes

  1. In this snippet, we use const auto& for the container element. This is a recommended standard practice, as it (1) avoids making a copy of each element as you iterate (particularly important if the element type is large) and (2) prevents us from modifying the container element as we iterate (it is const). You can, however, write auto elem : container to copy each element if necessary or for primitive types where copying is not likely to be a performance concern (e.g. int, float, double, etc.).

  2. In actual practice, the extra copy created by it++ may be optimized out by the compiler if it is not used, so these two versions may end up exhibiting the same performance after all optimizations are applied. We encourage you to check out this FAQ post on the isocpp site (co-authored by the original creator of C++) which states:

    So if you’re writing i++ as a statement rather than as part of a larger expression, why not just write ++i instead? You never lose anything, and you sometimes gain something.

  3. One example of an iterator which is not an input iterator is std::ostream_iterator. This iterator represents a position inside an std::ostream, and allows inserting elements into an ostream (but not reading them). For example, the following code writes comma-separated double values to std::out:

    This code allows us to write to an std::ostream through an iterator interface. To do this, it employs some operator overloading trickery. For one, operator* returns a reference to the same iterator. That is, operator* is implemented as:

    Meanwhile, operator= is overloaded to write elements to the underlying stream, e.g.

    where std::ostream& out_stream and const char* delim are private fields of the iterator, assigned on construction. Together, these overloads allow code like *it = 3.14 to write to the underlying stream. Notice that reading the value of *it is meaningless—std::ostream_iterator is not an input iterator. *it is simply a reference to the same iterator and doesn't actually refer to any "element" in the stream.

    This is also the way that the commonly used std::back_inserter, std::front_inserter and std::inserter work as well.

  4. For example, consider std::istream_iterator, which allows us to iterate over the elements of an istream, as shown in the following example:

    The above code reads doubles from the underlying str stream. Each time the iterator it is advanced, a double value is read from str and stored into it. We should not expect that running this for loop again (starting at begin) would yield the same output, since the stream has been modified! Hence, std::istream_iterator is not a forward iterator.

  5. This shows another benefit to using std::map over std::unordered_map if iterators to elements are employed extensively: std::map has more stable iterators that are less likely to be invalidated.

  6. As an example of this, consider std::map::iterator, which points to a std::pair<const K, V> representing a key-value pair in the map. This iterator is not output, since modifying the entire key-value pair might change the key, which would change where that entry is stored inside of the map (see the chapter on associative containers as to why)—this is precisely why the key is a const K. That said, we can still modify the value through a std::map::iterator. For example, the following code is valid:

    Now consider what would happen if it was a const_iterator:

    In this updated example, it is a const_iterator since m is marked const (const-correctness). Since it is a const_iterator, the entire element is const and we cannot update the value of "Fabio" to 106 as we were able to before.