Consider what happens if you want to erase each element of a collection that meets some condition. Deleting a single element is not an issue because you don't need the iterator afterward. Deleting all elements is similarly easy because you can simply call clear(). Deleting an arbitrary number of elements is generally written this way by most beginners:
set<int>coll;
/* ... */
for (set<int>::iterator itr=coll.begin(); itr!=coll.end(); ++itr)
if (*itr % 2 == 0)
coll.erase(itr);
Predictably, this code doesn't work. The iterator is invalidated when
My first attempted workaround was for the vector class. I wanted to ignore the iterator completely and just step through the collection with an index:
for (unsigned int i=0; i<coll.size(); ++i)
When I tried to call coll.
vector<int> coll;
/* ... */
for (unsigned int i=0; i<coll.size(); ++i)
if (coll[i]%2 == 0)
coll.erase(coll.begin() + i--);
coll.begin() is always a valid iterator (even if it equals "end()"), and you are allowed to use array arithmetic on iterators for vector<> and for deque<>. Although this solutions works, it's a bit of a hack, especially because you have to "fix up" i after the call to
So I was back to figuring out how to get a "valid" iterator after calling
In some STL collections, the
set<int> coll;
/* ... */
for (set<int>::iterator itr=coll.begin(); itr!=coll.end(); ++itr)
if (*itr%2 == 0)
itr = coll.erase(itr);
I liked this solution. No tricks, easy to write, easy to read. But it crashed. Why? Because erase() could return coll.end(), which the for loop would try to increment, which wasn't allowed. So the solution looks like this:
set<int> coll;
/* ... */
set<int>::iterator itr=coll.begin();
while ( itr!=coll.end() )
{
if (*itr%2 == 0)
itr = coll.erase(itr);
else
++itr;
}
The Microsoft version of STL implements map::erase() by returning an iterator, but this is non-standard. I therefore ran into trouble when I switched to STLport and
set<int> coll;
/* ... */
set<int>::iterator itr=coll.begin();
while ( itr!=coll.end() )
{
if (*itr%2 == 0)
{
set<int>::iterator next = itr;
++next;
coll.erase(itr);
itr = next;
}
else
{
++itr;
}
}
Later I learned that the code could be simplified:
set<int> coll;
/* ... */
set<int>::iterator itr=coll.begin();
while ( itr!=coll.end() )
{
if (*itr%2 == 0)
coll.erase(itr++);
else
++itr;
}
Note that we can't use the "hack" from the beginning of this article, which would allow us to write
coll.erase(itr--)
and then increment the iterator in a surrounding for loop. The problem is that a scalar can decrement below zero, but an iterator cannot decrement before begin().Personally, I don't like the autoincrement code above as much as when
The question is, why does this trick work? If you've spent much time in C++, you've inevitably learned about the dangers of code like this:
int i = 5;
i = i++;
The code could end with i equal to either 5 or 6 - the result is undefined because the C++ standard only guarantees that ++ will be executed after i is read and before the semi-colon. It could be incremented before or after the assignment happens. (If you are wondering why, remember that assignments in C++ can be embedded in expressions, unlike most languages.)
To reword the guarantee in the language of the C++ Standard, the only guarantee is that ++ will be executed after i is read and before the next sequence point. The sequence point is the key to understanding our call to
In the expression above, the semicolon is the sequence point. Now look at our code example:
coll.erase(itr++);
If you apply the same evaluation rules, the undefined timing of the execution of ++ would mean that the iterator could be evaluated after
- The value of itr is loaded and pushed on the stack for
erase() . - itr is incremented.
erase() is called.
Jim -
ReplyDeleteThere's a bug in that routine: the iterator gets incremented twice after a deletion, which can cause a number to be skipped. If the set starts out with 1 2 4 5 then after the loop it will contain 1 4 5.
I think there should be no increment in the for() and the body needs to be:
if( pred(*itr) )
coll.erase( itr++ )
else
++itr;
Hmm.. I'm not sure your final solution works properly either - the %2 and an even number of entries, or the memory layout is probably saving you from an infinite loop or crash. if you insert 1,2,3,4,5,6 into the set, and print out *itr inside the loop (at the start), you'll see the output of 1,2,4,6,5 (or perhaps a crash at the end) rather than the expected 1,2,3,4,5,6. You'll need a bit uglier code to avoid incrementing the iterator twice in the erase passes and to get the proper functionality:
ReplyDeleteitr=coll.begin();
while (iter!=coll.end())
{
if (*itr%2 == 0)
{
coll.erase(itr++);
}
else
{
itr++;
}
}
This gets even uglier if you're calling a method inside of the loop that might potentially modify the collection.
Sam and Anonymous,
ReplyDeleteThanks so much for your feedback, you are absolutely right. That's what I get for writing the post at 1:30am. I'll fix it next week.
I have updated and tested all of the examples to fix the double increment problem. Assuming that I didn't make any mistakes transcribing the HTML entity characters, the code should now be correct.
ReplyDeleteThanks for such a great post about ++ suffix and STL container.
ReplyDeleteThen, i think the most safe code should be:
set coll;
/* ... */
set::iterator itr=coll.begin();
while ( itr!=coll.end() )
{
if (*itr%2 == 0)
{
set::iterator next = itr;
++next;
coll.erase(itr);
itr = next;
}
else
{
++itr;
}
}
Finally, I found an answered that explained why the erase(it++) version works every time. Great article.
ReplyDeleteThanks for the explanation.
ReplyDelete