A quick introduction to implementing Python iterators

When you put a piece of Python data into a “for” loop, the loop doesn’t execute on the data itself.  Rather, it executes on the data’s “iterator.”  An iterator is an object that knows how to behave inside a loop.

Let’s take that apart.  First, let’s assume that I say:

for letter in 'abc':
    print(letter)

I’m not really iterating over 'abc'.  Rather, I’m iterating over the iterator object that I got from 'abc'.  That happens invisibly, behind the scenes, but it happens all the same.  We can get the iterator of any iterable object with the iter() function:

>>> s = 'abc'

>>> iter(s)
<iterator at 0x10a47f150>

>>> iter(s)
<iterator at 0x10a47f190>

>>> iter(s)
<iterator at 0x10a47f050>

Notice that each time we invoke iter(s), we get back a new and different object.  (We can tell, because there is a different address in memory for each one.)  That’s because each iterator can be used only once.  Once you get to the end of an iterator object, it is exhausted and cannot be rewound; to go through the data again, you need to get a new one.
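Here is a small sketch of what "new and different object" means in practice: two iterators obtained from the same string advance independently of one another.  The names "first" and "second" are just for illustration.

```python
s = 'abc'

first = iter(s)
second = iter(s)

# Two calls to iter() on the same string give two distinct objects
print(first is second)    # False

# Advancing one iterator does not affect the other
print(next(first))        # a
print(next(first))        # b
print(next(second))       # a
```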

OK, so what can we do with this iterator object?  Why do we care about it so much?  Because we can invoke the next() function on it.  Each time we do so, we’re basically telling the object that we want to get the next piece of data that it’s providing:

>>> i = iter(s)

>>> next(i)
'a'

>>> next(i)
'b'

>>> next(i)
'c'

So far, so good: Each time we invoke next(i), we ask our iterator object (i) to give us the next element.  But there are only three elements in s, which raises the question of what we’ll get when we invoke next() another time:

>>> next(i)
StopIteration

In other words, Python raises an exception (StopIteration) when we get to the end.  We can now invoke next(i) as many times as we want; we’ll always get StopIteration, which indicates that there is nothing more to get.
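To see this for yourself, the following sketch consumes an iterator, asks it for more, and then gets a fresh iterator from the same string.  The names "consumed" and "fresh" are mine, chosen only for this example.

```python
s = 'abc'
i = iter(s)

# Consume all three elements
consumed = [next(i), next(i), next(i)]
print(consumed)           # ['a', 'b', 'c']

# The iterator is now exhausted; every further call raises StopIteration
for attempt in range(2):
    try:
        next(i)
    except StopIteration:
        print('StopIteration on extra call', attempt + 1)

# A fresh iterator from the same string starts over at the beginning
fresh = iter(s)
print(next(fresh))        # a
```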

You can thus think of a “for” loop as a “while” loop that catches the StopIteration exception, and then leaves the loop when it happens. Consider this function:

def myfor(data):
    i = iter(data)
    while True:
        try:
            print(next(i))
        except StopIteration:
            break

Now, this “myfor” function only prints the elements of the sequence it was given, so it’s not really a replacement for a “for” loop.  But it’s not a bad way to begin to understand how these things work.  Our function starts off by getting an iterator for our data.  It then assumes that we are going to iterate forever on the object, using the “while True” infinite loop.  However, we know that when the iterator is done providing elements of data, next(i) will raise StopIteration.  At that point, we’ll catch the exception and break out of the loop, which ends the function.
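If you want something a bit closer to a real loop, one possible variation is to pass in a function that plays the role of the loop body.  This is only a sketch; the "action" parameter is my own addition, not anything built into Python.

```python
def myfor(data, action):
    """Call action(item) for each item in data, mimicking a "for" loop."""
    i = iter(data)
    while True:
        try:
            item = next(i)
        except StopIteration:
            break          # no more items, so leave the loop
        action(item)       # the "body" of the loop (hypothetical parameter)

collected = []
myfor('abc', collected.append)
print(collected)           # ['a', 'b', 'c']
```

Note that action(item) sits outside the "try" block, so that only the StopIteration raised by next(i) ends the loop.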

Let’s assume that you want to make instances of your class iterable.  This means that when we invoke iter() on an instance of your class, we’ll want to get back an iterator.  In other words, we’ll want to get back an object on which we can invoke next(), and get either the next object or the StopIteration exception.

The easiest way to do this is to define both __iter__ (which is invoked when you run iter() on an object) and __next__ (which is invoked when you run next() on an iterator) within your class.  That is, you’ll define __iter__ to return self, because the object is its own iterator.  And you’ll define __next__ to return the next piece of data in turn, or to raise StopIteration if there is no more data.

Remember that in an iterator, there is no “previous” or “reset” or anything of the sort.  All you can do is move forward, one item at a time, until you get to the end.

So let’s say that I want to define a simple iterator, one that returns the elements of a piece of data.  (Yes, basically what you already get built in by Python.)  We can say:

class MyIter(object):
    def __init__(self, data):
        self.data = data
        self.index = 0
    def __iter__(self):
        return self
    def __next__(self):   # In Python 2, this method is named "next"
        if self.index >= len(self.data):
            raise StopIteration
        value = self.data[self.index]
        self.index += 1
        return value

Now I can say:

>>> m = MyIter('abc')
>>> for letter in m:
...     print(letter)

and it will work!

You can take any class you want, and make it into an iterator by adding the __iter__ method (which returns self) and the __next__ (or in Python 2, “next”) method.  Once you have done that, instances of your class can be put inside of “for” loops, list comprehensions, or anything else that expects an “iterable” type of data.
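As a quick check, here is the same MyIter class (Python 3 version) used in a list comprehension.  The sketch also shows the "no reset" rule in action: because the object is its own iterator, a second pass over the same instance produces nothing.

```python
class MyIter(object):
    def __init__(self, data):
        self.data = data
        self.index = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.index >= len(self.data):
            raise StopIteration
        value = self.data[self.index]
        self.index += 1
        return value

m = MyIter('abc')

# Works in a list comprehension, just as in a "for" loop
upper = [letter.upper() for letter in m]
print(upper)       # ['A', 'B', 'C']

# m is its own iterator, so it is now exhausted; a second pass yields nothing
again = [letter for letter in m]
print(again)       # []

# To iterate again, create a new instance
print(list(MyIter('abc')))   # ['a', 'b', 'c']
```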
