Implementing “zip” with list comprehensions

zipperI love Python’s “zip” function. I’m not sure just what it is about zip that I enjoy, but I have often found it to be quite useful. Before I describe what “zip” does, let me first show you an example:

>>> s = 'abc'
>>> t = (10, 20, 30)

>>> zip(s,t)
[('a', 10), ('b', 20), ('c', 30)]

As you can see, the result of “zip” is a sequence of tuples. (In Python 2, you get a list back.  In Python 3, you get a “zip object” back.)  The tuple at index 0 contains s[0] and t[0]. The tuple at index 1 contains s[1] and t[1].  And so forth.  You can use zip with more than one iterable, as well:

>>> s = 'abc'
>>> t = (10, 20, 30)
>>> u = (-5, -10, -15)

>>> list(zip(s,t,u))
[('a', 10, -5), ('b', 20, -10), ('c', 30, -15)]

(You can also invoke zip with a single iterable, thus ending up with a bunch of one-element tuples, but that seems a bit weird to me.)

I often use “zip” to turn parallel sequences into dictionaries. For example:

>>> names = ['Tom', 'Dick', 'Harry']
>>> ages = [50, 35, 60]

>>> dict(zip(names, ages))
{'Harry': 60, 'Dick': 35, 'Tom': 50}

In this way, we’re able to quickly and easily product a dict from two parallel sequences.

Whenever I mention “zip” in my programming classes, someone inevitably asks what happens if one argument is shorter than the other. Simply put, the shortest one wins:

>>> s = 'abc'
>>> t = (10, 20, 30, 40)
>>> list(zip(s,t))
[('a', 10), ('b', 20), ('c', 30)]

(If you want zip to return one tuple for every element of the longer iterable, then use “izip_longest” from the “itertools” package.)

Now, if there’s something I like even more than “zip”, it’s list comprehensions. So last week, when a student of mine asked if we could implement “zip” using list comprehensions, I couldn’t resist.

So, how can we do this?

First, let’s assume that we have our two equal-length sequences from above, s (a string) and t (a tuple). We want to get a list of three tuples. One way to do this is to say:

[(s[i], t[i])              # produce a two-element tuple
 for i in range(len(s))]   # from index 0 to len(s) - 1

To be honest, this works pretty well! But there are a few ways in which we could improve it.

First of all, it would be nice to make our comprehension-based “zip” alternative handle inputs of different sizes.  What that means is not just running range(len(s)), but running range(len(x)), where x is the shorter sequence. We can do this via the “sorted” builtin function, telling it to sort the sequences by length, from shortest to longest. For example:

>>> s = 'abcd'
>>> t = (10, 20, 30)

>>> sorted((s,t), key=len)
[(10, 20, 30), 'abcd']

In the above code, I create a new tuple, (s,t), and pass that as the first parameter to “sorted”. Given these inputs, we will get a list back from “sorted”. Because we pass the builtin “len” function to the “key” parameter, “sorted” will return [s,t] if s is shorter, and [t,s] if t is shorter.  This means that the element at index 0 is guaranteed not to be longer than any other sequence. (If all sequences are the same size, then we don’t care which one we get back.)

Putting this all together in our comprehension, we get:

>>> [(s[i], t[i])    
    for i in range(len(sorted((s,t), key=len)[0]))]

This is getting a wee bit complex for a single list comprehension, so I’m going to break off part of the second line into a function, just to clean things up a tiny bit:

>>> def shortest_sequence_range(*args):
        return range(len(sorted(args, key=len)[0]))

>>> [(s[i], t[i])     
    for i in shortest_sequence_range(s,t) ]

Now, our function takes *args, meaning that it can take any number of sequences. The sequences are sorted by length, and then the first (shortest) sequence is passed to “len”, which calculates the length and then returns the result of running “range”.

So if the shortest sequence is ‘abc’, we’ll end up returning range(3), giving us indexes 0, 1, and 2 — perfect for our needs.

Now, there’s one thing left to do here to make it a bit closer to the real “zip”: As I mentioned above, Python 2’s “zip” returns a list, but Python 3’s “zip” returns an iterator object. This means that even if the resulting list would be extremely long, we won’t use up tons of memory by returning it all at once. Can we do that with our comprehension?

Yes, but not if we use a list comprehension, which always returns a list. If we use a generator expression, by contrast, we’ll get an iterator back, rather than the entire list. Fortunately, creating such a generator expression is a matter of just replacing the [ ] of our list comprehension with the ( ) of a generator expression:

>>> def shortest_sequence_range(*args):
      return range(len(sorted(args, key=len)[0]))

>>> g = ((s[i], t[i])
         for i in shortest_sequence_range(s,t) )

>>> for item in g:
        print(item)
('a', 10)
('b', 20)
('c', 30)

And there you have it!  Further improvements on these ideas are welcome — but as someone who loves both “zip” and comprehensions, it was fun to link these two ideas together.

17 thoughts on “Implementing “zip” with list comprehensions”

    1. Ah, of course! I always forget that min and max take the “key” parameter, as well. So I could instead say:

      return range(len(min(args, key=len)))

      and it’ll be clearer, while doing the same thing.

  1. Good article. For completeness you could add at the end a version of izip where, instead of the list comprehension

    g = [(s[i], t[i])
    for i in shortest_sequence_range(s,t) ]

    you instead use the generator version:

    g = ((s[i], t[i])
    for i in shortest_sequence_range(s,t))

    The subsequent `for item in g` will work unchanged.

    1. I’ve played with Clojure and loved what I saw, but haven’t had time to do more with it. But yes, that makes total sense.

      When I mention to my students that this kind of functional style is typical in Lisp, and that everyone should learn Lisp, there’s a clear divide between the people who learned Lisp (and generally dislike it, rolling their eyes at my comment) and those who have no idea what I’m talking about. Good ol’ 6.001, which introduced me to Lisp (well, Scheme) brainwashed me, too!

  2. Built-in zip accepts iterables, not necessarily finite, and is not restricted to sortable sequences. It is essentially the same as 2.x itertools.izip. The generator code in the 2.x doc is quite simple and elegant.

    def izip(*iterables):
    # izip(‘ABCD’, ‘xy’) –> Ax By
    iterators = map(iter, iterables)
    while iterators:
    yield tuple(map(next, iterators))

    ‘while iterators’ instead of ‘while True’ skips when there are no iterables.

    1. Ah, good point. I guess I hadn’t thought about the possibility of zipping an infinitely long iterable. I’d only suggest trying to sort an infinitely long data structure if you’re paid by the hour.

  3. Without importing anything, the following should work on just about any sequence up to 18,446,744,073,709,551,616 items in length. It is important to realize that zip does works on sequences that do not have any predetermined size. If functions or imports were allowed and not just expressions, the code could be easier to read and not quite as convoluted.

    my_zip = lambda *sequences, generator=lambda iterators, cutoff=1<<64: (
    record for record in (tuple(next(
    iterator) for iterator in iterators) for _ in range(
    cutoff)) if record or next(iter(()))): generator(tuple(iter(
    sequence) for sequence in sequences))

  4. Funniest comprehension in my opinion is a list flattener:
    >>> x = [[1],[2],[3]]
    >>> [x for x in x for x in x]
    [1, 2, 3]

    Cheers

    1. That’s great! While we’re at it, we can even make it unpronounceable (a la Slashdot):

      [fore for fore in fore for fore in fore]
      [inn for inn in inn for inn in inn]
      [foreign for foreign in foreign for foreign in foreign]

  5. My previous comment needs to be corrected. The code should be:

    my_zip = lambda *a, b=lambda c, d, e=1<<64: (f for f in (tuple(next(g) for g in c) for _ in range(e)) if len(f) == d or next(iter(()))): b(tuple(iter(h) for h in a), len(a))

    It was failing to work properly with some inputs. Here is one such test case:

    list(my_zip(range(4), 'abc', b'AB'))

  6. Consider this: (both work under Python2 and Python3)
    >>> s = ‘abcde’
    >>> t = (10, 20, 30)
    >>> u = ‘wxyz’
    >>>
    >>>
    >>> # list ZIP
    … def my_ZIP(*args):
    … return [[j[i] for j in args] for i in range(len(min(args, key=len)))]

    >>> print(my_ZIP(s,t,u))
    [[‘a’, 10, ‘w’], [‘b’, 20, ‘x’], [‘c’, 30, ‘y’]]
    >>>
    >>>
    >>> # generator ZIP
    … def my_gZIP(*args):
    … return ([j[i] for j in args] for i in range(len(min(args, key=len))))

    >>> g = my_gZIP(s,t,u)
    >>> print(g)
    <generator object my_gZIP.. at 0x1013e19e8>
    >>> print([x for x in g])
    [[‘a’, 10, ‘w’], [‘b’, 20, ‘x’], [‘c’, 30, ‘y’]]
    >>>
    >>>

    1. For better readability, change my_gZIP() to:

      my_gZIP(*args):
      return ([a[i] for a in args] for i in range(len(min(args, key=len))))

Leave a Reply

Your email address will not be published. Required fields are marked *

sixty eight − = fifty eight