Thinking with “map”

In the free Webinar I gave yesterday about functional programming, I mentioned that “map,” or its equivalent (e.g., Python’s list comprehensions), is a powerful tool that I use nearly every day. Once you get into the functional mode of thinking, you’re constantly finding ways to turn one collection into another collection. It’s a mindset that takes time and practice, but allows you to solve many problems quickly and easily. The trick is to see beyond the initial problem, and to understand how you can think in terms of a source collection and a target collection.

For example, I was teaching an introductory Python course just today, and someone came to me and asked how they can turn a URL query string (e.g., x=1&y=2&z=abc) into a dictionary. Now, this isn’t a super-hard problem, but the reaction on his face to the way in which I solved it demonstrated how he would have used a completely different approach, and that functional thinking didn’t even cross his mind.

The first thing to notice is that in a query string, you’ve got name-value pairs separated by & signs. So the first task is to take the query string, and turn it into a list:

>>> query_string = 'x=1&y=2&z=abc'
>>> query_string.split('&')
['x=1', 'y=2', 'z=abc']

Now that we have these items in a list, we can transform each of them. But wait: transform them? Yes, and that’s where the “map” mindset comes in. You want to be moving your data into a list, which allows you to transform each element into another one. In this case, I want to transform each of the elements of the list into a key-value pair for our dictionary.

Fortunately, we see that each name-value pair has a “=” sign between the name and the value. We can use that to our advantage, splitting each of the pairs:

>>> [item.split('=') for item in query_string.split('&')]
[['x', '1'], ['y', '2'], ['z', 'abc']]

In other words, we have now created a list of lists, in which the first element of each sub-list is our intended dictionary key, and the second element is our intended dictionary value.

Well, we can use dict() to construct dictionaries in Python. And whadaya know, it works just fine with a sequence of two-element sequences. We normally think of feeding dict() a list of tuples, but it turns out that a list of lists works just fine, as well:

>>> dict([item.split('=') for item in query_string.split('&')])
{'x': '1', 'y': '2', 'z': 'abc'}

And just like that, we’ve created our dictionary.

Of course, we could also use a dictionary comprehension:

>>> { item.split('=')[0] : item.split('=')[1]
    for item in query_string.split('&') }
{'x': '1', 'y': '2', 'z': 'abc'}

Now, none of the steps here was particularly difficult. Indeed, while the syntax of comprehensions can be a bit complex, the real difficulty here was seeing the original string, and immediately thinking, “Wait, if I can just turn that into a list, then I can easily create a dictionary from that.”

These sorts of transformations are everywhere, and they allow us to take seemingly difficult tasks and turn them into relatively simple ones.
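The steps above can be collected into one small function. This is only a minimal sketch: the name query_to_dict is made up for illustration, the code is written for Python 3, and it assumes that every pair contains an “=” and that nothing is URL-encoded. It splits each pair on the first “=” only, so a value that itself contains “=” survives. For real-world query strings, the standard library’s urllib.parse.parse_qsl does the splitting and decoding for you.

```python
from urllib.parse import parse_qsl


def query_to_dict(query_string):
    """Turn 'x=1&y=2' into {'x': '1', 'y': '2'} (hypothetical helper)."""
    # Source collection: the name-value pairs. Target collection: a dict.
    # Passing 1 as maxsplit keeps any '=' inside the value intact.
    return dict(item.split('=', 1) for item in query_string.split('&'))


print(query_to_dict('x=1&y=2&z=abc'))
# The standard library yields the same pairs, as a list of tuples:
print(dict(parse_qsl('x=1&y=2&z=abc')))
```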

The relative speeds of str.format and %

My most recent blog post talked about the use of str.format instead of the % operator for interpolating values into strings. Some people who read the post wondered about their relative speeds.

I should first note that my first response to this is: I don’t really care that much. I’m not saying that speed isn’t important, or that optimization should never be done. Rather, my philosophy is that people are expensive and computers are cheap — and thus, anything we do to make people more productive, even if that comes at the expense of program speed, is probably fine.

Of course, that’s not always going to be true. Sometimes, you need (or just want) to squeeze more out of your computer. And to be a good programmer, you also need to know the relative advantages and disadvantages of the techniques you’re using.

So I decided to run a few quick benchmarks on the relative speeds of str.format and %. Sure enough, the % operator was a lot faster. I ran my benchmarks with the magic %timeit command that is built into the IPython interactive shell. (If you’re using Python and aren’t using IPython, you should really switch ASAP.) Note that in order to make things easier to read, I’m removing the IPython input and output prompts, and using >>> to denote where I entered text.

>>> name = 'Reuven'
>>> %timeit 'Hello there, {}'.format(name)
1000000 loops, best of 3: 243 ns per loop

>>> %timeit 'Hello there, %s' % name
10000000 loops, best of 3: 147 ns per loop

Wow. As you can see, %timeit executed each of these lines of code at least 1,000,000 times. It then reported the best time per loop out of three runs. The % operator was about 100 ns faster per loop than str.format. That shouldn’t come as a huge surprise, given that % is an operator (and thus doesn’t require a method call), doesn’t handle indexes and attributes, and can (I’m guessing) pass a great deal of its work off to C’s printf function.

Then again, is 100 ns really that long to wait for a formatted string?  I’m not so sure, to be honest.
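If you don’t have IPython at hand, a similar comparison can be run with the standard library’s timeit module. The following is a rough sketch for Python 3.5 or later (the globals argument requires it); the loop count of 100,000 is an arbitrary choice, and the numbers you get will depend on your machine and Python version, so don’t expect them to match the ones shown here.

```python
import timeit

NUMBER = 100000   # arbitrary loop count, chosen to keep the run short
name = 'Reuven'

# globals= lets the timed statement see the variable defined above.
format_time = timeit.timeit("'Hello there, {}'.format(name)",
                            globals={'name': name}, number=NUMBER)
percent_time = timeit.timeit("'Hello there, %s' % name",
                             globals={'name': name}, number=NUMBER)

# timeit.timeit returns the total seconds for all loops; convert to ns per loop.
print('str.format: {:.0f} ns per loop'.format(format_time / NUMBER * 1e9))
print('% operator: {:.0f} ns per loop'.format(percent_time / NUMBER * 1e9))
```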

What happens if we perform more than one interpolation?

>>> first = 'Reuven'
>>> last = 'Lerner'
>>> %timeit 'Hello there, {} {}'.format(first, last)
1000000 loops, best of 3: 371 ns per loop

>>> %timeit 'Hello there, %s %s' % (first, last)
1000000 loops, best of 3: 243 ns per loop

Each of these takes significantly longer to run than was the case with a single replacement. The difference between them has grown slightly, to about 130 ns per loop. That is still not something to worry about too much, but the difference does exist.

What if I make the strings space-padded?

>>> %timeit 'Hello there, {:10} {:15}'.format(first, last)
1000000 loops, best of 3: 459 ns per loop

>>> %timeit 'Hello there, %10s %15s' % (first, last)
1000000 loops, best of 3: 254 ns per loop

Now we see an even starker difference between the two ways of handling things. What about something like floating-point math, which takes longer?

>>> import math
>>> %timeit 'I love to eat {}'.format(math.pi)
1000000 loops, best of 3: 587 ns per loop

>>> %timeit 'I love to eat %f' % math.pi
1000000 loops, best of 3: 354 ns per loop

Limiting the number of decimals shown doesn’t seem to change the outcome very much:

>>> %timeit 'I love to eat {:.3}'.format(math.pi)
1000000 loops, best of 3: 582 ns per loop

>>> %timeit 'I love to eat %.3f' % math.pi
1000000 loops, best of 3: 329 ns per loop
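One caveat worth keeping in mind when reading the floating-point timings: the two sides of each pair do not produce identical strings, so they are not doing exactly the same work. A quick check, assuming Python 3 (where a bare {} shows the full repr of a float):

```python
import math

print('{}'.format(math.pi))      # 3.141592653589793
print('%f' % math.pi)            # 3.141593  (%f defaults to six decimals)
print('{:.3}'.format(math.pi))   # 3.14      (three significant digits)
print('%.3f' % math.pi)          # 3.142     (three digits after the point)
print('{:.3f}'.format(math.pi))  # 3.142     (the closest match to %.3f)
```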

UPDATE: Several people on Reddit took me to task for failing to consider the overhead of the str.format method call. I mentioned this briefly above, but should have realized that there was an easy way to avoid this, namely aliasing the attributes (the method str.format and the float math.pi) to local variables:

>>> f = 'I love to eat {:.3}'.format
>>> p = math.pi
>>> %timeit f(p)
1000000 loops, best of 3: 489 ns per loop

>>> %timeit 'I love to eat %f' % p
1000000 loops, best of 3: 370 ns per loop

We still see significant overhead. Again, I’m guessing that a lot of this has to do with the overhead of a method vs. an operator. I’m not about to start looking at the bytecodes; this wasn’t meant to be a super-deep investigation or benchmark, but rather a quick check and comparison, and I think that on that front, it did the trick.

So, what have we learned?

  • Yes, str.format is slower than %.
  • The number of parameters you pass to str.format, and whether you then adjust the output with padding or a specified number of decimals, significantly influences the execution speed.
  • That said, in many programs, the difference in execution speed is often 100 ns, which is not enough to cause trouble in many systems.

If speed is really important to you, then you should probably use %, and not str.format. However, if speed is more important than the maintainability or readability of your code, then I’d argue that Python is probably a poor choice of programming language.

Teaching an old dog new tricks — or, how I learned to love Python’s str.format, and gave up on %

I have been programming in Python for many years. One of the things that I wondered, soon after starting to work in Python, was how you can get Perl-style variable interpolation. After all, Perl (like the Unix shell) supports two types of quotes — single quotes (in which everything is taken literally) and double quotes (in which variables’ values are inserted). Thus, in Perl, you can do something like:

$name = 'Reuven';
print "Hello, $name\n";

And sure enough, it’ll print “Hello, Reuven”.

Because single and double quotes are equivalent in Python (so long as they come as a matched set), there is no variable interpolation. The technique that I learned years ago, when I started with Python, was that you could use the % operator on a string. In this context, % looks to the string on its left, determines how many values within the string need to be replaced, and then looks right to find those values. It then returns a new string, effectively interpolating the values. For example:

>>> name = 'Reuven'
>>> "Hello, %s" % name

'Hello, Reuven'

The above Python code works just fine, returning a string with a nice, personalized greeting. And indeed, for the length of my time working with Python, I have enjoyed using this % syntax. Yes, it’s a bit weird. And no, I cannot ever remember more than the absolute basics of printf’s various % codes, meaning that I either make everything a string (with %s), or I guess, or I look up the printf codes and what they do. But to be honest, I normally just use %s, and thus benefit additionally from the fact that Python will silently invoke “str” on the parameter.

The thing is, % is supposedly going away, or is at least deprecated. (A note on the python-dev list indicates that % will go away no sooner than 2022, which is a heckuva long time from now.) As of Python 2.6, not to mention Python 3.x, we have been told that it will eventually disappear, and that we shouldn’t use % any more. Instead, we should use the str.format method.

I have always mentioned str.format to my Python classes, but to be honest, I’ve usually relied upon % when giving live demonstrations and answering questions. And I would even encourage my students to use the % syntax, in part because I found it to be so much easier.

And yet.  I knew that I was doing something wrong, and I knew that I was probably misleading my students to some degree. Thus, in the last three classes I taught, I started to push harder in the direction of str.format. And that’s when I realized two things: (1) It’s just as easy as %, and actually easier in some ways, and (2) I hadn’t learned enough about str.format to use it, beyond the simplest ways. I thus spent a great deal of time researching it — and found out that str.format, while it takes some getting used to, is more than worth the effort.

Let’s start with the simplest case. I’d like to be able to say “Good morning” to someone, using both their first and last names. Assuming that I have variables named “first” and “last”, I can do this with the old syntax as follows:

>>> first = 'Reuven'
>>> last = 'Lerner'
>>> "Good morning, %s %s" % (first, last)

'Good morning, Reuven Lerner'

In this example, we already see one of the problems with the % syntax — namely, that if we have more than one value, we need to put it into a tuple. Perhaps this is logical and reasonable from Python’s perspective, but I can assure you that it surprises many of my students.
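The tuple requirement has a flip side that trips people up: when the single value you want to display happens to be a tuple itself, % unpacks it. Here is a small illustration; the variable name point is just an example.

```python
point = (3, 4)

try:
    "Location: %s" % point        # % unpacks the tuple: two values, one %s
except TypeError as error:
    print('TypeError:', error)

print("Location: %s" % (point,))      # wrap it in a one-element tuple
print("Location: {}".format(point))   # str.format needs no wrapping
```

Both of the last two lines print “Location: (3, 4)”.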

So, how would we do it using str.format? Pretty similarly, in many ways:

>>> "Good morning, {} {}".format(first, last)

'Good morning, Reuven Lerner'

Notice that we’ve changed things a bit here. No longer are we invoking a binary operator (%) on the string. Rather, we’re invoking a string method that takes a set of parameters. This is more logical and consistent. I can’t tell you how many of my students think that % is somehow connected to “print”, when in fact it’s connected (in the case of string formatting) to strings. Having to put the “.format” at the end of the string makes the method call more obvious.

As you might already know, the “{} {}” in the string tells str.format to take its two parameters, and to insert them, in order, into the string. Because there are two arguments, we can have at most two {} inside of the string. This is a bit harder to understand, both because having {} in Python reminds many people of a dictionary, and because the empty curly braces look a bit weird. But fine, I can live with that, and got used to it very quickly.

Where str.format quickly shows its advantages over %, however, is if I want to display the input parameters in reverse order. When I use % with a tuple of values, there is no real way to do that. Plus, if I want to reuse a value from that tuple, I cannot do so. With str.format, I can swap the order in which the inputs are displayed:

>>> "Good morning, {1} {0}".format(first, last)

'Good morning, Lerner Reuven'

Notice what happened in the above: If I just use “{} {}”, then str.format uses the two parameters in order. However, I’m also able to treat the parameters as a sequence, with indexes starting at 0. I can then insert them in reverse order, as I did above, or in the regular order:

>>> "Good morning, {0} {1}".format(first, last)

'Good morning, Reuven Lerner'

Note that if you explicitly state the field numbers, then you cannot rely on the automatic numbering.
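A short sketch of both points: explicit indexes let you reuse an argument, and mixing the automatic and explicit styles in one string raises an error.

```python
first, last = 'Reuven', 'Lerner'

# Explicit indexes let one argument appear more than once.
greeting = "{0} {1}? Yes, {0}!".format(first, last)
print(greeting)   # Reuven Lerner? Yes, Reuven!

# Mixing automatic ({}) and explicit ({1}) numbering raises ValueError.
try:
    "{} {1}".format(first, last)
except ValueError as error:
    print('ValueError:', error)
```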

Of course, this lets me also pass a sequence of values to be inserted, so long as we then use the splat (*) operator on it, to turn it into a parameter list:

>>> names = ('Reuven', 'Lerner')
>>> "Good morning, {} {}".format(*names)

'Good morning, Reuven Lerner'

You can also call str.format with keyword arguments. When you do this, you can then put a keyword name within the {}:

>>> "Good morning, {first} {last}".format(first='Reuven', last='Lerner')

'Good morning, Reuven Lerner'

The above really appeals to me. The named parameters are explicit (if a bit long), and the use of {first} and {last} is quite readable — certainly more so than %(first)s ever was with the % operator!

I can, of course, also pass a dictionary of names, using the ** operator to turn it into a set of keyword arguments:

>>> person = {'first':'Reuven', 'last':'Lerner'}
>>> "Good morning, {first} {last}".format(**person)

'Good morning, Reuven Lerner'
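For comparison, here is the same dictionary used with the older % syntax, along with str.format_map, which (in Python 3.2 and later) accepts the mapping directly instead of unpacking it with **. This is just a side-by-side sketch; all three produce the same string.

```python
person = {'first': 'Reuven', 'last': 'Lerner'}

old_style = "Good morning, %(first)s %(last)s" % person
new_style = "Good morning, {first} {last}".format(**person)
# format_map takes the mapping itself, without ** unpacking.
mapped = "Good morning, {first} {last}".format_map(person)

print(old_style, new_style, mapped, sep='\n')
```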

I described all of these to my students in the last month, and I was pleasantly surprised to see how comfortable they were with the syntax. I’m sure that this reflects, to some degree, my comfort with the syntax, as well.

I should note that you can combine numeric and keyword arguments when working with str.format. I really suggest that you not do so. The results would look like this:

>>> person = {'first':'Reuven', 'last':'Lerner'}
>>> "Good {0}, {first} {last}".format('morning', **person)

'Good morning, Reuven Lerner'

Yukko.

Now, the one thing that would appear to be missing from str.format is… well, formatting! The bad news is that str.format has a completely different way of indicating how you want to format output. The good news is that it’s not too hard to learn and understand.

Let’s start with the easiest part: If you want to display a string within a fixed-width field, then you can do so by adding a colon (:) and then a number.  So to put my name in a fixed-width field of 10 spaces, we would say:

>>> "Your name is {name:10}".format(name="Reuven")

'Your name is Reuven    '

(Notice the trailing spaces after my name.)

In the above example, my name is left-justified. If I want it to be right-justified, I could use a > sign between the : and the number:

>>> "Your name is {name:>10}".format(name="Reuven")

'Your name is     Reuven'

And yes, I could have used an optional < symbol to say that my name should be left-justified within the field of 10 spaces in the first example.  Or I could center the text in a field of 10 spaces with the ^ specifier instead of < or >.

To pad the string with something other than a space, we specify it before the <, >, or ^ character. For example, if I’m moving to Hollywood, then perhaps I should do something like this:

>>> "Your name is {name:*^10}".format(name="Reuven")

'Your name is **Reuven**'

If I want to put the string in the (default) left-most position of the field, filling with characters on the right, then I must use the < specifier, so that the text will be on the left, and the stars on the right.
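Here are the alignment and fill options side by side, using repr so that the padding is visible. The field width of 10 is the same as in the examples above.

```python
name = 'Reuven'

print(repr('{:<10}'.format(name)))    # 'Reuven    '  left-justified (default for strings)
print(repr('{:^10}'.format(name)))    # '  Reuven  '  centered
print(repr('{:>10}'.format(name)))    # '    Reuven'  right-justified
print(repr('{:*<10}'.format(name)))   # 'Reuven****'  fill character needs an explicit <
print(repr('{:*>10}'.format(name)))   # '****Reuven'
```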

So it’s pretty clear that str.format is pretty snazzy when it comes to text. How about numbers? I wasn’t really sure how things would work here, but it turns out that they’re also quite straightforward. If you’re displaying integers, then you can go ahead and say:

>>> "The price is ${number}.".format(number=123)

'The price is $123.'

So far, we don’t see any difference between passing an integer and a string. And indeed, they share many characteristics. However, we might want to display an integer in a different way. We can do that using one of the (many) modifiers that str.format provides: letters placed after the colon, just before the closing } character. For example, we can get the price in binary (with a trailing “b”), or in hexadecimal (with a trailing “x”), as in the following example:

>>> "The price is ${number:x}.".format(number=123)

'The price is $7b.'

Of course, we can also zero-pad the number, such that it will always take up a fixed width. Just place a 0 between the colon and the width:

>>> "Your call is important to us. You are call #{number:05}.".format(number=123)

'Your call is important to us. You are call #00123.'
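A few more of the numeric modifiers, gathered in one place as a quick reference. These are all part of the standard format specification; the sample values are arbitrary.

```python
number = 123

print('{:b}'.format(number))      # 1111011    binary
print('{:o}'.format(number))      # 173        octal
print('{:x}'.format(number))      # 7b         hexadecimal
print('{:#x}'.format(number))     # 0x7b       hexadecimal with prefix
print('{:05}'.format(number))     # 00123      zero-padded to width 5
print('{:,}'.format(1234567))     # 1,234,567  thousands separator
print('{:.2f}'.format(123.456))   # 123.46     two digits after the point
```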

Notice that inside of the {}, we cannot put executable Python code. Instead, there is a mini-language that is separate and different from Python. However, there are two small exceptions to this rule: (1) We can retrieve any attribute with the standard . notation, and (2) we can retrieve a single item with the [] notation.

For example:

>>> class Foo(object):
...     def __init__(self):
...         self.x = 100
...
>>> f = Foo()
>>> 'Your number is {o.x}'.format(o=f)

'Your number is 100'

Notice how we were able to retrieve the “x” attribute from the “f” object, which we mapped to “o” within the string. However, while you can retrieve an attribute, you cannot execute it. Thus, the following will not work:

>>> "Your name is {name.upper()}".format(name="Reuven")

AttributeError: 'str' object has no attribute 'upper()'

See what happened? I said “name.upper()”, in order to execute the method “str.upper” on “name”.  However, Python doesn’t want me to execute code there. So it takes the name of the attribute literally — and thus complained that there is no attribute “upper()”, with the parentheses. Of course, if you try it without the parentheses, it’ll work, for some value of “work”:

>>> "Your name is {name.upper}".format(name="Reuven")

'Your name is <built-in method upper of str object at 0x1028bf2a0>'

Similarly, we can retrieve an individual element of a sequence or mapping with []. However, we cannot use the slice notation for more than one element. For example:

>>> numbers = list(range(10))    # assumed; any sequence where numbers[3] == 3
>>> "Your favorite number is {n[3]}.".format(n=numbers)

'Your favorite number is 3.'

However:

>>> "Your favorite numbers are {n[2:4]}.".format(n=numbers)

ValueError: Missing ']' in format string

The “:” character, which we use for slices, isn’t available in format strings, because it’s used to control the formatting of the output.

You can, of course, use [] on a dictionary, as well. However — and this is a bit weird for Python — we omit the quote marks, even when our key is a string. For example:

>>> person = {'first':'Reuven', 'last':'Lerner'}
>>> "Your name is {p[first]}.".format(p=person)

'Your name is Reuven.'

If we were to include the quotes…

>>> "Your name is {p['first']}.".format(p=person)

KeyError: "'first'"
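When you do need a method call, a slice, or a quoted key, the simple workaround is to compute the value in ordinary Python and pass the result to str.format. A minimal sketch, in which the contents of numbers are an assumption made for the example:

```python
name = 'Reuven'
numbers = list(range(10))   # assumed sample data
person = {'first': 'Reuven', 'last': 'Lerner'}

# Do the work outside the format string, then pass in the result.
shouted = "Your name is {}.".format(name.upper())
sliced = "Your favorite numbers are {}.".format(numbers[2:4])
looked_up = "Your name is {}.".format(person['first'])

print(shouted)     # Your name is REUVEN.
print(sliced)      # Your favorite numbers are [2, 3].
print(looked_up)   # Your name is Reuven.
```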

There is actually a lot more to str.format than what I have shared here. In particular,  each type has its own format specifications, which means that you can do certain things with floats (e.g., setting the precision) that you cannot do with strings.

You can even add formatting functionality to your own Python classes, such that they’ll be displayed in the way that you want, along with format specifiers that you define.
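The hook for this is the __format__ method, which receives whatever follows the colon in the replacement field. Here is a minimal sketch; the Point class is hypothetical, and passing the spec through to each coordinate is just one possible design.

```python
class Point(object):
    """Hypothetical class, used only to illustrate __format__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __format__(self, spec):
        # Apply whatever follows the colon to each coordinate in turn.
        return '({}, {})'.format(format(self.x, spec), format(self.y, spec))


p = Point(1.0, 2.5)
print('Plain: {}'.format(p))       # Plain: (1.0, 2.5)
print('Fixed: {:.2f}'.format(p))   # Fixed: (1.00, 2.50)
```

A class could just as well define its own spec letters and interpret them however it likes; str.format simply hands over the text after the colon.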

If you want to learn more about this, I’d definitely suggest reading PEP 3101, which describes str.format. I’d also suggest a slide show by Eric Smith, which summarizes things nicely. Finally, the Python documentation has some excellent examples, including a guide for moving from % to str.format.

I hope that this was helpful and useful! If you enjoyed this blog post, check out my many other resources, including my free e-mail course on Python scoping, and my free Webinar on functional programming in Python.

Three Pythonic products: A (free) Webinar, a course, and an ebook

I love developing software.  I also love helping people to learn how to develop better. That’s why I have been teaching programming classes for more than a decade, and why I write about programming. There is so much to learn; it’s a rare day on which I don’t learn something new, and it’s a rare week in which I don’t apply some new understanding to a problem that I’m solving for a client.

Now that I have finished my PhD, I have some more time to focus on creating products that I believe will help people to program better. I’m pleased to announce three initial such products, all aimed at Python developers who want to improve their skills:

  1. A free Webinar (“How functional programming will make you a better Python programmer“), scheduled for Monday, September 15th, at 2 p.m. Eastern Time. The Webinar will consist of a 45-minute presentation, followed by a Q&A period. My goal is to describe functional programming, and then demonstrate some of the functional techniques that are available in Python. If you are a Python programmer who has heard about functional programming but doesn’t really know what it is, then I think you’ll get a lot out of this free seminar. Please register and attend! Even if you’re not interested in my other products, I’d love to see you at this Webinar — if only to learn what topics people do want to learn, and what I should address in future Webinars, books, and online classes.
  2. For those who want to dig deeper into functional programming in Python, I’ll be giving a full-day, live, online course on the subject. This course, which will include lectures, demos, and many exercises, begins with an introduction to functions in Python, then describes how to pass functions as parameters, list/set/dict comprehensions, customizing list.sort and sorted, lambda, map/filter/zip, reduce, and the “functools” module. This class, like all of my classes, will be intense and highly interactive — but will give you a new perspective on programming in general, and particularly in Python. You can register for this one-day course at EventBrite; it’ll be given on Sunday, September 21st and again on Tuesday, September 23rd.
  3. If you have taken a Python course, but feel that you need to practice Python more thoroughly before you feel like you’re truly fluent in the language, then you might be interested in my ebook, “Practice Makes Python.” The book will contain about 50 exercises in various aspects of Python, with the aim of helping you to learn through doing. I hope to release the ebook about a month from now, and am basing many of the exercises on my courses, as well as comments I’ve received from many of my students. If you’re interested in the book, you should sign up for the announcement list, where I’ll be providing sample content and behind-the-scenes information.

These products (well, the paid ones, anyway) come with a full, 100% money-back guarantee. And of course, if you have questions, you can always contact me via e-mail.