In Python, it’s all about the attributes

Newcomers to Python are often amazed to discover how easily we can create new classes. For example:

class Foo(object):
    pass

is a perfectly valid (if boring) class. We can even create instances of this class:

f = Foo()

This is a perfectly valid instance of Foo. Indeed, if we ask it to identify itself, we’ll find that it’s an instance of Foo:

>>> type(f)
<class '__main__.Foo'>

Now, while “f” might be a perfectly valid and reasonable instance of Foo, it’s not very useful. It’s at this point that many people who have come to Python from another language expect to learn where they can define instance variables. They’re relieved to know that they can write an __init__ method, which is invoked on a new object immediately after its creation. For example:

class Foo(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

>>> f = Foo(100, 'abc')

>>> f.x
100
>>> f.y
'abc'

On the surface, it might seem like we’re setting two instance variables, x and y, on f, our new instance of Foo. And indeed, the behavior is something like that, and many Python programmers think in these terms. But that’s not really the case, and the sooner that Python programmers stop thinking in terms of “instance variables” and “class variables,” the sooner they’ll understand how much of Python works, why objects work in the ways that they do, and how “instance variables” and “class variables” are specific cases of a more generalized system that exists throughout Python.

The bottom line is that inside of __init__, we’re adding new attributes to self, the local reference to our newly created object. Attributes are a fundamental part of objects in Python. Heck, attributes are fundamental to everything in Python. The sooner you understand what attributes are, and how they work, the sooner you’ll have a deeper understanding of Python.

Every object in Python has attributes. You can get a list of those attributes using the built-in “dir” function. For example:

>>> s = 'abc'
>>> len(dir(s))
>>> dir(s)[:5]
['__add__', '__class__', '__contains__', '__delattr__', '__doc__']

>>> i = 123
>>> len(dir(i))
>>> dir(i)[:5]
['__abs__', '__add__', '__and__', '__class__', '__cmp__']

>>> t = (1,2,3)
>>> len(dir(t))
>>> dir(t)[:5]
['__add__', '__class__', '__contains__', '__delattr__', '__doc__']

As you can see, even the basic data types in Python have a large number of attributes. We can see the first five attributes by limiting the output from “dir”; you can look at them yourself inside of your Python environment.

The thing is, the attribute names returned by “dir” are strings. How can we use such a string to get or set the value of an attribute? We somehow need a way to translate between the world of strings and the world of attribute names.

Fortunately, Python provides us with several built-in functions that do just that. The “getattr” function lets us get the value of an attribute. We pass “getattr” two arguments: The object whose attribute we wish to read, and the name of the attribute, as a string:

>>> getattr(t, '__class__')

This is equivalent to:

>>> t.__class__

In other words, the dot notation that we use in Python all of the time is nothing more than syntactic sugar for “getattr”. Each has its uses; dot notation is far easier to read, and “getattr” gives us the flexibility to retrieve an attribute value with a dynamically built string.
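To see why that flexibility matters, here is a small sketch (the Point class and its names are mine, not from the text above) that retrieves attributes whose names are built as strings:

```python
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(3, 4)

# The attribute names arrive as strings, so dot notation alone
# can't help us; getattr retrieves each value dynamically.
values = [getattr(p, name) for name in ('x', 'y')]
```

After running this, “values” contains [3, 4], even though we never wrote “p.x” or “p.y” explicitly.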

Python also provides us with “setattr”, a function that takes three arguments: An object, a string indicating the name of the attribute, and the new value of the attribute. There is no difference between “setattr” and using the dot-notation on the left side of the = assignment operator:

>>> f = Foo(100, 'abc')
>>> setattr(f, 'x', 5)
>>> getattr(f, 'x')
5
>>> f.x
5
>>> f.x = 100
>>> f.x
100

As with all assignments in Python, the new value can be any legitimate Python object. In this case, we’ve assigned f.x to be 5 and 100, both integers, but there’s no reason why we couldn’t assign a tuple, dictionary, file, or even a more complex object. From Python’s perspective, it really doesn’t matter.

In the above case, I used “setattr” and the dot notation (f.x) to assign a new value to the “x” attribute. f.x already existed, because it was set in __init__. But what if I were to assign an attribute that didn’t already exist?

The answer: It would work just fine:

>>> f.new_attrib = 'hello'
>>> f.new_attrib
'hello'

>>> f.favorite_number = 72
>>> f.favorite_number
72

In other words, we can create and assign a new attribute value by … well, by assigning to it, just as we can create a new variable by assigning to it. (There are some exceptions to this rule, mainly in that you cannot add new attributes to many built-in classes.) Python is much less forgiving if we try to retrieve an attribute that doesn’t exist:

>>> f.no_such_attribute
AttributeError: 'Foo' object has no attribute 'no_such_attribute'
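One detail worth knowing (not shown above): “getattr” accepts an optional third argument, a default value to return instead of raising AttributeError, and the built-in “hasattr” checks whether an attribute exists at all. A quick sketch, with a minimal Foo of my own:

```python
class Foo(object):
    def __init__(self):
        self.x = 5

f = Foo()

# getattr's optional third argument is returned when the
# attribute doesn't exist, instead of raising AttributeError.
result = getattr(f, 'no_such_attribute', 'default!')

# hasattr tells us, up front, whether the attribute exists.
exists = hasattr(f, 'x')
```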

So, we’ve now seen that every Python object has attributes, that we can retrieve existing attributes using dot notation or “getattr”, and that we can always set attribute values. If the attribute didn’t exist before our assignment, then it certainly exists afterwards.

We can assign new attributes to nearly any object in Python. For example:

def hello():
    return "Hello"

>>> hello.abc_def = 'hi there!'

>>> hello.abc_def
'hi there!'

Yes, Python functions are objects. And because they’re objects, they have attributes. And because they’re objects, we can assign new attributes to them, as well as retrieve the values of those attributes.

So the first thing to understand about these “instance variables” that we oh-so-casually create in our __init__ methods is that we’re not creating variables at all. Rather, we’re adding one or more additional attributes to the particular object (i.e., instance) that has been passed to __init__. From Python’s perspective, there is no difference between saying “self.x = 5” inside of __init__, or “f.x = 5” outside of __init__. We can add new attributes whenever we want, and the fact that we do so inside of __init__ is convenient, and makes our code easier to read.

This is one of those conventions that is really useful to follow: Yes, you can create and assign object attributes wherever you want. But it makes life so much easier for everyone if you assign all of an object’s attributes in __init__, even if it’s just to give it a default value, or even None. Just because you can create an attribute whenever you want doesn’t mean that you should do such a thing.
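For instance, a class along these lines (a made-up example, not from the text above) assigns every attribute in __init__, even those that only get a default:

```python
class Book(object):
    def __init__(self, title, author=None):
        # Assign every attribute here, even if only to a default
        # (or None), so readers can find them all in one place.
        self.title = title
        self.author = author
        self.pages_read = 0

b = Book('Practice Makes Python')
```

Anyone reading this class knows immediately that every Book has “title”, “author”, and “pages_read” attributes, without hunting through the rest of the code.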

Now that you know every object has attributes, it’s time to consider the fact that classes (i.e., user-defined types) also have attributes. Indeed, we can see this:

>>> class Foo(object):
...     pass

Can we assign an attribute to a class?  Sure we can:

>>> Foo.x = 100

Classes are objects, and thus classes have attributes. But it seems a bit annoying and roundabout for us to define attributes on our class in this way. We can define attributes on each individual instance inside of __init__. When is our class defined, and how can we stick attribute assignments in there?

The answer is easier than you might imagine. That’s because there is a fundamental difference between the body of a function definition (i.e., the block under a “def” statement) and the body of a class definition (i.e., the block under a “class” statement). A function’s body is executed only when we invoke the function. However, the body of a class definition is executed immediately, and only once — when we define the class. We can execute code in our class definitions:

class Foo(object):
    print("Hello from inside of the class!")

Of course, you should never do this, but this is a byproduct of the fact that class definitions execute immediately. What if we put a variable assignment in the class definition?

class Foo(object):
    x = 100

If we assign a variable inside of the class definition, it turns out that we’re not assigning a variable at all. Rather, we’re creating (and then assigning to) an attribute. The attribute is on the class object. So immediately after executing the above, I can say:

>>> Foo.x
100

and I’ll get the integer 100 returned back to me.

Are you a little surprised to discover that variable assignments inside of the class definition turn into attribute assignments on the class object? Many people are. They’re even more surprised, however, when they think a bit more deeply about what it must mean to have a function (or “method”) definition inside of the class:

>>> class Foo(object):
        def blah(self):
            return "blah"

>>> Foo.blah
<unbound method Foo.blah>

Think about it this way: If I define a new function with “def”, I’m defining a new variable in the current scope (usually the global scope). But if I define a new function with “def” inside of a class definition, then I’m really defining a new attribute with that name on the class.

In other words: Instance methods sit on a class in Python, not on an instance. When you invoke “f.blah()” on an instance of Foo, Python is actually invoking the “blah” method on Foo, and passing f as the first argument. Which is why it’s important that Python programmers understand that there is no difference between “f.blah()” and “Foo.blah(f)”, and that this is why we need to catch the object with “self”.
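We can verify this equivalence directly; this snippet repeats the Foo class from above:

```python
class Foo(object):
    def blah(self):
        return "blah"

f = Foo()

# These two calls are equivalent: the method lives on the class,
# and the instance is passed as the first argument ("self").
via_instance = f.blah()
via_class = Foo.blah(f)
```

Both calls return the same string, because both end up running the same function with f bound to self.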

But wait a second: If I invoke “f.blah()”, then how does Python know to invoke “Foo.blah”?  f and Foo are two completely different objects; f is an instance of Foo, whereas Foo is an instance of type. Why is Python even looking for the “blah” attribute on Foo?

The answer is that Python has different rules for variable and attribute scoping. With variables, Python follows the LEGB rule: Local, Enclosing, Global, and Builtin. (See my free, five-part e-mail course on Python scopes, if you aren’t familiar with them.)  But with attributes, Python follows a different set of rules: First, it looks on the object in question. Then, it looks on the object’s class. Then it follows the inheritance chain up from the object’s class, until it hits “object” at the top.

Thus, in our case, we invoke “f.blah()”. Python looks on the instance f, and doesn’t find an attribute named “blah”. Thus, it looks on f’s class, Foo. It finds the attribute there, and performs some Python method rewriting magic, thus invoking “Foo.blah(f)”.
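A short sketch of that lookup order, reusing the class-level “x = 100” example from above:

```python
class Foo(object):
    x = 100          # attribute on the class

f = Foo()
from_class = f.x     # no "x" on f, so Python looks on Foo: 100

f.x = 5              # now there IS an "x" on the instance itself
from_instance = f.x  # found on f first; Foo.x is untouched
```

The instance attribute shadows the class attribute, but assigning it never modified Foo.x, which remains 100.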

So Python doesn’t really have “instance variables” or “class variables.” Rather, it has objects with attributes. Some of those attributes are defined on class objects, and others are defined on instance objects. (Of course, class objects are just instances of “type”, but let’s ignore that for now.) This also explains why people sometimes define attributes on a class (“class variables”): they’re visible to the instances. Yes, that works, but it makes more sense in some situations than in others.

What you really want to avoid is creating an attribute on the instance that has the same name as an attribute on the class. For example, imagine this:

class Person(object):
    population = 0
    def __init__(self, first, last):
        self.first = first
        self.last = last
        self.population += 1

p1 = Person('Reuven', 'Lerner')
p2 = Person('foo', 'bar')

This looks all nice, until you actually try to run it. You’ll quickly discover that Person.population remains stuck at 0, but p1.population and p2.population are both set to 1. What’s going on here?

The answer is that the line

self.population += 1

can be turned into

self.population = self.population + 1

As always, the right side of an assignment is evaluated before the left side. Thus, on the right side, we say “self.population”. Python looks at the instance, self, and looks for an attribute named “population”. No such attribute exists. It thus goes to Person, self’s class, and does find an attribute by that name, with a value of 0. It thus returns 0, and executes 0 + 1. That gives us the answer 1, which is then passed to the left side of the assignment. The left side says that we should store this result in self.population — in other words, an attribute on the instance! This works, because we can always assign any attribute. But in this case, we will now get different results for Person.population (which will remain at 0) and the individual instance values of population, such as p1 and p2.
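One common fix (among several) is to assign to the class attribute explicitly, so that the increment never creates an instance attribute:

```python
class Person(object):
    population = 0

    def __init__(self, first, last):
        self.first = first
        self.last = last
        # Increment the class attribute by naming the class
        # directly, so no shadowing instance attribute is created.
        Person.population += 1

p1 = Person('Reuven', 'Lerner')
p2 = Person('foo', 'bar')
```

Now Person.population is 2, and because neither instance has its own “population” attribute, p1.population and p2.population both find the class attribute and report 2 as well.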

We can see which attributes were actually set on the instance and on the class, using a list comprehension:

class Foo(object):
    def blah(self):
        return "blah"

>>> f = Foo()
>>> [attr_name for attr_name in dir(f) if attr_name not in dir(Foo)]
[]

>>> [attr_name for attr_name in dir(Foo) if attr_name not in dir(object)]
['__dict__', '__module__', '__weakref__', 'blah']

In the above, we first define “Foo”, with a method “blah”. That method definition, as we know, is stored in the “blah” attribute on Foo. We haven’t assigned any attributes to f, which means that the only attributes available to f are those in its class.
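If we then do assign an attribute to f, it shows up in exactly this comprehension; a small sketch:

```python
class Foo(object):
    def blah(self):
        return "blah"

f = Foo()
f.x = 5   # create an attribute on the instance itself

# Names visible on f but not on Foo: only what we just added.
only_on_instance = [name for name in dir(f)
                    if name not in dir(Foo)]
```

The comprehension now returns ['x'], confirming that “x” lives on the instance while “blah” lives on the class.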

I hope that this has helped to put Python’s attributes into perspective. If this sort of thing interests you, you might be interested in my full-day, live online class about object-oriented Python, which I’ll next be offering on October 29th.  I’m also giving a full-day, live online class about functional programming in Python on October 27th, and a free (no obligation) hour-long Webinar on October 22nd about functional programming in Python.

Announcing: A free Webinar, and two online courses (functional Python and OO Python)

It’s time again for me to offer a free Webinar, as well as two online courses:

  • I’m repeating my hour-long free Webinar about functional programming in Python on Wednesday, October 22nd.  When I offered it last month, more than 200 people got tickets, and 70 participated.  I had a blast, and nearly 2,000 people have viewed the YouTube version online.  Well, I have updated and improved this seminar, and we’ll be using technology (Google Hangouts On Air) that will let people participate with less hassle.  So if you’ve always wanted to learn about functional programming in Python, or if you have questions you would like to ask, please join me!
  • I’m also repeating my live, day-long class in functional Python programming on Monday, October 27th. This class goes into far more depth than the Webinar, and includes many exercises, as well as explanations of what functional programming is, what techniques are available in Python, and how we can use them to improve our program’s efficiency and maintainability.  We’ll cover some of the most powerful, but also misunderstood, aspects of Python programming, including list/set/dict comprehensions and the oft-maligned “reduce” function.
  • Finally, I’m offering a new live, day-long class in object-oriented Python programming on Wednesday, October 29th. This class starts off by describing what objects are, and quickly works its way toward classes, instance vs. class attributes, methods, class and static methods, inheritance, and even the basics of iteration. In my experience, many people who have been writing object-oriented Python for a year or more can benefit from this class; it’ll really help to solidify the concepts, and help you to understand how Python’s objects work behind the scenes.

These classes are versions of what I have taught many times at such companies as Apple, Cisco, HP, SanDisk, and VMware around the world. They have helped many Python programmers to become more proficient with the language, and to solve bigger and better problems in less time, and without sacrificing maintainability.

Even if you aren’t interested in the paid courses, you’re more than welcome to join the free Webinar.

As always, if you have questions, please let me know at


Sorting lists of dicts — an exercise from “Practice Makes Python”

Cover of “Practice Makes Python”

My ebook, Practice Makes Python, will go on pre-sale one week from today. The book is a collection of 50 exercises that I have used and refined when training people in Python in the United States, Europe, Israel, and China. I have found these exercises to be useful in helping people to go beyond Python’s syntax, and see how the parts of the language fit together.

(By the way, I’ll be giving my functional programming class on October 27th, and my object-oriented programming class on October 29th. Both are full-day classes, taught live and online. Signups for both classes will be announced here more formally in the coming days. Contact me at if you already want details.)

Today, I’m posting an exercise that involves a common complex data structure, a list of dictionaries (aka “dicts”). I welcome feedback on the exercise, my proposed solution, and my discussion of that solution. If you’re trying to improve your Python skills, I strongly encourage you to try to solve the exercise yourself before looking at the answer. You will get much more out of struggling through the solutions to these exercises than from simply reading the answers.

Alphabetizing names

Let’s assume that you have phone book data in a list of dictionaries, as follows:

people = [{'first':'Reuven', 'last':'Lerner', 'email':''},
          {'first':'Barack', 'last':'Obama', 'email':''},
          {'first':'Vladimir', 'last':'Putin', 'email':''}]

First of all, if these are the only people in your phone book, then you should rethink whether Python programming is truly the best use of your time and connections. Regardless, let’s assume that you want to print information about all of these people, but in phone-book order – that is, sorted by last name and then by first name. Each line of the output should just look like this:

LastName, FirstName:


for person in sorted(people, key=lambda person: [person['last'], person['first']]):
    print("{last}, {first}: {email}".format(**person))


While Python’s data structures are useful by themselves, they become even more powerful and useful when combined. Lists of lists, lists of tuples, lists of dictionaries, and dictionaries of dictionaries are all quite common in Python. Learning to work with these is an important part of being a fluent Python programmer.

There are two parts to the above solution. The first is how we sort the names of the people in our list, and the second is how we print each of the people.

Let’s take the second problem first: We have a list of dictionaries. This means that when we iterate over our list, “person” is assigned a dictionary in each iteration. The dictionary has three keys: “first”, “last”, and “email”. We will want to use each of these keys to display each phone-book entry.

It’s true that the “str.format” method allows us to pass individual values, and then to grab those values in numerical order. Thus, we could say:

for person in people:
    print("{0}, {1}: {2}".format(person['last'], person['first'], person['email']))

Starting in Python 2.7, we can even eliminate the numbers, if we are planning to use them in order:

for person in people:
    print("{}, {}: {}".format(person['last'], person['first'], person['email']))

The thing is, we can also pass name-value pairs to “str.format”. For example, we could say:

for person in people:
    print("{last}, {first}: {email}".format(last=person['last'],
                                            first=person['first'],
                                            email=person['email']))

Even if our format string, with the “{first}” and “{last}”, is more readable, the name-value pairs we are passing are annoying to write. All we’re basically doing is taking our “person” dictionary, expanding it, and passing its name-value pairs as arguments to “str.format”.

However, there is a better way: We can take a dictionary and turn it into a set of keyword arguments by applying the “double splat” operator, “**”, on a dictionary. In other words, we can say:

for person in people:
    print("{last}, {first}: {email}".format(**person))

So far, so good. But we still haven’t covered the first problem, namely sorting the list of dictionaries by last name and then first name. Basically, we want to tell Python’s sort facility that before it compares two dictionaries from our “people” list, it should turn the dictionary into a list, consisting of the person’s last and first names. In other words, we want:

{'first':'Vladimir', 'last':'Putin', 'email':''}

to become

['Putin', 'Vladimir']

Note that we’re not trying to sort them as strings. That would work in our particular case, but if two people have *almost* the same last name (e.g., “Lerner” and “Lerner-Friedman”), then sorting them as strings won’t work. Sorting them by lists will work, because Python sorts lists by comparing each element in sequence. One element cannot “spill over” into the next element when making the comparison.
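Here is a quick sketch of that difference, using hypothetical names: compared as lists, a last name that is a prefix of another sorts first, while concatenated strings can flip the order:

```python
a = ['Lerner', 'Zelda']
b = ['Lerner-Friedman', 'Adam']

# As lists, 'Lerner' sorts before 'Lerner-Friedman' (a prefix
# comes first), so Lerner appears before Lerner-Friedman.
list_order = sorted([b, a])

# Concatenated into single strings, the comparison reaches the
# '-' character, which sorts before 'Z', flipping the order.
string_order = sorted([''.join(b), ''.join(a)])
```

With lists, Zelda Lerner comes first; with the concatenated strings, Adam Lerner-Friedman does, because the last name spilled over into the first.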

If we want to apply a function to each list element before the sorting comparison takes place, we can pass a function to the “key” parameter. Thus, we can sort the elements of a list by their lengths by saying:

mylist = ['abcd', 'efg', 'hi', 'j']
mylist.sort(key=len)

After executing the above, “mylist” will now be sorted in increasing order of length, because the built-in “len” function will be applied to each element before it is compared with others. In the case of our alphabetizing exercise, we could write a function that takes a dict and returns the sort of list that’s necessary:

def person_dict_to_list(d):
    return [d['last'], d['first']]

We could then apply this function when sorting our list:

people.sort(key=person_dict_to_list)
Following that, we could then iterate over the now-sorted list, and display our people.

However, it feels wrong to me to sort “people” permanently, if it’s just for the purposes of displaying its elements. Furthermore, I don’t see the point in writing a special-purpose named function if I’m only going to use it once.

We can thus use two pieces of Python which come from the functional programming world — the built-in “sorted” function, which returns a new, sorted list based on its inputs, and the “lambda” operator, which returns a new, anonymous function. Combining these, we thus get to the solution suggested above, namely:

for person in sorted(people, key=lambda person: [person['last'], person['first']]):
    print("{last}, {first}: {email}".format(**person))

This solution does not change the “people” list, but it does sort its elements for the purposes of printing them. And it prints them, in the phone-book order that we wanted, combining the “sorted” function, “lambda” for a built-in anonymous function, and the double-splat (“**”) operator on an argument to “str.format”.
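As an aside (not part of the exercise itself), the standard library’s “operator.itemgetter” can build the same sort key without writing a lambda:

```python
from operator import itemgetter

people = [{'first': 'Reuven', 'last': 'Lerner', 'email': ''},
          {'first': 'Barack', 'last': 'Obama', 'email': ''},
          {'first': 'Vladimir', 'last': 'Putin', 'email': ''}]

# itemgetter('last', 'first') returns a function that extracts
# a ('last', 'first') tuple from each dict -- the same sort key
# as the lambda above, expressed without one.
ordered = sorted(people, key=itemgetter('last', 'first'))
```

Whether this reads better than the lambda is a matter of taste; the sorted output is identical.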

Thinking with “map”

In the free Webinar I gave yesterday about functional programming, I mentioned that “map,” or its equivalent (e.g., Python’s list comprehensions), is a powerful tool that I use nearly every day. Once you get into the functional mode of thinking, you’re constantly finding ways to turn one collection into another collection. It’s a mindset that takes time and practice, but allows you to solve many problems quickly and easily. The trick is to see beyond the initial problem, and to understand how you can think in terms of a source collection and a target collection.

For example, I was teaching an introductory Python course just today, and someone came to me and asked how they can turn a URL query string (e.g., x=1&y=2&z=abc) into a dictionary. Now, this isn’t a super-hard problem, but the reaction on his face to the way in which I solved it demonstrated how he would have used a completely different approach, and that functional thinking didn’t even cross his mind.

The first thing to notice is that in a query string, you’ve got name-value pairs separated by & signs. So the first task is to take the query string, and turn it into a list:

>>> query_string = 'x=1&y=2&z=abc'
>>> query_string.split('&')
['x=1', 'y=2', 'z=abc']

Now that we have these items in a list, we can transform each of them. But wait — transform them?  Yes, and that’s where the “map” mindset comes in. You want to be moving your data into a list, which allows you to transform each element into another one. In this case, I want to transform each of the elements of the list into a dictionary pair.

Fortunately, we see that each name-value pair has a “=” sign between the name and the value. We can use that to our advantage, splitting each of the pairs:

>>> [item.split('=') for item in query_string.split('&')]
[['x', '1'], ['y', '2'], ['z', 'abc']]

In other words, we have now created a list of lists, in which the first element of each sub-list is our intended dictionary key, and the second element is our intended dictionary value.

Well, we can use dict() to construct dictionaries in Python. And whadaya know, it works just fine with a sequence of two-element sequences. We normally think of feeding dict() a list of tuples, but it turns out that a list of lists works just fine, as well:

>>> dict([item.split('=') for item in query_string.split('&')])
{'x': '1', 'y': '2', 'z': 'abc'}

And just like that, we’ve created our dictionary.

Of course, we could also use a dictionary comprehension:

>>> { item.split('=')[0] : item.split('=')[1]
...   for item in query_string.split('&') }
{'x': '1', 'y': '2', 'z': 'abc'}

Now, none of the steps here was particularly difficult. Indeed, while the syntax of comprehensions can be a bit complex, the real difficulty here was seeing the original string, and immediately thinking, “Wait, if I can just turn that into a list, then I can easily create a dictionary from that.”

These sorts of transformations are everywhere, and they allow us to take seemingly difficult tasks and turn them into relatively simple ones.
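As it happens, the standard library already performs this exact transformation: “parse_qsl” returns the query string’s name-value pairs as a list of tuples, ready to feed to dict(). (This wasn’t part of the lesson above, but it’s good to know.)

```python
try:
    from urllib.parse import parse_qsl   # Python 3
except ImportError:
    from urlparse import parse_qsl       # Python 2

query_string = 'x=1&y=2&z=abc'

# parse_qsl returns [('x', '1'), ('y', '2'), ('z', 'abc')],
# the same list-of-pairs shape we built by hand with split.
parsed = dict(parse_qsl(query_string))
```

The hand-rolled version is still worth understanding, though, because the map mindset applies far beyond query strings.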

The relative speeds of str.format and %

My most recent blog post talked about the use of str.format instead of the % operator for interpolating values into strings. Some people who read the post wondered about their relative speeds.

I should first note that my first response to this is: I don’t really care that much. I’m not saying that speed isn’t important, or that optimization should never be done. Rather, my philosophy is that people are expensive and computers are cheap — and thus, anything we do to make people more productive, even if that comes at the expense of program speed, is probably fine.

Of course, that’s not always going to be true. Sometimes, you need (or just want) to squeeze more out of your computer. And to be a good programmer, you also need to know the relative advantages and disadvantages of the techniques you’re using.

So I decided to run a few quick benchmarks on the relative speeds of str.format and %. Sure enough, the % operator was a lot faster. I ran my benchmarks using the magic %timeit command that is built into the IPython interactive shell. (If you’re using Python and aren’t using IPython, you should really switch ASAP.) Note that in order to make things easier to read, I’m removing the IPython input and output prompts, and using >>> to denote where I entered text.

>>> name = 'Reuven'
>>> %timeit 'Hello there, {}'.format(name)
1000000 loops, best of 3: 243 ns per loop

>>> %timeit 'Hello there, %s' % name
10000000 loops, best of 3: 147 ns per loop

Wow. As you can see, %timeit executed each of these lines of code at least 1,000,000 times. It then gave the average speed per loop. The % operator was, on average, about 100 ns faster than str.format. That shouldn’t come as a huge surprise, given that % is an operator (and thus doesn’t require a method call), doesn’t handle indexes and attributes, and can (I’m guessing) pass a great deal of its work off to C’s printf function.
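If you aren’t using IPython, the standard “timeit” module gives comparable measurements; here’s a rough sketch (the absolute numbers will of course vary from machine to machine):

```python
import timeit

# Time each formatting style over many iterations; timeit runs
# the statement with the given setup and returns total seconds.
fmt_time = timeit.timeit("'Hello there, {}'.format(name)",
                         setup="name = 'Reuven'",
                         number=100000)
pct_time = timeit.timeit("'Hello there, %s' % name",
                         setup="name = 'Reuven'",
                         number=100000)
```

Dividing each total by the iteration count gives a per-loop figure comparable to %timeit’s output.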

Then again, is 100 ns really that long to wait for a formatted string?  I’m not so sure, to be honest.

What happens if we perform more than one interpolation?

>>> first = 'Reuven'
>>> last = 'Lerner'
>>> %timeit 'Hello there, {} {}'.format(first, last)
1000000 loops, best of 3: 371 ns per loop

>>> %timeit 'Hello there, %s %s' % (first, last)
1000000 loops, best of 3: 243 ns per loop

Each of these takes significantly longer to run than was the case with a single replacement. The difference between them continues to be about 120 ns per loop — still not something to worry about too much, but the difference does exist.

What if I make the strings space-padded?

>>> %timeit 'Hello there, {:10} {:15}'.format(first, last)
1000000 loops, best of 3: 459 ns per loop

>>> %timeit 'Hello there, %10s %15s' % (first, last)
1000000 loops, best of 3: 254 ns per loop

Now we see an even starker difference between the two ways of handling things. What about something like floating-point math, which takes longer?

>>> import math
>>> %timeit 'I love to eat {}'.format(math.pi)
1000000 loops, best of 3: 587 ns per loop

>>> %timeit 'I love to eat %f' % math.pi
1000000 loops, best of 3: 354 ns per loop

Limiting the number of decimals shown doesn’t seem to change the outcome very much:

>>> %timeit 'I love to eat {:.3}'.format(math.pi)
1000000 loops, best of 3: 582 ns per loop

>>> %timeit 'I love to eat %.3f' % math.pi
1000000 loops, best of 3: 329 ns per loop

UPDATE: Several people on Reddit took me to task for failing to consider the overhead of the str.format method call. I mentioned this briefly above, but should have realized that there was an easy way to avoid this, namely aliasing the attributes (the method str.format and the float math.pi) to local variables:

>>> f = 'I love to eat {:.3}'.format
>>> p = math.pi
>>> %timeit f(p)
1000000 loops, best of 3: 489 ns per loop

>>> %timeit 'I love to eat %f' % p
1000000 loops, best of 3: 370 ns per loop

We still see significant overhead. Again, I’m guessing that a lot of this has to do with the overhead of a method vs. an operator. I’m not about to start looking at the bytecodes; this wasn’t meant to be a super-deep investigation or benchmark, but rather a quick check and comparison, and I think that on that front, it did the trick.

So, what have we learned?

  • Yes, str.format is slower than %.
  • The number of parameters you pass to str.format, and whether you then adjust the output with padding or a specified number of decimals, significantly influences the execution speed.
  • That said, in many programs, the difference in execution speed is often 100 ns, which is not enough to cause trouble in many systems.

If speed is really important to you, then you should probably use %, and not str.format. However, if speed is more important than the maintainability or readability of your code, then I’d argue that Python is probably a poor choice of programming language.

Teaching an old dog new tricks — or, how I learned to love Python’s str.format, and gave up on %

I have been programming in Python for many years. One of the things that I wondered, soon after starting to work in Python, was how you can get Perl-style variable interpolation. After all, Perl (like the Unix shell) supports two types of quotes — single quotes (in which everything is taken literally) and double quotes (in which variables’ values are inserted). Thus, in Perl, you can do something like:

$name = 'Reuven';
print "Hello, $name\n";

And sure enough, it’ll print “Hello, Reuven”.

Because single and double quotes are equivalent in Python (so long as they come as a matched set), there is no variable interpolation. The technique that I learned years ago, when I started with Python, was that you could use the % operator on a string. In this context, % looks to the string on its left, determines how many values within the string need to be replaced, and then looks right to find those values. It then returns a new string, effectively interpolating the values. For example:

>>> name = 'Reuven'
>>> "Hello, %s" % name

'Hello, Reuven'

The above Python code works just fine, returning a string with a nice, personalized greeting. And indeed, for the length of my time working with Python, I have enjoyed using this % syntax. Yes, it’s a bit weird. And no, I cannot ever remember more than the absolute basics of printf’s various % codes, meaning that I either make everything a string (with %s), or I guess, or I look up the printf codes and what they do. But to be honest, I normally just use %s, and thus benefit additionally from the fact that Python will silently invoke “str” on the parameter.
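That silent call to “str” means %s happily accepts any object, not just strings; for example:

```python
# %s accepts any object, because Python silently calls str()
# on the value before interpolating it.
s = 'My favorite numbers: %s' % [1, 2, 3]
```

The list is converted to its string form, yielding 'My favorite numbers: [1, 2, 3]'.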

The thing is, % is supposedly going away, or is at least deprecated. (A note on the python-dev list indicates that % will go away no sooner than 2022, which is a heckuva long time from now.) As of Python 2.6, not to mention Python 3.x, we have been told that it will eventually disappear, and that we shouldn’t use % any more. Instead, we should use the str.format method.

I have always mentioned str.format to my Python classes, but to be honest, I’ve usually relied upon % when giving live demonstrations and answering questions. And I would even encourage my students to use the % syntax, in part because I found it to be so much easier.

And yet.  I knew that I was doing something wrong, and I knew that I was probably misleading my students to some degree. Thus, in the last three classes I taught, I started to push harder in the direction of str.format. And that’s when I realized two things: (1) It’s just as easy as %, and actually easier in some ways, and (2) I hadn’t learned enough about str.format to use it, beyond the simplest ways. I thus spent a great deal of time researching it — and found out that str.format, while it takes some getting used to, is more than worth the effort.

Let’s start with the simplest case. I’d like to be able to say “Good morning” to someone, using both their first and last names. Assuming that I have variables named “first” and “last”, I can do this with the old syntax as follows:

>>> first = 'Reuven'
>>> last = 'Lerner'
>>> "Good morning, %s %s" % (first, last)

'Good morning, Reuven Lerner'

In this example, we already see one of the problems with the % syntax — namely, that if we have more than one value, we need to put it into a tuple. Perhaps this is logical and reasonable from Python’s perspective, but I can assure you that it surprises many of my students.
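Indeed, forgetting the parentheses is a classic mistake. Here's a short sketch of what goes wrong; because % binds more tightly than the comma, Python hands the string only the first value:

```python
first = 'Reuven'
last = 'Lerner'

# Python parses this as ("Good morning, %s %s" % first), last --
# the format string receives only one value, and raises
# TypeError: not enough arguments for format string.
try:
    greeting = "Good morning, %s %s" % first, last
except TypeError:
    greeting = None

# Wrapping the values in a tuple works as intended:
tuple_greeting = "Good morning, %s %s" % (first, last)
```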

So, how would we do it using str.format? Pretty similarly, in many ways:

>>> "Good morning, {} {}".format(first, last)

'Good morning, Reuven Lerner'

Notice that we’ve changed things a bit here. No longer are we invoking a binary operator (%) on the string. Rather, we’re invoking a string method that takes a set of parameters. This is more logical and consistent. I can’t tell you how many of my students think that % is somehow connected to “print”, when in fact it’s connected (in the case of string formatting) to strings. Having to put “.format” at the end of the string makes the method call more obvious.

As you might already know, the “{} {}” in the string tells str.format to take its two parameters, and to insert them, in order, into the string. Because there are two arguments, we cannot have more than two {} inside of the string. This is a bit harder to understand, both because having {} in Python reminds many people of a dictionary, and because the empty curly braces look a bit weird. But fine, I can live with that, and got used to it very quickly.

Where str.format quickly shows its advantages over %, however, is if I want to display the input parameters in reverse order. When I use % with positional specifiers, there is no real way to do that. Plus, if I want to reuse a value passed to %, I cannot do so. With str.format, I can swap the order in which the inputs are displayed:

>>> "Good morning, {1} {0}".format(first, last)

'Good morning, Lerner Reuven'

Notice what happened in the above: If I just use “{} {}”, then str.format uses the two parameters in order. However, I’m also able to treat the parameters as a sequence, with indexes starting at 0. I can then insert them in reverse order, as I did above, or in the regular order:

>>> "Good morning, {0} {1}".format(first, last)

'Good morning, Reuven Lerner'

Note that if you explicitly state the field numbers, then you cannot rely on the automatic numbering.
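Two quick sketches illustrate both points: explicit indexes let us reuse a value as many times as we like, but Python refuses to let us mix automatic and manual numbering within one string:

```python
# Reusing a positional value -- something plain %s cannot do:
shout = "{0}! {0}! {0}!".format('Python')

# Mixing automatic ({}) and manual ({1}) numbering raises ValueError
# ("cannot switch from automatic field numbering to manual field
# specification"):
try:
    "Good morning, {} {1}".format('Reuven', 'Lerner')
    mixed_ok = True
except ValueError:
    mixed_ok = False
```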

Of course, this lets me also pass a sequence of values to be inserted, so long as we then use the splat (*) operator on it, to turn it into a parameter list:

>>> names = ('Reuven', 'Lerner')
>>> "Good morning, {} {}".format(*names)

'Good morning, Reuven Lerner'

You can also call str.format with keyword arguments. When you do this, you can then put a keyword name within the {}:

>>> "Good morning, {first} {last}".format(first='Reuven', last='Lerner')

'Good morning, Reuven Lerner'

The above really appeals to me. The named parameters are explicit (if a bit long), and the use of {first} and {last} is quite readable — certainly more so than %(first)s ever was with the % operator!

I can, of course, also pass a dictionary of names, using the ** operator to turn it into a set of keyword arguments:

>>> person = {'first':'Reuven', 'last':'Lerner'}
>>> "Good morning, {first} {last}".format(**person)

'Good morning, Reuven Lerner'

I described all of these to my students in the last month, and I was pleasantly surprised to see how comfortable they were with the syntax. I’m sure that this reflects, to some degree, my comfort with the syntax, as well.

I should note that you can combine numeric and keyword arguments when working with str.format. I really suggest that you not do so. The results would look like this:

>>> person = {'first':'Reuven', 'last':'Lerner'}
>>> "Good {0}, {first} {last}".format('morning', **person)

'Good morning, Reuven Lerner'


Now, the one thing that would appear to be missing from str.format is… well, formatting! The bad news is that str.format has a completely different way of indicating how you want to format output. The good news is that it’s not too hard to learn and understand.

Let’s start with the easiest part: If you want to display a string within a fixed-width field, then you can do so by adding a colon (:) and then a number.  So to put my name in a fixed-width field of 10 spaces, we would say:

>>> "Your name is {name:10}".format(name="Reuven")

'Your name is Reuven    '

(Notice the trailing spaces after my name.)

In the above example, my name is left-justified. If I want it to be right-justified, I could use a > sign between the : and the number:

>>> "Your name is {name:>10}".format(name="Reuven")

'Your name is     Reuven'

And yes, I could have used an optional < symbol to say that my name should be left-justified within the field of 10 spaces in the first example.  Or I could center the text in a field of 10 spaces with the ^ specifier instead of < or >.
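As a quick sketch of all three alignment specifiers side by side:

```python
# <, >, and ^ align the value within the 10-character field:
left = "{name:<10}".format(name="Reuven")    # 'Reuven    '
right = "{name:>10}".format(name="Reuven")   # '    Reuven'
center = "{name:^10}".format(name="Reuven")  # '  Reuven  '
```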

To pad the string with something other than a space, we specify it before the <, >, or ^ character. For example, if I’m moving to Hollywood, then perhaps I should do something like this:

>>> "Your name is {name:*^10}".format(name="Reuven")

'Your name is **Reuven**'

If I want to put the string in the (default) left-most position of the string, filling with characters on the right, then I must use the < specifier, so that the text will be on the left, and the stars on the right.

So it’s pretty clear that str.format is pretty snazzy when it comes to text. How about numbers? I wasn’t really sure how things would work here, but it turns out that they’re also quite straightforward. If you’re displaying integers, then you can go ahead and say:

>>> "The price is ${number}.".format(number=123)

'The price is $123.'

So far, we don’t see any difference between passing an integer and a string. And indeed, they share many characteristics. However, we might want to display an integer in a different way. We can do that using one of the (many) modifiers that str.format provides — letters placed just before the closing } character. For example, we can get the price in binary (with a trailing “b”), or in hexadecimal (with a trailing “x”), as in the following example:

>>> "The price is ${number:x}.".format(number=123)

'The price is $7b.'

Of course, we can also zero-pad the number, such that it will always take up a fixed width. Just place a 0 between the colon and the width:

>>> "Your call is important to us. You are call #{number:05}.".format(number=123)

'Your call is important to us. You are call #00123.'
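These pieces compose: zero-padding, width, and a type code can all appear in one specification. A quick sketch (the “#” flag, which adds the conventional 0x prefix, is a detail I didn’t cover above):

```python
# 123 in hex, zero-padded to five characters:
padded_hex = "{number:05x}".format(number=123)   # '0007b'

# The '#' flag adds the 0x prefix:
prefixed = "{number:#x}".format(number=123)      # '0x7b'

# And the 'b' code mentioned earlier gives us binary:
binary = "{number:b}".format(number=123)         # '1111011'
```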

Notice that inside of the {}, we cannot put executable Python code. Instead, there is a mini-language that is separate and different from Python. However, there are two small exceptions to this rule: (1) We can retrieve any attribute with the standard . notation, and (2) we can retrieve a single item with the [] notation.

For example:

>>> class Foo(object):
        def __init__(self):
            self.x = 100

>>> f = Foo()
>>> 'Your number is {o.x}'.format(o=f)

'Your number is 100'

Notice how we were able to retrieve the “x” attribute from the “f” object, which we mapped to “o” within the string. However, while you can retrieve an attribute, you cannot execute it. Thus, the following will not work:

>>> "Your name is {name.upper()}".format(name="Reuven")

AttributeError: 'str' object has no attribute 'upper()'

See what happened? I said “name.upper()”, in order to execute the method “str.upper” on “name”.  However, Python doesn’t want me to execute code there. So it takes the name of the attribute literally — and thus complains that there is no attribute “upper()”, with the parentheses. Of course, if you try it without the parentheses, it’ll work, for some value of “work”:

>>> "Your name is {name.upper}".format(name="Reuven")

'Your name is <built-in method upper of str object at 0x1028bf2a0>'

Similarly, we can retrieve an individual element of a sequence or mapping with []. However, we cannot use the slice notation for more than one element. For example:

>>> numbers = range(10)
>>> "Your favorite number is {n[3]}.".format(n=numbers)

'Your favorite number is 3.'


>>> "Your favorite numbers are {n[2:4]}.".format(n=numbers)

ValueError: Missing ']' in format string

The “:” character, which we use for slices, isn’t available in format strings, because it’s used to control the formatting of the output.

You can, of course, use [] on a dictionary, as well. However — and this is a bit weird for Python — we omit the quote marks, even when our key is a string. For example:

>>> person = {'first':'Reuven', 'last':'Lerner'}
>>> "Your name is {p[first]}.".format(p=person)

'Your name is Reuven.'

If we were to include the quotes…

>>> "Your name is {p['first']}.".format(p=person)

KeyError: "'first'"

There is actually a lot more to str.format than what I have shared here. In particular,  each type has its own format specifications, which means that you can do certain things with floats (e.g., setting the precision) that you cannot do with strings.
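For example, here are a few numeric specifications (precision, width, and the comma separator for digit grouping) that I didn't cover above:

```python
from math import pi

# Two digits after the decimal point:
two_places = "{:.2f}".format(pi)   # '3.14'

# Width of 8, three digits of precision, right-aligned by default:
padded = "{:8.3f}".format(pi)      # '   3.142'

# The ',' option groups thousands:
grouped = "{:,}".format(1234567)   # '1,234,567'
```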

You can even add formatting functionality to your own Python classes, such that they’ll be displayed in the way that you want, along with format specifiers that you define.
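As a sketch of that idea, here is a hypothetical Temperature class whose __format__ method defines its own specifier — the “f” (for Fahrenheit) is entirely made up for this example:

```python
class Temperature(object):
    """Hypothetical class demonstrating a custom __format__ method."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, spec):
        # 'f' is our own, invented specifier meaning Fahrenheit;
        # any other spec falls back to Celsius.
        if spec == 'f':
            return '{:.1f}F'.format(self.celsius * 9.0 / 5 + 32)
        return '{:.1f}C'.format(self.celsius)

t = Temperature(20)
as_f = "{:f}".format(t)       # '68.0F'
as_default = "{}".format(t)   # '20.0C'
```

str.format hands whatever appears after the colon to the object's __format__ method, so each class gets to interpret its own mini-language.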

If you want to learn more about this, I’d definitely suggest reading PEP 3101, which describes str.format. I’d also suggest a slide show by Eric Smith, which summarizes things nicely. Finally, the Python documentation has some excellent examples, including a guide for moving from % to str.format.

I hope that this was helpful and useful! If you enjoyed this blog post, check out my many other resources, including my free e-mail course on Python scoping, and my free Webinar on functional programming in Python.

Three Pythonic products: A (free) Webinar, a course, and an ebook

I love developing software.  I also love helping people to learn how to develop better. That’s why I have been teaching programming classes for more than a decade, and why I write about programming. There is so much to learn; it’s a rare day on which I don’t learn something new, and it’s a rare week in which I don’t apply some new understanding to a problem that I’m solving for a client.

Now that I have finished my PhD, I have some more time to focus on creating products that I believe will help people to program better. I’m pleased to announce three initial such products, all aimed at Python developers who want to improve their skills:

  1. A free Webinar (“How functional programming will make you a better Python programmer”), scheduled for Monday, September 15th, at 2 p.m. Eastern Time. The Webinar will consist of a 45-minute presentation, followed by a Q&A period. My goal is to describe functional programming, and then demonstrate some of the functional techniques that are available in Python. If you are a Python programmer who has heard about functional programming but doesn’t really know what it is, then I think you’ll get a lot out of this free seminar. Please register and attend! Even if you’re not interested in my other products, I’d love to see you at this Webinar — if only to learn what topics people do want to learn, and what I should address in future Webinars, books, and online classes.
  2. For those who want to dig deeper into functional programming in Python, I’ll be giving a full-day, live, online course on the subject. This course, which will include lectures, demos, and many exercises, begins with an introduction to functions in Python, then describes how to pass functions as parameters, list/set/dict comprehensions, customizing list.sort and sorted, lambda, map/filter/zip, reduce, and the “functools” module. This class, like all of my classes, will be intense and highly interactive — but will give you a new perspective on programming in general, and particularly in Python. You can register for this one-day course at EventBrite; it’ll be given on Sunday, September 21st and again on Tuesday, September 23rd.
  3. If you have taken a Python course, but feel that you need to practice Python more thoroughly before you feel like you’re truly fluent in the language, then you might be interested in my ebook, “Practice Makes Python.” The book will contain about 50 exercises in various aspects of Python, with the aim of helping you to learn through doing. I hope to release the ebook about a month from now, and am basing many of the exercises on my courses, as well as comments I’ve received from many of my students. If you’re interested in the book, you should sign up for the announcement list, where I’ll be providing sample content and behind-the-scenes information.

These products (well, the paid ones, anyway) come with a full, 100% money-back guarantee. And of course, if you have questions, you can always contact me via e-mail.

The easiest way to return to the last Git branch

I don’t know about you, but it’s common for me to switch between branches in Git.  After all, that’s one of the main advantages of using Git — the incredible ease with which you can create and merge branches.

Just a few minutes ago, I was in the “adwords” branch of an application I’m working on.  I wanted to go back to “master”, make sure that I hadn’t missed any commits from the central repository, and then go back to “adwords”.  If there were any commits in “master” that I was missing in “adwords”, I figured that I would just rebase from “master” to “adwords”.

So I did a “git checkout master”, and found that I was up to date with the central Git server. For reasons that I can’t explain, I then decided to try something out: Instead of returning to the previous branch with “git checkout adwords”, I instead typed

git checkout -

(That’s a minus sign after the word “checkout.”)

Sure enough, I returned to the “adwords” branch!

Now, there is a fair amount of logic to this: In the Unix shell, you can typically return to the previous directory with “cd -”, which has proven to be quite useful over the years.  In Git, of course, branches are just aliases to commits.  So “git checkout -” is returning you to the previous branch, but it’s really just taking you back to whatever the last commit was that you worked on.

I just checked the Git documentation (“git checkout --help”), and it seems that this is a special case of a more generalizable command structure:

As a special case, the "@{-N}" syntax for the N-th last branch/commit checks out 
branches (instead of detaching). You may also specify - which is synonymous
with "@{-1}".

I can’t imagine wanting to tell Git to return to the 5th-most-recent branch that I worked on, so this generalized formula seems a bit much to me.

I predict that this trick will save me precious seconds every day, all of which I squandered in writing this blog post.  But I do think that this is a super-cool trick and feature, and demonstrates once again how clever and useful Git is.

Remember: Don’t mix = and NULL

I was recently performing an analysis on the database for my dissertation software, and wanted to look at one particular set of rows.  The table I was looking at is called Nodes, and because each node (represented by a row in the table) can optionally belong to a group, the group_id column allows for NULL values.

So, if I want to find all of the nodes in group 2, I can say

SELECT * FROM Nodes WHERE group_id = 2;

My data consists (of course) of several tables, including Versions, which has a many-to-one relationship with Nodes.  If I want all of the versions for nodes in group 2, I can say:

SELECT COUNT(*) FROM Versions WHERE node_id in (SELECT id FROM Nodes WHERE group_id = 2);

Now, let’s say that I want to get all of the versions that are not from group 2.  How would I do that?  Well, one way would be:

SELECT COUNT(*) FROM Versions WHERE node_id 
   NOT IN (SELECT id FROM Nodes WHERE group_id = 2);

But at some point, I flipped the logic, turning the NOT IN into IN, and = into <>:

SELECT COUNT(*) FROM Versions WHERE node_id 
  IN (SELECT id FROM Nodes WHERE group_id <> 2);

That, of course, was a bug.  That’s because SQL uses three-valued (sometimes called ternary) logic, not the binary true-false logic to which we’re accustomed.  Comparing anything with NULL results in NULL.  For example, consider the following table:

SELECT * FROM truth_test;
| val    |
| t      |
| f      |
| [null] |

I can create a truth table pretty easily:

SELECT val, val = true as is_true, val=false as is_false, 
       val=null as equal_null, val is null as is_null 
FROM truth_test;
| val    | is_true | is_false | equal_null | is_null |
| t      | t       | f        | [null]     | f       |
| f      | f       | t        | [null]     | f       |
| [null] | [null]  | [null]   | [null]     | t       |

You can see, then, that TRUE is equal to TRUE, and FALSE is equal to FALSE, which we would expect.  But using the = operator with NULL is always going to result in NULL — even with itself!

If you want to compare with NULLs, then you need to use IS NULL.  (Or you can use a function such as COALESCE, which returns the first non-NULL value passed to it.)  And that’s the mistake that I made in my queries yesterday: I was trying to get all of the nodes for which group_id was not 2.  But by using =, I inadvertently excluded all of the rows for which group_id was NULL — in other words, all of the nodes that weren’t assigned to a group.  Which, it turns out, was the majority of them.  Whoops.
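The same trap is easy to reproduce with a few rows of made-up data; here is a sketch using Python’s built-in sqlite3 module, with an in-memory table standing in for my Nodes table:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE nodes (id INTEGER, group_id INTEGER)')
conn.executemany('INSERT INTO nodes VALUES (?, ?)',
                 [(1, 2), (2, 3), (3, None)])

# <> 2 evaluates to NULL for the row whose group_id is NULL,
# so that row is silently dropped:
wrong = conn.execute(
    'SELECT id FROM nodes WHERE group_id <> 2 ORDER BY id').fetchall()
# [(2,)] -- node 3, which has no group, is missing!

# Handling NULL explicitly returns every node outside group 2:
right = conn.execute(
    'SELECT id FROM nodes WHERE group_id <> 2 '
    'OR group_id IS NULL ORDER BY id').fetchall()
# [(2,), (3,)]
```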

My solution was to create a view containing all of the IDs that I want to exclude:

CREATE OR REPLACE VIEW group2_model_ids AS
SELECT id
FROM Nodes
WHERE group_id = 2;

Now, if I want the number of versions that aren’t in group 2, I can use a subselect, as follows:

SELECT COUNT(*)
FROM Versions
WHERE node_id NOT IN (SELECT id FROM group2_model_ids);

Of course, I could have simply rewritten the query, either to use COALESCE or to check for both NULL and not-NULL values.  But I prefer this way, especially since I’m only dealing with a few tens of thousands of rows of data in this table, in part because it gives me a name and abstraction that I can use in future queries.

Book Review: The Undercover Economist Strikes Back

In order to get an undergraduate degree from MIT, at least when I was there, you needed to take a certain number of humanities and social-science courses.  This was to stop you from coming out a complete one-dimensional student; the idea was that rounding out your education with knowledge from other fields was good for you as a person, and also good for you as an engineer or scientist.  (And yes, I realize that not everyone at MIT studied science or engineering, but those were the overwhelming favorites.)  One of the most popular social sciences that people took was economics — which is a social science, although MIT’s version actually included a great deal of math.

At the time, I tended to be quite dismissive of economics.  I didn’t think that it could possibly be interesting, and I couldn’t understand why so many of my friends were taking so many courses in that field.  What insights could they gain?

And then, just before I graduated, MIT Press had one of its amazing sales on overstock books.  I bought a book by a then-professor at MIT, named Paul Krugman, called “The Age of Diminished Expectations.”  Reading this book was quite a revelation for me; I suddenly realized that economics was complex, fascinating, and described the world in all sorts of interesting ways.  For years, I read and followed Krugman’s writing, in Slate and then (of course) the New York Times, gleaning what I could about economics.  (I also happen to subscribe to many of his political views, but that’s secondary.)  Whenever I could find an interesting and well-written book about economics, I would get it, because I found it to be so interesting and compelling.

Several years ago, a friend asked if I had read “The Undercover Economist,” by Tim Harford.  I hadn’t, but decided that perhaps it was worth a read, and found it to be delightful, but in a different way from Krugman. Harford isn’t an economics researcher, but he knows just how to put economics research into words and a perspective that everyone can understand.  His examples are often drawn from pop culture, and he’s able to distill the academic debates and intrigue to their essence.  The fact that he’s very funny only adds to his appeal.  I’ve since become quite a fan of Harford’s, listening (among other things) to the “More or Less” podcast from the BBC that he hosts, a sort of Mythbusters of statistics (Mathbusters?).

So it should come as no surprise that I ordered his latest book, “The Undercover Economist Strikes Back,” almost as soon as it came out earlier this year.  I just read it cover to cover over the weekend, and came away delighted.  As someone who has been reading Krugman’s work for years, and who also listens to NPR’s excellent Planet Money podcast, I can’t say that there was a huge amount of new information in this book.  But it was written so well, and put things into such nice context, that this doesn’t matter.

Harford has a gift for making economics not only understandable, but also interesting and relevant to our own lives.  In The Undercover Economist, he describes microeconomics, which describes how businesses and individuals respond to incentives.  In this new book, he describes macroeconomics, which is a different kettle of fish altogether — it’s about how governments and economies work.  If you think of macroeconomics as a complex system, then it’s no surprise that the aggregate behaves differently from its individual, constituent agents. (This, it took me many years to learn, is a much better explanation than what economics instructors tell their students, which is simply that “macro is different from micro.”)

The book talks about all sorts of great stuff, starting with recessions, moving onto unemployment, and covering a large number of topics that are in the newspaper each day, that affect each and every one of us, and which probably seem strange or detached from our reality, but which are actually quite important — particularly if you’re in a democracy, and need to separate real economics from crazy talk.

Harford includes a great definition and discussion of what money is, and brings up the famous story of the island of Yap, which used huge, largely immovable stones as money.  He also introduces the different schools of thought on the subject, and where (and how) they differ — and how much of what politicians in the US and Europe have been saying and doing over the last five years has been foolish or misplaced.

The question-and-answer format in which he wrote the book is a little tedious, but much less than I expected it to be.  Really?  Yes, really.

In my mind, perhaps the topic that was most obviously missing from the book was a discussion of currency, and how that can affect an economy.  If you live in the US, or even in England or Europe, you can largely ignore currency issues.  Sure, there are exchange rates, and yes, they affect you to some degree, but it’s not a huge deal.

In Israel, by contrast, the exchange rate is  a huge deal, because Israel imports and exports so much.  The dollar’s rise and fall affects everyone, from high-tech software companies to people shopping at the supermarket. The ways in which the Bank of Israel played with exchange rates and buying dollars, just to keep things relatively stable (while claiming that they were doing no such thing) are impressive, and point to the sorts of challenges that small, trade-oriented economies have but that large ones don’t.  I’m not sure if this was an omission due to time or space constraints, or if as someone living in England, Harford hasn’t had to think or worry much about currency issues.

I’ve changed my tune 100 percent since I was an undergrad; I now find economics to be totally fascinating, and very much enjoy reading the sorts of books that Harford has put out.  If you’ve always wondered what macroeconomics is, or what the newspapers are talking about when they mention recessions, or whether the politicians suggesting budget cuts during the latest recession were saying the most obviously brilliant thing or the most foolish thing imaginable, Harford’s book is a fun, interesting read, and is highly recommended.