In Python, it’s all about the attributes

Newcomers to Python are often amazed to discover how easily we can create new classes. For example:

class Foo(object): 
    pass

is a perfectly valid (if boring) class. We can even create instances of this class:

f = Foo()

This is a perfectly valid instance of Foo. Indeed, if we ask it to identify itself, we’ll find that it’s an instance of Foo:

>>> type(f)
<class '__main__.Foo'>

Now, while “f” might be a perfectly valid and reasonable instance of Foo, it’s not very useful. It’s at this point that many people who have come to Python from another language expect to learn where they can define instance variables. They’re relieved to know that they can write an __init__ method, which is invoked on a new object immediately after its creation. For example:

class Foo(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

>>> f = Foo(100, 'abc')

>>> f.x
100
>>> f.y
'abc'

On the surface, it might seem like we’re setting two instance variables, x and y, on f, our new instance of Foo. And indeed, the behavior is something like that, and many Python programmers think in these terms. But that’s not really the case, and the sooner that Python programmers stop thinking in terms of “instance variables” and “class variables,” the sooner they’ll understand how much of Python works, why objects work in the ways that they do, and how “instance variables” and “class variables” are specific cases of a more generalized system that exists throughout Python.

The bottom line is that inside of __init__, we’re adding new attributes to self, the local reference to our newly created object. Attributes are a fundamental part of objects in Python. Heck, attributes are fundamental to everything in Python. The sooner you understand what attributes are, and how they work, the sooner you’ll have a deeper understanding of Python.

Every object in Python has attributes. You can get a list of those attributes using the built-in “dir” function. For example:

>>> s = 'abc'
>>> len(dir(s))
71
>>> dir(s)[:5]
['__add__', '__class__', '__contains__', '__delattr__', '__doc__']

>>> i = 123
>>> len(dir(i))
64
>>> dir(i)[:5]
['__abs__', '__add__', '__and__', '__class__', '__cmp__']

>>> t = (1,2,3)
>>> len(dir(t))
32
>>> dir(t)[:5]
['__add__', '__class__', '__contains__', '__delattr__', '__doc__']

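As you can see, even the basic data types in Python have a large number of attributes. (The exact counts depend on your Python version; the ones above come from Python 2.) I’ve shown only the first five attributes of each object by slicing the output from “dir”; you can look at the full lists yourself inside of your Python environment.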

The thing is, these attribute names returned by “dir” are strings. How can I use this string to get or set the value of an attribute? We somehow need a way to translate between the world of strings and the world of attribute names.

Fortunately, Python provides us with several built-in functions that do just that. The “getattr” function lets us get the value of an attribute. We pass “getattr” two arguments: The object whose attribute we wish to read, and the name of the attribute, as a string:

>>> getattr(t, '__class__')
tuple

This is equivalent to:

>>> t.__class__
tuple

In other words, the dot notation that we use in Python all of the time is nothing more than syntactic sugar for “getattr”. Each has its uses; dot notation is far easier to read, and “getattr” gives us the flexibility to retrieve an attribute value with a dynamically built string.
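To make the “dynamically built string” case concrete, here is a minimal sketch. The Point class and the axis names are made up for illustration; the optional third argument to “getattr” (a default that is returned when the attribute is missing) is standard Python.

```python
class Point(object):
    """Hypothetical class, used only to have some attributes to read."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(3, 4)

# The attribute name is just a string, so it can come from a loop,
# from user input, or from anywhere else.
for axis in ['x', 'y']:
    print("%s = %s" % (axis, getattr(p, axis)))

# With a third argument, getattr returns that default instead of
# raising AttributeError when the attribute does not exist.
print(getattr(p, 'z', 0))   # 0
```

With dot notation, the name has to be written literally in the source code, which is why “getattr” is the tool to reach for when the name is only known at runtime.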

Python also provides us with “setattr”, a function that takes three arguments: An object, a string indicating the name of the attribute, and the new value of the attribute. There is no difference between “setattr” and using the dot-notation on the left side of the = assignment operator:

>>> f = Foo(100, 'abc')
>>> setattr(f, 'x', 5)
>>> getattr(f, 'x')
5
>>> f.x
5
>>> f.x = 100
>>> f.x
100

As with all assignments in Python, the new value can be any legitimate Python object. In this case, we’ve assigned f.x to be 5 and 100, both integers, but there’s no reason why we couldn’t assign a tuple, dictionary, file, or even a more complex object. From Python’s perspective, it really doesn’t matter.
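Because “setattr” takes the attribute name as a string, it is handy when the names come from data rather than from source code. The following is only a sketch: the Config class and the settings dictionary are invented for this example.

```python
class Config(object):
    """Hypothetical, empty class that we fill with attributes."""
    pass

# Hypothetical data; the values can be any Python objects.
settings = {'host': 'localhost', 'port': 8080, 'tags': ('a', 'b')}

c = Config()
for name, value in settings.items():
    setattr(c, name, value)   # same effect as c.host = ..., c.port = ..., etc.

print(c.host)   # localhost
print(c.port)   # 8080
print(c.tags)   # ('a', 'b')
```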

In the above case, I used “setattr” and the dot notation (f.x) to assign a new value to the “x” attribute. f.x already existed, because it was set in __init__. But what if I were to assign an attribute that didn’t already exist?

The answer: It would work just fine:

>>> f.new_attrib = 'hello'
>>> f.new_attrib
'hello' 

>>> f.favorite_number = 72
>>> f.favorite_number
72

In other words, we can create and assign a new attribute value by … well, by assigning to it, just as we can create a new variable by assigning to it. (There are some exceptions to this rule, mainly in that you cannot add new attributes to many built-in classes.) Python is much less forgiving if we try to retrieve an attribute that doesn’t exist:

>>> f.no_such_attribute
AttributeError: 'Foo' object has no attribute 'no_such_attribute'

So, we’ve now seen that every Python object has attributes, that we can retrieve existing attributes using dot notation or “getattr”, and that we can nearly always set attribute values. If the attribute didn’t exist before our assignment, then it certainly exists afterwards.
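Here is a short sketch of both edge cases mentioned above: reading an attribute that doesn’t exist, and trying to add an attribute to an instance of a built-in class. The empty Foo class is redefined here only so that the example runs on its own.

```python
class Foo(object):
    pass

f = Foo()

# Three ways of coping with an attribute that may not exist.
print(hasattr(f, 'no_such_attribute'))          # False
print(getattr(f, 'no_such_attribute', 'n/a'))   # n/a
try:
    f.no_such_attribute
except AttributeError:
    print("no such attribute on f")

# Instances of many built-in classes, such as int, refuse new attributes.
n = 123
try:
    n.favorite_number = 72
except AttributeError:
    print("ints do not accept new attributes")
```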

We can assign new attributes to nearly any object in Python. For example:

def hello():
    return "Hello"

>>> hello.abc_def = 'hi there!'

>>> hello.abc_def
'hi there!'

Yes, Python functions are objects. And because they’re objects, they have attributes. And because they’re objects, we can assign new attributes to them, as well as retrieve the values of those attributes.
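As one illustration of what a function attribute can be used for, here is a sketch of a function that counts its own invocations. The “calls” attribute name is my own choice; nothing in Python gives it special meaning.

```python
def hello():
    # "hello" is looked up as a global name when the function runs,
    # so the function can reach its own attributes.
    hello.calls += 1
    return "Hello"

hello.calls = 0      # create the attribute before the first call

hello()
hello()
print(hello.calls)   # 2
```

Whether this is good style is a separate question; it merely shows that a function is an object like any other.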

So the first thing to understand about these “instance variables” that we oh-so-casually create in our __init__ methods is that we’re not creating variables at all. Rather, we’re adding one or more additional attributes to the particular object (i.e., instance) that has been passed to __init__. From Python’s perspective, there is no difference between saying “self.x = 5” inside of __init__, or “f.x = 5” outside of __init__. We can add new attributes whenever we want, and the fact that we do so inside of __init__ is convenient, and makes our code easier to read.
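A quick way to convince yourself of this is to compare the two approaches with the built-in “vars” function, which returns the dictionary in which an ordinary instance stores its own attributes. The class names below are made up for this sketch.

```python
class WithInit(object):
    def __init__(self, x):
        self.x = x          # attribute added inside __init__

class Bare(object):
    pass

a = WithInit(5)

b = Bare()
b.x = 5                     # the same kind of attribute, added from outside

print(vars(a))              # {'x': 5}
print(vars(b))              # {'x': 5}
```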

This is one of those conventions that is really useful to follow: Yes, you can create and assign object attributes wherever you want. But it makes life so much easier for everyone if you assign all of an object’s attributes in __init__, even if it’s just to give it a default value, or even None. Just because you can create an attribute whenever you want doesn’t mean that you should do such a thing.

Now that you know every object has attributes, it’s time to consider the fact that classes (i.e., user-defined types) also have attributes. Indeed, we can see this:

>>> class Foo(object):
        pass

Can we assign an attribute to a class?  Sure we can:

>>> Foo.bar = 100
>>> Foo.bar
100

Classes are objects, and thus classes have attributes. But it seems a bit annoying and roundabout for us to define attributes on our class in this way. We can define attributes on each individual instance inside of __init__. When is our class defined, and how can we stick attribute assignments in there?

The answer is easier than you might imagine. That’s because there is a fundamental difference between the body of a function definition (i.e., the block under a “def” statement) and the body of a class definition (i.e., the block under a “class” statement). A function’s body is only executed when we invoke the function. However, the body of a class definition is executed immediately, and only once: when we define the class. We can execute code in our class definitions:

class Foo(object):
    print("Hello from inside of the class!")

Of course, you should never do this, but this is a byproduct of the fact that class definitions execute immediately. What if we put a variable assignment in the class definition?

class Foo(object):
    x = 100

If we assign a variable inside of the class definition, it turns out that we’re not assigning a variable at all. Rather, we’re creating (and then assigning to) an attribute. The attribute is on the class object. So immediately after executing the above, I can say:

Foo.x

and I’ll get the integer 100 returned back to me.
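The difference in timing between function bodies and class bodies can be observed directly. In this sketch, the “log” list and the “func” function are names I made up in order to record what runs, and when.

```python
log = []

def func():
    log.append("function body ran")   # runs only when func() is called

class Foo(object):
    log.append("class body ran")      # runs right now, at definition time
    x = 100                           # becomes the attribute Foo.x

print(log)     # ['class body ran']
func()
print(log)     # ['class body ran', 'function body ran']
print(Foo.x)   # 100
```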

Are you a little surprised to discover that variable assignments inside of the class definition turn into attribute assignments on the class object? Many people are. They’re even more surprised, however, when they think a bit more deeply about what it must mean to have a function (or “method”) definition inside of the class:

>>> class Foo(object):
        def blah(self):
            return "blah"

>>> Foo.blah
<unbound method Foo.blah>

Think about it this way: If I define a new function with “def”, I’m defining a new variable in the current scope (usually the global scope). But if I define a new function with “def” inside of a class definition, then I’m really defining a new attribute with that name on the class. (The “unbound method” output above is from Python 2. Python 3 dropped unbound methods, so there Foo.blah is displayed as a plain function.)

In other words: Instance methods sit on a class in Python, not on an instance. When you invoke “f.blah()” on an instance of Foo, Python is actually invoking the “blah” method on Foo, and passing f as the first argument. This is why it’s important for Python programmers to understand that “f.blah()” and “Foo.blah(f)” do the same thing, and it is why we need to catch the object with “self”.
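The equivalence of the two call styles is easy to check. In this sketch, “blah” returns a string that mentions the class of “self”, purely so that we can see which object was passed in.

```python
class Foo(object):
    def blah(self):
        return "blah from %s" % type(self).__name__

f = Foo()

print(f.blah())       # blah from Foo
print(Foo.blah(f))    # blah from Foo
```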

But wait a second: If I invoke “f.blah()”, then how does Python know to invoke “Foo.blah”?  f and Foo are two completely different objects; f is an instance of Foo, whereas Foo is an instance of type. Why is Python even looking for the “blah” attribute on Foo?

The answer is that Python has different rules for variable and attribute scoping. With variables, Python follows the LEGB rule: Local, Enclosing, Global, and Builtin. (See my free, five-part e-mail course on Python scopes, if you aren’t familiar with them.)  But with attributes, Python follows a different set of rules: First, it looks on the object in question. Then, it looks on the object’s class. Then it follows the inheritance chain up from the object’s class, until it hits “object” at the top.

Thus, in our case, we invoke “f.blah()”. Python looks on the instance f, and doesn’t find an attribute named “blah”. Thus, it looks on f’s class, Foo. It finds the attribute there, sees that it is a function, and binds it to f (via what Python calls the descriptor protocol), so that the call is effectively “Foo.blah(f)”.
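The lookup order (instance, then class, then the parent classes) can be sketched as follows. Base, Child, “color” and “size” are names invented for this example, and the description is a simplification that ignores special cases such as properties. The __mro__ attribute, which lists the classes Python searches in order, is standard.

```python
class Base(object):
    color = 'red'
    size = 'small'

class Child(Base):
    size = 'medium'        # shadows Base.size for Child and its instances

c = Child()
print(c.color)             # red     (not on c or Child; found on Base)
print(c.size)              # medium  (not on c; found on Child)

c.size = 'large'           # creates an attribute on the instance itself
print(c.size)              # large   (found on c, so the search stops)
print(Child.size)          # medium  (the class attribute is untouched)

del c.size                 # remove the instance attribute again
print(c.size)              # medium  (the search falls through to Child)

# The classes searched after the instance, in order:
print([cls.__name__ for cls in Child.__mro__])   # ['Child', 'Base', 'object']
```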

So Python doesn’t really have “instance variables” or “class variables.”  Rather, it has objects with attributes. Some of those attributes are defined on class objects, and others are defined on instance objects. (Of course, class objects are just instances of “type”, but let’s ignore that for now.)  This also explains why people sometimes think that they can or should define attributes on a class (“class variables”), because they’re visible to the instances. Yes, that is true, but it sometimes makes more sense than others to do so.

What you really want to avoid is creating an attribute on the instance that has the same name as an attribute on the class. For example, imagine this:

class Person(object):
    population = 0
    def __init__(self, first, last):
        self.first = first        
        self.last = last
        self.population += 1

p1 = Person('Reuven', 'Lerner')
p2 = Person('foo', 'bar')

This looks all nice, until you actually try to run it. You’ll quickly discover that Person.population remains stuck at 0, but p1.population and p2.population are both set to 1. What’s going on here?

The answer is that the line

self.population += 1

can be turned into

self.population = self.population + 1

As always, the right side of an assignment is evaluated before the left side. Thus, on the right side, we say “self.population”. Python looks at the instance, self, and looks for an attribute named “population”. No such attribute exists. It thus goes to Person, self’s class, and does find an attribute by that name, with a value of 0. It thus returns 0, and executes 0 + 1. That gives us the answer 1, which is then passed to the left side of the assignment. The left side says that we should store this result in self.population, which is an attribute on the instance! This works, because we can always assign any attribute. But in this case, we will now get different results for Person.population (which will remain at 0) and the individual instances’ values of population, such as p1.population and p2.population (which will each be 1).
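One way to get an accurate count is to name the class explicitly, so that both the read and the write happen on the class attribute. This is a sketch of that fix, not the only possible one.

```python
class Person(object):
    population = 0

    def __init__(self, first, last):
        self.first = first
        self.last = last
        Person.population += 1    # read from, and assign to, the class

p1 = Person('Reuven', 'Lerner')
p2 = Person('foo', 'bar')

print(Person.population)          # 2
print(p1.population)              # 2 (not on p1, so found on Person)
print('population' in vars(p1))   # False: no instance attribute was created
```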

We can see which attributes were actually set on the instance and on the class, using a list comprehension:

class Foo(object):
    def blah(self):
        return "blah"

>>> f = Foo()
>>> [attr_name for attr_name in dir(f) if attr_name not in dir(Foo)]
[]

>>> [attr_name for attr_name in dir(Foo) if attr_name not in dir(object)]
['__dict__', '__module__', '__weakref__', 'blah']

In the above, we first define “Foo”, with a method “blah”. That method definition, as we know, is stored in the “blah” attribute on Foo. We haven’t assigned any attributes to f, which means that the only attributes available to f are those in its class.
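To see the other side of this, here is a sketch in which we do assign an attribute to the instance. The same list comprehension now reports it, and the built-in “vars” function shows the same thing more directly. The attribute “x” is just an example.

```python
class Foo(object):
    def blah(self):
        return "blah"

f = Foo()
f.x = 5      # an attribute on the instance, not on the class

# Names visible through f that the class does not provide:
print([name for name in dir(f) if name not in dir(Foo)])   # ['x']

# vars() returns the instance's own attribute dictionary.
print(vars(f))                 # {'x': 5}

# The method, by contrast, is stored on the class.
print('blah' in vars(Foo))     # True
print('blah' in vars(f))       # False
```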


24 thoughts on “In Python, it’s all about the attributes”

  1. Nice reading!

    You have a typo: when you give the example about self.population += 1, in the text you say 1 + 1 = 2 but it should be 1 + 0 = 1.

  2. Now I expect a follow up on how, why or why not we can make .population hold the “right” value in the original class.

    1. Ah, sorry — I should indeed have spelled that out.

      The solution is to explicitly name the class, such that we’re both setting and retrieving from the class attribute. That is, instead of

      self.population += self.population + 1

      we should say:

      Person.population += Person.population + 1

      That uses, and updates, the existing class attribute, giving us an accurate count.

      Does this help?

      1. I guess you meant:

        Person.population = Person.population + 1

        or:

        Person.population += 1

        (and you could also use):

        self.__class__.population += 1

      2. There’s a way of doing that without explicitly naming the class. Just as for accessing methods and attributes of a parent class we use super.something instead of explicitly naming the parent, here you could do this:

        class Person(object):
            population = 0
            def __init__(self, first, last):
                self.first = first
                self.last = last
                self.__class__.population += 1
                self.population = self.__class__.population


  3. >>> class Foo(object):
    ...     x = 100
    ...     def blah(self):
    ...         return "blah"
    ...
    >>> Foo.blah
    <unbound method Foo.blah>
    >>> Foo.x
    100

    Why is Foo.blah “unbounded” but Foo.x is not? Aren’t both “blah” and “x” attributes of the class object?

    1. Yes, both “blah” and “x” are attributes of the class object, Foo.

      x is assigned to an integer.

      blah is assigned to a function. However, it’s not *just* a function. It’s a method, and an “unbound” method, at that. If you define a function inside of a “class” block, that function definition is treated specially, so that you cannot invoke Foo.blah() (as if it were a class or static method), but so that you can say Foo().blah() (as an instance method).

  4. Thank you for the article.
    Additional question: is it somehow possible to prohibit creation of new attribute outside __init__ ?

    1. The short answer is “no.” It’s always possible to add new attributes to Python objects.

      There are some tricks that you can play, however, in order to make it more difficult. You can, for example, define a class-level attribute named __slots__ that names the attributes you wish to define. If you do that, then Python changes the way in which it stores attributes, making it impossible to add them in the normal way.

      But the general rule in Python is that things are open and dynamic, which can potentially lead to problems, but usually doesn’t.

        1. Many things can go wrong if you mistype names; adding a new attribute is just one of them. The answer is to unit test everything, all the time.

  5. Hi, thanks for the article.

    I think the example you used, though — self.population += 1 — is not a very good one. It works in your case, because Person.population was defined as an int, which is immutable; thus, indeed, it is equivalent to self.population = self.population + 1. But if the initial value is mutable, then the meaning changes to self.population = self.population.__iadd__(1) — the assignment is preceded by in-place mutation, changing the class attribute.

    Using augmented assignment (+= and its siblings) in this way is, IMO, error-prone and should be avoided.

    1. I think that it’s a very good example of what happens when people don’t understand the Python object model — namely, that Python first searches for attributes on an instance, and then on a class, and that we can have the same attribute set on these objects. I have seen *many* people confused by this concept, and surprised by what happens in their Python code.

      Whether I would really keep track of population in this way is another question. The answer is probably no, because I would use a database to keep track of objects for me.

      My goal (here, and in my book) is to use exercises as a way to first confuse them and then un-confuse them. In so doing, I hope to illuminate areas of Python that people don’t quite understand, in order to deepen their understanding of the language and thus use it more intelligently.

      I agree that understanding the difference between + and +=, and the need to define __iadd__ if you want your object to handle +=, is a good point, one which I should likely emphasize more in my classes and writing.

  6. The following example is in error:

    class Foo(object):
        def blah(self):
            return "blah"

    >>> [attr_name for attr_name in dir(f) if attr_name not in dir(Foo)]

    Because: NameError: name 'f' is not defined

  7. “However, a the body of the class definition is executed immediately, and only once — when we define the function. ”

    You mean:
    “However, a the body of the class definition is executed immediately, and only once — when we define the CLASS. ” ??

  8. Thanks for your post!
    But when I send the request for the “free, five-part e-mail course on Python scopes”, there is an error:

    Mailing List Not Active
    This mailing list is not currently active.
    Please notify the website owner.
