Last chance: Weekly Python Exercise registration closes soon

Weekly Python Exercise logoYou probably want to understand Python better, use it more efficiently, and write code that you (and others) can maintain — for yourself, your current job, and your career.

Weekly Python Exercise lets you make that improvement.  Over the course of a year, you learn to solve more interesting, useful, and complex problems.  You’ll learn how to use decorators, generators, and comprehensions, as well as inner functions, lambdas, and magic methods.

And you’ll learn not just via your own work, but by collaborating with other Python developers around the world in our private forum.  And in monthly office hours with me.

Registration for Weekly Python Exercise is closing soon, and I only open 1-2 cohorts each year.  If you want to level up your Python, then WPE is the best way I know of to do so.

Combine the many benefits of WPE for your career with my money-back guarantee, my “forever free” policy gaining you entry into future cohorts, and the discounts I offer to students, retirees/pensioners, and people living in non-wealthy countries, and I hope you’ll agree that Weekly Python Exercise is a great investment.

Not sure?  Or are you eligible for a discount code?  Or just want to see my e-mail is handled by a bot?  In any case, just e-mail me your questions.

Don’t delay. Weekly Python Exercise is starting soon — and along with it, your mastery of Python.

Click here to sign up for Weekly Python Exercise.

Weekly Python Exercise: Registration closes in two days

Weekly Python Exercise logoThis is just a reminder that registration for the next cohort of Weekly Python Exercise, my course that combines exercises and community to turn you into an advanced Python developer, closes in just two days, on September 18th.

If you’ve always wanted to improve your understanding of such topics as functions, objects, decorators, generators, comprehensions, and lambda (among other things), then WPE is for you!  I only open 1-2 cohorts per year, so if you want to level up your Python — and stop relying on Stack Overflow and Google to answer your questions — be sure to check it out.

With this cohort, I’m adding tests with PyTest to most exercise specifications! This means that you’ll not only get better at coding, but at testing, too.

You can read more at http://WeeklyPythonExercise.com/.

Remember: I offer discounts to students, pensioners, and residents of countries that aren’t among the world’s 30 richest.  Just e-mail me at reuven@lerner.co.il for a coupon code.

And: Once you buy WPE, my “forever free” policy means that you can join future cohorts, too.

And of course: There’s a 100% money-back guarantee.

I’m sure that WPE is the best way to improve your Python, and thus improve your career as a developer or data scientist. Questions? Just e-mail me at reuven@lerner.co.il, and I’ll respond ASAP.

 

Announcing: Weekly Python Exercise, Autumn 2018 cohort

Weekly Python Exercise logoJust about every day for the last decade, I’ve taught Python to developers at companies around the world. And if there’s anything that those developers want, it’s to improve their Python fluency.

Being a more fluent Python developer doesn’t only mean being able to solve problems faster and better — although these are nice benefits, for sure!

Being a more fluent Python developers means that you can solve bigger, more complex problems. That’s not only worth something to you, but to your employer, as well.

Developers know this, and are always asking me how they can improve their skills once my courses are over.

My solution is Weekly Python Exercise, a year-long course in which you get to improve your existing Python skills, and learn new ones, as you solve a new exercise each week. Because learning is always more effective with other people, WPE students can use our private forum to discuss solutions, collaborate on the best strategies, and even (much as I hate to admit it) tell me when my solution could have been better.

Oh, and there are live, monthly office hours as well, when you can ask me questions about the exercises. Or Python in general.  Or anything, really.  It’s your chance to pick my brain, in real time.

If you are tired of searching Stack Overflow every time you start a new Python project, and want to become a more fluent developer, then Weekly Python Exercise is for you. 

Note that it’s not a course for beginners!  WPE is meant for people who have already learned Python, and are using it, but want to gain fluency.  We’ll be dealing with all sorts of advanced topics, too — from inner functions to generators to decorators to object-oriented techniques.

Sound good? Learn more at https://WeeklyPythonExercise.com/.  And if you have any questions, then don’t hesitate to e-mail me at reuven@lerner.co.il.  I’ll be delighted to answer your question!

Avoiding Windows backslash problems with Python’s raw strings

I’m a Unix guy, but the participants in my Python classes overwhelmingly use Windows. Inevitably, when we get to talking about working with files in Python, someone will want to open a file using the complete path to the file.  And they’ll end up writing something like this:

filename = 'c:\abc\def\ghi.txt'

But when my students try to open the file, they discover that Python gives them an error, indicating that the file doesn’t exist!  In other words, they write:

for one_line in open(filename):    print(one_line)

What’s the problem?  This seems like pretty standard Python, no?

Remember that strings in Python normally contain characters. Those characters are normally printable, but there are times when you want to include a character that isn’t really printable, such as a newline.  In those cases, Python (like many programming languages) includes special codes that will insert the special character.

The best-known example is newline, aka ‘\n’, or ASCII 10. If you want to insert a newline into your Python string, then you can do so with ‘\n’ in the middle.  For example:

s = 'abc\ndef\nghi'

When we print the string, we’ll see:

>>> print(s)

abc

def

ghi

What if you want to print a literal ‘\n’ in your code? That is, you want a backslash, followed by an “n”?  Then you’ll need to double the backslash:The “\\” in a string will result in a single backslash character. The following “n” will then be normal. For example:

s = 'abc\\ndef\\nghi'

When we say:

>>> print(s)

abc\ndef\nghi

It’s pretty well known that you have to guard against this translation when you’re working with \n. But what other characters require it? It turns out, more than many people might expect:

  • \a — alarm bell (ASCII 7)
  • \b — backspace (ASCII
  • \f — form feed
  • \n — newline
  • \r — carriage return
  • \t — tab
  • \v — vertical tab
  • \ooo —  character with octal value ooo
  • \xhh — character with hex value hh
  • \N{name} — Unicode character {name}
  • \uxxxx — Unicode character with 16-bit hex value xxxx
  • \Uxxxxxxxx — Unicode character with 32-bit hex value xxxxxxxx

In my experience, you’re extremely unlikely to use some of these on purpose. I mean, when was the last time you needed to use a form feed character? Or a vertical tab?  I know — it was roughly the same day that you drove your dinosaur to work, after digging a well in your backyard for drinking water.

But nearly every time I teach Python — which is, every day — someone in my class bumps up against one of these characters by mistake. That’s because the combination of the backslashes used by these characters and the backslashes used in Windows paths makes for inevitable, and frustrating, bugs.

Remember that path I mentioned at the top of the blog post, which seems so innocent?

filename = 'c:\abc\def\ghi.txt'

It contains a “\a” character. Which means that when we print it:

>>> print(filename)
c:bc\def\ghi.txt

See? The “\a” is gone, replaced by an alarm bell character. If you’re lucky.

So, what can we do about this? Double the backslashes, of course. You only need to double those that would be turned into special characters, from the table I’ve reproduced above: But come on, are you really likely to remember that “\f” is special, but “\g” is not?  Probably not.

So my general rule, and what I tell my students, is that they should always double the backslashes in their Windows paths. In other words:

>>> filename = 'c:\\abc\\def\\ghi.txt'

>>> print(filename)
c:\abc\def\ghi.txt

It works!

But wait: No one wants to really wade through their pathnames, doubling every backslash, do they?  Of course not.

That’s where Python’s raw strings can help. I think of raw strings in two different ways:

  • what-you-see-is-what-you-get strings
  • automatically doubled backslashes in strings

Either way, the effect is the same: All of the backslashes are doubled, so all of these pesky and weird special characters go away.  Which is great when you’re working with Windows paths.

All you need to do is put an “r” before the opening quotes (single or double):

>>> filename = r'c:\abc\def\ghi.txt'

>>> print(filename)
c:\abc\def\ghi.txt

Note that a “raw string” isn’t really a different type of string at all. It’s just another way of entering a string into Python.  If you check, type(filename) will still be “str”, but its backslashes will all be doubled.

Bottom line: If you’re using Windows, then you should just write all of your hard-coded pathname strings as raw strings.  Even if you’re a Python expert, I can tell you from experience that you’ll bump up against this problem sometimes. And even for the best of us, finding that stray “\f” in a string can be time consuming and frustrating.

PS: Yes, it’s true that Windows users can get around this by using forward slashes, like we Unix folks do. But my students find this to be particularly strange looking, and so I don’t see it as a general-purpose solution.

A quick intro to the Unix “find” utility

One of the most powerful Unix command-line utilities is “find” — but it also has a huge number of options, and most of the documentation I’ve read on “find” is hard to follow and understand.  That’s a shame, because once you understand what “find” does and how it works, you can accomplish quite a bit.  I hope that this post will show you some of the basics of “find”, so that you can take advantage of it in your day-to-day work.

The basic idea is that “find” looks through a directory (and all of its subdirectories), applying one or more filters when deciding which files are interesting, and executing one or more actions on matching files.

So, what can you do with “find”?

  • Move any backup log older than 30 days to /tmp/
  • Find all of the MP4 files larger than 100MB
  • Find all of the documents with either “doc” or “docx” extensions anywhere in your home directory
  • In a directory of text files, find those containing the phrase “budget” which have not been touched in the last 30 days

(In these examples, I’m going to use the GNU version of find, which is standard on Linux machines and available for the Mac via Homebrew.  Note that if you use Homebrew on the Mac, then GNU “find” will be installed as “gfind” by default.  Use the –with-default-names option to “brew install” if you want to avoid this prefix.)

Note: There is a big difference between “find” and “locate”, which are often confused for one another:

  • “find” looks for files according to a number of criteria, and performs an action on the files matching those criteria. The search takes place when you run the program.
  • “locate” uses a database (typically created with the “updatedb” command) for filenames matching a pattern, and returns those filenames.

So if you know that you have a file named “important.txt” somewhere on your system, then you probably want to use “locate” — assuming, of course, you have been updating your filename database on a regular basis, typically via “cron”.

If you don’t remember the name of the file, but do remember that you modified it in the last 14 days, and that it contains the phrase “very important”, then you can use “find”.

For example, let’s say that I just want to find all of the files in the current directory and all of its subdirectories.  I can say:

find . -print

This means: Look at all files and directories in the current directory (.) and contained within its subdirectories, and then print them.

Now, in GNU find, both of these arguments are optional; you can just say

find

but I don’t recommend doing so, if only because it’s a bit ambiguous.  Moreover, the longer version emphasizes that “find” looks through a directory, filters through the results (although we don’t have any filters here), and then executes something (in this case, “print”).  The filters and actions are specified using command-line arguments; thus, we say “-print” if we want to print the name of the file.  Note that it’s not “–print” (i.e., with two “-” characters before “print”), which we might expect.

Also notice that the result includes all files, including directories and special Unix files (e.g., device files).  If you want to only look at files, then you can specify the “-type” filter.  For example, the following command shows all files (i.e., not subdirectories, symbolic links, or the like) under the current directory:

find . -type f -print    # find regular files

What if you want to find directories?  Then instead of using “-type f”, specify “-type d”:

find . -type d -print    # find directories

What if I only want to find files that match a certain pattern?  Then I can filter using the “-name” test and the shell’s standard characters.  For example, let’s say I want to find all of the files that end with “.txt”.  I can then say:

find . -type f -name "*.txt" -print

The above applies two tests —only regular files (i.e., not directories or the like) that match the pattern “*.txt” will match and be printed.

What if I want to find files that end with “.txt” or “.text”?  In such cases, it might be easiest to use the “or” option, written as “-o”, that combines two tests.  For example:

find . -type f \( -name '*.txt' -o -name '*.text' \) -print

The “-o” option (for logical “or” — and yes, there is also a “-a” option that’s logical “and”) allows either of the tests to succeed in order for it to declare success. However, the items on either side of “-o” must be inside of parentheses.  Since parentheses in the Unix shell have their own uses, we need to preface them with backslashes, to avoid clashes between the levels of parsing.  But wait — if the “\(” and “\)” are touching the arguments, then you’ll get hard-to-understand errors.  So make sure that “\(” and “\)” are surrounded by whitespace, if you want to avoid trouble.

Let’s say that I want to find old files on my system. Unix filesystems keep track of file ages in three different ways:

  • ctime (creation time) — when was the file first created?
  • mtime (modification time) — when was the file last modified?
  • atime (access time) — when was the file last accessed/read?

Let’s say that I want to find files in the current directory (and below) that were last accessed 7 days ago.  I can say:

find . -atime 7 -print

The “atime” is measured in 24-hour increments, starting with midnight of the current day. So “-atime 7” means, “last accessed 7*24 hours before midnight today.”

But wait a second — when was the last time you wanted to find files that were accessed exactly 7 days ago?  It’s far more likely that you want to find files that were last accessed less than 7 days ago. In order to do that, you need to preface the number with a “-” sign:

find . -atime -7 -print

By contrast, if you want to find all of those files that were accessed more than 7 days ago, you’ll want to preface the number with a “+” sign:

find . -atime +7 -print

And of course, if you want to find files that were accessed more than 2 days ago, but less than 9 days ago, you can say:

find . -atime +2 -atime -9 -print

Depending on your needs, it might well be better to use “mtime” rather than “atime”. I’m often interested in finding files I changed recently, rather than those I read recently. The same rules apply; here’s how I would find all of those files that I last modified more than two days ago but less than 9 days ago:

find . -mtime +2 -mtime -9 -print

Notice that I’m able to combine two rules (i.e., two “atime” or “mtime” rules) without using “-a” to join them together with a logical “and”.

Another useful thing to look for is big files. What files, for example, are bigger than 2 GB? I can say the following:

$ find . -size +2G -print

(I believe that this “-size” option only works this way on GNU find. Other versions might well require that you specify the file size in blocks. It has been a while since I used non-GNU versions.)

Look familiar? That’s right; the “+2” means “greater than”, and the “G” suffix means “GB”.  You can use a bunch of suffixes to the number, to indicate just how big the file should be.  As you might have guessed, you can say “-2M” to mean “less than 2 MB”, which on a modern computer is just about everything, to be honest.

We can also combine these, just as we did with “atime” and “mtime”: What files are bigger than 500 MB and smaller than 5 GB?

find . -size +500M -size -5G -print

We can combine these filters with others. What files are bigger than 500 MB and smaller than 5 GB, and were last accessed no more than 30 days ago?

find . -size +500M -size -5G -atime -30 -print

You can imagine using this sort of command to find large, unused files, such as old videos that you had forgotten are on your filesystem. Indeed, what if I’m only interested in finding MP4 files that are larger than 500 MB, smaller than 5 GB, and accessed in the last 30 days? I  can add another condition:

find . -size +500M -size -5G -atime -30 -name "*.mp4" -print

There are lots of other filters you can apply, and GNU find is especially full of them. There are alternative ways to specify dates. You can search for particular types of special files.  You can search for certain permissions. And so forth.  But the ones I’ve shown you are the ones I’ve used most often.

But the tests are only the first part of using “find”: Once you’ve gotten a list of files, what can you do with them?

So far, we’ve seen a single action, namely “-print”.  There are a few others that you might find useful.

The first is “-ls”, which runs the Unix “ls” command (with a few options that’ll show size and permissions):

find . -size +500M -size -5G -atime -30 -name "*.mp4" -ls

The above will not only print the filename (like “-print”), but will also show lots of other information about the files we’ve found. What if you want to write this list to a file? Then just use the “-fls” option, and give it a filename:

find . -size +500M -size -5G -atime -30 -name "*.mp4" -fls big-movies.txt

It’s pretty common to want to delete files. So you can use the “-delete” option to do so.  Warning: Running a program that automatically deletes files can be very dangerous. I almost never do this, because I’m always so worried that something will go wrong.  Here’s how I can remove all of the backup files in my Linux /var/log directory that are more than 21 days old:

find . -name '*.gz' -mtime +21 -delete -print

Note that you can have more than one action; in this case, my first action was “-delete”, and my second was “-print”.

It’s pretty common for me to want to search through an entire directory for a file that contains particular text. In other words, I want to run the “grep” utility on each file. I can do that by using the all-purpose “-exec” action.  The basic idea is as follows: You hand “-exec” a command, and the command is then ended with \; (yes, backlash + semicolon). In between, you can write whatever Unix command you want, including options. The current filename can be put into the command with the special formula {} (i.e., empty curly braces).  For example, I can say:

find . -name "*.txt" -exec grep Reuven {} \;

The above will show all lines from all files containing my name. (Of course, a regular expression can be far more complex than this; if you aren’t familiar with grep or regexps, you can take my free “regular expressions crash course.”)    But the output only shows the lines we would get from “grep”, which (by default) doens’t show the name of the current file if you’re running it one file at a time. For this reason, we would be wise to include the “-H” option:

$ find . -name "*.txt" -exec grep -H Reuven {} \;

While “grep” is the most common command that I run via “-exec”, you can use any program you want, including programs that you’ve written.  In this way, you can really make “find” work for you, and execute custom code for each file that fits a criteria. Combine “find” with “cron”, and you have an easy way to identify files that need your attention, or that should be removed, or that you’ve been looking for and otherwise cannot find.

If there’s one drawback to “find”, it’s that the search happens in real time. There is no database through which it runs. Which means that if you’re going through a very large directory structure, you might discover that “find” takes quite a while.

And that’s about it! If you’re like me, then you’ll find (no pun intended) that these use cases cover most of what you need with the “find” utility. The documentation is extremely long, but only because “find” has many other tests and actions that you can mix and match in a variety of ways.

 

 

 

Python parentheses primer

If you have children, then you probably remember them learning to walk, and then to read. If you’re like me, you were probably amazed by how long it took to do things that we don’t even think about. Things that we take for granted in our day-to-day lives, and which seem so obvious to us, take a long time to master.

You can often see and experience this when you compare how you learn a language as a native speaker, from how you learn as a second language. I grew up speaking English, and never learned all sorts of rules that my non-native-speaking friends learned in school. Similarly, I learned all sorts of rules for Hebrew grammar that my children never learned in school.

It’s thus super easy to take things for granted when you’re an expert. Indeed, that’s almost the definition of an expert — someone who understands a subject so well, that for them things are obvious.

To many developers, and especially Python developers, it’s obvious not only that there are different types of parentheses in Python, but that each type has multiple uses, and do completely different things. But to newcomers, it’s far from obvious when to use round parentheses, square brackets, and/or curly braces.

I’ve thus tried to summarize each of these types of parentheses, when we use them, and where you might get a surprise as a result.  If you’re new to Python, then I hope that this will help to give you a clearer picture of what is used when.

I should also note that the large number of parentheses that we use in Python means that using an editor that colorizes both matching and mismatched parentheses can really help. On no small number of occasions, I’ve been able to find bugs quickly thanks to the paren-coloring system in Emacs.

Regular parentheses — ()

Callables (functions and classes)

Perhaps the most obvious use for parentheses in Python is for calling functions and creating new objects. For example:

x = len('abcd')

i = int('12345')

It’s worth considering what happens if you don’t use parentheses. For example, I see the following code all the time in my courses:

d = {'a':1, 'b':2, 'c':3}
for key, value in d.items:
    print(f"{key}: {value}")

When you try to run this code, you can an error message that is true, but whose meaning isn’t completely obvious:

TypeError: 'builtin_function_or_method' object is not iterable

Huh?  What the heck does this mean?

It’s worth remembering how “for” loops work in Python:

  • “for” turns to the object at the end of the line, and asks whether it’s iterable
  • if so, then “for” asks the object for its next value
  • whenever the object says, “no more!” the loop stops

In this case, “for” turns to the method “d.items” and asks if it’s iterable. Note that we’re not asking whether the output from “d.items” is iterable, but rather whether the method itself is iterable.

That’s because there’s a world of difference between “d.items” and “d.items()”. The first returns the method. The second returns an iterable sequence of name-value pairs from the dictionary “d”.

The solution to this non-working code is thus to add parentheses:

d = {'a':1, 'b':2, 'c':3}
for key, value in d.items():
    print(f"{key}: {value}")

Once we do that, we get the desired result.

I should note that we also need to be careful in the other direction: Sometimes, we want to pass a function as an argument, and not execute it. One example is when we’re in the Jupyter notebook (or other interactive Python environment) and ask for help on a function or method:

help(len)

help(str.upper)

In both of the above cases, we don’t want to get help on the output of those functions; rather, we want to get help on the functions themselves.

Prioritizing operations

In elementary school, you probably learned the basic order of arithmetic operations — that first we multiply and divide, and only after do we add and subtract.

Python clearly went to elementary school as well, because it follows this order.  For example:

In [1]: 2 + 3 * 4
Out[1]: 14

We can change the priority by using round parentheses:

In [2]: (2 + 3) * 4
Out[2]: 20

Experienced developers often forget that we can use parentheses in this way, as well — but this is, in many ways, the most obvious and natural way for them to be used by new developers.


Creating tuples

Of course, we can also use () to create tuples. For example:

In [8]: t = (10,20,30)

In [9]: type(t)
Out[9]: tuple

What many beginning Python developers don’t know is that you actually don’t need the parentheses to create the tuple:

In [6]: t = 10,20,30

In [7]: type(t)
Out[7]: tuple

Which means that when you return multiple values from a function, you’re actually returning a tuple:

In [3]: def foo():
...: return 10, 20, 30
...:
...:

In [4]: x = foo()

In [5]: x
Out[5]: (10, 20, 30)

What surprises many newcomers to Python is the following:

In [10]: t = (10)

In [11]: type(t)
Out[11]: int

“Wait,” they say, “I used parentheses. Shouldn’t t be a tuple?”

No, t is an integer.  When Python’s parser sees something like “t = (10)”, it can’t know that we’re talking about a tuple.  Otherwise, it would also have to parse “t = (8+2)” as a tuple, which we clearly don’t want to happen, assuming that we want to use parentheses for prioritizing operations (see above).  And so, if you want to define a one-element tuple, you must use a comma:

In [12]: t = (10,)

In [13]: type(t)
Out[13]: tuple

Generator expressions

Finally, we can use round parentheses to create “generators,” using what are known as “generator expressions.” These are a somewhat advanced topic, requiring knowledge of both comprehensions and iterators. But they’re a really useful tool, allowing us to describe a sequence of data without actually creating each element of that sequence until it’s needed.

For example, if I say:

In [17]: g = (one_number * one_number
...: for one_number in range(10))

The above code defines “g” to be a generator, the result of executing our generator expression. “g” is than an iterable, an object that can be placed inside of a “for” loop or a similar context.   The fact that it’s a generator means that we can have a potentially infinite sequence of data without actually needing to install an infinite amount of RAM on our computers; so long as we can retrieve items from our iterable one at a time, we’re set.

The above generator “g” doesn’t actually return 10 numbers. Rather, it returns one number at a time. We can retrieve them all at once by wrapping it in a call to “list”:

In [18]: list(g) 
Out[18]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

But the whole point of a generator is that you don’t want to do that. Rather, you will get each element, one at a time, and thus reduce memory use.

Something funny happens with round parentheses when they’re used on a generator expression in a function call.  Let’s say I want to get a string containing the elements of a list of integers:

In [19]: mylist = [10, 20, 30]

In [20]: '*'.join(mylist)

This fails, because the elements of “mylist” are integers.  We can use a generator expression to turn each integer into a string:

In [21]: '*'.join((str(x)
...: for x in mylist))
Out[21]: '10*20*30'

Notice the double parentheses here; the outer ones are for the call to str.join, and the inner ones are for the generator expression. Well, it turns out that we can remove the inner set:

In [22]: '*'.join(str(x)
...: for x in mylist)
Out[22]: '10*20*30'

So the next time you see a call to a function, and a comprehension-looking thing inside of the parentheses, you’ll know that it’s a generator expression, rather than an error.

Cheating Python’s indentation rules

Python is famous for its use of indentation to mark off blocks of code, rather than curly braces, begin/end, or the like. In my experience, using indentation has numerous advantages, but tends to shock people who are new to the language, and who are somewhat offended that the language would dictate how and when to indent code.

There are, however, a few ways to cheat (at least a little) when it comes to these indentation rules. For example, let’s say I have a dictionary representing a person, and I want to know if the letter ‘e’ is in any of the values.  I can do something like this:

In [39]: person = {'first':'Reuven', 'last':'Lerner', 'email':'reuven@lerner.co.il'}

In [40]: if 'e' in person['first'] or 'e' in person['last'] or 'e' in person['email']:
...: print("Found it!")
...:
Found it!

That “if” line works, but it’s far too long to be reasonably readable.  What I’d love to do is this:

In [40]: if 'e' in person['first'] or 
            'e' in person['last'] or 
            'e' in person['email']:
...: print("Found it!")

The problem is that the above code won’t work; Python will get to the end of the first “or” and complain that it reached the end of the line (EOL) without a complete statement.

The solution is to use parentheses. That’s because once you’ve opened parentheses, Python is much more forgiving and flexible regarding indentation. For example, I can write:

In [41]: if ('e' in person['first'] or
...:         'e' in person['last'] or
...:         'e' in person['email']):
...:         print("Found it!")
...:
...:
Found it!

Our code is now (in my mind) far more readable, thanks to the otherwise useless parentheses that I’ve added.

By the way, this is true for all parentheses. So if I want to define my dict on more than one line, I can say:

In [42]: person = {'first':'Reuven',
...:               'last':'Lerner',
...:               'email':'reuven@lerner.co.il'}

Python sees the opening { and is forgiving until it finds the matching }. In the same way, we can open a list comprehension on one line and close it on another.  For years, I’ve written my list comprehensions on more than one line, in the belief that they’re easier to read, write, and understand. For example:

[one_number * one_number

for one_number in range(10)]

Square brackets — []

Creating lists

We can create lists with square brackets, as follows:

mylist = [ ]  # empty list

mylist = [10, 20, 30]  # list with three items

Note that according to PEP 8, you should write an empty list as [], without any space between the brackets.  I’ve found that with certain fonts, the two brackets end up looking like a square, and are hard for people in my courses to read and understand.  So I always put a space between the brackets when creating an empty list.

(And yes, I’m that rebellious in real life, not just when programming.)

We can use square brackets not just to create lists with explicitly named elements, but also to create lists via list comprehensions:

In [16]: [one_number * one_number
...: for one_number in range(10)]
...:
Out[16]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

The square brackets tell Python that this is a list comprehension, producing a list.  If you use curly braces, you’ll get either a set or a dict back, and if you use regular parentheses, you’ll get a generator expression (see above).

Requesting individual items

Many people are surprised to discover that in Python, we always use square brackets to retrieve from a sequence or dictionary:

In [23]: mylist = [10, 20,30]

In [24]: t = (10, 20, 30)

In [25]: d = {'a':1, 'b':2, 'c':3}

In [26]: mylist[0]
Out[26]: 10

In [27]: t[1]
Out[27]: 20

In [28]: d['c']
Out[28]: 3

Why don’t we use regular parentheses with tuples and curly braces with dictionaries, when we want to retrieve an element? The simple answer is that square brackets, when used in this way, invoke a method — the __getitem__ method.

That’s right, there’s no difference between these two lines of code:

In [29]: d['c']
Out[29]: 3

In [30]: d.__getitem__('c')
Out[30]: 3

This means that if you define a new class, and you want instances of this class to be able to use square brackets, you just need to define __getitem__.  For example:

In [31]: class Foo(object):
...:         def __init__(self, x):
...:             self.x = x
...:         def __getitem__(self, index):
...:             return self.x[index]
...:

In [32]: f = Foo('abcd')

In [33]: f[2]
Out[33]: 'c'

See?  When we say f[2], that’s translated into f.__getitem__(2), which then returns “self.x[index]”.

The fact that square brackets are so generalized in this way means that Python can take advantage of them, even on user-created objects.

Requesting slices

You might also be familiar with slices. Slices are similar to individual indexes, except that they describe a range of indexes. For example:

In [46]: import string

In [47]: string.ascii_lowercase[10:20]
Out[47]: 'klmnopqrst'

In [48]: string.ascii_lowercase[10:20:3]
Out[48]: 'knqt'

As you can see, slices are either of the form [start:end+1] or [start:end+1:stepsize].  (If you don’t specify the stepsize, then it defaults to 1.)

Here’s a little tidbit that took me a long time to discover: You can get an IndexError exception if you ask for a single index beyond the boundaries of a sequence. But slices don’t have in such problems; they’ll just stop at the start or end of your string:

In [50]: string.ascii_lowercase[500]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-50-fad7a1a4ec3e> in <module>()
----> 1 string.ascii_lowercase[500]

IndexError: string index out of range


In [51]: string.ascii_lowercase[:500]
Out[51]: 'abcdefghijklmnopqrstuvwxyz'

How do the square brackets distinguish between an individual index and a slice? The answer: They don’t. In both cases, the __getitem__ method is being invoked. It’s up to __getitem__ to check to see what kind of value it got for the “index” parameter.

But wait: If we pass an integer or string (or even a tuple) to square brackets, we know what type will be passed along. What type is passed to our method if we use a slice?

In [55]: class Foo(object):
...:         def __getitem__(self, index):
...:             print(f"index = {index}, type(index) = {type(index)}")
...:


In [56]: f = Foo()

In [57]: f[100]
index = 100, type(index) = <class 'int'>

In [58]: f[5:100]
index = slice(5, 100, None), type(index) = <class 'slice'>

In [59]: f[5:100:3]
index = slice(5, 100, 3), type(index) = <class 'slice'>

Notice that in the first case, as expected, we get an integer. But in the second and third cases, we get a slice object. We can create these manually, if we want; “slice” is in the “bulitin” namespace, along with str, int, dict, and other favorites. And as you can see from its printed representation, we can call “slice” much as we do “range”, with start, stop, and step-size arguments.  I haven’t often needed or wanted to create slice objects, but you certainly could:

In [60]: s = slice(5,20,4)

In [61]: string.ascii_lowercase[s]
Out[61]: 'fjnr'

In [62]: string.ascii_uppercase[s]
Out[62]: 'FJNR'

In [63]: string.ascii_letters[s]
Out[63]: 'fjnr'

In [64]: string.punctuation[s]
Out[64]: '&*.<'

Curly braces — {}

Creating dicts

The classic way to create dictionaries (dicts) in Python is with curly braces. You can create an empty dict with an empty pair of curly braces:

In [65]: d = {}

In [66]: len(d)
Out[66]: 0

Or you can pre-populate a dict with some key-value pairs:

In [67]: d = {‘a’:1, ‘b’:2, ‘c’:3}

In [68]: len(d)
Out[68]: 3

You can, of course, create dicts in a few other ways. In particular, you can use the “dict” class to create a dictionary based on a sequence of two-element sequences:

In [69]: dict(['ab', 'cd', 'ef'])
Out[69]: {'a': 'b', 'c': 'd', 'e': 'f'}

In [70]: d = dict([('a', 1), ('b', 2), ('c', 3)])

In [71]: d
Out[71]: {'a': 1, 'b': 2, 'c': 3}

In [72]: d = dict(['ab', 'cd', 'ef'])

In [73]: d
Out[73]: {'a': 'b', 'c': 'd', 'e': 'f'}

But unless you need to create a dict programmatically, I’d say that {} is the best and clearest way to go. I remember reading someone’s blog post a few years ago (which I cannot find right now) in which it was found that {} is faster than calling “dict” — which makes sense, since {} is part of Python’s syntax, and doesn’t require a function call.

Of course, {} can also be used to create a dictionary via a dict comprehension:

In [74]: { one_number : one_number*one_number
...:       for one_number in range(10) }
...:
Out[74]: {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}

In the above code, we create a dict of the number 0-9 (keys) and their values to the second power (values).

Remember that a dict comprehension creates one dictionary, rather than a list containing many dictionaries.

Create sets

I’ve become quite the fan of Python’s sets. You can think of sets in a technical sense, namely that they are mutable, and contain unique, hashable values. As of Python 3.6, they are stored in insertion order.

But really, it’s just easiest to think of sets as dictionaries without any values. (Yes, this means that sets are nothing more than immoral dictionaries.)  Whatever applies to dict keys also applies to the elements of a set.

We can create a set with curly braces:

In [75]: s = {10,20,30}

In [76]: type(s)
Out[76]: set

As you can see, the fact that there is no colon (:) between the name-value pairs allows Python to parse this code correctly, defining ‘s” to be a set, rather than a dict.

Nearly every time I teach about sets, someone tries to create an empty set and add to it, using set.add:

In [77]: s = {}

In [78]: s.add(10)
—————————————————————————
AttributeError Traceback (most recent call last)
<ipython-input-78-721f80ddfefc> in <module>()
—-> 1 s.add(10)

AttributeError: ‘dict’ object has no attribute ‘add’

The error indicates that “s” is a dict, and that dicts lack the “add” method. Which is fine, but didn’t I define “s” to be a set?

Not really: Dicts came first, and thus {} is an empty dict, not an empty set. If you want to create an empty set, you’ll need to use the “set” class:

s = set()

s.add(10)

This works just fine, but is a bit confusing to people starting off in Python.

I often use sets to remove duplicate entries from a list. I can do this with the “set” class (callable), but I can also use the “* argument” syntax when calling a function:

In [79]: mylist = [10, 20, 30, 10, 20, 30, 40]

In [80]: s = {*mylist}

In [81]: s
Out[81]: {10, 20, 30, 40}

Note that there’s a bit difference between {*mylist} (which creates a set from the elements of mylist) and {mylist} which will try to create a set with one element, the list “mylist”, and will fail because lists are unhashable.

Just as we have list comprehensions and dict comprehensions, we also have set comprehensions, which means that we can also say:

In [84]: mylist = [10, 20, 30, 10, 20, 30, 40]

In [85]: {one_number
...: for one_number in mylist}
...:
Out[85]: {10, 20, 30, 40}

str.format

Another place where we can use curly braces is in string formatting. Whereas Python developers used to use the printf-style “%” operator to create new strings, the modern way to do so (until f-strings, see below) was the str.format method. It worked like this:

In [86]: name = ‘Reuven’

In [87]: “Hello, {0}”.format(name)
Out[87]: ‘Hello, Reuven’

Notice that str.format returns a new string; it doesn’t technically have anything to do with “print”, although they are often used together. You can assign the resulting string to a new variable, write it to a file, or (of course) print it to the screen.

str.format looks inside of the string, searching for curly braces with a number inside of them. It then grabs the argument with that index, and interpolates it into the resulting string.  For example:

In [88]: 'First is {0}, then is {1}, finally is {2}'.format(10, 20, 30)
Out[88]: 'First is 10, then is 20, finally is 30'

You can, of course, mix things up:

In [89]: 'First is {0}, finally is {2}, then is {1}'.format(10, 20, 30)
Out[89]: 'First is 10, finally is 30, then is 20'

You can also repeat values:

In [90]: 'First is {0}. Really, first is {0}. Then is {1}'.format(10, 20, 30)
Out[90]: 'First is 10. Really, first is 10. Then is 20'

If you’ll be using each argument once and in order, you can even remove the numbers — although I’ve been told that this makes the code hard to read.  And besides, it means you cannot repeat values, which is sometimes annoying:

In [91]: 'First is {}, then is {}, finally is {}'.format(10, 20, 30)
Out[91]: 'First is 10, then is 20, finally is 30'

You cannot switch from automatic to manual numbering in curly braces (or back):

In [92]: 'First is {0}, then is {}, finally is {}'.format(10, 20, 30)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-92-f00f3adf93eb> in <module>()
----> 1 'First is {0}, then is {}, finally is {}'.format(10, 20, 30)

ValueError: cannot switch from manual field specification to automatic field numbering

str.format also lets you use names instead of values, by passing keyword arguments (i.e., name-value pairs in the format of key=value):

In [93]: 'First is {x}, then is {y}, finally is {z}'.format(x=10, y=20, z=30)
Out[93]: 'First is 10, then is 20, finally is 30'

You can mix positional and keyword arguments, but I beg that you not do that:

In [94]: 'First is {0}, then is {y}, finally is {z}'.format(10, y=20, z=30)
Out[94]: 'First is 10, then is 20, finally is 30'

f-strings

As of Python 3.6, we have an even more modern way to perform string interpolation, using “f-strings”. Putting “f” before the opening quotes allows us to use curly braces to interpolate just about any Python expression we want — from variable names to operations to function/method calls — inside of a string:

In [99]: name = 'Reuven'

In [100]: f"Hello, {name}"
Out[100]: 'Hello, Reuven'

In [101]: f"Hello, {name.upper()}"
Out[101]: 'Hello, REUVEN'

In [102]: f"Hello, {name.split('e')}"
Out[102]: "Hello, ['R', 'uv', 'n']"

I love f-strings, and have started to use them in all of my code.  bash, Perl, Ruby, and PHP have had this capability for years; I’m delighted to (finally) have it in Python, too!

from __future__ import braces

Do you sometimes wish that you could use curly braces instead of indentation in Python? Yeah, you’re not alone.  Fortunately, the __future__ module is Python’s way of letting you try new features before they’re completely baked into your current Python version. For example, if you’re still using Python 2.7, you can say

from __future__ import division

and division will always return a float, rather than an integer, even if the two operands are integers.

So go ahead, try:

from __future__ import braces

(Yes, this is part of Python.  And no, don’t expect to be able to use curly braces instead of indentation any time soon.)

Did you enjoy these explanations?  Then join my free, weekly “Better developers” newsletter about Python and software development, currently read by more than 7,000 developers each week. You can sign up at http://lerner.co.il/newsletter .

20% off all books + courses during PyCon

Hi!  I’m in Cleveland, Ohio, for PyCon 2018.  I’ve already heard back from others who are here for the conference, and will be coordinating with all of you privately to meet up in person.  I’m excited to meet (and learn from) lots of other Python developers from around the world!

To celebrate the conference, I’m offering 20% off of all my books and courses.  Just use the coupon code PYCON2018 at checkout from my online store, or click on any of the following links:

This coupon code will only work through the end of the conference, on Sunday night.

PyCon!

I’ll be attending PyCon 2018 in Cleveland, Ohio later this week — from Friday morning through the first day of sprints on Monday.

I’m hoping to meet lots of people from the worldwide Python community — as well as readers of Linux Journal, listeners to the “Freelancers Show” podcast, people who have attended my training courses over the years, and subscribers to my “Better developers” newsletter.

I’m planning to host a get-together, hopefully on Friday; I’ll post to my blog and Twitter account once I know when and where.  I’m already planning to host an “open space” for Python trainers on Friday.

It’s my first PyCon, so I’m both excited and nervous about how overwhelming it’ll likely be.  But I’m mainly interested in meeting people — so don’t be shy, and please do make an effort to find me!  I’ll be delighted to meet you and chat about all things Python (and not).  You can contact me on Twitter as @reuvenmlerner and WeChat as ReuvenLerner.

See you there!

Master Python objects (and understand how they work)

I’ve been teaching Python for many years now, and it never ceases to amaze me to discover just how many people are able to use Python without ever having really learned it.

In particular, there are lots of people writing and using Python objects who assume that classes and instances in Python work just like those in Java and C#, but with less syntax.

For the most part, they can make those assumptions and not suffer too much.  But then, if something goes wrong… well, they’re at a loss to explain how and why things didn’t work.

And then little things nag them: Why does “self” need to be the first parameter in a method? And when we set a variable inside of a class definition, are we setting the default for the instances? (Answer: No, because you’re setting a class attribute.)  How does inheritance really work?  And how does super() work in Python?

Knowing the answers to these and other questions are the key to becoming a more fluent Python developer. That’s important, because fluency will allow you to do more in less time, to write more expressive and powerful code, and to write code that can be changed and maintained more easily.

I’ve taken my lessons and exercises from my intro Python class and turned them into a new online class, “Object oriented Python.”  The course assumes that you’re familiar with basic Python data structures and functions, but otherwise starts with the basics of object-oriented programming, through inheritance and operator overloading.

In more than 4 hours of video lectures, and with 15 exercises, I walk you through how Python’s object system works, and how you can make it work for you.

This course is for you if:

  • You’re a Python developer unfamiliar with objects
  • You’ve been cutting, pasting, and editing Python code for years from Stack Overflow and blogs, without really understanding how objects work
  • You’re new to Python, but experienced with objects in another language, such as C#, C++, or Java

This is the same material I teach at companies like Apple, Cisco, IBM, Intel, PayPal, Western Digital, and VMWare.  I’m booked solid 8 months in advance with training at these companies  — but you can get these lessons, and improve your Python coding, in the coming minutes.

The course includes:

  • More than four hours of lecture/screencasts
  • 15 exercises, including extensive walk-throughs of the solutions
  • PDFs of the slides I use when teaching about objects
  • The Jupyter notebook I used when live-coding the course
  • Any and all updates I make to this course in the future, via my “forever free” policy

Also: You get access to my online office hours, given every 3-4 weeks, online, to students in all of my classes.  Come and ask your Python questions!  Or submit them, and I’ll answer them for you in video.

Not sure?  Go to the course page; a number of the lessons are available for free preview.

Click here to order “Object-oriented Python”

This course, like all of my others, comes with a 100% money-back guarantee.  I also offer discounts for students and pensioners, group discounts, and parity pricing for people living in any country outside the 30 wealthiest in the world.  Just e-mail me to get the appropriate coupon codes.

If you’re a Python developer and have been unsure about how to use Python objects, this is your chance.  Order “Object oriented Python,” and become a more fluent Python developer today.

Still not sure?  Send me e-mail, at reuven@lerner.co.il, and let me know.  I’ll be delighted to answer your questions.

Freelancers Show #300 — live!

As you may know, I’ve been a panelist on the Freelancers Show podcast for a few years.  It’s one of the high points of my week to chat with my co-panelists, discuss various aspects of freelancing/consulting, and to interview interesting people who can help us improve our freelancing careers.

I’ve been consulting since 1995, but I think that my business has improved greatly thanks to the advice I’ve gotten from the show, and from the discussions we’ve had.

This coming Tuesday, we’ll be recording our 300th show. To celebrate, we’re making it a live webinar, in which we’ll take questions from … well, anyone who wants to ask!

(The show will also be recorded and available as usual.)

So, if you’re a freelancer/consultant, or if you’re thinking about taking the plunge, or just curious, or just want to see what I look like — well, join us!  You can sign up at the Crowdcast site here:

https://www.crowdcast.io/e/tfs300

If you can’t make it, then we would still love to get your questions.  Write them on the Crowdcast page, or leave them as comments here.  I promise we’ll try to answer them all!