Another free regexp Q&A webinar!

The last Webinar I did, with Q&A about regular expressions, was great fun, so much so that I’ve decided to do another one.

So, if you have questions (big or little) about regular expressions in Python, Ruby, JavaScript, and/or PostgreSQL, sign up for this free Webinar on Monday, April 11th:

If you already have questions, you can leave them in advance using the Crowdcast Q&A system.  (Or just surprise me during the Webinar itself.)

I look forward to seeing you there!

Free Webinar: Regexp Q&A

To celebrate the publication of my new ebook, Practice Makes Regexp, my upcoming Webinar (on March 22nd) is all about regular expressions (“regexps”) in Python, Ruby, JavaScript, and PostgreSQL, as well as the Unix “grep” command.

Unlike previous Webinars, in which I gave a presentation and then took Q&A, this time will be all about Q&A: I want you to come with your questions about regular expressions, or even projects that you’re wondering how to attack using them.

I’ll do my best to answer your questions, whether they be about regexp syntax, differences between implementations and languages, how to debug hairy regexps, and even when they might not be the most appropriate tool for the job.

Please join me on March 22nd by signing up here:

And when you sign up, please don’t forget to ask a question or two!  (You can do that in advance; doing so will really help me to prepare detailed answers.)

I look forward to your questions on the 22nd!


Yes, you can master regular expressions!

Announcing: My new book, “Practice Makes Regexp,” with 50 exercises meant to help you learn and master regular expressions. With explanations and code in Python, Ruby, JavaScript, and PostgreSQL.

I spend most of my time nowadays going to high-tech companies and training programmers in new languages and techniques. Actually, many of the things I teach them aren’t really new; rather, they’re new to the participants in my training. Python has been around for 25 years, but for my students, it’s new, and even a bit exciting.

I tell participants that my job is to add tools to their programming toolbox, so that if they encounter a new problem, they’ll have new and more appropriate or elegant ways to attack and solve it. Moreover, I tell them, once you are intimately familiar with a tool or technique, you’ll suddenly discover opportunities to use it.

Earlier this week, I was speaking with one of my consulting clients, who was worried that some potentially sensitive information had been stored in their Web application’s logfiles, and they weren’t sure if they had a good way to search through the logs.


I suggested the first solution that came to mind: Regular expressions.

Regular expressions are a lifesaver for anyone who works with text.  We can use them to search for patterns in files, in network data, and in databases. We can use them to search and replace.  To handle protocols that have changed ever so slightly from version to version. To handle human input, which is always messier than what we get from other computers.
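As a small illustration of both searching and search-and-replace, here is a minimal Python sketch. The log lines and the pattern (the label "card=" followed by 16 digits) are hypothetical, invented for this example; a real logfile would need a pattern tailored to its own format.

```python
import re

# Hypothetical log lines, made up for illustration
log_lines = [
    "2016-03-01 12:00:01 GET /checkout card=4111111111111111 status=200",
    "2016-03-01 12:00:02 GET /home status=200",
]

# Assumed pattern: the label "card=" followed by exactly 16 digits
card_pattern = re.compile(r"card=(\d{16})\b")

# Search: which lines contain something that looks sensitive?
suspicious = [line for line in log_lines if card_pattern.search(line)]

# Search and replace: mask the digits but keep the "card=" label
masked = [card_pattern.sub("card=****", line) for line in log_lines]

print(len(suspicious))   # 1
print(masked[0])         # 2016-03-01 12:00:01 GET /checkout card=**** status=200
```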

Regular expressions are one of the most critical tools I have in my programming toolbox.  I use them at least a few times each day, and sometimes even dozens of times in a given day.

So, why don’t all developers know and use regular expressions? Quite simply, because the learning curve is so steep. Regexps, as they’re also known, are terse and cryptic. Changing one character can have a profound impact on what text a regexp matches, as well as its performance. Knowing which character to insert where, and how to build up your regexps, is a skill that takes time to learn and hone.

Many developers say, “If I have a problem that involves regular expressions, I’ll just go to Stack Overflow, where my problem has likely been addressed already.” And in many cases, they’re right.

But by that logic, I shouldn’t learn any French before I go to France, because I can always use a phrasebook.  Sure, I could work that way — but it’s far less efficient, and I’ll miss many opportunities that would come my way if I knew French.

Moreover, relying on Stack Overflow means that you never get a full picture of what you can really do with regular expressions. You get specific answers, but you don’t have a fully formed mental model of what they are and how they work.

But wait, it gets worse: If you’re under the gun, trying to get something done for your manager or a big client, you can’t spend time searching through Stack Overflow. You need to bring your best game to the table, demonstrating fluency in regular expressions.  Without that fluency, you’ll take longer to solve the problem — and possibly, not manage to solve it at all.

Believe me, I understand — my first attempt at learning regular expressions was a complete failure. I read about them in the Emacs manual, and thought to myself, “What could this seemingly random collection of characters really do for me?”  I ignored them for a few more years, until I started to program in Perl — a language that more or less expected you to use regexps.

So I spent some time learning regexp syntax.  The more I used them,  the more opportunities I found to use them.  And the more I found that they made my life easier, better, and more convenient.  I was able to solve problems that others couldn’t — or even if they could, they took much longer than I did.  Suddenly, processing text was a breeze.

I was so excited by what I had learned that when I started to teach advanced programming courses, I added regexps to the syllabus.  I assumed that I could find a way to make regexps understandable in an hour or two.

But boy, was I wrong: If there’s something that’s hard for programmers to learn, it’s regular expressions.  I’ve thus created a two-day course for people who want to learn regular expressions.  I not only introduce the syntax, but I have them practice, practice, and practice some more.  I give them situations and tasks, and their job is to come up with a regexp that will solve the problem I’ve given them.  We discuss different solutions, and the way that different languages might go about solving the problem.

After lots of practice, my students not only know regexp syntax — they know when to use it, and how to use it.  They’re more efficient and valuable employees. They become the person to whom people can turn with tricky text-processing problems.  And when the boss is pressuring them for a

And so, I’m delighted to announce the launch of my second ebook, “Practice Makes Regexp.”  This book contains 50 tasks for you to accomplish using regular expressions.  Once you have solved the problem, I present the solution, walking you through the general approach that we would use in regexps, and then going into greater depth (with code) to solve the problem in Python, Ruby, JavaScript, and PostgreSQL.  My assumption in the book is that you have already learned regexps elsewhere, but that you’re not quite sure when to use them, how to apply them, and when each metacharacter is most appropriate.

After you go through all 50 exercises, I’m sure that you’ll be a master of regular expressions.  It’ll be tough going, but the point is to sweat a bit working on the exercises, so that you can worry a lot less when you’re at work. I call this “controlled frustration” — better to get frustrated working on exercises, than when the boss is demanding that you get something done right away.

Right now, the book is more than 150 pages long, with four complete chapters (including 17 exercises).  Within two weeks, the remaining 33 exercises will be done.  And then I’ll start work on 50 screencasts, one for each of the exercises, in which I walk you through solutions in each of Python, Ruby, JavaScript, and PostgreSQL.  If my previous ebook is any guide, there will be about 5 hours (!) of screencasts when I’m all done.

If you have always shied away from learning regular expressions, or want to harness their power, Practice Makes Regexp is what you have been looking for.  It’s not a tutorial, but it will help you to understand and internalize regexps, helping you to master a technology that frustrates many people.

To celebrate this launch, I’m offering a discount of 10%.  Just use the “regexplaunch” offer code, and take 10% off of any of the packages — the book, the developer package (which includes the solutions in separate program files, as well as the 300+ slides from the two-day regexp course I give at Fortune 100 companies), or the consultant package (which includes the screencasts, as well as what’s in the developer package).

I’m very excited by this book.  I think that it’ll really help a lot of people to understand and use regular expressions.  And I hope that you’ll find it makes you a more valuable programmer, with an especially useful tool in your toolbox.

All 50 “Practice Makes Python” screencasts are complete!

I’m delighted to announce that I’ve completed a screencast for every single one of the 50 exercises in my ebook, “Practice Makes Python.”  This is more than 300 minutes (5 hours!) of Python instruction, helping you to become a more expert Python programmer.

Each screencast consists of me solving one of the exercises in real-time, describing what I’m doing and why I’m doing it.   They range in length from 4 to 10 minutes.  The idea is that you’ll do the exercise, and then watch my video to compare your answer (and approach) with mine.

If you enjoy my Webinars or in-person courses, then I think you’ll also enjoy these videos.

The screencasts, available with the two higher-tier “Practice Makes Python” packages,  can be streamed in HD video quality, or can be downloaded (DRM-free) to your computer for more convenient viewing.

To celebrate finally finishing these videos, I’m offering the two higher-end packages at 20% off for the coming week, until February 18th. Just use the offer code “videodone” with either the “consultant” or “developer” package, and enjoy a huge amount of Python video.

You can explore these packages at the “Practice Makes Python” Web site.

Not interested in my book, but still want to improve your Python skills?  You can always take one of my two free e-mail courses, on Python variable scoping and working with files. Those are and will remain free forever. And of course, there’s my free Webinar on Python and data science next week.

Free Webinar: Pandas and Matplotlib

It’s time for another free hour-long Webinar! This time, I’ll be talking about the increasingly popular tools for data science in Python, namely Pandas and Matplotlib. How can you read data into Pandas, manipulate it, and then plot it? I’ll show you a large number of examples and use cases, and we’ll also have lots of time for Q&A. Previous Webinars have been lots of fun, and I expect that this one will be, too!

Register (for free) to participate here:

If you aren’t sure whether you’ll be able to make it, you can still sign up; I’ll send information, along with a URL for the recording, soon after the Webinar concludes.

I look forward to seeing you there; if you have any questions, please feel free to contact me at or on Twitter as @reuvenmlerner.

Reminder: Free Webinar on data science in Python

There’s still time to register for my free, one-hour Webinar on data science in Python, which will be tomorrow (Tuesday).  There’s clearly too much material for me to give just one Webinar, so this will be the first in a series that I’ll be offering over the coming months.  But if you’re interested in hearing how Python fits into the world of data science, or how you can use free, open-source tools to do lots of great analysis work, then I invite you to join me for what should be a fun time:

There will be plenty of live-coding demos, bad jokes, and chances for you to ask questions. And it should be lots of fun, besides!

Free Webinar: Data science with Python on December 8th

It’s time for me to do another free one-hour Webinar, this time about data science with Python. It’ll be on December 8th, at 9 p.m. GMT.

Data science is all the rage, and rightly so — and Python is one of the best-known and best-equipped languages in which to do it.  In this Webinar, I’ll review some of the most popular packages used for analysis, including NumPy, SciPy, Pandas, and matplotlib, and will show how they can be used to answer questions that we have about our data.

As always, I hope that there will be lots of questions — and if we’re lucky, I’ll be able to provide answers, too!  Please come prepared for a highly interactive and fun event.  I’ll e-mail all registered participants about an hour before the Webinar with links for participating.

You can register at Eventbrite:

I look forward to seeing you there!  If you have any questions, please let me know via e-mail at or on Twitter as @reuvenmlerner.

Python’s objects and classes — a visual guide

Python developers love to say that “everything is an object.” And indeed, when I teach Python classes, I say this several times, and many people nod in agreement, assuming that I’m merely repeating something they’ve heard before.  After all, people often say that everything in Java is an object (except for the things that aren’t), and that everything in .NET is an object.

But when we say that everything in Python is an object, we really mean everything, including — much to the surprise of my students — classes. This makes enormous sense, and it makes the entire object system easier to understand. And yet, it is still hard to put things in perspective.

In this blog post, I want to walk through some of the connections that we have among objects in Python, in the hopes that it’ll help to cement some of the ideas that stem from this “everything is an object” idea. It’ll also demonstrate some of the fun that happens when you’re creating an object hierarchy, and how things can get a bit weird.

Let’s start with a simple class (MyClass), and a simple instance of that class (m). In Python, we would write:

class MyClass(object):
    pass

m = MyClass()

In Python 3, we don’t need to explicitly say that MyClass inherits from object, since that’s true for all classes. But in Python 2, we have to inherit from object; if we don’t, then we get old-style classes, which we really don’t want.
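A quick way to check this for yourself, assuming Python 3 (the class names Implicit and Explicit are just for illustration):

```python
# Python 3: both spellings produce a class that inherits from object
class Implicit:
    pass

class Explicit(object):
    pass

print(Implicit.__bases__ == (object,))   # True
print(Explicit.__bases__ == (object,))   # True
```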

Let’s see how this looks visually, with the arrow indicating that m is an instance of MyClass:

Python objects

So far, that’s not very exciting. But let’s remember that everything in Python is an object. Thus, it’s true that m is an instance of MyClass; we can learn this by using the type function:

>>> type(m)

What happens if we ask MyClass about its type?

>>> type(MyClass)

Yes, MyClass is an instance of type — just as str, int, bool, and other Python classes are instances of type. Our diagram has just gotten a bit more complex:

Python objects 2

In the above diagram, we see that m is an instance of MyClass, and MyClass is an instance of type.
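The relationships described so far can be checked directly. This is a minimal sketch, assuming Python 3; it uses identity checks rather than relying on how the interpreter prints a class:

```python
class MyClass(object):
    pass

m = MyClass()

print(type(m) is MyClass)          # True: m is an instance of MyClass
print(type(MyClass) is type)       # True: MyClass is an instance of type
print(isinstance(MyClass, type))   # True

# Built-in classes are instances of type, too
print(type(str) is type, type(int) is type, type(bool) is type)   # True True True
```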

One main difference between regular objects and classes is that classes have a __bases__ attribute, a tuple, which indicates from which other class(es) this class inherits. MyClass, like all classes, should really have two pointers in our diagram — one representing its type, and another representing from which class (object) it inherits:

Python objects 3

Many of the people to whom I teach Python are confused by the distinction between type and object, and what roles they play in the object’s life. Consider this:

  • Because MyClass is an instance of type, type.__init__ determines what happens to our class when it is created.
  • Because MyClass inherits from object, invoking a method on m will result in first looking for that method on MyClass. If the method doesn’t exist on MyClass, then Python will look on object.
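To make these two roles a bit more concrete, here is a sketch that assumes Python 3 syntax. RecordingMeta is a hypothetical metaclass (a subclass of type) that I'm introducing only so that we can observe the initialization step; an ordinary class simply gets type.__init__ itself.

```python
created = []

class RecordingMeta(type):
    # Hypothetical metaclass: its __init__ runs when a class using it is created
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        created.append(name)

class MyClass(object, metaclass=RecordingMeta):
    def greet(self):
        return "hello from MyClass"

m = MyClass()

print(created)                                # ['MyClass']: ran at class creation
print(MyClass.__mro__ == (MyClass, object))   # True: lookup order is MyClass, then object
print(m.greet())                              # hello from MyClass (found on MyClass)
print(MyClass.__repr__ is object.__repr__)    # True: not on MyClass, so taken from object
```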

All of this is well and good, but let’s take it a bit further: We know that MyClass is an instance of type. But this means that type itself is a class, right? What is the type of this type class?

>>> type(type)

Yes, in one of my favorite parts of Python, the type of type is type. In other words, type is an instance of itself. Pretty cool, eh? Let’s see how that fits into our diagram:

Python objects 4

If type is a class, then we know it must have two pointers in our diagram: one pointing to its class (type, aka itself), and one pointing to the class from which it inherits. What does type inherit from?

>>> type.__bases__

Let’s thus update our diagram, to show that type inherits from object. This makes sense, since if we invoke str(MyClass), we can rely on the inherited  implementation of object.__str__, without having to create a separate type.__str__.  And indeed, it would seem that this is what happens:

>>> type.__str__ is object.__str__

Let’s now update our diagram to indicate that type inherits from object:
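The checks from this part of the walk-through can be gathered into one short sketch. It assumes CPython 3, and compares objects rather than printed output, since the printed form of a class differs between Python 2 and 3:

```python
class MyClass(object):
    pass

print(type(type) is type)               # True: type is an instance of itself
print(type.__bases__ == (object,))      # True: type inherits from object
print(type.__str__ is object.__str__)   # True: type has no __str__ of its own
print(str(MyClass) == repr(MyClass))    # True: the inherited __str__ falls back on repr
```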

Python objects 5

Finally, let’s not neglect our object class. As an object, it too must have a type. And as a class, we know that its type is type. Let’s add that to our diagram:

Python objects 6

Remember that object is at the top of our inheritance hierarchy. This is represented in Python by an empty tuple:

>>> object.__bases__

We can represent this in our diagram in the following way:
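Here is a short sketch (assuming Python 3) of the somewhat circular relationship between object and type that the diagram now shows:

```python
print(type(object) is type)       # True: object is an instance of type
print(object.__bases__ == ())     # True: nothing sits above object
print(issubclass(type, object))   # True: type inherits from object
print(isinstance(type, object))   # True: type is also an object
print(isinstance(object, type))   # True: object is also a class
```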

Python objects 7

Finally, let’s see what happens when we add a new class to this hierarchy, subclassing from MyClass. MySubClass inherits from MyClass, but is still an instance of type:

Python objects 8
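In code, the final diagram corresponds to something like the following sketch (assuming Python 3; the empty class bodies are just placeholders):

```python
class MyClass(object):
    pass

class MySubClass(MyClass):
    pass

print(MySubClass.__bases__ == (MyClass,))                    # True: inherits from MyClass
print(type(MySubClass) is type)                              # True: still an instance of type
print(MySubClass.__mro__ == (MySubClass, MyClass, object))   # True: method lookup order
```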

If you’re an experienced Python developer, then the above may well be second nature for you. But if you’re new to the language, and particularly to the ways in which the various objects and classes interact, then I hope this has provided you with some additional clarity. Please let me know if there are additional aspects that you find confusing, and I’ll try to clarify them in future blog posts.

If you liked this explanation, then you’ll likely also enjoy my ebook, “Practice Makes Python,” with 50 exercises meant to improve your Python fluency.

Registration is open for my October Webinars (about regexps and technical training)

September has been busy with work and holidays, but I’m gearing up for an exciting and busy October. Among other things, I’m giving two (free) Webinars in that month, and you can already register for them:

  • Intermediate Regular expressions: In my previous Webinar about regular expressions, I covered the basics.  In this one, we’ll go further, spending a great deal of time talking about groups, backreferences, and some other topics that tend to confuse people.  I’ll be using Python to demonstrate regexps, but this Webinar isn’t only aimed at Python developers. Registration is free; sign up here.
  • Technical training: I’m starting to spend more and more time helping people to become technical trainers, teaching programming in high-tech companies. (Indeed, I’m in the process of starting my coaching program for people interested in improving their training skills.) In this Webinar, which I expect will be the first of many, I’ll give an overview of the technical-training landscape, how it works, and why you should seriously consider providing training services. I’ll briefly review pedagogical, logistical, and business considerations for the aspiring technical trainer. Registration for this training is free; you can sign up here.

Understanding nested list comprehensions in Python

In my last blog post, I discussed list comprehensions, and how to think about them. Several people suggested (via e-mail, and in comments on the blog) that I should write a follow-up posting about nested list comprehensions.

I must admit that nested list comprehensions are something that I’ve shied away from for years. Every time I’ve tried to understand them, let alone teach them, I’ve found myself stumbling for words, without being clear about what was happening, what the syntax is, or where I would want to use them. I managed to use them on a few occasions, but only after a great deal of trial and error, and without really understanding what I was doing.

Fortunately, the requests that I received, asking how to work with such nested list comprehensions, forced me to get over my worries. I’ve figured out what’s going on, and even think that I understand what my problem was with understanding them before.


The key thing to remember is that in a list comprehension, we’re dealing with an iterable. So when I say:

[ len(line) 
for line in open('/etc/passwd') ]

I’m saying that I want to iterate over the file object we got from opening /etc/passwd. There will be one element in the output list for each element in the input iterable — aka, every line in the file.
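To see the one-output-per-input relationship without depending on a real /etc/passwd, here is a sketch that uses io.StringIO as a stand-in for the file. The two lines of text are made up for illustration.

```python
import io

# Made-up stand-in for the contents of /etc/passwd
text = "root:x:0:0\nnobody:x:99:99\n"

# The comprehension: one output element per line of the file-like object
lengths = [len(line)
           for line in io.StringIO(text)]

# The same thing written as an explicit loop
loop_lengths = []
for line in io.StringIO(text):
    loop_lengths.append(len(line))

print(lengths)                   # [11, 15] (each length includes the newline)
print(lengths == loop_lengths)   # True
```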

That’s great if I want my list comprehension to return something based on each line of /etc/passwd. But each line of /etc/passwd is a string, and thus also iterable. Maybe I want to return something not based on the lines of the file, but on the characters of each line.

Were I to use a “for” loop to process the file, I would use a nested loop, i.e., one loop inside of the other, with the outer loop iterating over lines and the inner loop iterating over characters. It turns out that we can use a nested list comprehension, too. Here’s a simple example of a nested list comprehension:

[(x,y) for x in range(5) for y in range(5)]

If your reaction to this is, “What in the blazes does that mean?!?” then you’re not alone. Until just recently, that’s what I thought, too.

However: If we rewrite the above nested list comprehension using my preferred (i.e., multi-line) list-comprehension style, I think that things become a bit clearer:

[(x,y)
 for x in range(5)
 for y in range(5)]

Let’s take this apart:

  • Our output expression is the tuple (x,y). That is, this list comprehension will produce a list of two-element tuples.
  • We first run over the source range(5), giving x the values 0 through 4.
  • For each value of x, we run through the source range(5), giving y the values 0 through 4.
  • The number of values in the output depends on the number of runs of  the final (second) “for” line.
  • The output, not surprisingly, will be all of the two-element tuples from (0,0) to (4,4).
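The points above can be checked by writing the same thing as ordinary nested “for” loops. This is just a sketch of the equivalence; the variable name pairs is my own.

```python
pairs = []
for x in range(5):              # the first "for" line is the outer loop
    for y in range(5):          # the second "for" line is the inner loop
        pairs.append((x, y))    # the output expression runs once per inner iteration

print(pairs == [(x, y) for x in range(5) for y in range(5)])   # True
print(len(pairs))               # 25
print(pairs[0], pairs[-1])      # (0, 0) (4, 4)
```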

Now, let’s mix things up by changing them a bit:

[(x,y)
  for x in range(5)
  for y in range(x+1)]

Notice that now, the maximum value of y will vary according to the value of x. So we’ll get from (0,0) to (4,4), but we won’t see such things as (2,4) because y will never be larger than x.

Again, it’s important to understand several things here:

  • Our “for y” loop will execute once for each iteration over x.
  • In our “for y” loop, we have access to the variable x.
  • In our “for x” loop, we don’t have access to y (unless you consider the last value of y to be useful, but you really shouldn’t).
  • Our (x,y) tuple is output once for each iteration of the *final* loop, at the bottom.
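Here is a short sketch that runs this second comprehension and confirms what we expect about its output. The name triangle is simply my label for the result.

```python
triangle = [(x, y)
            for x in range(5)
            for y in range(x + 1)]   # the inner loop can see x

print(len(triangle))                       # 15, i.e. 1 + 2 + 3 + 4 + 5
print((2, 4) in triangle)                  # False: y is never larger than x
print((4, 2) in triangle)                  # True
print(all(y <= x for x, y in triangle))    # True
```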

Here’s another example: Assume that we have a few friends over, and that we have decided to play several games of Scrabble. Being Python programmers, we have stored our scores in a dictionary:

scores = {'Reuven':[300, 250, 350, 400],
          'Atara':[200, 300, 450, 150],
          'Shikma':[250, 380, 420, 120],
          'Amotz':[100, 120, 150, 180] }

I want to know each player’s average score, so I write a little function:

def average(scores):
    # Floor division gives the integer averages shown below on both Python 2 and 3
    return sum(scores) // len(scores)

If we want to find out each individual’s average score, we can use our function and a standard comprehension — in this case, a dict comprehension, to preserve the names:

 >>> { name : average(score)  
       for name, score in scores.items() }

{'Amotz': 137, 'Atara': 275, 'Reuven': 325, 'Shikma': 292}

But what if I want to get the average score, across all of the players? In such a case, I will need to grab each of the scores from inside of the inner lists. To do that, I can use a nested list comprehension:

>>> average([ one_score  
              for one_player_scores in scores.values()  
              for one_score in one_player_scores ])
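Here is a self-contained sketch of that calculation. It uses floor division (//) so that the result is an integer on both Python 2 and Python 3; with plain / on Python 3, the same data would give 257.5.

```python
scores = {'Reuven': [300, 250, 350, 400],
          'Atara': [200, 300, 450, 150],
          'Shikma': [250, 380, 420, 120],
          'Amotz': [100, 120, 150, 180]}

def average(scores):
    # Floor division: an integer result on both Python 2 and Python 3
    return sum(scores) // len(scores)

# Flatten the dict's inner lists into one list of 16 scores
all_scores = [one_score
              for one_player_scores in scores.values()
              for one_score in one_player_scores]

print(len(all_scores))       # 16
print(sum(all_scores))       # 4120
print(average(all_scores))   # 257
```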


What if I’m only interested (for whatever reason) in including scores that were above 200? As with all list comprehensions, I can use the “if” clause to weed out values that I don’t want. That condition can use any and all of the values that I have picked out of the various “for” lines:

>>> [ one_score      
      for one_player_scores in scores.values()     
      for one_score in one_player_scores
      if one_score > 200]

[300, 250, 350, 400, 300, 450, 250, 380, 420]

If I want to put these above-200 scores into a CSV file of some sort, I could do the following:

>>> ','.join([ str(one_score)  
               for one_player_scores in scores.values() 
               for one_score in one_player_scores  
               if one_score > 200])
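As a runnable sketch of this one-line CSV: the order of the values follows the order in which the dict yields its players, which matches insertion order only on Python 3.7 and later, so the exact output below is an assumption about the interpreter in use.

```python
scores = {'Reuven': [300, 250, 350, 400],
          'Atara': [200, 300, 450, 150],
          'Shikma': [250, 380, 420, 120],
          'Amotz': [100, 120, 150, 180]}

# str.join needs strings, so each score is converted in the output expression
csv_line = ','.join([str(one_score)
                     for one_player_scores in scores.values()
                     for one_score in one_player_scores
                     if one_score > 200])

print(csv_line)   # 300,250,350,400,300,450,250,380,420
```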


Here’s one final example that I hope will drive these points home: Let’s assume that I have information about a hotel. The hotel has stored its information in a Python list. The list contains lists (representing rooms), and each sublist contains one or more dictionaries (representing people). Here’s our data structure:

rooms = [[{'age': 14, 'hobby': 'horses', 'name': 'A'},  
          {'age': 12, 'hobby': 'piano', 'name': 'B'},  
          {'age': 9, 'hobby': 'chess', 'name': 'C'}],  
         [{'age': 15, 'hobby': 'programming', 'name': 'D'}, 
          {'age': 17, 'hobby': 'driving', 'name': 'E'}],  
         [{'age': 45, 'hobby': 'writing', 'name': 'F'},  
          {'age': 43, 'hobby': 'chess', 'name': 'G'}]]

What are the names of the people staying at our hotel?

 >>> [ person['name']      
       for room in rooms
       for person in room ]

['A', 'B', 'C', 'D', 'E', 'F', 'G']

How about the names of people staying in our hotel who enjoy chess?

>>> [ person['name']  
      for room in rooms  
      for person in room  
      if person['hobby'] == 'chess' ]

['C', 'G']

Basically, every “for” line flattens the items over which you’re iterating by one more level, and gives you access to that level in both the output expression (i.e., first line) and in the condition (i.e., optional final line).
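To make that flattening concrete, here is the chess query written as explicit nested loops, next to the comprehension. It is a sketch using the same hotel data; the names chess_players and from_comprehension are my own.

```python
rooms = [[{'age': 14, 'hobby': 'horses', 'name': 'A'},
          {'age': 12, 'hobby': 'piano', 'name': 'B'},
          {'age': 9, 'hobby': 'chess', 'name': 'C'}],
         [{'age': 15, 'hobby': 'programming', 'name': 'D'},
          {'age': 17, 'hobby': 'driving', 'name': 'E'}],
         [{'age': 45, 'hobby': 'writing', 'name': 'F'},
          {'age': 43, 'hobby': 'chess', 'name': 'G'}]]

chess_players = []
for room in rooms:                          # first "for" line: one level down
    for person in room:                     # second "for" line: one more level
        if person['hobby'] == 'chess':      # the optional "if" line
            chess_players.append(person['name'])   # the output expression

from_comprehension = [person['name']
                      for room in rooms
                      for person in room
                      if person['hobby'] == 'chess']

print(chess_players)                         # ['C', 'G']
print(chess_players == from_comprehension)   # True
```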

I hope that this helps you to understand nested list comprehensions. If it did, please let me know! (And if it didn’t, please let me know that, as well!)
