Thinking with “map”

In the free Webinar I gave yesterday about functional programming, I mentioned that “map,” or its equivalent (e.g., Python’s list comprehensions), is a powerful tool that I use nearly every day. Once you get into the functional mode of thinking, you’re constantly finding ways to turn one collection into another collection. It’s a mindset that takes time and practice, but allows you to solve many problems quickly and easily. The trick is to see beyond the initial problem, and to understand how you can think in terms of a source collection and a target collection.

For example, I was teaching an introductory Python course just today, and someone came to me and asked how they can turn a URL query string (e.g., x=1&y=2&z=abc) into a dictionary. Now, this isn’t a super-hard problem, but the reaction on his face to the way in which I solved it demonstrated how he would have used a completely different approach, and that functional thinking didn’t even cross his mind.

The first thing to notice is that in a query string, you’ve got name-value pairs separated by & signs. So the first task is to take the query string, and turn it into a list:

>>> query_string.split('&')
['x=1', 'y=2', 'z=abc']

Now that we have these items in a list, we can transform each of them. But wait — transform them?  Yes, and that’s where the “map” mindset comes in. You want to be moving your data into a list, which allows you to transform each element into another one. In this case, I want to transform each of the elements of the list into a dictionary pair.

Fortunately, we see that each name-value pair has a “=” sign between the name and the value. We can use that to our advantage, splitting each of the pairs:

>>> [item.split('=') for item in query_string.split('&')]
[['x', '1'], ['y', '2'], ['z', 'abc']]

In other words, we have now created a list of lists, in which the first element of each sub-list is our intended dictionary key, and the second element is our intended dictionary value.

Well, we can use dict() to construct dictionaries in Python. And whadaya know, it works just fine with a sequence of two-element sequences. We normally think of feeding dict() a list of tuples, but it turns out that a list of lists works just fine, as well:

>>> dict([item.split('=') for item in query_string.split('&')])
{'x': '1', 'y': '2', 'z': 'abc'}

And just like that, we’ve created our dictionary.

Of course, we could also use a dictionary comprehension:

>>> { item.split('=')[0] : item.split('=')[1] 
    for item in query_string.split('&') }
{'a': '1', 'b': '2', 'c': 'xyz'}

Now, none of the steps here was particularly difficult. Indeed, while the syntax of comprehensions can be a bit complex, the real difficulty here was seeing the original string, and immediately thinking, “Wait, if I can just turn that into a list, then I can easily create a dictionary from that.”

These sorts of transformations are everywhere, and they allow us to take seemingly difficult tasks and turn them into relatively simple ones.

The relative speeds of str.format and %

My most recent blog post talked about the use of str.format instead of the % operator for interpolating values into strings. Some people who read the post wondered about their relative speeds.

I should first note that my first response to this is: I don’t really care that much. I’m not saying that speed isn’t important, or that optimization should never be done. Rather, my philosophy is that people are expensive and computers are cheap — and thus, anything we do to make people more productive, even if that comes at the expense of program speed, is probably fine.

Of course, that’s not always going to be true. Sometimes, you need (or just want) to squeeze more out of your computer. And to be a good programmer, you also need to know the relative advantages and disadvantages of the techniques you’re using.

So I decided to run a few, quick benchmarks on the relative speeds of str.format and %.  Sure enough, the % operator was a lot faster.  I ran my benchmarks the magic %timeit command that is built into the IPython interactive shell.  (If you’re using Python and aren’t using IPython, you should really switch ASAP.)  Note that in order to make things easier to read, I’m removing the IPython input and output prompts, and using >>> to denote where I entered text.

>>> name = 'Reuven'
>>> %timeit 'Hello there, {}'.format(name)
1000000 loops, best of 3: 243 ns per loop

>>> %timeit 'Hello there, %s' % name
10000000 loops, best of 3: 147 ns per loop

Wow.  As you can see, %timeit executed each of these lines of code 1,000,000 times. It then gave the average speed per loop. The % operator was, on average, about 100 ns faster than str.format. That shouldn’t come as a huge surprise, given that % is an operator (and thus doesn’t require a method call), doesn’t handle indexes and attributes, and can (I’m guessing) pass a great deal of its work off to C’s printf function.

Then again, is 100 ns really that long to wait for a formatted string?  I’m not so sure, to be honest.

What happens if we perform more than one interpolation?

>>> first = 'Reuven'
>>> last = 'Lerner'
>>> %timeit 'Hello there, {} {}'.format(first, last)
1000000 loops, best of 3: 371 ns per loop

>>> %timeit 'Hello there, %s %s' % (first, last)
1000000 loops, best of 3: 243 ns per loop

Each of these takes significantly longer to run than was the case with a single replacement. The difference between them continues to be about 120 ns per assignment — still not something to worry about too much, but the difference does exist.

What if I make the strings space-padded?

>>> %timeit 'Hello there, {:10} {:15}' % (first, last)
1000000 loops, best of 3: 459 ns per loop

>>> %timeit 'Hello there, %10s %15s' % (first, last)
1000000 loops, best of 3: 254 ns per loop

Now we see an even starker difference between the two ways of handling things. What about something like floating-point math, which takes longer?

>>> import math
>>> %timeit 'I love to eat {}'.format(math.pi)
1000000 loops, best of 3: 587 ns per loop

>>> %timeit 'I love to eat %f' % math.pi
1000000 loops, best of 3: 354 ns per loop

Limiting the number of decimals shown doesn’t seem to change the outcome very much:

>>> %timeit 'I love to eat {:.3}'.format(math.pi)
1000000 loops, best of 3: 582 ns per loop

>>>%timeit 'I love to eat %.3f' % math.pi
1000000 loops, best of 3: 329 ns per loop

UPDATE: Several people on Reddit took me to task for failing to consider the overhead of the str.format method call.  I mentioned this briefly above, but should have realized that there was an easy to to avoid this, namely aliasing the attributes (the method str.format and the float math.pi) to local variables:

>>> f = 'I love to eat {:.3}'.format
>>> p = math.pi
>>> %timeit f(p)
1000000 loops, best of 3: 489 ns per loop

>>> %timeit 'I love to eat %f' % p
1000000 loops, best of 3: 370 ns per loop

We still see significant overhead. Again, I’m guessing that a lot of this has to do with the overhead of a method vs. an operator. I’m not about to start looking at the bytecodes; this wasn’t meant to be a super-deep investigation or benchmark, but rather a quick check and comparison, and I think that on that front, it did the trick.

So, what have we learned?

  • Yes, str.format is slower than %.
  • The number of parameters you pass to str.format, and whether you then adjust the output with padding or a specified number of decimals, significantly influences the output speed.
  • That said, in many programs, the difference in execution speed is often 100 ns, which is not enough to cause trouble in many systems.

If speed is really important to you, then you should probably use %, and not str.format. However, if speed is more important than the maintainability or readability of your code, then I’d argue that Python is probably a poor choice of programming language.

Teaching an old dog new tricks — or, how I learned to love Python’s str.format, and gave up on %

I have been programming in Python for many years. One of the things that I wondered, soon after starting to work in Python, was how you can get Perl-style variable interpolation. After all, Perl (like the Unix shell) supports two types of quotes — single quotes (in which everything is taken literally) and double quotes (in which variables’ values are inserted). Thus, in Perl, you can do something like:

$name = 'Reuven';
print "Hello, $name\n";

And sure enough, it’ll print “Hello, Reuven”.

Because single and double quotes are equivalent in Python (so long as they come as a matched set), there is no variable interpolation. The technique that I learned years ago, when I started with Python, was that you could use the % operator on a string. In this context, % looks to the string on its left, determines how many values within the string need to be replaced, and then looks right to find those values. It then returns a new string, effectively interpolating the values. For example:

>>> name = 'Reuven'
>>> "Hello, %s" % name

'Hello, Reuven'

The above Python code works just fine, returning a string with a nice, personalized greeting. And indeed, for the length of my time working with Python, I have enjoyed using this % syntax. Yes, it’s a bit weird. And no, I cannot ever remember more than the absolute basics of printf’s various % codes, meaning that I either make everything a string (with %s), or I guess, or I look up the printf codes and what they do. But to be honest, I normally just use %s, and thus benefit additionally from the fact that Python will silently invoke “str” on the parameter.

The thing is, % is supposedly going away, or is at least deprecated. (A note on the python-dev list indicates that % will go away no sooner than 2022, which is a heckuva long time from now.) As of Python 2.6, not to mention Python 3.x, we have been told that it will eventually disappear, and that we shouldn’t use % any more. Instead, we should use the str.format method.

I have always mentioned str.format to my Python classes, but to be honest, I’ve usually relied upon % when giving live demonstrations and answering questions. And I would even encourage my students to use the % syntax, in part because I found it to be so much easier.

And yet.  I knew that I was doing something wrong, and I knew that I was probably misleading my students to some degree. Thus, in the last three classes I taught, I started to push harder in the direction of str.format. And that’s when I realized two things: (1) It’s just as easy as %, and actually easier in some ways, and (2) I hadn’t learned enough about str.format to use it, beyond the simplest ways. I thus spent a great deal of time researching it — and found out that str.format, while it takes some getting used to, is more than worth the effort.

Let’s start with the simplest case. I’d like to be able to say “Good morning” to someone, using both their first and last names. Assuming that I have variables named “first” and “last”, I can do this with the old syntax as follows:

>>> first = 'Reuven'
>>> last = 'Lerner'
>>> "Good morning, %s %s" % (first, last)

'Good morning, Reuven Lerner'

In this example, we already see one of the problems with the % syntax — namely, that if we have more than one value, we need to put it into a tuple. Perhaps this is logical and reasonable from Python’s perspective, but I can assure you that it surprises many of my students.

So, how would we do it using str.format? Pretty similarly, in many ways:

>>> "Good morning, {} {}".format(first, last)

'Good morning, Reuven Lerner'

Notice that we’ve changed things a bit here. No longer are we invoking a binary operator (%) on the string. Rather, we’re invoking a string method that takes a set of parameters. This is more logical and consistent. I can’t tell you how many of my students think that % is somehow connected to “print”, when in fact it’s connected (in the case of string formatting) to strings. Having to use put the “.format” at the end of the string makes the method call more obvious.

As you might already know, the “{} {}” in the string tells str.format to take its two parameters, and to insert them, in order, into the string. Because there are two arguments, we can only have two {} inside of the string. This is a bit harder to understand, both because having {} in Python reminds many people of a dictionary, and because the empty curly braces look a bit weird. But fine, I can live with that, and got used to it very quickly.

Where str.format quickly shows its advantages over %, however, is if I want to display the input parameters in reverse order. When I use %, there is no real way to do that. Plus, if I want to reuse a value passed to %, I cannot do so. With str.format, I can swap the order in which the inputs are displayed:

>>> "Good morning, {1} {0}".format(first, last)

'Good morning, Lerner Reuven'

Notice what happened in the above: If I just use “{} {}”, then str.format uses the two parameters in order. However, I’m also able to treat the parameters as a sequence, with indexes starting at 0. I can then insert them in reverse order, as I did above, or in the regular order:

>>> "Good morning, {0} {1}".format(first, last)

'Good morning, Reuven Lerner'

Note that if you explicitly state the field numbers, then you cannot rely on the automatic numbering.

Of course, this lets me also pass a sequence of values to be inserted, so long as we then use the splat (*) operator on it, to turn it into a parameter list:

>>> names = ('Reuven', 'Lerner')
>>> "Good morning, {} {}".format(*names)

'Good morning, Reuven Lerner'

You can also call str.format with keyword arguments. When you do this, you can then put a keyword name within the {}:

>>> "Good morning, {first} {last}".format(first='Reuven', last='Lerner')

'Good morning, Reuven Lerner'

The above really appeals to me. The named parameters are explicit (if a bit long), and the use of {first} and {last} is quite readable — certainly more so than %(first)s ever was with the % operator!

I can, of course, also pass a dictionary of names, using the ** operator to turn it into a set of keyword arguments:

>>> person = {'first':'Reuven', 'last':'Lerner'}
>>> "Good morning, {first} {last}".format(**person)

'Good morning, Reuven Lerner'

I described all of these to my students in the last month, and I was pleasantly surprised to see how comfortable they were with the syntax. I’m sure that this reflects, to some degree, my comfort with the syntax, as well.

I should note that you can combine numeric and keyword arguments when working with str.format. I really suggest that you not do so. The results would look like this:

>>> person = {'first':'Reuven', 'last':'Lerner'}
>>> "Good {0}, {first} {last}".format('morning', **person)

'Good morning, Reuven Lerner'

Yukko.

Now, the one thing that would appear to be missing from str.format is… well, formatting! The bad news is that str.format has a completely and different way of indicating how you want to format output. The good news is that it’s not too hard to learn and understand.

Let’s start with the easiest part: If you want to display a string within a fixed-width field, then you can do so by adding a colon (:) and then a number.  So to put my name in a fixed-width field of 10 spaces, we would say:

>>> "Your name is {name:10}".format(name="Reuven")

'Your name is Reuven    '

(Notice the trailing spaces after my name.)

In the above example, my name is left-justified. If I want it to be right-justified, I could use a > sign between the : and the number:

>>> "Your name is {name:>10}".format(name="Reuven")

'Your name is     Reuven'

And yes, I could have used an optional < symbol to say that my name should be left-justified within the field of 10 spaces in the first example.  Or I could center the text in a field of 10 spaces with the ^ specifier instead of < or >.

To pad the string with something other than a space, we specify it before the <, >, or ^ character. For example, if I’m moving to Hollywood, then perhaps I should do something like this:

>>> "Your name is {name:*^10}".format(name="Reuven")

'Your name is **Reuven**'

If I want to put the string in the (default) left-most position of the string, filling with characters on the right, then I must use the < specifier, so that the text will be on the left, and the stars on the right.

So it’s pretty clear that str.format is pretty snazzy when it comes to text. How about numbers? I wasn’t really sure how things would work here, but it turns out that they’re also quite straightforward. If you’re displaying integers, then you can go ahead and say:

>>> "The price is ${number}.".format(number=123)

'The price is $123.'

So far, we don’t see any difference between passing an integer and a string. And indeed, they share many characteristics. However, we might want to display an integer in a different way. We can do that using one of the (many) modifiers that str.format provides — letters placed just before the end of the closing } character. For example, we can get the price in binary (with a trailing “b”), or in hexadecimal (with a trailing “x”), as in the following example:

>>> "The price is ${number:x}.".format(number=123)

'The price is $7b.'

Of course, we can also zero-pad the number, such that it will always take up a fixed width. Just place a 0 between the colon and the width:

>>> "Your call is important to us. You are call #{number:05}.".format(number=123)

'Your call is important to us. You are call #00123.'

Notice that inside of the {}, we cannot put executable Python code. Instead, there is a mini-language that is separate and different from Python. However, there are two small exceptions to this rule: (1) We can retrieve any attribute with the standard . notation, and (2) we can retrieve a single item with the [] notation.

For example:

>>> class Foo(object):
        def __init__(self):
        self.x = 100
>>> f = Foo()
>>> 'Your number is {o.x}'.format(o=f)

'Your number is 100'n

Notice how we were able to retrieve the “x” attribute from the “f” object, which we mapped to “o” within the string. However, while you can retrieve an attribute, you cannot execute it. Thus, the following will not work:

>>> "Your name is {name.upper()}".format(name="Reuven")

AttributeError: 'str' object has no attribute 'upper()'

See what happened? I said “name.upper()”, in order to execute the method “str.upper” on “name”.  However, Python doesn’t want me to execute code there. So it takes the name of the attribute literally — and thus complained that there is no attribute “upper()”, with the parentheses. Of course, if you try it without the parentheses, it’ll work, for some value of “work”:

>>> "Your name is {name.upper}".format(name="Reuven")

'Your name is <built-in method upper of str object at 0x1028bf2a0>'

Similarly, we can retrieve an individual element of a sequence or mapping with []. However, we cannot use the slice notation for more than one element. For example:

>>> "Your favorite number is {n[3]}.".format(n=numbers)

'Your favorite number is 3.'

However:

>>> "Your favorite numbers are {n[2:4]}.".format(n=numbers)

ValueError: Missing ']' in format string

The “:” character, which we use for slices, isn’t available in format strings, because it’s used to control the formatting of the output.

You can, of course, use [] on a dictionary, as well. However — and this is a bit weird for Python — we omit the quote marks, even when our key is a string. For example:

>>> person = {'first':'Reuven', 'last':'Lerner'}
>>> "Your name is {p[first]}.".format(p=person)

'Your name is Reuven.'

If we were to include the quotes…

>>> "Your name is {p['first']}.".format(p=person)

KeyError: "'first'"

There is actually a lot more to str.format than what I have shared here. In particular,  each type has its own format specifications, which means that you can do certain things with floats (e.g., setting the precision) that you cannot do with strings.

You can even add formatting functionality to your own Python classes, such that they’ll be displayed in the way that you want, along with format specifiers that you define.

If you want to learn more about this, I’d definitely suggest reading PEP 3101, which describes str.format. I’d also suggest a slide show by Eric Smith, which summarizes things nicely. Finally, the Python documentation has some excellent examples, including a guide for moving from % to str.format.

I hope that this was helpful and useful! If you enjoyed this blog post, check out my many other resources, including my free e-mail course on Python scoping, and my free Webinar on functional programming in Python.

Three Pythonic products: A (free) Webinar, a course, and an ebook

I love developing software.  I also love helping people to learn how to develop better. That’s why I have been teaching programming classes for more than a decade, and why I write about programming. There is so much to learn; it’s a rare day on which I don’t learn something new, and it’s a rare week in which I don’t apply some new understanding to a problem that I’m solving for a client.

Now that I have finished my PhD, I have some more time to focus on creating products that I believe will help people to program better. I’m pleased to announce three initial such products, all aimed at Python developers who want to improve their skills:

  1. A free Webinar (“How functional programming will make you a better Python programmer“), scheduled for Monday, September 15th, at 2 p.m. Eastern Time. The Webinar will consist of a 45-minute presentation, followed by a Q&A period. My goal is to describe functional programming, and then demonstrate some of the functional techniques that are available in Python. If you are a Python programmer who has heard about functional programming but doesn’t really know what it is, then I think you’ll get a lot out of this free seminar. Please register and attend! Even if you’re not interested in my other products, I’d love to see you at this Webinar — if only to learn what topics people do want to learn, and what I should address in future Webinars, books, and online classes.
  2. For those who want to dig deeper into functional programming in Python, I’ll be giving a full-day, live, online course on the subject. This course, which will include lectures, demos, and many exercises, begins with an introduction to functions in Python, then describes how to pass functions as parameters, list/set/dict comprehensions, customizing list.sort and sorted, lambda, map/filter/zip, reduce, and the “functools” module. This class, like all of my classes, will be intense and highly interactive — but will give you a new perspective on programming in general, and particularly in Python. You can register for this one-day course at EventBrite; it’ll be given on Sunday, September 21st and again on Tuesday, September 23rd.
  3. If you have taken a Python course, but feel that you need to practice Python more thoroughly before you feel like you’re truly fluent in the language, then you might be interested in my ebook, “Practice Makes Python.” The book will contain about 50 exercises in various aspects of Python, with the aim of helping you to learn through doing. I hope to release the ebook about a month from now, and am basing many of the exercises on my courses, as well as comments I’ve received from many of my students. If you’re interested in the book, you should sign up for the announcement list, where I’ll be providing sample content and behind-the-scenes information.

These products (well, the paid ones, anyway) come with a full, 100% money-back guarantee. And of course, if you have questions, you can always contact me via e-mail.

Summary of my “reduce” series

I teach Ruby and Python to a lot of people — in formal courses, and in one-on-one pairing sessions, both online and in person.  I’ve found that for many people, the whole notion of functional programming seems strange and difficult, as well as something of a waste of time.  After all, if you have objects, why would you need functional programming?

The answer, of course, is that no single paradigm has all of the answers; each has its strengths and weaknesses.  Understanding how to use them together can provide great benefits.  As a result, I spend time when teaching both Ruby and Python on the basics of functional programming, and then the various functions and methods that each language provides in this area.

Of these, the function that most often causes people to wrinkle their noses and/or get confused is “reduce”.  And to be honest, I often tell students in my classes that “reduce” is one of those functions that is incredibly powerful and clever, but for which you sometimes need to wait in order to find a use case.  I decided to explore some use cases, and ways in which “reduce” could be used — and I hope that these have been useful.

To summarize, here are the posts that I wrote on this topic:

I hope that this series has been useful and interesting, and would appreciate hearing ideas for additional deep-dives into areas of Python, Ruby, or programming in general.  What subjects do you find confusing?  What methods do you think are somewhat useless?  Let me know, and I’ll try to address them in future blog posts!

 

Implementing “filter” with “reduce”, in Ruby and Python

We’re nearly at the end of my tour of the “reduce” function in Ruby and Python.  Just as I showed in the previous installment how we can implement the “map” function using “reduce”, I want to show how we can implement another functional-programming standard, “filter”, using “reduce” as well.  As before, I’ll show examples in both Ruby and Python.

Whereas “map” transforms each element of its input, but returns a value for each one, “filter” doesn’t change its inputs at all — but it does only selectively return them. The combination of “map” and “filter” is a classic and powerful one.  While Python still does include the “map” and “filter” functions, they are more usually expressed using list comprehensions.  The “map” part is the left-hand-side of the list comprehension, defining the transformation.  The “filter” part is the (optional) right-hand side of the comprehension, indicating when a value should be placed in the output.

Here is a use of “filter” to keep only odd numbers (taking advantage of the fact that in Python, 0 is a false value, and % is the modulus operator on integers):

>>> filter(lambda x: x%2, range(10))
[1, 3, 5, 7, 9]

It turns out that we can use “reduce” to implement our own version of “filter”, which does the same thing. Once again, we’ll define a function that takes two parameters, a sequence and a function:

def myfilter(sequence, f):
 return reduce(lambda total, current: total + ([current] if f(current) else []),
               sequence,
               [])

In other words: Our function invokes reduce, initializing it with an empty list ([]). We go over every element of “sequence”; for each element, we invoke the user-supplied function, f, passing “current” as a parameter.  We always return “total”, which is the list that we have created so far.  The question is whether we also want to return “current”, and that depends on whether f(current) returns True or False.  If it returns True, then we add “current” to the accumulated list.  If it returns False, then we add [] (i.e., the empty list) to the list. If we again want to filter a list of numbers, such that we get only the odd ones, we can now say:

>>> myfilter(range(10), lambda x: x%2)
[1, 3, 5, 7, 9]

What can be tricky for many newcomers to Python and functional programming is that the implementation of “myfilter” involves a lambda (i.e., anonymous function) in our invocation of “reduce”, but also a function (often defined as a lambda) that is passed to “myfilter” by the user.  I’ve found in my teaching that “lambda” tends to surprise and confuse many people, and two lambda expressions in the same place can really be a cognitive burden for newcomers.

Now that we’ve implemented a version of “filter” in Python, let’s turn to Ruby.  Here, as with “map”, we’re not going to pass a function (method) as a parameter. Rather, we’ll pass a block to our “myfilter” method, and then yield to the block whenever we want to invoke it.  For example:

def myfilter(enumerable)
 enumerable.reduce([]) {|total, current| yield(current) ? total << current : total }
end

(By the way, “filter” is traditionally called both “select” and “find_all” in Ruby circles.)  Notice that there are two blocks involved here: Enumerable#reduce invokes a block for each element of “enumerable” over which it iterates.  And then within that block, we yield to the block passed to “myfilter”, to check and see if the current element should be shared.

We can then invoke “myfilter” as follows:

e = (1..10).to_a
=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

myfilter(e) {|n| n.even?}
=> [2, 4, 6, 8, 10]

Sure enough, we’ve managed to create our own version of “filter”, just as we were able to do for “map”.

Next time: The exciting conclusion of our tour of the “reduce” function.  Same reduce-time, same reduce-channel!

Implementing “map” with “reduce”, in Ruby and Python

This is another installment in my “reduce” series of posts, aimed at helping programmers understand this function, with examples in both Ruby and Python.  So far, we have seen how we can build a number of different types of data structures — integers, strings, arrays, and hashes — using “reduce”.  But the really interesting use of “reduce,” in my mind, is as a way to implement “map” and “filter,” two classic functions from the functional-programming world.  In this post, I show how to implement “map” in both Ruby and Python.

First, let’s remember what “map” does: It takes a sequence, and applies a function to that sequence, one element at a time. The output of applying that function on each element produces a sequence of values, which we then get back from “map”.   In Ruby, we can thus say:

[1,2,3,4,5].map {|n| n*n}

which produces the array:

[1,4,9,16,25]

In Ruby, any block that you want to pass is fair game.  The block takes a single parameter, the current element of the enumerable that should be transformed. Whatever the block returns is put into the output array; because nearly everything in Ruby is an expression, just about anything is fair game.  Moreover, the output array will always consist of the same number of elements the input array.  So if you run Enumerable#map on an array of 15 items, you can be sure that you’ll get a new, anonymous array of 15 items.  The value of each of these items will depend on what the block emits.

If we want to implement “map” using “reduce,” we know that we need to invoke a block once per element of the enumerable.  The block should add one element to the end of an array that we have created.  The element that we add to the array should be the result of invoking a method on the current element.

For example, if we want to use “reduce” to implement the “map” example from above, in which we get the squares of the input numbers, we could do the following:

[1,2,3,4,5].reduce([]) {|total, current| total << current * current}

and sure enough, we get the following output:

[1, 4, 9, 16, 25]

If squaring numbers were the only thing we were interested in doing, we could write a “mymap” method that would do that:

def mymap(enumerable)
  enumerable.reduce([]) {|total, current| total << current * current}
end

mymap([1,2,3,4,5])

And sure enough, we get the array of squared numbers.  But as you can imagine, this implementation of map is somewhat lacking… we would really like to be able to pass a block, just as we can do with the built-in Enumerable#map method.  In Ruby, we can do this very easily by passing a block to our method, and then yielding to it (i.e., invoking it) once per iteration within the “reduce”.  For example:

def mymap(enumerable)
  enumerable.reduce([]) {|total, current| total << yield(current)}
end

mymap([1,2,3,4,5]) {|n| n*n}
=> [1, 4, 9, 16, 25]

mymap([1,2,3,4,5]) {|n| (n * 30).to_s}
=> ["30", "60", "90", "120", "150"]

Not bad!  Using “reduce”, we’ve managed to implement a version of “map”.  That’s pretty snazzy, if you ask me.

We can do almost exactly the same thing in Python.  Python doesn’t have Ruby-style blocks, but it does have functions — both named functions and anonymous ones created with “lambda” — that can be passed easily as parameters, and then invoked using parentheses. The Python version of “mymap” would thus look something like this:

def mymap(sequence, f):
  return reduce(lambda total, current: total + [f(current)], sequence, [])

In other words, our function takes two parameters, a sequence and a function. We then build the resulting list, one element at a time, based on the output from invoking our “f” function on each element in sequence.  It turns out that “map” is not that different from our use of “reduce” to build lists; the only difference, in fact, is that we’re letting the user pass us a function, which we then invoke on the current item.

It turns out, then, that implementing “map” isn’t so hard or mysterious!  In the next installment of this series, we’ll see how to implement the “filter” function (called “select” or “find_all” in Ruby), another staple of functional programming.

Creating Python dictionaries with “reduce”

In the last few installments (first, second, third, and fourth) of this series, we saw how “reduce” can be used to build up a scalar value (such as a number or string), or even a simple collection, such as a list (in Python) or an array (in Ruby).  The jewel in the data-structure crown for these high-level languages is known by many names: Dictionary, hash, hash table, hash map, and mapping.  I tend to use these quite a bit, and I know that I’m not alone; hashes are easy to use and work with, have O(1) lookup characteristics, guarantee the uniqueness of their keys, and make the code self-documenting.  What’s not to like?

There are times when you might want to build a dictionary step by step, and “reduce” can help you to do that.  I should note that recent versions of Python offer “dictionary comprehensions,” which are one of my favorite features of the language, and can be used similarly to “reduce” — and probably with less confusion among new programmers.  Nevertheless, I find it interesting and instructive to see how we can use “reduce” to create a data structure that we wouldn’t normally associate with this function.

I should note that some of the things you’re going to see here are not really recommended coding practices, particularly in Python.  But they will be fun, which is important, no?

Let’s start with Python: If I want to create a dictionary using “reduce”, I have a few different options.  Perhaps the first and easiest one is not exactly what you might have had in mind, namely passing the “dict” function (which creates a new instance of “dict” — i.e., a new dictionary) a list of two-element tuples.  For example, I can say:

dict([('a', 1), ('b', 2)])

and I get:

{'a': 1, 'b': 2}

So if I can tell “reduce” to emit a list of tuples, I can then pass that list of tuples to “dict”, and get a dictionary back.  Let’s try that:

reduce(lambda output, current: output + [(current, ord(current))], 'abc', [])

In the above case (and in all of the examples we’ll use here), I’m trying to build an oh-so-useful dictionary in which the keys are the letters a, b, and c, and the values are the ASCII codes for those letters.  In the above Python code, I iterate over the string ‘abc’, which is a three-element sequence of those three letters.  For each letter, I return the current value of “output” (which is guaranteed to be a list), plus a new, single-element list.  That single-element list is a tuple consisting of the current letter and its ASCII code.  So the output of the above “reduce” call will be:

[('a', 97), ('b', 98), ('c', 99)]

which, when we feed it into dict():

>>> dict([('a', 97), ('b', 98), ('c', 99)])
{'a': 97, 'b': 98, 'c': 99}

Voila!  We’ve created a dictionary.  (By the way, the “dict.items” method, which is often used to iterate over the keys and values of a dictionary, returns a list of tuples in precisely this format.)

So this is nice, but perhaps there’s a way to have “reduce” build up a dictionary all by itself?  The answer is “yes, but” — because Python tries hard to stop us from modifying data structures within a lambda.  But if we’re willing to be a bit weird, we can get around that.  How? By taking the dictionary that we received from the previous iteration (i.e., “output”), turning it into a list of tuples with dict.items, adding that list to our current pair, and then turning the whole thing back into a dictionary.

reduce(lambda output, current: dict(output.items() + 
                     [(current, ord(current))]), 'abc', {})

Note that while this is unwieldy, it doesn’t violate one of the key tenets of functional programming, namely treating data as immutable.  However, putting this sort of code in a production system will lead to hatred from your colleagues, job security, or (if you’re really lucky) both.  Oh, and it’s probably rather inefficient as well, although I’m not crazy enough to actually benchmark it.

That said, consider what we’ve done here: We create a new dictionary with each iteration, passing it to “current”.  The new dictionary is created by taking the old one, breaking it into a list of tuples, adding to it a new tuple, and then turning it all into a dictionary again.

There is at least one other way to do this, but it’s going to make the above code seem like beautiful, classic Python in comparison: Remember that we cannot assign to a dictionary within a lambda.  However, we can invoke methods, including the dict.update method, which lets us merge one dictionary into another.  The thing is. dict.update, like most methods in Python that modify data structures, returns None.  If we’re willing to take some risks (and why stop now), we can use the following code to modify our existing dictionary and then return it to the next iteration:

reduce(lambda d, current: d.update({current : ord(current)}) or d, 'abc', {})

The above code takes advantage of the fact that “and” and “or” are short-circuit operators.  Thus, we first invoke d.update on our dictionary.  That returns None.  Our “or” operator then says, “Well, I’d better go to the second argument, because the first one returned a false-y value.  Sure enough, our dictionary — because it isn’t empty — is a true value, which is what the statement returns.  And sure enough, we get our dictionary.

A final way to do this would be to hide the assignment of the dictionary behind an external function.  That is, we have a function do the dirty work for us.  That’ll certainly work, although I see it as somewhat less fun than lambdas and trying to work around their limitations.

def update_and_return(d, new_key, new_value):
    d.update({new_key: new_value})
    return d

>>> reduce(lambda output, current: update_and_return(output, current, ord(current)), 'abc', {})

{'a': 97, 'b': 98, 'c': 99}

Fun stuff!  And, perhaps, the worst method ever for getting an ASCII table into a dictionary.

Next time, we’ll see how to do this with Ruby.

Creating collections with “reduce”

In the first few parts of this series (first, second, and third), I introduced the “reduce” function, and showed how it can be used in a number of ways. However, in all of the examples we have seen so far, the output from our invocations of “reduce” were integers or strings. If we reduce with a “sum” function, then we get an integer representing the sum of all integers.  If we reduce over a measurement, such as the size of an integer or the length of a string, we can implement “min” or “max”.

Another interesting and clever thing that we can do with “reduce” is to create an array (in Ruby) or list (in Python).  Think of it this way: The output of each iteration of “reduce”, which is then passed to the next iteration, can be any data structure we wish. If the output from the iteration is an array, then that’s what will be passed to the next iteration.

In other words, we can use “reduce” to build up an array, with each iteration emitting that same array, perhaps with additional elements added to it.  For example, I can create a new array whose contents are identical to the original one on which “reduce” was invoked:

['a', 'b', 'c'].reduce([]) {|output, current| output << current }

By adding each element to the beginning, I can reverse the array:

['a', 'b', 'c'].reduce([]) {|output, current| output.unshift(current) }

I can create a nested array, too:

['a', 'b', 'c'].reduce([]) {|output, current| output << [current, current] }

“reduce” is thus a great way to create an array, building it up piece by piece.  True, we need to pass “reduce” an empty array as a parameter, and then each iteration needs to return the current value of the array.  But other than that, we can add whatever we want to the array, and modify it in whatever ways we want.  If I want to do something more complex than the above, or if I want to be able to debug individual lines of my block, I can even use do-end, thus spreading the block across multiple lines.  (And yes, I know about the late Jim Weirich’s great suggestion, to use curly braces for returning values.  But if you’re going to have multiple lines in your block, then do-end still rings truer for me.)

Let’s see how we can do the same thing in Python.  First, we can return a list:

reduce(lambda output, current: output + [current], 'abc', [])

We start (using the optional final argument) with an empty list.  Then, in each iteration, we take the current element, turn it into a one-element list, and then append it to the existing list.  Note that we cannot use list.append to add the current element, because list.append returns None, like many other methods that change mutable objects in Python.

Using “reduce” to reverse the list is just as easy, by reversing the order in which we add things to the list:

reduce(lambda output, current: [current] + output, 'abc', [])

Finally, if we want to return a list of lists, we can do that fairly easily, just add a list of lists:

reduce(lambda output, current: output + [[current, current]], 'abc', [])

As you can see, “reduce” makes it possible for us to, in a single line, create a collection based on another set of inputs.  The output can contain any types that we want; in these simple examples, we created lists of strings.  But we could create lists of dictionaries, or of more complex objects, just as easily.

In the next installment, we’ll see how to create hashes/dictionaries using reduce.

 

 

Implementing “min” and “max” with “reduce”

This is the third installment of my “reduce” series of blog posts.  For the first, see here, and for the second, see here.

If you have been reading this series, then you know that “reduce” can be used to sum numbers, or to calculate scores.  In that sense, “reduce” justifies its name; we’re invoking a function repeatedly on each element of a collection, boiling the collection down to its essense, as defined by our function.

But “reduce” can also be used to to other, more interesting things.  One of them, mentioned in the Ruby documentation, isn’t an obvious candidate for beginners with “reduce”, namely implementing “min” and “max”.

In a world without functional programming or reduce, we could implement a “min” function as follows in Ruby:

def mymin(items)
  output = items[0]           # Assume the first item is the smallest
  items[1..-1].each do |item| # Iterate through all remaining items
    if item < output
      output = item           # This item is smaller, so keep it
    end
  end
  output                      # So far, this is the smallest we've seen
end

The definition, if you’re somewhat new to Ruby, works as follows: The “min” function takes an enumerable sequence of items. We assume that the first item is the smallest one, simply to make life easier for ourselves.  We then iterate over the remaining items, checking to see if the current one is smaller than the current winner.  Each iteration thus leaves “output” with its existing value, or replaces it with something smaller.

Now, the above code works just fine — but it’s a bit long and inelegant just to find the minimum value in an enumerable.  After all, it should be possible to implement the same algorithm we we used above in the implementation of “mymin”, but somehow more elegantly and concisely.

That’s where “reduce” comes in.  We saw in an earlier blog post that when we use “reduce” to sum the numbers in a collection, is sort of like saying ((((1+2)+3)+4)+5).  Well, let’s say that we had a function “smaller” that would return the smaller of two numbers.  Then, we could use “reduce” in the following way:

smaller(smaller(smaller(smaller(1,2),3),4),5)

Oh, is that hard to read?  Yeah, I thought so.  So instead, we can do the following:

items.reduce {|smallest, current| smaller(smallest, current) }

Each time “reduce” invokes our block, it keeps the smaller of its two parameters.  The first parameter, which I’ve here called “smallest”, contains the smallest value that we’ve seen so far. The second parameter, “current”, contains the current value from the collection.

That one line certainly looks nicer than our original implementation of “mymin”, I think. There’s just one problem: We haven’t defined any “smaller” method!  Fortunately, we can use Ruby’s ternary operator to have a tiny, one-line if-then-else statement that does the same thing.  In other words:

items.reduce {|smallest, current| smallest < current ? smallest : current }

The moment you realize that “reduce” can be used to hold onto only one of the previous values, you begin to see the opportunities and possibilities that it offers. For example, we can find the shortest or longest word in an array of words:

words = 'This is a sample sentence'.split
words.reduce {|shortest, current| (shortest.size < current.size) ? shortest : current }
=> "a"

words.reduce {|longest, current| (longest.size > current.size) ? longest : current }
=> "sentence"

Now, can we do the same thing in Python?  The builtin “reduce” function works similarly to Ruby’s Enumerate#reduce, as we’ve seen in previous posts I wrote on the subject.  However, the function that we pass to Python’s “reduce” is more limited than a Ruby block, especially if we use a lambda (i.e., anonymous function).

However, we do have a trick up our sleeve: Python provides an analogous version of Ruby’s ternary operator, which is meant for precisely these occasions.  I generally tell people to avoid this operator completely, because it causes more problems and confusion than necessary, and probably means that you’re doing something that you shouldn’t be in Python.  It works as follows:

>>> x = 'a'
>>> is_an_a = "yes" if x == 'a' else "no"

>>> is_an_a
'yes'

The oddest thing for me about this form of “if” in Python is that the condition is in the middle of the line.  It’s not the postfix “if” of Perl and Ruby, and it’s not the prefix “if” that everyone is used to, but something else entirely.  And as such, it’s usually a bad idea to have in your code.  But right now, it fits perfectly.

Here, then, is a version of “min” in Python, using “reduce”:

>>> items = [5,-5,10,1,100,-20,30]
>>> reduce(lambda smallest, current: smallest if (smallest < current) else current, items)
-20

And sure enough, we get the lowest value!  Next time, we’ll produce even more complex outputs, with more complex data structures.