Summary of my “reduce” series

I teach Ruby and Python to a lot of people — in formal courses, and in one-on-one pairing sessions, both online and in person.  I’ve found that for many people, the whole notion of functional programming seems strange and difficult, as well as something of a waste of time.  After all, if you have objects, why would you need functional programming?

The answer, of course, is that no single paradigm has all of the answers; each has its strengths and weaknesses.  Understanding how to use them together can provide great benefits.  As a result, I spend time when teaching both Ruby and Python on the basics of functional programming, and then the various functions and methods that each language provides in this area.

Of these, the function that most often causes people to wrinkle their noses and/or get confused is “reduce”.  And to be honest, I often tell students in my classes that “reduce” is one of those functions that is incredibly powerful and clever, but for which you sometimes need to wait in order to find a use case.  I decided to explore some use cases, and ways in which “reduce” could be used — and I hope that these have been useful.

To summarize, here are the posts that I wrote on this topic:

I hope that this series has been useful and interesting, and would appreciate hearing ideas for additional deep-dives into areas of Python, Ruby, or programming in general.  What subjects do you find confusing?  What methods do you think are somewhat useless?  Let me know, and I’ll try to address them in future blog posts!

 

Implementing “filter” with “reduce”, in Ruby and Python

We’re nearly at the end of my tour of the “reduce” function in Ruby and Python.  Just as I showed in the previous installment how we can implement the “map” function using “reduce”, I want to show how we can implement another functional-programming standard, “filter”, using “reduce” as well.  As before, I’ll show examples in both Ruby and Python.

Whereas “map” transforms each element of its input, but returns a value for each one, “filter” doesn’t change its inputs at all — but it does only selectively return them. The combination of “map” and “filter” is a classic and powerful one.  While Python still does include the “map” and “filter” functions, they are more usually expressed using list comprehensions.  The “map” part is the left-hand-side of the list comprehension, defining the transformation.  The “filter” part is the (optional) right-hand side of the comprehension, indicating when a value should be placed in the output.

Here is a use of “filter” to keep only odd numbers (taking advantage of the fact that in Python, 0 is a false value, and % is the modulus operator on integers):

>>> filter(lambda x: x%2, range(10))
[1, 3, 5, 7, 9]

It turns out that we can use “reduce” to implement our own version of “filter”, which does the same thing. Once again, we’ll define a function that takes two parameters, a sequence and a function:

def myfilter(sequence, f):
 return reduce(lambda total, current: total + ([current] if f(current) else []),
               sequence,
               [])

In other words: Our function invokes reduce, initializing it with an empty list ([]). We go over every element of “sequence”; for each element, we invoke the user-supplied function, f, passing “current” as a parameter.  We always return “total”, which is the list that we have created so far.  The question is whether we also want to return “current”, and that depends on whether f(current) returns True or False.  If it returns True, then we add “current” to the accumulated list.  If it returns False, then we add [] (i.e., the empty list) to the list. If we again want to filter a list of numbers, such that we get only the odd ones, we can now say:

>>> myfilter(range(10), lambda x: x%2)
[1, 3, 5, 7, 9]

What can be tricky for many newcomers to Python and functional programming is that the implementation of “myfilter” involves a lambda (i.e., anonymous function) in our invocation of “reduce”, but also a function (often defined as a lambda) that is passed to “myfilter” by the user.  I’ve found in my teaching that “lambda” tends to surprise and confuse many people, and two lambda expressions in the same place can really be a cognitive burden for newcomers.

Now that we’ve implemented a version of “filter” in Python, let’s turn to Ruby.  Here, as with “map”, we’re not going to pass a function (method) as a parameter. Rather, we’ll pass a block to our “myfilter” method, and then yield to the block whenever we want to invoke it.  For example:

def myfilter(enumerable)
 enumerable.reduce([]) {|total, current| yield(current) ? total << current : total }
end

(By the way, “filter” is traditionally called both “select” and “find_all” in Ruby circles.)  Notice that there are two blocks involved here: Enumerable#reduce invokes a block for each element of “enumerable” over which it iterates.  And then within that block, we yield to the block passed to “myfilter”, to check and see if the current element should be shared.

We can then invoke “myfilter” as follows:

e = (1..10).to_a
=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

myfilter(e) {|n| n.even?}
=> [2, 4, 6, 8, 10]

Sure enough, we’ve managed to create our own version of “filter”, just as we were able to do for “map”.

Next time: The exciting conclusion of our tour of the “reduce” function.  Same reduce-time, same reduce-channel!

Implementing “map” with “reduce”, in Ruby and Python

This is another installment in my “reduce” series of posts, aimed at helping programmers understand this function, with examples in both Ruby and Python.  So far, we have seen how we can build a number of different types of data structures — integers, strings, arrays, and hashes — using “reduce”.  But the really interesting use of “reduce,” in my mind, is as a way to implement “map” and “filter,” two classic functions from the functional-programming world.  In this post, I show how to implement “map” in both Ruby and Python.

First, let’s remember what “map” does: It takes a sequence, and applies a function to that sequence, one element at a time. The output of applying that function on each element produces a sequence of values, which we then get back from “map”.   In Ruby, we can thus say:

[1,2,3,4,5].map {|n| n*n}

which produces the array:

[1,4,9,16,25]

In Ruby, any block that you want to pass is fair game.  The block takes a single parameter, the current element of the enumerable that should be transformed. Whatever the block returns is put into the output array; because nearly everything in Ruby is an expression, just about anything is fair game.  Moreover, the output array will always consist of the same number of elements the input array.  So if you run Enumerable#map on an array of 15 items, you can be sure that you’ll get a new, anonymous array of 15 items.  The value of each of these items will depend on what the block emits.

If we want to implement “map” using “reduce,” we know that we need to invoke a block once per element of the enumerable.  The block should add one element to the end of an array that we have created.  The element that we add to the array should be the result of invoking a method on the current element.

For example, if we want to use “reduce” to implement the “map” example from above, in which we get the squares of the input numbers, we could do the following:

[1,2,3,4,5].reduce([]) {|total, current| total << current * current}

and sure enough, we get the following output:

[1, 4, 9, 16, 25]

If squaring numbers were the only thing we were interested in doing, we could write a “mymap” method that would do that:

def mymap(enumerable)
  enumerable.reduce([]) {|total, current| total << current * current}
end

mymap([1,2,3,4,5])

And sure enough, we get the array of squared numbers.  But as you can imagine, this implementation of map is somewhat lacking… we would really like to be able to pass a block, just as we can do with the built-in Enumerable#map method.  In Ruby, we can do this very easily by passing a block to our method, and then yielding to it (i.e., invoking it) once per iteration within the “reduce”.  For example:

def mymap(enumerable)
  enumerable.reduce([]) {|total, current| total << yield(current)}
end

mymap([1,2,3,4,5]) {|n| n*n}
=> [1, 4, 9, 16, 25]

mymap([1,2,3,4,5]) {|n| (n * 30).to_s}
=> ["30", "60", "90", "120", "150"]

Not bad!  Using “reduce”, we’ve managed to implement a version of “map”.  That’s pretty snazzy, if you ask me.

We can do almost exactly the same thing in Python.  Python doesn’t have Ruby-style blocks, but it does have functions — both named functions and anonymous ones created with “lambda” — that can be passed easily as parameters, and then invoked using parentheses. The Python version of “mymap” would thus look something like this:

def mymap(sequence, f):
  return reduce(lambda total, current: total + [f(current)], sequence, [])

In other words, our function takes two parameters, a sequence and a function. We then build the resulting list, one element at a time, based on the output from invoking our “f” function on each element in sequence.  It turns out that “map” is not that different from our use of “reduce” to build lists; the only difference, in fact, is that we’re letting the user pass us a function, which we then invoke on the current item.

It turns out, then, that implementing “map” isn’t so hard or mysterious!  In the next installment of this series, we’ll see how to implement the “filter” function (called “select” or “find_all” in Ruby), another staple of functional programming.

Creating Ruby hashes with “reduce”

In yesterday’s post, I showed how we can use Python’s “reduce” function to create a dictionary.  Ruby, of course, also has dictionaries, but calls them “hashes.”  In this posting, I’m going to show how we can create a hash in Ruby when iterating over an enumerable.  In so doing, we’ll see how we can use the “reduce” method to create interesting hashes, building them up one step at a time.

Let’s start with the simplest and easiest solutions, which will be quite similar to what we did with Python: We’ll take an array of arrays, and turn that into a hash:

[['a', 1], ['b', 2]].to_h

Oh, wait — you’re not using Ruby 2.1?  Too bad; that version added Array#to_h, a method that creates key-value pairs from nested arrays.  The idea is that each inner array should have two elements, the first being a key and the second being a value.  So the above call to Array#to_h will result in:

{"a"=>1, "b"=>2}

Of course, we often want to use symbols for our hash keys, rather than strings.  But let’s ignore that for today, and just concentrate on how we can create hashes, rather than quibble over key types.

If you’re using a version of Ruby earlier than 2.1, and thus don’t have access to Array#to_h, then you can always use Hash[],

Hash['a', 1, 'b', 2]
=> {"a"=>1, "b"=>2}

Note that whereas Array#to_h expected to get an array of arrays, Hash[] expects to get a single array between the [ ].  So if we’re interested in using the “reduce” method to build up a hash, we can do it as follows:

Hash[%w(a b c).reduce([]) {|total, current| total << [current, current.ord]}]

In other words, we create an array consisting of three strings (‘a’, ‘b’, and ‘c’), by creating an array of arrays, and then turning that into a hash.  But of course, hashes (like all data structures in Ruby) are mutable, and (more importantly) methods that modify data structures often return the object on which they worked.  So we can actually build up our hash in a more direct way:

%w(a b c).reduce({}) {|total, current| total.update(current => current.ord)}

Here, we once again create an array of “a”, “b”, and “c”.  Then we initialize our call to “reduce” with an empty hash.  Then we reduce, with each call invoking Hash#update.  Hash#update not only merges the parameter’s value (a hash) into “total”, but returns the resulting hash.  Thus, with each invocation, “total” is overwritten with the new hash, to which we have added a new pair.

Of course, this example is a pretty trivial one; you can imagine that instead of invoking current.ord, that you create a new object based on the parameter, or that you do a calculation based on it instead.  The bottom line, though, is that creating hashes incrementally in Ruby is pretty easy to do.  Moreover, if you find that a single-line block is not enough space in which to do all of the calculations you need, you can always use a do-end style of block, which will let you have as many lines as you want.

Next time: Implementing “map” and “filter” with “reduce”!  If you understand those, then you’ve totally internalized the power (and dare I say it, fun) of this method.

 

 

 

Creating Python dictionaries with “reduce”

In the last few installments (first, second, third, and fourth) of this series, we saw how “reduce” can be used to build up a scalar value (such as a number or string), or even a simple collection, such as a list (in Python) or an array (in Ruby).  The jewel in the data-structure crown for these high-level languages is known by many names: Dictionary, hash, hash table, hash map, and mapping.  I tend to use these quite a bit, and I know that I’m not alone; hashes are easy to use and work with, have O(1) lookup characteristics, guarantee the uniqueness of their keys, and make the code self-documenting.  What’s not to like?

There are times when you might want to build a dictionary step by step, and “reduce” can help you to do that.  I should note that recent versions of Python offer “dictionary comprehensions,” which are one of my favorite features of the language, and can be used similarly to “reduce” — and probably with less confusion among new programmers.  Nevertheless, I find it interesting and instructive to see how we can use “reduce” to create a data structure that we wouldn’t normally associate with this function.

I should note that some of the things you’re going to see here are not really recommended coding practices, particularly in Python.  But they will be fun, which is important, no?

Let’s start with Python: If I want to create a dictionary using “reduce”, I have a few different options.  Perhaps the first and easiest one is not exactly what you might have had in mind, namely passing the “dict” function (which creates a new instance of “dict” — i.e., a new dictionary) a list of two-element tuples.  For example, I can say:

dict([('a', 1), ('b', 2)])

and I get:

{'a': 1, 'b': 2}

So if I can tell “reduce” to emit a list of tuples, I can then pass that list of tuples to “dict”, and get a dictionary back.  Let’s try that:

reduce(lambda output, current: output + [(current, ord(current))], 'abc', [])

In the above case (and in all of the examples we’ll use here), I’m trying to build an oh-so-useful dictionary in which the keys are the letters a, b, and c, and the values are the ASCII codes for those letters.  In the above Python code, I iterate over the string ‘abc’, which is a three-element sequence of those three letters.  For each letter, I return the current value of “output” (which is guaranteed to be a list), plus a new, single-element list.  That single-element list is a tuple consisting of the current letter and its ASCII code.  So the output of the above “reduce” call will be:

[('a', 97), ('b', 98), ('c', 99)]

which, when we feed it into dict():

>>> dict([('a', 97), ('b', 98), ('c', 99)])
{'a': 97, 'b': 98, 'c': 99}

Voila!  We’ve created a dictionary.  (By the way, the “dict.items” method, which is often used to iterate over the keys and values of a dictionary, returns a list of tuples in precisely this format.)

So this is nice, but perhaps there’s a way to have “reduce” build up a dictionary all by itself?  The answer is “yes, but” — because Python tries hard to stop us from modifying data structures within a lambda.  But if we’re willing to be a bit weird, we can get around that.  How? By taking the dictionary that we received from the previous iteration (i.e., “output”), turning it into a list of tuples with dict.items, adding that list to our current pair, and then turning the whole thing back into a dictionary.

reduce(lambda output, current: dict(output.items() + 
                     [(current, ord(current))]), 'abc', {})

Note that while this is unwieldy, it doesn’t violate one of the key tenets of functional programming, namely treating data as immutable.  However, putting this sort of code in a production system will lead to hatred from your colleagues, job security, or (if you’re really lucky) both.  Oh, and it’s probably rather inefficient as well, although I’m not crazy enough to actually benchmark it.

That said, consider what we’ve done here: We create a new dictionary with each iteration, passing it to “current”.  The new dictionary is created by taking the old one, breaking it into a list of tuples, adding to it a new tuple, and then turning it all into a dictionary again.

There is at least one other way to do this, but it’s going to make the above code seem like beautiful, classic Python in comparison: Remember that we cannot assign to a dictionary within a lambda.  However, we can invoke methods, including the dict.update method, which lets us merge one dictionary into another.  The thing is. dict.update, like most methods in Python that modify data structures, returns None.  If we’re willing to take some risks (and why stop now), we can use the following code to modify our existing dictionary and then return it to the next iteration:

reduce(lambda d, current: d.update({current : ord(current)}) or d, 'abc', {})

The above code takes advantage of the fact that “and” and “or” are short-circuit operators.  Thus, we first invoke d.update on our dictionary.  That returns None.  Our “or” operator then says, “Well, I’d better go to the second argument, because the first one returned a false-y value.  Sure enough, our dictionary — because it isn’t empty — is a true value, which is what the statement returns.  And sure enough, we get our dictionary.

A final way to do this would be to hide the assignment of the dictionary behind an external function.  That is, we have a function do the dirty work for us.  That’ll certainly work, although I see it as somewhat less fun than lambdas and trying to work around their limitations.

def update_and_return(d, new_key, new_value):
    d.update({new_key: new_value})
    return d

>>> reduce(lambda output, current: update_and_return(output, current, ord(current)), 'abc', {})

{'a': 97, 'b': 98, 'c': 99}

Fun stuff!  And, perhaps, the worst method ever for getting an ASCII table into a dictionary.

Next time, we’ll see how to do this with Ruby.

Creating collections with “reduce”

In the first few parts of this series (first, second, and third), I introduced the “reduce” function, and showed how it can be used in a number of ways. However, in all of the examples we have seen so far, the output from our invocations of “reduce” were integers or strings. If we reduce with a “sum” function, then we get an integer representing the sum of all integers.  If we reduce over a measurement, such as the size of an integer or the length of a string, we can implement “min” or “max”.

Another interesting and clever thing that we can do with “reduce” is to create an array (in Ruby) or list (in Python).  Think of it this way: The output of each iteration of “reduce”, which is then passed to the next iteration, can be any data structure we wish. If the output from the iteration is an array, then that’s what will be passed to the next iteration.

In other words, we can use “reduce” to build up an array, with each iteration emitting that same array, perhaps with additional elements added to it.  For example, I can create a new array whose contents are identical to the original one on which “reduce” was invoked:

['a', 'b', 'c'].reduce([]) {|output, current| output << current }

By adding each element to the beginning, I can reverse the array:

['a', 'b', 'c'].reduce([]) {|output, current| output.unshift(current) }

I can create a nested array, too:

['a', 'b', 'c'].reduce([]) {|output, current| output << [current, current] }

“reduce” is thus a great way to create an array, building it up piece by piece.  True, we need to pass “reduce” an empty array as a parameter, and then each iteration needs to return the current value of the array.  But other than that, we can add whatever we want to the array, and modify it in whatever ways we want.  If I want to do something more complex than the above, or if I want to be able to debug individual lines of my block, I can even use do-end, thus spreading the block across multiple lines.  (And yes, I know about the late Jim Weirich’s great suggestion, to use curly braces for returning values.  But if you’re going to have multiple lines in your block, then do-end still rings truer for me.)

Let’s see how we can do the same thing in Python.  First, we can return a list:

reduce(lambda output, current: output + [current], 'abc', [])

We start (using the optional final argument) with an empty list.  Then, in each iteration, we take the current element, turn it into a one-element list, and then append it to the existing list.  Note that we cannot use list.append to add the current element, because list.append returns None, like many other methods that change mutable objects in Python.

Using “reduce” to reverse the list is just as easy, by reversing the order in which we add things to the list:

reduce(lambda output, current: [current] + output, 'abc', [])

Finally, if we want to return a list of lists, we can do that fairly easily, just add a list of lists:

reduce(lambda output, current: output + [[current, current]], 'abc', [])

As you can see, “reduce” makes it possible for us to, in a single line, create a collection based on another set of inputs.  The output can contain any types that we want; in these simple examples, we created lists of strings.  But we could create lists of dictionaries, or of more complex objects, just as easily.

In the next installment, we’ll see how to create hashes/dictionaries using reduce.

 

 

Implementing “min” and “max” with “reduce”

This is the third installment of my “reduce” series of blog posts.  For the first, see here, and for the second, see here.

If you have been reading this series, then you know that “reduce” can be used to sum numbers, or to calculate scores.  In that sense, “reduce” justifies its name; we’re invoking a function repeatedly on each element of a collection, boiling the collection down to its essense, as defined by our function.

But “reduce” can also be used to to other, more interesting things.  One of them, mentioned in the Ruby documentation, isn’t an obvious candidate for beginners with “reduce”, namely implementing “min” and “max”.

In a world without functional programming or reduce, we could implement a “min” function as follows in Ruby:

def mymin(items)
  output = items[0]           # Assume the first item is the smallest
  items[1..-1].each do |item| # Iterate through all remaining items
    if item < output
      output = item           # This item is smaller, so keep it
    end
  end
  output                      # So far, this is the smallest we've seen
end

The definition, if you’re somewhat new to Ruby, works as follows: The “min” function takes an enumerable sequence of items. We assume that the first item is the smallest one, simply to make life easier for ourselves.  We then iterate over the remaining items, checking to see if the current one is smaller than the current winner.  Each iteration thus leaves “output” with its existing value, or replaces it with something smaller.

Now, the above code works just fine — but it’s a bit long and inelegant just to find the minimum value in an enumerable.  After all, it should be possible to implement the same algorithm we we used above in the implementation of “mymin”, but somehow more elegantly and concisely.

That’s where “reduce” comes in.  We saw in an earlier blog post that when we use “reduce” to sum the numbers in a collection, is sort of like saying ((((1+2)+3)+4)+5).  Well, let’s say that we had a function “smaller” that would return the smaller of two numbers.  Then, we could use “reduce” in the following way:

smaller(smaller(smaller(smaller(1,2),3),4),5)

Oh, is that hard to read?  Yeah, I thought so.  So instead, we can do the following:

items.reduce {|smallest, current| smaller(smallest, current) }

Each time “reduce” invokes our block, it keeps the smaller of its two parameters.  The first parameter, which I’ve here called “smallest”, contains the smallest value that we’ve seen so far. The second parameter, “current”, contains the current value from the collection.

That one line certainly looks nicer than our original implementation of “mymin”, I think. There’s just one problem: We haven’t defined any “smaller” method!  Fortunately, we can use Ruby’s ternary operator to have a tiny, one-line if-then-else statement that does the same thing.  In other words:

items.reduce {|smallest, current| smallest < current ? smallest : current }

The moment you realize that “reduce” can be used to hold onto only one of the previous values, you begin to see the opportunities and possibilities that it offers. For example, we can find the shortest or longest word in an array of words:

words = 'This is a sample sentence'.split
words.reduce {|shortest, current| (shortest.size < current.size) ? shortest : current }
=> "a"

words.reduce {|longest, current| (longest.size > current.size) ? longest : current }
=> "sentence"

Now, can we do the same thing in Python?  The builtin “reduce” function works similarly to Ruby’s Enumerate#reduce, as we’ve seen in previous posts I wrote on the subject.  However, the function that we pass to Python’s “reduce” is more limited than a Ruby block, especially if we use a lambda (i.e., anonymous function).

However, we do have a trick up our sleeve: Python provides an analogous version of Ruby’s ternary operator, which is meant for precisely these occasions.  I generally tell people to avoid this operator completely, because it causes more problems and confusion than necessary, and probably means that you’re doing something that you shouldn’t be in Python.  It works as follows:

>>> x = 'a'
>>> is_an_a = "yes" if x == 'a' else "no"

>>> is_an_a
'yes'

The oddest thing for me about this form of “if” in Python is that the condition is in the middle of the line.  It’s not the postfix “if” of Perl and Ruby, and it’s not the prefix “if” that everyone is used to, but something else entirely.  And as such, it’s usually a bad idea to have in your code.  But right now, it fits perfectly.

Here, then, is a version of “min” in Python, using “reduce”:

>>> items = [5,-5,10,1,100,-20,30]
>>> reduce(lambda smallest, current: smallest if (smallest < current) else current, items)
-20

And sure enough, we get the lowest value!  Next time, we’ll produce even more complex outputs, with more complex data structures.

Calculating Scrabble scores with “reduce”

This is the second installment of my series of blog posts on the “reduce” function/method.  For an introduction, see here.

I love to play Scrabble — or more commonly nowadays, I play Words with Friends on my phone. (I often say that the game should instead be called, “Words with people who used to be your friends,” given the competition that can result.)  The game, if you’re not familiar with it, is simple enough to understand: You choose seven English letters at a time from a bag or pool of letters.  Each letter has a different point value.  Your aim is to create, with your seven letters, the highest-scoring word that you can in each turn.

Fine, it’s a bit more complex than that: Your word needs to hook onto an existing word or letter, and you can get bonus points by completing more than one word, and there are squares which multiply your score, by doubling or tripling the value of an individual letter or the entire word.  I’m going to ignore all of these rules for now.

Let’s say that I want to write a Ruby program that will calculate the score of a given word in Scrabble. How can I do that?

Before I can begin, I’m going to need to need a list of the points that you get for each letter.  Because we’re dealing with a mapping between two values — a letter and its points — the easiest solution would be a hash:

points = {'a'=> 1, 'b'=> 3, 'c'=> 3, 'd'=> 2, 'e'=> 1, 'f'=> 4, 'g'=> 2, 'h'=>
 4, 'i'=> 1, 'j'=> 8, 'k'=> 5, 'l'=> 1, 'm'=> 3, 'n'=> 1, 'o'=> 1, 'p'=> 3,
 'q'=> 10, 'r'=> 1, 's'=> 1, 't'=> 1, 'u'=> 1, 'v'=> 4, 'w'=> 4, 'x'=> 8, 'y'=>
 4, 'z'=> 10}

Once we’ve defined our “points” hash, we can start calculating the number of points that we’ll get for a given word.  If you aren’t familiar with “reduce”, but have been programming in Ruby for at least a short while, you’ll probably realize that this is easily accomplished with a loop.  That is, we can iterate over each letter in the string, getting the point value for that letter, and adding it to the total.  For example, we could do something like this:

word = 'encyclopedia'
total = 0 
word.each_char do |letter|
  total += points[letter]
end

After the above loop, we’ll find that the total score is 22 points.  That’s fine, and it’ll work… but by using “reduce”, we can do the same thing in a single line:

word.split('').reduce(0) {|total, current| total + points[current]}

Now, how does the above work? First, it takes the string “word”, and turns it into an array, in which each letter is an element.  Then, we use “reduce” to calculate a score: Since it’s a numeric score, we’ll initialize the total with 0.  Then, for each letter in our word, we add the number of points associated with that word to our total.

To do the same thing in Python will require surprisingly similar code. First, we create a dictionary:

points = {'a': 1, 'b': 3, 'c': 3, 'd': 2, 'e': 1, 'f': 4, 'g': 2, 'h':
 4, 'i': 1, 'j': 8, 'k': 5, 'l': 1, 'm': 3, 'n': 1, 'o': 1, 'p': 3,
 'q': 10, 'r': 1, 's': 1, 't': 1, 'u': 1, 'v': 4, 'w': 4, 'x': 8, 'y':
 4, 'z': 10}

Then we run reduce in a similar way. Because strings in Python are sequences, we don’t need to turn our word into an an array before iterating over it; we can invoke “reduce” directly on the string:

>>> word = 'encyclopedia'
>>> reduce(lambda total, current: total + points[current], word, 0)
22

This is a simple demonstration of where “reduce” can be useful. I’m looking to apply the same operation on the elements of a sequence of data, adding the latest value to the total.  When I get to the end, I’d like to see what has accumulated so far.

In the next installment, we’ll see how we can use “reduce” to implement some other functions that we wouldn’t normally associate with functional programming, let alone with something called “reduce”.

 

Understanding “reduce” (first in a series)

One of the notable things about MIT’s computer science curriculum, at least back when I was studying there, was that you didn’t learn any “practical” programming languages.  Our work was all done in either Scheme (a dialect of Lisp) or in CLU (an early object-oriented language).  I can’t say that I have too many memories, let alone fond ones, of CLU.  But I definitely drank the Kool-Aid about Lisp, and have long believed that it has always represented the pinnacle of programming languages.  Things that I learned long ago in Lisp are only now becoming standard in popular languages.

For example, functional programming has become increasingly popular in the last few years, for a variety of reasons including the shrinking effects of Moore’s Law — and the resulting need to have multiple, immutable copies of your data in the computer, distributed across multiple processors.  Lisp has long had many functional capabilities; it was Lisp that first introduced me to such functions as “map”, which I use multiple times each day.  However, my regular uses of “map” are rarely in Lisp; rather, they’re in Python and Ruby, the languages that I use most in my day-to-day work.

When I teach classes in Ruby and Python, I spend a fair amount of time talking about functional programming techniques.  It might sound funny to discuss functional programming in two languages that are so clearly object-oriented, but I actually find it quite natural.  Sure, I create classes and throw objects around.  But if I have an array of values, then it’s very fast and easy for me to process them with functional techniques.  Python’s list comprehensions have basically taken the place of the “map” and “filter” functions, so while those exist, they’re not as necessary any more.  But once you understand such functions as “map” and “filter”, you’re poised to do all sorts of amazing things.

Perhaps the most intriguing of the functions from this school is “reduce”.  I have found, consistently over time, that “reduce” is the function most likely to surprise and confuse newcomers to this type of programming.  That’s because it’s easy to confuse what “reduce” is doing, and to forget how flexible its output can be.

I’ve thus decided to write a series of blog posts about “reduce”, in both Python and Ruby.  I’ll start with the simple stuff, and then move ahead with increasingly complex tasks.  You won’t necessarily start to use “reduce” all of the time, but if you’re like me, you will find all sorts of interesting uses for it.  I tell people that I tend to use “reduce” about once every six months — but when I do use it, it really saves the day.

In Ruby, we can use the “reduce” method (also available as “inject”, to satisfy people from the Smalltalk world) on any enumerable object.  The invocation looks like this:

[1,2,3,4,5].reduce(0) {|a, b| a+b}

I’ll explain what this all means in a moment.  But the most important change that I can already make is to use better names for the block parameters:

[1,2,3,4,5].reduce(0) {|total, current| total+current}

Ruby goes through each element of the enumerable, invoking the block on each element. Described in this way, we might confuse “reduce” with “map”.  However, in “map”, the output is an array of the same length as the input.  By contrast, with “reduce”, the output of each iteration is remembered for the next time around.  That is, the value of “total” in each iteration is the block’s result from the last iteration.  

The initial value of “total”, in the first iteration, is the parameter value that we pass, which is 0 in this case.  If you don’t pass a parameter, then “total” is initialized with the first element of the enumerable, and the first iteration (i.e., the first application of the block) takes place on the second element.  This is fine in the above example, but depending on the output you want, failing to pass a parameter, or passing one of the wrong type, can make a big difference.

You can also think of “reduce” as a sort of “join”, but one that evaluates its inputs and operations.  So instead of getting the string “1+2+3+4+5”, you get the result of actually invoking 1+2, and then (1+2)+3, and then ((1+2)+3)+4, and then finally (((1+2)+3)+4)+5. This is why some Lisp versions call this “fold”, rather than “reduce”.  I still don’t quite get why Smalltalk people call it “inject”, and thus never use that term in Ruby (except when introducing the five rhyming functional methods — select, detect, collect, inject, and reject — because it’s so much fun to say). But the effect, once you internalize it, can be used in many interesting ways.

Python doesn’t have a “reduce” method on sequences, but does have a builtin “reduce” function that can be invoked on sequences.  (In Python 3, the “reduce” function was moved to the “functools” module — which is better than the fate Guido had originally planned for it.)  Python’s “reduce” is similar to the one in Ruby.  To sum numbers, we say:

>>> numbers = range(10)
>>> reduce(lambda total, current: total + current, numbers)
45

As you can see (I hope), the use of “lambda” to create an anonymous function is analogous to the use of a block in Ruby.  In Python’s “reduce” function, you first pass the function you wish to invoke on the sequence, and then the sequence itself.  If you wish to pass an initial value for “total”, you can do so with an optional third parameter:

>>> reduce(lambda total, current: total + current, numbers, 10)
55

The classic use of “reduce” is to sum integers, as we saw above.  But we can, of course, perform additional types of operations, and produce additional types of output.  And to be honest, that’s where “reduce” starts to get more intriguing.  Any operation that you want to perform on an enumerable, such as an array, set, or range, and apply cumulatively to its elements, makes for a good choice for “reduce”. By experimenting with the input enumerable, the initial value of “total” that we pass as a parameter, and the block, we can do many interesting things.

In coming posts, I’ll explore some more of these ideas, and give you a tour of “reduce” in both Ruby and Python that will hopefully open your eyes to some of them, and give you a sense of where “reduce” can help to improve your thinking, and your code.

If you build it, they will come — but they might hate you

Several months ago, I was teaching an introductory Python course, and I happened to mention the fact that I use Git for all of my version-control needs.  I think that I would have gotten a more positive response if I had told them that my hobby is kicking puppies.

The reactions were roughly — and I’m not exaggerating here — something like, “What?  You use Git?!?  That so-called version control system whose main feature is eating our files?!?”   And I got this not just from one person, but from all 20-something people who were taking my Python course.  The more experience they had with Git, the more violently negative their reactions were.

I managed to calm them down a bit, and tried to tell them that Git is a wonderful system, except for one little problem, namely the fact that its interface is very hard to understand.  But, I promised them, once you understand how Git works, and once you start to work with it within the context of understanding what it’s doing, things start to make sense, and you can really enjoy and appreciate the system.

I should note that since that Python class, I’ve returned to the same company to give two day-long Git classes.  Based on the feedback I received, the Git class was very helpful, and I’m guessing that this is because I concentrated on what Git is really doing, and how the commands map to those actions.  I’m pretty sure that people from that class are starting to appreciate the power and flexibility of Git, rather than focusing only on their frustrations with it.

However, my experience working with and teaching Git have taught me a great deal about designing both software and UIs.  We love to say and think that excellent products with terrible marketing never get anywhere.  And in the commercial world, that might well be true. Everyone loves to quote the movie “Field of Dreams” (which I never really liked anyway), and how the main character builds a baseball field after repeatedly hearing, “If you build it, they will come.” As numerous other people have said, this is not the case for businesses: If you build it, they probably won’t come, unless you’ve invested time and money in marketing your product. 

However, in the open-source world,  we expect to invest time in learning a technology, and are generally more technical folks in any event.  Thus, we tend to be more forgiving of bad UIs, focusing on features rather than design. It’s thus possible for something brilliant, efficient, flexible, and profoundly frustrating for new users to become popular. Git is a perfect example of this.

Now, I happen to think that Git is one of the most brilliant pieces of software I’ve ever seen. Really, it’s impressively designed.  However, the commands are counter-intuitive for many people who used other version-control systems, and it’s possible to get yourself into a situation from which an expert can extract himself or herself, but in which a novice is completely befuddled.  Once you understand how Git works (brilliantly described in this video), things start to make sense.  But getting to that point can take a great deal of time, and not everyone has that time.

In open source, then, “If you build it, they will come” might sometimes work.  However, even if they do come, and even if they use the software that you have written, you might end up in a particularly unenviable situation: People will use the software, but will hate you for the way in which you designed it.

The upshot, then, is that it’s worth taking a bit of time to think about your users, and how they will use your system.  It’s worth taking the time to create an interface (including commands) that will make sense for people.  Look at WordPress, for example: It packs in a great deal of functionality, but also pays attention to the UI… and as a result, has become a hugely dominant part of the Web ecosystem.

Sure, Git is famous and popular, and I’m one of its biggest fans, at least in terms of functionality. But if Linus had spent just a bit more time thinking about command names, or behaviors, I think that we would have had an equally powerful tool, but with fewer people in need of courses to understand why their files are getting trampled.