The easiest way to return to the last Git branch

I don’t know about you, but it’s common for me to switch between branches in Git.  After all, that’s one of the main advantages of using Git — the incredible ease with which you can create and merge branches.

Just a few minutes ago, I was in the “adwords” branch of an application I’m working on.  I wanted to go back to “master”, make sure that I hadn’t missed any commits from the central repository, and then go back to “adwords”.  If there were any commits in “master” that I was missing in “adwords”, I figured that I would just rebase from “master” to “adwords”.

So I did a “git checkout master”, and found that I was up to date with the central Git server. For reasons that I can’t explain, I then decided to try something out: Instead of returning to the previous branch with “git checkout adwords”, I instead typed

git checkout -

(That’s a minus sign after the word “checkout.”)

Sure enough, I returned to the “adwords” branch!

Now, there is a fair amount of logic to this: In the Unix shell, you can typically return to the previous directory with “cd -”, which has proven to be quite useful over the years.  In Git, of course, branches are just names that point to commits.  So “git checkout -” returns you to whatever you had checked out before your most recent switch, which is usually the previous branch.

I just checked the Git documentation (“git checkout --help”), and it seems that this is a special case of a more general command structure:

As a special case, the "@{-N}" syntax for the N-th last branch/commit checks out 
branches (instead of detaching). You may also specify - which is synonymous
with "@{-1}".

I can’t imagine wanting to tell Git to return to the 5th-most-recent branch that I worked on, so this generalized formula seems a bit much to me.
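For the curious, here is a minimal sketch that replays the round trip described above in a throwaway repository, and then uses the equivalent “@{-1}” form to go back again. The function name, the temporary directory, and the demo identity passed to the commit are my own inventions for illustration; only the Git commands themselves are real.

```shell
#!/bin/sh
# Sketch: replay the master/adwords round trip in a temporary repository.
demo_previous_branch() {
    repo=$(mktemp -d) || return 1
    git init -q "$repo"
    # Force the initial branch name so the demo does not depend on local defaults.
    git -C "$repo" symbolic-ref HEAD refs/heads/master
    git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "initial commit"

    git -C "$repo" checkout -q -b adwords        # start working on adwords
    git -C "$repo" checkout -q master            # pop over to master
    git -C "$repo" checkout -q -                 # minus sign: back to adwords
    git -C "$repo" rev-parse --abbrev-ref HEAD   # prints the current branch

    git -C "$repo" checkout -q '@{-1}'           # long form of the same trick
    git -C "$repo" rev-parse --abbrev-ref HEAD   # back on master again

    rm -rf "$repo"
}

demo_previous_branch
```

The first line printed should be “adwords” and the second “master”, since each switch simply returns to whatever was checked out just before.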

I predict that this trick will save me precious seconds every day, all of which I squandered in writing this blog post.  But I do think that this is a super-cool trick and feature, and demonstrates once again how clever and useful Git is.

Control-R, my favorite Unix shell command

If you use a modern, open-source Unix shell — and by that, I basically mean either bash or zsh — then you really should know this shortcut.  Control-R is probably the shell command (or keystroke, to be technical about it) that I use most often, since it lets me search through my command history.

Let’s start with the basics: When you use bash or zsh, your commands are saved into a history, typically written to the file named by the shell variable HISTFILE.  I use zsh (thanks to oh-my-zsh), and it sets my HISTFILE to ~/.zsh_history.  How many commands does it store?  That depends on the value of the variable HISTSIZE, which in my case is 10,000.  (Strictly speaking, HISTSIZE limits the history kept in memory; the number of lines kept in the file is set by SAVEHIST in zsh, and by HISTFILESIZE in bash.)  Yes, I store the 10,000 last commands that I entered into my shell.
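If you want to set this up yourself, a configuration fragment along these lines should do it. This is a sketch rather than a copy of my actual setup: the values simply mirror the numbers mentioned above, and oh-my-zsh may already set some of them for you.

```shell
# Fragment for ~/.zshrc (assumed values, matching the numbers in the text).
HISTFILE="$HOME/.zsh_history"   # file to which the history is written
HISTSIZE=10000                  # number of commands kept in memory
SAVEHIST=10000                  # number of commands saved to HISTFILE (zsh)

# bash uses HISTFILESIZE rather than SAVEHIST for the on-disk limit:
# HISTFILESIZE=10000
```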

Now, before control-R, there were a bunch of ways to search through and use the history.  Each command has its own number, and thus if you want to replay command 5329, you can do so by typing

!5329

But this requires that you keep track of the numbers, and while I used to do that, I found it to be more annoying than useful.  What I really wanted was just to repeat a command … you know, the last time I ssh’ed into a server, or something.  So yeah, you can do

!?ssh

and you’ll get the most recent “ssh” command that you entered.  But what if you have used ssh lots of times, to lots of servers?  You could start to search for the server name, but then things start to get complicated, messy, and annoying.

What control-R does is search backwards through your history, looking for a match for what you have entered until now.  If you use Emacs, then this will make perfect sense to you, since control-R is the reverse version of control-S in Emacs.  If you don’t know Emacs, then it’s a crying shame, but I’ll still be your friend, don’t worry.

Let’s say you have ssh’ed into five different servers today, and you want to ssh again into the third server of the bunch.  You type control-R, which puts you into bck-i-search (i.e., “backward incremental search”) mode.  Now type “s” (without enter).  The most recent command that you entered, which contains an “s”, will appear.  Now type another “s” (again, without pressing enter).  The most recent command containing two “s” characters in a row will appear.  Depending on your shell and configuration, the matching text might even be highlighted.

Now enter “h”.   In my case, I got to the most recent call to “ssh” that I made in my shell.  But I don’t want this last (fifth) one; I want the third one.  So I enter control-R again, and then again.  Now I’m at the third time (out of five) that I used ssh today, at the command I want.  I press “enter”, and I’ve now executed the command.
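To make the mechanics concrete, here is a rough, non-interactive imitation of what repeated control-R presses do: find the N-th most recent history line containing a given string. The function name and the sample history are hypothetical, and the sketch assumes a plain history file with one command per line, oldest first (zsh’s extended history format, which adds timestamps, would need extra parsing).

```shell
#!/bin/sh
# nth_recent_match FILE PATTERN [N]
# Print the N-th most recent line in FILE that contains PATTERN (default N=1).
# Assumes one command per line, oldest first, with no timestamps.
nth_recent_match() {
    grep -F -- "$2" "$1" | tail -n "${3:-1}" | head -n 1
}

# Hypothetical history: five ssh sessions mixed with other commands.
histfile=$(mktemp)
cat > "$histfile" <<'EOF'
ssh alpha.example.com
ls -l
ssh beta.example.com
ssh gamma.example.com
git status
ssh delta.example.com
ssh epsilon.example.com
EOF

nth_recent_match "$histfile" ssh      # like the first control-R: epsilon
nth_recent_match "$histfile" ssh 3    # two more presses: gamma
rm -f "$histfile"
```

The real thing is incremental and interactive, of course, which is exactly what makes it so pleasant; this only shows the search order.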

While searching backward, if you miss something because you hit control-R one too many times, you can use control-S to search forward.  (On many terminals, control-S is reserved for flow control and will appear to freeze the screen; running “stty -ixon” frees it up for searching.)  You can use the “delete” key to remove characters, one at a time, from the search string.  And you can use “enter”, as described above, to end the search.  I should also note that I’ve modified my zsh prompts such that the matched text in control-R is highlighted, which has made it even more useful to me.

So, when was the last time I entered the full “ssh” command into a client’s server? I dunno, but it was a while ago… since the odds are that within the 10,000 most recent commands, I’ve got a mention of that client’s server.  And if I needed to pass specific options to ssh, such as a port number or a certificate file to get into AWS, that’ll be in the history, too.  By combining a huge history with control-R, you can basically write each command once, and then refer back to it many times.

Now the fact is that control-R isn’t really specific to the shell, per se.  In bash, it’s provided by a GNU library called “readline” that is used in a large number of programs; zsh has its own line editor, ZLE, which offers the same keystroke.  For example, readline (or something that behaves like it) is used in IPython, Pry, and the psql command-line client for PostgreSQL.  Everywhere I go, I can use control-R, and I do!  Each program saves its own history, so there’s no danger of mixing shell commands with PostgreSQL queries.

 

If you build it, they will come — but they might hate you

Several months ago, I was teaching an introductory Python course, and I happened to mention the fact that I use Git for all of my version-control needs.  I think that I would have gotten a more positive response if I had told them that my hobby is kicking puppies.

The reactions were roughly — and I’m not exaggerating here — something like, “What?  You use Git?!?  That so-called version control system whose main feature is eating our files?!?”   And I got this not just from one person, but from all 20-something people who were taking my Python course.  The more experience they had with Git, the more violently negative their reactions were.

I managed to calm them down a bit, and tried to tell them that Git is a wonderful system, except for one little problem, namely the fact that its interface is very hard to understand.  But, I promised them, once you understand how Git works, and once you start to work with it within the context of understanding what it’s doing, things start to make sense, and you can really enjoy and appreciate the system.

I should note that since that Python class, I’ve returned to the same company to give two day-long Git classes.  Based on the feedback I received, the Git class was very helpful, and I’m guessing that this is because I concentrated on what Git is really doing, and how the commands map to those actions.  I’m pretty sure that people from that class are starting to appreciate the power and flexibility of Git, rather than focusing only on their frustrations with it.

However, my experience working with and teaching Git has taught me a great deal about designing both software and UIs.  We love to say and think that excellent products with terrible marketing never get anywhere.  And in the commercial world, that might well be true. Everyone loves to quote the movie “Field of Dreams” (which I never really liked anyway), and how the main character builds a baseball field after repeatedly hearing, “If you build it, they will come.” As numerous other people have said, this is not the case for businesses: If you build it, they probably won’t come, unless you’ve invested time and money in marketing your product.

However, in the open-source world,  we expect to invest time in learning a technology, and are generally more technical folks in any event.  Thus, we tend to be more forgiving of bad UIs, focusing on features rather than design. It’s thus possible for something brilliant, efficient, flexible, and profoundly frustrating for new users to become popular. Git is a perfect example of this.

Now, I happen to think that Git is one of the most brilliant pieces of software I’ve ever seen. Really, it’s impressively designed.  However, the commands are counter-intuitive for many people who used other version-control systems, and it’s possible to get yourself into a situation from which an expert can extract himself or herself, but in which a novice is completely befuddled.  Once you understand how Git works (brilliantly described in this video), things start to make sense.  But getting to that point can take a great deal of time, and not everyone has that time.

In open source, then, “If you build it, they will come” might sometimes work.  However, even if they do come, and even if they use the software that you have written, you might end up in a particularly unenviable situation: People will use the software, but will hate you for the way in which you designed it.

The upshot, then, is that it’s worth taking a bit of time to think about your users, and how they will use your system.  It’s worth taking the time to create an interface (including commands) that will make sense for people.  Look at WordPress, for example: It packs in a great deal of functionality, but also pays attention to the UI… and as a result, has become a hugely dominant part of the Web ecosystem.

Sure, Git is famous and popular, and I’m one of its biggest fans, at least in terms of functionality. But if Linus had spent just a bit more time thinking about command names, or behaviors, I think that we would have had an equally powerful tool, but with fewer people in need of courses to understand why their files are getting trampled.

Good intentions, unexpected results: Mailing lists and DMARC

If there’s anything that software people know, it’s that changing one part of a program can result in a change in a seemingly unrelated part of the program.  That’s why automated testing is so powerful; it can show you when you have made a mistake that you not only didn’t intend, but that you didn’t expect.

If unexpected results can happen in a system that you control and supposedly understand, it’s not hard to imagine what happens when the results of your changes involve many pieces of software other than yours, running on computers other than yours, being used by customers who aren’t yours.

This would appear to be the situation with one of the latest anti-spam and security features for e-mail, known as DMARC.

I’m not intimately familiar with this standard, but I’ve seen enough other standards relating to e-mail in the past to know that anything having to do with e-mail will be frustrating for some of the people involved.  E-mail is in use by so many people, on so many computers, and by so many different programs, that you can’t possibly make changes without someone getting upset.  Nevertheless, the DMARC implementation and rollout by a number of large e-mail providers over the last few weeks has been causing trouble.

Let me explain: DMARC promises, to some degree, to reduce the amount of spam that we get by verifying that the sender’s e-mail address (in the “From” field) matches the server from which the e-mail was sent.  So if you get e-mail from me, with a “From” address of “reuven@lerner.co.il”, DMARC will verify that the e-mail was really sent from the lerner.co.il server.  To anyone who has received spam, or fake messages, or illegal “phishing” messages, this sounds like a great thing: No longer will you get messages from your friend with a hotmail.com address, asking for money now that they’re stranded in London.  It really, admirably aims to reduce the number of such messages.

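How? Very simply, by checking that the “From” address in the message matches the server from which the message was sent.  (In practice, the check builds on two older mechanisms, SPF and DKIM, and on a policy that the “From” domain publishes in DNS; that policy tells receivers what to do with a message that fails.)  If your DMARC-compliant server receives e-mail from “reuven@lerner.co.il”, but the server was some anonymous IP address in Mongolia, and the domain’s policy asks for rejection, your server will refuse to receive the e-mail message.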

So far, so good.  But of course, for every rule, there are exceptions.  Consider, for example, e-mail lists: When someone posts to a list, the “From” address is preserved, so that the message appears to be coming from the sender.  But in fact, the message isn’t coming from the sender.  Rather, it’s coming from the e-mail program running on a server.

For example, if I (reuven@lerner.co.il) send e-mail to a mailing list (list@example.com), the e-mail will really be coming from the example.com server.  But it’ll have a “From” address of reuven@lerner.co.il.  So now, if a receiver is using DMARC, they’ll see the discrepancy, and refuse to receive the e-mail message.
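The mailing-list problem is easy to see in a toy version of the check. The sketch below mirrors only the simplified description given here (compare the domain in the “From” address with the domain of the sending server); it is not how a real DMARC evaluation works, since that depends on SPF and DKIM results and on the published policy. Both function names are hypothetical.

```shell
#!/bin/sh
# from_domain ADDRESS: print everything after the last "@".
from_domain() {
    printf '%s\n' "${1##*@}"
}

# simplified_check FROM_ADDRESS SENDING_DOMAIN
# Toy model only: accept when the From domain equals the sending server's domain.
simplified_check() {
    if [ "$(from_domain "$1")" = "$2" ]; then
        echo accept
    else
        echo reject
    fi
}

simplified_check reuven@lerner.co.il lerner.co.il   # sent directly: accept
simplified_check reuven@lerner.co.il example.com    # relayed by the list: reject
```

The second call is the mailing-list case: the list server at example.com relays the message with the original “From” address intact, so the comparison fails.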

If lerner.co.il is using DMARC in the strictest way possible, then reuven@lerner.co.il sending to list@example.com will have especially unpleasant consequences: lerner.co.il will refuse to receive its own subscriber’s message to the list, because DMARC will show it to be a fake.  These refusals will count as a “bounce” on the mailing list, meaning a message that failed to get to the recipient’s inbox.  Enough such bounces, and everyone at lerner.co.il will be unsubscribed.

Yes, this means that if your e-mail provider uses DMARC, and if you subscribe to an e-mail list, then posting to such a list may result (eventually) in every other user of your provider being unsubscribed from the list!

I’ve witnessed this myself over the last few weeks, as members of a large e-mail list I maintain for residents of my city have slowly but surely been unsubscribed.  Simply put, any time that a Hotmail, Yahoo, or AOL user posts to the list for Modi’in residents, all of these companies (and perhaps more) refuse the message.  This refusal increases the number of bounces attributed to the users, and eventually results in mass automatic unsubscriptions.

As if that weren’t bad enough (and yes, it’s pretty bad), people who have been passively reading the e-mail list for years (i.e., not participating) are now getting cryptic messages from the list-management software, saying that they have been unsubscribed because of excessive bounces.  Most people have no idea what this means, which in turn leads to the list managers (such as me) having to explain intricate e-mail policy issues.

There are some solutions to this problem, of course.  But they’re all bad, so far as I can tell, and came without any serious warning or notification.  And when it comes to e-mail, you really don’t want to start rejecting messages en masse without warning.  The potential solutions are:

  1. Subscribers can receive the digest mode of the list, which is always “From” an address on the server.  If you get the digest, this problem won’t happen to you.  If you are a mailing-list subscriber, rather than a list administrator, this is really the only recourse that you have.
  2. The list managers can change the list such that instead of each message being “From” the individual, it’ll come from the list’s address.  I know that there are some people who say that this is the right behavior for e-mail lists, but I have long subscribed (so to speak) to the school of thought that you don’t want to change the “From” address.  (For more on this subject, you can read “reply-to considered harmful” and its associated messages.)
  3. Supposedly, Mailman (the list-management software that I use) now has some support for DMARC that might solve the problem.  But the more I learn about DMARC, the less I’m convinced that Mailman can do anything.

And by the way, it’s not just little guys like me who are suffering.  The IETF, which writes the standards that make the Internet work, recently discovered that their e-mail lists are failing, too.

E-mail lists are incredibly useful tools, used by many millions (and perhaps billions) of people around the world.  You really don’t want to mess with how they work unless there’s a very good reason to do so.  Yes, spam and fraud are big problems, and I welcome the chance to reduce them.

But really, would it have been so hard to contact all of the list-management software makers (how many can there be?) and work out some sort of deal?  Or at least get the message out to those of us running lists that this is going to happen?  I have personally spent many hours now researching this problem, and trying to find a solution for my list subscribers, with little or no success.

This all brings me back to my original point: The intentions here were good, and DMARC sounds like a good idea overall.  But it is affecting, in a very negative way, a very large number of people who are now suddenly, and to their surprise, cut off from their friends, colleagues, workplaces, and organizations.  The fact that AOL and other e-mail providers are saying, “Well, you’ll just need to reconfigure your list software,” without considering whether we want to do this, or whether e-mail lists really need to change after more than two decades (!) of working in a certain way, is rather surprising to me.  I’m not sure if there’s any way back, but I certainly hope that this is the last time such a drastic, negative solution is foisted on the public in this way.

Convention over confusion

One of the most celebrated phrases that has emerged from Ruby on Rails is “convention over configuration.” The basic idea is that software can traditionally be used in many different ways, and that we can customize it using configuration files. Over the years, configuration files for many types of software have become huge; installing software might be easy, but configuring it can be difficult. Moreover, given the option, everyone will configure software differently. This means that when you join a new project, you need to learn that project’s specific configuration and quirks.

“Convention over configuration” is the idea that we can make everyone’s lives easier if we agree to restrict our freedom. Ruby on Rails does this by telling you precisely what your directories will be named, and where they will be located. Rails tells you what to call your database tables, your class names, and even your filenames. The Ruby language, while generally quite open and flexible, also enforces certain conventions: Class and module names must begin with capital letters, for example.
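To give a feel for how mechanical these conventions are, here is a small sketch that derives the conventional model filename and table name from a class name. The helper names and the AdwordsCampaign class are made up for illustration, and the pluralization is deliberately naive (it just appends “s”); Rails’ own inflector also handles irregular plurals such as Person and people.

```shell
#!/bin/sh
# snake_case NAME: convert a CamelCase class name to snake_case.
snake_case() {
    printf '%s\n' "$1" \
        | sed -E 's/([a-z0-9])([A-Z])/\1_\2/g' \
        | tr '[:upper:]' '[:lower:]'
}

# model_file NAME: where Rails conventionally expects the model's source file.
model_file() {
    printf 'app/models/%s.rb\n' "$(snake_case "$1")"
}

# table_name NAME: conventional table name, using a naive "append s" plural.
table_name() {
    printf '%ss\n' "$(snake_case "$1")"
}

model_file AdwordsCampaign   # app/models/adwords_campaign.rb
table_name AdwordsCampaign   # adwords_campaigns
```

Because every project follows the same mapping, neither the filename nor the table name needs to be written down in a configuration file.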

It can take some time for developers to accept these conventions. Indeed, I was one of them: When I first started to work with Rails, I was somewhat offended to be told precisely what my database column names would be, especially when those names contradicted advice that I had heard and adopted years ago. (The advice was to prefix every column in a database table with the name of the table, which would make it more easily readable in joins.  Thus the primary key of the “People” table would be person_id, followed by person_first_name, person_last_name, and so forth.)  Over time, I have grown not only to use these Rails conventions, but to enjoy working with them; it turns out that people can change pretty easily, at least when it comes to these arbitrary decisions.

The real benefit of such conventions has nothing to do with my own work. Rather, it reduces the need for communication among people working on the same project. If everyone does it the same way, then there are fewer things to negotiate, and we can all concentrate on the real problems, rather than the ones which are relatively arbitrary.

Back in college, I was the editor of the student newspaper. We, like many newspapers, used the AP Stylebook to determine the style that we would use. The AP Stylebook was our bible; whatever it said, we did.  Of course, we also had our own local style, to cover things that AP didn’t, such as building names and numbers (e.g., we could refer to “Building 54”). In some cases, I personally disagreed with the AP Stylebook, especially when it came to the “Oxford comma.” But by keeping that rule, we were able to download articles from the Washington Post and LA Times, and stick them into our newspaper with minimal editing. Again, I prefer the serial comma, and use it in my personal writing. By adhering to a standard, I was able to ensure consistency in our writing, and reduce the workload of the (already hard-working) newspaper staff.

Twice in the last few weeks, I’ve been reminded of the benefits of convention over configuration — both times, when developers on projects I inherited decided to flout the rules. Their decisions weren’t wrong, but they were so wildly different from the conventions of Rails that they caused trouble, delays, and bugs.

The first case had to do with the Rails “asset pipeline,” a part of Rails which handles static assets such as JavaScript and CSS files. The idea is that you create a file called application.js, and that file then tells Rails about all of the JavaScript files used by your application. Before deploying a new version of your application, Rails combines all of these files into one big file, thus improving site performance (by reducing the number of files to download) and improving caching. The asset pipeline is a great idea, and it even works well — but in many cases, getting it to work correctly can be difficult and painful, particularly if you’re new to Rails.

So you can imagine my surprise when I looked for the application.js file, and didn’t find it.  That was bad enough, but the asset pipeline mechanism, as well as the deployment scripts I was developing, got rather confused by the absence of application.js. When I confronted the original developer about this, he told me that actually, he liked to call it something else entirely, reflecting the name of the application and client. Why? He didn’t really have a technical reason; it was all for reasons of aesthetics. The fact is that the rest of the Rails ecosystem expected application.js, though, so his decision meant that the rest of the software needed to be configured in a special, different way.

As a way of justifying his decision, the other developer told me, “Conventions shouldn’t be a boundary when developing.”  No, just the opposite — the idea is that conventions are there to limit you, to tell you to work in a way that everyone else works, so that things will be smoother.  In much of the world, we drive on the right side of the road.  This is utterly random; as numerous countries (e.g., England) have proven, you can drive on the other side of the road just fine — but only so long as everyone is doing it.  The moment everyone decides on their own conventions, big problems can occur.

When Biblical Hebrew wants to describe anarchy, it uses the phrase, “People did whatever was right in their own eyes.”

Something similar occurred with another project where I inherited code from someone else: One of my favorite things about Ruby on Rails is the fact that it runs the application in an “environment.”  The three standard environments are development (which is optimized for developer speed, not for execution speed), production (which is optimized for execution speed), and test (which is meant for testing). The environments aren’t meant to change the application logic, but rather the way in which the application behaves.  For example, I recently changed the way in which e-mail is sent to users of my dissertation software, the Modeling Commons. When I send the e-mail in the “production” environment, the e-mail is actually sent; when I do so within the “development” environment, the e-mail is opened in a browser, so that I can examine it.  This is standard and expected behavior; all Rails applications have development, production, and test environments, and some even have a “staging” environment, in which we prepare things.
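The distinction between behavior and logic can be sketched in a few lines. Rails picks its environment from the RAILS_ENV variable and falls back to “development” when it isn’t set; the function below, which is purely hypothetical and not taken from any real application, shows the kind of decision that legitimately belongs to an environment: how mail is delivered, not what the application does.

```shell
#!/bin/sh
# mail_delivery_for [ENV]
# Hypothetical mapping from a Rails environment name to mail behavior.
# Defaults to "development", as Rails does when RAILS_ENV is unset.
mail_delivery_for() {
    case "${1:-development}" in
        production)  echo "deliver to the recipient" ;;
        development) echo "open in a browser for inspection" ;;
        test)        echo "collect in memory for assertions" ;;
        *)           echo "unknown environment: $1" >&2; return 1 ;;
    esac
}

mail_delivery_for production
mail_delivery_for development
```

Note the final branch: anything outside the standard names is treated as an error here, which is roughly how standard tooling ends up feeling about an environment named after a client.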

My client’s software, which I inherited from someone else, decided to do something a bit different: The code was meant to be used on several different sites, each with slightly different logic.  The developer decided to use Rails environments in order to distinguish between the logical functions.  Thus, if you run the application under the “xyz” environment, you’ll get one logical path, and if you run the application under the “abc” environment, you’ll get another logical path.

It’s hard to describe the number of surprises and problems that this seemingly small decision has created: It means that we can’t really test the application using the normal Rails tools, because nothing will work correctly in the “test” environment. It means that the Phusion Passenger server that we installed to run the application needs an additional, special configuration parameter (not normally needed in production) to find the right database, and execute with the correct algorithms. It means that when you’re trying to trace through the logic of the application, you need to check the environment.

Basically, all of the things that you can assume about most Rails applications aren’t true in this one.

Now, the point of me writing this isn’t to say that I’m brilliant and that other developers are stupid (although it is true that Reuven’s First Law of Consulting states that a new consultant on a project must call his predecessor a moron).  Rather, it’s to point to the fact that conventions are there for a reason, and that if you insist on ignoring them, you’ll be steepening the learning curve that other developers face when they work on your application.  Now, if you have oodles of time and money, that’s just fine.  But as a general rule, a developer’s time is a software company’s greatest expense, and anything you can do to increase productivity, and decrease the need for explanations and communication, is worthwhile.

By the way, this is the whole reason why one of the Python mantras is, “There should be one-- and preferably only one --obvious way to do it”, a direct contrast with the Perl motto (also popular among Ruby developers), “There’s more than one way to do it.” Having a single, common way to do things makes everyone’s code more similar, more readable, and easier to understand. It doesn’t stop you from doing brilliant and interesting things, but does ask that you demonstrate your brilliance within the context of established practice.

Of course, this doesn’t mean that conventions are written in stone, or that they are unchangeable.  But if and when you ignore them, it should be for good reason.  Even if you’re right, think about whether you’re so right that it’s worth having multiple people learn your way of doing things, instead of the way that they’re used to doing them.

What do you think?  Have you seen these sorts of issues in your work?  Let me know!