Douglas McIlroy was not the kind of engineer who built systems and moved on — he was the kind who thought about the principles behind systems, and found the wrong principles quietly intolerable. At Bell Labs, where he eventually ran the Computing Science Research Center, he was known as a demanding critic: the person who would ask, in seminars or over coffee, the question that exposed the fundamental problem in whatever was being presented. Colleagues remembered his corrections as gifts rather than attacks.
In 1964, he wrote a three-page internal memo proposing that programs should connect the way garden hoses connect.2 The output of one program should flow naturally into the input of the next, in any combination, indefinitely. Each program should do one thing, do it well, accept a text stream as input, and produce a text stream as output. The memo circulated through Bell Labs. It was admired. It was not implemented — the environment it required did not yet exist: no Unix, no shell in which to express the idea in practice.
The infrastructure arrived slowly, from a failed project. Bell Labs had been part of Multics, an ambitious time-sharing system that consumed years of effort and produced something nobody seemed to want; the Labs withdrew in 1969. Ken Thompson, stranded without a machine to run Space Travel, a solar-system game he had written, found a discarded PDP-7 in a corner of Building 2 and spent a summer writing a stripped-down operating system on it. Dennis Ritchie joined the project. There were no quarterly targets and no product roadmaps: Bell Labs ran on AT&T monopoly revenues directed by government decree toward basic research, which meant Thompson and Ritchie could build Unix in a register of serious play that a corporate environment would have killed immediately.
In the autumn of 1973, Thompson implemented pipes overnight.5 He added a single character — the pipe | — to the Unix shell, and by morning programs that had never been designed to speak to each other began to speak. McIlroy tested the implementation immediately, connecting tools in combinations their authors had not anticipated. The point was not any particular combination. The point was that the combinations were unlimited.
The three-sentence formulation appeared five years later, in McIlroy's foreword to a Bell System Technical Journal issue devoted to Unix: "Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface."7
The argument was tested most vividly in 1986. Jon Bentley posed a word-frequency problem to several programmers: given a text file, list the most common words in descending order of frequency.8 Donald Knuth produced a ten-page literate program in Pascal with a custom hash table — carefully documented, algorithmically sophisticated, written to be read as much as executed. McIlroy's response admired Knuth's solution, then observed that the Unix pipeline reproduced the result in six commands. He was not arguing that Knuth's program was wrong.
He was arguing that the question Knuth had answered — how to write a maximally efficient program for a specific task — was a different question from the one a Unix programmer would ask: what is the simplest correct arrangement of existing tools that does this? The gap between those two questions was, in McIlroy's view, the gap between two entirely different theories of what programming was for. He had been making that argument since 1964.
2. The 1964 memo is documented in Salus, A Quarter Century of Unix (1994), pp. 9–10. The "garden hose" metaphor is McIlroy's own.
5. The timing — "overnight" — is attested in Salus, pp. 33–34, and in Kernighan's memoir. The exact date in autumn 1973 is not precisely documented.
7. McIlroy, M.D., E.N. Pinson, and B.A. Tague (1978). "Unix Time-Sharing System: Foreword." The Bell System Technical Journal, 57(6), 1902–1903. The three sentences appear on p. 1902.
8. Bentley, Jon (1986). "Programming Pearls: A Literate Program." Communications of the ACM, 29(5), 364–369. Knuth's solution, McIlroy's review, and McIlroy's pipeline appear together. All three are worth reading.
The first command handles one question only: what counts as a word? tr translates characters. The -c flag complements the listed set, so the translation targets everything that is not a letter; each space, punctuation mark, or digit becomes a line break, and -s squeezes runs of those breaks into one. The output is one word per line. The question of what a word is has been answered once, here, and will not be revisited.
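A minimal sketch of this first stage; the sample sentence is invented for illustration:

```shell
# -c complements the set A-Za-z, so the translation targets every
# character that is NOT a letter; each is replaced with a newline,
# and -s squeezes runs of replacements into a single line break.
printf 'Time flies; time crawls.' | tr -cs 'A-Za-z' '\n'
# → Time
#   flies
#   time
#   crawls
```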
The second command handles one question only: does capitalisation matter? Time, time, and TIME should count as the same word, so the command translates every uppercase letter to its lowercase equivalent and passes the stream through unchanged otherwise. One question, one command. Nothing is decided that does not need to be decided here.
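The second stage, sketched on the same three variants of the word:

```shell
# Fold uppercase into lowercase; everything else passes through untouched.
printf 'Time\ntime\nTIME\n' | tr 'A-Z' 'a-z'
# → time
#   time
#   time
```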
sort has no flags. It takes a stream of lowercase words and arranges them alphabetically. It does not know what came before it in the pipeline and does not know what comes next. It sorts. That is all. But sorting is precisely what the next command needs — because the next command can only count things that are already adjacent.
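A sketch of the third stage, with an invented word list:

```shell
# Plain sort: identical words become adjacent, ready to be counted.
printf 'time\nflies\ntime\nan\ntime\n' | sort
# → an
#   flies
#   time
#   time
#   time
```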
uniq removes consecutive duplicates. The -c flag prefixes each surviving line with a count of how many times it appeared. Because sort has already grouped identical words together, uniq -c effectively counts every word in the original text. The stream is now a frequency table: a number, then a word. This is the transformation the whole pipeline was built toward, accomplished by a single flag on a single command.
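The fourth stage on the sorted list above. Note that the width of the count column varies between implementations, so only the count and the word are stable:

```shell
# Count runs of adjacent identical lines; the input must already be sorted.
# Produces counts of 1 for "an" and "flies", 3 for "time".
printf 'an\nflies\ntime\ntime\ntime\n' | uniq -c
```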
Sort again — but differently. -r reverses the order, largest first. -n sorts numerically rather than alphabetically, so 10 comes after 9 rather than before 2. The most frequent words rise to the top. The same command as before; a different question; a different result. sort does not know it is being reused. It does not need to.
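The fifth stage, sketched on an invented frequency table chosen to show why -n matters:

```shell
# -n compares the leading counts as numbers (so 10 > 9 > 2, not "10" < "2");
# -r puts the largest first.
printf '2 a\n10 b\n9 c\n' | sort -rn
# → 10 b
#   9 c
#   2 a
```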
head prints the first ten lines of whatever stream it receives. It does not know what word frequency is. It does not know it is the last command. It prints the beginning of a list. The answer — the ten most common words in any text you care to feed in — falls out. No program was compiled. No variable was declared. Nothing was stored. The text entered, was transformed at each stage, and emerged as something else.
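The six stages, assembled. The sample text is invented; McIlroy's published version parameterised the final stage with sed rather than using head's default of ten lines, but the shape is the same:

```shell
# Text in, frequency table out: "the" (3 occurrences) tops the list.
printf 'the cat sat on the mat and the cat slept\n' \
  | tr -cs 'A-Za-z' '\n' \
  | tr 'A-Z' 'a-z' \
  | sort \
  | uniq -c \
  | sort -rn \
  | head
```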