Douglas McIlroy is the kind of person who notices when something is wrong with the way things are arranged. He arrived at Bell Telephone Laboratories in 1958 with a PhD from MIT in applied mathematics and interests that ran across computing, linguistics, music, and the theory of what programs were actually for.1 He was not the kind of engineer who built systems and moved on. He was the kind who thought about the principles behind systems, and found the wrong principles quietly intolerable. At Bell Labs, where he eventually ran the Computing Science Research Center, he was known as a demanding critic — the person who would ask, in seminars or over coffee, the question that exposed the fundamental problem in whatever was being presented. Colleagues remembered his corrections as gifts rather than attacks, which is not a common achievement.
In 1964, McIlroy wrote a three-page internal memo proposing that programs should connect the way garden hoses connect.2 The output of one program should flow naturally into the input of the next, in any combination, indefinitely. Each program should do one thing, do it well, accept a text stream as input, and produce a text stream as output. Nothing in the chain should need to know what came before it or what came after. The memo circulated through Bell Labs. It was admired. It was not implemented. Part of the problem was that the environment it required did not yet exist: no Unix, no C, no shell in which to express the idea in practice. The proposal was ready. The infrastructure was not.
The infrastructure arrived slowly, from a failed project and a space travel game. Bell Labs had been part of Multics, an enormously ambitious time-sharing system designed to serve thousands of users simultaneously, a joint venture with MIT and General Electric that consumed years of effort without producing, in Bell Labs' judgment, a system worth the cost.3 Bell Labs withdrew in 1969. Ken Thompson, who had joined the lab in 1966 fresh from Berkeley, had been using Multics to run a game he had written — a simulation of the solar system called Space Travel — and found himself without a machine to run it on. He located a discarded PDP-7 in a corner of Building 2, nobody claiming it, and spent a summer writing a stripped-down operating system small enough to run on it. Dennis Ritchie joined the project. The operating system had no name yet, and no official support.
What made this possible, and what made Bell Labs singular, was a funding structure that no longer exists anywhere.4 AT&T's regulated telephone monopoly generated revenues that could be ploughed into basic research with no obligation to produce products. The result was an institution that housed Nobel Prize winners alongside hackers and gave both the same instruction: do interesting work. There were no quarterly targets. There were no product roadmaps. There was a culture of open doors and shared results, of problems posted on whiteboards and left for whoever wanted to engage with them. Thompson and Ritchie built Unix inside this culture — McIlroy was their department head, continuing to advocate for pipes, and the whole enterprise operated in a register of serious play that a corporate environment would have killed immediately.
In the autumn of 1973, Thompson implemented pipes overnight.5 He added the pipe operator to the Unix shell — a single character, |, placed between two commands — and by morning the operator worked. For the first time, programs that had never been designed to speak to each other began to speak. McIlroy tested the implementation immediately, connecting tools in combinations that their authors had not anticipated: programs that had existed for months were suddenly capable of things nobody had written them to do, simply because they could now be assembled into sequences. The point was not any particular combination. The point was that the combinations were unlimited. McIlroy has described this as among the most satisfying moments of his career at Bell Labs.6
The three-sentence formulation of the philosophy appeared five years later, in 1978, in McIlroy's foreword to a special issue of the Bell System Technical Journal devoted to Unix: "Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface."7 It was a description of something that already existed in practice — Unix had embodied all three principles from the beginning — but writing it down changed its status from habit to argument. Brian Kernighan and P.J. Plauger had already begun codifying similar ideas in Software Tools in 1976, describing how to build programs that composed cleanly with other programs. The philosophy was becoming portable: separable from Bell Labs, from Unix, from Thompson and Ritchie, transferable to anyone who wanted to think about software the same way.
The argument was tested most vividly in 1986. Jon Bentley, writing his "Programming Pearls" column for Communications of the ACM, posed a word-frequency problem to several programmers: given a text file, list the most common words in descending order of frequency.8 Donald Knuth — author of The Art of Computer Programming, one of the most rigorous formal programmers of the century — produced a ten-page literate program in Pascal built around a custom data structure, a hash trie. It was, by any measure, a beautifully constructed solution: carefully documented, algorithmically sophisticated, written to be read as much as executed. McIlroy's response ran to several paragraphs of genuine admiration for Knuth's solution, followed by the observation that the Unix pipeline reproduced the result in six commands.9 He was not arguing that Knuth's program was wrong. He was arguing that the question Knuth had answered — how to write a maximally efficient, elegantly documented, self-contained program for a specific task — was a different question from the one a Unix programmer would ask: what is the simplest correct arrangement of existing tools that does this? The gap between those two questions was, in McIlroy's view, the gap between two entirely different theories of what programming was for. He had been making that argument since 1964. He was still making it. He had merely acquired a better example.
- On McIlroy's background and reputation at Bell Labs, see Kernighan, UNIX: A History and a Memoir (2019), ch. 2; and Seibel, Coders at Work (2009), pp. 185–214 (the McIlroy interview). ↑
- The 1964 memo is documented in Salus, A Quarter Century of Unix (1994), pp. 9–10. The "garden hose" metaphor is McIlroy's own. The memo itself has not been published in full, but its content is described consistently across multiple histories and in McIlroy's own accounts. ↑
- On Multics and Bell Labs' withdrawal, see Ritchie, Dennis M. (1984). "The Evolution of the Unix Time-Sharing System." AT&T Bell Laboratories Technical Journal, 63(6), 1577–1593. Also Salus, ch. 1. ↑
- The Bell Labs funding model and its consequences are the central argument of Gertner, The Idea Factory (2012). See especially ch. 1 and the epilogue. The AT&T consent decree of 1956 required Bell Labs to license its patents and directed research spending; the 1982 breakup of AT&T ended the model entirely. ↑
- The timing of Thompson's pipe implementation — "overnight" is the standard account — is attested in Salus, pp. 33–34, and in Kernighan's memoir. The exact date in autumn 1973 is not precisely documented. ↑
- McIlroy's characterisation of the pipe's first test is drawn from his interview in Seibel, Coders at Work, p. 192, and from remarks reproduced in Salus, p. 34. ↑
- McIlroy, M.D., E.N. Pinson, and B.A. Tague (1978). "Unix Time-Sharing System: Foreword." The Bell System Technical Journal, 57(6), 1902–1903. The three sentences appear on p. 1902. ↑
- Bentley, Jon (1986). "Programming Pearls: A Literate Program." Communications of the ACM, 29(5), 364–369. The column includes Knuth's full solution, McIlroy's review, and McIlroy's pipeline. All three are worth reading together. ↑
- McIlroy's pipeline as published in Bentley (1986): tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed ${1}q. The final stage as printed was sed ${1}q, which prints the first $1 lines of its input; head, its fixed-count equivalent, is the form used in the walkthrough that follows. His commentary devotes most of its space to Knuth's program; the pipeline itself is introduced almost as an afterthought. ↑
The first command handles one question only: what counts as a word? tr translates characters. The flags say: operate on everything that is not a letter (-c complements the set A-Za-z), replace each such character — spaces, punctuation, digits — with a line break, and squeeze the repeats (-s), so that a run of punctuation and blanks produces one newline rather than several. The output is one word per line. The question of what a word is has been answered once, here, and will not be revisited.
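In a modern shell, that first stage looks like this (the sample sentence is my own, not drawn from Bentley's column):

```shell
# Replace every run of non-letters with a single newline.
# -c complements the set A-Za-z; -s squeezes the repeated newlines.
printf 'To be, or not to be: that is the question.' |
  tr -cs 'A-Za-z' '\n'
```

The result is ten lines, one word each — To, be, or, not, to, be, that, is, the, question — with To still capitalised, which is exactly the problem the next stage exists to solve.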
Time, time, and TIME are the same word. The second command handles one question only: does capitalisation matter? It translates every uppercase letter to its lowercase equivalent and passes the stream through unchanged otherwise. One question, one command. Nothing is decided that does not need to be decided here.
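The case-folding stage, in isolation:

```shell
# Fold uppercase to lowercase so Time, time, and TIME count as one word.
printf 'Time time TIME\n' | tr 'A-Z' 'a-z'
# → time time time
```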
sort has no flags. It takes a stream of lowercase words and arranges them alphabetically. It does not know what came before it in the pipeline and does not know what comes next. It sorts. That is all. But sorting is precisely what the next command needs — because the next command can only count things that are already adjacent.
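Sorting a small stream of words makes the grouping visible:

```shell
# Alphabetical order puts identical words next to each other.
printf 'to\nbe\nor\nnot\nto\nbe\n' | sort
# → be, be, not, or, to, to — one per line
```

The duplicates are now adjacent, which is the only property the next command relies on.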
uniq removes consecutive duplicates. The -c flag prefixes each surviving line with a count of how many times it appeared. Because sort has already grouped identical words together, uniq -c effectively counts every word in the original text. The stream is now a frequency table: a number, then a word. This is the transformation the whole pipeline was built toward — accomplished by one short command and a single flag.
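On a sorted stream, the counting step looks like this (the exact column spacing of uniq -c varies by implementation):

```shell
# Count adjacent duplicates; the stream becomes a frequency table.
printf 'be\nbe\nnot\nor\nto\nto\n' | uniq -c
```

Each output line is a count followed by a word: 2 be, 1 not, 1 or, 2 to.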
Sort again — but differently. -r reverses the order, largest first. -n sorts numerically rather than alphabetically, so 10 comes after 9 rather than before 2. The most frequent words rise to the top. The same command as before; a different question; a different result. sort does not know it is being reused. It does not need to.
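Re-sorting a small frequency table numerically and in reverse:

```shell
# -n compares the leading counts as numbers; -r puts the largest first.
printf '1 question\n3 the\n2 be\n' | sort -rn
# → 3 the / 2 be / 1 question
```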
head prints the first ten lines of whatever stream it receives. It does not know what word frequency is. It does not know it is the last command. It prints the beginning of a list. The answer — the ten most common words in any text you care to feed in — falls out. No program was compiled. No variable was declared. Nothing was stored. The text entered, was transformed at each stage, and emerged as something else.
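The whole pipeline, end to end, on a sample sentence of my own:

```shell
# Words in, frequency table out; nothing compiled, nothing stored.
printf 'To be, or not to be: that is the question.' |
  tr -cs 'A-Za-z' '\n' | tr 'A-Z' 'a-z' |
  sort | uniq -c | sort -rn | head
```

to and be each appear twice and rise to the top; the six remaining words follow with a count of one. The order within ties is implementation-dependent, which is itself a small lesson: the pipeline promises frequency order, and nothing more.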