tr -cs A-Za-z '\n'
| tr A-Z a-z
| sort
| uniq -c
| sort -rn
| head
Shell Command  ·  1973 / 1986

Unix Pipes

Doug McIlroy & Ken Thompson · 1932— & 1943—
Bell Telephone Laboratories

Six commands, five pipes, no program to compile — and an argument about how software should be built, expressed by building software that way.

Lesson 8

In 1964, Douglas McIlroy wrote a three-page memo proposing that programs should connect the way garden hoses connect. The output of one should flow naturally into the input of the next, in any combination, indefinitely. Each program should do one thing and do it well. The memo circulated through Bell Labs. It was admired. It was not implemented — the infrastructure it required did not yet exist. Nine years later, Ken Thompson built that infrastructure overnight. He added a single character to the Unix shell — the pipe | — and by morning, programs that had never been designed to speak to each other began to speak.

tr -cs A-Za-z '\n'
Input — what is a word?

The first command handles one question only: what counts as a word? tr translates characters. Here the set is the letters A-Za-z. The -c flag complements that set, so tr acts on everything that is not a letter (spaces, punctuation, digits) and replaces each such character with a line break. The -s flag squeezes each run of line breaks into one. The output is one word per line. The question of what a word is has been answered once, here, and will not be revisited.
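A minimal sketch of this first stage on its own, assuming a made-up sample sentence (the variable names sample and words are illustrative, not part of the original pipeline):

```shell
# Hypothetical sample text; digits and punctuation act as separators.
sample='Time flies; time, 42 TIMES!'

# Every non-letter becomes a newline; -s collapses runs of newlines into one.
words=$(printf '%s\n' "$sample" | tr -cs A-Za-z '\n')

# Prints one word per line, capitalisation still intact.
printf '%s\n' "$words"
```

One consequence of this definition of a word: an apostrophe is not a letter, so a contraction such as "don't" comes out as two lines, "don" and "t".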

| tr A-Z a-z
Normalise

Time, time, and TIME are the same word. The second command handles one question only: does capitalisation matter? It translates every uppercase letter to its lowercase equivalent and passes the stream through unchanged otherwise. One question, one command. Nothing is decided that does not need to be decided here.
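The effect can be checked in isolation. This sketch assumes three hand-written input lines; lower and distinct are illustrative names:

```shell
# Three spellings of the same word, one per line.
lower=$(printf 'Time\ntime\nTIME\n' | tr A-Z a-z)
printf '%s\n' "$lower"

# After normalising, only one distinct line remains.
# tr -d ' ' removes the padding some wc implementations add.
distinct=$(printf '%s\n' "$lower" | sort -u | wc -l | tr -d ' ')
printf 'distinct words: %s\n' "$distinct"
```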

| sort
Sort

sort is given no flags here. It takes a stream of lowercase words and arranges them alphabetically. It does not know what came before it in the pipeline and does not know what comes next. It sorts. That is all. But sorting is precisely what the next command needs, because the next command can only count things that are already adjacent.

| uniq -c
Count

uniq removes consecutive duplicates. The -c flag prefixes each surviving line with a count of how many times it appeared. Because sort has already grouped identical words together, uniq -c effectively counts every word in the original text. The stream is now a frequency table: a number, then a word. This is the transformation the whole pipeline was built toward, accomplished by one command and one flag.
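The dependence on sort is easy to see by running uniq -c with and without it. This is a small sketch on an assumed three-line input; the awk step is an addition that only normalises the column padding, which differs between uniq implementations:

```shell
# Without sort, the two "b" lines are not adjacent, so uniq counts them separately.
unsorted=$(printf 'b\na\nb\n' | uniq -c | awk '{ print $1, $2 }')

# With sort, identical lines sit next to each other and are counted together.
sorted=$(printf 'b\na\nb\n' | sort | uniq -c | awk '{ print $1, $2 }')

printf 'without sort:\n%s\n' "$unsorted"
printf 'with sort:\n%s\n' "$sorted"
```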

| sort -rn
Rank

Sort again — but differently. -r reverses the order, largest first. -n sorts numerically rather than alphabetically, so 10 comes after 9 rather than before 2. The most frequent words rise to the top. The same command as before; a different question; a different result. sort does not know it is being reused. It does not need to.
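The difference -n makes can be sketched with three assumed numbers. paste is used here only to print each result on a single line:

```shell
# Reverse order, comparing character by character: "10" lands last because it starts with "1".
lexical=$(printf '9\n10\n2\n' | sort -r | paste -s -d ' ' -)

# Reverse order, comparing as numbers: 10 is the largest and comes first.
numeric=$(printf '9\n10\n2\n' | sort -rn | paste -s -d ' ' -)

printf 'sort -r : %s\n' "$lexical"
printf 'sort -rn: %s\n' "$numeric"
```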

| head
Cut

head prints the first ten lines of whatever stream it receives. It does not know what word frequency is. It does not know it is the last command. It prints the beginning of a list. The answer — the ten most common words in any text you care to feed in — falls out. No program was compiled. No variable was declared. Nothing was stored. The text entered, was transformed at each stage, and emerged as something else.
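To try the whole pipeline, it can be wrapped in a shell function that reads standard input. This is a sketch: the name top_words and the sample sentence are assumptions, and the trailing awk step is added only to trim the padding uniq -c puts in front of each count.

```shell
# The six commands from the lesson, unchanged, as a reusable filter.
top_words() {
  tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | head
}

# Hypothetical input; in practice this would be a file: top_words < book.txt
sample='The cat and the dog and the bird.'

result=$(printf '%s\n' "$sample" | top_words | awk '{ print $1, $2 }')
printf '%s\n' "$result"

# The two most frequent words; words with equal counts may appear in any order.
top_two=$(printf '%s\n' "$result" | head -n 2)
```

Because the function is itself a filter, it composes like any other stage: its output can be piped onward or its input can come from another command.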

Douglas McIlroy was not the kind of engineer who built systems and moved on — he was the kind who thought about the principles behind systems, and found the wrong principles quietly intolerable. At Bell Labs, where he eventually ran the Computing Science Research Center, he was known as a demanding critic: the person who would ask, in seminars or over coffee, the question that exposed the fundamental problem in whatever was being presented. Colleagues remembered his corrections as gifts rather than attacks.

In 1964, he wrote a three-page internal memo proposing that programs should connect the way garden hoses connect.[1] The output of one program should flow naturally into the input of the next, in any combination, indefinitely. Each program should do one thing, do it well, accept a text stream as input, and produce a text stream as output. The memo circulated through Bell Labs. It was admired. It was not implemented, because the environment it required did not yet exist: no Unix, no shell in which to express the idea in practice.

The infrastructure arrived slowly, from a failed project. Bell Labs had been part of Multics, an ambitious time-sharing system that consumed years of effort and produced something nobody seemed to want. Bell Labs withdrew in 1969. Ken Thompson, stranded without a machine to run a solar system game he had written, found a discarded PDP-7 machine in a corner of Building 2 and spent a summer writing a stripped-down operating system on it. Dennis Ritchie joined the project. There were no quarterly targets, no product roadmaps — Bell Labs ran on AT&T monopoly revenues directed by government decree toward basic research, which meant Thompson and Ritchie could build Unix in a register of serious play that a corporate environment would have killed immediately.

Brian Kernighan demonstrates Unix pipes at Bell Labs, from a 1982 internal AT&T documentary — the clearest filmed record of what pipes looked like in practice.

In the autumn of 1973, Thompson implemented pipes overnight.[2] He added a single character, the pipe |, to the Unix shell, and by morning programs that had never been designed to speak to each other began to speak. McIlroy tested the implementation immediately, connecting tools in combinations their authors had not anticipated. The point was not any particular combination. The point was that the combinations were unlimited.

The three-sentence formulation appeared five years later, in McIlroy's foreword to a Bell System Technical Journal issue devoted to Unix: "Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface."[3]

The argument was tested most vividly in 1986. Jon Bentley posed a word-frequency problem to several programmers: given a text file, list the most common words in descending order of frequency.[4] Donald Knuth produced a ten-page literate program in Pascal with a custom hash table: carefully documented, algorithmically sophisticated, written to be read as much as executed. McIlroy's response admired Knuth's solution, then observed that the Unix pipeline reproduced the result in six commands. He was not arguing that Knuth's program was wrong.

He was arguing that the question Knuth had answered — how to write a maximally efficient program for a specific task — was a different question from the one a Unix programmer would ask: what is the simplest correct arrangement of existing tools that does this? The gap between those two questions was, in McIlroy's view, the gap between two entirely different theories of what programming was for. He had been making that argument since 1964.

  1. The 1964 memo is documented in Salus, A Quarter Century of Unix (1994), pp. 9–10. The "garden hose" metaphor is McIlroy's own.
  2. The timing — "overnight" — is attested in Salus, pp. 33–34, and in Kernighan's memoir. The exact date in autumn 1973 is not precisely documented.
  3. McIlroy, M.D., E.N. Pinson, and B.A. Tague (1978). "Unix Time-Sharing System: Foreword." The Bell System Technical Journal, 57(6), 1902–1903. The three sentences appear on p. 1902.
  4. Bentley, Jon (1986). "Programming Pearls: A Literate Program." Communications of the ACM, 29(5), 364–369. Knuth's solution, McIlroy's review, and McIlroy's pipeline appear together. All three are worth reading.
Doug McIlroy & Ken Thompson
Conceived: McIlroy, 1964 Bell Labs internal memo
Implemented: Thompson, overnight October 1973
Runs on: Every Unix, Linux, macOS terminal
McIlroy also made: diff, spell, join, speak (all Bell Labs)
Thompson also made: grep, UTF-8 (with Pike), the Go language
The philosophy: "Do one thing well. Write programs to work together."
  1. Ritchie, Dennis M., and Ken Thompson (1974). 'The UNIX Time-Sharing System.' Communications of the ACM, 17(7), 365–375. — The original paper. Short, plainly written, still worth reading in full.
  2. McIlroy, M.D., E.N. Pinson, and B.A. Tague (1978). 'Unix Time-Sharing System: Foreword.' The Bell System Technical Journal, 57(6), 1902–1903. — Four paragraphs. Contains McIlroy's three-sentence statement of the Unix philosophy, the closest the field has to a founding document.
  3. Kernighan, Brian W., and P.J. Plauger (1976). Software Tools. Reading, MA: Addison-Wesley. — The book that codified the Unix philosophy in practice. The pipe is the central argument.
  4. Bentley, Jon (1986). 'Programming Pearls: A Literate Program.' Communications of the ACM, 29(5), 364–369. — The Knuth–McIlroy exchange. Knuth's ten pages and McIlroy's pipeline, side by side. Both are included in the column.
  5. Salus, Peter H. (1994). A Quarter Century of Unix. Reading, MA: Addison-Wesley. — The best history of Unix culture, based on interviews with the people who built it. Contains first-hand accounts of the pipe's invention.
  6. Kernighan, Brian W. (2019). UNIX: A History and a Memoir. Kindle Direct Publishing. — Kernighan's first-hand account of Bell Labs and the Unix years. Readable, personal, accurate on atmosphere and chronology.
  7. Gertner, Jon (2012). The Idea Factory: Bell Labs and the Great Age of American Innovation. New York: Penguin Press. — The definitive account of Bell Labs as an institution: its funding structure, its culture, why it produced what it produced, and why nothing like it exists today.
  8. Seibel, Peter (2009). Coders at Work: Reflections on the Craft of Programming. New York: Apress. — Includes long interviews with both McIlroy and Thompson. The closest thing to hearing them think aloud about what they were doing and why.

Take any process you know well — editing a text, cooking a meal, researching a topic — and write it as a pipeline. One step per line, separated by |. Each step must do exactly one thing.

You will find that most real processes are not clean pipelines. They loop back. They depend on earlier results. They require judgment that cannot be passed along a pipe. Write those moments down too: 'return to step 2', 'if unsatisfied, restart from step 1'. Notice where the linear flow breaks. That is where the interesting decisions live.

McIlroy spent nine years asking why software could not work the way he imagined. When Thompson built it in a night, the question collapsed into the answer. What McIlroy had understood was not a technical trick but a claim about complexity: that it should emerge from combination, not from complication.

Bring your pipeline to the session. We will ask: where did it break, and why?

Earlier
Jacquard Loom Punch Cards

Thompson cited the Jacquard loom when describing Unix's approach to text as a universal medium. Each punch card is a line of instruction; the machine runs through them in sequence. Pipes extend the logic: what one machine produces, the next consumes.

Parallel
Smalltalk — Alan Kay & Adele Goldberg

Same decade, same problem — how should software be organised — and the opposite answer. Where Unix says combine small things, Smalltalk says build one rich world. Both traditions are still active. They have never fully agreed.

Later
MapReduce — Dean & Ghemawat, Google

The industrial-scale pipeline. Map: transform each item. Reduce: aggregate results. The same logic as sort | uniq -c, applied to billions of documents across thousands of machines. Published 2004. Still running.
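The analogy can be sketched on a single machine in shell. This is an illustration of the idea only, not Google's implementation or API; the function names map_words, shuffle and reduce_counts are hypothetical, and the input is an assumed three-word line.

```shell
# Map: emit one "word<TAB>1" pair per word.
map_words() { tr -cs A-Za-z '\n' | tr A-Z a-z | awk 'NF { print $1 "\t" 1 }'; }

# Shuffle: bring pairs with the same key together (the role sort plays before uniq -c).
shuffle() { sort; }

# Reduce: sum the values for each run of identical keys.
reduce_counts() {
  awk -F '\t' '
    NR > 1 && $1 != key { print key "\t" sum; sum = 0 }
    { key = $1; sum += $2 }
    END { if (NR > 0) print key "\t" sum }
  '
}

counts=$(printf 'b a b\n' | map_words | shuffle | reduce_counts)
printf '%s\n' "$counts"
```

Here reduce_counts does by hand what uniq -c does in the lesson's pipeline: it relies on the previous stage having placed identical keys next to each other.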