Why Is E a Single Dot? Morse Code Length and English Letter Frequency

The Type Cases at the Print Shop

The smartest design decision in Morse code is that the more frequent the letter, the shorter its code. The story behind it is famous: to estimate how often each letter was used, Samuel Morse's partner Alfred Vail visited the local newspaper office in Morristown, New Jersey, and counted the pieces of movable type in the printers' cases. Print shops stocked more type for common letters and less for rare ones, so the cases amounted to a ready-made frequency table. The Library of Congress's historical essay on the telegraph records this early exercise in corpus linguistics by type case.

The principle survives in the International Morse code used today, a later revision of Vail's original: E, the most common letter, is a single dot (.); T, the runner-up, is a single dash (-); rare Q (--.-) and J (.---) stretch to four elements.

What the Data Says: Frequency vs. Code Length

Modern corpora confirm Vail's type-case survey was remarkably accurate. Peter Norvig, Google's director of research, computed English letter frequencies over trillions of characters from the Google Books corpus. Set those against each letter's Morse duration (dot = 1 unit, dash = 3, gap between elements = 1):

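| Letter | Corpus frequency | Code | Units |
|--------|------------------|------|-------|
| E      | 12.49%           | .    | 1     |
| T      | 9.28%            | -    | 3     |
| A      | 8.04%            | .-   | 5     |
| O      | 7.64%            | ---  | 11    |
| Q      | 0.12%            | --.- | 13    |
| J      | 0.16%            | .--- | 13    |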
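The Units column follows directly from the timing rule stated above (dot = 1, dash = 3, gap between elements = 1). A minimal sketch of that arithmetic in Python, where the function name `morse_units` is just an illustrative choice:

```python
# Timing rule from the text: dot = 1 unit, dash = 3 units,
# plus a 1-unit gap between consecutive elements of the same letter.
ELEMENT_UNITS = {".": 1, "-": 3}


def morse_units(code: str) -> int:
    """Return the duration of one letter's code in dot-units."""
    if not code:
        raise ValueError("code must contain at least one element")
    elements = sum(ELEMENT_UNITS[symbol] for symbol in code)
    gaps = len(code) - 1  # one gap between each pair of neighbouring elements
    return elements + gaps


# The six letters from the table.
for letter, code in [("E", "."), ("T", "-"), ("A", ".-"),
                     ("O", "---"), ("Q", "--.-"), ("J", ".---")]:
    print(letter, code, morse_units(code))
```

For O, for example, three dashes contribute 9 units and the two gaps between them add 2, giving 11.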

Frequency and length are roughly inversely related. The mapping is not perfect: O (11 units) is far longer than its frequency deserves, more than twice as long as the less frequent N (-., 5 units), for instance. But for the tools and knowledge of 1838, it is an impressively good approximation of an optimal code.
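One way to put a number on "roughly inversely related" is to compare the plain average letter duration with the average weighted by how often each letter occurs. The sketch below does that for the 26 letters of International Morse. The frequencies beyond the six shown in the table are quoted from memory of Norvig's page and rounded to two decimals, so treat them as an assumption and check them against the reference before relying on the result.

```python
# International Morse code for A-Z.
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
}

# Letter frequencies in percent (assumed values, see the note above).
FREQ_PERCENT = {
    "E": 12.49, "T": 9.28, "A": 8.04, "O": 7.64, "I": 7.57, "N": 7.23,
    "S": 6.51, "R": 6.28, "H": 5.05, "L": 4.07, "D": 3.82, "C": 3.34,
    "U": 2.73, "M": 2.51, "F": 2.40, "P": 2.14, "G": 1.87, "W": 1.68,
    "Y": 1.66, "B": 1.48, "V": 1.05, "K": 0.54, "X": 0.23, "J": 0.16,
    "Q": 0.12, "Z": 0.09,
}


def morse_units(code: str) -> int:
    """Dot = 1, dash = 3, and a 1-unit gap between elements."""
    return sum(1 if symbol == "." else 3 for symbol in code) + len(code) - 1


def average_units(weights: dict[str, float]) -> float:
    """Average letter duration under the given per-letter weights."""
    total = sum(weights.values())
    return sum(w * morse_units(MORSE[ch]) for ch, w in weights.items()) / total


uniform = average_units({ch: 1.0 for ch in MORSE})  # every letter equally likely
weighted = average_units(FREQ_PERCENT)              # letters as they occur in text
print(f"unweighted: {uniform:.2f} units, frequency-weighted: {weighted:.2f} units")
```

Under these assumed frequencies the weighted average comes out at roughly 6 units against roughly 8 for the plain average, which is the saving the frequency-aware assignment buys. Gaps between letters and words are ignored here.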

A Preview of Information Theory

"Short codes for common symbols" is exactly the core idea information theory would formalize a century later. Claude Shannon's 1951 paper "Prediction and Entropy of Printed English" quantified the statistical structure and redundancy of English, proving how compressible text really is; Huffman's 1952 algorithm then constructed provably optimal variable-length codes — on the very principle Morse and Vail had used: rank by frequency, assign the short codes first. In a real sense, the telegraph wires of the 1840s ran a compression scheme that mathematics only caught up with a hundred years later.

What It Means for Learners

This design is good news if you are learning: master the high-frequency letters E, T, A, O, I, N first, and you can already recognize about half the characters in typical English text (roughly 52% by Norvig's counts). Open our Morse code translator, type an English sentence and press play, and listen for those fleeting short tones. Their rhythm is the echo of a type case counted 180 years ago.


References

  1. Peter Norvig, "English Letter Frequency Counts: Mayzner Revisited" — letter frequencies from the Google Books corpus.
    https://www.norvig.com/mayzner.html
  2. C. E. Shannon, "Prediction and Entropy of Printed English," Bell System Technical Journal, 1951 (PDF hosted by Princeton University).
    https://www.princeton.edu/~wbialek/rome/refs/shannon_51.pdf
  3. Library of Congress, "The Invention of the Telegraph" — historical account of Vail and the code design (accessed via the Internet Archive).
    https://web.archive.org/web/20250109153743/https://www.loc.gov/collections/samuel-morse-papers/articles-and-essays/invention-of-the-telegraph/