← All Articles

How Diff Algorithms Work: The Science of Comparing Text

March 2026 · 7 min read

Every time you review a Pull Request on GitHub, track changes in Word, or use any text comparison tool, a sophisticated diff algorithm is working behind the scenes. How does it find all the differences between two documents in mere milliseconds? Let's dive in.

What Is Diff?

Diff (difference comparison) is a technique for comparing two texts and identifying their differences. Its goal is to find the Minimum Edit Distance — the fewest operations (additions, deletions, modifications) needed to transform the original document into the new one.

A good diff algorithm doesn't just find differences — it produces a human-readable, intuitive difference report. The same two documents could yield multiple valid diff results; the algorithm aims to find the most meaningful one.

Longest Common Subsequence (LCS)

The core concept behind diff algorithms is the Longest Common Subsequence (LCS). Simply put, the LCS is the longest portion that appears in both documents in the same order.

Once the LCS is found, everything not in the LCS represents differences — elements in the original but not in the LCS are "deletions," and elements in the new document but not in the LCS are "additions."

Key Takeaway: LCS is the mathematical foundation of diff algorithms. Finding the longest common subsequence is equivalent to finding the minimal set of differences.

Myers Diff Algorithm

In 1986, Eugene W. Myers published a groundbreaking paper "An O(ND) Difference Algorithm and Its Variations," proposing what is now the most widely used diff algorithm.

Core Idea

The brilliance of Myers' algorithm lies in transforming the diff problem into a shortest-path problem in graph theory. It searches for the shortest path from the top-left to bottom-right corner of an Edit Graph:

The algorithm's goal is to find the path that maximizes diagonal movements (minimizing additions and deletions).

Time Complexity

Myers' algorithm has a time complexity of O(ND), where N is the total length of both documents and D is the number of differences. When differences are few (the most common case), the algorithm is extremely efficient.

AlgorithmTime ComplexityCharacteristics
Brute ForceO(2^N)Enumerates all possibilities, impractical
Dynamic Programming LCSO(N*M)Classic approach, high memory usage
Myers DiffO(ND)Extremely fast when differences are few
Patience DiffO(N log N)More human-friendly results

Hunt-McIlroy Algorithm

Before Myers, Hunt and McIlroy proposed the original Unix diff algorithm in 1976. Also based on LCS, it uses a different search strategy — first finding all matching lines, then selecting the longest common subsequence from them.

The Hunt-McIlroy algorithm has O(N log N) complexity where N is the number of matching lines. It's efficient when matches are few but may be slower than Myers when matches are abundant.

Patience Diff

Patience Diff is a variant of Myers that uses a "patience sorting" strategy. Its distinguishing feature is producing diff results that are more readable for humans. Git allows you to use git diff --patience to enable this algorithm.

The core idea is to first identify unique lines in both documents, use them as anchor points, then perform fine-grained comparison between anchors. This typically produces results that better match human intuition.

Line-Level vs Character-Level Comparison

Most diff tools default to line-level comparison, but sometimes you need to know exactly which characters changed within a line. Character-level diff pinpoints every character change.

Real-World Applications

Diff algorithms are ubiquitous in modern software development:

Our text diff tool is built on these algorithms, letting you quickly find differences between any two pieces of text.

Try the Text Diff Tool Now →

Conclusion

Diff algorithms may seem simple on the surface, but they involve deep computer science principles. From Hunt-McIlroy in 1976 to Myers in 1986 to modern variants, this field continues to evolve. Next time you see a clean diff report, you'll appreciate the elegant mathematics behind it.

References

  1. Myers, Eugene W. "An O(ND) Difference Algorithm and Its Variations." Algorithmica, 1(2), 1986, pp. 251-266. https://doi.org/10.1007/BF01840446
  2. Hunt, James W. and McIlroy, M. Douglas. "An Algorithm for Differential File Comparison." Bell Laboratories Computing Science Technical Report, No. 41, 1976.
  3. GNU Project. "GNU Diffutils Manual." Free Software Foundation, 2023. https://www.gnu.org/software/diffutils/manual/
  4. Bram Cohen. "Patience Diff Advantages." Codeville Blog, 2005. https://bramcohen.livejournal.com/73318.html