What Is a Hash Function? A Complete Beginner's Guide

Contents

What is a hash function?
The four core properties of a hash function
How is a hash value computed?
Differences between MD5 and the SHA family
Real-world uses of hash functions
Which algorithm should you choose?

What is a hash function?

A hash function is a mathematical operation that turns data of any length—a sentence, a document, a video—into a fixed-length string. That string is called the "hash value," "digest," or "fingerprint." Whether you feed in a single character or a 100 GB movie, the output is always the same length.

For example, run the word "hello" through the SHA-256 algorithm and you get:

2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

This string of 64 hexadecimal characters is the SHA-256 hash of "hello." As long as the input is exactly the same, anyone on any computer will compute the identical result; but change a single letter and the entire hash becomes completely different.
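To reproduce the value above yourself, here is a minimal sketch using Python's standard hashlib module. It assumes the word is encoded as UTF-8 before hashing; the variable name digest is only illustrative.

```python
import hashlib

# Hash functions operate on bytes, so the text is encoded first (UTF-8 assumed).
digest = hashlib.sha256("hello".encode("utf-8")).hexdigest()

print(digest)       # 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
print(len(digest))  # 64
```

Changing the input to "Hello" or "hello " (with a trailing space) gives an unrelated 64-character string.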

Many people confuse hashing with encryption, but they are two different things. Encryption exists so that the original can later be recovered with the right key, so it is reversible. Hashing is deliberately designed as a one-way process: easy to compute forward, computationally infeasible to reverse. The properties section below explains why, and you can also read the dedicated article on hashing vs encryption.

The four core properties of a hash function

A proper hash function must possess all four of the following properties at once. Understand these four points and you understand the essence of hashing.

1. Fixed-length output

No matter how large the input, the output length is always fixed. That length is determined by the algorithm and is unrelated to how much content you feed in:

MD5 outputs 128 bits, equal to 32 hexadecimal characters.
SHA-1 outputs 160 bits, equal to 40 characters.
SHA-256 outputs 256 bits, equal to 64 characters.
SHA-512 outputs 512 bits, equal to 128 characters.

Since every 4 bits corresponds neatly to one hexadecimal character, the number of bits divided by 4 gives the number of characters. This property makes hash values ideal as indexes or as a basis for comparison—the length is predictable, and they are easy to store and transmit.
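The bits-to-characters rule can be checked directly. The sketch below uses Python's hashlib to hash a 1-byte input and a 1,000,000-byte input with each algorithm; the sizes dictionary is just an illustrative name, and it assumes your Python build has all four algorithms enabled.

```python
import hashlib

sizes = {}
for name in ("md5", "sha1", "sha256", "sha512"):
    short = hashlib.new(name, b"a")
    long_ = hashlib.new(name, b"a" * 1_000_000)
    # Digest length depends only on the algorithm, not on the input size.
    assert len(short.hexdigest()) == len(long_.hexdigest())
    bits = short.digest_size * 8      # digest_size is reported in bytes
    sizes[name] = (bits, bits // 4)   # 4 bits per hexadecimal character
    print(f"{name}: {bits} bits = {len(short.hexdigest())} hex characters")
```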

2. One-way and irreversible

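Computing a hash from an input is easy, but working backward from the hash to the original input is computationally infeasible. This isn't because the data is hidden away; it's because hashing "compresses and destroys" the original information: packing data of any size into a fixed length necessarily discards information, so many different inputs share each hash value and there is nothing to restore from. The only general way to "reverse" a hash is to guess inputs and hash each guess until one matches. This is also why hashing is well suited to protecting passwords: even if a database leaks, the attacker obtains hash values rather than the passwords themselves. Weak passwords can still be found by guessing, though, which is why password storage needs the extra measures described later in this guide.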
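Because there is no inverse function to call, the generic way to attack a hash is to hash candidate inputs and compare. The following is a minimal sketch of that idea, not a real cracking tool; the function names and the tiny candidate list are made up for illustration.

```python
import hashlib
from typing import Iterable, Optional

def sha256_hex(text: str) -> str:
    """Return the SHA-256 hex digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def guess_input(target_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash each candidate and return the one that matches, or None."""
    for candidate in candidates:
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None

common_words = ["password", "123456", "hello"]  # hypothetical guess list

# A predictable input is found because it happens to be on the list.
found = guess_input(sha256_hex("hello"), common_words)

# An input that is not on the list cannot be recovered this way.
missed = guess_input(sha256_hex("a long, unguessable passphrase"), common_words)

print(found)   # hello
print(missed)  # None
```

The hash itself gives no hint about how close a guess is, so the cost of this search grows with the number of possible inputs.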

3. The avalanche effect

The slightest change to the input produces a dramatic change in the output: on average, about half of the output bits flip. This phenomenon is called the "avalanche effect." For example:

The MD5 of "cat" is d077f244def8a70e5ea758bd8352fcd8
The MD5 of "Cat" is fa3ebd6742c360b2d9652b7f78d9bd7d

Merely capitalizing the first letter yields two hashes with no resemblance whatsoever. The avalanche effect makes it impossible to guess that "two inputs are somewhat similar" from "two hashes look somewhat similar," which is crucial for security.
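One way to see the avalanche effect in numbers is to count how many of the 128 output bits differ between the two MD5 digests. This is a small sketch using hashlib; the helper differing_bits is a name chosen here, not a library function.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

d1 = hashlib.md5(b"cat").digest()
d2 = hashlib.md5(b"Cat").digest()

changed = differing_bits(d1, d2)
print(f"{changed} of {len(d1) * 8} bits differ")
```

For a well-behaved hash function the count should land somewhere near 64, that is, roughly half of the 128 bits.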

4. Determinism

The same input always yields the same output. A hash function has no random component—compute it today or next year and the result is identical. This is precisely why you can use a hash value to verify whether a file has been altered—rehash a downloaded file and compare it against the officially published value; if they match, the file is intact.

Tip: The avalanche effect combined with determinism is the foundation of file verification. Tamper with even a single byte and the hash changes completely, while an untouched file always produces the same value.
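File verification as described here could look roughly like the sketch below. It reads the file in chunks so that large downloads do not have to fit in memory. The function names, the chunk size, and the file name in the usage note are assumptions for illustration.

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    """Compare a file's hash with the value published by its provider."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

Usage would be something like matches_published("installer.iso", "<value copied from the download page>"), where both arguments are placeholders. A result of False means the file differs from the published one by at least one byte.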

How is a hash value computed?

You don't need to understand the underlying math to use a hash, but knowing the general flow helps build intuition. Taking SHA-256 as an example, the computation breaks roughly into several stages:

Padding: the input is extended with a single 1 bit, then zero bits, then a 64-bit field recording the original length, so that the total length is an exact multiple of 512 bits.
Chunking: the padded data is split into 512-bit blocks.
Compression loop: starting from a set of fixed initial values, each block goes through 64 rounds of bitwise operations (rotations, shifts, XOR, AND, and addition modulo 2^32), and the result of each block is mixed into the state carried over from the previous one.
Output: once all blocks are processed, the eight 32-bit words of internal state are concatenated into the final 256-bit hash value.
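The first two stages are simple enough to sketch in a few lines. The code below follows the padding rule in the SHA-2 specification (a 0x80 byte, zero bytes, then the message length in bits as an 8-byte big-endian integer) and works on whole bytes only. It deliberately stops before the compression loop, and the function names are chosen here for illustration.

```python
from typing import List

BLOCK_BYTES = 64  # 512 bits

def sha256_pad(message: bytes) -> bytes:
    """Pad a message to a multiple of 64 bytes the way SHA-256 does."""
    bit_length = len(message) * 8
    padded = message + b"\x80"  # a single 1 bit followed by seven 0 bits
    # Add zero bytes until 8 bytes remain free at the end of the last block.
    padded += b"\x00" * ((56 - len(padded) % BLOCK_BYTES) % BLOCK_BYTES)
    padded += bit_length.to_bytes(8, "big")  # original length in bits
    return padded

def split_blocks(padded: bytes) -> List[bytes]:
    """Split padded data into 512-bit (64-byte) blocks."""
    return [padded[i:i + BLOCK_BYTES] for i in range(0, len(padded), BLOCK_BYTES)]

blocks = split_blocks(sha256_pad(b"hello"))
print(len(blocks), len(blocks[0]))  # 1 64
```

A 56-byte message no longer leaves room for the length field in the first block, so it pads out to two blocks.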

The key is that these operations are carefully designed so the process is easy to compute forward yet nearly impossible to reverse, while amplifying any tiny difference in the input—this is what delivers the one-way property and the avalanche effect described above.

Differences between MD5 and the SHA family

The hash algorithms in common use fall into a few generations, each with a different security level and purpose:

Algorithm	Output length	Characters	Status
MD5	128 bits	32	Broken, for non-security checks only
SHA-1	160 bits	40	Broken, being phased out
SHA-256	256 bits	64	Mainstream, currently secure
SHA-512	512 bits	128	Secure, faster on 64-bit systems

MD5

Published in 1992 as RFC 1321, MD5 was once the most widely used hash algorithm. But researchers demonstrated practical "collisions" (two different pieces of data producing the same MD5 value) as early as 2004, and today an ordinary computer can construct a collision in seconds. MD5 must therefore never be used for digital signatures, passwords, or any tamper-resistance purpose; today it is only suitable for fast verification where there is no malicious adversary, such as checking whether a file transfer was accidentally corrupted.

SHA-1

Designed by the U.S. NSA and published in 1995, SHA-1 is more secure than MD5 and was long used in TLS certificates and Git. But in 2017 researchers at CWI Amsterdam and Google published a practical collision attack named SHAttered, proving that SHA-1 is no longer collision-resistant either. Major browsers and certificate authorities have all stopped trusting SHA-1 certificate signatures.

SHA-256 and SHA-512 (the SHA-2 family)

SHA-2 is the current workhorse, and SHA-256 and SHA-512 are its two most commonly used members, differing mainly in output length and the bit width of their internal operations. SHA-256 is the most common choice today, widely used in TLS, software signing, blockchain, and more; SHA-512 offers higher security strength and may even be faster than SHA-256 on 64-bit processors. As of now, the SHA-2 family has no known practical collision attacks.

Real-world uses of hash functions

Hashing appears in every corner of information systems:

File integrity verification: software websites often publish a file's SHA-256 value so you can compare it after downloading and confirm the file hasn't been tampered with or corrupted. See How to Verify File Integrity with Checksums.
Password storage: websites don't store your plaintext password—they store its hash, and compare hashes at login. That said, password hashing has its own requirements: it must be salted and use a slow algorithm. See Hash Algorithm Security Comparison.
Digital signatures and certificates: before signing, the content is hashed, and the hash value is signed, ensuring the content hasn't been altered.
Data deduplication and indexing: using the hash as a key lets you quickly determine whether two pieces of data are identical.
Blockchain: each block contains the hash of the previous one, chaining them together so any tampering is immediately detected.
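The chaining idea in the last item can be illustrated with a toy example. This is only a sketch of hash linking, not how any particular blockchain is implemented; the block layout, the all-zero starting value, and the function names are assumptions made for the example.

```python
import hashlib

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(prev_hash: str, data: str) -> str:
    """Hash a block's contents together with the previous block's hash."""
    return hashlib.sha256((prev_hash + data).encode("utf-8")).hexdigest()

def build_chain(records):
    """Build a list of blocks, each storing the hash of the one before it."""
    chain, prev = [], GENESIS_PREV
    for data in records:
        h = block_hash(prev, data)
        chain.append({"prev_hash": prev, "data": data, "hash": h})
        prev = h
    return chain

def is_valid(chain) -> bool:
    """Recompute every hash and check that each link still matches."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        if block["hash"] != block_hash(prev, block["data"]):
            return False
        prev = block["hash"]
    return True

chain = build_chain(["alice pays bob 5", "bob pays carol 2", "carol pays dave 1"])
print(is_valid(chain))  # True

chain[1]["data"] = "bob pays carol 200"  # tamper with the middle block
print(is_valid(chain))  # False
```

Even if the tampered block's own hash were recomputed to hide the edit, the next block would still hold the old value in prev_hash, so the check would fail there instead.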

Which algorithm should you choose?

The guiding principle is actually straightforward:

Just checking whether a file transferred correctly, with no malicious adversary: MD5 or SHA-256 both work; MD5 is faster.
Need to prevent deliberate tampering (digital signatures, anti-forgery): use SHA-256, not MD5 or SHA-1.
Storing user passwords: don't use a general-purpose hash function—use bcrypt or Argon2, which are designed for passwords.
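As a rough illustration of what "salted and slow" means for password storage, here is a sketch using PBKDF2 from Python's standard library. Note the substitution: bcrypt and Argon2, which this guide recommends, require third-party packages, so PBKDF2 stands in here only to keep the example self-contained. The iteration count and function names are illustrative assumptions, not recommendations.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # illustrative work factor; higher means slower guessing

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key); a fresh random salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)

salt, stored_key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored_key))  # True
print(verify_password("wrong guess", salt, stored_key))                   # False
```

The random salt means two users with the same password get different stored values, and the high iteration count makes each guess in a guessing attack far more expensive than a single SHA-256 call.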

Want to try it out right now? Open the tool and enter any text to get MD5, SHA-1, SHA-256, and SHA-512 results all at once, or upload a file to compute its checksum.

Use the Hash Generator now
