Cryptographic Hash Function

Use a cryptographic hash function to verify the authenticity of data

A cryptographic hash function is an algorithm that can be run on data such as an individual file or a password to produce a value called a checksum.

The main use of a cryptographic hash function is to verify the authenticity of a piece of data. Two files can be assumed to be identical only if the checksums generated from each file, using the same cryptographic hash function, are identical.
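
As a minimal sketch of that comparison, the Python snippet below uses the standard library's hashlib module with SHA-256. The helper name checksum and the sample byte strings are made up for illustration.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Return the SHA-256 checksum of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

original = b"The quick brown fox"
copy = b"The quick brown fox"
altered = b"The quick brown fix"  # one letter changed

# Identical data gives identical checksums.
print(checksum(original) == checksum(copy))
# A one-letter change gives a different checksum.
print(checksum(original) == checksum(altered))
```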

Some commonly used cryptographic hash functions include MD5 and SHA-1, although many others also exist.

Cryptographic hash functions are often referred to as "hash functions," but that's not technically correct. A hash function is a generic term that encompasses cryptographic hash functions along with other sorts of algorithms like cyclic redundancy checks.

Cryptographic Hash Functions: A Use Case

Say you download the latest version of the Firefox browser. For some reason, you needed to download it from a site other than Mozilla's. Because it isn't being hosted on a site you've learned to trust, you'd like to make sure that the installation file you just downloaded is exactly the same as the one Mozilla offers.

Using a checksum calculator, you compute a checksum using a particular cryptographic hash function, such as SHA-2, and then compare that to the one published on Mozilla's site. If they're equal, you can be reasonably sure that the download you have is the one Mozilla intended you to have.
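
If you'd rather not rely on a checksum calculator, the same check can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, SHA-256 (a member of the SHA-2 family) is assumed as the algorithm, and the published checksum is whatever value the vendor's site lists.

```python
import hashlib

def file_checksum(path: str, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so a large installer need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path: str, published: str, algorithm: str = "sha256") -> bool:
    """Compare the computed checksum with the published one.

    Surrounding whitespace and letter case are ignored, since sites
    format hexadecimal checksums differently.
    """
    return file_checksum(path, algorithm) == published.strip().lower()
```

You would call matches_published with the path of the downloaded installer and the checksum copied from the vendor's page, and treat a False result as a reason not to run the file.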

Can Cryptographic Hash Functions Be Reversed?

Cryptographic hash functions are designed so that the checksums they create can't be reversed back to the original text. However, even though they are virtually impossible to reverse, they're not 100 percent guaranteed to safeguard data.

Hackers may use a rainbow table to figure out the plain text of a checksum.

Rainbow tables are dictionaries that list thousands, millions, or even billions of checksums alongside their corresponding plain text value.

While this isn't technically reversing the cryptographic hash algorithm, it might as well be, given that it's so simple to do. In reality, since no rainbow table can list every possible checksum in existence, they're usually only helpful for simple phrases like weak passwords.

Here's a simplified version of a rainbow table to show how one would work when using the SHA-1 cryptographic hash function:

Plaintext     SHA-1 Checksum
12345         8cb2237d0679ca88db6464eac60da96345513964
password1     e38ad214943daad1d64c102faec29de4afe9da3d
ilovemydog    a25fb3505406c9ac761c8428692fbf5d5ddf1316
Jenny400      7d5eb0173008fe55275d12e9629eef8bdb408c1f
dallas1984    c1ebe6d80f4c7c087ad29d2c0dc3e059fc919da2

A hacker must know which cryptographic hash algorithm was used to generate the checksums to figure out the values.
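
The lookup described above can be sketched as a plain Python dictionary that maps each checksum back to its plaintext. This mirrors the simplified table rather than a real rainbow table, which stores the same information more compactly using chains of hashes. The names sha1_hex, lookup, and recover are made up for this example.

```python
import hashlib

def sha1_hex(text: str) -> str:
    """Return the SHA-1 checksum of text as a hexadecimal string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# Precompute checksums for a (tiny) list of guessable passwords.
candidates = ["12345", "password1", "ilovemydog", "Jenny400", "dallas1984"]
lookup = {sha1_hex(p): p for p in candidates}

def recover(checksum: str):
    """Return the plaintext for a known checksum, or None if it isn't listed."""
    return lookup.get(checksum.lower())

print(recover("8cb2237d0679ca88db6464eac60da96345513964"))  # 12345
print(recover(sha1_hex("a much longer, unlisted passphrase")))  # None
```

The second call shows the limitation mentioned earlier: a checksum that was never added to the table can't be looked up.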

For added protection, some websites that store user passwords apply an additional function to the checksum after it's generated but before it's stored. This process produces a new value that only the web server understands and that doesn't match the original checksum.

For example, after a password is entered and the checksum generated, it may be separated into several parts and rearranged before it's stored in the password database, or certain characters might be swapped with others. When attempting to authenticate the next time the user signs on, the web server reverses this additional function, and the original checksum is generated again to verify that a user's password is valid.
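
To make the idea concrete, here is one hypothetical rearrangement of the kind described: the checksum is rotated by a secret number of characters and then reversed before storage. The SHIFT value and both function names are invented for this sketch and aren't taken from any real website.

```python
import hashlib

SHIFT = 7  # hypothetical secret known only to the web server

def scramble(checksum: str) -> str:
    """Rotate the checksum left by SHIFT characters, then reverse it."""
    rotated = checksum[SHIFT:] + checksum[:SHIFT]
    return rotated[::-1]

def unscramble(stored: str) -> str:
    """Undo scramble(): reverse the string, then rotate it back."""
    rotated = stored[::-1]
    return rotated[-SHIFT:] + rotated[:-SHIFT]

checksum = hashlib.sha1(b"password1").hexdigest()
stored = scramble(checksum)

print(stored != checksum)              # the stored value no longer matches the checksum
print(unscramble(stored) == checksum)  # the server can still get the original back
```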

Taking these steps limits the usefulness of a hack where all the checksums are stolen. The idea is to perform a function that is unknown, so if the hacker knows the cryptographic hash algorithm but not the custom one, then knowing the password checksums is unhelpful.

Passwords and Cryptographic Hash Functions

A database saves user passwords in a manner similar to a rainbow table: each username is kept alongside the checksum of its password rather than the password itself. When your password is entered, the checksum is generated and compared with the one on record with your username. You're then granted access if the two are identical.
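
A bare-bones version of that sign-in check might look like the following. The users dictionary stands in for the database, and the username, the password, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib

def password_checksum(password: str) -> str:
    """Return the SHA-256 checksum of a password as a hexadecimal string."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical user table: it keeps checksums, not the passwords themselves.
users = {"alice": password_checksum("12@34$5")}

def sign_in(username: str, password: str) -> bool:
    """Grant access only if the entered password hashes to the stored checksum."""
    stored = users.get(username)
    return stored is not None and stored == password_checksum(password)

print(sign_in("alice", "12@34$5"))  # True
print(sign_in("alice", "12345"))    # False
```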

Given that a cryptographic hash function produces a nonreversible checksum, is it safe for you to make your password as simple as 12345, instead of 12@34$5, simply because the checksums themselves can't be understood? No, and here's why.

These two passwords are both impossible to decipher just by looking at the checksums:

MD5 for 12345: 827ccb0eea8a706c4c34a16891f84e7b

MD5 for 12@34$5: a4d3cc004f487b18b2ccd4853053818b

At first glance, you may think that it's fine to use either of these passwords. This is true if an attacker tries to work out your password from the MD5 checksum alone, which nobody does, but not true if a brute force or dictionary attack is performed, which is a common tactic.

A brute force attack occurs when an attacker works through a large number of guesses at a password, one after another. In this case, it would be easy to guess 12345, but pretty difficult to stumble on the other one. A dictionary attack is similar in that the attacker can try every word, number, or phrase from a list of common (and not-so-common) passwords, and 12345 is one of those common passwords.
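
Here's a small sketch of a dictionary attack against the MD5 checksum of 12345 shown earlier. The five-entry word list is a stand-in for the much longer lists real attackers use, and the function names are made up.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 checksum of text as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Illustrative stand-in for a real list of common passwords.
COMMON_PASSWORDS = ["password", "123456", "12345", "qwerty", "letmein"]

def dictionary_attack(target_checksum: str, wordlist):
    """Hash each guess and return the first one that matches the target."""
    for guess in wordlist:
        if md5_hex(guess) == target_checksum:
            return guess
    return None

print(dictionary_attack("827ccb0eea8a706c4c34a16891f84e7b", COMMON_PASSWORDS))  # 12345
print(dictionary_attack(md5_hex("12@34$5"), COMMON_PASSWORDS))                  # None
```

The simple password falls on the third guess, while the more complex one survives because it isn't on the list.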

Even though cryptographic hash functions produce difficult- to impossible-to-guess checksums, you should still use a complex password for all your online and local user accounts.

More Information on Cryptographic Hash Functions

It might seem like cryptographic hash functions are related to encryption, but the two work in different ways.

Encryption is a two-way process where something is encrypted to become unreadable and then decrypted later to be used normally again. You might encrypt files you've stored so that anyone who accesses them is unable to use them, or you can use file transfer encryption to encrypt files that are moving over a network, like the ones you upload or download online.

Cryptographic hash functions work differently in that the checksums are not meant to be reversed with a special dehashing password. The only purpose cryptographic hash functions serve is to compare two pieces of data, such as when downloading files, storing passwords, and pulling data from a database.

It's possible for a cryptographic hash function to produce the same checksum for different pieces of data. When this happens, it's called a collision, which is a huge problem considering the entire point of a cryptographic hash function is to make unique checksums for every data input into it.

Collisions can occur because each cryptographic hash function produces a value of a fixed length regardless of the input data. For example, the MD5 cryptographic hash function generates 827ccb0eea8a706c4c34a16891f84e7b, 1f633b2909b9c1addf32302c7a497983, and e10adc3949ba59abbe56e057f20f883e for three totally different blocks of data.

The first checksum is from 12345. The second was generated from over 700 letters and numbers, and the third is from 123456.

All three inputs are of different lengths, but the results are always just 32 characters long since the MD5 function was used.
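
You can confirm the fixed length yourself with a short Python sketch. The 700-character input below is an arbitrary string built for this example, not the block of data used for the second checksum above.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 checksum of text as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Three inputs of very different lengths; the last is 700 characters long.
inputs = ["12345", "123456", "a1" * 350]

for text in inputs:
    digest = md5_hex(text)
    # Print the input length, the checksum length, and the checksum itself.
    print(len(text), len(digest), digest)
```

Each line of output reports a checksum length of 32, whether the input had 5, 6, or 700 characters.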

There is no limit to the number of different inputs that can be hashed, and each tiny change in the input is supposed to produce a completely different checksum. Because there is a limit to the number of checksums that one cryptographic hash function can produce, there's always the possibility that you'll encounter a collision.
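
Collisions are hard to find with a full-length checksum, but the principle is easy to see with a deliberately tiny one. The sketch below keeps only the first two characters of each MD5 checksum, which leaves just 256 possible values, so a collision must appear within the first 257 inputs. The function names and the two-character truncation are assumptions made purely for the demonstration.

```python
import hashlib
from itertools import count

def tiny_checksum(text: str) -> str:
    """First 2 hex characters of the MD5 checksum: only 256 possible values."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:2]

def find_collision():
    """Hash "0", "1", "2", ... until two inputs share a tiny checksum."""
    seen = {}
    for n in count():
        text = str(n)
        value = tiny_checksum(text)
        if value in seen:
            return seen[value], text, value
        seen[value] = text

first, second, shared = find_collision()
print(first, second, shared)
```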

This is why other cryptographic hash functions have been created. While MD5 generates a 32-character value, SHA-1 generates 40 characters and SHA-512, part of the SHA-2 family, generates 128. The greater the number of characters that the checksum has, the less likely that a collision will occur.
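
Those lengths are easy to check with Python's hashlib module. In this sketch, hex_length is a made-up helper and 12345 is just a sample input; SHA-256, another member of the SHA-2 family, is included for comparison.

```python
import hashlib

def hex_length(algorithm: str, data: bytes = b"12345") -> int:
    """Return how many hexadecimal characters the algorithm's checksum has."""
    return len(hashlib.new(algorithm, data).hexdigest())

for name in ("md5", "sha1", "sha256", "sha512"):
    print(name, hex_length(name))
# md5 32
# sha1 40
# sha256 64
# sha512 128
```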