The Ultimate Guide to BCrypt and Authentication Protocols
Learn why bcrypt is the industry standard hashing algorithm for authentication - including its history and how it compares to other protocols.
Digital security becomes more critical every day. As financial accounts and other sensitive data continue to move online, it’s vital for companies to keep their clients’ information safe. It’s not enough to simply provide password-protected accounts. Those passwords must also be stored safely and effectively.
That’s where bcrypt becomes important. Bcrypt is an algorithm designed to hash and salt passwords for safe storage. It’s an industry standard that’s time-tested and proven to resist threats from hackers and other malicious agents. Keep reading to learn the fundamentals of bcrypt, why it’s so effective, and how it compares to different password protection algorithms.
What is hashing?
Hashing is a critical first step in the password storage process. Programs use hashing to condense data of an arbitrary size into a fixed size. Essentially, the program uses a mathematical formula to take an arbitrarily long text or data input and convert it into a hash. Hash algorithms apply a procedure to the input many times to obscure the original data.
The hash itself is a series of letters and numbers that will always be the same length. For example, a 10-character input and a 40-character input might both be stored as 16-character hashes.
It’s important to note that this process can lead to multiple strings generating identical hashes. Since the process converts all input into strings of identical length, it’s inevitable that some will be the same. Usually, this is fine, but it can lead to vulnerabilities. For example, if too many inputs lead to identical strings, then it’s easier for hackers to brute-force their way into users’ accounts. There are more potential random inputs the hacker could guess that would lead to the hash matching.
Good hashing programs must meet a few qualifications:
- A specific input will always generate the same output
- Modifying an input will change the resulting output
- Different inputs should usually provide different outputs
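The three properties above can be checked directly. The following is a minimal sketch using SHA-256 from Python's standard library; the helper name `sha256_hex` and the sample inputs are made up for illustration, and SHA-256 is used here only to show general hash behavior, not as a recommendation for storing passwords.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_hash = sha256_hex("hunter2")
long_hash = sha256_hex("a much longer passphrase with many words in it")

# Fixed size: every SHA-256 digest is 64 hex characters, whatever the input length.
print(len(short_hash), len(long_hash))

# Deterministic: hashing the same input again gives the same digest.
print(sha256_hex("hunter2") == short_hash)

# Sensitive to change: altering one character gives a different digest.
print(sha256_hex("hunter3") == short_hash)
```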
Hashing can be applied to data of any kind, but it’s particularly useful for passwords. The hashing process is a one-way conversion. Once data has been hashed, it’s essentially impossible to unhash or reverse the process to come up with the original input. This property reduces the risk of leaks or attacks and makes hashed passwords safe to store.
Furthermore, since a specific input will always provide a specific output, a hashed password will always match the stored hash. The program will take the user’s input password, run it through the hash function, and compare it to the hash connected to their account. If they match, the user has entered the correct password and is allowed to access their account.
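The login check described above might look roughly like the sketch below. The in-memory dictionary, the function names, and the use of plain SHA-256 are all assumptions made to keep the example short; a real system would use a slow, salted function such as bcrypt and a proper database.

```python
import hashlib
import hmac

# Hypothetical in-memory "database" mapping usernames to stored hashes.
stored_hashes = {}


def hash_password(password: str) -> bytes:
    # SHA-256 keeps the sketch short; it is too fast for real password storage.
    return hashlib.sha256(password.encode("utf-8")).digest()


def register(username: str, password: str) -> None:
    # Only the hash is stored, never the plaintext password.
    stored_hashes[username] = hash_password(password)


def login(username: str, password: str) -> bool:
    stored = stored_hashes.get(username)
    if stored is None:
        return False
    # compare_digest takes the same time whether or not the values match.
    return hmac.compare_digest(stored, hash_password(password))


register("alice", "correct horse battery staple")
print(login("alice", "correct horse battery staple"))
print(login("alice", "wrong guess"))
```

Comparing with `hmac.compare_digest` rather than `==` is a small design choice that avoids leaking information through comparison timing.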
Hashing vs. encryption
Hashing is different from encryption, another method of storing passwords and sensitive information. When information is encrypted, it can be unencrypted with the right program and encryption key. The encryption process is a two-way street, while hashing only goes one way.
Since encryption can be reversed, it can serve a wider variety of use-cases, but it’s less secure than hashing. A hashed password can’t be converted back into the plaintext input, but encrypted passwords can. If a hacker can steal the encryption key, then the plaintext input is theirs.
Limitations to hashing
Of course, hashing alone isn’t a perfect solution. It has a few limitations that prevent it from solving all password storage problems up front. These weaknesses include:
Brute force attacks
Brute force attacks are a significant problem for sites that rely entirely on hashing. These attacks are performed by trying many different passwords with a particular username until one of them finally works. It’s considered a “brute force” attack because there’s no actual attempt to figure out the user’s password. The hacker is just using a program to guess many different passwords very quickly until one matches the hash.
Without other security methods, brute force attacks will eventually work — it’s just a matter of time and computing power. And if a hacker uses a list of common passwords first, they will likely be able to log into a large number of the users’ accounts with little effort in a short amount of time.
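To see why a list of common passwords is so effective against a fast, unsalted hash, consider the sketch below. The wordlist, the "leaked" hash, and the function names are invented for the example; real attacks use far larger lists and specialized hardware.

```python
import hashlib
from typing import Iterable, Optional


def fast_hash(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash each candidate and return the first one that matches target_hash."""
    for guess in candidates:
        if fast_hash(guess) == target_hash:
            return guess
    return None


# A tiny, made-up wordlist; real lists contain millions of entries.
common_passwords = ["123456", "password", "qwerty", "letmein", "dragon"]

leaked_hash = fast_hash("letmein")  # pretend this came from a stolen database
print(dictionary_attack(leaked_hash, common_passwords))
```

The attacker never reverses the hash; they only repeat the forward computation until something matches, which is why the speed of the hash matters so much.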
Rainbow table attacks
A rainbow table attack is more sophisticated. This method doesn’t bother with figuring out the actual password; it focuses on finding text that will produce the correct hash.
The rainbow table is a collection of common passwords and the hashes they generate under specific circumstances. If a hacker can access the stored hashes against which the authentication technology checks hashed password input, they can use the rainbow table to look up the resulting hash. If the hash is on the table, they can simply input the associated plaintext password and log in to the user’s account.
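The lookup step can be illustrated with a plain precomputed dictionary. This is a simplification: true rainbow tables use chains of hashes to save storage, which this sketch does not attempt. The wordlist and the stolen hashes are hypothetical.

```python
import hashlib


def unsalted_hash(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Built once, ahead of time, and reusable against any database
# that stores hashes produced by the same unsalted function.
wordlist = ["123456", "password", "qwerty", "iloveyou"]
lookup_table = {unsalted_hash(word): word for word in wordlist}

# Hypothetical stolen hashes: two common passwords and one unusual one.
stolen = {
    "alice": unsalted_hash("qwerty"),
    "bob": unsalted_hash("iloveyou"),
    "carol": unsalted_hash("x9!Lm#42-unusual"),
}

# Cracking is now a dictionary lookup rather than a fresh computation.
cracked = {user: lookup_table.get(digest) for user, digest in stolen.items()}
print(cracked)
```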
Collision and pre-image attacks
“Collisions” are different inputs that both result in the same hash. If a hacker can find a fake password that generates the same hash as an actual password, it works just as effectively against a hashed database. Collision attacks focus on finding any strings that generate identical hashes and then look for hashes that match what they’ve generated. These attacks work best if the hacker has uncovered a large number of password hashes — they have more hashed passwords they can potentially match.
For example, it was discovered that with the now-outdated hash function MD5, anyone could easily generate multiple inputs that lead to identical hashes. This was a significant vulnerability: researchers demonstrated it by producing pairs of different documents with the same MD5 hash, such as a letter of recommendation and a security clearance, so that a signature on one would also appear valid for the other.
Similarly, a pre-image attack is an attempt to match a specific hash. Instead of trying to find any match to any hash, the hacker tries to brute-force the discovery of a match of a particular password. Pre-image attacks work best against short passwords and hashes that can be performed quickly, so lengthening both passwords and the hashing time are essential to keep accounts safe.
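Collisions are unavoidable whenever outputs are shorter than inputs, and they become easy to find when the output is very short. The sketch below makes that visible by truncating SHA-256 to 16 bits. The truncation, `tiny_hash`, and the `input-N` naming are assumptions chosen so the search finishes instantly; this is not an attack on SHA-256 itself.

```python
import hashlib
from itertools import count
from typing import Dict, Tuple


def tiny_hash(text: str) -> str:
    """A deliberately weak 16-bit hash: the first 4 hex characters of SHA-256."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:4]


def find_collision() -> Tuple[str, str]:
    """Try inputs until two different ones share a tiny_hash value."""
    seen: Dict[str, str] = {}
    for i in count():
        candidate = f"input-{i}"
        digest = tiny_hash(candidate)
        if digest in seen:
            return seen[digest], candidate
        seen[digest] = candidate
    raise AssertionError("unreachable: only 65,536 digests exist")


first, second = find_collision()
print(first, second, tiny_hash(first))
```

With only 65,536 possible outputs, a match is guaranteed within 65,537 attempts and usually appears after a few hundred. Longer digests push that search far out of reach.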
Phishing and spoofing
Finally, hashing can do nothing to protect users from phishing or spoofing attempts. These types of attacks don’t rely on any kind of technical attack. Instead, they are aimed at the user.
In a phishing attack, the hacker uses a fake website or email address to convince users that they’re interacting with a trustworthy source. This “spoofed” site or email will look exactly like the real version, with something small like a single letter changed. The hacker then convinces the user to “log in” to a fake site and instead just steals their account information. Hashing can’t prevent this because the attack happens entirely “offsite”.
The two halves of bcrypt
The term “bcrypt” is a reference to two programs: crypt, the hashing function used by the UNIX password system, and Blowfish, a specific cipher that’s useful for password hashing. Here’s how each of these elements works in the context of bcrypt.
The root “crypt”
The crypt algorithm is one of the original password protection solutions, reaching back to the earliest days of the digital revolution. In the 1970s, when crypt was first implemented, only about four passwords could be hashed per second. This essentially made brute force and pre-image attacks irrelevant.
However, by the late 1990s, a computer could use crypt to hash more than 200,000 passwords per second. In minutes, an attacker could successfully brute-force their way into any system using the same crypt algorithm.
Accordingly, the crypt function needed to be updated.
Blowfish: Where the “b” comes from
What sets bcrypt apart from crypt is its use of the Blowfish cipher. Blowfish is considered a “fast block cipher,” with one exception. The cipher uses a “key” to encrypt text. This key is not a password for the platform, but rather a filter that can be used to encrypt and decrypt files.
When Blowfish changes keys, it slows down dramatically because it needs to perform the pre-processing equivalent of encrypting four kilobytes worth of text. However, bcrypt uses Blowfish in an “off-label” way. Instead of saving keys to decrypt any data, it hashes these keys to make breaking a hash more difficult. The keys can’t be used to decrypt the hash because they’re actually part of the hash. So, instead of being used to “unlock” an encryption, these keys are used to “jam the lock,” thus making the hash harder to break.
Basically, bcrypt uses the Blowfish slowdown as a way to make other tasks longer — slower speeds are actually a good thing for password hashing. Niels Provos and David Mazières, the pair who created bcrypt, took advantage of the Blowfish slowdown by creating a key setup algorithm they called “eksblowfish.” This program, which stands for “expensive key schedule Blowfish,” is run to generate subkeys from the primary key, or the user’s password.
This is how the eksblowfish function performs something called “key stretching.” Many users will choose passwords that are short or common, meaning they’re easy to guess. The more times the hashing function is run, the longer it takes for the password to be checked against the original hash. Essentially, key stretching makes the calculation process take longer, so it “costs” more to crack the password.
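The effect of key stretching can be sketched without Blowfish at all. The example below simply feeds a SHA-256 digest back into SHA-256 a chosen number of times and measures how long that takes. It is a stand-in to illustrate the principle, not the eksblowfish algorithm, and `stretched_hash` is a hypothetical helper.

```python
import hashlib
import time


def stretched_hash(password: str, rounds: int) -> bytes:
    """Feed the digest back into SHA-256 `rounds` times (illustrative only)."""
    digest = password.encode("utf-8")
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest


for rounds in (1, 1_000, 100_000):
    start = time.perf_counter()
    result = stretched_hash("hunter2", rounds)
    elapsed = time.perf_counter() - start
    # More rounds means more time per guess for an attacker as well.
    print(f"{rounds:>7} rounds: {elapsed:.6f}s  {result.hex()[:16]}")
```

A legitimate login pays this price once, while an attacker pays it for every single guess.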
The eksblowfish function also “salts” the password by adding random information to it. This extra information makes it stronger, because identical passwords no longer produce identical hashes. In general, longer salts make precomputed tables less practical; bcrypt uses a fixed 128-bit (16-byte) salt.
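A short sketch shows what salting buys. Here the salt is simply prepended to the password before hashing with SHA-256, which is an assumption made for brevity and not how bcrypt mixes in its salt; `salted_hash` is a hypothetical helper.

```python
import hashlib
import secrets


def salted_hash(password: str, salt: bytes) -> str:
    """Hash the salt together with the password (illustrative; not bcrypt itself)."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


# Two users pick the same password but receive different random 16-byte salts.
salt_a = secrets.token_bytes(16)
salt_b = secrets.token_bytes(16)

hash_a = salted_hash("password123", salt_a)
hash_b = salted_hash("password123", salt_b)

# With different salts the stored hashes differ, so one precomputed
# table cannot cover both users.
print(hash_a == hash_b)

# The salt is stored next to the hash, so verification can repeat the computation.
print(salted_hash("password123", salt_a) == hash_a)
```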
How bcrypt works
So how does bcrypt actually work? Like all hash programs, the fundamental method is simple: passwords are hashed, and the hashes are stored to compare against user inputs. The difference lies in the hashing process itself.
The bcrypt hashing process heavily relies on the eksblowfish function. When the user inputs their password for the first time, the site uses “EksBlowfishSetup” to set two important parameters. First, it adds the “salt” to the password. A salt is a string of random characters used by the site to make passwords more complex.
Second, the EksBlowfishSetup will indicate the desired “cost” of the password. Remember, “eks” stands for “expensive key schedule.” The “expensive” part is the number of times the key schedule is run, and the cost sets that number as a power of two: a cost of 12 means 2^12, or 4,096, rounds. Each increase of one in the cost doubles the work and slows the password process down accordingly.
Once the setup is complete, the eksblowfish function runs. The program then takes the final state from the last run of the key schedule and uses it to encrypt the value “OrpheanBeholderScryDoubt” 64 times. The final string is a deeply secure hash that results from the combination of the cost, the salt, and the hashed input.
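Common bcrypt implementations store the cost, the salt, and the hash together in one 60-character string of the form `$version$cost$salt+hash`. That layout is not described above, so treat it as an assumption about typical libraries. The sketch below only splits such a string into its parts; it does not compute or verify a bcrypt hash, and the sample string is there purely to show the shape.

```python
from typing import NamedTuple


class BcryptParts(NamedTuple):
    version: str
    cost: int
    salt: str    # 22 characters encoding the 16-byte salt
    digest: str  # 31 characters encoding the hash output


def parse_bcrypt(encoded: str) -> BcryptParts:
    """Split a string of the form $<version>$<cost>$<salt><digest>."""
    _, version, cost, rest = encoded.split("$")
    if len(rest) != 53:
        raise ValueError("expected 22 salt characters followed by 31 digest characters")
    return BcryptParts(version, int(cost), rest[:22], rest[22:])


example = "$2a$12$R9h/cIPz0gi.URNNX3kh2OPST9/PgBkqquzi.Ss7KIUgO2t0jWMUW"
parts = parse_bcrypt(example)
print(parts)

# The cost is an exponent: this many key schedule rounds were requested.
print(2 ** parts.cost)
```

Because the cost and salt travel with the hash, a verifier needs no extra configuration to recompute the hash for a login attempt.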
Why bcrypt is so secure
Bcrypt is still a hash, so why is it more robust than other hashing functions? It’s because of the extra steps involved. The salt and key stretching function make bcrypt more secure against pre-image attacks. Since the salt is a random string, hackers can’t just run a rainbow table attack and hope for the best. The random element blocks these dictionary attacks from working.
Meanwhile, bcrypt is also resistant to brute force attacks, both now and in the future. The “cost” to hash a password is adjustable and can be raised over time, while the salt stays at a fixed 128 bits. That means that as computers get faster, the cost can be set higher to keep the hashing process slow. This keeps hackers from simply guessing passwords by running the entire hash thousands or millions of times per second. Given that computing power keeps growing, this flexibility is critical.
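One common way to raise the cost over time is to rehash a password at the next successful login, since that is the only moment the plaintext is available. The sketch below illustrates that pattern under several assumptions: PBKDF2 with 2^cost iterations stands in for bcrypt so the example runs on the standard library alone, and `Record`, `TARGET_COST`, and `login_and_upgrade` are hypothetical names.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Tuple

TARGET_COST = 12  # hypothetical policy value, raised as hardware gets faster


@dataclass
class Record:
    cost: int
    salt: bytes
    digest: bytes


def slow_hash(password: str, salt: bytes, cost: int) -> bytes:
    # Stand-in for bcrypt: PBKDF2 with 2**cost iterations.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 2 ** cost)


def make_record(password: str, cost: int) -> Record:
    salt = secrets.token_bytes(16)
    return Record(cost, salt, slow_hash(password, salt, cost))


def login_and_upgrade(record: Record, password: str) -> Tuple[bool, Record]:
    """Check the password; on success, rehash if the stored cost is below target."""
    expected = slow_hash(password, record.salt, record.cost)
    ok = hmac.compare_digest(record.digest, expected)
    if ok and record.cost < TARGET_COST:
        record = make_record(password, TARGET_COST)
    return ok, record


old = make_record("hunter2", cost=10)  # created under an older, cheaper policy
ok, upgraded = login_and_upgrade(old, "hunter2")
print(ok, old.cost, upgraded.cost)
```

Storing the cost alongside each hash is what makes this gradual migration possible: old and new records can coexist until every user has logged in again.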
How bcrypt compares to other hashing methodologies
As well as bcrypt works, it’s not the only hash security function. Several other functions are regularly used to protect passwords against malicious users. Here’s how bcrypt compares to some of the most common alternatives.
MD5
The widely used MD5 is a hashing function that was first developed in 1991. The function is considered cryptographically broken, meaning that practical attacks exist that defeat its security guarantees far faster than brute force would. In the case of MD5, it’s possible to generate colliding inputs within minutes. It is also extremely fast to compute, which makes guessing passwords against stolen hashes cheap. Data security systems that rely on MD5 can be easily attacked by anyone with a basic understanding of the function.
On the other hand, bcrypt is not broken. As a result, it’s still able to keep passwords and information safe. Modern passwords should never be stored behind MD5.
SHA-1, SHA-256, and SHA-512
Three varieties of the SHA hash function are worth comparing. SHA-1, like MD5, is cryptographically broken, because practical collisions have been found. However, SHA-256 and SHA-512 are still considered secure. In particular, SHA-256 is one of the most common hashes for current website certificates.
Why is SHA-256 so popular? It’s because it’s a fast hash that can process large amounts of data in a short timeframe. That’s precisely why it’s not ideal for passwords: an attacker can test guesses just as quickly. The slower bcrypt is better for passwords because it’s more resistant to brute force attacks for the short amount of data in question.
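The gap between a fast hash and a deliberately slow one is easy to measure. In the sketch below, a single SHA-256 call is compared with PBKDF2 at 200,000 iterations, which stands in for a slow password hash because bcrypt is not in Python's standard library. The iteration count and helper name are assumptions, and the exact timings will vary by machine.

```python
import hashlib
import time
from typing import Callable


def average_seconds(func: Callable[[], bytes], repeat: int) -> float:
    """Run func `repeat` times and return the average time per call."""
    start = time.perf_counter()
    for _ in range(repeat):
        func()
    return (time.perf_counter() - start) / repeat


password = b"hunter2"
salt = b"fixed-demo-salt!"  # fixed only so the demo is repeatable

fast_seconds = average_seconds(
    lambda: hashlib.sha256(salt + password).digest(), repeat=10_000
)
slow_seconds = average_seconds(
    lambda: hashlib.pbkdf2_hmac("sha256", password, salt, 200_000), repeat=3
)

print(f"fast hash: {fast_seconds:.8f}s per guess")
print(f"slow hash: {slow_seconds:.8f}s per guess")
print(f"slowdown factor: {slow_seconds / fast_seconds:,.0f}x")
```

A delay that is barely noticeable for one login multiplies into a large cost for an attacker who has to make millions of guesses.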
PBKDF2
PBKDF2 is considered secure. However, it’s another old solution to password hashing. While it does salt passwords and repeat its underlying hash many times, it’s also a lightweight function that needs very little memory and runs on a single core. When it was first developed, this made it secure but functional. Today, however, it’s a liability. It’s possible to parallelize PBKDF2 guessing on GPUs and other multicore systems, dramatically cutting down the amount of time it takes to brute force passwords.
In contrast, bcrypt is not as easily parallelized. Consequently, it’s a more secure alternative.
Scrypt
Scrypt is a later design with the same goal as bcrypt: deliberately expensive password hashing, to which it adds a configurable memory requirement. It is significantly newer than bcrypt, however, and in terms of security this is not actually a good thing. Scrypt has received less scrutiny and testing than bcrypt has, simply because it has been in use for less time. While no significant vulnerabilities have been discovered yet, it simply isn’t as well-tested. Though scrypt claims to be more secure, organizations looking for a better-tested option will still see more reliable results from bcrypt.
Argon2
Argon2 is a newer password hashing function that won the Password Hashing Competition in 2015 and has become well-known in the cryptography community. However, for online and web-based passwords it has some weaknesses. If users are looking for short hashing times, Argon2 and its variants are actually weaker than bcrypt. The way Argon2 functions relies on longer runtimes for security, while bcrypt just relies on iterations. Argon2 is also newer, so it hasn’t been tested as thoroughly. Essentially, bcrypt is considered more secure for any application designed to allow sign-ins in less than a second.
The bcrypt function may be older than some, but it’s stood the test of time. Though some modern algorithms may have theoretical advantages, they have not been as extensively tested. Until they have, or until bcrypt shows meaningful vulnerabilities for a password hashing or security use-case, it’s likely to remain the industry standard.
Clerk uses bcrypt to keep passwords secure, and can help you implement highly secure, well-tested user authentication that will keep your user information safe.