A previous version of this article confused the process of "hashing" with the process of "encryption". It's worth noting that, while similar, hashing and encryption are two different processes.
Hashing involves a many-to-one transformation, where a given input is mapped to a (usually fixed-size, usually shorter) output, which is not unique to that particular input. In other words, collisions are likely when running a hashing algorithm over a huge range of input values (where different inputs map to the same output). Hashing is an irreversible process because of the nature of hashing algorithms. This SO answer gives a good overview of the differences between hashing and encryption and provides a nice example of why a hashing algorithm might be practically irreversible.
As a mathematical example, consider the modulo (aka. modulus aka. mod) function (expressed here as defined by Donald Knuth):
a % n = mod(a, n) = a - n * floor(a/n)
The simple interpretation of the mod function is that it's the remainder of integer division:
7 % 2 = 1
4 % 3 = 1
33 % 16 = 1
Note how all three examples above give the same output even though they have different inputs. The modulo function is non-invertible because we lose information when we apply it. Even given one of the inputs along with the output, there's no way to calculate the other input value. You just have to guess until you get it right.
Hashing algorithms take advantage of non-invertible functions like this (as well as bit shifts, etc.), and are often repeated many times, astronomically increasing the number of required guesses. The goal of hashing is to make it extremely expensive, computationally, to decode the original information. Oftentimes, it's easier to brute force a hashing algorithm (by trying many possible inputs, as quickly as possible) than to try to "reverse" the hashing algorithm to decode the hashed information.
A common method of deterring even these brute-force attacks is to add a second random piece of information as "salt". This prevents hackers from performing dictionary attacks, where a list of common passwords are mapped to their hashed outputs -- if the hacker knows the hashing algorithm used and can gain access to the database where the hashes are stored, they can use their "dictionary" to map back to the original password and gain access to those accounts. Salting the data means that not only do the hackers have to run a particular password through the hashing algorithm and verify that it matches the hashed output, but they have to re-run that process for every possible value of the salt (usually a string of tens to hundreds of bytes). A random 100-byte salt means (100*2^8 or) 256,000 total password-salt combinations must be tried for each possible password.
Another method of deterring brute-force attacks is to simply require your algorithm to take some long amount of time to run. If your algorithm takes just 2 seconds to run and uses the 100-byte salt mentioned above, it would take nearly 6 days to try all possible salt strings for just a single potential password. This is partly why hashing algorithms are often iterated thousands of times. To produce a secure hash, make sure you re-introduce the salt each time you iterate, otherwise you're setting yourself up for more hash collisions than necessary (see that SO link, above).
Encryption, as opposed to hashing, is always one-to-one and reversible (via decryption). So a particular input will always produce a particular output and only that specific input will produce that specific output. Unlike hashing algorithms, which produce hashes of a fixed length, encryption algorithms will produce variable-length outputs. A good encryption algorithm should produce output which is indistinguishable from random noise, so that patterns in the output cannot be exploited in order to decode it.
Encryption should be used when the data stored needs to be extracted at some point. For instance, messaging apps might encrypt data before transporting it, but that data needs to be decrypted back into plaintext once it's received, so that the recipient can read it. Note that this is not the case with passwords: if your password and the stored salt generate the hashcode stored in the database, then it's very likely that the password you input is the correct one. It's safer to hash the password, then, and just re-calculate the hash code than it is to store an encrypted password which could possibly be decrypted.
But if encrypted information can be decrypted, how can it be secure? Well, there are two standard methods of encryption and decryption, symmetric key encryption and asymmetric key encryption.
Symmetric key encryption means that the same cryptographic key is used to encrypt and decrypt the data. A common analogy used to explain encryption is sending locked packages through the post. If you put a padlock on a box and send it to me, and I have a copy of the key which opens the padlock, then I can easily unlock it and read your message inside. I can send you back a message and you can use the same key to open the box on the other side.
While the box is in transport, no one can open the box and read the message unless they have the same symmetric key that we have. The idea, then, is to keep the key a secret and not share it with anyone other than the intended recipient of the message. Since there are two symmetric private keys, though, if anyone manages to steal (or create) a copy of either private key, all future messages between you and me will be compromised. In other words, we trust each other to keep our keys secure.
Asymmetric key encryption is slightly different, in that you and I both have padlocks and keys, but the first package we send to each other in the post should be our unlocked padlocks:
Then, you can write a message, and lock the message in a box with my padlock. From that point on, the only person who can unlock the box is me, with my private key. When I receive your message, I lock it in a box with your padlock and send it back to you. From that point on, the only person who can unlock that box is you, with your private key.
This method is also known as public key encryption because, oftentimes, the "padlock" in this case is made widely available. Anyone who wants to encrypt a message intended for a particular recipient can do so. Public key encryption is recommended over plaintext passwords because it effectively makes the password much longer and more difficult to guess, reduces the chances of someone "looking over your shoulder" and stealing your password, and is just generally much easier than retyping a password over and over again.
Obviously this barely scratches the surface of hashing and encryption, but I hope it gives you a better understanding of the differences between the two. Now, back to our regularly scheduled programming...
Original article (updated for clarity):
The process of hashing a password in Java can be difficult to get your head around at first, but there are really only three things you need:
- a password
- a hashing algorithm
- some tasty
salt
(Note that encryption and decryption is also possible in Java, but is generally not used for passwords, because the password itself doesn't need to be recovered. We just need to check that the password the user enters recreates the hash that we've saved in a database.)
Password-based encryption generates a cryptographic key using a user password as a starting point. We irreversibly convert the password to a fixed-length hash code using a one-way hash function, adding a second random string as "salt", to prevent hackers from performing dictionary attacks, where a list of common passwords are mapped to their hashed outputs -- if the hacker knows the hashing algorithm used and can gain access to the database where the hashcodes are stored, they can use their "dictionary" to map back to the original password and gain access to those accounts.
We can generate salt simply by using Java's SecureRandom class:
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Optional;
private static final SecureRandom RAND = new SecureRandom();
public static Optional<String> generateSalt (final int length) {
if (length < 1) {
System.err.println("error in generateSalt: length must be > 0");
return Optional.empty();
}
byte[] salt = new byte[length];
RAND.nextBytes(salt);
return Optional.of(Base64.getEncoder().encodeToString(salt));
}
(Note: you could also get a SecureRandom
instance with SecureRandom.getInstanceStrong(), though this throws a NoSuchAlgorithmException
and so needs to be wrapped in a try{} catch(){}
block.)
Next, we need the hashing code itself:
import java.security.NoSuchAlgorithmException;
import java.security.spec.InvalidKeySpecException;
import java.util.Arrays;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
private static final int ITERATIONS = 65536;
private static final int KEY_LENGTH = 512;
private static final String ALGORITHM = "PBKDF2WithHmacSHA512";
public static Optional<String> hashPassword (String password, String salt) {
char[] chars = password.toCharArray();
byte[] bytes = salt.getBytes();
PBEKeySpec spec = new PBEKeySpec(chars, bytes, ITERATIONS, KEY_LENGTH);
Arrays.fill(chars, Character.MIN_VALUE);
try {
SecretKeyFactory fac = SecretKeyFactory.getInstance(ALGORITHM);
byte[] securePassword = fac.generateSecret(spec).getEncoded();
return Optional.of(Base64.getEncoder().encodeToString(securePassword));
} catch (NoSuchAlgorithmException | InvalidKeySpecException ex) {
System.err.println("Exception encountered in hashPassword()");
return Optional.empty();
} finally {
spec.clearPassword();
}
}
...there's a lot going on here, so let me explain step-by-step:
public static Optional<String> hashPassword (String password, String salt) {
char[] chars = password.toCharArray();
byte[] bytes = salt.getBytes();
First, we ultimately need the password as a char[]
, but we have the user pass it in as a String
-- (how else would we get a password from the user?) -- so we must convert it to a char[]
at the outset. The salt is also passed in as a String
and must be converted to a byte[]
. The assumption here is that the hashed password and salt will be written to a database as character strings, so we want to generate the salt outside this algorithm as a String
and pass it in as a String
, as well.
Keeping the user's password in a String
is dangerous, because Java String
s are immutable -- once one's made, it can't be overwritten to hide the user's password. So it's best to gather the password, do what we need to do with it, and immediately toss the reference to the original password String
so it can be garbage collected. (You can suggest that the JVM garbage collect a dead reference with System.gc()
, but garbage collection occurs at unpredictable intervals and cannot be forced to occur.) Similarly, if we convert the password String
to a char[]
, we should clear out the array when we're finished with it (more on that later).
PBEKeySpec spec = new PBEKeySpec(chars, bytes, ITERATIONS, KEY_LENGTH);
Here, we're specifying how we're going to generate the hashed password. chars
is the plaintext password as a char[]
, bytes
is the salt String
converted to a byte[]
, ITERATIONS
is how many times we should perform the hashing algorithm, and KEY_LENGTH
is the desired length of the resulting cryptographic key, in bits.
When we perform the hashing algorithm, we take the plaintext password and the salt and generate some pseudorandom output string. Iterated hashing algorithms will then repeat that process some number of times, using the output from the first hash as the input to the second hash. This is also known as key stretching and it greatly increases the time required to perform brute-force attacks (having a large salt string also makes these kind of attacks more difficult). Note that while increasing the number of iterations increases the time required to hash the password and makes your database less vulnerable to brute-force attacks, it can also make you more vulnerable to DoS attacks because of the extra processing time required.
The length of the final hashed password is limited by the algorithm used. For example, "PBKDF2WithHmacSHA1" allows hashes of up to 160 bits, while "PBKDF2WithHmacSHA512" hashes can be as long as 512 bits. Specifying a KEY_LENGTH
longer than the maximum key length of the algorithm you've chosen will not lengthen the key beyond its specified maximum, and can actually slow down the algorithm.
Finally, note that spec
holds all of the information about the algorithm, the original plaintext password, and so on. We really want to delete all of that when we're finished.
Arrays.fill(chars, Character.MIN_VALUE);
We're done with the chars
array now, so we can clear it out. Here, we set all elements of the array to \000
(the null character).
try {
SecretKeyFactory fac = SecretKeyFactory.getInstance(ALGORITHM);
byte[] securePassword = fac.generateSecret(spec).getEncoded();
return Optional.of(Base64.getEncoder().encodeToString(securePassword));
} catch (NoSuchAlgorithmException | InvalidKeySpecException ex) {
System.err.println("Exception encountered in hashPassword()");
return Optional.empty();
} finally {
spec.clearPassword();
}
Our hashPassword()
method ends on this try{} catch(){}
block. In it, we first get the algorithm that we defined earlier ("PBKDF2WithHmacSHA512") and we use that algorithm to hash the plaintext password, according to the specifications laid out in spec
. generateSecret()
returns a SecretKey object, which is "an opaque representation of the cryptographic key", meaning that it contains only the hashed password and no other identifying information. We use getEncoded()
to get the hashed password as a byte[]
and save it as securePassword
.
If this all goes off without a hitch, we encode that byte[]
in base-64 (so it's composed only of printable ASCII characters) and return it as a String
. We do this so that the hashed password can be saved in a database as a character string without any encoding issues.
If there are any Exceptions
during the encryption process, we return an empty Optional
. Otherwise, we finish the method by clearing the password from spec
. Now, there are no references left to the original, plaintext password in this method. (Note that the finally
block is executed whether or not there is an Exception
and before anything is return
ed from any preceding try
or catch
blocks, so it's safe to have it at the end like this -- the password will be cleared from spec
.)
The last thing to do is write a small method which determines whether or not a given plaintext password generates the hashed password, when the same salt is used. In other words, we want a function that tells us if the entered password is correct or not:
public static boolean verifyPassword (String password, String key, String salt) {
Optional<String> optEncrypted = hashPassword(password, salt);
if (!optEncrypted.isPresent()) return false;
return optEncrypted.get().equals(key);
}
The above uses the original salt
string and a plaintext password
in question to generate a hashed password which is compared against the previously-generated hash key
. This method returns true
only if the password
is correct and there were no errors re-hashing the plaintext password.
Okay, now that we have all of this, let's test it! (Note: I have these methods defined in a utility class called PasswordUtils
, in a package called watson
.)
jshell> import watson.*
jshell> String salt = PasswordUtils.generateSalt(512).get()
salt ==> "DARMFcJcJDeNMmNMLkZN4rSnHV2OQPDd27yi5fYQ77r2vKTa ... Wt9QZog0wtkx8DQYEAOOwQVs="
jshell> String password = "Of Salesmen!"
password ==> "Of Salesmen!"
jshell> String key = PasswordUtils.hashPassword(password, salt).get()
key ==> "djaaKTM/+X14XZ6rxjN68l3Zx4+5WGkJo3nAs7KzjISiT6aa ... sN5DcmOeMfhqMGCNxq6TIhg=="
jshell> PasswordUtils.verifyPassword("Of Salesmen!", key, salt)
$5 ==> true
jshell> PasswordUtils.verifyPassword("By-Tor! And the Snow Dog!", key, salt)
$6 ==> false
Works like a charm! Go forth and hash!