Complex topics explained simply: Hashing
Why is it that websites require you to reset your password, instead of just reminding you what it is? After all, they check your password when you enter it, so can't they just look it up? It turns out the answer is no - well-built websites actually don't know your password, and instead use a technique called "hashing" to check if what you log in with matches what you set when you created your account.
This is strange! It seems like if I'm going to tell you whether or not something matches your password, I have to know what your password is, right? Well, what about this: say your password is "rabbit", but rather than writing down the word "rabbit" itself, I write down that your password:
- is an English word,
- starts with an 'r',
- has 6 letters,
- contains two b's,
- and the third letter comes just after the second letter in the alphabet.
Now if you came and told me your password was "frog", I'd say nope, just like if you told me your password was "trophy" or "ribbed". But if you asked me what your password was, I couldn't tell you - I can only tell you whether some word you give me matches the formula I have written down. In essence, I've created a "hash" of your password.
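The formula above can be written out as a small checker. This is only a toy sketch in Python, not how real systems hash passwords: the function name `matches_formula` and the tiny `ENGLISH_WORDS` set are made up for illustration, and the set stands in for a real dictionary.

```python
# A tiny stand-in for an English dictionary (an assumption for this sketch).
ENGLISH_WORDS = {"rabbit", "frog", "trophy", "ribbed"}


def matches_formula(word: str) -> bool:
    """Return True if the word fits every rule we wrote down for the password."""
    word = word.lower()
    return (
        word in ENGLISH_WORDS                   # is an English word
        and word.startswith("r")                # starts with an 'r'
        and len(word) == 6                      # has 6 letters
        and word.count("b") == 2                # contains two b's
        and ord(word[2]) == ord(word[1]) + 1    # third letter comes just after the second
    )


for guess in ["frog", "trophy", "ribbed", "rabbit"]:
    print(guess, matches_formula(guess))
```

Only "rabbit" prints True. Notice that the checker never contains the word "rabbit" as the answer; it only contains the rules.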
Now, our "hash" isn't a particularly good one - if you're clever, you could probably come up with another word besides "rabbit" that would match the formula we wrote down (try it out!). In hashing terms, this is called a "collision". Interestingly, while computers use a lot of complex math that makes collisions very, very rare, it is possible that you could go to Facebook, enter something other than your password, and get in. You could spend a million lifetimes trying and probably never find anything, but it's not impossible - after all, Facebook doesn't know your exact password, it only knows the hash - a formula about what your password "looks like" to a computer.
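To see a collision happen, here is a minimal sketch comparing a deliberately weak hash with SHA-256 from Python's standard `hashlib` module. The `toy_hash` function is invented for this example (it just adds up character codes), so any two words with the same letters in a different order collide.

```python
import hashlib


def toy_hash(word: str) -> int:
    """A deliberately weak hash: add up the character codes, keep the last 3 digits."""
    return sum(ord(ch) for ch in word) % 1000


def sha256_hex(word: str) -> str:
    """A real cryptographic hash, shown only for comparison."""
    return hashlib.sha256(word.encode("utf-8")).hexdigest()


# Same letters, different order: the toy hash cannot tell them apart.
print(toy_hash("listen") == toy_hash("silent"))        # True, a collision
# SHA-256 gives the two words different hashes.
print(sha256_hex("listen") == sha256_hex("silent"))    # False
```

Real hash functions can still collide in principle, since there are more possible inputs than possible hashes, but nobody is expected to stumble onto one by typing guesses into a login box.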
The other reason why our "hash" isn't particularly good is that given the formula we wrote down, you could probably figure out what the original password was. In hashing terms, this is called "reversing" or "cracking" the hash. It's not as easy as you might think to reverse a hash though - the formula we wrote seems easy because you know that we started out with the password "rabbit," but try coming up with the password given this hash:
- is an English word,
- contains 4 of the 6 vowels,
- has 8 common English words "inside" it (e.g. "bit" is a word inside "rabbit", but "rat" is not),
- is 8 letters long,
- has no duplicate letters,
- and no other 8-letter word can be made by rearranging its letters.
See if you can come up with the answer yourself. As it turns out, the way that you probably tried to figure it out is much like how computers crack passwords: come up with a list of likely words and try them out to see if they match. Security folks call this a "dictionary attack" (a "rainbow table" is a related trick, where the hashes of likely passwords are worked out ahead of time), and it's why people recommend that you use long, hard-to-guess passwords. Just like in trying to avoid collisions, computers use a lot of complex math to make it very, very difficult to reverse hashes, and many of the modern ones even protect against dictionary attacks by making it hard to guess a bunch of passwords at once. After all, if you could get your hands on a hash and reverse it, you could steal someone's password and log into their account!
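The guess-and-check approach looks roughly like this in Python. It is a sketch under simplifying assumptions: the stolen hash is a plain SHA-256 of the password (which well-built sites avoid for exactly this reason), and the `likely_words` list and `dictionary_attack` function are made up for the example.

```python
import hashlib
from typing import Iterable, Optional


def sha256_hex(password: str) -> str:
    """Hash a password with plain SHA-256 (assumed here only for simplicity)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash each candidate and return the first one that matches, or None."""
    for word in candidates:
        if sha256_hex(word) == target_hash:
            return word
    return None


# A hypothetical list of likely passwords; real lists hold millions of entries.
likely_words = ["password", "letmein", "frog", "rabbit", "trophy"]

stolen_hash = sha256_hex("rabbit")
print(dictionary_attack(stolen_hash, likely_words))                        # rabbit
print(dictionary_attack(sha256_hex("x9!Tq-long-and-odd"), likely_words))   # None
```

The attacker never reverses the math. They only win if your password happens to be on their list, which is why a long, unusual password holds up.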
There are many different ways to create a hash of a password, and the best methods make it incredibly unlikely to have a collision and unfathomably difficult to reverse. What they all have in common is that they allow computers to recognize passwords without having to store the password itself - so if you call your bank and ask them to remind you what you chose for your password, it's not that they just won't tell you your password, it's that they actually can't.
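As a rough picture of what a site might keep on file, here is a minimal sketch using PBKDF2 from Python's standard `hashlib` module. The function names, the 16-byte random "salt" mixed into each password, and the iteration count are illustrative assumptions, not a recommendation for any particular system.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 100_000  # illustrative work factor (an assumption for this sketch)


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest); these two values are all that gets stored."""
    salt = os.urandom(16)  # random per-password value, so equal passwords hash differently
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(attempt: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare the results."""
    attempt_digest = hashlib.pbkdf2_hmac("sha256", attempt.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(attempt_digest, stored_digest)


salt, stored = hash_password("rabbit")
print(check_password("rabbit", salt, stored))   # True
print(check_password("rabbis", salt, stored))   # False
```

The password itself is thrown away after `hash_password` runs. All the site can do later is hash a new attempt and see whether it matches.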