“The Incredible Non-Uniqueness of an MD5 Hash”
|From:||Seth Dillingham||In Response To:||Top of Thread.|
|Date Posted:||Tuesday, March 5, 2002 12:28:51 PM||Replies:||4|
Someone told me that he was concerned that MD5 hashes might not be unique enough for the code I'm writing for him.
Briefly: I need a fixed-length, simple string to refer to pages on the web. URL's are too long, especially with all of the search args that some pages require to be sure that you're referring to a specific page.
My solution was to hash the URL with an MD5 algorithm. The MD5 algorithm available in Frontier (and Radio) produces a 32 character hash.
As I said, this person was concerned that MD5 hashes might not be "unique enough," especially since the hash it produces is only 32 characters long.
How many different "strings" (bits of text of any length) can be represented by a 32 byte MD5 hash? What's really being asked is, "How likely is it that we'll have two URL's which produce the same MD5 hash value?" So that makes it a simple math problem: determine how many different 32-character combinations are possible with the characters used in an MD5 hash.
The hash uses 0-9 and a-f, for 16 characters. So, 32 places each containing any of 16 characters... or 16^32.
Approximately 340,282,366,920,938,463,463,374,607,431,768,211,000 unique combinations of a 32 byte MD5 hash (if I did my math right). My calculator shows "3.402824e+38". (Please don't ask me to "speak" that number. I stopped counting after the octillions.)
So, the odds of running into two URL's which produce the same MD5 hash are 1 in that loong number. Not very likely at all. Certainly not a guarantee, since there are an infinite number of possible URLs and therefore there are an infinite number which could all produce the same MD5 hash string. In reality its safe to say that we'll never see it happen.
There are no trackbacks.
is Seth Dillingham's
personal web site.
More than the sum of my parts.