“RE: The Incredible Non-Uniqueness of an MD5 Hash”

From: Seth Dillingham
In Response To: 1747 Re: The Incredible Non-Uniqueness of an MD5 Hash
Date Posted: Wednesday, March 6, 2002 12:49:09 AM
Replies: 1
Enclosures: None.

On 3/5/02, Greg Pierce said:

>You do have a bit of a catch-22 here, because two URLs could
>refer to the exact same page, if you're including args in
>your equation...ie,
>
>http://[host]/[path]?arg1=x&arg2=y
>
>is programmatically the same thing as:
>
>http://[host]/[path]?arg2=y&arg1=x
>
>They would, however, generate different hashes...

That's not a catch-22, actually.

I didn't say that the object referred to by the ID had to be unique, I said only that the ID itself had to be unique.

This is a sort of web page indexing system. If more than one URL refers to exactly the same page, it's up to the crawler to determine that.
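For illustration only, here is a rough Python sketch of one check a crawler might make before hashing: sorting the query arguments so that the two URLs in Greg's example compare equal. The function name canonical_url is made up for this example, and the idea that argument order never matters is an assumption; some servers do treat it as significant.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def canonical_url(url):
    """Hypothetical helper: return the URL with its query arguments sorted."""
    parts = urlsplit(url)
    # Assumption: the server ignores argument order, so sorting is safe.
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(pairs)))

a = canonical_url("http://example.com/page?arg1=x&arg2=y")
b = canonical_url("http://example.com/page?arg2=y&arg1=x")
same_page = (a == b)
```

The ID scheme itself doesn't depend on this step. It only reduces how often two IDs end up pointing at the same page.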

The reason I needed to use a hashing algorithm is that some URLs are too long for Frontier's table names. The names are limited to 255 characters, but (in spite of what others have said) there is no official limit to the length of a WWW URL.

Using the MD5 hash allows me to squeeze a URL of any length into a unique-in-practice string of 32 characters, which is *perfect* for my needs.
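As a minimal sketch of the idea (in Python rather than Frontier, and with the function name url_id and the example.com URLs invented for the example), the ID is just the hex form of the MD5 digest, which is always 32 characters no matter how long the URL is:

```python
import hashlib

def url_id(url):
    """Hypothetical helper: return the 32-character hex MD5 digest of a URL."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()

short_id = url_id("http://example.com/")
# A URL far longer than the 255-character table-name limit still yields 32 characters.
long_id = url_id("http://example.com/search?q=" + "x" * 5000)
```

Both IDs fit comfortably in a table name, and as Greg pointed out, two URLs that differ only in argument order get different IDs.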

As I mentioned in the original piece, the effectively random distribution of the characters in the MD5 hash also allows me to distribute the items evenly over a space of virtually any size. (Instead of a single table containing items for each ID, I'll have a table with 16 subtables (one per hex digit, 0-9 and a-f) and each of those will also have 16 subtables. If it's determined that more depth is needed, we could easily take it another level deep, and another, and another, etc.)
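The nesting can be sketched like this, again in Python and under assumptions: the name bucket_path, the default depth of 2, and the list-of-names return value are all invented for the example and say nothing about how the Frontier tables are actually addressed. Each leading hex digit of the ID picks one of 16 subtables at that level.

```python
import hashlib
from typing import List

def bucket_path(url: str, depth: int = 2) -> List[str]:
    """Hypothetical helper: subtable names for a URL, ending with the full ID."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    # One subtable per leading hex digit, then the item itself.
    return list(digest[:depth]) + [digest]

path = bucket_path("http://example.com/")
deeper = bucket_path("http://example.com/", depth=3)
```

With a depth of 2 there are 16 x 16 = 256 leaf tables, and each extra level multiplies that by 16.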

Wow, I'm amazed at the amount of traffic this subject generated. I don't even think anybody else pointed to it! Thank you for the discussion, everybody.

Seth


TruerWords
is Seth Dillingham's
personal web site.
Truer words were never spoken.