Creating your own Tinyurl style uid
Solution 1:
Why do you want to use a random function? I always assumed that tinyurl used a base 62 (0-9A-Za-z) representation of a sequential Id. No clashes and the urls are always as short as possible.
You would have a DB table like
Id URL
1 http://google.com
2 ...
... ...
156 ...
... ...
and the corresponding URLs would be:
http://example.com/1
http://example.com/2
...
http://example.com/2W
...
Solution 2:
Look up the Birthday Paradox, it's the exact problem that you're running into.
The question is: How many people do you need to get together in a room, so that you have a 50% chance of any two people having the same birthdate? The answer may surprise you.
Solution 3:
Some time ago I did exactly this, and I followed the way Sklivvz mentioned. The whole logic was developed with a SQL server stored procedure and a couple of UDF (user defined functions). The steps were:
- say that you want to shorten this url: Creating your own Tinyurl style uid
- Insert the URL in a table
- Obtain the @@identity value of the last insert (a numeric id)
- Transform the id in a corresponding alphanumeric value, based on a "domain" of letters and numbers (I actually used this set: "0123456789abcdefghijklmnopqrstuvwxyz")
- Return that value back, something like 'cc0'
The conversion was realized thru a couple of very short UDF.
Two conversion called one after the other would return "sequential" values like these:
select dbo.FX_CONV (123456) -- returns "1f5n"
select dbo.FX_CONV (123457) -- returns "1f5o"
If you are interested I can share the UDF's code.
Solution 4:
The probability of a collision against one specific ID is:
p = ( 0.5 * ( (0.5*1/10) + (0.5*1/26) ) )^6
which is around 1.7×10^-9.
The probability of a collision after generating n IDs is 1-p^n, so you'll have roughly a 0.17% chance of a collision for each new insertion after 1 million IDs have been inserted, around 1.7% after 10 million IDs, and around 16% after 100 million.
1000 IDs/minute works out to about 43 million/month, so as Sklivvz pointed out, using some incrementing ID is probably going to be a better way to go in this case.
EDIT:
To explain the math, he's essentially flipping a coin and then picking a number or letter 6 times. There's a 0.5 probability that the coin flip matches, and then 50% of the time there's a 1/10 chance of matching and a 50% chance of a 1/26 chance of matching. That happens 6 times independently, so you multiply those probabilities together.