Need a smaller alternative to GUID for DB ID but still unique and random for URL
I have looked all of the place for this and I can't seem to get a complete answer for this. So if the answer does already exist on stackoverflow then I apologize in advance.
I want a unique and random ID so that users in my website can't guess the next number and just hop to someone else's information. I plan to stick to a incrementing ID for the primary key but to also store a random and unique ID (sort of a hash) for that row in the DB and put an index on it.
From my searching I realize that I would like to avoid collisions and I have read some mentions of SHA1.
My basic requirements are
- Something smaller than a GUID. (Looks horrible in URL)
- Must be unique
- Avoid collisions
- Not a long list of strange characters that are unreadable.
An example of what I am looking for would be www.somesite.com/page.aspx?id=AF78FEB
I am not sure whether I should be implementing this in the database (I am using SQL Server 2005) or in the code (I am using C# ASP.Net)
EDIT:
From all the reading I have done I realize that this is security through obscurity. I do intend having proper authorization and authentication for access to the pages. I will use .Net's Authentication and authorization framework. But once a legitimate user has logged in and is accessing a legimate (but dynamically created page) filled with links to items that belong to him. For example a link might be www.site.com/page.aspx?item_id=123. What is stopping him from clicking on that link, then altering the URL above to go www.site.com/page.aspx?item_id=456 which does NOT belong to him? I know some Java technologies like Struts (I stand to be corrected) store everything in the session and somehow work it out from that but I have no idea how this is done.
Solution 1:
Raymond Chen has a good article on why you shouldn't use "half a guid", and offers a suitable solution to generating your own "not quite guid but good enough" type value here:
GUIDs are globally unique, but substrings of GUIDs aren't
His strategy (without a specific implementiation) was based on:
- Four bits to encode the computer number,
- 56 bits for the timestamp, and
- four bits as a uniquifier.
We can reduce the number of bits to make the computer unique since the number of computers in the cluster is bounded, and we can reduce the number of bits in the timestamp by assuming that the program won’t be in service 200 years from now.
You can get away with a four-bit uniquifier by assuming that the clock won’t drift more than an hour out of skew (say) and that the clock won’t reset more than sixteen times per hour.
Solution 2:
UPDATE (4 Feb 2017):
Walter Stabosz discovered a bug in the original code. Upon investigation there were further bugs discovered, however, extensive testing and reworking of the code by myself, the original author (CraigTP) has now fixed all of these issues. I've updated the code here with the correct working version, and you can also download a Visual Studio 2015 solution here which contains the "shortcode" generation code and a fairly comprehensive test suite to prove correctness.
One interesting mechanism I've used in the past is to internally just use an incrementing integer/long, but to "map" that integer to a alphanumeric "code".
Example
Console.WriteLine($"1371 as a shortcode is: {ShortCodes.LongToShortCode(1371)}");
Console.WriteLine($"12345 as a shortcode is: {ShortCodes.LongToShortCode(12345)}");
Console.WriteLine($"7422822196733609484 as a shortcode is: {ShortCodes.LongToShortCode(7422822196733609484)}");
Console.WriteLine($"abc as a long is: {ShortCodes.ShortCodeToLong("abc")}");
Console.WriteLine($"ir6 as a long is: {ShortCodes.ShortCodeToLong("ir6")}");
Console.WriteLine($"atnhb4evqqcyx as a long is: {ShortCodes.ShortCodeToLong("atnhb4evqqcyx")}");
// PLh7lX5fsEKqLgMrI9zCIA
Console.WriteLine(GuidToShortGuid( Guid.Parse("957bb83c-5f7e-42b0-aa2e-032b23dcc220") ) );
Code
The following code shows a simple class that will change a long to a "code" (and back again!):
public static class ShortCodes
{
// You may change the "shortcode_Keyspace" variable to contain as many or as few characters as you
// please. The more characters that are included in the "shortcode_Keyspace" constant, the shorter
// the codes you can produce for a given long.
private static string shortcodeKeyspace = "abcdefghijklmnopqrstuvwxyz0123456789";
public static string LongToShortCode(long number)
{
// Guard clause. If passed 0 as input
// we always return empty string.
if (number == 0)
{
return string.Empty;
}
var keyspaceLength = shortcodeKeyspace.Length;
var shortcodeResult = "";
var numberToEncode = number;
var i = 0;
do
{
i++;
var characterValue = numberToEncode % keyspaceLength == 0 ? keyspaceLength : numberToEncode % keyspaceLength;
var indexer = (int) characterValue - 1;
shortcodeResult = shortcodeKeyspace[indexer] + shortcodeResult;
numberToEncode = ((numberToEncode - characterValue) / keyspaceLength);
}
while (numberToEncode != 0);
return shortcodeResult;
}
public static long ShortCodeToLong(string shortcode)
{
var keyspaceLength = shortcodeKeyspace.Length;
long shortcodeResult = 0;
var shortcodeLength = shortcode.Length;
var codeToDecode = shortcode;
foreach (var character in codeToDecode)
{
shortcodeLength--;
var codeChar = character;
var codeCharIndex = shortcodeKeyspace.IndexOf(codeChar);
if (codeCharIndex < 0)
{
// The character is not part of the keyspace and so entire shortcode is invalid.
return 0;
}
try
{
checked
{
shortcodeResult += (codeCharIndex + 1) * (long) (Math.Pow(keyspaceLength, shortcodeLength));
}
}
catch(OverflowException)
{
// We've overflowed the maximum size for a long (possibly the shortcode is invalid or too long).
return 0;
}
}
return shortcodeResult;
}
}
}
This is essentially your own baseX numbering system (where the X is the number of unique characters in the shortCode_Keyspace constant.
To make things unpredicable, start your internal incrementing numbering at something other than 1 or 0 (i.e start at 184723) and also change the order of the characters in the shortCode_Keyspace constant (i.e. use the letters A-Z and the numbers 0-9, but scamble their order within the constant string. This will help make each code somewhat unpredictable.
If you're using this to "protect" anything, this is still security by obscurity, and if a given user can observe enough of these generated codes, they can predict the relevant code for a given long. The "security" (if you can call it that) of this is that the shortCode_Keyspace constant is scrambled, and remains secret.
EDIT: If you just want to generate a GUID, and transform it to something that is still unique, but contains a few less characters, this little function will do the trick:
public static string GuidToShortGuid(Guid gooid)
{
string encoded = Convert.ToBase64String(gooid.ToByteArray());
encoded = encoded.Replace("/", "_").Replace("+", "-");
return encoded.Substring(0, 22);
}