A fast hash function for strings in C#
I want to hash strings of up to 30 characters, and speed is my main concern: the function will be called over 100 million times. What is the best way to do this? Currently I am using the following code:
using System;
using System.Linq; // needed for ElementAt

static UInt64 CalculateHash(string read, bool lowTolerance)
{
    UInt64 hashedValue = 0;
    int i = 0;
    while (i < read.Length)
    {
        hashedValue += read.ElementAt(i) * (UInt64)Math.Pow(31, i);
        if (lowTolerance) i += 2;
        else i++;
    }
    return hashedValue;
}
Solution 1:
static UInt64 CalculateHash(string read)
{
    UInt64 hashedValue = 3074457345618258791ul;
    for (int i = 0; i < read.Length; i++)
    {
        hashedValue += read[i];
        // Relies on UInt64 overflow wrapping, the default (unchecked) behavior in C#.
        hashedValue *= 3074457345618258799ul;
    }
    return hashedValue;
}
This is a Knuth-style multiplicative hash. You could also use one of Bob Jenkins's hash functions, such as one-at-a-time.
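For comparison, here is a sketch of Jenkins's one-at-a-time hash in C#. It produces 32 bits, and the canonical version mixes in one byte per step; this adaptation (my assumption, not Jenkins's reference code) mixes in one UTF-16 char per step, which gives the same result as the byte-wise version for ASCII strings. The class name JenkinsHash is hypothetical.

```csharp
// Hypothetical helper class; a sketch, not Jenkins's reference code.
static class JenkinsHash
{
    // One-at-a-time hash. The canonical version consumes one byte per step;
    // this adaptation consumes one UTF-16 char per step (identical for ASCII).
    // All arithmetic wraps modulo 2^32 (C# is unchecked by default).
    public static uint OneAtATime(string read)
    {
        uint hash = 0;
        foreach (char c in read)
        {
            hash += c;
            hash += hash << 10;
            hash ^= hash >> 6;
        }
        // Final avalanche steps.
        hash += hash << 3;
        hash ^= hash >> 11;
        hash += hash << 15;
        return hash;
    }
}
```

For example, JenkinsHash.OneAtATime("a") returns 0xca2e9442, the same value the byte-wise algorithm gives for the single byte 'a'. If you need a 64-bit result, the Knuth-style hash is the simpler choice.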
Solution 2:
First of all, consider using GetHashCode().
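A minimal sketch of the built-in hash used as a drop-in for CalculateHash. The wrapper class BuiltInHash is hypothetical. Keep in mind that string.GetHashCode() is only 32 bits wide and, on .NET Core and .NET 5+, is randomized per process, so it suits in-memory lookups but not hashes that are stored or compared across runs.

```csharp
// Hypothetical wrapper matching the question's UInt64 return type.
static class BuiltInHash
{
    // GetHashCode() returns a 32-bit int; casting through uint keeps the bit
    // pattern, so only the low 32 bits of the result are ever non-zero.
    // Randomized per process on .NET Core / .NET 5+: do not persist the value.
    public static ulong CalculateHash(string read)
    {
        return (uint)read.GetHashCode();
    }
}
```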
A simple improvement on your existing implementation:
static UInt64 CalculateHash(string read, bool lowTolerance)
{
    UInt64 hashedValue = 0;
    int i = 0;
    ulong multiplier = 1;
    while (i < read.Length)
    {
        hashedValue += read[i] * multiplier;
        multiplier *= 37;
        if (lowTolerance) i += 2;
        else i++;
    }
    return hashedValue;
}
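It avoids the expensive floating-point calculation and the overhead of ElementAt, which (because string does not implement IList<char>) enumerates the string from the start on every call.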
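By the way, (UInt64)Math.Pow(31, i) doesn't work well for longer strings. From i = 11 on, 31^i exceeds 2^53 and can no longer be represented exactly as a double, and from i = 13 on it exceeds UInt64.MaxValue, so the cast has an unspecified result (in an unchecked context the C# spec does not define out-of-range double-to-integer conversions; depending on runtime and platform you may get 0 or some other fixed value). Characters beyond that point contribute little or nothing useful to the hash.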
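To see the rounding concretely, the sketch below (class and method names are hypothetical) computes 31^i both with integer multiplication and through Math.Pow. Since 31^i is always odd while every double above 2^53 is an even integer, the two cannot agree from i = 11 on, even before the cast overflows at i = 13.

```csharp
using System;

// Hypothetical demo class comparing exact integer powers with Math.Pow.
static class PowPrecisionDemo
{
    // 31^i by repeated multiplication; wraps modulo 2^64 from i = 13 on
    // (unchecked by default), which is harmless for hashing.
    public static ulong IntPow31(int i)
    {
        ulong result = 1;
        for (int k = 0; k < i; k++)
            result *= 31;
        return result;
    }

    // The question's approach. For i >= 11 the double is already rounded;
    // for i >= 13 it exceeds UInt64.MaxValue and the cast result is unspecified.
    public static ulong PowViaDouble(int i) => (ulong)Math.Pow(31, i);
}
```

For instance, IntPow31(11) is 25408476896404831, while PowViaDouble(11) necessarily returns a different (even) number.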
Solution 3:
To speed up your implementation, replace the (UInt64)Math.Pow(31, i) call with a lookup: precompute a table of powers of 31 once and index into it at run time. Since strings are at most 30 characters long, only indices 0 through 29 are ever used:
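static class HashCalc
{
    // Powers 31^0 .. 31^29; values wrap modulo 2^64 (C# arithmetic is unchecked by default).
    private static readonly ulong[] Pow31 = new ulong[30];

    static HashCalc()
    {
        Pow31[0] = 1;
        for (int i = 1; i != Pow31.Length; i++)
        {
            Pow31[i] = 31 * Pow31[i - 1];
        }
    }
}

// In your hash function, inside HashCalc (use read[i] rather than ElementAt):
hashedValue += read[i] * Pow31[i];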
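The lookup table and the running multiplier from the previous answer compute the same polynomial, so either is a valid way to drop Math.Pow. The sketch below checks that equivalence; it assumes strings of at most 30 characters, the default unchecked arithmetic, and base 31 in both versions (the previous answer used 37). The class and method names are hypothetical.

```csharp
// Hypothetical class comparing a precomputed power table with a running
// multiplier (both base 31). Assumes strings of at most 30 chars and the
// default unchecked arithmetic, so products wrap modulo 2^64.
static class TableVsMultiplier
{
    private static readonly ulong[] Pow31 = BuildTable(30);

    private static ulong[] BuildTable(int size)
    {
        var table = new ulong[size];
        table[0] = 1;
        for (int i = 1; i < size; i++)
            table[i] = 31 * table[i - 1];
        return table;
    }

    // Lookup-table version: one array read per character.
    public static ulong HashWithTable(string read)
    {
        ulong hash = 0;
        for (int i = 0; i < read.Length; i++)
            hash += read[i] * Pow31[i];
        return hash;
    }

    // Running-multiplier version: one extra multiplication per character.
    public static ulong HashWithMultiplier(string read)
    {
        ulong hash = 0, multiplier = 1;
        for (int i = 0; i < read.Length; i++)
        {
            hash += read[i] * multiplier;
            multiplier *= 31;
        }
        return hash;
    }
}
```

For "ab" both return 97 + 98 * 31 = 3135. In a checked build (for example with the CheckForOverflowUnderflow compiler option enabled), the wrapping multiplications would throw OverflowException, so they would need an unchecked block.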