How to implement the Hashable Protocol in Swift for an Int array (a custom string struct)
I am making a structure that acts like a String
, except that it only deals with Unicode UTF-32 scalar values. Thus, it is an array of UInt32
. (See this question for more background.)
What I want to do
I want to be able to use my custom ScalarString
struct as a key in a dictionary. For example:
var suffixDictionary = [ScalarString: ScalarString]() // Unicode key, rendered glyph value
// populate dictionary
suffixDictionary[keyScalarString] = valueScalarString
// ...
// check if dictionary contains Unicode scalar string key
if let renderedSuffix = suffixDictionary[unicodeScalarString] {
// do something with value
}
Problem
In order to do that, ScalarString
needs to implement the Hashable Protocol. I thought I would be able to do something like this:
struct ScalarString: Hashable {
private var scalarArray: [UInt32] = []
var hashValue : Int {
get {
return self.scalarArray.hashValue // error
}
}
}
func ==(left: ScalarString, right: ScalarString) -> Bool {
return left.hashValue == right.hashValue
}
but then I discovered that Swift arrays don't have a hashValue
.
What I read
The article Strategies for Implementing the Hashable Protocol in Swift had a lot of great ideas, but I didn't see any that seemed like they would work well in this case. Specifically,
-
Object property (array is does not have
hashValue
) - ID property (not sure how this could be implemented well)
- Formula (seems like any formula for a string of 32 bit integers would be processor heavy and have lots of integer overflow)
- ObjectIdentifier (I'm using a struct, not a class)
- Inheriting from NSObject (I'm using a struct, not a class)
Here are some other things I read:
- Implementing Swift's Hashable Protocol
- Swift Comparison Protocols
- Perfect hash function
- Membership of custom objects in Swift Arrays and Dictionaries
- How to implement Hashable for your custom class
- Writing a good Hashable implementation in Swift
Question
Swift Strings have a hashValue
property, so I know it is possible to do.
How would I create a hashValue
for my custom structure?
Updates
Update 1: I would like to do something that does not involve converting to String
and then using String
's hashValue
. My whole point for making my own structure was so that I could avoid doing lots of String
conversions. String
gets it's hashValue
from somewhere. It seems like I could get it using the same method.
Update 2: I've been looking into the implementation of string hash codes algorithms from other contexts. I'm having a little difficulty knowing which is best and expressing them in Swift, though.
- Java
hashCode
algorithm - C algorithms
- hash function for string (SO question and answers in C)
- Hashing tutorial (Virginia Tech Algorithm Visualization Research Group)
- General Purpose Hash Function Algorithms
Update 3
I would prefer not to import any external frameworks unless that is the recommended way to go for these things.
I submitted a possible solution using the DJB Hash Function.
Solution 1:
Update
Martin R writes:
As of Swift 4.1, the compiler can synthesize
Equatable
andHashable
for types conformance automatically, if all members conform to Equatable/Hashable (SE0185). And as of Swift 4.2, a high-quality hash combiner is built-in into the Swift standard library (SE-0206).Therefore there is no need anymore to define your own hashing function, it suffices to declare the conformance:
struct ScalarString: Hashable, ... { private var scalarArray: [UInt32] = [] // ... }
Thus, the answer below needs to be rewritten (yet again). Until that happens refer to Martin R's answer from the link above.
Old Answer:
This answer has been completely rewritten after submitting my original answer to code review.
How to implement to Hashable protocol
The Hashable protocol allows you to use your custom class or struct as a dictionary key. In order to implement this protocol you need to
- Implement the Equatable protocol (Hashable inherits from Equatable)
- Return a computed
hashValue
These points follow from the axiom given in the documentation:
x == y
impliesx.hashValue == y.hashValue
where x
and y
are values of some Type.
Implement the Equatable protocol
In order to implement the Equatable protocol, you define how your type uses the ==
(equivalence) operator. In your example, equivalence can be determined like this:
func ==(left: ScalarString, right: ScalarString) -> Bool {
return left.scalarArray == right.scalarArray
}
The ==
function is global so it goes outside of your class or struct.
Return a computed hashValue
Your custom class or struct must also have a computed hashValue
variable. A good hash algorithm will provide a wide range of hash values. However, it should be noted that you do not need to guarantee that the hash values are all unique. When two different values have identical hash values, this is called a hash collision. It requires some extra work when there is a collision (which is why a good distribution is desirable), but some collisions are to be expected. As I understand it, the ==
function does that extra work. (Update: It looks like ==
may do all the work.)
There are a number of ways to calculate the hash value. For example, you could do something as simple as returning the number of elements in the array.
var hashValue: Int {
return self.scalarArray.count
}
This would give a hash collision every time two arrays had the same number of elements but different values. NSArray
apparently uses this approach.
DJB Hash Function
A common hash function that works with strings is the DJB hash function. This is the one I will be using, but check out some others here.
A Swift implementation provided by @MartinR follows:
var hashValue: Int {
return self.scalarArray.reduce(5381) {
($0 << 5) &+ $0 &+ Int($1)
}
}
This is an improved version of my original implementation, but let me also include the older expanded form, which may be more readable for people not familiar with reduce
. This is equivalent, I believe:
var hashValue: Int {
// DJB Hash Function
var hash = 5381
for(var i = 0; i < self.scalarArray.count; i++)
{
hash = ((hash << 5) &+ hash) &+ Int(self.scalarArray[i])
}
return hash
}
The &+
operator allows Int
to overflow and start over again for long strings.
Big Picture
We have looked at the pieces, but let me now show the whole example code as it relates to the Hashable protocol. ScalarString
is the custom type from the question. This will be different for different people, of course.
// Include the Hashable keyword after the class/struct name
struct ScalarString: Hashable {
private var scalarArray: [UInt32] = []
// required var for the Hashable protocol
var hashValue: Int {
// DJB hash function
return self.scalarArray.reduce(5381) {
($0 << 5) &+ $0 &+ Int($1)
}
}
}
// required function for the Equatable protocol, which Hashable inheirits from
func ==(left: ScalarString, right: ScalarString) -> Bool {
return left.scalarArray == right.scalarArray
}
Other helpful reading
- Which hashing algorithm is best for uniqueness and speed?
- Overflow Operators
- Why are 5381 and 33 so important in the djb2 algorithm?
- How are hash collisions handled?
Credits
A big thanks to Martin R over in Code Review. My rewrite is largely based on his answer. If you found this helpful, then please give him an upvote.
Update
Swift is open source now so it is possible to see how hashValue
is implemented for String
from the source code. It appears to be more complex than the answer I have given here, and I have not taken the time to analyze it fully. Feel free to do so yourself.
Solution 2:
Edit (31 May '17): Please refer to the accepted answer. This answer is pretty much just a demonstration on how to use the CommonCrypto
Framework
Okay, I got ahead and extended all arrays with the Hashable
protocol by using the SHA-256 hashing algorithm from the CommonCrypto framework. You have to put
#import <CommonCrypto/CommonDigest.h>
into your bridging header for this to work. It's a shame that pointers have to be used though:
extension Array : Hashable, Equatable {
public var hashValue : Int {
var hash = [Int](count: Int(CC_SHA256_DIGEST_LENGTH) / sizeof(Int), repeatedValue: 0)
withUnsafeBufferPointer { ptr in
hash.withUnsafeMutableBufferPointer { (inout hPtr: UnsafeMutableBufferPointer<Int>) -> Void in
CC_SHA256(UnsafePointer<Void>(ptr.baseAddress), CC_LONG(count * sizeof(Element)), UnsafeMutablePointer<UInt8>(hPtr.baseAddress))
}
}
return hash[0]
}
}
Edit (31 May '17): Don't do this, even though SHA256 has pretty much no hash collisions, it's the wrong idea to define equality by hash equality
public func ==<T>(lhs: [T], rhs: [T]) -> Bool {
return lhs.hashValue == rhs.hashValue
}
This is as good as it gets with CommonCrypto
. It's ugly, but fast and not manypretty much no hash collisions for sure
Edit (15 July '15): I just made some speed tests:
Randomly filled Int
arrays of size n took on average over 1000 runs
n -> time
1000 -> 0.000037 s
10000 -> 0.000379 s
100000 -> 0.003402 s
Whereas with the string hashing method:
n -> time
1000 -> 0.001359 s
10000 -> 0.011036 s
100000 -> 0.122177 s
So the SHA-256 way is about 33 times faster than the string way. I'm not saying that using a string is a very good solution, but it's the only one we can compare it to right now
Solution 3:
It is not a very elegant solution but it works nicely:
"\(scalarArray)".hashValue
or
scalarArray.description.hashValue
Which just uses the textual representation as a hash source