Haskell type vs. newtype with respect to type safety [closed]
The main uses for newtypes are:
- For defining alternative instances for types.
- Documentation.
- Data/format correctness assurance.
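To make the first bullet concrete, here is a minimal sketch of a newtype that gives Int a second, reversed Ord instance. The names Descending and sortDescending are made up for illustration (base already ships Data.Ord.Down for this purpose).

```haskell
import Data.List (sort)

-- Hypothetical wrapper: same representation as Int, but ordered in reverse.
newtype Descending = Descending { unDescending :: Int }
  deriving (Show, Eq)

-- The alternative instance: comparison with the arguments flipped.
instance Ord Descending where
  compare (Descending a) (Descending b) = compare b a

-- Sort largest-first by wrapping, sorting with the new instance, unwrapping.
sortDescending :: [Int] -> [Int]
sortDescending = map unDescending . sort . map Descending
```

Int keeps its usual ordering everywhere else; only values wrapped in Descending pick up the reversed instance.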
I'm working on an application right now in which I use newtypes extensively. Newtypes
in Haskell are a purely compile-time concept. E.g. with the unwrappers below, unFilename (Filename "x")
compiles to the same code as "x". There is absolutely zero run-time hit, whereas there is one with data
types. This makes newtypes a very nice way to achieve the goals listed above.
-- | A file name (not a file path).
newtype Filename = Filename { unFilename :: String }
  deriving (Show, Eq)
I don't want to accidentally treat this as a file path. It's not a file path. It's the name of a conceptual file somewhere in the database.
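As a rough illustration of the "same representation, different type" point, the sketch below uses coerce from Data.Coerce to convert between [Filename] and [String]. The helper names filenamesToStrings and stringsToFilenames are my own, not part of the application described here.

```haskell
import Data.Coerce (coerce)

-- | A file name (not a file path).
newtype Filename = Filename { unFilename :: String }
  deriving (Show, Eq)

-- Filename and String share a run-time representation, so coerce can
-- convert a whole list without mapping the unwrapper over it.
filenamesToStrings :: [Filename] -> [String]
filenamesToStrings = coerce

stringsToFilenames :: [String] -> [Filename]
stringsToFilenames = coerce

-- The type checker still keeps the two apart. This would be rejected,
-- because readFile expects a FilePath (a String), not a Filename:
--   readFile (Filename "x")
```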
It's very important for algorithms to refer to the right thing, and newtypes help with this. It's also very important for security. For example, consider uploading files to a web application. I have these types:
-- | A sanitized (safe) filename.
newtype SanitizedFilename =
  SanitizedFilename { unSafe :: String } deriving Show
-- | Unique, sanitized filename.
newtype UniqueFilename =
  UniqueFilename { unUnique :: SanitizedFilename } deriving Show
-- | An uploaded file.
data File = File {
   file_name     :: String         -- ^ Uploaded file.
  ,file_location :: UniqueFilename -- ^ Saved location.
  ,file_type     :: String         -- ^ File type.
  } deriving (Show)
Suppose I have this function which cleans a filename from a file that's been uploaded:
-- Needs: import Data.Char (isDigit, isLetter)
-- | Sanitize a filename for saving to upload directory.
sanitizeFilename :: String            -- ^ Arbitrary filename.
                 -> SanitizedFilename -- ^ Sanitized filename.
sanitizeFilename = SanitizedFilename . filter ok where
  ok c = isDigit c || isLetter c || elem c "-_."
Now from that I generate a unique filename:
-- | Generate a unique filename.
uniqueFilename :: SanitizedFilename -- ^ Sanitized filename.
               -> IO UniqueFilename -- ^ Unique filename.
It's dangerous to generate a unique filename from an arbitrary filename; it should be sanitized first. Because a UniqueFilename can only be built from a SanitizedFilename, a unique filename is always safe by extension. I can now save the file to disk and put that filename in my database if I want to.
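Here is a self-contained sketch of the whole pipeline. The body of uniqueFilename isn't shown above, so uniqueFilenameWith below is a hypothetical stand-in that takes an explicit counter and prefixes its value; the real function may well use timestamps, random names or a database sequence instead.

```haskell
import Data.Char (isDigit, isLetter)
import Data.IORef (IORef, atomicModifyIORef', newIORef)

newtype SanitizedFilename =
  SanitizedFilename { unSafe :: String } deriving Show

newtype UniqueFilename =
  UniqueFilename { unUnique :: SanitizedFilename } deriving Show

-- Keep only digits, letters and the characters '-', '_' and '.'.
sanitizeFilename :: String -> SanitizedFilename
sanitizeFilename = SanitizedFilename . filter ok
  where
    ok c = isDigit c || isLetter c || c `elem` "-_."

-- Hypothetical: uniqueness comes from a counter supplied by the caller.
-- The prefix consists of digits and '-', so the result stays sanitized.
uniqueFilenameWith :: IORef Int -> SanitizedFilename -> IO UniqueFilename
uniqueFilenameWith counter (SanitizedFilename name) = do
  n <- atomicModifyIORef' counter (\i -> (i + 1, i))
  pure (UniqueFilename (SanitizedFilename (show n ++ "-" ++ name)))
```

Passing a raw String to uniqueFilenameWith is a type error, so the only way to get a UniqueFilename is to go through sanitizeFilename first.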
But it can also be annoying to have to wrap/unwrap a lot. In the long run, I see it as worth it, especially for avoiding value mismatches. ViewPatterns help somewhat:
-- Needs: {-# LANGUAGE ViewPatterns #-}
-- | Get the form fields for a form.
formFields :: ConferenceId -> Controller [Field]
formFields (unConferenceId -> cid) = getFields where
  ... code using cid ..
Maybe you'll say that unwrapping it in a function is a problem: what if you pass cid
to the wrong function? That's not an issue, because all functions using a conference id will take the ConferenceId type. What emerges is a sort of function-to-function contract system that is enforced at compile time. Pretty nice. So yeah, I use it as often as I can, especially in big systems.
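For anyone who hasn't used the extension, here is a small compilable sketch of the view-pattern unwrapping next to plain constructor matching. ConferenceId wrapping an Int and the conferencePath functions are assumptions for the example; the real type and the Controller code aren't shown in this answer.

```haskell
{-# LANGUAGE ViewPatterns #-}

-- Assumed definition, for illustration only.
newtype ConferenceId = ConferenceId { unConferenceId :: Int }
  deriving (Show, Eq)

-- The view pattern applies unConferenceId to the argument and binds
-- the result to cid.
conferencePath :: ConferenceId -> String
conferencePath (unConferenceId -> cid) = "/conference/" ++ show cid

-- The same function without the extension, matching on the constructor.
conferencePath' :: ConferenceId -> String
conferencePath' (ConferenceId cid) = "/conference/" ++ show cid
```

Either way, the bare Int only exists inside the function body; callers still have to supply a ConferenceId.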
I think this mostly depends on the situation.
Consider pathnames. The standard Prelude defines "type FilePath = String" because, as a matter of convenience, you want to have access to all the string and list operations. If you had "newtype FilePath = FilePath String" then you would need filePathLength, filePathMap and so on, or else you would forever be using conversion functions.
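A quick sketch of that trade-off. The extension function is a deliberately naive example of mine (it returns the whole path when there is no dot), and Path is a hypothetical newtype standing in for the "newtype FilePath" alternative.

```haskell
-- FilePath is a synonym for String, so list functions apply directly.
extension :: FilePath -> String
extension = reverse . takeWhile (/= '.') . reverse

-- Hypothetical newtype version: every list operation needs its own
-- wrapper (or an explicit unwrap at each call site).
newtype Path = Path String
  deriving (Show, Eq)

pathLength :: Path -> Int
pathLength (Path p) = length p
```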
On the other hand, consider SQL queries. SQL injection is a common security hole, so it makes sense to have something like
newtype Query = Query String
and then add extra functions that will convert a string into a query (or query fragment) by escaping quote characters, or fill in blanks in a template in the same way. That way you can't accidentally convert a user parameter into a query without going through the quote escaping function.
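Something along these lines, as a minimal sketch. The names sql, quoted and queryText are made up, and doubling single quotes is only a simplified stand-in for whatever escaping your database actually requires. In a real module you would export the helper functions but not the Query constructor.

```haskell
newtype Query = Query String
  deriving (Show, Eq)

-- Queries can be concatenated, but only with other queries.
instance Semigroup Query where
  Query a <> Query b = Query (a ++ b)

-- Trusted SQL text written by the programmer.
sql :: String -> Query
sql = Query

-- Untrusted text becomes a quoted literal; single quotes are doubled.
quoted :: String -> Query
quoted s = Query ("'" ++ concatMap escape s ++ "'")
  where
    escape '\'' = "''"
    escape c    = [c]

-- Extract the final text to hand to the database layer.
queryText :: Query -> String
queryText (Query q) = q
```

With this, sql "SELECT * FROM users WHERE name = " <> quoted userInput type checks, while appending userInput directly does not, because a String is not a Query.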