Creating unique labels in Haskell
I'm writing a compiler for a simple imperative language in Haskell, outputting Java bytecode. I've gotten to the point where I'm emitting an abstract representation of bytecodes.
While writing code for compiling if-statements I ran in to some trouble. To implement if-statements I need labels to jump to. Therefore I need to generate a name for that label, and that name needs to be unique.
My first thought was to thread some state through compileStatement
, i.e
compileStatement :: Statement -> UniqueIDState -> [AbstractInstruction]
Of course, compilerStatement
is recursive, so using this method would require me to pass the state of the unique ID generator back upp from the recursive calls:
compileStatement :: Statement -> UniqueIDState -> (UniqueIdState, [AbstractInstruction])
This seems a bit clumsy, especially if I realize I need to carry around more state in the future; is there a more elegant way?
Solution 1:
You need a "unique supply". The usual way to do this in Haskell is by threading a counter through the State monad, which automates the plumbing problem you describe.
Solution 2:
I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.
Abraham Maslow.
How about something different - a unique-supply that isn't a member of the Monad
class. As it happens, you were almost there with your original type signature:
compileStatement :: Statement -> UniqueIDState -> [AbstractInstruction]
If the only requirement is that each label is unique - no need to count how many were used, providing the same identifiers given the same circumstances, etc - there's a less-invasive technique you can use.
In IO-free spillable splittable supplies, Luke Palmer shows how value supplies can be encapsulated:
runSupply :: (forall a . Eq a => Supply a -> b) -> b
This avoids having the monadic IO
type taint large parts of the programs which use them: nice! But that isn't the only problem - depending on how they're defined, the onus is on you to use such supplies correctly. For example, assuming:
data Statement =
... | If Statement Statement Statement | ...
then if:
compileStatement (If c t e) s =
case split s of
s1 : s2 : s3 : _ -> buildCondJump (compileStatement c s1)
(compileStatement t s2)
(compileStatement e s3)
is mistakenly changed to:
compileStatement (If c t e) s =
case split s of
s1 : s2 : s3 : _ -> buildCondJump (compileStatement c s)
(compileStatement t s)
(compileStatement e s)
not only are UniqueSupply
and Unique
values being erroneously reused, there's the potential for a space leak if any of the recursive calls to compileStatement
uses the supply intensively.
Unlike Clean, Haskell has no standard way of marking types as monousal. That leaves checks at runtime as the only option: definitely a job for an abstract data type!
Here's a thought - if that ADT was also spittable splittable, we could be able to use it to define an alternate value-supply type. All going well, the values of this new type would then have both properties: slippable splittable and monousal.
Looking at Data.Supply
reveals the use of a binary-tree type - the module and definitions seem to be based on the following example from the functional pearl [On generating unique names], written by Lennart Augustsson, Mikael Rittri and Dan Synek - from page 4 of 7:
module HideGensym(
Name, NameSupply, initialNameSupply, getNameDeplete, splitNameSupply)
where
gensym :: a -> Int -- implemented in assembler
data Name = MkName Int deriving (Eq)
data NameSupply = MkNameSupply Name NameSupply NameSupply
initialNameSupply = gen ()
where gen x = MkNameSupply (MkName (gensym x)) (gen x) (gen x)
getNameDeplete (MkNameSupply n s1 _) = (n, s1)
splitNameSupply (MkNameSupply _ s1 s2) = (s1, s2)
...gensym
: we'll leave that one for now. Let's look at how we can insert the new ADT into NameSupply
...after we attend to a more mundane matter: - two identical calls to gen
- MkNameSupply ... (gen x) (gen x)
- to an optimising Haskell implementation, they're the same value:
-- same function, same argument, same result: what's the matter?
initialNameSupply = gen ()
where gen x = let s = gen x in
MkNameSupply (MkName (gensym x)) s s
Then again, maybe we can solve both problems at once:
initialNameSupply = runUO gen
where gen u = let !(u1, u2) = splitUO2 u in
MkNameSupply (MkName (gensym ())) (gen u1) (gen u2)
where UO
will be our new use-once split-ready abstract-data type:
module UO(
UO, initUO, splitUO, splitUO2, ...
) where
data UO s ...
runUO :: (forall s . UO s -> a) -> a
splitUO :: UO s -> [UO s]
splitUO2 :: UO s -> (UO s, UO s)
⋮
...which can also be encapsulated.
(Surely there must be a better word than spiltable splittable in the English language...)
Now for the gensym
problem - let's start with this cautionary remark about that HideGensym
module, also on page 4 of 7:
The
gensym
[thing] must be coded in assembler, and possibly also thegen
function.
...otherwise that single call to gensym
might be lifted all the way out: remember gen ()
?
{- WRONG! -}
initialNameSupply = runUO gen
where gen u = let !(u1, u2) = splitUO2 u in
MkNameSupply (MkName x) (gen u1) (gen u2)
x = gensym ()
Since gensym
(supposedly!) accepts any type of input:
gensym :: a -> Int -- implemented in assembler
this shouldn't break anything:
initialNameSupply = runUO gen
where gen u = let !(u1:u2:u3:_) = splitUO u in
MkNameSupply (MkName (gensym u1)) (gen u2) (gen u3)
As a bonus, we can make a slightly-more generic version of initialNameSupply
:
initialNameSupply = initialSupply gensym
initialSupply :: (UO s -> Int) -> NameSupply
initialSupply g = runUO gen
where gen u = let !(u1:u2:u3:_) = splitUO u in
MkNameSupply (MkName (g u1)) (gen u2) (gen u3)
(...alright, so gensym
is still there - at least now it's isolated.)
By now you've probably noticed the other example module OneTimeSupplies
, with its own cautionary remark:
It is referentially transparent only if each supply is used at most once.
In addition, back on page 3 of 7:
If a compile-time analysis of a program can guarantee that every name supply is used at most once, either to do
getNameDeplete
orsplitNameSupply
, the tree becomes unnecessary [...]
Since we're relying on UO
to provide the same guarantee, can we also go tree-free in our implementation and save some work?
To do that, supplyValue
and the split
s will need an upgrade:
-
the simplest option for
supplyValue
is to provide it with the generator (g
ininitialSupply
).data NameSupply = forall s . Supply (UO s -> Int) ... supplyValue :: NameSupply -> Name supplyValue (Supply g ...) = MkName (g ...)
-
as for the
split
s, they require anUO
value so they can obtain the newUO
values needed by the new supplies:data NameSupply = forall s . Supply (UO s) ... split :: NameSupply -> [NameSupply] split (Supply u ...) = [ Supply v ... | v <- splitUO u ] split2 :: NameSupply -> (NameSupply, NameSupply) split2 (Supply u ...) = let !(u1, u2) = splitUO2 u in (Supply u1 ..., Supply u2 ...)
That clearly suggests:
data NameSupply = forall s . Supply (UO s -> Int) (UO s)
supplyValue (Supply g u) = MkName (g u)
split (Supply g u) = [ Supply g v | v <- splitUO u ]
split2 (Supply g u) = let !(u1, u2) = splitUO2 u in
(Supply g u1, Supply g u2)
But does it also work for initialNameSupply
?
initialNameSupply = initialSupply gensym
initialSupply :: (UO s -> Int) -> NameSupply
initialSupply = runUO . Supply
It gets better:
type NameSupply = Supply Name
data Name = MkName Int deriving (Eq)
initialNameSupply = initialSupply (MkName . gensym)
-- NameSupply --
-- ================ --
-- Supply --
data Supply a = forall s . Supply (UO s -> a) (UO s)
instance Functor Supply where
fmap f (Supply g u) = Supply (f . g) u
supplyValue :: Supply a -> a
supplyValue (Supply g u) = g u
split :: Supply a -> [Supply a]
split (Supply g u) = [ Supply g v | v <- splitUO u ]
split2 :: Supply a -> (Supply a, Supply a)
split2 (Supply g u) = let !(u1, u2) = splitUO2 u in
(Supply g u1, Supply g u2)
initialSupply :: (UO s -> a) -> NameSupply
initialSupply = runUO . Supply
This is very promising, if UO
and associates can be defined as intended...
If you've read the post by Luke Palmer, you already know that he uses an ugly
unsafe
entity to define runSupply
. Well, right now (2022 Jan) runST
is defined in similar fashion:
runST :: (forall s. ST s a) -> a
runST (ST st_rep) = case runRW# st_rep of (# _, a #) -> a
where:
newtype ST s a = ST (STRep s a)
type STRep s a = State# s -> (# State# s, a #
runRW# :: STRep RealWorld a -> (# State# RealWorld, a #)
Can UO
be defined without resorting to such measures? That's probably worthy of a separate answer - for now, we'll just tolerate the ugliness:
{-# LANGUAGE BangPatterns, RankNTypes, UnboxedTuples, MagicHash #-}
module UO(
UO, runUO, splitUO, splitUO2,
useUO, asUO,
) where
import Prelude (String, Eq(..))
import Prelude ((.), ($), (++), error, all)
import Data.Char (isSpace)
import GHC.Base (State#, MutVar#)
import GHC.Base (runRW#, newMutVar#, noDuplicate#)
import GHC.Exts (atomicModifyMutVar#)
import GHC.ST (ST(..), STRep)
data UO s = UO (UO# s)
runUO :: (forall s . UO s -> a) -> a
runUO g = let (# _, r #) = runRW# (useUO# (g . UO)) in r
splitUO :: UO s -> [UO s]
splitUO u = let !(u1, u2) = splitUO2 u in u1 : splitUO u
splitUO2 :: UO s -> (UO s, UO s)
splitUO2 (UO h) = let (# h1, h2 #) = splitUO2# h in (UO h1, UO h2)
useUO :: (UO s -> a) -> ST s a
useUO g = ST (\s -> useUO# (g . UO) s)
asUO :: Eq a => String -> ST s a -> UO s -> a
asUO name (ST act) (UO h)
= asUO# name act h
-- local definitions --
type UO# s = String -> State# s
splitUO2# :: UO# s -> (# UO# s, UO# s #)
splitUO2# h = let !s = h "splitUO2"
(# s', h1 #) = dispense# s
(# _, h2 #) = dispense# s'
in (# h1, h2 #)
useUO# :: (UO# s -> a) -> STRep s a
useUO# g s = let (# s', h #) = dispense# s
!r = g h
in (# s', r #)
dispense# :: STRep s (UO# s)
dispense# s = let (# s', r #) = newMutVar# () s
in (# s', expire# s' r #)
expire# :: State# s -> MutVar# s () -> String -> State# s
expire# s r name = let (# s', () #) = atomicModifyMutVar# r use s
in s'
where
use x = (error nowUsed, x)
nowUsed = name' ++ ": already expired"
name' = if all isSpace name then "(unknown)"
else name
asUO# :: Eq a => String -> STRep s a -> UO# s -> a
asUO# name act h = let (# _, t #) = act (noDuplicate# (h name)) in t
It's a little more complicated than strictly necessary (e.g. rudimentary reuse-error reporting) but in exchange for that, UO
-based definitions can now manipulate local state...
There's one other definition in Data.Supply
to implement:
newSupply :: a -> (a -> a) -> IO (Supply a)
newSupply start next = gen =<< newIORef start
where gen r = unsafeInterleaveIO
$ do v <- unsafeInterleaveIO (atomicModifyIORef r upd)
ls <- gen r
rs <- gen r
return (Node v ls rs)
upd a = let b = next a in seq b (b, a)
as it would end the need for gensym
. It is vaguely similar to initialSupply
- can that be made more apparent?
-
gen
in the originalinitialNameSupply
doesn't have a reference parameterr
:newSupply start next = do r <- newIORef start let gen = unsafeInterleaveIO $ do v <- unsafeInterleaveIO (atomicModifyIORef r upd) ls <- gen rs <- gen return (Node v ls rs) gen where upd a = let b = next a in seq b (b, a)
-
the value-action
unsafeInterleaveIO (atomicModifyIORef r upd)
performs the role ofgensym
in the originalinitialNameSupply
:newSupply start next = do r <- newIORef start let gen = unsafeInterleaveIO $ do v <- genval ls <- gen rs <- gen return (Node v ls rs) genval = unsafeInterleaveIO (atomicModifyIORef r upd) gen where upd a = let b = next a in seq b (b, a)
-
gen
in the originalinitialNameSupply
had no need ofdo
-notation:newSupply start next = do r <- newIORef start let gen = unsafeInterleaveIO (liftM3 Node genval gen gen) genval = unsafeInterleaveIO (atomicModifyIORef r upd) gen where upd a = let b = next a in seq b (b, a)
-
does
genval
have to be in thatlet
-binding?newSupply start next = do r <- newIORef start let gen = unsafeInterleaveIO (liftM3 Node (genval r) gen gen) gen where genval r = unsafeInterleaveIO (atomicModifyIORef r upd) upd a = let b = next a in seq b (b, a)
-
upd
is only used ingenval
:newSupply start next = do r <- newIORef start let gen = unsafeInterleaveIO (liftM3 Node (genval r) gen gen) gen where genval r = let upd a = let b = next a in seq b (b, a) in unsafeInterleaveIO (atomicModifyIORef r upd)
-
can some content in
genval
be moved to a separate definition?newSupply start next = do r <- newIORef start let gen = unsafeInterleaveIO (liftM3 Node (genval r) gen gen) gen where genval r = unsafeInterleaveIO (nextValue r next) nextValue :: IORef a -> (a -> a) -> IO a nextValue r next = let upd a = let b = next a in seq b (b, a) in atomicModifyIORef r upd
Now that it more clearly resembles the original initialNameSupply
, re-implementing newSupply
using our new Supply
type is relatively simple - first, a change of monadic type:
newSupply start next = do r <- newSTRef start
let gen = unsafeInterleaveST (liftM3 Node (genval r) gen gen)
gen
where genval r = unsafeInterleaveST (nextValue r next)
nextValue :: STRef s a -> (a -> a) -> ST s a
nextValue r next = let upd a = let b = next a in seq b (b, a)
in atomicModifyST r upd
No other changes are needed for nextValue
. As for newSupply
:
newSupply :: Eq a => a -> (a -> a) -> ST s (Supply a)
newSupply start next = do r <- newSTRef start
let g = asUO "genval" (genval r)
useUO (Supply g)
where genval r = nextValue r next
which can then be used to define our version of runSupply
:
runSupply :: (forall a . Eq a => Supply a -> b) -> b
runSupply f = f (runST (newSupply (0 :: Int) succ))
Can we now, finally, expel gensym
from the NameSupply
type?
initialNameSupply :: NameSupply
initialNameSupply = fmap MkName (initialSupply 0 succ)
initialSupply :: Eq a => a -> (a -> a) -> Supply a
initialSupply start next = runST (newSupply start next)
Yes.
Here are all the pertinent definitions, arranged into modules:
-
ExpelGensym
, the replacement forHideGensym
on page 4 of 7:{-# LANGUAGE BangPatterns #-} module ExpelGensym( Name, NameSupply, initialNameSupply, getNameDeplete, splitNameSupply ) where import Control.Monad (liftM) import Control.Monad.ST (runST) import Supply (Supply, newSupply, supplyValue, split2) data Name = MkName Int deriving (Eq) type NameSupply = Supply Name initialNameSupply :: Supply Name initialNameSupply = fmap MkName (initialSupply 0 succ) getNameDeplete :: NameSupply -> (Name, NameSupply) getNameDeplete s = let !(s1, s2) = split2 s in (supplyValue s1, s2) splitNameSupply :: NameSupply -> (NameSupply, NameSupply) splitNameSupply = split2 -- local definitions -- initialSupply :: Eq a => a -> (a -> a) -> Supply a initialSupply start next = runST (newSupply start next)
-
Supply
, our miniature implementation ofData.Supply
:{-# LANGUAGE BangPatterns, ExistentialQuantification, RankNTypes #-} module Supply( Supply, newSupply, runSupply, supplyValue, split, split2 ) where import Control.Monad.ST import Data.STRef import UO data Supply a = forall s . Supply (UO s -> a) (UO s) instance Functor Supply where fmap f (Supply g u) = Supply (f . g) u newSupply :: Eq a => a -> (a -> a) -> ST s (Supply a) newSupply start next = do r <- newSTRef start let g = asUO "genval" (genval r) useUO (Supply g) where genval r = nextValue r next runSupply :: (forall a . Eq a => Supply a -> b) -> b runSupply f = f (runST (newSupply (0 :: Int) succ)) supplyValue :: Supply a -> a supplyValue (Supply g u) = g u split :: Supply a -> [Supply a] split (Supply g u) = [ Supply g v | v <- splitUO u ] split2 :: Supply a -> (Supply a, Supply a) split2 (Supply g u) = let !(u1, u2) = splitUO2 u in (Supply g u1, Supply g u2) -- local definitions -- nextValue :: STRef s a -> (a -> a) -> ST s a nextValue r next = let upd a = let b = next a in seq b (b, a) in atomicModifySTRef r upd {- -- if your Haskell installation doesn't define it -- atomicModifySTRef :: STRef s a -> (a -> (a, b)) -> ST s b atomicModifySTRef r f = do x <- readSTRef r let !(x', y) = f x writeSTRef r x' return y -}
-
UO
, that use-once split-ready abstract-data type:{-# LANGUAGE BangPatterns, RankNTypes, UnboxedTuples, MagicHash #-} module UO( UO, runUO, splitUO, splitUO2, useUO, asUO, ) where import Prelude (String, Eq(..)) import Prelude ((.), ($), (++), error, all) import Data.Char (isSpace) import GHC.Base (State#, MutVar#) import GHC.Base (runRW#, newMutVar#, noDuplicate#) import GHC.Exts (atomicModifyMutVar#) import GHC.ST (ST(..), STRep) data UO s = UO (UO# s) runUO :: (forall s . UO s -> a) -> a runUO g = let (# _, r #) = runRW# (useUO# (g . UO)) in r splitUO :: UO s -> [UO s] splitUO u = let !(u1, u2) = splitUO2 u in u1 : splitUO u splitUO2 :: UO s -> (UO s, UO s) splitUO2 (UO h) = let (# h1, h2 #) = splitUO2# h in (UO h1, UO h2) useUO :: (UO s -> a) -> ST s a useUO g = ST (\s -> useUO# (g . UO) s) asUO :: Eq a => String -> ST s a -> UO s -> a asUO name (ST act) (UO h) = asUO# name act h -- local definitions -- type UO# s = String -> State# s splitUO2# :: UO# s -> (# UO# s, UO# s #) splitUO2# h = let !s = h "splitUO2" (# s', h1 #) = dispense# s (# _, h2 #) = dispense# s' in (# h1, h2 #) useUO# :: (UO# s -> a) -> STRep s a useUO# g s = let (# s', h #) = dispense# s !r = g h in (# s', r #) dispense# :: STRep s (UO# s) dispense# s = let (# s', r #) = newMutVar# () s in (# s', expire# s' r #) expire# :: State# s -> MutVar# s () -> String -> State# s expire# s r name = let (# s', () #) = atomicModifyMutVar# r use s in s' where use x = (error nowUsed, x) nowUsed = name' ++ ": already expired" name' = if all isSpace name then "(unknown)" else name asUO# :: Eq a => String -> STRep s a -> UO# s -> a asUO# name act h = let (# _, t #) = act (noDuplicate# (h name)) in t