I'm writing a compiler for a simple imperative language in Haskell, outputting Java bytecode. I've gotten to the point where I'm emitting an abstract representation of bytecodes.

While writing code for compiling if-statements I ran in to some trouble. To implement if-statements I need labels to jump to. Therefore I need to generate a name for that label, and that name needs to be unique.

My first thought was to thread some state through compileStatement, i.e

compileStatement :: Statement -> UniqueIDState -> [AbstractInstruction]

Of course, compilerStatement is recursive, so using this method would require me to pass the state of the unique ID generator back upp from the recursive calls:

compileStatement :: Statement -> UniqueIDState -> (UniqueIdState, [AbstractInstruction])

This seems a bit clumsy, especially if I realize I need to carry around more state in the future; is there a more elegant way?


Solution 1:

You need a "unique supply". The usual way to do this in Haskell is by threading a counter through the State monad, which automates the plumbing problem you describe.

Solution 2:

I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.

Abraham Maslow.

How about something different - a unique-supply that isn't a member of the Monad class. As it happens, you were almost there with your original type signature:

    compileStatement :: Statement -> UniqueIDState -> [AbstractInstruction]

If the only requirement is that each label is unique - no need to count how many were used, providing the same identifiers given the same circumstances, etc - there's a less-invasive technique you can use.

In IO-free spillable splittable supplies, Luke Palmer shows how value supplies can be encapsulated:

    runSupply :: (forall a . Eq a => Supply a -> b) -> b

This avoids having the monadic IO type taint large parts of the programs which use them: nice! But that isn't the only problem - depending on how they're defined, the onus is on you to use such supplies correctly. For example, assuming:

    data Statement =
        ... | If Statement Statement Statement | ...

then if:

    compileStatement (If c t e) s =
        case split s of
          s1 : s2 : s3 : _ -> buildCondJump (compileStatement c s1)
                                            (compileStatement t s2)
                                            (compileStatement e s3)

is mistakenly changed to:

    compileStatement (If c t e) s =
        case split s of
          s1 : s2 : s3 : _ -> buildCondJump (compileStatement c s)
                                            (compileStatement t s)
                                            (compileStatement e s)

not only are UniqueSupply and Unique values being erroneously reused, there's the potential for a space leak if any of the recursive calls to compileStatement uses the supply intensively.


Unlike Clean, Haskell has no standard way of marking types as monousal. That leaves checks at runtime as the only option: definitely a job for an abstract data type!

Here's a thought - if that ADT was also spittable splittable, we could be able to use it to define an alternate value-supply type. All going well, the values of this new type would then have both properties: slippable splittable and monousal.

Looking at Data.Supply reveals the use of a binary-tree type - the module and definitions seem to be based on the following example from the functional pearl [On generating unique names], written by Lennart Augustsson, Mikael Rittri and Dan Synek - from page 4 of 7:

    module HideGensym(
        Name, NameSupply, initialNameSupply, getNameDeplete, splitNameSupply)
    where

        gensym :: a -> Int  -- implemented in assembler

        data Name = MkName Int deriving (Eq)
        data NameSupply = MkNameSupply Name NameSupply NameSupply

        initialNameSupply = gen ()
            where gen x = MkNameSupply (MkName (gensym x)) (gen x) (gen x)
        getNameDeplete (MkNameSupply n s1 _) = (n, s1)
        splitNameSupply (MkNameSupply _ s1 s2) = (s1, s2)

...gensym: we'll leave that one for now. Let's look at how we can insert the new ADT into NameSupply...after we attend to a more mundane matter: - two identical calls to gen- MkNameSupply ... (gen x) (gen x) - to an optimising Haskell implementation, they're the same value:

         -- same function, same argument, same result: what's the matter? 
        initialNameSupply = gen ()
            where gen x = let s = gen x in
                          MkNameSupply (MkName (gensym x)) s s

Then again, maybe we can solve both problems at once:

        initialNameSupply = runUO gen
            where gen u = let !(u1, u2) = splitUO2 u in
                          MkNameSupply (MkName (gensym ())) (gen u1) (gen u2)

where UO will be our new use-once split-ready abstract-data type:

    module UO(
        UO, initUO, splitUO, splitUO2, ...
    ) where

    data UO s ...
    runUO    :: (forall s . UO s -> a) -> a
    splitUO  :: UO s -> [UO s]
    splitUO2 :: UO s -> (UO s, UO s)   
               ⋮

...which can also be encapsulated.

(Surely there must be a better word than spiltable splittable in the English language...)


Now for the gensym problem - let's start with this cautionary remark about that HideGensym module, also on page 4 of 7:

The gensym [thing] must be coded in assembler, and possibly also the gen function.

...otherwise that single call to gensym might be lifted all the way out: remember gen ()?

         {- WRONG! -}
        initialNameSupply = runUO gen
            where gen u = let !(u1, u2) = splitUO2 u in
                          MkNameSupply (MkName x) (gen u1) (gen u2)
        x = gensym ()
 

Since gensym (supposedly!) accepts any type of input:

        gensym :: a -> Int  -- implemented in assembler

this shouldn't break anything:

        initialNameSupply = runUO gen
            where gen u = let !(u1:u2:u3:_) = splitUO u in
                          MkNameSupply (MkName (gensym u1)) (gen u2) (gen u3)

As a bonus, we can make a slightly-more generic version of initialNameSupply:

        initialNameSupply = initialSupply gensym

        initialSupply :: (UO s -> Int) -> NameSupply
        initialSupply g = runUO gen
            where gen u = let !(u1:u2:u3:_) = splitUO u in
                          MkNameSupply (MkName (g u1)) (gen u2) (gen u3)

(...alright, so gensym is still there - at least now it's isolated.)


By now you've probably noticed the other example module OneTimeSupplies, with its own cautionary remark:

It is referentially transparent only if each supply is used at most once.

In addition, back on page 3 of 7:

If a compile-time analysis of a program can guarantee that every name supply is used at most once, either to do getNameDeplete or splitNameSupply, the tree becomes unnecessary [...]

Since we're relying on UO to provide the same guarantee, can we also go tree-free in our implementation and save some work?

To do that, supplyValue and the splits will need an upgrade:

  • the simplest option for supplyValue is to provide it with the generator (g in initialSupply).

      data NameSupply = forall s . Supply (UO s -> Int) ...
    
      supplyValue :: NameSupply -> Name 
      supplyValue (Supply g ...) = MkName (g ...)
    
  • as for the splits, they require an UO value so they can obtain the new UO values needed by the new supplies:

      data NameSupply = forall s . Supply (UO s) ...
    
      split  :: NameSupply -> [NameSupply]
      split  (Supply u ...) = [ Supply v ... |  v <- splitUO u ]
    
      split2 :: NameSupply -> (NameSupply, NameSupply)
      split2 (Supply u ...) = let !(u1, u2) = splitUO2 u in
                              (Supply u1 ..., Supply u2 ...)
    

That clearly suggests:

    data NameSupply = forall s . Supply (UO s -> Int) (UO s)

    supplyValue (Supply g u) = MkName (g u)

    split  (Supply g u) = [ Supply g v |  v <- splitUO u ]

    split2 (Supply g u) = let !(u1, u2) = splitUO2 u in
                          (Supply g u1, Supply g u2)

But does it also work for initialNameSupply?

    initialNameSupply = initialSupply gensym

    initialSupply :: (UO s -> Int) -> NameSupply
    initialSupply = runUO . Supply

It gets better:

    type NameSupply = Supply Name
    data Name = MkName Int deriving (Eq)
    initialNameSupply = initialSupply (MkName . gensym)

     --    NameSupply    --
     -- ================ --
     --      Supply      --

    data Supply a = forall s . Supply (UO s -> a) (UO s)      

    instance Functor Supply where
        fmap f (Supply g u) = Supply (f . g) u
    
    supplyValue :: Supply a -> a
    supplyValue (Supply g u) = g u

    split  :: Supply a -> [Supply a]
    split  (Supply g u) = [ Supply g v |  v <- splitUO u ]

    split2 :: Supply a -> (Supply a, Supply a)
    split2 (Supply g u) = let !(u1, u2) = splitUO2 u in
                          (Supply g u1, Supply g u2)

    initialSupply :: (UO s -> a) -> NameSupply
    initialSupply = runUO . Supply

This is very promising, if UO and associates can be defined as intended...


If you've read the post by Luke Palmer, you already know that he uses an ugly unsafe entity to define runSupply. Well, right now (2022 Jan) runST is defined in similar fashion:

    runST :: (forall s. ST s a) -> a
    runST (ST st_rep) = case runRW# st_rep of (# _, a #) -> a

where:

    newtype ST s a = ST (STRep s a)
    type STRep s a = State# s -> (# State# s, a #
    runRW# :: STRep RealWorld a -> (# State# RealWorld, a #)

Can UO be defined without resorting to such measures? That's probably worthy of a separate answer - for now, we'll just tolerate the ugliness:

    {-# LANGUAGE BangPatterns, RankNTypes, UnboxedTuples, MagicHash #-}
    module UO(
        UO, runUO, splitUO, splitUO2,
        useUO, asUO,
    ) where

    import Prelude   (String, Eq(..))
    import Prelude   ((.), ($), (++), error, all)
    import Data.Char (isSpace)
    import GHC.Base  (State#, MutVar#)
    import GHC.Base  (runRW#, newMutVar#, noDuplicate#)
    import GHC.Exts  (atomicModifyMutVar#)
    import GHC.ST    (ST(..), STRep)

    data UO s               =  UO (UO# s)

    runUO                   :: (forall s . UO s -> a) -> a
    runUO g                 =  let (# _, r #) = runRW# (useUO# (g . UO)) in r

    splitUO                 :: UO s -> [UO s]
    splitUO u               =  let !(u1, u2) = splitUO2 u in u1 : splitUO u

    splitUO2                :: UO s -> (UO s, UO s)
    splitUO2 (UO h)         =  let (# h1, h2 #) = splitUO2# h in (UO h1, UO h2)

    useUO                   :: (UO s -> a) -> ST s a
    useUO g                 =  ST (\s -> useUO# (g . UO) s)

    asUO                    :: Eq a => String -> ST s a -> UO s -> a
    asUO name (ST act) (UO h)
                            =  asUO# name act h

     -- local definitions --
    type UO# s              =  String -> State# s

    splitUO2#               :: UO# s -> (# UO# s, UO# s #)
    splitUO2# h             =  let !s           = h "splitUO2"
                                   (# s', h1 #) = dispense# s
                                   (# _,  h2 #) = dispense# s'
                               in  (# h1, h2 #)

    useUO#                  :: (UO# s -> a) -> STRep s a
    useUO# g s              =  let (# s', h #) = dispense# s
                                   !r           = g h
                               in  (# s', r #)

    dispense#               :: STRep s (UO# s)
    dispense# s             =  let (# s', r #) = newMutVar# () s
                               in  (# s', expire# s' r #)

    expire#                 :: State# s -> MutVar# s () -> String -> State# s
    expire# s r name        =  let (# s', () #) = atomicModifyMutVar# r use s
                               in  s'
                               where
                                   use x   =  (error nowUsed, x)
                                   nowUsed =  name' ++ ": already expired"
                                   name'   =  if all isSpace name then "(unknown)"
                                              else name

    asUO#                   :: Eq a => String -> STRep s a -> UO# s -> a
    asUO# name act h        =  let (# _, t #) = act (noDuplicate# (h name)) in t

It's a little more complicated than strictly necessary (e.g. rudimentary reuse-error reporting) but in exchange for that, UO-based definitions can now manipulate local state...


There's one other definition in Data.Supply to implement:

    newSupply :: a -> (a -> a) -> IO (Supply a)
    newSupply start next = gen =<< newIORef start
      where gen r = unsafeInterleaveIO
                  $ do v  <- unsafeInterleaveIO (atomicModifyIORef r upd)
                       ls <- gen r
                       rs <- gen r
                       return (Node v ls rs)
            upd a = let b = next a in seq b (b, a)

as it would end the need for gensym. It is vaguely similar to initialSupply - can that be made more apparent?

  • gen in the original initialNameSupply doesn't have a reference parameter r:

        newSupply start next = do r <- newIORef start
                                  let gen = unsafeInterleaveIO $
                                            do v  <- unsafeInterleaveIO (atomicModifyIORef r upd)
                                               ls <- gen
                                               rs <- gen
                                               return (Node v ls rs)
                                  gen
          where upd a = let b = next a in seq b (b, a)
    
  • the value-action unsafeInterleaveIO (atomicModifyIORef r upd) performs the role of gensym in the original initialNameSupply:

        newSupply start next = do r <- newIORef start
                                  let gen    = unsafeInterleaveIO $
                                               do v  <- genval
                                                  ls <- gen
                                                  rs <- gen
                                                  return (Node v ls rs)
                                      genval = unsafeInterleaveIO (atomicModifyIORef r upd)
                                  gen
          where upd a = let b = next a in seq b (b, a)
    
  • gen in the original initialNameSupply had no need of do-notation:

        newSupply start next = do r <- newIORef start
                                  let gen    = unsafeInterleaveIO (liftM3 Node genval gen gen)
                                      genval = unsafeInterleaveIO (atomicModifyIORef r upd)
                                  gen
          where upd a = let b = next a in seq b (b, a)
    
  • does genval have to be in that let-binding?

        newSupply start next = do r <- newIORef start
                                  let gen = unsafeInterleaveIO (liftM3 Node (genval r) gen gen)
                                  gen
          where genval r = unsafeInterleaveIO (atomicModifyIORef r upd)
                upd a    = let b = next a in seq b (b, a)
    
  • upd is only used in genval:

        newSupply start next = do r <- newIORef start
                                  let gen = unsafeInterleaveIO (liftM3 Node (genval r) gen gen)
                                  gen
          where genval r = let upd a = let b = next a in seq b (b, a)
                           in unsafeInterleaveIO (atomicModifyIORef r upd)
    
  • can some content in genval be moved to a separate definition?

        newSupply start next = do r <- newIORef start
                                  let gen = unsafeInterleaveIO (liftM3 Node (genval r) gen gen)
                                  gen
          where genval r = unsafeInterleaveIO (nextValue r next)
    
        nextValue :: IORef a -> (a -> a) -> IO a
        nextValue r next = let upd a = let b = next a in seq b (b, a)
                           in atomicModifyIORef r upd
    

Now that it more clearly resembles the original initialNameSupply, re-implementing newSupply using our new Supply type is relatively simple - first, a change of monadic type:

    newSupply start next = do r <- newSTRef start
                              let gen = unsafeInterleaveST (liftM3 Node (genval r) gen gen)
                              gen
      where genval r = unsafeInterleaveST (nextValue r next)

    nextValue :: STRef s a -> (a -> a) -> ST s a
    nextValue r next = let upd a = let b = next a in seq b (b, a)
                       in atomicModifyST r upd

No other changes are needed for nextValue. As for newSupply:

    newSupply :: Eq a => a -> (a -> a) -> ST s (Supply a)
    newSupply start next = do r <- newSTRef start
                              let g = asUO "genval" (genval r)
                              useUO (Supply g)
      where genval r = nextValue r next

which can then be used to define our version of runSupply:

    runSupply :: (forall a . Eq a => Supply a -> b) -> b
    runSupply f = f (runST (newSupply (0 :: Int) succ))

Can we now, finally, expel gensym from the NameSupply type?

    initialNameSupply :: NameSupply
    initialNameSupply = fmap MkName (initialSupply 0 succ)

    initialSupply :: Eq a => a -> (a -> a) -> Supply a
    initialSupply start next = runST (newSupply start next)

Yes.


Here are all the pertinent definitions, arranged into modules:

  • ExpelGensym, the replacement for HideGensym on page 4 of 7:

      {-# LANGUAGE BangPatterns #-}
      module ExpelGensym(
          Name, NameSupply, initialNameSupply, getNameDeplete, splitNameSupply
      ) where
      import Control.Monad    (liftM)
      import Control.Monad.ST (runST)
      import Supply           (Supply, newSupply, supplyValue, split2)
    
      data Name = MkName Int deriving (Eq)
      type NameSupply = Supply Name
    
      initialNameSupply :: Supply Name
      initialNameSupply = fmap MkName (initialSupply 0 succ)
    
      getNameDeplete :: NameSupply -> (Name, NameSupply)    
      getNameDeplete s = let !(s1, s2) = split2 s in (supplyValue s1, s2)
    
      splitNameSupply :: NameSupply -> (NameSupply, NameSupply)
      splitNameSupply = split2
    
       -- local definitions --
      initialSupply :: Eq a => a -> (a -> a) -> Supply a
      initialSupply start next = runST (newSupply start next)
    
  • Supply, our miniature implementation of Data.Supply:

      {-# LANGUAGE BangPatterns, ExistentialQuantification, RankNTypes #-}
      module Supply(
          Supply, newSupply, runSupply, supplyValue, split, split2
      ) where
      import Control.Monad.ST
      import Data.STRef
      import UO
    
      data Supply a = forall s . Supply (UO s -> a) (UO s)      
    
      instance Functor Supply where
          fmap f (Supply g u) = Supply (f . g) u
    
      newSupply :: Eq a => a -> (a -> a) -> ST s (Supply a)
      newSupply start next = do r <- newSTRef start
                                let g = asUO "genval" (genval r)
                                useUO (Supply g)
        where genval r = nextValue r next
    
      runSupply :: (forall a . Eq a => Supply a -> b) -> b
      runSupply f = f (runST (newSupply (0 :: Int) succ))
    
      supplyValue :: Supply a -> a
      supplyValue (Supply g u) = g u
    
      split  :: Supply a -> [Supply a]
      split  (Supply g u) = [ Supply g v |  v <- splitUO u ]
    
      split2 :: Supply a -> (Supply a, Supply a)
      split2 (Supply g u) = let !(u1, u2) = splitUO2 u in
                            (Supply g u1, Supply g u2)
    
       -- local definitions --
      nextValue :: STRef s a -> (a -> a) -> ST s a
      nextValue r next = let upd a = let b = next a in seq b (b, a)
                         in atomicModifySTRef r upd
    
      {-
       -- if your Haskell installation doesn't define it --
      atomicModifySTRef :: STRef s a -> (a -> (a, b)) -> ST s b
      atomicModifySTRef r f = do x <- readSTRef r
                                 let !(x', y) = f x
                                 writeSTRef r x'
                                 return y
      -}
    
  • UO, that use-once split-ready abstract-data type:

      {-# LANGUAGE BangPatterns, RankNTypes, UnboxedTuples, MagicHash #-}
      module UO(
          UO, runUO, splitUO, splitUO2,
          useUO, asUO,
      ) where
    
      import Prelude   (String, Eq(..))
      import Prelude   ((.), ($), (++), error, all)
      import Data.Char (isSpace)
      import GHC.Base  (State#, MutVar#)
      import GHC.Base  (runRW#, newMutVar#, noDuplicate#)
      import GHC.Exts  (atomicModifyMutVar#)
      import GHC.ST    (ST(..), STRep)
    
      data UO s               =  UO (UO# s)
    
      runUO                   :: (forall s . UO s -> a) -> a
      runUO g                 =  let (# _, r #) = runRW# (useUO# (g . UO)) in r
    
      splitUO                 :: UO s -> [UO s]
      splitUO u               =  let !(u1, u2) = splitUO2 u in u1 : splitUO u
    
      splitUO2                :: UO s -> (UO s, UO s)
      splitUO2 (UO h)         =  let (# h1, h2 #) = splitUO2# h in (UO h1, UO h2)
    
      useUO                   :: (UO s -> a) -> ST s a
      useUO g                 =  ST (\s -> useUO# (g . UO) s)
    
      asUO                    :: Eq a => String -> ST s a -> UO s -> a
      asUO name (ST act) (UO h)
                              =  asUO# name act h
    
       -- local definitions --
      type UO# s              =  String -> State# s
    
      splitUO2#               :: UO# s -> (# UO# s, UO# s #)
      splitUO2# h             =  let !s           = h "splitUO2"
                                     (# s', h1 #) = dispense# s
                                     (# _,  h2 #) = dispense# s'
                                 in  (# h1, h2 #)
    
      useUO#                  :: (UO# s -> a) -> STRep s a
      useUO# g s              =  let (# s', h #) = dispense# s
                                     !r           = g h
                                 in  (# s', r #)
    
      dispense#               :: STRep s (UO# s)
      dispense# s             =  let (# s', r #) = newMutVar# () s
                                 in  (# s', expire# s' r #)
    
      expire#                 :: State# s -> MutVar# s () -> String -> State# s
      expire# s r name        =  let (# s', () #) = atomicModifyMutVar# r use s
                                 in  s'
                                 where
                                     use x   =  (error nowUsed, x)
                                     nowUsed =  name' ++ ": already expired"
                                     name'   =  if all isSpace name then "(unknown)"
                                                else name
    
      asUO#                   :: Eq a => String -> STRep s a -> UO# s -> a
      asUO# name act h        =  let (# _, t #) = act (noDuplicate# (h name)) in t