How do you write data structures that are as efficient as possible in GHC? [closed]
So sometimes I need to write a data structure I can't find on Hackage, or what I find isn't tested or of high enough quality for me to trust, or it's just something I don't want as a dependency. I am reading Okasaki's book right now, and it's quite good at explaining how to design asymptotically fast data structures.
However, I am working specifically with GHC. Constant factors are a big deal for my applications. Memory usage is also a big deal for me. So I have questions specifically about GHC.
In particular:
- How to maximize sharing of nodes
- How to reduce memory footprint
- How to avoid space leaks due to improper strictness/laziness
- How to get GHC to produce tight inner loops for important sections of code
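To make the strictness point concrete, here's the kind of leak I mean (my own toy example, not from containers): a lazy left fold builds a chain of thunks, while `foldl'` keeps the accumulator evaluated.

```haskell
import Data.List (foldl')

-- Lazy accumulator: foldl builds O(n) unevaluated (+) thunks,
-- which are only forced when the final result is demanded.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- Strict accumulator: foldl' forces the running total at each
-- step, so the fold runs in constant space.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0
```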
I've looked around various places on the web, and I have a vague idea of how to work with GHC, for example looking at Core output, using UNPACK pragmas, and the like. But I'm not sure I really get it.
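For instance, my current (possibly shaky) understanding of UNPACK is that, with optimization on, it stores a strict field's payload directly in the enclosing constructor instead of behind a pointer. A toy example of my own:

```haskell
-- Strict but boxed: each point is one heap object holding two
-- pointers to separately allocated Ints.
data PointBoxed = PointBoxed !Int !Int

-- UNPACKed: with -O, the raw Int# payloads live inside the Point
-- constructor itself, so one heap object and no indirections.
data Point = Point {-# UNPACK #-} !Int {-# UNPACK #-} !Int

addPoints :: Point -> Point -> Point
addPoints (Point x1 y1) (Point x2 y2) = Point (x1 + x2) (y1 + y2)
```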
So I popped open my favorite data structures library, containers, and looked at the `Data.Sequence` module. I can't say I understand a lot of what they're doing to make `Seq` fast.
The first thing that catches my eye is the definition of `FingerTree a`. I assume that's just me being unfamiliar with finger trees, though. The second thing that catches my eye is all the `SPECIALIZE` pragmas. I have no idea what's going on here, and I'm very curious, as these are littered all over the code.
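From skimming the GHC user's guide, my rough understanding (happy to be corrected) is that SPECIALIZE asks GHC to compile an extra monomorphic copy of an overloaded function, so call sites at that type skip the class-dictionary passing. A standalone sketch:

```haskell
-- Overloaded: the compiled code threads a Num dictionary and
-- calls its methods indirectly.
poly :: Num a => a -> a
poly x = x * x + x

-- Also emit a dedicated Int version, and rewrite Int-typed call
-- sites to use it: no dictionary, and the arithmetic can unbox.
{-# SPECIALIZE poly :: Int -> Int #-}
```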
Many functions also have an `INLINE` pragma associated with them. I can guess what that means, but how do I make a judgement call on when to `INLINE` functions?
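For what it's worth, the heuristic I've picked up (which may be wrong) is: INLINE small, non-recursive wrappers whose bodies enable further simplification at the call site, and leave big or recursive functions alone. For example, a hypothetical helper:

```haskell
-- Small, non-recursive combinator: inlining it exposes the
-- callback to the simplifier at every call site. Recursive or
-- large functions are poor INLINE candidates (code bloat, and
-- GHC won't inline recursive definitions anyway).
whenJust :: Applicative f => Maybe a -> (a -> f ()) -> f ()
whenJust Nothing  _ = pure ()
whenJust (Just x) f = f x
{-# INLINE whenJust #-}
```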
Things get really interesting around line ~475, in a section headed 'Applicative Construction'. They define a newtype wrapper to represent the Identity monad, they write their own copy of the strict state monad, and they have a function called `applicativeTree` which, apparently, is specialized to the Identity monad, and this increases sharing of the function's output. I have no idea what's going on here. What sorcery is being used to increase sharing?
Anyway, I'm not sure there's much to learn from `Data.Sequence`. Are there other 'model programs' I can read to gain wisdom? I'd really like to know how to soup up my data structures when I need them to go faster. One thing in particular: writing data structures that make fusion easy, and how to go about writing good fusion rules.
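To show the kind of fusion rule I mean (my own toy, not how GHC's list library actually does it; real lists use build/foldr fusion), here is a map/map rewrite:

```haskell
-- A local copy of map, so GHC's built-in list rules don't interfere.
mapL :: (a -> b) -> [a] -> [b]
mapL _ []       = []
mapL f (x : xs) = f x : mapL f xs
{-# NOINLINE mapL #-}

-- Rewrite two traversals into one. NOINLINE above keeps mapL's
-- definition from being inlined before the rule can fire.
{-# RULES
"mapL/mapL" forall f g xs. mapL f (mapL g xs) = mapL (f . g) xs
  #-}
```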
That's a big topic! Most of it has been explained elsewhere, so I won't try to write a book chapter right here. Instead:
- Real World Haskell, ch 25, "Performance" - discusses profiling, simple specialization and unpacking, reading Core, and some optimizations.
Johan Tibell is writing a lot on this topic:
- Computing the size of a data structure
- Memory footprints of common data types
- Faster persistent structures through hashing
- Reasoning about laziness
And some things from here:
- Reading GHC Core
- How GHC does optimization
- Profiling for performance
- Tweaking GC settings
- General improvements
- More on unpacking
- Unboxing and strictness
And some other things:
- Intro to specialization of code and data
- Code improvement flags
`applicativeTree` is quite fancy, but mainly in a way which has to do with FingerTrees in particular, which are quite a fancy data structure themselves. We had some discussion of the intricacies over at cstheory. Note that `applicativeTree` is written to work over any Applicative. It just so happens that when it is specialized to `Id` it can share nodes in a manner that it otherwise couldn't. You can work through the specialization yourself by inlining the `Id` methods and seeing what happens. Note that this specialization is used in only one place: the O(log n) `replicate` function. The fact that the more general function specializes neatly to the constant case is a very clever bit of code reuse, but that's really all.
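To see the effect in miniature, here is a cut-down analogue I'm inventing for illustration (not the actual `applicativeTree`), building a balanced tree in an arbitrary Applicative:

```haskell
import Data.Functor.Identity (Identity (..))

data Tree a = Leaf a | Node (Tree a) (Tree a)

-- Build a balanced tree of 2^d copies of an effectful value.
replicateTreeA :: Applicative f => Int -> f a -> f (Tree a)
replicateTreeA 0 fx = Leaf <$> fx
replicateTreeA d fx = Node <$> sub <*> sub
  where
    -- In a general Applicative the effect in 'sub' runs twice, so
    -- the two children may differ. Inline the Identity methods and
    -- 'sub' collapses to a single pure value: both children are the
    -- same heap node, so 2^d leaves cost only O(d) allocation.
    sub = replicateTreeA (d - 1) fx

replicateTree :: Int -> a -> Tree a
replicateTree d = runIdentity . replicateTreeA d . Identity

leaves :: Tree a -> Int
leaves (Leaf _)   = 1
leaves (Node l r) = leaves l + leaves r
```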
In general, `Sequence` teaches more about designing persistent data structures than about all the tricks for eking out performance, I think. Dons' suggestions are of course excellent. I'd also just browse through the source of the really canonical and tuned libs: `Map`, `IntMap`, `Set`, and `IntSet` in particular. Along with those, it's worth taking a look at Milan's paper on his improvements to containers.