Designing large projects in OCaml [closed]

I am going to answer for a medium-sized project in the conditions that I am familiar with, that is between 100K and 1M lines of source code and up to 10 developers. This is what we are using now, for a project started two months ago in August 2013.

Build system and code organization:

  • one source-able shell script defines PATH and other variables for our project
  • one .ocamlinit file at the root of our project loads a bunch of libraries when starting a toplevel session
  • omake, which is fast (with -j option for parallel builds); but we avoid making crazy custom omake plugins
  • one root Makefile contains all the essential targets (setup, build, test, clean, and more)
  • one level of subdirectories, not two
  • most subdirectories build into an OCaml library
  • some subdirectories contain other things (setup, scripts, etc.)
  • OCAMLPATH contains the root of the project; each library subdirectory produces a META file, making all OCaml parts of the project accessible from the toplevel using #require.
  • only one OCaml executable is built for the whole project (saves a lot of linking time; still not sure why)
  • libraries are installed via a setup script using opam
  • local opam packages are made for software that is not in the official opam repository
  • we use an opam switch which is an alias named after our project, avoiding conflicts with other projects on the same machine

Source-code editing:

  • emacs with opam packages ocp-indent and ocp-index

Source control and management:

  • we use git and github
  • all new code is peer-reviewed via github pull requests
  • tarballs for non-opam non-github libraries are stored in a separate git repository (that can be blown away if history gets too big)
  • bleeding-edge libraries existing on github are forked into our github account and installed via our own local opam package

Use of OCaml:

  • OCaml will not compensate for bad programming practices; teaching good taste is beyond the scope of this answer. http://ocaml.org/learn/tutorials/guidelines.html is a good starting point.
  • OCaml 4.01.0 makes it much easier than before to reuse record field labels and variant constructors (i.e. type t1 = {x:int} type t2 = {x:int;y:int} let t1_of_t2 ({x}:t2) : t1 = {x} now works)
  • we try to not use camlp4 syntax extensions in our own code
  • we do not use classes and objects unless mandated by some external library
  • in theory since OCaml 4.01.0 we should prefer classic variants over polymorphic variants
  • we use exceptions to indicate errors and let them go through happily until our main server loop catches them and interprets them as "internal error" (default), "bad request", or something else
  • exceptions such as Exit or Not_found can be used locally when it makes sense, but in module interfaces we prefer to use options.
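The error-handling convention in the last two points can be sketched as follows. This is only an illustration under assumptions: the names Bad_request, Users, handle, serve_one and the response type are hypothetical and not taken from the project described above. It uses plain try ... with so that it also compiles with OCaml 4.01.

```ocaml
(* Hypothetical names throughout: Bad_request, Users, handle, serve_one. *)

exception Bad_request of string

module Users : sig
  (* The interface exposes an option instead of raising Not_found. *)
  val find : int -> string option
end = struct
  let table = [ (1, "alice"); (2, "bob") ]

  (* Not_found is used locally and never escapes the module. *)
  let find id =
    try Some (List.assoc id table) with Not_found -> None
end

(* A request handler: it raises freely and does no error plumbing. *)
let handle (path : string) : string =
  let id =
    try int_of_string path
    with Failure _ -> raise (Bad_request ("not a user id: " ^ path))
  in
  match Users.find id with
  | Some name -> "hello " ^ name
  | None -> raise (Bad_request "unknown user")

type response =
  | Success of string
  | Rejected of string        (* reported as "bad request" *)
  | Internal_error of string  (* the default for any other exception *)

(* One iteration of the main loop: the only place where exceptions
   are caught and interpreted. *)
let serve_one (handler : string -> string) (request : string) : response =
  try Success (handler request) with
  | Bad_request msg -> Rejected msg
  | e -> Internal_error (Printexc.to_string e)
```

With this shape, handlers stay free of error-handling code, and adding a new error category means adding one exception and one case in serve_one.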

Libraries, protocols, frameworks:

  • we use Batteries for all commodity functions that are missing from OCaml's standard library; for the rest we have a "util" library
  • we use Lwt for asynchronous programming, without the syntax extensions, and the bind operator (>>=) is the only operator that we use (if you have to know, we do reluctantly use camlp4 preprocessing for better exception tracking on bind points).
  • we use HTTP and JSON to communicate with 3rd-party software and we expect every modern service to provide such APIs
  • for serving HTTP, we run our own SCGI server (ocaml-scgi) behind nginx
  • as an HTTP client we use Cohttp
  • for JSON serialization we use atdgen

"Cloud" services:

  • we use quite a lot of them as they are usually cheap, easy to interact with, and solve scalability and maintenance problems for us.

Testing:

  • we have one make/omake target for fast tests and one for slow tests
  • fast tests are unit tests; each module may provide a "test" function; a test.ml file runs the list of tests
  • slow tests are those that involve running multiple services; these are crafted specifically for our project, but they cover as much as possible of what a production service does. Everything runs locally on either Linux or MacOS, except for cloud services, for which we find ways to not interfere with production.
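The fast-test layout described above (a "test" function per module, collected by a test.ml file) could look roughly like this. It is a sketch under assumptions: the modules Str_util and Int_util, the unit -> bool type for test functions, and the run_tests helper are all hypothetical. In a real project each module would live in its own file and only the last part would be in test.ml.

```ocaml
(* Hypothetical library modules, each exposing a [test] function. *)
module Str_util = struct
  let repeat n s = String.concat "" (Array.to_list (Array.make n s))

  let test () =
    assert (repeat 3 "ab" = "ababab");
    assert (repeat 0 "ab" = "");
    true
end

module Int_util = struct
  let clamp lo hi x = max lo (min hi x)

  let test () =
    assert (clamp 0 10 42 = 10);
    assert (clamp 0 10 (-3) = 0);
    true
end

(* What test.ml would contain: the list of named tests and a runner. *)
let tests = [
  "Str_util", Str_util.test;
  "Int_util", Int_util.test;
]

(* Runs every test, prints one line per test, and returns the names of
   the tests that returned false or raised an exception. *)
let run_tests tests =
  List.fold_left
    (fun failed (name, f) ->
       let ok = try f () with _ -> false in
       Printf.printf "%-10s %s\n" name (if ok then "OK" else "FAILED");
       if ok then failed else failed @ [name])
    [] tests

let failed = run_tests tests
let () = if failed <> [] then exit 1
```

The non-zero exit code lets the make/omake target fail when any test fails.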

Setting this all up is quite a bit of work, especially for someone not familiar with OCaml. There is no framework taking care of all that yet, but at least you get the choice of the tools.


OASIS

To add to Pavel's answer:

Disclaimer: I am the author of OASIS.

OASIS also has oasis2opam, which helps to create an OPAM package quickly, and oasis2debian, which creates Debian packages. This is extremely useful if you want a 'release' target that automates most of the tasks needed to upload a package.

OASIS also ships with a script called oasis-dist.ml that automatically creates a tarball for upload.

You can find all of this at https://github.com/ocaml.org.

Testing

I use OUnit to do all my tests. This is simple and pretty efficient if you are used to xUnit testing.

Source control/management

Disclaimer: I am the owner/maintainer of forge.ocamlcore.org (aka forge.o.o)

If you want to use git, I recommend using github. It is really efficient for review.

If you use darcs or subversion, you can create an account on forge.o.o.

In both cases, having a public mailing list to which all commit notifications are sent is a must-have, so that everyone can see and review them. You can use either Google groups or a mailing list on forge.o.o.

I recommend having a nice web page (github or forge.o.o) with OCamldoc documentation that is rebuilt every time you commit. If you have a huge code base, this will help you use the OCamldoc-generated documentation right from the beginning (and fix it quickly).

I recommend creating tarballs when you reach a stable stage. Don't just rely on checking out the latest git/svn version. This tip has saved me hours of work in the past. As said by Martin, store all your tarballs in a central place (a git repository is a good idea for that).


This one probably doesn't answer your question completely, but here is my experience regarding build environment:

I really appreciate OASIS. It has a nice set of features that help not only to build the project, but also to write documentation and support a test environment.

Build system

  • OASIS generates a setup.ml file from the specification (the _oasis file), which basically works as a build script. It accepts the -configure, -build, -test and -distclean flags. I got quite used to these steps while working with GNU and other projects that usually use Makefiles, and I find it convenient that all of them are available automatically here.
  • Makefiles. Instead of generating setup.ml, it is also possible to generate a Makefile with all the options described above available.

Structure

Usually a project of mine that is built by OASIS has at least four directories: src, _build, scripts and tests.

  • In the src directory all source files are stored together: source (.ml) and interface (.mli) files sit side by side. If the project is very large, it may be worth introducing more subdirectories.
  • The _build directory is managed by the OASIS build system. It holds copies of the sources along with the object files, and I like that build artifacts are not mixed with the source files, so I can easily delete the directory in case something goes wrong.
  • I store multiple shell scripts in the scripts directory. Some of them are for test execution and interface file generation.
  • All input and output files for tests I store in a separate directory.

Interfaces/Documentation

The use of interface files (.mli) has both advantages and drawbacks for me. It really helps to find type errors, but if you have them, you have to edit them as well when making changes or improvements in your code. Sometimes forgetting this causes nasty errors.

But the main reason why I like interface files is documentation. I use ocamldoc to generate HTML pages with documentation automatically (OASIS supports this with the -doc flag). In my opinion it is enough to write comments describing each function in the interface, without inserting comments in the middle of the code. In OCaml, functions are usually short and concise, and if extra comments seem necessary inside one, maybe it is better to split the function.
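As an illustration of this style, here is a minimal sketch of an interface documented with ocamldoc comments. The Counter module is hypothetical. To keep the example in a single compilable file, the signature is written inline; in a project it would be the content of counter.mli, with the implementation in counter.ml.

```ocaml
(* Hypothetical module; the signature part would normally be counter.mli. *)
module Counter : sig
  (** Counters that can only go up. *)

  type t
  (** An immutable counter. *)

  val zero : t
  (** The counter with value 0. *)

  val incr : t -> t
  (** [incr c] is [c] increased by one. *)

  val to_int : t -> int
  (** [to_int c] is the current value of [c]. *)
end = struct
  (* The implementation carries no comments: each function is short
     and the interface already documents its behaviour. *)
  type t = int
  let zero = 0
  let incr c = c + 1
  let to_int c = c
end
```

Only the (** ... *) comments in the interface end up in the generated pages, so the abstract type t is documented without exposing that it is an int.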

Also be aware of the -i flag for ocamlc. With it, the compiler can automatically generate the interface for a module.

Tests

I didn't find a reasonable solution for supporting tests (I would like to have some ocamltest application), which is why I am using my own scripts for executing and verifying use cases. Fortunately, OASIS supports executing custom commands when setup.ml is run with the -test flag.

I haven't been using OASIS for long, and if anyone knows of any other cool features, I would like to hear about them too.

Also, if you are not aware of OPAM, it is definitely worth looking at. Without it, installing and managing new packages is a nightmare.