What's a good way to organize projects with shared dependencies in Mercurial? [closed]

I'm currently migrating my group's projects from a legacy version control system to Mercurial. As one example of the kind of code I'm moving, I have a Visual Studio solution with more than 25 projects, containing several separate application areas that all depend on common code. Looking over Stack Overflow, the closest question I found was this one, but it merely mentioned version control. I'm looking for more specific advice on implementation techniques for using Mercurial to manage these dependencies.

A simplified view of the dependencies looks something like the following. (This is for illustration and example only; the actual dependencies are significantly more complex but similar in nature.)

                     Common Lib 1
                    /      |      \
                ----       |       -----   
               /           |        \   \
            App 1    Common Lib 2    \ App 2
                       /   |   \      \
                -------    |    ------ |
               /           |          \|
             App 3       App 4      App 5

The Common Lib modules would be shared code: each would be a DLL, SO, or some other library that is used by all the apps at the same time, both at compile time and at run time. Applications would otherwise be able to run independently of each other.

I have a couple of goals in setting up my Mercurial repositories:

  • Give each significant application or component group its own repository.
  • Make each repository self-contained.
  • Make the sum total of the project self-contained.
  • Make it easy to build the entire codebase at once. (Eventually, all these programs and libraries end up in a single installer.)
  • Keep it simple.

One other point is that I have a server set up where I have separate repositories for each of these projects.

I see a couple ways of laying these projects out.

1. Create a "Shell" repository that contains everything.

This would use URL-based subrepos (e.g., in the .hgsub file, I'd have something like App1 = https://my.server/repo/app1). Laid out, it would look like the following:

+---------------------------+
| Main Repository           |
| | +---------------------+ |
| +-| Build               | |
| | +---------------------+ |
| | +---------------------+ |
| +-| Common Lib 1        | |
| | +---------------------+ |
| | +---------------------+ |
| +-| Common Lib 2        | |
| | +---------------------+ |
| | +---------------------+ |
| +-| App 1               | |
| | +---------------------+ |
| | +---------------------+ |
| +-| App 2               | |
| | +---------------------+ |
| | +---------------------+ |
| +-| App 3               | |
| | +---------------------+ |
| | +---------------------+ |
| +-| App 4               | |
| | +---------------------+ |
| | +---------------------+ |
| +-| App 5               | |
|   +---------------------+ |
+---------------------------+

Each main folder in the shell repository would contain a subrepo, one for each project area. Dependencies would be relative: for example, since App 4 needs Common Lib 2, it would simply use relative paths to reference that common library.
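As a rough sketch of what the shell repository's .hgsub file could contain, the Python snippet below renders one "path = source" line per subrepo. The server URL comes from the example above; the folder names and every repository name other than app1 are assumptions made for illustration, as is the helper name render_hgsub.

```python
# Base URL taken from the example in the question.
BASE_URL = "https://my.server/repo"

# Hypothetical mapping: folder in the shell repo -> repository name on the server.
# Only "app1" appears in the question; the other names are assumed.
SUBREPOS = {
    "Build": "build",
    "CommonLib1": "commonlib1",
    "CommonLib2": "commonlib2",
    "App1": "app1",
    "App2": "app2",
    "App3": "app3",
    "App4": "app4",
    "App5": "app5",
}


def render_hgsub(subrepos, base_url=BASE_URL):
    """Return .hgsub text with one 'local/path = source-url' line per subrepo."""
    return "".join(
        f"{path} = {base_url}/{name}\n" for path, name in subrepos.items()
    )


if __name__ == "__main__":
    print(render_hgsub(SUBREPOS), end="")
```

With this layout, App 4's project files would refer to the library through a relative path such as ../CommonLib2, which only resolves when the app is checked out inside the shell repository.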

Pros of this approach:

  • Each library is pulled down once and only once.
  • Mercurial's subrepos will automatically ensure that the same version of the library is used across all projects, since only one version of that subrepo exists in the project.
  • It's easy to find each resource.

Cons of this approach:

  • I can't work on an app independently. For example, if I work on App 2 and it needs a change to the common libraries, all of the other apps have to take those changes right away.
  • If I pull an App repo by itself, I have to figure out (or know) what other dependent repos it requires by hand if I want to build it.
  • Dependencies are not strongly separated - it would be tempting to insert a new feature anywhere since it was easy to get at all the features.

2. Have dependent subrepos be wholly contained.

In this approach, each application would have its own repository (as before) but this time also contain subrepositories: one for its own source, and one for each dependent subrepository. An overall repository would then contain each of these project repositories, and know how to build the entire solution. This would look like the following:

+-----------------------------------------------------------------------+
| Main Repository                                                       |
| +--------------------+ +--------------------+ +--------------------+  |
| | Build              | | Common Lib 1       | | Common Lib 2       |  |
| +--------------------+ | | +--------------+ | | | +--------------+ |  |
|                        | +-| Lib 1 Source | | | +-| Common Lib 1 | |  |
|                        |   +--------------+ | | | +--------------+ |  |
|                        |                    | | | +--------------+ |  |
|                        |                    | | +-| Lib 2 Source | |  |
|                        |                    | |   +--------------+ |  |
|                        +--------------------+ +--------------------+  |
| +--------------------+ +--------------------+ +---------------------+ |
| | App 1              | | App 2              | |  App 3              | |
| | | +--------------+ | | | +--------------+ | |  | +--------------+ | |
| | +-| Common Lib 1 | | | +-| Common Lib 1 | | |  +-| Common Lib 2 | | |
| | | +--------------+ | | | +--------------+ | |  | +--------------+ | |
| | | +--------------+ | | | +--------------+ | |  | +--------------+ | |
| | +-| App 1 Source | | | +-| App 2 Source | | |  +-| App 3 Source | | |
| |   +--------------+ | |   +--------------+ | |    +--------------+ | |
| +--------------------+ +--------------------+ +---------------------+ |
| +--------------------+ +--------------------+                         |
| | App 4              | | App 5              |                         |
| | | +--------------+ | | | +--------------+ |                         |
| | +-| Common Lib 2 | | | +-| Common Lib 1 | |                         |
| | | +--------------+ | | | +--------------+ |                         |
| | | +--------------+ | | | +--------------+ |                         |
| | +-| App 4 Source | | | +-| Common Lib 2 | |                         |
| |   +--------------+ | | | +--------------+ |                         |
| +--------------------+ | | +--------------+ |                         |
|                        | +-| App 5 Source | |                         |
|                        |   +--------------+ |                         |
|                        +--------------------+                         |
+-----------------------------------------------------------------------+

Pros:

  • Each application can be built by itself, independently of the others.
  • Dependent versions of libraries can be tracked per-app, instead of globally. It takes an explicit act of inserting a subrepo into the project to add a new dependency.

Cons:

  • When doing the final build, each app might be using a different version of a shared library. (might need to write tools to sync the common lib subrepos. Eww.)
  • If I want to build the entire source, I end up pulling down shared libraries multiple times. In the case of Common Lib 1, I would have to pull it eight (!) times.
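The figure of eight can be checked with a short Python sketch. It assumes the simplified dependency graph from the diagram at the top of the question and counts how many checkouts of a library end up inside the main repository when every repo nests its direct dependencies as subrepos. The function names are made up for this example.

```python
# Direct dependencies, read off the simplified diagram in the question.
DEPENDS_ON = {
    "Common Lib 1": [],
    "Common Lib 2": ["Common Lib 1"],
    "App 1": ["Common Lib 1"],
    "App 2": ["Common Lib 1"],
    "App 3": ["Common Lib 2"],
    "App 4": ["Common Lib 2"],
    "App 5": ["Common Lib 1", "Common Lib 2"],
}


def copies_inside(repo, target):
    """Count checkouts of `target` inside one repo, including nested subrepos."""
    own = 1 if repo == target else 0
    return own + sum(copies_inside(dep, target) for dep in DEPENDS_ON[repo])


def total_copies(target):
    """Count checkouts of `target` across every repo in the main repository."""
    return sum(copies_inside(repo, target) for repo in DEPENDS_ON)


if __name__ == "__main__":
    print(total_copies("Common Lib 1"))  # 8
    print(total_copies("Common Lib 2"))  # 4
```

App 5 accounts for two of the eight copies of Common Lib 1: one direct, and one nested inside its copy of Common Lib 2.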

3. Don't include dependencies at all as subrepos - bring them in as part of the build.

This approach would look much like approach 1, except the common libraries would only be pulled as part of the build. Each app would know what repos it needed, and put them in the common location.

Pros:

  • Each app could build by itself.
  • Common libraries would only need to be pulled once.

Cons:

  • We'd have to keep track of versions of libraries currently used by each app. This duplicates subrepo features.
  • We'd have to build an infrastructure to support this, which means more stuff going into build scripts. Ugh.
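To give a sense of how small the core of that infrastructure could be, here is a minimal Python sketch (Python 3.9+ for graphlib) that works out which repos a given app needs and the order in which to pull and build them. The dependency table mirrors the simplified diagram in the question; pull_order is a hypothetical helper, and the actual cloning and version pinning are left out.

```python
from graphlib import TopologicalSorter

# Direct dependencies, read off the simplified diagram in the question.
DEPENDS_ON = {
    "Common Lib 1": [],
    "Common Lib 2": ["Common Lib 1"],
    "App 1": ["Common Lib 1"],
    "App 2": ["Common Lib 1"],
    "App 3": ["Common Lib 2"],
    "App 4": ["Common Lib 2"],
    "App 5": ["Common Lib 1", "Common Lib 2"],
}


def pull_order(app):
    """Return the repos `app` needs, dependencies first, each listed once."""
    needed = {}
    stack = [app]
    while stack:
        repo = stack.pop()
        if repo not in needed:
            # Record the repo and queue its direct dependencies for a visit.
            needed[repo] = DEPENDS_ON[repo]
            stack.extend(DEPENDS_ON[repo])
    # TopologicalSorter yields a node only after all of its dependencies.
    return list(TopologicalSorter(needed).static_order())


if __name__ == "__main__":
    print(pull_order("App 5"))
```

A build script would walk this list, pull each repo into the common location if it is not already there, and build it before moving on to the next entry.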

4. What else?

Is there another way of handling it? A better way? What ways have you tried and succeeded with, and what ways have you tried but hated? I'm currently leaning towards 1, but the lack of independence between applications that should be able to stand alone really bothers me. Is there a way to get the nice separation of method 2 without the massive duplicate code pull and dependency maintenance nightmare, while not having to write scripts to handle it (like in option 3)?


Solution 1:

Dependency management is, in my view, an important aspect of a project's organization. You laid out several solutions in great detail, all based on Mercurial's subrepos feature, and I agree with all the pros and cons that you gave.

I think SCMs are not well suited to dependency management. I prefer having a dedicated tool for that (this would be your option 3).

My current project is in Java. It was built with Apache Ant, and I first set up Apache Ivy as a dependency management tool. In the end, the setup consisted of some Ivy configuration files in a shared directory, and one XML file listing the dependencies for each module of the project. Ivy can be invoked by Ant targets, so I added two new actions to each module: "resolve dependencies" and "deploy the built artifact". Deployment adds the result of the build (called an artifact) to the shared directory. Dependency resolution means transitively resolving the dependencies of the module and copying the resolved artifacts into the "lib" folder of the module's sources.
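To make the two actions concrete, here is a plain Python stand-in for them. It is not Ivy and does not use Ivy's configuration format; the module names, the one-folder-per-module layout of the shared directory, and the function names are all assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical per-module dependency lists (Ivy would read these from XML files).
MODULE_DEPS = {
    "commonlib1": [],
    "commonlib2": ["commonlib1"],
    "app4": ["commonlib2"],
}


def deploy(module, artifact, shared_dir):
    """Publish a built artifact into the shared directory, under the module's name."""
    target = Path(shared_dir) / module
    target.mkdir(parents=True, exist_ok=True)
    return shutil.copy(artifact, target)


def transitive_deps(module):
    """Return every module that `module` depends on, directly or indirectly."""
    found = []
    for dep in MODULE_DEPS[module]:
        for name in transitive_deps(dep) + [dep]:
            if name not in found:
                found.append(name)
    return found


def resolve(module, shared_dir, module_dir):
    """Copy the artifacts of all transitive dependencies into <module_dir>/lib."""
    lib_dir = Path(module_dir) / "lib"
    lib_dir.mkdir(parents=True, exist_ok=True)
    for dep in transitive_deps(module):
        for artifact in (Path(shared_dir) / dep).iterdir():
            shutil.copy(artifact, lib_dir)
    return sorted(path.name for path in lib_dir.iterdir())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp) / "shared"
        for lib in ("commonlib1", "commonlib2"):
            built = Path(tmp) / f"{lib}.dll"
            built.write_text("placeholder artifact")
            deploy(lib, built, shared)
        print(resolve("app4", shared, Path(tmp) / "app4"))
```

The module only ever sees built artifacts in its lib folder, never the sources of its dependencies, which is what keeps the SCM repositories independent.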

This solution is applicable to a C++ project, since Ivy is not specific to managing Java dependencies: artifacts can be anything. In C++, the artifacts produced by a module would be:

  1. a .so/.dll file, needed at run time
  2. the header files, needed at compile time.

This is not a perfect solution: Ivy is not easy to set up, you still have to tell your build script which dependencies to use, and you do not have direct access to the sources of the dependencies for debugging purposes. But you do end up with independent SCM repositories.

In our project, we then switched from Ant+Ivy to Apache Maven, which takes care of both the build and the dependency management. The artifacts are deployed to an Apache Archiva repository instead of a shared folder. This is a huge improvement, but it will work well for Java projects only.

Solution 2:

What you want to do is have each project in its own directory, as in (1). Then you tag working versions of your dependencies and record those tags in a file for the build, like this:

App1/.dependencies:
CommonLib1 tag-20100515
CommonLib2 tag-20100510

App2/.dependencies:
CommonLib1 tag-20100510
CommonLib2 tag-20100510

Then you use your build scripts to build the libraries at the specified tags and include those built libraries as derived objects for your applications. If build time is an issue, you can have the tagged versions of the libraries that are in use pre-built and saved somewhere.
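Since each app pins its own tags, a build of the whole codebase may want to know where the pins disagree. The following Python sketch parses the two example .dependencies files above (inlined as strings) and reports libraries pinned to more than one tag. The file format is the one shown in this answer; the helper names are hypothetical.

```python
# The two .dependencies files from the example above, inlined as strings.
DEPENDENCY_FILES = {
    "App1": "CommonLib1 tag-20100515\nCommonLib2 tag-20100510\n",
    "App2": "CommonLib1 tag-20100510\nCommonLib2 tag-20100510\n",
}


def parse_dependencies(text):
    """Parse 'library tag' lines into a {library: tag} dict."""
    pins = {}
    for line in text.splitlines():
        if line.strip():
            library, tag = line.split()
            pins[library] = tag
    return pins


def conflicting_pins(files):
    """Return {library: {tag: [apps]}} for libraries pinned to more than one tag."""
    by_library = {}
    for app, text in files.items():
        for library, tag in parse_dependencies(text).items():
            by_library.setdefault(library, {}).setdefault(tag, []).append(app)
    return {lib: tags for lib, tags in by_library.items() if len(tags) > 1}


if __name__ == "__main__":
    for library, tags in conflicting_pins(DEPENDENCY_FILES).items():
        print(library, tags)
```

For the example files this flags CommonLib1, which App1 pins to tag-20100515 and App2 pins to tag-20100510, while CommonLib2 is consistent.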

Note (the design principles are the same whether you are designing a database schema, an object model, or a product build):

  • Do not link to the code in other projects (breaks encapsulation)
  • Do not have multiple copies of the libraries in your repository (modularity)