What tools do distributed programmers lack?

Solution 1:

OK, let me start.

A distributed logger with a high-precision global time axis: one that registers events from different machines in a distributed system with high precision, independent of clock offset and drift, and that scales to several hundred machines and several thousand logging processes. Such a logger lets you find transport-level latency bottlenecks in a distributed system by showing, for example, how many milliseconds a message actually takes to travel from the publisher to the subscriber through a message queue.
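To make "independent of clock offset" concrete, here is a minimal sketch of one common approach: the NTP-style offset estimate from a timestamped request/reply exchange. It is not necessarily how any particular logger does it, and the names `Exchange`, `estimate_offset`, `best_offset` and `to_global` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    """One request/reply round trip between a logging client and the log server."""
    t0: float  # client clock: request sent
    t1: float  # server clock: request received
    t2: float  # server clock: reply sent
    t3: float  # client clock: reply received


def estimate_offset(ex):
    """Return (offset, round_trip_delay); offset is server clock minus client clock.

    The estimate is exact only if the network delay is the same in both directions.
    """
    offset = ((ex.t1 - ex.t0) + (ex.t2 - ex.t3)) / 2.0
    delay = (ex.t3 - ex.t0) - (ex.t2 - ex.t1)
    return offset, delay


def best_offset(exchanges):
    """Use the exchange with the smallest round-trip delay, which bounds the error most tightly."""
    offset, _delay = min((estimate_offset(ex) for ex in exchanges), key=lambda od: od[1])
    return offset


def to_global(local_timestamp, offset):
    """Map an occurrence timestamp taken on the client onto the server's time axis."""
    return local_timestamp + offset
```

Repeating the exchange periodically and re-estimating the offset is one way to keep up with drift; events are then stamped locally when they occur and corrected with `to_global` before being merged.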

Syslog is not ok because it's not scalable enough: 50,000 logging events per second is too much for it, and timestamp precision suffers greatly under such load.

Facebook's Scribe is not ok because it doesn't provide a global time axis.

Actually, both syslog and Scribe record events with arrival timestamps, not occurrence timestamps.

Honestly, I don't lack such a tool - I've written one for myself, I'm greatly pleased with it and I'm going to open-source it. But others might.

P.S. I've open-sourced it: http://code.google.com/p/greg

Solution 2:

Dear Santa, I would like visualizations of the interactions between components in the distributed system.

I would like a visual representation showing:

  • The interactions among components, either as a UML collaboration diagram or sequence diagram.
  • Component shutdown and startup times as self-interactions.
  • On which hosts components are currently running.
  • Location of those hosts, if available, within a building or geographically.
  • Host shutdown and startup times.

I would like to be able to:

  • Filter the components and/or interactions displayed to show only those of interest.
  • Record interactions.
  • Display a desired range of time in a static diagram.
  • Play back the interactions in an animation, with typical video controls for playing, pausing, rewinding, fast-forwarding.
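As a rough sketch of what the recording and filtering items could rest on (the `Interaction` record and the `select` function are hypothetical, not taken from any existing tool), recorded interactions can be kept as timestamped records and narrowed by component and time range before being drawn or played back:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    timestamp: float  # seconds on whatever common time axis the recorder uses
    source: str       # sending component
    target: str       # receiving component (same as source for startup/shutdown)
    label: str        # message name or event description


def select(interactions, components=None, start=None, end=None):
    """Return interactions touching any of `components` within [start, end], in time order.

    A filter left as None is not applied.
    """
    result = []
    for it in interactions:
        if components is not None and it.source not in components and it.target not in components:
            continue
        if start is not None and it.timestamp < start:
            continue
        if end is not None and it.timestamp > end:
            continue
        result.append(it)
    return sorted(result, key=lambda it: it.timestamp)
```

A static diagram for a time range would render the result of `select` directly; playback would step through the same sorted list at a chosen speed.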

I've been a good developer all year, and would really like this.

Solution 3:

Then again, see this question: "How to visualize the behavior of many concurrent multi-stage processes?"

(I'm shamelessly referring to my own stuff, but that's because the problems solved by this stuff were important to me, and the current question is precisely about problems that are important to someone.)