Real time collaborative editing - how does it work?

I'm writing an application in which I'd like to have near-real-time collaborative editing features for documents (very similar to Google Documents-style editing).

I'm aware of how to keep track of cursor position; that's simple. Just poll the server every half second or second with the current user id, filename, line number and column number, which can be stored in a database. The return value of this polling request is the position of the other users' cursors.

What I don't know how to do is update the document in such a way that it won't throw your cursor off or force a full reload, as that would be far too slow for my purposes.

This really only has to work in Google Chrome, preferably Firefox as well. I don't need to support any other browser.


Solution 1:

The algorithm used behind the scenes for merging collaborative edits from multiple peers is called operational transformation. It's not trivial to implement though.
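To give a feel for the idea, here is a minimal sketch of the core transform step for two concurrent insertions. It is only an illustration under simplifying assumptions (plain-text document, insert-only operations, two peers), not how any particular product implements it. The names `Insert`, `apply_op`, `transform` and the `site` tie-breaker field are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int    # index at which the text is inserted
    text: str   # the inserted text
    site: int   # hypothetical client id, used only to break ties


def apply_op(doc: str, op: Insert) -> str:
    """Apply an insertion to a document string."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, other: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `other` has been applied.

    If `other` inserted text before `op`'s position (or at the same
    position with a lower site id), `op` must shift right by that length.
    """
    if other.pos < op.pos or (other.pos == op.pos and other.site < op.site):
        return Insert(op.pos + len(other.text), op.text, op.site)
    return op


base = "helo world"
a = Insert(3, "l", site=1)    # peer A fixes the typo
b = Insert(10, "!", site=2)   # peer B appends at the end, concurrently

# Each peer applies its own edit first, then the transformed remote edit.
doc_a = apply_op(apply_op(base, a), transform(b, a))
doc_b = apply_op(apply_op(base, b), transform(a, b))
print(doc_a, "|", doc_b)
```

Both peers end up with the same text even though they applied the edits in different orders. A real implementation also has to handle deletions, more than two peers, and operations that arrive against older revisions, which is where most of the difficulty lies.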

See also this question for useful links.

Solution 2:

Real-time collaborative editing requires several things to be effective. Most of the other answers here focus on only one aspect of the problem, namely distributed state (a.k.a. shared mutable state). Operational Transformation (OT), Conflict-Free Replicated Data Types (CRDT), Differential Synchronization, and other related technologies are all approaches to achieving near-real-time distributed state. Most focus on eventual consistency, which allows each participant's state to diverge temporarily but guarantees that all participants' states will converge once editing stops. Other answers have mentioned several implementations of these technologies.
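As a small illustration of "diverge temporarily, converge eventually", here is a sketch of one of the simplest CRDTs, a last-writer-wins register. It is not a text-editing CRDT (those are considerably more involved); it only shows the convergence property. The class name, the logical clock values and the site ids are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class LWWRegister:
    """State-based last-writer-wins register (a very simple CRDT)."""
    value: str = ""
    stamp: tuple = (0, 0)   # (logical clock, site id); site id breaks ties

    def set(self, value: str, clock: int, site: int) -> None:
        self.value, self.stamp = value, (clock, site)

    def merge(self, other: "LWWRegister") -> None:
        # Keep whichever write carries the larger stamp. Because every
        # replica applies the same rule, merge order does not matter.
        if other.stamp > self.stamp:
            self.value, self.stamp = other.value, other.stamp


replica_a = LWWRegister()
replica_b = LWWRegister()

# Concurrent, conflicting writes: the replicas temporarily diverge.
replica_a.set("Draft title", clock=1, site=1)
replica_b.set("Final title", clock=1, site=2)
diverged = replica_a.value != replica_b.value

# Once the replicas exchange state, they converge on the same value.
replica_a.merge(replica_b)
replica_b.merge(replica_a)
print(diverged, replica_a.value, replica_b.value)
```

The trade-off is visible here: convergence is guaranteed, but one user's write is silently discarded, which is why text editors need richer structures than a single register.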

However, once you have shared mutable state, you need several other features to provide a reasonable user experience. Examples of these additional concepts include:

  • Identity: Who the people you are collaborating with are.
  • Presence: Who is currently "here" editing with you now.
  • Communication: Chat, audio, video, etc., that allow users to coordinate actions.
  • Collaborative Cueing: Features that give indications as to what the other participants are doing and/or are about to do.

Shared cursors and selections are examples of Collaborative Cueing (a.k.a. Collaboration Awareness). They help users understand the intentions and likely next actions of the other participants. The original poster was partly asking about the interplay between shared mutable state and collaborative cueing. This is important because the position of a cursor or selection is typically described as a location within the document, so it depends on the document's current contents. When I say my cursor is at index 37, that means character 37 in the document I am looking at. The document you have right now may differ from mine, due to your edits or those of other users, and therefore index 37 in your document may not point to the same place.
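The index 37 example can be made concrete with a small sketch. Assuming edits are described as an insertion or deletion of a given length at a given index (the function name and the `"insert"` / `"delete"` labels are made up for this illustration), a cursor can be adjusted for each edit applied to the document like this:

```python
def transform_cursor(cursor: int, kind: str, pos: int, length: int) -> int:
    """Return where `cursor` should sit after an edit is applied.

    Design choice (an assumption, not a standard): text inserted exactly
    at the cursor leaves the cursor in front of the new text.
    """
    if kind == "insert":
        # Only insertions strictly before the cursor push it to the right.
        return cursor + length if pos < cursor else cursor
    if kind == "delete":
        if pos >= cursor:
            return cursor  # deletion lies entirely at or after the cursor
        # Remove only the part of the deleted range that precedes the
        # cursor; if the cursor was inside the range it lands on `pos`.
        return cursor - min(length, cursor - pos)
    raise ValueError(f"unknown edit kind: {kind!r}")


cursor = 37
after_insert = transform_cursor(cursor, "insert", pos=10, length=5)
after_delete = transform_cursor(cursor, "delete", pos=35, length=4)
after_far_edit = transform_cursor(cursor, "delete", pos=40, length=3)
print(after_insert, after_delete, after_far_edit)
```

The same adjustment has to be applied to every remote cursor you display, and it only works if cursor updates are tagged with the document revision they refer to, which is why cursor sharing cannot be fully separated from the concurrency control mechanism.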

So the mechanism you use to distribute cursor locations must be integrated with, or at least aware of, the mechanism that provides concurrency control over the shared mutable state. One of the challenges today is that while there are many OT / CRDT, bidirectional messaging, chat, and other libraries out there, they are isolated solutions that are not integrated. This makes it hard to build an end-user system that provides a good user experience, and often leaves technical challenges for the developer to figure out.

Ultimately, to implement an effective real-time collaborative editing system, you need to consider all of these aspects, and we haven't even discussed history, authorization, application-level conflict resolution, and many other facets. You must build or find technologies that support each of these concepts in a way that makes sense for your use case. Then you must integrate them.

The good news is that applications that support collaborative editing are becoming much more popular. Technologies that support building them are maturing, and new ones are becoming available every month. Firebase was one of the first solutions that tried to wrap many of these concepts into an easy-to-use API. A newcomer, Convergence (full disclosure, I am a founder of Convergence Labs), provides an all-in-one API that supports the majority of these collaborative editing facets and can significantly reduce the time, cost, and complexity of building real-time collaborative editing apps.

Solution 3:

You don't necessarily need xmpp or wave for this. Most of the work on an open-source implementation called infinote has already been done with jinfinote (https://github.com/sveith/jinfinote). Jinfinote was recently also ported to python (https://github.com/phrearch/py-infinote) to handle concurrency and document state centrally. I currently use both within the hwios project (https://github.com/phrearch/hwios), which relies on websockets and json transport. You really don't want to use polling for this kind of application. Also, xmpp seems to complicate things unnecessarily imo.
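To illustrate what "json transport" over a websocket can look like, here is a rough sketch of encoding and validating an edit message. The field names and message shape are invented for this example; they are not the infinote, jinfinote or hwios wire format.

```python
import json

# Hypothetical message fields, chosen only for this illustration.
REQUIRED_FIELDS = {"doc", "user", "rev", "kind", "pos", "text"}


def encode_op(doc: str, user: str, rev: int, kind: str, pos: int, text: str) -> str:
    """Serialize one edit as a compact JSON string to send over a socket."""
    message = {"doc": doc, "user": user, "rev": rev,
               "kind": kind, "pos": pos, "text": text}
    return json.dumps(message, separators=(",", ":"))


def decode_op(raw: str) -> dict:
    """Parse an incoming message and reject it if fields are missing."""
    message = json.loads(raw)
    if not isinstance(message, dict):
        raise ValueError("message must be a JSON object")
    missing = REQUIRED_FIELDS - message.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return message


wire = encode_op("notes.txt", "alice", rev=12, kind="insert", pos=37, text="hi")
received = decode_op(wire)
print(received["kind"], received["pos"], received["rev"])
```

The `rev` field is the important part: it tells the server which document revision the edit was made against, so the server can transform it against anything that happened in between. With a websocket the server can push each accepted edit to the other clients immediately instead of waiting for them to poll.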