What is the difference between a saga, a process manager and a document-based approach?

What I understand is that all three concepts are related to long-running transactions.

A process manager is, to my understanding, a finite state machine which simply reacts on events and emits commands. It does not contain any business logic, it just does routing. Its goal is to bring you to a final state, where you know that your transaction has succeeded or failed.

So far, so good.

But now my problems in understand start:

  • What is a saga in contrast to a process manager?
  • There is also the document-based approach, as mentioned in CQRS sagas - did I understand them right? … as I understand it, a document is just a "piece of paper" where you take notes and hand it around. How does that fit into the concept of commands and events?

Can anybody please explain the differences, and - what I'd be especially interested in - which of these concepts is good for what, and when you do need what. Are they mutually exclusive? Can you go along all the way with only one of them? Are there scenarios where you need more than one? …?


What is a saga in contrast to a process manager?

The intent of these patterns is different. A process manager is a workflow pattern which can, as you say, be built on top of a state machine. A process manager will retain state between messages, and will contain logic in order to determine which action should be taken in response to a message (for example, transitioning state or sending another message). Some frameworks incorrectly refer to these as sagas.

By contrast, a saga (according to the original definitions) is a pattern intended to help manage failures. It involves multiple workflows across systems, where each will allow some form of compensating action to be taken in a later transaction in the case of a failure elsewhere.

This compensation is the defining characteristic of a saga. Note that the saga itself does't know what the compensating action might be. Sagas are often implemented using the routing slip pattern.

Are they mutually exclusive? Can you go along all the way with only one of them?

They aren't mutually exclusive - it's likely that, for example, a system participating in a saga might use a process manager to actually handle its piece of processing.

Other Resources

Some of these posts may help provide more detail and provide examples:

  • Sagas and Workflows - same thing with different names or not (Arnon Rotem-gal-oz)
  • Clarifying the Saga pattern (Kelly Sommers)
  • Sagas (Clemens Vasters)

Take a look at CQRS Journey project on MSDN:

http://msdn.microsoft.com/en-us/library/jj591569.aspx

Clarifying the terminology

The term saga is commonly used in discussions of CQRS to refer to a piece of code that coordinates and routes messages between bounded contexts and aggregates. However, for the purposes of this guidance we prefer to use the term process manager to refer to this type of code artifact. There are two reasons for this:

There is a well-known, pre-existing definition of the term saga that has a different meaning from the one generally understood in relation to CQRS. The term process manager is a better description of the role performed by this type of code artifact.

Although the term saga is often used in the context of the CQRS pattern, it has a pre-existing definition. We have chosen to use the term process manager in this guidance to avoid confusion with this pre-existing definition.

The term saga, in relation to distributed systems, was originally defined in the paper "Sagas" by Hector Garcia-Molina and Kenneth Salem. This paper proposes a mechanism that it calls a saga as an alternative to using a distributed transaction for managing a long-running business process. The paper recognizes that business processes are often comprised of multiple steps, each of which involves a transaction, and that overall consistency can be achieved by grouping these individual transactions into a distributed transaction. However, in long-running business processes, using distributed transactions can impact on the performance and concurrency of the system because of the locks that must be held for the duration of the distributed transaction.