The Saga pattern and that architecture vs. design thing
Warning: file_get_contents(http://graph.facebook.com/http://arnon.me/2013/01/saga-pattern-architecture-design/): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden in /home/arnon920/public_html/wp-content/plugins/custom-share-buttons/custom-share-buttons.php on line 873
It has been few months since SOA Patterns was published and so far the book sold somewhere between 2K-3K copies which I guess is not bad for an unknown author – so first off, thanks to all of you who bought a copy (by the way, if you found the book useful I’d be grateful if you could also rate it on Amazon so that others would know about it too)
I know at least a few of you actually read the book as from time to time I get questions about it :). Not all the questions are interesting to “the general public” but some are. One interesting question I got is about the so called “Canonical schema pattern“. I have a post in the making (for too long now,sorry about that Bill) that explains why I don’t consider it a pattern and why I think it verges on being an anti-pattern. Another question I got more recently, which is also the subject of this post, was about the Saga pattern.
Here is (most of) the email I got from Ashic :
“Garcia-Molina’s paper focuses on failure management and compensation so as to prevent partial success. It discusses a variety of approaches – with an SEC, with application code outside of the database, backward-forward and even forward-only (the latter having no “compensate” step per activity, rather a forward flow that takes care of the partial success). Nowadays, I see two viewpoints regarding sagas:
1. People calling process managers sagas, which is obviously incorrect. [e.g. NServiceBus “sagas”.]
2. People focusing very strongly on a “context” of work, whereby the context gets passed around from activity to activity. For linear up front workflows, routing slips are an easy solution. An example of this can be found at Clemens’s post here: http://vasters.com/clemensv/2012/09/01/Sagas.aspx . For more complicated workflows, graph-like slips may be used.
After discussing with some enthusiasts, they seem very keen to suggest that the context has to move along. They seem to reject the notion of a saga where a central coordinator controls the process. In other words, even if a process manager takes care of only routing messages, and that routing includes compensations to alleviate partial successes, they are unwilling (sometimes vehemently) to call that a saga. They acknowledge it can be useful, but say that is not a saga. I find this to be confusing. In this case the process manager acts as the SEC would in a Garcia-Molina saga capable database. This approach still allows interleaved transactions (or steps) without a global lock. Why would this not be a saga?
In your book, I did see you mentioned orchestration as a way of implementing sagas. However, when this was brought up, the proponents of point 2 suggest that that is not what you really mean. To me it seems quite clear, and it aligns with Hector’s paper. I just want to make sure I have this right. I’d love your thoughts on this.”
Let’s start with the answer to the question:
When I think about the Saga pattern I see it as the application of the notions in the Garcia-Molina paper (which talked about databases) to SOA. In other words, I see sagas as the notion of getting distributed agreement of a process with reduced guarantees (vs. distributed transactions that propose ACID guarantees across systems). – So,basically, a Saga is loose transaction-like flow where, in case of failures, involved services perform compensation steps (which may be nothing, a complete undo or something else entirely). The Saga pattern can augment this process with temporary promises (which I call reservations).
Under this definition both centrally managed processes and a “choreographed” processes are Sagas – as long as the semantics and intent mentioned above are kept. The centrally managed orchestration provides visibility of processes, ease of management etc; The cooperative event based, context shared sagas provide flexibility and allow serendipity of new processes; Both have merit and both have a place, at least in my opinion :)
The main reason both of these, very different, approaches are valid designs and implementations for the Saga pattern is that the Saga pattern (like others in the book) is an Architectural pattern and not a Design pattern. Which brings us to the second reason for this post, the difference between “Architecture” and “Design”. In a nutshell, architecture is a type of design where the focus is quality attributes and wide(er) scope whereas design focuses on functional requirements and more localized concerns.
The Saga pattern is an architectural pattern that focused on the integrity reliability quality attributes and it pertains to the communication patterns between services. When it comes to design the implementation of the pattern. you need to decide how to implement the concerns and roles defined in the pattern -e.g. controlling the flow and the status of the saga. One decision can be to implement it centrally and use orchestration another decision can be to decentralize it and use context…
Design decision can be very meaningful sometimes it can be hard to find what’s left of the architecture – consider for example the whole idea behind blogging and RSS feeds. The architectural notion is a publish/subscribe system where the blog writer publish an “event” (a new post) and subscribers get a copy. When it came to design and implementation, considering it was implemented on top of HTTP and REST where there is no publish/subscribe capability it was actually designed as a pull system where the publisher provides a list of recent changes (the feed) and subscribers sample it and check if anything changed since the last time. So architecturally pub/sub, design pull a centralized server that exposes latest changes – a really big difference
Does it matter at all? I think yes. Architecture lets us think about the system at a higher level of abstraction and thus tackle more complex systems. When we design and focus on more local issues we can tackle the nitty gritty details and make sure things actually work. we need to check the effects of design on architecture and vice versa to make sure the whole thing sticks together and actually does what we want/need.
Note that architecture and design are not the complete story – another variable is the technology (e.g. HTTP in the example above) which affects the design decision and thus also the architecture (you can read a little more about it in my posts on SAF)
illustration by Mads Boedker