Although the author does not say so explicitly, there are a lot of similarities between the concepts he presents and Aggregates from Domain-Driven Design. Pat's paper was written in 2007, so my wild guess is that he might have been familiar with Eric's book, which was published in 2004.
Since there are still a lot of questions about Aggregate boundaries (and the technical difficulties that arise around them) in the DDD world, I recommend Pat's article as complementary reading: it takes a more infrastructure-oriented approach and may clarify some things.
Pat concentrates on the technical aspects of almost-infinite application scaling and takes a bottom-up approach, dividing the application into (at least) two layers that differ in their perception of scaling:
- The Lower Layer (scale-aware) knows that the system will run on more than one machine.
- The Upper Layer (scale-agnostic) does not know about it and only uses the lower layer's abstractions.
We can see a similar, though less concrete, division in DDD. Starting from a proper decomposition of the Business Domain, DDD approaches the problem from the other, top-down, side:
- The Domain Model (scale-agnostic) is the place where business logic resides. It is not aware of the lower system layers and declares interfaces to be implemented.
- The Infrastructure code (scale-aware) provides the implementation of those interfaces in such a way that the system may be deployed on multiple machines if needed.
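To make the two-layer split more tangible, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the `Account` Entity, the `AccountRepository` interface and the `ShardedInMemoryRepository` are hypothetical names, and the "machines" are simulated with plain dictionaries. Neither Pat's paper nor Eric's book prescribes this code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol


@dataclass
class Account:
    """Hypothetical Entity/Aggregate, identified by a system-wide unique key."""
    account_id: str
    balance: int = 0


class AccountRepository(Protocol):
    """Interface declared by the scale-agnostic layer; it says nothing about machines."""

    def load(self, account_id: str) -> Optional[Account]: ...

    def save(self, account: Account) -> None: ...


def deposit(repo: AccountRepository, account_id: str, amount: int) -> Account:
    """Scale-agnostic business logic: it only sees the abstraction."""
    account = repo.load(account_id) or Account(account_id)
    account.balance += amount
    repo.save(account)
    return account


class ShardedInMemoryRepository:
    """Scale-aware implementation: routes each key to one of several 'machines'."""

    def __init__(self, shard_count: int) -> None:
        # Each dictionary stands in for the storage of one machine (assumption).
        self.shards: List[Dict[str, Account]] = [{} for _ in range(shard_count)]

    def _shard_for(self, account_id: str) -> Dict[str, Account]:
        # Deterministic routing by key; a stand-in for real placement logic.
        index = sum(account_id.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def load(self, account_id: str) -> Optional[Account]:
        return self._shard_for(account_id).get(account_id)

    def save(self, account: Account) -> None:
        self._shard_for(account.account_id)[account.account_id] = account
```

Calling `deposit(ShardedInMemoryRepository(shard_count=3), "acc-1", 50)` works without `deposit` ever learning how many shards exist. Swapping the repository for a single-machine one would not change the business logic at all.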
In the article, the author defines an Entity as data manipulated by the Upper Layer that has a globally (within the system) unique identifier or key. This identifier must identify exactly one Entity and the data contained within it. The size of the Entity does not matter as long as it fits in one scope of serializability (one machine or one cluster). The data in one Entity must be disjoint from the data of other Entities.
An Entity may be stored in many shapes: as SQL records, as a document in a document database, or as anything else that fits (even an event stream). The Upper Layer does not have to know about that infrastructure concern.
Does that sound familiar?
Aggregates in DDD are identified by a unique (within the system) identifier. They may be of any size as long as they cover the true business invariants and nothing more (usually, the smaller the better). Their data should not leak into other instances, and it can be stored in many forms. The Domain Model does not have to know about such infrastructure details.
ACID vs BASE Transactions
Both Pat and Eric agree that the Entity (or, in the DDD world, the Aggregate) has to be the boundary of atomicity.
Pat explicitly says that "a scale-agnostic programming abstraction must have the notion of entity as the boundary of atomicity". Not following that rule will result in random system inconsistencies when the large-scale system redeploys the Entity on a different machine. In other words, changes to the Entity's state have to be performed in an ACID transaction.
The communication between Entities should be done via messaging, embracing Soft State and Eventual Consistency, as in a BASE transaction. Of course, when you deal with messages, there are some issues to be addressed. Since "exactly-once-in-order delivery" is rarely available in scalable systems, developers should use an "at-least-once delivery" strategy and deal with redelivered and out-of-order messages themselves. Pat describes a couple of useful patterns to help with that.
Eric says that Aggregates guard the true business invariants, so each change in their state always has to be consistent. Not following that rule may lead to overcomplicated models, which are hard to change and maintain in the long run.
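To show why redelivery is unavoidable with "at-least-once delivery", here is a small Python sketch. Everything in it is my own assumption for illustration purposes: `Message`, `AtLeastOnceSender` and `LossyAckChannel` are hypothetical names, and the network is simulated by a channel that delivers every message but loses the first acknowledgement. It is not one of the patterns from Pat's paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Message:
    message_id: str
    payload: str


class AtLeastOnceSender:
    """Keeps every message until it is acknowledged and resends it otherwise."""

    def __init__(self, deliver: Callable[[Message], bool]) -> None:
        # `deliver` returns True only when an acknowledgement came back.
        self._deliver = deliver
        self._unacknowledged: Dict[str, Message] = {}

    def send(self, message: Message) -> None:
        self._unacknowledged[message.message_id] = message
        self.flush()

    def flush(self) -> None:
        """Try every pending message once; forget the acknowledged ones."""
        for message in list(self._unacknowledged.values()):
            if self._deliver(message):
                del self._unacknowledged[message.message_id]

    @property
    def pending(self) -> int:
        return len(self._unacknowledged)


class LossyAckChannel:
    """Hypothetical channel: delivers every message but loses the first ack."""

    def __init__(self) -> None:
        self.received: List[Message] = []
        self._acks_lost = 0

    def deliver(self, message: Message) -> bool:
        self.received.append(message)  # the receiver did see the message
        if self._acks_lost == 0:
            self._acks_lost += 1
            return False  # the sender never learns about it
        return True
```

After `send` and one more `flush`, the sender has nothing pending, but the receiver has seen the same `message_id` twice. The sender cannot tell a lost message from a lost acknowledgement, so the receiving Entity is the one that has to cope with the duplicate.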
A single change to an Aggregate's state should be done in one ACID transaction, while the communication between Aggregates should be done in a BASE one. This should not be hard to achieve once you have the Domain Model right, because most business processes are eventually consistent anyway. This is explained in the "Effective Aggregate Design" essay by Vaughn Vernon.
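One way to connect the ACID side with the BASE side is to store the outgoing message in the same transaction as the Aggregate's change and to deliver it later. This is usually called a transactional outbox; I am naming it here as my own illustration, not as something taken from Pat's paper or Vaughn's essay. The sketch below uses Python's built-in `sqlite3` module, and the `orders` and `outbox` tables, as well as the function names, are assumptions.

```python
import sqlite3
from typing import Callable


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with one Aggregate table and an outbox."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "order_id TEXT NOT NULL, event TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def place_order(conn: sqlite3.Connection, order_id: str) -> None:
    """Change exactly one Aggregate and record its outgoing message atomically."""
    with conn:  # commits on success, rolls back if an exception escapes
        # Statement order is irrelevant inside the transaction: if the second
        # insert fails, the first one is rolled back as well.
        conn.execute(
            "INSERT INTO outbox (order_id, event) VALUES (?, 'OrderPlaced')", (order_id,)
        )
        conn.execute(
            "INSERT INTO orders (order_id, status) VALUES (?, 'PLACED')", (order_id,)
        )


def relay_outbox(conn: sqlite3.Connection, publish: Callable[[str, str], None]) -> int:
    """Runs outside the Aggregate's transaction; delivery is at-least-once."""
    rows = conn.execute("SELECT id, order_id, event FROM outbox ORDER BY id").fetchall()
    for row_id, order_id, event in rows:
        publish(order_id, event)
        # A crash between publish and delete means the message is sent again.
        with conn:
            conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
    return len(rows)
```

Placing the same order twice raises `sqlite3.IntegrityError`, and the outbox row written by the failed attempt disappears with the rollback, so other Aggregates never hear about a change that did not happen. The relay, on the other hand, may publish a message more than once, which is exactly the BASE part of the picture.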
Since Pat advocates the "at-least-once delivery" strategy for dealing with messages, the Entity has to cope with message retries and reordering. Of course, this should be achieved via Idempotency: the processing of a message is idempotent when "a subsequent execution of the processing does not perform a substantive change to the entity". This leaves the door open for defining what a substantive change is, and the answer will differ between systems and Entities.
Some operations are idempotent by themselves, but most of the time we will have to guarantee idempotency ourselves. The author introduces a new concept to help us with that: the Activity. It is responsible for remembering the history of a given Entity's interactions with other Entities.
This history does not have to be full, but it needs to be able to answer the question: "did I process that message already?"
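Here is how I imagine an Activity could look in code. Treat it as a minimal sketch under my own assumptions: the `InventoryItem` Entity, the per-partner `Activity` and the idea of remembering processed message identifiers are hypothetical choices, and Pat's paper does not prescribe any particular data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Activity:
    """What one Entity remembers about its dialogue with one partner Entity."""
    processed_message_ids: Set[str] = field(default_factory=set)


@dataclass
class InventoryItem:
    """Hypothetical Entity; the Activities are part of its own state."""
    sku: str
    available: int
    activities: Dict[str, Activity] = field(default_factory=dict)

    def handle_reservation(self, sender_id: str, message_id: str, quantity: int) -> bool:
        """Apply the message once; return False when it was a redelivery."""
        activity = self.activities.setdefault(sender_id, Activity())
        if message_id in activity.processed_message_ids:
            return False  # "did I process that message already?" -> yes
        if quantity > self.available:
            raise ValueError("not enough stock")
        self.available -= quantity
        # The business change and the memory of it belong to the same Entity,
        # so they can be persisted together in one ACID transaction.
        activity.processed_message_ids.add(message_id)
        return True
```

Handling message `m-1` from `order-1` twice reduces the stock only once. The same message identifier coming from a different partner, say `order-2`, is tracked in a separate Activity and is therefore processed.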
In DDD terms, those Activities should be part of the Aggregates if we want to achieve a scalable system.
Complementary reading for DDD practitioners
As I stated at the beginning, Pat's article is complementary reading for all DDD practitioners who struggle with defining correct Aggregate boundaries because of the infrastructure complexity of scalable systems.
Pat describes Entities (which in DDD lingo are Aggregates) from the infrastructure perspective, which in itself can give some software developers new insight.
The author also introduces a new concept for dealing with idempotency: the Activity. This might be a missing piece for many DDD practitioners, since it explicitly shows how to guarantee proper communication between Aggregates outside of the ACID transaction.
I enjoyed the essay and I highly recommend it for all Domain-Driven Designers.
And what about you, dear reader?
Did you find more similarities between Pat's Entity and Eric's Aggregate? Or maybe you disagree and see more differences?