DDD - the rule that Entities can't access Repositories directly
In Domain-Driven Design, there seems to be broad agreement that Entities should not access Repositories directly.
Did this come from Eric Evans' Domain-Driven Design book, or did it come from elsewhere?
Where can I find some good explanations of the reasoning behind it?
edit: To clarify: I'm not talking about the classic OO practice of separating data access off into a separate layer from the business logic - I'm talking about the specific arrangement whereby in DDD, Entities are not supposed to talk to the data access layer at all (i.e. they are not supposed to hold references to Repository objects)
update: I gave the bounty to BacceSR because his answer seemed closest, but I'm still pretty much in the dark about this. If it's such an important principle, surely there should be some good articles about it online somewhere?
update: March 2013, the upvotes on the question imply there's a lot of interest in this, and even though there's been lots of answers, I still think there's room for more if people have ideas about this.
Solution 1:
There's a bit of confusion here. Repositories access aggregate roots. Aggregate roots are entities. The reason for this is separation of concerns and good layering. This doesn't make sense on small projects, but on a large team you want to be able to say: "You access a product through the ProductRepository. Product is an aggregate root for a collection of entities, including the ProductCatalog object. If you want to update the ProductCatalog, you must go through the ProductRepository."
In this way you have a very, very clear separation of the business logic and of where things get updated. You don't have someone off on their own writing an entire program that does all these complicated things to the product catalog, so that when it comes time to integrate it into the upstream project, you're sitting there looking at it and realizing it all has to be ditched. It also means that when people join the team to add new features, they know where to go and how to structure the program.
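As a minimal sketch of that rule (type and method names are mine, not from the answer), all access to the aggregate's internals goes through the root via its repository:

public interface IProductRepository
{
    Product FindBy(ProductId id);
    void Save(Product product);
}

public class ChangeCatalogEntryHandler
{
    private readonly IProductRepository products;

    public ChangeCatalogEntryHandler(IProductRepository products)
    {
        this.products = products;
    }

    public void Handle(ProductId productId, CatalogEntry entry)
    {
        // The catalog is only reachable through the Product aggregate root.
        var product = products.FindBy(productId);
        product.AddCatalogEntry(entry);   // hypothetical behavior exposed by the root
        products.Save(product);
    }
}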
But wait! Repository also refers to the persistence layer, as in the Repository Pattern. In a better world, Eric Evans' Repository and the Repository Pattern would have separate names, because they tend to overlap quite a bit. To understand the Repository Pattern, you have to contrast it with other ways in which data is accessed, such as a service bus or an event model system. Usually when you get to this level, Eric Evans' Repository definition falls by the wayside and you start talking about a bounded context. Each bounded context is essentially its own application. You might have a sophisticated approval system for getting things into the product catalog. In your original design the product was the centerpiece, but in this bounded context the product catalog is. You still might access product information and update products via a service bus, but you must realize that a product catalog outside the bounded context might mean something completely different.
Back to your original question. If you're accessing a repository from within an entity, the entity is really not a business entity but probably something that should exist in a service layer. This is because entities are business objects and should concern themselves with being as much like a DSL (domain-specific language) as possible. Keep only business information in this layer. If you're troubleshooting a performance issue, you'll know to look elsewhere, since only business information should be here. If you suddenly have application concerns here, you're making the application very hard to extend and maintain, which defeats the real heart of DDD: making maintainable software.
Response to Comment 1: Right, good question. So not all validation occurs in the domain layer. S#arp Architecture has a "DomainSignature" attribute that does what you want. It is persistence-aware, but being an attribute keeps the domain layer clean. It ensures that you don't have a duplicate entity with, in your example, the same name.
But let's talk about more complicated validation rules. Let's say you're Amazon.com. Have you ever ordered something with an expired credit card? I have: I hadn't updated the card and bought something anyway. The site accepts the order and the UI informs me that everything is peachy. About 15 minutes later, I get an e-mail saying there's a problem with my order: my credit card is invalid. What's happening here is that, ideally, there's some regex validation in the domain layer: is this a well-formed credit card number? If yes, persist the order. However, there's additional validation at the application task layer, where an external service is queried to see if payment can be made on the credit card. If not, don't actually ship anything; suspend the order and wait for the customer. This should all take place in a service layer.
Don't be afraid to create validation objects at the service layer that can access repositories. Just keep it out of the domain layer.
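A hedged sketch of what such a service-layer validator might look like (all names here are my assumptions; the payment gateway stands in for the external service):

public class OrderPaymentValidator
{
    private readonly IOrderRepository orders;
    private readonly IPaymentGateway paymentGateway;

    public OrderPaymentValidator(IOrderRepository orders, IPaymentGateway paymentGateway)
    {
        this.orders = orders;
        this.paymentGateway = paymentGateway;
    }

    // Runs after the domain layer has done its cheap, self-contained checks
    // (e.g. the card number regex) and the order has been accepted.
    public bool CanCharge(OrderId orderId)
    {
        var order = orders.FindBy(orderId);
        return paymentGateway.Authorize(order.CreditCard, order.Total);
    }
}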
Solution 2:
At first, I was inclined to allow some of my entities access to repositories (i.e. lazy loading without an ORM). Later I came to the conclusion that I shouldn't, and that I could find alternative ways:
- We should know our intentions in a request and what we want from the domain, and therefore we can make repository calls before constructing or invoking Aggregate behavior. This also helps avoid the problem of inconsistent in-memory state and the need for lazy loading (see this article). The smell is that you cannot create an in-memory instance of your entity any more without worrying about data access.
- CQS can help reduce the need to call the repository for things from within our entities.
- We can use a specification to encapsulate and communicate domain logic needs and pass that to the repository instead (a service can orchestrate these things for us); a sketch follows this list. The specification can come from the entity that is in charge of maintaining that invariant. The repository will interpret parts of the specification into its own query implementation and apply rules from the specification on query results. This aims to keep domain logic in the domain layer. It also serves the Ubiquitous Language and communication better. Imagine saying "overdue order specification" versus saying "filter order from tbl_order where placed_at is less than 30 minutes before sysdate" (see this answer).
- Letting an entity call a repository makes reasoning about its behavior more difficult, since the Single Responsibility Principle is violated. Keeping the two apart means that if you need to work out storage/persistence issues, you know where to go and where not to go.
- Keeping repositories out of entities also avoids the danger of giving an entity bi-directional access to global state (via the repository and domain services), and of breaking your transaction boundary.
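To make the specification bullet above concrete, here is a hedged sketch (my names, not from any book) of a rule owned by the domain and interpreted by the repository:

public sealed class OverdueOrderSpecification
{
    private static readonly TimeSpan Threshold = TimeSpan.FromMinutes(30);

    // The rule itself lives in the domain layer (Order is assumed to expose
    // a PlacedAt timestamp)...
    public bool IsSatisfiedBy(Order order) =>
        DateTime.UtcNow - order.PlacedAt > Threshold;
}

public interface IOrderRepository
{
    // ...while the repository implementation may translate it into its own
    // query, e.g. "where placed_at < sysdate - 30 minutes".
    IReadOnlyList<Order> FindSatisfying(OverdueOrderSpecification specification);
}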
Vaughn Vernon, in the red book Implementing Domain-Driven Design, refers to this issue in two places that I know of (note: this book is fully endorsed by Evans, as you can read in the foreword). In Chapter 7 on Services, he uses a domain service and a specification to work around the need for an aggregate to use a repository and another aggregate to determine whether a user is authenticated. He writes:
As a rule of thumb, we should try to avoid the use of Repositories (12) from inside Aggregates, if at all possible.
Vernon, Vaughn (2013-02-06). Implementing Domain-Driven Design (Kindle Location 6089). Pearson Education. Kindle Edition.
And in Chapter 10 on Aggregates, in the section titled "Model Navigation" he says (just after he recommends the use of global unique IDs for referencing other aggregate roots):
Reference by identity doesn’t completely prevent navigation through the model. Some will use a Repository (12) from inside an Aggregate for lookup. This technique is called Disconnected Domain Model, and it’s actually a form of lazy loading. There’s a different recommended approach, however: Use a Repository or Domain Service (7) to look up dependent objects ahead of invoking the Aggregate behavior. A client Application Service may control this, then dispatch to the Aggregate:
He goes on to show an example of this in code:
public class ProductBacklogItemService ... {
    ...
    @Transactional
    public void assignTeamMemberToTask(
            String aTenantId,
            String aBacklogItemId,
            String aTaskId,
            String aTeamMemberId) {

        BacklogItem backlogItem = backlogItemRepository.backlogItemOfId(
                new TenantId(aTenantId),
                new BacklogItemId(aBacklogItemId));

        Team ofTeam = teamRepository.teamOfId(
                backlogItem.tenantId(),
                backlogItem.teamId());

        backlogItem.assignTeamMemberToTask(
                new TeamMemberId(aTeamMemberId),
                ofTeam,
                new TaskId(aTaskId));
    }
    ...
}
He also goes on to mention yet another solution, in which a domain service can be used in an Aggregate command method along with double dispatch. (I can't recommend enough how beneficial it is to read his book. After you have tired of endlessly rummaging through the internet, fork over the well-deserved money and read it.)
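As a hedged illustration of that last idea (my own names, not Vernon's code), the aggregate's command method receives the domain service and dispatches back to it:

public class BacklogItem
{
    // Double dispatch: the aggregate hands itself to the domain service,
    // which performs the coordination the aggregate shouldn't do itself.
    public void AssignTeamMemberToTask(
        TeamMemberAssignmentService assignmentService,
        TeamMemberId teamMemberId,
        TaskId taskId)
    {
        assignmentService.Assign(this, teamMemberId, taskId);
    }
}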
I then had some discussion with the always gracious Marco Pivetta @Ocramius, who showed me a bit of code for pulling a specification out of the domain and using it:
- This is not recommended:
$user->mountFriends(); // <-- has a repository call inside that loads friends?
- In a domain service, this is good:
public function mountYourFriends(MountFriendsCommand $mount) {
    $user = $this->users->get($mount->userId());
    $friends = $this->users->findBySpecification($user->getFriendsSpecification());
    array_map([$user, 'mount'], $friends);
}
Solution 3:
It's a very good question, and I look forward to some discussion about it. I think it's mentioned in several DDD books, including Jimmy Nilsson's and Eric Evans's, and I guess it's also visible through their examples of how to use the Repository pattern.
But let's discuss. A very valid question is: why should an entity know how to persist another entity? What's important in DDD is that each entity has a responsibility to manage its own "knowledge sphere" and shouldn't know anything about how to read or write other entities. Sure, you could probably just add a repository interface to entity A for reading entities of type B, but the risk is that you expose knowledge of how to persist B. Will entity A also do validation on B before persisting B into the database?
As you can see, entity A can become more involved in entity B's lifecycle, and that can add more complexity to the model.
I also suspect (without having an example at hand) that unit testing will become more complex; the sketch below illustrates why.
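A hedged illustration of the testing cost (hypothetical types): once entity A needs a repository to reach B, even a trivial unit test needs a test double:

// With a repository baked into the entity, every test must wire up a fake:
var order = new Order(new StubCustomerRepository());

// With the collaborating entity passed in instead, it's plain construction:
var order2 = new Order(existingCustomer);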
But I'm sure there will always be scenarios where you're tempted to use repositories via entities. You have to look at each scenario to make a valid judgement, weighing the pros and cons. But the repository-from-entity solution, in my opinion, starts with a lot of cons. It would have to be a very special scenario for the pros to balance them out.
Solution 4:
What an excellent question. I am on the same path of discovery, and most answers throughout the internet seem to bring as many problems as they bring solutions.
So (at the risk of writing something that I disagree with a year from now) here are my discoveries so far.
First of all, we like a rich domain model, which gives us high discoverability (of what we can do with an aggregate) and readability (expressive method calls).
// Entity
public class Invoice
{
...
public void SetStatus(StatusCode statusCode, DateTime dateTime) { ... }
public void CreateCreditNote(decimal amount) { ... }
...
}
We want to achieve this without injecting any services into an entity's constructor, because:
- Introduction of a new behavior (that uses a new service) could lead to a constructor change, meaning the change affects every line that instantiates the entity!
- These services are not part of the model, but constructor-injection would suggest that they were.
- Often a service (even its interface) is an implementation detail rather than part of the domain. The domain model would have an outward-facing dependency.
- It can be confusing why the entity cannot exist without these dependencies. (A credit note service, you say? I am not even going to do anything with credit notes...)
- It would make the entity hard to instantiate, and thus hard to test.
- The problem spreads easily, because other entities containing this one would acquire the same dependencies, which on those entities may look like very unnatural dependencies.
How, then, can we do this? My conclusion so far is that method dependencies and double dispatch provide a decent solution.
public class Invoice
{
...
// Simple method injection
public void SetStatus(IInvoiceLogger logger, StatusCode statusCode, DateTime dateTime)
{ ... }
// Double dispatch
public void CreateCreditNote(ICreditNoteService creditNoteService, decimal amount)
{
creditNoteService.CreateCreditNote(this, amount);
}
...
}
CreateCreditNote() now requires a service that is responsible for creating credit notes. It uses double dispatch, fully offloading the work to the responsible service, while maintaining discoverability from the Invoice entity.

SetStatus() now has a simple dependency on a logger, which obviously will perform part of the work.

For the latter, to make things easier on the client code, we might instead log through an IInvoiceService. After all, invoice logging seems pretty intrinsic to an invoice. Such a single IInvoiceService helps avoid the need for all sorts of mini-services for various operations. The downside is that it becomes obscure what exactly that service will do. It might even start to look like double dispatch, while most of the work is really still done in SetStatus() itself.

We could still name the parameter 'logger', in hopes of revealing our intent. That seems a bit weak, though.

Instead, I would opt to ask for an IInvoiceLogger (as we already do in the code sample) and have IInvoiceService implement that interface. The client code can simply use its single IInvoiceService for all Invoice methods that ask for such a very particular, invoice-intrinsic 'mini-service', while the method signatures still make abundantly clear what they are asking for.
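A hedged sketch of that arrangement (the interface names come from the text above; the members are my assumptions):

public interface IInvoiceLogger
{
    void Log(Invoice invoice, string message);
}

// The broad service implements the narrow, invoice-intrinsic role...
public interface IInvoiceService : IInvoiceLogger
{
    // ...alongside its other invoice operations.
}

// ...so the client can pass its single IInvoiceService wherever a method
// asks for just the logger:
// invoice.SetStatus(invoiceService, statusCode, DateTime.UtcNow);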
I notice that I have not addressed repositories explicitly. Well, the logger is or uses a repository, but let me also provide a more explicit example. We can use the same approach if the repository is needed in just a method or two:
public class Invoice
{
public IEnumerable<CreditNote> GetCreditNotes(ICreditNoteRepository repository)
{ ... }
}
In fact, this provides an alternative to the ever-troublesome lazy loads.
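A hedged usage sketch (repository name assumed): the caller resolves the repository once and lends it to the entity only for the duration of the call:

var creditNotes = invoice.GetCreditNotes(creditNoteRepository);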
Update: I have left the text below for historical purposes, but I suggest steering clear of lazy loads 100%.
For true, property-based lazy loads, I do currently use constructor injection, but in a persistence-ignorant way.
public class Invoice
{
    // Lazy could use an interface (for contravariance if nothing else), but I digress
    public Lazy<IEnumerable<CreditNote>> CreditNotes { get; }

    // Give me something that will provide my credit notes
    public Invoice(Func<Invoice, IEnumerable<CreditNote>> lazyCreditNotes)
    {
        this.CreditNotes = new Lazy<IEnumerable<CreditNote>>(() => lazyCreditNotes(this));
    }
}
On the one hand, a repository that loads an Invoice from the database can have free access to a function that will load the corresponding credit notes, and inject that function into the Invoice. On the other hand, code that creates an actual new Invoice will merely pass a function that returns an empty list:
new Invoice(inv => new List<CreditNote>() as IEnumerable<CreditNote>)
(A custom ILazy<out T> could rid us of the ugly cast to IEnumerable, but that would complicate the discussion.)
// Or just an empty IEnumerable
new Invoice(inv => Enumerable.Empty<CreditNote>())
I'd be happy to hear your opinions, preferences, and improvements!
Solution 5:
Why separate out data access?
From the book, I think the first two pages of the chapter Model-Driven Design give some justification for why you want to abstract technical implementation details out of the implementation of the domain model:
- You want to keep a tight connection between the domain model and the code
- Separating technical concerns helps prove the model is practical for implementation
- You want the ubiquitous language to permeate through to the design of the system
This seems to be all for the purpose of avoiding a separate "analysis model" that becomes divorced from the actual implementation of the system.
From what I understand of the book, it says this "analysis model" can end up being designed without consideration for software implementation. Once developers try to implement the model understood by the business side, they form their own abstractions out of necessity, causing a wall in communication and understanding.
In the other direction, developers introducing too many technical concerns into the domain model can cause this divide as well.
So you could consider that separating out concerns such as persistence helps safeguard against the design and analysis models diverging. If it feels necessary to introduce things like persistence into the model, that is a red flag; maybe the model is not practical for implementation.
Quoting:
"The single model reduces the chances of error, because the design is now a direct outgrowth of the carefully considered model. The design, and even the code itself, has the communicativeness of a model."
The way I interpret this: if you end up with more lines of code dealing with things like database access, you lose that communicativeness.
If the need for accessing a database is for things like checking uniqueness, have a look at:
Udi Dahan: the biggest mistakes teams make when applying DDD
http://gojko.net/2010/06/11/udi-dahan-the-biggest-mistakes-teams-make-when-applying-ddd/
under "All rules aren't created equal"
and
Employing the Domain Model Pattern
http://msdn.microsoft.com/en-us/magazine/ee236415.aspx#id0400119
under "Scenarios for Not Using the Domain Model", which touches on the same subject.
How to separate out data access
Loading data through an interface
The "data access layer" has been abstracted through an interface, which you call in order to retrieve required data:
var orderLines = OrderRepository.GetOrderLines(orderId);
foreach (var line in orderLines)
{
total += line.Price;
}
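The interface behind that call might look like this (a sketch with assumed names; the original does not show it):

public interface IOrderRepository
{
    // Whether each OrderLine comes back with ProductInfo populated is an
    // implementation detail hidden behind this interface.
    IReadOnlyList<OrderLine> GetOrderLines(OrderId orderId);
}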
Pros: The interface separates out the "data access" plumbing code, allowing you to still write tests. Data access can be handled on a case by case basis allowing better performance than a generic strategy.
Cons: The calling code must assume what has been loaded and what hasn't.
Say GetOrderLines returns OrderLine objects with a null ProductInfo property for performance reasons. The developer must have intimate knowledge of the code behind the interface.
I've tried this method on real systems. You end up changing the scope of what is loaded all the time in an attempt to fix performance problems. You end up peeking behind the interface to look at the data access code to see what is and isn't being loaded.
Now, separation of concerns should allow the developer to focus on one aspect of the code at a time, as much as possible. The interface technique hides HOW the data is loaded, but not HOW MUCH data is loaded, WHEN it is loaded, or WHERE it is loaded from.
Conclusion: Fairly low separation!
Lazy Loading
Data is loaded on demand. Calls to load data are hidden within the object graph itself: accessing a property can cause a SQL query to execute before the result is returned.
foreach (var line in order.OrderLines)
{
total += line.Price;
}
Pros: The 'WHEN, WHERE, and HOW' of data access is hidden from the developer focusing on domain logic. There is no code in the aggregate that deals with loading data. The amount of data loaded can be the exact amount required by the code.
Cons: When you are hit with a performance problem, it is hard to fix when you have a generic "one size fits all" solution. Lazy loading can cause worse performance overall, and implementing lazy loading may be tricky.
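A hedged sketch of how such a lazily loaded collection might be wired up (my names; real ORMs generate proxies that do this for you):

public class Order
{
    private readonly Func<IReadOnlyList<OrderLine>> loadLines; // supplied by the data layer
    private IReadOnlyList<OrderLine> lines;

    public Order(Func<IReadOnlyList<OrderLine>> loadLines)
    {
        this.loadLines = loadLines;
    }

    // First access runs the query; subsequent accesses reuse the result.
    public IReadOnlyList<OrderLine> OrderLines => lines ??= loadLines();
}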
Role Interface/Eager Fetching
Each use case is made explicit via a Role Interface implemented by the aggregate class, allowing for data loading strategies to be handled per use case.
Fetching strategy may look like this:
public class BillOrderFetchingStrategy : ILoadDataFor<IBillOrder, Order>
{
    public Order Load(string aggregateId)
    {
        var order = new Order();
        order.Data = GetOrderLinesWithPrice(aggregateId);
        return order;
    }
}
Then your aggregate can look like:
public class Order : IBillOrder
{
    public void BillOrder(BillOrderCommand command)
    {
        foreach (var line in this.Data.OrderLines)
        {
            total += line.Price;
        }
        etc...
    }
}
The BillOrderFetchingStrategy is used to build the aggregate, and then the aggregate does its work.
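For completeness, a hedged sketch of the two interfaces the strategy assumes (they are not shown above):

// The role interface names a single use case...
public interface IBillOrder
{
    void BillOrder(BillOrderCommand command);
}

// ...and the strategy interface pairs a role with the aggregate that plays it.
public interface ILoadDataFor<TRole, TAggregate> where TAggregate : TRole
{
    TAggregate Load(string aggregateId);
}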
Pros: Allows for custom code per use case, allowing for optimal performance. It is in line with the Interface Segregation Principle. There are no complex code requirements. Aggregate unit tests do not have to mimic the loading strategy. A generic loading strategy can be used for the majority of cases (e.g. a "load all" strategy), and special loading strategies can be implemented when necessary.
Cons: The developer still has to adjust/review the fetching strategy after changing domain code.
With the fetching-strategy approach you might still find yourself changing custom fetching code for a change in business rules. It's not a perfect separation of concerns, but it ends up more maintainable and is better than the first option. The fetching strategy encapsulates the HOW, WHEN and WHERE of data loading; it has a better separation of concerns, without losing flexibility the way the one-size-fits-all lazy loading approach does.