Let’s Look Under the Hood of Doctrine 2


It is probably no exaggeration to say that Doctrine is the most widely used ORM in the Symfony ecosystem, which is why mastering it matters so much for PHP developers. In this article, I propose to look ‘under the hood’ of this Mustang among ORMs and figure out which abstractions and patterns form its foundation.

I first heard about Doctrine when only the first version of the library existed. The speaker I was listening to complained about what a ‘baddie’ it was and what a headache it caused the team on their project. A lot of water has flowed under the bridge since then: as I write this article, the third version of the library is in development, and it differs significantly from what the library looked like initially. Just think: about 600 contributors have worked on it and almost 12,000 commits have been recorded. No doubt, Doctrine is worth your attention.

ORM in General

Before going deeper into Doctrine 2, I would like to talk about the problem that gave rise to it. Software products never appear without a reason, and that reason is usually a problem we have run into that slows the project down (and sometimes makes it costlier, because the code grows more complicated and harder to maintain).

A classic web application consists of three components: the UI, through which the user interacts with the app; the back end, which contains the business logic; and the persistence layer, where the data is stored, usually in a relational database (DB).

A relational DB and the classes we know from OOP take different approaches to storing and managing data, so the challenge arises of synchronizing changes between these layers.

Synchronization is a rather complicated task with a broad range of subtasks, including mapping, change calculation, security, and optimization.

An ORM (Object-Relational Mapper) is the tool that takes over all these tasks. It also provides a range of useful extras, such as events, the QueryBuilder, and DQL for formulating queries quickly.


The Architecture of Doctrine 2

At the heart of Doctrine are patterns and abstractions, and understanding them helps in grasping how this ORM works. Let’s start with the most important one, Data Mapper, since Doctrine as a whole is an implementation of this pattern.

Data Mapper Pattern

As I have already mentioned, the object and relational models are not identical, which makes exchanging data between them a problem. A developer who does not use an ORM has no option other than writing uniform, repetitive SQL queries for every component of the app. That is inconvenient, hard to maintain, and therefore expensive.

Data Mapper solves this problem by isolating the objects and the DB from each other, and it uses the Entity as its basic conceptual abstraction.

In fact, an Entity is an ordinary PHP class whose properties correspond to the columns of the database table for which the Entity was created. The mapper does all the work of mapping fields, calculating changes, and so on.
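
To make the idea concrete, here is a minimal sketch in plain PHP, without Doctrine itself. The User entity and the UserMapper class are hypothetical names invented for this illustration, and a table row is represented by a simple array. It is only meant to show the division of labor: the entity knows nothing about storage, and the mapper moves data in both directions.

```php
<?php
declare(strict_types=1);

// A plain entity: no base class, no SQL, no knowledge of the database.
final class User
{
    private ?int $id = null;
    private string $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function getId(): ?int
    {
        return $this->id;
    }

    public function getName(): string
    {
        return $this->name;
    }
}

// Hypothetical mapper (not a Doctrine class): converts rows to entities and back.
final class UserMapper
{
    /** Build an entity from a row, bypassing the constructor. */
    public function hydrate(array $row): User
    {
        $user = (new ReflectionClass(User::class))->newInstanceWithoutConstructor();

        // Closure::call() runs the closure with $this bound to $user,
        // which gives it access to the private properties.
        $assign = function (array $row): void {
            $this->id = (int) $row['id'];
            $this->name = (string) $row['name'];
        };
        $assign->call($user, $row);

        return $user;
    }

    /** Turn an entity back into column => value pairs. */
    public function extract(User $user): array
    {
        return ['id' => $user->getId(), 'name' => $user->getName()];
    }
}

$mapper = new UserMapper();
$user = $mapper->hydrate(['id' => '7', 'name' => 'Alice']); // drivers often return strings
$row = $mapper->extract($user);
```

Doctrine does the same job generically, driven by mapping metadata rather than a hand-written mapper per class.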


EntityManager

EntityManager (EM) is a classic implementation of the Facade pattern. Through its API we work with a range of subsystems: the Unit of Work, the query language, and the Repository API. Entities are managed solely through the EntityManager API.

Entity States

An entity is always in one of four states: New, Managed, Removed, or Detached.

An entity instance you have just created is in the New state. When you invoke ‘persist($entity)’, the EM processes it (event dispatch, identifier generation) and changes its state to Managed. From then on, the EM tracks all changes to the instance and stores them in the DB when flush is called. The Managed state is also assigned to entities that were loaded from the database.

The Detached state is used less often, but it can be useful when you need to work with the same entity in the context of different EMs. An entity in the Detached state is not tracked by the EM. Note that the entity is not removed: changes made to the object after detach($entity) is invoked are simply not reflected in the database, even after a flush call. Remember as well that Doctrine can cascade Detach to all related entities (associations).

And finally, the Removed state. Moving an entity into this state only makes sense when it is already in the Managed state. Keep in mind that after the state changes to Removed, the EM still tracks the entity, and when you invoke flush the entity is deleted from the database. As with Detach, removal can be cascaded to associations.
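
The transitions described above can be sketched as a small state machine. This is an illustration only: StateTracker is a hypothetical class, the string constants are my own, and the rules for edge cases (for example, rejecting persist() on a detached object) are simplified assumptions rather than a copy of Doctrine’s internals.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the state bookkeeping the EM does internally.
final class StateTracker
{
    public const STATE_NEW = 'new';
    public const STATE_MANAGED = 'managed';
    public const STATE_REMOVED = 'removed';
    public const STATE_DETACHED = 'detached';

    private SplObjectStorage $states;

    public function __construct()
    {
        $this->states = new SplObjectStorage();
    }

    public function stateOf(object $entity): string
    {
        // Anything the tracker has never seen counts as NEW in this sketch.
        return isset($this->states[$entity]) ? $this->states[$entity] : self::STATE_NEW;
    }

    public function persist(object $entity): void
    {
        if ($this->stateOf($entity) === self::STATE_DETACHED) {
            throw new InvalidArgumentException('Cannot persist a detached entity.');
        }
        // NEW and REMOVED become MANAGED; MANAGED stays MANAGED.
        $this->states[$entity] = self::STATE_MANAGED;
    }

    public function remove(object $entity): void
    {
        $state = $this->stateOf($entity);
        if ($state === self::STATE_DETACHED) {
            throw new InvalidArgumentException('Cannot remove a detached entity.');
        }
        // Only MANAGED entities are scheduled for deletion; NEW ones are ignored.
        if ($state === self::STATE_MANAGED) {
            $this->states[$entity] = self::STATE_REMOVED;
        }
    }

    public function detach(object $entity): void
    {
        $tracked = [self::STATE_MANAGED, self::STATE_REMOVED];
        if (in_array($this->stateOf($entity), $tracked, true)) {
            $this->states[$entity] = self::STATE_DETACHED;
        }
    }
}

$tracker = new StateTracker();
$entity = new stdClass();

$history = [$tracker->stateOf($entity)];
$tracker->persist($entity);
$history[] = $tracker->stateOf($entity);
$tracker->remove($entity);
$history[] = $tracker->stateOf($entity);
$tracker->persist($entity); // persisting a removed entity makes it managed again
$history[] = $tracker->stateOf($entity);
$tracker->detach($entity);
$history[] = $tracker->stateOf($entity);
```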

Identity Map Pattern

Imagine a situation in which, for some reason, you need to request the same instance from the database twice. Should you query the DB again? In most cases, that would be wasteful.

It is more logical to reuse the result of the first query, and that is exactly the job of the Identity Map pattern.

Let’s consider the example from the official documentation:

public function testIdentityMap()
{
    $objectA = $this->entityManager->find('EntityName', 1);
    $objectB = $this->entityManager->find('EntityName', 1);
 
    $this->assertSame($objectA, $objectB);
}

The second ‘find’ call does not lead to another DB query. Instead, the ORM finds the object with ID = 1 in the identity map and returns it.

It’s a different story when the query uses criteria:

public function testIdentityMapRepositoryFindBy()
{
    $repository = $this->entityManager->getRepository('Person');
    $objectA = $repository->findOneBy(array('name' => 'Benjamin'));
    $objectB = $repository->findOneBy(array('name' => 'Benjamin'));
 
    $this->assertSame($objectA, $objectB);
}

In this case, there will be two database calls, because Doctrine indexes the identity map only by entity ID.

Some optimization is still applied in this case: after the second DB call, no new instance is created. Instead, the instance already stored in the identity map is reused.
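
Both behaviors can be reproduced with a small sketch in plain PHP. PersonRepository here is a hypothetical class backed by an in-memory array instead of a real table, and the public $queries counter exists only so we can see when the ‘database’ is hit; none of this is Doctrine’s actual code.

```php
<?php
declare(strict_types=1);

final class Person
{
    public int $id;
    public string $name;

    public function __construct(int $id, string $name)
    {
        $this->id = $id;
        $this->name = $name;
    }
}

// Hypothetical repository with an identity map keyed by ID only.
final class PersonRepository
{
    /** @var array<int, array{id: int, name: string}> fake table, keyed by ID */
    private array $rows;
    /** @var array<int, Person> */
    private array $identityMap = [];
    public int $queries = 0;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function find(int $id): ?Person
    {
        // Lookup by ID can be answered from the map without a query.
        if (isset($this->identityMap[$id])) {
            return $this->identityMap[$id];
        }
        $this->queries++;
        return isset($this->rows[$id]) ? $this->hydrate($this->rows[$id]) : null;
    }

    public function findOneByName(string $name): ?Person
    {
        // The map is not indexed by name, so a query is always needed.
        $this->queries++;
        foreach ($this->rows as $row) {
            if ($row['name'] === $name) {
                return $this->hydrate($row);
            }
        }
        return null;
    }

    private function hydrate(array $row): Person
    {
        // Reuse the known instance for this ID instead of creating a second one.
        if (!isset($this->identityMap[$row['id']])) {
            $this->identityMap[$row['id']] = new Person($row['id'], $row['name']);
        }
        return $this->identityMap[$row['id']];
    }
}

$repository = new PersonRepository([1 => ['id' => 1, 'name' => 'Benjamin']]);

$a = $repository->find(1);
$b = $repository->find(1);                      // served from the map
$queriesAfterFind = $repository->queries;

$c = $repository->findOneByName('Benjamin');
$d = $repository->findOneByName('Benjamin');    // queries again, same instance
$queriesAfterFindBy = $repository->queries;
```

After the two find() calls the counter stands at one; the two criteria lookups add two more, yet all four variables point to the same object.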

Lazy Loading Pattern

When we need information about an entity that has associations, Doctrine lets us choose whether the instances for them should be created right away or not.

When is this useful? Imagine you are dealing with a forum-type system where users can leave comments on topics. You need to fetch a topic entity from the DB, and since comments are related entities, you might assume that Doctrine will fetch that data as well. That would mean unnecessary data transfer, extra queries, and more memory for the app to hold the comment instances.

Doctrine solves this problem with the Lazy Loading mechanism. It is enabled for all associations by default, and in general there are three fetch options:

  • LAZY (the default): only the managed instance is loaded into memory, and its associations are loaded on first access;
  • EAGER: both the managed instance and its associations are loaded into memory;
  • EXTRA_LAZY: in some cases, loading all related instances is not reasonable even on first access. Say you only need the number of related instances, Collection#count(). To optimize this, you can apply the EXTRA_LAZY option. Then invoking any of the methods below does not load the entire collection into memory:
      • Collection#contains($entity)
      • Collection#containsKey($key) (available from Doctrine 2.5)
      • Collection#count()
      • Collection#get($key) (available from Doctrine 2.4)
      • Collection#slice($offset, $length = null)
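
The difference between LAZY and EXTRA_LAZY for count() can be sketched as follows. LazyCommentCollection is a hypothetical class written for this article, the array of rows stands in for a comments table, and the $fullLoads counter is there only to make the loading visible. In real Doctrine the extra-lazy count would be answered by a COUNT query; here it is simulated.

```php
<?php
declare(strict_types=1);

// Hypothetical lazy collection; not Doctrine's PersistentCollection.
final class LazyCommentCollection implements Countable, IteratorAggregate
{
    /** @var string[] stands in for the rows of a comments table */
    private array $rows;
    /** @var string[]|null null until the collection has been fully loaded */
    private ?array $loaded = null;
    private bool $extraLazy;
    public int $fullLoads = 0;

    public function __construct(array $rows, bool $extraLazy)
    {
        $this->rows = $rows;
        $this->extraLazy = $extraLazy;
    }

    public function count(): int
    {
        if ($this->loaded === null && $this->extraLazy) {
            // Simulates SELECT COUNT(*): no comment objects are created.
            return count($this->rows);
        }
        return count($this->load());
    }

    public function getIterator(): ArrayIterator
    {
        // Iterating always needs the real elements.
        return new ArrayIterator($this->load());
    }

    private function load(): array
    {
        if ($this->loaded === null) {
            $this->fullLoads++;
            $this->loaded = $this->rows;
        }
        return $this->loaded;
    }
}

$rows = ['First!', 'Nice topic', 'Thanks'];

$lazy = new LazyCommentCollection($rows, false);
$lazyCount = count($lazy);             // first access loads everything

$extraLazy = new LazyCommentCollection($rows, true);
$extraLazyCount = count($extraLazy);   // answered without a full load
$loadsAfterCount = $extraLazy->fullLoads;

foreach ($extraLazy as $comment) {
    // Iteration finally triggers the full load.
}
```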

Proxy Pattern

To implement the lazy mechanism and to deal with the partial object problem, Doctrine in fact operates on proxy instances at its lower levels.

A proxy instance is an instance that is used in place of the real one. By ‘in place of’ I mean that we can obtain from the EM not the original entity instance but its proxy variant, and use it just like the original. How do we benefit?

Let’s turn to an example from the documentation. Suppose we know the identifier of an $item and would like to add it to a collection, preferably without loading the item from the DB, as an optimization. That is easy to do:

$item = $em->getReference('MyProject\Model\Item', $itemId);
$cart->addItem($item);

But as soon as we invoke a method on $item that needs its data, its state is fully initialized from the DB. In this example, $item is an instance of a proxy class that was generated for the Item entity. Note that you don’t need any additional logic to distinguish the proxy from the real instance; Doctrine handles it transparently for us.
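
Roughly speaking, a generated proxy behaves like the hand-written one below. This is a simplified sketch under my own assumptions: ItemProxy, the $loader callback, and the $queries counter are hypothetical, and real Doctrine proxies are generated code with considerably more machinery.

```php
<?php
declare(strict_types=1);

class Item
{
    protected int $id;
    protected string $title = '';

    public function __construct(int $id, string $title)
    {
        $this->id = $id;
        $this->title = $title;
    }

    public function getId(): int
    {
        return $this->id;
    }

    public function getTitle(): string
    {
        return $this->title;
    }
}

// Hypothetical hand-written proxy: knows only the ID until data is needed.
final class ItemProxy extends Item
{
    /** @var callable(int): array fetches the row for the given ID */
    private $loader;
    private bool $initialized = false;

    public function __construct(int $id, callable $loader)
    {
        // The parent constructor is deliberately skipped: there is no data yet.
        $this->id = $id;
        $this->loader = $loader;
    }

    public function getTitle(): string
    {
        $this->initialize();
        return parent::getTitle();
    }

    private function initialize(): void
    {
        if ($this->initialized) {
            return;
        }
        $row = ($this->loader)($this->id);
        $this->title = $row['title'];
        $this->initialized = true;
    }
}

$queries = 0;
$loader = function (int $id) use (&$queries): array {
    $queries++; // stands in for a SELECT by primary key
    return ['title' => 'Keyboard'];
};

$item = new ItemProxy(42, $loader);
$cart = [$item];                      // usable wherever an Item is expected
$queriesBeforeAccess = $queries;

$title = $item->getTitle();           // first data access triggers the load
$item->getTitle();                    // already initialized, no second load
```

Because ItemProxy extends Item, type hints and instanceof checks keep working, which is what makes the substitution transparent to the calling code.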

The Partial Object Problem

A partial object is an object whose state has not been fully initialized.

Out of the box this situation cannot arise, since Doctrine initializes an entity fully (apart from its associations). But as an optimization you may want to fetch only some of the entity’s fields. In DQL this is possible with the ‘partial’ keyword:

<?php $q = $em->createQuery("select partial u.{id,name} from MyApp\Domain\User u");

Or directly via the EM:

$reference = $em->getPartialReference('MyApp\Domain\User', 1);

Although the documentation recommends against using partial objects, I see no problem with them in cases where you just need to send specific fields to a client via a REST API and you are sure the instance will not be used anywhere else.

Transactional Write-behind

Imagine that every change to an entity property triggered a call to the DB. In most cases, that is undesirable.

The Transactional Write-behind approach solves this problem at the cost of a delay between the moment the data changes and the moment it is updated in the DB. We can make numerous modifications, add or remove entities, but the changes are written to the DB only after the flush call.

Doctrine also takes care to perform these writes in an optimal way. For example, you often need to insert, update, or remove rows in bulk, which is known as batch processing. Thanks to the Transactional Write-behind mechanism, Doctrine handles this task very efficiently.
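
The principle can be shown with a minimal sketch. WriteBehindQueue and FakeConnection are hypothetical classes: the ‘connection’ only records statements so that we can see when they would be sent, and the SQL strings are built naively for readability (real code would use prepared statements).

```php
<?php
declare(strict_types=1);

// Stand-in for a DB connection: records statements instead of executing them.
final class FakeConnection
{
    /** @var string[] */
    public array $log = [];

    public function execute(string $sql): void
    {
        $this->log[] = $sql;
    }
}

// Hypothetical write-behind queue: collects work in memory until flush().
final class WriteBehindQueue
{
    private FakeConnection $connection;
    /** @var string[] */
    private array $pending = [];

    public function __construct(FakeConnection $connection)
    {
        $this->connection = $connection;
    }

    public function scheduleInsert(string $table, string $text): void
    {
        // Nothing is sent to the database at this point.
        $this->pending[] = sprintf("INSERT INTO %s (text) VALUES ('%s')", $table, $text);
    }

    public function flush(): void
    {
        if ($this->pending === []) {
            return;
        }
        // All queued writes go out together, wrapped in one transaction.
        $this->connection->execute('BEGIN');
        foreach ($this->pending as $sql) {
            $this->connection->execute($sql);
        }
        $this->connection->execute('COMMIT');
        $this->pending = [];
    }
}

$connection = new FakeConnection();
$queue = new WriteBehindQueue($connection);

foreach (['First!', 'Nice topic', 'Thanks'] as $text) {
    $queue->scheduleInsert('comment', $text);
}
$statementsBeforeFlush = count($connection->log);

$queue->flush();
$statementsAfterFlush = count($connection->log);
```

For batch jobs the same idea applies in a loop: queue a fixed number of changes, flush, and repeat, so that neither memory use nor the number of round trips grows out of control.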

Unit of Work Pattern

When talking about the Unit of Work, we need to mention the business transaction. In PHP terms, it is the time from the start of script execution to its completion. The task of the Unit of Work is to track all the operations within one business transaction that may change the DB. When the business transaction is complete, the Unit of Work works out all the changes and writes them to the DB, optimizing the process along the way.
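
One common way to detect such changes is to keep a snapshot of each entity and compare it with the current state at commit time. The sketch below illustrates that idea only. TinyUnitOfWork and Topic are hypothetical, the entity uses public properties so that get_object_vars() can read them from outside, and the changes are returned instead of being written to a database.

```php
<?php
declare(strict_types=1);

final class Topic
{
    public string $title = '';
    public int $views = 0;
}

// Hypothetical, heavily simplified unit of work based on snapshots.
final class TinyUnitOfWork
{
    /** @var array<int, object> managed entities, keyed by object ID */
    private array $managed = [];
    /** @var array<int, array<string, mixed>> property values at registration */
    private array $snapshots = [];

    public function register(object $entity): void
    {
        $oid = spl_object_id($entity);
        $this->managed[$oid] = $entity;
        $this->snapshots[$oid] = get_object_vars($entity);
    }

    /** Returns field => [old, new] for every entity that actually changed. */
    public function commit(): array
    {
        $changeSets = [];
        foreach ($this->managed as $oid => $entity) {
            $current = get_object_vars($entity);
            $diff = [];
            foreach ($current as $field => $value) {
                $old = $this->snapshots[$oid][$field] ?? null;
                if ($old !== $value) {
                    $diff[$field] = [$old, $value];
                }
            }
            if ($diff !== []) {
                $changeSets[] = $diff;
            }
            // The committed state becomes the new baseline.
            $this->snapshots[$oid] = $current;
        }
        return $changeSets;
    }
}

$topic = new Topic();
$topic->title = 'Hello';

$unitOfWork = new TinyUnitOfWork();
$unitOfWork->register($topic);

$topic->views = 1;
$topic->views = 2;        // intermediate values never reach the database
$topic->title = 'Hello';  // same value as before: not a change

$firstCommit = $unitOfWork->commit();
$secondCommit = $unitOfWork->commit(); // nothing changed since the first commit
```

However many times a property is modified in between, only the net difference between the snapshot and the final state is written, which is where the optimization comes from.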

Let’s Sum Up

Doctrine is a complex library, and this is only a small part of what can be said about it. I hope this brief review of the basic concepts will jump-start your own thorough exploration of the library, or at least help you systematize what you already know.

Thank you for reading. I look forward to your feedback and suggestions. Best regards!

