Category Archives: NHibernate

NHibernate Linq query evaluation process

I like the Linq provider of NHibernate, but I have encountered weird behavior and had to dive into how does NHibernate creates the SQL queries from the linq queries and creation of Linq provider extensions. If you are only interested how to program NHibernate Linq function (e.g. how to check if text column corresponds to some regular expression in database), this is not the post to read – read Michaels post instead. This one is about how does NHibernate turns a Linq query into a result of the query.

I will be mostly dealing with a select clause of Linq, but other are quite similar.

Problem

I had a query that checked for entity nullness before creating a derived object using some properties of null-checked entity:

from post in session.Query<Post>()
select new PostModel
{
  Body = post.Body,
  Blog = post.Blog != null ? new EntityReference
    {
      Id = post.Blog.Id,
      Name = post.Blog.Name
    } : null
}

However, the generated SQL had more fields than necessary, in fact, it had all fields of a Blog entity:

-- Cursive are the properties of Blog that are not needed
select
    post0_.Body as col_0_0_,
    blog1_.Id as col_1_0_,
    post0_.Blog as col_2_0_,
    blog1_.Name as col_3_0_,
    blog1_.Id as Id0_,
    blog1_.Name as Name0_,
    blog1_.Subtitle as Subtitle0_,
    blog1_.Created as Created0_ 
from
    Post post0_ 
left outer join
    Blog blog1_ 
        on post0_.Blog=blog1_.Id

It doesn’t look like a big problem, but that is because this is a demonstration. In reality I have been creating several references and for each entity the query required all its properties. The result was a query with 100+ columns (4 entities * 20-30 columns per entity) and quite slow execution.

I had no idea why that happened, because similar query without a null check worked fine and NHibernate generated SQL without unnecessary fields:

from post in session.Query<Post>()
select new PostModel
{
    Body = post.Body,
    Blog = new EntityReference
        {
            Id = post.Blog.Id,
            Name = post.Blog.Name
        }
}
-- 
select
    post0_.Body as col_0_0_,
    post0_.Blog as col_1_0_,
    blog1_.Name as col_2_0_ 
from
    Post post0_ 
left outer join
    Blog blog1_ 
        on post0_.Blog=blog1_.Id

Explanation

I had to dive into NHibernate source code, because Google turned up nothing and all books on NHibernate are either old (pre 3.0 =without Linq) or only go through some simple Linq queries. Official NHibernate documentation even mention how to create Linq extensions, much less how does the process works. Yay me.

Well, here is a short overview of how does NH turns a Linq query to a result:

1. Rewriting query to be more HQL friendly

We have a Linq expression (=Linq query) object supplied by user. It’s representations is identical to the representation in the code, which can be hard to process, so NHibernate will take the query and rewrites it a little to be more easily processed. Here is an example of how a query looks before and after rewriting:

/* Source query */
from Post post in value(NHibernate.Linq.NhQueryable[Nerula.Data.Post])
select new PostModel() {
  Body = [post].Body,
  Blog =
    IIF(([post].Blog != null),
    new EntityReference() 
    {
      Id = [post].Blog.Id,   
      Name = [post].Blog.Name    
    },   
    null)
}

/* Rewritten query */
from Post post in value(NHibernate.Linq.NhQueryable[Nerula.Data.Post]) 
from Blog _0 in [post].Blog 
select new PostModel() 
{
  Body = [post].Body,
  Blog = 
    IIF(([_0] != null),
      new EntityReference() 
      {
        Id = [post].Blog.Id, 
        Name = [_0].Name 
      }, 
      null)
}

2. Find sub-expressions representable in HQL

The Linq query is represented as a tree of subexpression.

For example: new PostModel() {Body = [100001].Body, Blog = IIF(([100001] != null), new EntityReference() {Id = [100001].Blog.Id, Name = [100001].Name}, null)} is a Linq expression of type ExpressionType.MemberInit. It consists from several subexpressions:

new PostModel() is an expression of type ExpressionType.New
[100001].Body is an expression of type ExpressionType.MemberAccess
{IIF(([100001] != null), new EntityReference() {Id = [100001].Blog.Id, Name = [100001].Name}, null)} is an expression of type ExpressionType.Conditional. This expression has its own subexpressions.

It is kind of obvious that expressions form a tree.

NHibernate will recursively visit all linq sub-expressions and gets a list of all expression that can be evaluated in database using HQL select statement – e.g. registered methods, entities and its properties, but not unregistered function calls or constants (e.g. null, 1, “Hello World”, post.Body.Contains(“Hello”)).

Here is a screenshot of all sub-expressions from the query above. The NHibernate extracts following expressions (no method here, but it could be):

BTW here is the magic that allows user to add their own Linq provider extension happens.

3. Replace database subexpressions

Now we have a list of sub-expressions that can be queried directly in the database. We will take our Linq query and rewrite it a little more. NHibernate will go recursively through the Linq expression and rewrites HQL representable expressions with an item from an array. The reason is simple, once it performs the DB query, it will have the result in array and once it performs the Linq query, the result of original subexpression (e.g. post.Body.Contains(“Subexpression”)) will be already done by database and the result will be an item in an array (e.g. true or false). That is the whole point – doing stuff in database.

The select clause will now have stuff queriable in db replaced with an item in an array:

new PostModel() {
  Body = Convert(input[0]), 
  Blog = IIF((Convert(input[1]) != null), 
         new EntityReference() 
         {
           Id = Convert(input[2]), 
           Name = Convert(input[3])
         }, 
         null)
}

4. Perform database query

We have a list of sub-expressions we are interested from step 2, NHibernate will do its magic, query the database and gets result in an input array.

5. Evaluate Linq query

We have a rewritten Linq query from step 3 and we have data used in it from step 4. We can actually evaluate the query! NHibernate does it and returns result.

Back to the problem

Well, that is about it. Where is the problem? Why did I get all these extra columns in my query?

Easy, NHibernate doesn’t recognize operator == in a select clause and therefore it has to load whole entity Blog into a memory (=that is why it loads all properties) where it is compared with null.

I though about why does NHibernate behaves like this, but it seems reasonable after a while. What if the comparison was done between entity and some in-memory instance of entity? What about overloaded operator ==? Database would have no idea what to do, so we can’t have blanket == from NHibernate Linq provider.

Important: This is valid for the Linq select clasue, the where clause doesn’t work like that. The where clause works as you expect, unlike select.

If you use post.Blog != null in the Linq where clause, the NHibernate will correctly translate it to SQL where post0_.Blog is not null.
If you use post.Blog == memoryBlog you get where post0_.Blog=@p0.

Solution

Nope, we are not lost. We can create a custom Linq provider for checking null. The NHibernate will recognize our method EntityState.Exists as a something that can be performed in the database and replace load of a whole entity with an extra column in query:

public static class EntityState {
    public static bool Exists(EntityBase entity) {
        return !ReferenceEquals(entity, null);
    }
}

public class EntityStateGenerator : BaseHqlGeneratorForMethod
{
    public EntityStateGenerator()
    {
        SupportedMethods = new[] { 
            ReflectionHelper.GetMethod(() => EntityState.Exists(null)) 
        };
    }

    public override HqlTreeNode BuildHql(MethodInfo method, Expression targetObject,
        ReadOnlyCollection<Expression> arguments, HqlTreeBuilder treeBuilder, 
        IHqlExpressionVisitor visitor)
    {
        return treeBuilder.IsNotNull(visitor.Visit(arguments[0]).AsExpression());
    }
}

public class MyLinqToHqlGeneratorsRegistry : DefaultLinqToHqlGeneratorsRegistry
{
    public NerulaLinqToHqlGeneratorsRegistry()
    {
        var generator = new EntityStateGenerator();
        foreach (var method in generator.SupportedMethods)
        {
            RegisterGenerator(method, generator);
        }
    }
}

We also have to register the generator (for XML, the property name is linqtohql.generatorsregistry) or you can simple add it to the configuration

configuration.SetProperty(NHibernate.Cfg.Environment.LinqToHqlGeneratorsRegistry, 
  typeof(MyLinqToHqlGeneratorsRegistry).AssemblyQualifiedName);

Now, the query

from post in session.Query<Post>()
select new PostModel
{
    Body = post.Body,
    Blog = EntityState.Exists(post.Blog)
        ? new EntityReference
        {
            Id = post.Blog.Id,
            Name = post.Blog.Name
        }
        : null
}

Will result in the following SQL, because NHibernate will replace subexpression EntityState.Exists(post.Blog) with the result of SQL case statement per step 3,4 and 5:

select
    post0_.Body as col_0_0_,
    case 
        when blog1_.Id is not null then 1 
        else 0 
    end as col_1_0_,
    post0_.Blog as col_2_0_,
    blog1_.Name as col_3_0_ 
from
    Post post0_ 
left outer join
    Blog blog1_ 
        on post0_.Blog=blog1_.Id

The reason for weird query was quite simple, but pretty understandable. One extra column is OK with me. I just wish it was documented.

Using NHibernate readonly property accessor

Recently I needed to create a string combined from several columns in the database into a single field. More precisely, I had an User entity with a FirstName and Surname properties and I needed the FullName property for filling the data into a ViewModel. The entity itself has many properties and its hydration is quite slow.

public class User : EntityBase
{
    public virtual string FirstName {get;set;}
    public virtual string Surname {get;set;}
    public virtual string FullName
    {
        get { return string.Format("{0} {1}", FirstName, Surname); }
    }
    // many other properties
}

Querying a view model

I wanted to create a view model in single linq query. The query will get from database only the fields that are necessary, not whole entities (e.g. only Title and).

var viewModelQuery = 
    from post in session.Query<Post>()
    where post.Id == postId
    select new PostViewModel
    {
        Title = post.Title,
        Text = post.Body,
        CreatedBy = string.Format("{0} {1}", 
            post.CreatedBy.FirstName, post.CreatedBy.Surname)
    };

The corresponsing query looks like this – notice it is a single query that gets only what is needed, no unnecessary properties of the User or the Post are fetched. This is the reason why I really like the NHibernate Linq provider as opposed to Criteria API or QueryOver.

select
    post0_.Title as col_0_0_,
    post0_.Body as col_1_0_,
    user1_.first_name as col_2_0_,
    user1_.surname as col_3_0_ 
from
    Post post0_ 
left outer join
    User user1_ 
        on post0_.CreatedBy=user1_.Id 
where
    post0_.Id=@p0;

However, there is a flaw: The string.Format creation of the CreatedBy. I would like to use FullName property, but that is not possible, because it is not mapped and using CreatedBy = post.CreatedBy.FullName would throw a mapping exception.

Querying full name

We can get around that using a formula in the mapping that has same result as the property in the entity class:

<class name="User">
    <id name="Id">
      <generator class="increment" />
    </id>
    <property name="FirstName" column="first_name" />
    <property name="Surname" column="surname" />
    <property name="FullName" access="readonly" formula="(first_name || ' ' || surname)" />
</class>

Notice use of readonly access. It was implemented in the NH-1621. It is an accessor used for querying in the database and thanks to that we can use it directly in the query, the prerequisites are existing property with getter (no need for setter) and semantically same as the column in the database. In our case, the property in not mapped to a column, but to a formula, but result is same: We can use it directly in the query:

var viewModelQuery = 
    from post in session.Query<Post>()
    where post.Id == postId
    select new PostViewModel
    {
        Title = post.Title,
        Text = post.Body,
        CreatedBy = post.CreatedBy.FullName
    };
var viewModel = viewModelQuery.Single();
Assert.AreEqual("Post title", viewModel.Title);
Assert.AreEqual("Text of the post", viewModel.Text);
Assert.AreEqual("John Smith", viewModel.CreatedBy)

The SQL query uses formula instead of two separate columns and as before it does it without getting unnecessary properties:

select
    post0_.Title as col_0_0_,
    post0_.Body as col_1_0_,
    (user1_.first_name || ' ' || user1_.surname) as col_2_0_ 
from
    Post post0_ 
left outer join
    User user1_ 
        on post0_.CreatedBy=user1_.Id 
where
    post0_.Id=@p0

Now we can safely add a middle name or reverse order of full name simply by modifying a formula in the mapping file and the getter property and the change will appear everywhere as opposed to modifying the string.Format method.

Synchronize tag in sql-query of NHibernate

I have been trying to write rather complex query using Linq To NHibernate, I gave up and wrote it in the SQL instead. While writing the query in <sql-query> tag, the Visual Studio had offered me a <synchronize> tag. I had no idea what the tag was for.

The tag itself has one required attribute table, so it is used like <synchronize table="SOME_TABLE" />.

The NHibernate is infamous for its quality of official documentation, so I obviously had no luck there. Not even Hibernate was much more forthcoming. Open source to the rescue: In the Beings.hbm.xml of one NH test was following comment:

(2) use of to ensure that auto-flush happens
correctly, and that queries against the derived entity
do not return stale data

The test itself was for a read-only entity created from select (interesting feature I didn’t know about), not a SQL query, but I hoped it would be similar.

I have opened my project for testing the NHibernate (Nerula) and started writing code and observing how does the tag behave.

Observations:

The purpose of the <synchronize> tag is to flush entities that use the specified table before running the query, but it doesn’t flush entities unrelated to the table. The secondary objective is for notifying the second level cache about update/delete sql queries, but that is for another post.

To demonstrate the concept, I have created three identical sql-queries, each with different synchronize tags:

 <sql-query name="DeleteProjectSync">
  <![CDATA[ delete from Project where Code = :Code ]]>
  <query-param name="Code" type="String" />
  <synchronize table="Project"/>
</sql-query>

<sql-query name="DeleteProject">
  <![CDATA[ delete from Project where Code = :Code ]]>
  <query-param name="Code" type="String" />
</sql-query>

<sql-query name="DeleteProjectSyncWrongTable">
  <![CDATA[ delete from Project where Code = :Code ]]>
  <query-param name="Code" type="String" />
  <!-- We are flushing the wrong table -->
  <synchronize table="Blog"/>
</sql-query>

I have loaded and modified a Code property of a Project entity and my queries will try to delete the modified project row from the database.

var project = session.Query().First();
var originalCode = project.Code;
project.Code = "Test";

So what happend when I tried to delete a Project with Code “Test”? Let’s go over each query.

var deletedCount = session.GetNamedQuery("DeleteProject").SetString("Code", "Test").ExecuteUpdate();
Assert.AreEqual(0, deletedCount);

In the query without any synchronization, the project is not deleted (ExecuteUpdate returns number of modified/deleted rows), because the project is modified only in the session, it is not synced with the database and thus the SQL query can’t find a project with specified code. If I manually flush the session (session.Flush()) before running the query, the record is deleted.

var deletedCount = session.GetNamedQuery("DeleteProjectSync").SetString("Code", "Test").ExecuteUpdate();
Assert.AreEqual(1, deletedCount);

In second query, I use tag for the table from which we are deleting. All entities using the table Project are flushed to the database and we actually delete one record with our specified Code.

var deletedCount = session.GetNamedQuery("DeleteProjectSyncWrongTable").SetString("Code", "Test").ExecuteUpdate();
Assert.AreEqual(0, deletedCount);

Just to make sure, we will try with a third query that has a synchronize tag, but for wrong table. This is to demonstrate that we don’t flush all entities in session, in this case we also delete zero rows, because the updated code is still not in db. If I use session.Flush(), the row is deleted.

I hope this sufficiently demonstrates the concept.

This is all nice and well, but what about 2nd level cache? When and how is it updated? I am quite sure (courtesy of BulkOperationCleanupAction) that entities affected by tables in sycnhronize tag are also evicted from 2nd level cache, but I haven’t tested it yet.

Update:

The <synchronize> tag respects the FlushMode of the session. If you set FlushMode to Always, all queries will perform delete, if you set FlushMode to Commit, all will fail, including the DeleteProjectSync.

Jan Havlíček

Abandoned and dying code