NHibernate Linq query evaluation process

I like the Linq provider of NHibernate, but I have encountered weird behavior and had to dive into how does NHibernate creates the SQL queries from the linq queries and creation of Linq provider extensions. If you are only interested how to program NHibernate Linq function  (e.g. how to check if text column corresponds to some regular expression in database), this is not the post to read – read Michaels post instead. This one is about how does NHibernate turns a Linq query into a result of the query.

I will be mostly dealing with a select clause of Linq, but other are quite similar.

Problem

I had a query that checked for entity nullness before creating a derived object using some properties of null-checked entity:

from post in session.Query<Post>()
select new PostModel
{
  Body = post.Body,
  Blog = post.Blog != null ? new EntityReference
    {
      Id = post.Blog.Id,
      Name = post.Blog.Name
    } : null
}

However, the generated SQL had more fields than necessary, in fact, it had all fields of a Blog entity:

-- Cursive are the properties of Blog that are not needed
select
    post0_.Body as col_0_0_,
    blog1_.Id as col_1_0_,
    post0_.Blog as col_2_0_,
    blog1_.Name as col_3_0_,
    blog1_.Id as Id0_,
    blog1_.Name as Name0_,
    blog1_.Subtitle as Subtitle0_,
    blog1_.Created as Created0_ 
from
    Post post0_ 
left outer join
    Blog blog1_ 
        on post0_.Blog=blog1_.Id

It doesn’t look like a big problem, but that is because this is a demonstration. In reality I have been creating several references and for each entity the query required all its properties. The result was a query with 100+ columns (4 entities * 20-30 columns per entity) and quite slow execution.

I had no idea why that happened, because similar query without a null check worked fine and NHibernate generated SQL without unnecessary fields:

from post in session.Query<Post>()
select new PostModel
{
    Body = post.Body,
    Blog = new EntityReference
        {
            Id = post.Blog.Id,
            Name = post.Blog.Name
        }
}
-- 
select
    post0_.Body as col_0_0_,
    post0_.Blog as col_1_0_,
    blog1_.Name as col_2_0_ 
from
    Post post0_ 
left outer join
    Blog blog1_ 
        on post0_.Blog=blog1_.Id

Explanation

I had to dive into NHibernate source code, because Google turned up nothing and all books on NHibernate are either old (pre 3.0 =without Linq) or only go through some simple Linq queries. Official NHibernate documentation even mention how to create Linq extensions, much less how does the process works. Yay me.

Well, here is a short overview of how does NH turns a Linq query to a result:

1. Rewriting query to be more HQL friendly

We have a Linq expression (=Linq query) object supplied by user. It’s representations is identical to the representation in the code, which can be hard to process, so NHibernate will take the query and rewrites it a little to be more easily processed. Here is an example of how a query looks before and after rewriting:

/* Source query */
from Post post in value(NHibernate.Linq.NhQueryable[Nerula.Data.Post])
select new PostModel() {
  Body = [post].Body,
  Blog =
    IIF(([post].Blog != null),
    new EntityReference() 
    {
      Id = [post].Blog.Id,   
      Name = [post].Blog.Name    
    },   
    null)
}

/* Rewritten query */
from Post post in value(NHibernate.Linq.NhQueryable[Nerula.Data.Post]) 
from Blog _0 in [post].Blog 
select new PostModel() 
{
  Body = [post].Body,
  Blog = 
    IIF(([_0] != null),
      new EntityReference() 
      {
        Id = [post].Blog.Id, 
        Name = [_0].Name 
      }, 
      null)
}

 2. Find sub-expressions representable in HQL

The Linq query is represented as a tree of subexpression.

For example: new PostModel() {Body = [100001].Body, Blog = IIF(([100001] != null), new EntityReference() {Id = [100001].Blog.Id, Name = [100001].Name}, null)} is a Linq expression of type ExpressionType.MemberInit. It consists from several subexpressions:

  • new PostModel() is an expression of type ExpressionType.New
  • [100001].Body is an expression of type ExpressionType.MemberAccess
  • {IIF(([100001] != null), new EntityReference() {Id = [100001].Blog.Id, Name = [100001].Name}, null)} is an expression of type ExpressionType.Conditional. This expression has its own subexpressions.

It is kind of obvious that expressions form a tree.

NHibernate will recursively visit all linq sub-expressions and gets a list of all expression that can be evaluated in database using HQL select statement – e.g. registered methods, entities and its properties, but not unregistered function calls or constants (e.g. null, 1, “Hello World”, post.Body.Contains(“Hello”)).

Here is a screenshot of all sub-expressions from the query above. The NHibernate extracts following expressions (no method here, but it could be):

NHibernate Linq candidates

BTW here is the magic that allows user to add their own Linq provider extension happens.

3. Replace database subexpressions

Now we have a list of sub-expressions that can be queried directly in the database. We will take our Linq query and rewrite it a little more. NHibernate will go recursively through the Linq expression and rewrites HQL representable expressions with an item from an array. The reason is simple, once it performs the DB query, it will have the result in array and once it performs the Linq query, the result of original subexpression (e.g. post.Body.Contains(“Subexpression”)) will be already done by database and the result will be an item in an array (e.g. true or false). That is the whole point – doing stuff in database.

The select clause will now have stuff queriable in db replaced with an item in an array:

new PostModel() {
  Body = Convert(input[0]), 
  Blog = IIF((Convert(input[1]) != null), 
         new EntityReference() 
         {
           Id = Convert(input[2]), 
           Name = Convert(input[3])
         }, 
         null)
}

4. Perform database query

We have a list of sub-expressions we are interested from step 2, NHibernate will do its magic, query the database and gets result in an input array.

5. Evaluate Linq query

We have a rewritten Linq query from step 3 and we have data used in it from step 4. We can actually evaluate the query! NHibernate does it and returns result.

Back to the problem

Well, that is about it. Where is the problem? Why did I get all these extra columns in my query?

Easy, NHibernate doesn’t recognize operator == in a select clause and therefore it has to load whole entity Blog into a memory (=that is why it loads all properties) where it is compared with null.

I though about why does NHibernate behaves like this, but it seems reasonable after a while. What if the comparison was done between entity and some in-memory instance of entity? What about overloaded operator ==? Database would have no idea what to do, so we can’t have blanket == from NHibernate Linq provider.

Important: This is valid for the Linq select clasue, the where clause doesn’t work like that. The where clause works as you expect, unlike select.

  • If you use post.Blog != null in the Linq where clause, the NHibernate will correctly translate it to SQL  where post0_.Blog is not null.
  • If you use post.Blog == memoryBlog  you get where post0_.Blog=@p0.

Solution

Nope, we are not lost. We can create a custom Linq provider for checking null. The NHibernate will recognize our method EntityState.Exists as a something that can be performed in the database and replace load of a whole entity with an extra column in query:

public static class EntityState {
    public static bool Exists(EntityBase entity) {
        return !ReferenceEquals(entity, null);
    }
}

public class EntityStateGenerator : BaseHqlGeneratorForMethod
{
    public EntityStateGenerator()
    {
        SupportedMethods = new[] { 
            ReflectionHelper.GetMethod(() => EntityState.Exists(null)) 
        };
    }

    public override HqlTreeNode BuildHql(MethodInfo method, Expression targetObject,
        ReadOnlyCollection<Expression> arguments, HqlTreeBuilder treeBuilder, 
        IHqlExpressionVisitor visitor)
    {
        return treeBuilder.IsNotNull(visitor.Visit(arguments[0]).AsExpression());
    }
}

public class MyLinqToHqlGeneratorsRegistry : DefaultLinqToHqlGeneratorsRegistry
{
    public NerulaLinqToHqlGeneratorsRegistry()
    {
        var generator = new EntityStateGenerator();
        foreach (var method in generator.SupportedMethods)
        {
            RegisterGenerator(method, generator);
        }
    }
}

We also have to register the generator (for XML, the property name is linqtohql.generatorsregistry) or you can simple add it to the configuration

configuration.SetProperty(NHibernate.Cfg.Environment.LinqToHqlGeneratorsRegistry, 
  typeof(MyLinqToHqlGeneratorsRegistry).AssemblyQualifiedName);

Now, the query

from post in session.Query<Post>()
select new PostModel
{
    Body = post.Body,
    Blog = EntityState.Exists(post.Blog)
        ? new EntityReference
        {
            Id = post.Blog.Id,
            Name = post.Blog.Name
        }
        : null
}

Will result in the following SQL, because NHibernate will replace subexpression EntityState.Exists(post.Blog) with the result of SQL case statement per step 3,4 and 5:

select
    post0_.Body as col_0_0_,
    case 
        when blog1_.Id is not null then 1 
        else 0 
    end as col_1_0_,
    post0_.Blog as col_2_0_,
    blog1_.Name as col_3_0_ 
from
    Post post0_ 
left outer join
    Blog blog1_ 
        on post0_.Blog=blog1_.Id

The reason for weird query was quite simple, but pretty understandable. One extra column is OK with me. I just wish it was documented.