I like the Linq provider of NHibernate, but I have encountered weird behavior and had to dive into how does NHibernate creates the SQL queries from the linq queries and creation of Linq provider extensions. If you are only interested how to program NHibernate Linq function (e.g. how to check if text column corresponds to some regular expression in database), this is not the post to read – read Michaels post instead. This one is about how does NHibernate turns a Linq query into a result of the query.

I will be mostly dealing with a select clause of Linq, but other are quite similar.

Problem

I had a query that checked for entity nullness before creating a derived object using some properties of null-checked entity:

from post in session.Query<Post>()
select new PostModel
{
  Body = post.Body,
  Blog = post.Blog != null ? new EntityReference
    {
      Id = post.Blog.Id,
      Name = post.Blog.Name
    } : null
}

However, the generated SQL had more fields than necessary, in fact, it had all fields of a Blog entity:

-- Cursive are the properties of Blog that are not needed
select
    post0_.Body as col_0_0_,
    blog1_.Id as col_1_0_,
    post0_.Blog as col_2_0_,
    blog1_.Name as col_3_0_,
    blog1_.Id as Id0_,
    blog1_.Name as Name0_,
    blog1_.Subtitle as Subtitle0_,
    blog1_.Created as Created0_ 
from
    Post post0_ 
left outer join
    Blog blog1_ 
        on post0_.Blog=blog1_.Id

It doesn’t look like a big problem, but that is because this is a demonstration. In reality I have been creating several references and for each entity the query required all its properties. The result was a query with 100+ columns (4 entities * 20-30 columns per entity) and quite slow execution.

I had no idea why that happened, because similar query without a null check worked fine and NHibernate generated SQL without unnecessary fields:

from post in session.Query<Post>()
select new PostModel
{
    Body = post.Body,
    Blog = new EntityReference
        {
            Id = post.Blog.Id,
            Name = post.Blog.Name
        }
}
-- 
select
    post0_.Body as col_0_0_,
    post0_.Blog as col_1_0_,
    blog1_.Name as col_2_0_ 
from
    Post post0_ 
left outer join
    Blog blog1_ 
        on post0_.Blog=blog1_.Id

Explanation

I had to dive into NHibernate source code, because Google turned up nothing and all books on NHibernate are either old (pre 3.0 =without Linq) or only go through some simple Linq queries. Official NHibernate documentation even mention how to create Linq extensions, much less how does the process works. Yay me.

Well, here is a short overview of how does NH turns a Linq query to a result:

1. Rewriting query to be more HQL friendly

We have a Linq expression (=Linq query) object supplied by user. It’s representations is identical to the representation in the code, which can be hard to process, so NHibernate will take the query and rewrites it a little to be more easily processed. Here is an example of how a query looks before and after rewriting:

/* Source query */
from Post post in value(NHibernate.Linq.NhQueryable[Nerula.Data.Post])
select new PostModel() {
  Body = [post].Body,
  Blog =
    IIF(([post].Blog != null),
    new EntityReference() 
    {
      Id = [post].Blog.Id,   
      Name = [post].Blog.Name    
    },   
    null)
}

/* Rewritten query */
from Post post in value(NHibernate.Linq.NhQueryable[Nerula.Data.Post]) 
from Blog _0 in [post].Blog 
select new PostModel() 
{
  Body = [post].Body,
  Blog = 
    IIF(([_0] != null),
      new EntityReference() 
      {
        Id = [post].Blog.Id, 
        Name = [_0].Name 
      }, 
      null)
}

2. Find sub-expressions representable in HQL

The Linq query is represented as a tree of subexpression.

For example: new PostModel() {Body = [100001].Body, Blog = IIF(([100001] != null), new EntityReference() {Id = [100001].Blog.Id, Name = [100001].Name}, null)} is a Linq expression of type ExpressionType.MemberInit. It consists from several subexpressions:

new PostModel() is an expression of type ExpressionType.New
[100001].Body is an expression of type ExpressionType.MemberAccess
{IIF(([100001] != null), new EntityReference() {Id = [100001].Blog.Id, Name = [100001].Name}, null)} is an expression of type ExpressionType.Conditional. This expression has its own subexpressions.

It is kind of obvious that expressions form a tree.

NHibernate will recursively visit all linq sub-expressions and gets a list of all expression that can be evaluated in database using HQL select statement – e.g. registered methods, entities and its properties, but not unregistered function calls or constants (e.g. null, 1, “Hello World”, post.Body.Contains(“Hello”)).

Here is a screenshot of all sub-expressions from the query above. The NHibernate extracts following expressions (no method here, but it could be):

BTW here is the magic that allows user to add their own Linq provider extension happens.

3. Replace database subexpressions

Now we have a list of sub-expressions that can be queried directly in the database. We will take our Linq query and rewrite it a little more. NHibernate will go recursively through the Linq expression and rewrites HQL representable expressions with an item from an array. The reason is simple, once it performs the DB query, it will have the result in array and once it performs the Linq query, the result of original subexpression (e.g. post.Body.Contains(“Subexpression”)) will be already done by database and the result will be an item in an array (e.g. true or false). That is the whole point – doing stuff in database.

The select clause will now have stuff queriable in db replaced with an item in an array:

new PostModel() {
  Body = Convert(input[0]), 
  Blog = IIF((Convert(input[1]) != null), 
         new EntityReference() 
         {
           Id = Convert(input[2]), 
           Name = Convert(input[3])
         }, 
         null)
}

4. Perform database query

We have a list of sub-expressions we are interested from step 2, NHibernate will do its magic, query the database and gets result in an input array.

5. Evaluate Linq query

We have a rewritten Linq query from step 3 and we have data used in it from step 4. We can actually evaluate the query! NHibernate does it and returns result.

Back to the problem

Well, that is about it. Where is the problem? Why did I get all these extra columns in my query?

Easy, NHibernate doesn’t recognize operator == in a select clause and therefore it has to load whole entity Blog into a memory (=that is why it loads all properties) where it is compared with null.

I though about why does NHibernate behaves like this, but it seems reasonable after a while. What if the comparison was done between entity and some in-memory instance of entity? What about overloaded operator ==? Database would have no idea what to do, so we can’t have blanket == from NHibernate Linq provider.

Important: This is valid for the Linq select clasue, the where clause doesn’t work like that. The where clause works as you expect, unlike select.

If you use post.Blog != null in the Linq where clause, the NHibernate will correctly translate it to SQL where post0_.Blog is not null.
If you use post.Blog == memoryBlog you get where post0_.Blog=@p0.

Solution

Nope, we are not lost. We can create a custom Linq provider for checking null. The NHibernate will recognize our method EntityState.Exists as a something that can be performed in the database and replace load of a whole entity with an extra column in query:

public static class EntityState {
    public static bool Exists(EntityBase entity) {
        return !ReferenceEquals(entity, null);
    }
}

public class EntityStateGenerator : BaseHqlGeneratorForMethod
{
    public EntityStateGenerator()
    {
        SupportedMethods = new[] { 
            ReflectionHelper.GetMethod(() => EntityState.Exists(null)) 
        };
    }

    public override HqlTreeNode BuildHql(MethodInfo method, Expression targetObject,
        ReadOnlyCollection<Expression> arguments, HqlTreeBuilder treeBuilder, 
        IHqlExpressionVisitor visitor)
    {
        return treeBuilder.IsNotNull(visitor.Visit(arguments[0]).AsExpression());
    }
}

public class MyLinqToHqlGeneratorsRegistry : DefaultLinqToHqlGeneratorsRegistry
{
    public NerulaLinqToHqlGeneratorsRegistry()
    {
        var generator = new EntityStateGenerator();
        foreach (var method in generator.SupportedMethods)
        {
            RegisterGenerator(method, generator);
        }
    }
}

We also have to register the generator (for XML, the property name is linqtohql.generatorsregistry) or you can simple add it to the configuration

configuration.SetProperty(NHibernate.Cfg.Environment.LinqToHqlGeneratorsRegistry, 
  typeof(MyLinqToHqlGeneratorsRegistry).AssemblyQualifiedName);

Now, the query

from post in session.Query<Post>()
select new PostModel
{
    Body = post.Body,
    Blog = EntityState.Exists(post.Blog)
        ? new EntityReference
        {
            Id = post.Blog.Id,
            Name = post.Blog.Name
        }
        : null
}

Will result in the following SQL, because NHibernate will replace subexpression EntityState.Exists(post.Blog) with the result of SQL case statement per step 3,4 and 5:

select
    post0_.Body as col_0_0_,
    case 
        when blog1_.Id is not null then 1 
        else 0 
    end as col_1_0_,
    post0_.Blog as col_2_0_,
    blog1_.Name as col_3_0_ 
from
    Post post0_ 
left outer join
    Blog blog1_ 
        on post0_.Blog=blog1_.Id

The reason for weird query was quite simple, but pretty understandable. One extra column is OK with me. I just wish it was documented.

Recently I needed to create a string combined from several columns in the database into a single field. More precisely, I had an User entity with a FirstName and Surname properties and I needed the FullName property for filling the data into a ViewModel. The entity itself has many properties and its hydration is quite slow.

public class User : EntityBase
{
    public virtual string FirstName {get;set;}
    public virtual string Surname {get;set;}
    public virtual string FullName
    {
        get { return string.Format("{0} {1}", FirstName, Surname); }
    }
    // many other properties
}

Querying a view model

I wanted to create a view model in single linq query. The query will get from database only the fields that are necessary, not whole entities (e.g. only Title and).

var viewModelQuery = 
    from post in session.Query<Post>()
    where post.Id == postId
    select new PostViewModel
    {
        Title = post.Title,
        Text = post.Body,
        CreatedBy = string.Format("{0} {1}", 
            post.CreatedBy.FirstName, post.CreatedBy.Surname)
    };

The corresponsing query looks like this – notice it is a single query that gets only what is needed, no unnecessary properties of the User or the Post are fetched. This is the reason why I really like the NHibernate Linq provider as opposed to Criteria API or QueryOver.

select
    post0_.Title as col_0_0_,
    post0_.Body as col_1_0_,
    user1_.first_name as col_2_0_,
    user1_.surname as col_3_0_ 
from
    Post post0_ 
left outer join
    User user1_ 
        on post0_.CreatedBy=user1_.Id 
where
    post0_.Id=@p0;

However, there is a flaw: The string.Format creation of the CreatedBy. I would like to use FullName property, but that is not possible, because it is not mapped and using CreatedBy = post.CreatedBy.FullName would throw a mapping exception.

Querying full name

We can get around that using a formula in the mapping that has same result as the property in the entity class:

<class name="User">
    <id name="Id">
      <generator class="increment" />
    </id>
    <property name="FirstName" column="first_name" />
    <property name="Surname" column="surname" />
    <property name="FullName" access="readonly" formula="(first_name || ' ' || surname)" />
</class>

Notice use of readonly access. It was implemented in the NH-1621. It is an accessor used for querying in the database and thanks to that we can use it directly in the query, the prerequisites are existing property with getter (no need for setter) and semantically same as the column in the database. In our case, the property in not mapped to a column, but to a formula, but result is same: We can use it directly in the query:

var viewModelQuery = 
    from post in session.Query<Post>()
    where post.Id == postId
    select new PostViewModel
    {
        Title = post.Title,
        Text = post.Body,
        CreatedBy = post.CreatedBy.FullName
    };
var viewModel = viewModelQuery.Single();
Assert.AreEqual("Post title", viewModel.Title);
Assert.AreEqual("Text of the post", viewModel.Text);
Assert.AreEqual("John Smith", viewModel.CreatedBy)

The SQL query uses formula instead of two separate columns and as before it does it without getting unnecessary properties:

select
    post0_.Title as col_0_0_,
    post0_.Body as col_1_0_,
    (user1_.first_name || ' ' || user1_.surname) as col_2_0_ 
from
    Post post0_ 
left outer join
    User user1_ 
        on post0_.CreatedBy=user1_.Id 
where
    post0_.Id=@p0

Now we can safely add a middle name or reverse order of full name simply by modifying a formula in the mapping file and the getter property and the change will appear everywhere as opposed to modifying the string.Format method.

Jan Havlíček

Abandoned and dying code

Monthly Archives: October 2014

NHibernate Linq query evaluation process