Solving Entity Framework Inheritance Performance Issues, with a custom QueryProvider
Posted: September 4th, 2009 | Author: jeswinpk | Filed under: general | 4 Comments »Entity Framework Inheritance
We had used the Entity Framework in our projects earlier, but it was only recently that we tried its Inheritance support (or the lack of it, as we will soon see).
Here are the different Inheritance Types supported by the Entity Framework:
Table Per Hierarchy (TPH) – With this approach, we create a single “Super Wide Table” for a hierarchy, containing the fields required for all classes in an inheritance hierarchy. If we had a Lawyer, Physician and an Artist, all inheriting from a Person class, we would create a table containing all fields from each of these types.
Huh! I am left wondering why this approach is even supported in the first place. Not only does this give a horrible database schema, you will also have to make compromises like making non-nullable fields nullable.
Table Per Type (TPT) – TPT gives you a good database schema with normalization, plus inheritance. In fact, with TPT the schema is very similar to an approach using Composition instead of Inheritance. This was the approach we adopted for the application early on, since there seemed to be nothing wrong with it.
But when we finished modeling the domain and hit the code, we got a bad surprise. The TPT implementation in Entity Framework is so slow that it can only be called broken. A simple join on an inherited table gave me a 3200 line query, 170KB in size, and it took 15 seconds for Entity Framework to compile the expression to SQL. And this was just for _generating_ the SQL query!
There are some discussions on this, but nobody seemed to have an answer. There is no way this code can go anywhere near a production server.
One option was to avoid Inheritance and use Composition. Using Composition leads to code that makes less semantic sense; if you had to access a lawyer’s FirstName you had to call – Lawyer.User.FirstName. And base class methods that should be available in the Lawyer class now belong to the the User inside Lawyer. It gets worse when Lawyer inherits from another class, say LegalWorker which in turn inherits from User. We decided not to go this route.
Option 2: Translating Inheritance to Composition at runtime
We know that the Entity Framework handles composition queries without (too many) performance issues. If we can transform our queries involving inheritance into ones having only composition, Entity Framework can do the rest without measurable loss of performance.
To do that, while we design our Entity classes using the EDMX Designer we will avoid using inheritance. If ideally a Lawyer is a User, we use composition instead of inheritance and say Lawyer has a User.

Sounds ugly? Yes it is. But don’t worry, we will not be using these classes in code. They are being used only to make Entity Framework do the heavy lifting. Alright. Let us assume that we have created our entire domain model using the EDMX designer.
The next task is to define the same set of classes, but using inheritance. These are the classes we will be using while writing code.
Fig: Relation between classes with inheritance, and those with composition.
Auto-generating Models from EDMX
Manually modifying our Models every time we make a modification to the database or the EDMX is cumbersome and error prone. We wrote a code-generator for this, which analyzes EDMX and outputs C# classes with Inheritance.
For the figure represented above, the code generator creates
public class User
{
public string FirstName { get; set; }
……..
}
public class Lawyer : User
{
public ICollection<Customer> Customers { get; set; }
……..
}
This code generator is available under AgileFx.Tools.
Now we can write queries like:
from lawyer in Lawyers
where lawyer.FirstName == “Hemchand”
select lawyer;
//although FirstName did not belong to Lawyer in the Entity Framework Designer generated model.
But that was the easy part. If we pass this expression (compiler converts this query to an expression tree) to the Entity Framework, it will not know what to do with it. This is because the classes used in the expression are not classes that the Entity Framework can translate to an SQL query. The Lawyer class referenced in the expression is just a mirror (albeit with inheritance) of the EF.Lawyer class, which is the one the Entity Framework is aware of.
We need to write a Query Translator, to convert an expression referencing our new classes into ones that reference Entity Framework Designer-Generated classes (ie, from App.Models.Lawyer to App.Models.EF.Lawyer).
Writing the Query Translator
How it works
Let us consider a simple Where query on the Entity set lawyers.
lawyers.Where(lawyer => lawyer.FirstName == “Hemchand”)
1. The compiler translates lawyer => lawyer.FirstName == “Hemchand” into an expression, say EXP.
2. lawyers.Where is an extension method defined in System.Linq.Queryable
so the actual call is – Queryable.Where(lawyers, EXP)
3. “Queryable.Where” calls lawyers.Provider.CreateQuery(expression-representing-the-Where-call, EXP, …)
4. Since we implemented our own provider to replace Entity Framework’s Query Provider, Provider is an instance of our CustomQueryProvider. Once we receive EXP in CustomQueryProvider.CreateQuery() we parse EXP using a Query ExpressionVisitor and rewrite the query after converting references of lawyer.FirstName to eflawyer.User.FirstName.

References
If you are writing your own IQueryable/IQueryProvider implementations, this 15-post series by Matt Warren will be very helpful. Actually, it is not just helpful; it is the TextBook for writing your own Query Providers. Matt has also made the code available on CodePlex (IQToolkit), which we have re-used in our implementation.
If you want to learn more about LINQ and Querying itself, you should see this series by Wes Dyer. Reading this blog is highly recommended, even if you are not writing an IQueryable Provider. These posts are profoundly insightful.
Shortcomings of the Translation
When we ask Entity Framework to fetch a User, it fetches an instance of the actual type of the User; like a Lawyer. (Trying to get this working is what possibly caused the performance issue in Entity Framework; just guessing here).
That is,
User u = <some_query>;
Now u.GetType() will get you Lawyer, Artist etc.
With our translation layer, you get an instance of User when we issue the same query. Based on the a field like UserType we have to fetch the actual Lawyer or the Artist.
There are two reasons why this is not so bad; one is that you probably want just the User (base class) since that was what you queried for. The second reason is that there is no way we can fetch the actual instance without querying all the tables in the inheritance hierarchy, and the associated performance problems that come with it.
Some useful side effects
By implementing Query Translation, our framework became conceptually ORM Framework independent. We can now translate queries to any framework. Presently we support only Entity Framework, but the next goal is to support Linq to Sql.
Someone fix the Entity Framework!
Finally, some complaining. Considering that the Entity Framework shipped with .Net Framework SP1, one would expect it to be sufficiently tested. But it seems not. If inheritance is a supported feature, how is it that even the simplest queries involving inheritance result in 15 second query generation times?
On the other hand, Linq to Sql is awesome. It is lightweight, and generated simpler SQL. Which is why we are happy that our code base is not ORM dependent anymore. There is some talk of Linq to Sql being abandoned, which I hope is not true. Many of us need just a lightweight ORM, and Linq to Sql is just that.
Thanks.
AgileFx is not yet ready for public release or even a BETA, but you could take a look at it on codeplex - http://agilefx.codeplex.com/
Credits:
Most of the code for the Query Translator was written by our in-house C# grandmaster Hemchand T . If something blows up, catch him. ;)
– Jeswin P

Jeswin,
I think this is an assumption that might not be right that I seen in your blog:
“there is no way we can fetch the actual instance without querying all the tables in the inheritance hierarchy, and the associated performance problems that come with it.”
Why???
1 Super common scenario is fetching one of these items by it’s PK (Primary key) which would be unique in either TPT or TPH due to the base table PK.
So, Assuming we have a primary key (easy logic to check), it would make more sense to build a query that requested just the fields used in that Type’s model and use the primary key for the selection.
i.e. generate out select field1, field2 from baseObjTable where pkId = 1234
Maybe I’m missing something in how the underlying design works (haven’t read agilefx’s code) I don’t see why this scenario wouldn’t be possible and save a ton of hassle.
Even if we fixed just the case of being query’d by the PK that would fix a ton of performance related problems out there
I’m in the middle of trying to solve my nasty query the system is generating to solve that performance problem. I also don’t get why I can’t use a type discriminator column like a objectTypeFKID in TPT inheritance..
I’ve got the 2010 beta 2 but it still generates bad queries.
Thanks,
Bob
Bob,
AgileFx has grown to be larger than what it was when this post was written. We gave up on Entity Framework as our preferred backend, because it has ridiculous performance in the current version.
Instead we implemented inheritance and many-to-many on top of Linq-To-Sql. This is now super fast – our unit test suite not takes 7 seconds instead of 180 seconds on Entity Framework.
I checked out your code but was having a hard time figuring out how to use it not ire if there is a guide anywhere yet or if this is too early?
I built our whole design on tpt and getting killed by the bad queries
Bob, AgileFx will be released on Nov 14th with examples and a Visual Studio add-in to make things easier.