MrKWatkins

Improving LINQ To SQL Performance Part 2 - Combining Compiled Queries And Load Options

Welcome to the second part of a series of blog posts discussing methods I've used to improve performance when using LINQ to SQL. In this post I will use the code to change load options from the first part of this series to make working with compiled queries easier.

If you don't know what compiled queries are in LINQ to SQL then you'd probably best go and do a quick search on the internet and find out as there are lots of very good explanations and examples already out there. I've gained some massive performance improvements in my applications by using them to cache the LINQ to SQL query plans. For complex queries often the time taken by LINQ to SQL to generate the plans is much more than the execution of the queries themselves! Compiling complex queries in your applications is definitely something I'd recommend.

One downside with them however is that you have to be careful with load options. Because load options help define the shape of the query they are baked into the query when it is first compiled. Which means if you then come along and use a different set of load options you will get a nice big exception. Combine that with the fact that the load options have to be exactly the same instance, i.e. you can't create a new DataLoadOptions instance that has the same load options, you have to use exactly the same instance and they can start to be a pain. You find that you can't reuse DataContext instances as much because they have to have specific load options and you end up with lots of static fields and load option construction code scattered around your codebase.

However as discussed in the first part of this series it is possible to write extension methods that can change the load options for a DataContext. This makes it easy to ensure you have the correct load options when running your query, just do something like this:

private static readonly Func<DataContext, int, Car>
   SelectCarByIdQuery = CompiledQuery.Compile<DataContext, int, Car>(
      (dataContext, id) => 
         (from car in dataContext.Cars
          where
             car.Id == id
          select car)
         .SingleOrDefault());

private static readonly DataContext SelectCarByIdQueryLoadOptions;

static CarRepository()
{
   SelectCarByIdQueryLoadOptions = new DataLoadOptions();
   SelectCarByIdQueryLoadOptions.LoadWith<Person>(car => car.Owner);
}

// I'm missing the rest of the repository class code here, but you'd probably have
// a DataContext property that I can convieniently use in the following example method...

public Car SelectCarById(int id)
{
   using (DataContext.TemporarilyChangeLoadOptions(SelectCarByIdQueryLoadOptions))
   {
      return SelectCarByIdQuery(DataContext, id);
   }
}

The query will now always use the same load options and we won't get any annoying exceptions. However the above code is a bit long winded. Do we really want to have a static DataLoadOptions instance, code somewhere to create them and a using statement each time we want to use a compiled query? Probably not. How can we improve it? Well the first step would be to have the using statement inside the compiled query function itself. But how? Well if I were a functional programmer then I'd probably do something like this:

public static Func<TDataContext, TArg1, TResult> Compile<TDataContext, TArg1, TResult>(Expression<Func<TDataContext, TArg1, TResult>> query, DataLoadOptions loadOptions)
   where TDataContext : DataContext
{
   var compiledQuery = CompiledQuery.Compile(query);

   return (dataContext, arg1) =>
   {
      using (dataContext.TemporarilyChangeLoadOptions(loadOptions))
      {
         return compiledQuery(dataContext, arg1);
      }
   };
}

The code above constructs a function for us. First it compiles the query we pass in using the normal CompiledQuery.Compile method. It then returns a new function that calls the compiled query function with the load options we have specified. If we put this function into a class called CompiledQueryWrapper then we could change our repository code to something like this:

private static readonly Func<DataContext, int, Car>
   SelectCarByIdQuery = CompiledQueryWrapper.Compile<DataContext, int, Car>(
      (dataContext, id) => 
         (from car in dataContext.Cars
          where
             car.Id == id
          select car)
         .SingleOrDefault(), 
      SelectCarByIdQueryLoadOptions);

private static readonly DataContext SelectCarByIdQueryLoadOptions;

static CarRepository()
{
   SelectCarByIdQueryLoadOptions = new DataLoadOptions();
   SelectCarByIdQueryLoadOptions.LoadWith<Person>(car => car.Owner);
}

public Car SelectCarById(int id)
{
   return SelectCarByIdQuery(DataContext, id);
}

Better, but not perfect. We still need a static field for our LoadOptions instance and code to construct it in the static constructor. Ideally want all the code that constructs our compiled query in one place, including the code to setup the load options. We can do that by getting more functional and adding an overload for our Compile method that accepts a function to make the necessary calls to LoadWith/AssociateWith:

public static Func<TDataContext, TArg1, TResult> Compile<TDataContext, TArg1, TResult>(Expression<Func<TDataContext, TArg1, TResult>> query, Action<DataLoadOptions> loadOptionsBuilder)
   where TDataContext : DataContext
{
   var loadOptions = new DataLoadOptions();
   loadOptionsBuilder(loadOptions);
   return Compile(query, loadOptions);
}

Our repository code can now be rewritten to:

private static readonly Func<DataContext, int, Car>
   SelectCarByIdQuery = CompiledQueryWrapper.Compile<DataContext, int, Car>(
      (dataContext, id) => 
         (from car in dataContext.Cars
          where
             car.Id == id
          select car)
         .SingleOrDefault(), 
       loadOptions => loadOptions.LoadWith<Person>(car => car.Owner));

public Car SelectCarById(int id)
{
   return SelectCarByIdQuery(DataContext, id);
}

Much tidier! The load options are firmly tied to the compiled query.

The only downside to this approach is that we have to create 15 overloads of our CompiledQueryWrapper.Compile method to match the 15 overloads of CompiledQuery.Compile in .NET 4.0... (.NET 3.5 is slightly less work as it has less overloads) However given the core of the methods will be the same we can use T4 to generate the overloads for us. Find a ZIP containing just such a template and the generated code here. The code it generates is a bit 'nicer' than the example code above as it includes parameter checking and some overloads to use the empty load options from the DataContextExtensions in the first part of this series. Feel free to download and use as you please in your own projects. (Just leave the link to this article in the comments please!)

In the next part of this series I will look at some of the drawbacks with using load options and how it's easy to accidently have hundreds of queries run if you're not careful.

Warning! One thing to bear in mind with compiled queries is that they should only be used for fetching data! If you use compiled queries to get entities and then try to update or delete them you will get random errors due to bugs in LINQ to SQL (3.5 at least; haven't tried with .NET 4.0 yet) that stops object tracking working correctly with compiled queries. You have been warned!

Improving LINQ To SQL Performance Part 1 - Changing Load Options

Welcome to the first part of a new series of blog posts where I discuss methods I've used to improve performance when using LINQ to SQL. For the first part I will be discussed changing load options.

If you've used LINQ to SQL then you'l have come across load options. Load options let you specify that you want certain child data loaded along with the parent entities when querying a DataContext. For example if you had a Car entity with an Owner relationship that points to a Person entity then you could do the following to load the Person at the same time as the Car:

var loadOptions = new DataLoadOptions();
loadOptions.LoadWith<Car>(car => car.Owner);
myDataContext.LoadOptions = loadOptions;

You can do lots more with them such as only load certain sets of data, etc. I won't bore you with details you probably already know and can find out elsewhere on the interweb if not.

Now one of the problems with load options is that you can't change them for a DataContext instance once you've performed a query with that DataContext. The reason you're not allowed to change the options is to maintain a consistent view of the data, which is fair enough but can be a bit of a pain at times. Often when you come to improve the performance of your LINQ to SQL code you'll find one or two cases where you should use slightly different load options. I'm going to neatly sidestep the ethical debate of whether you should be changing the load options for a DataContext or not and just explain how to do it.

A quick peek under the hood of the DataContext class with Reflector shows that you just need to change the value of the private loadOptions field on your DataContext to change the load options. Simple enough! On top of that you should really call the private Freeze method on the LoadOptions to stop them being changed after being attached to the DataContext. I guess you could miss this call out if you wanted to be able to change the LoadOptions instance, however I think it's probably easier (and safer!) to just set the value again with some new load options. Especially if using compiled queries which are very picky about their load options. (See the next part of this series for more about compiled queries)

How do we change the value of the loadOptions field? Well we could use reflection but it's a little slow. A much better way is to use the new Expression.Assign method in .NET 4 that allow us to create a compiled assignment expression to set the private field. Apart from the initial hit of the compilation using assignment expressions is much, much faster than using reflection. The code to create such a function looks something like this:

// Define expressions for the two parameters.
var dataContext = Expression.Parameter(typeof(DataContext), "dataContext");
var loadOptions = Expression.Parameter(typeof(DataLoadOptions), "loadOptions");

// Define an expression to access the private field.
var field = Expression.Field(dataContext, "loadOptions");

// Define an expression to assign a value to the field. (This is the .NET 4 only bit)
var assign = Expression.Assign(field, loadOptions);

// Build a lambda for the assignment.
var lambda = Expression.Lambda<Action<DataContext, DataLoadOptions>>(assign, dataContext, loadOptions);

// Compile it.
Action<DataContext, DataLoadOptions> changeLoadOptions = lambda.Compile();

If you don't have the luxury of being able to use .NET 4 yet then you can always fall back to using reflection instead. A compiled expression can also be used to call the private Freeze method on the LoadOptions; that can be done in both .NET 3.5 and .NET 4.

Once we have these basic building blocks we can create a set of extension methods that allow us to change load options. Which is exactly what I've done for you here. Feel free to download and use as you please in your own projects. (Just leave the link to this article in the comments please!)

There are extension methods to change or remove (i.e. replace with an empty set) the load options for a data context:

// Change the load options.
var loadOptions = new DataLoadOptions();
loadOptions.LoadWith<Car>(car => car.Owner);
myDataContext.ChangeLoadOptions(loadOptions);

// Remove the load options.
myDataContext.RemoveLoadOptions();

There are also methods to temporarily change or remove the load options:

// Remove the load options for the duration of the using block.
using (myDataContext.TemporarilyRemoveLoadOptions())
{
   // Queries performed here will not use any load options.
}
// Load options will have been restored at this point.

I recommend you use these temporary methods to change your load options as then you can be quite clear in your code that you're only changing the options for a small period of time, e.g. one query that you need to optimise. Changing the load options willy nilly can easily lead to you forgetting to restore load options and having the wrong load options some of the time, i.e. everything the LINQ to SQL designers stopped happening by not allowing you to change the load options.

In the next part of this series I'll look at compiled queries and how you can use the extension methods above to make them much less of a pain to use.

A Data Annotations Aware Model Binder

The System.ComponentModel.DataAnnotations assembly added with .NET 3.5 SP1 is quite useful. It contains various attributes that allow you to specify various constraints on the properties of your class. Ideal for model classes; constraints on the data are kept inside your model, nice and succinctly. However ASP.NET MVC version 1 does not contain out of the box server side support for them. Version 2 will but I can't wait that long... Whilst you can use something like the excellent xVal for client side validation of data annotations being a good programmer you'll also want to have the same validation server side in case someone turns off JavaScript in their browser.

Sadly I haven't been able to find a good example of how to do this. The examples I have found tend to require manual calls to some method to perform the validation, such as this one in Steve Sanderson's blog about how to use xVal. I don't want to have to do an extra manual call for my validation; I might forget! The ideal would be a model binder that is aware of the data annotations; I could set it as the default binder and forget about it. The only example of a model binder I could find was in the ASP.NET site on CodePlex but that required a newer version of the DataAnnotations assembly (basically the .NET 4.0 one) which seemed like overkill to me. How hard can it be to write one?

Turns out not very... All you have to do is:

  1. Create a class that inherits from DefaultModelBinder.
  2. Override the OnPropertyValidating method. Your version should call the base method to get the errors from the default binder then look for any ValidationAttributes on the property, check if there are errors by calling the IsValid method and add them to the ModelState if there are. Something like this:
    protected override bool OnPropertyValidating(ControllerContext controllerContext, ModelBindingContext bindingContext, PropertyDescriptor propertyDescriptor, object value)
    {
    	// Run everything by the default implementation first before checking data annotations.
    	var isValid = base.OnPropertyValidating(controllerContext, bindingContext, propertyDescriptor, value);
    	
    	// Loop through any validation attributes.
    	foreach (var validationAttribute in propertyDescriptor.Attributes.OfType<ValidationAttribute>())
    	{
    		// Are we valid?
    		if (!validationAttribute.IsValid(value))
    		{
    			// No. Let's add the error to the model state and set our isValid status to false.
    			string key = CreateSubPropertyName(bindingContext.ModelName, propertyDescriptor.Name);
    			
    			// Add the error to the model state.
    			bindingContext.ModelState.AddModelError(key, validationAttribute.FormatErrorMessage(propertyDescriptor.DisplayName)));
    			isValid = false;
    		}
    	}
    	return isValid;
    }
  3. Replace the default model binder with an instance of your new one via the ModelBinders.Binders.DefaultBinder property.

Okay, it's not quite as simple as that. You also have to take into account any MetadataTypeAttributes that might be found on your model class. This attribute allows you to specify another type to put your metadata on. You can specify identical properties on this other type and decorate them with the DataAnnotations attributes instead. Pointless? Well yes, it should be really; why not just decorate your properties? However it comes in useful if you're working with auto-generated code that doesn't allow you to add attributes, such as the evil and crappy LINQ to SQL designer. By using the MetadataTypeAttribute you can create a partial version of your LINQ to SQL entity class, add the attribute to that and specify constraints for the auto generated properties in some other class. (Whether you should be using LINQ to SQL entities as your model is an ethical debate I am going nowhere near)

If you can't be bothered to do all this of course then please feel free to download one I made earlier. This binder takes into account MetadataTypeAttributes and I've even thrown in a demo project so you can see the thing in action. Enjoy!

DataContext Inheritance

LINQ to SQL isn't bad. Apart from a few annoying bugs with the designer (i.e. The designer.cs getting deleted and not regenerated when you update the model if you have a partial code file and it's insistence on adding in connection strings all over the shop being the two biggest) and some missing features (proper many-to-many relationships) it's pretty good. We use it at work and it performs well. One thing that can happen if you're not careful though is you end up with a massive .dbml files that contain all the tables in your database. This can be less than ideal, especially if your application has several areas. For example you might work for a company that makes cars and boats. (Yeah I admit that's quite unlikely but please endulge me...) Both sections of the company would need access to customer information but the car guys wouldn't need details of the boats and vice versa. Rather than have one big DataContext with all the tables in is there anyway we could structure things a bit better?

The short answer is yes, thanks to the 'Base Class' property in the .dbml designer. (Click on the .dbml somewhere and have a look in the Properties window) Make youself a .dbml for your core entities, then make .dbmls for each of your other areas and set the 'Base Class' property for these 'child' .dbmls the core DataContext. So you might have a CoreDataContext that contains Customer entities and then a CarDataContext that has Car entities and a BoatDataContext with Boat entities. Both the CarDataContext and BoatDataContext will have access to the Customer entities but not to each other's entities.

Of course just doing that isn't that much use. The benfit of O/R mappers like LINQ to SQL is the easy use of relationships between entities. So for our Car.dbml to be useful we would like our Car entity to have a reference to a Customer entity. But if we load Car.dbml into our designer we will not see any entities from the parent Core.dbml. So how do we create the relationship?

Manually is the answer. If we inspect the designer file for a .dbml we can copy the code for a similar one-to-many property and tweak it for our use:

public partial class Car
{
    private EntityRef<Customer> _Owner;

    [AssociationAttribute(Name = "Owner_Car", Storage = "_Owner", ThisKey = "OwnerId", OtherKey = "Id", IsForeignKey = true)]
    public Customer Owner
    {
        get
        {
            return _Owner.Entity;
        }
        set
        {
            Customer previousValue = _Owner.Entity;
            if (((previousValue != value) || !_Owner.HasLoadedOrAssignedValue == false))
            {
                SendPropertyChanging();
                if ((previousValue != null))
                {
                    _Owner.Entity = null;
                }
                _Owner.Entity = value;
                if ((value != null))
                {
                    _OwnerId = value.Id;
                }
                else
                {
                    _OwnerId = 0;
                }
                SendPropertyChanged("Owner");
            }
        }
    }
}

We can now have an Owner property on our Car entity that we can use like a normal LINQ to SQL property:

using (var carDataContext = new CarDataContext(Connection))
{
    // Grab our Customer.
    var customer = carDataContext.Customers.Where(c => c.Id == customerId).Single();

    // Create a Car.
    var car = new Car
    {
        Registration = "T1 KARR",
        Owner = customer
    };

    // Save it.
    carDataContext.Cars.InsertOnSubmit(car);
    carDataContext.SubmitChanges();
}

But what about the other way around? What if the boat department what to get a list of all the boats a customer owns? I.e. can we get a Customer.Boats property? The answer to that is not really... The approach I tried was to define a subclass of Customer with the property and then replace the table in the DataContext. I.e. something like:

public sealed class BoatCustomer : Customer
{
    private EntitySet<Boat> _Boats;

    public BoatCustomer()
    {
        _Boats = new EntitySet<Boat>(new Action<Boat>(Attach_Boats), new Action<Boat>(Detach_Boats));
    }

    [AssociationAttribute(Name = "Customer_Boat", Storage = "_Boats", ThisKey = "Id", OtherKey = "OwnerId")]
    public EntitySet<Boat> Boats
    {
        get
        {
            return _Boats;
        }
        set
        {
            _Boats.Assign(value);
        }
    }

    private void Attach_Boats(Boat entity)
    {
        SendPropertyChanging();
        entity.Owner = this;
    }

    private void Detach_Boats(Boat entity)
    {
        SendPropertyChanging();
        entity.Owner = null;
    }
}

public partial class BoatDataContext : CoreDataContext
{
    public new Table<BoatCustomer> Customers
    {
        get
        {
            return GetTable<BoatCustomer>();
        }
    }
}    

Sadly this, and other similar approaches, don't work... InvalidOperationExceptions with messages like "Data member 'Int32 Id' of type 'KWatkins.DataContextInheritance.Customer' is not part of the mapping for type 'BoatCustomer'. Is the member above the root of an inheritance hierarchy?" start getting thrown around the place. The problem is that the attribute mapping doesn't look into base classes for some members. (If you open the lid with Reflector you'll see gratuitous use of the BindingFlags.DeclaredOnly enum value) So what are our alternatives?

  1. Recreate the various properties of Customer entity in the BoatCustomer class. Not great; pretty soon we'd be back to large DataContexts with everything in. Kinda defeats the point of doing this whole DataContext inheritance thing in the first place!
  2. Alter the AttributeMappingSource class so that it looks in base classes. Not really possible; all AttributeMappingSource does is return the class AttributedMetaModel. AttributedMetaModel does all the work but we cannot change it because it is internal.
  3. Create a public version of AttributedMetaModel by using Reflector to get the original code then change it to look in base classes. Sadly not really an option either... There is a lot of code behind AttributedMetaModel that uses a lot of internal stuff from various areas of the framework. I started then gave up...
  4. Create a MetaModel that acts as a wrapper around another MetaModel. We might then be able to wrap it around an AttributedMetaModel instance and intecept the relevant calls and change them accordingly. I didn't even attempt this one; if someone wants to try please be my guest...

In the end I settled on an extension method on Customer that returns a query:

public static IQueryable<Boat> Boats(this Customer customer, BoatDataContext dataContext)
{
    Validate.Argument(customer, "customer").IsNotNull();
    Validate.Argument(dataContext, "dataContext").IsNotNull();

    return from boat in dataContext.Boats
           where
                boat.OwnerId == customer.Id
           select boat;
}

(See this post for details about the Validate class) This method returns a query that returns all the Boats for a Customer. Sadly it's not a proper EntitySet so we can't add and remove members from it. But I haven't found this to be a major problem; I usually use properties like this for querying rather than inserting/deleting so it hasn't bothered me much. It's a shame we have to pass in a BoatDataContext but LINQ to SQL entities don't have a reference to a DataContext so there isn't much we can do.

There you go. By use of inheritance you can help split up your .dbml files and not end up with one massive, unwieldy DataContext. Feel free to download some example source code that demonstrates all of this. Note that I used Visual Studio 2010 to write it though; I wanted to see if some of the LINQ to SQL bugs have been fixed. (The .designer.cs classes getting deleted I mentioned above seems to have been fixed) You'll therefore have to setup your own projects if you want a VS2008 version...