MrKWatkins

Improving LINQ To SQL Performance Part 2 - Combining Compiled Queries And Load Options

Welcome to the second part of a series of blog posts discussing methods I've used to improve performance when using LINQ to SQL. In this post I will use the code to change load options from the first part of this series to make working with compiled queries easier.

If you don't know what compiled queries are in LINQ to SQL then you'd probably best go and do a quick search on the internet and find out as there are lots of very good explanations and examples already out there. I've gained some massive performance improvements in my applications by using them to cache the LINQ to SQL query plans. For complex queries often the time taken by LINQ to SQL to generate the plans is much more than the execution of the queries themselves! Compiling complex queries in your applications is definitely something I'd recommend.

One downside with them however is that you have to be careful with load options. Because load options help define the shape of the query they are baked into the query when it is first compiled. Which means if you then come along and use a different set of load options you will get a nice big exception. Combine that with the fact that the load options have to be exactly the same instance, i.e. you can't create a new DataLoadOptions instance that has the same load options, you have to use exactly the same instance and they can start to be a pain. You find that you can't reuse DataContext instances as much because they have to have specific load options and you end up with lots of static fields and load option construction code scattered around your codebase.

However as discussed in the first part of this series it is possible to write extension methods that can change the load options for a DataContext. This makes it easy to ensure you have the correct load options when running your query, just do something like this:

private static readonly Func<DataContext, int, Car>
   SelectCarByIdQuery = CompiledQuery.Compile<DataContext, int, Car>(
      (dataContext, id) => 
         (from car in dataContext.Cars
          where
             car.Id == id
          select car)
         .SingleOrDefault());

private static readonly DataContext SelectCarByIdQueryLoadOptions;

static CarRepository()
{
   SelectCarByIdQueryLoadOptions = new DataLoadOptions();
   SelectCarByIdQueryLoadOptions.LoadWith<Person>(car => car.Owner);
}

// I'm missing the rest of the repository class code here, but you'd probably have
// a DataContext property that I can convieniently use in the following example method...

public Car SelectCarById(int id)
{
   using (DataContext.TemporarilyChangeLoadOptions(SelectCarByIdQueryLoadOptions))
   {
      return SelectCarByIdQuery(DataContext, id);
   }
}

The query will now always use the same load options and we won't get any annoying exceptions. However the above code is a bit long winded. Do we really want to have a static DataLoadOptions instance, code somewhere to create them and a using statement each time we want to use a compiled query? Probably not. How can we improve it? Well the first step would be to have the using statement inside the compiled query function itself. But how? Well if I were a functional programmer then I'd probably do something like this:

public static Func<TDataContext, TArg1, TResult> Compile<TDataContext, TArg1, TResult>(Expression<Func<TDataContext, TArg1, TResult>> query, DataLoadOptions loadOptions)
   where TDataContext : DataContext
{
   var compiledQuery = CompiledQuery.Compile(query);

   return (dataContext, arg1) =>
   {
      using (dataContext.TemporarilyChangeLoadOptions(loadOptions))
      {
         return compiledQuery(dataContext, arg1);
      }
   };
}

The code above constructs a function for us. First it compiles the query we pass in using the normal CompiledQuery.Compile method. It then returns a new function that calls the compiled query function with the load options we have specified. If we put this function into a class called CompiledQueryWrapper then we could change our repository code to something like this:

private static readonly Func<DataContext, int, Car>
   SelectCarByIdQuery = CompiledQueryWrapper.Compile<DataContext, int, Car>(
      (dataContext, id) => 
         (from car in dataContext.Cars
          where
             car.Id == id
          select car)
         .SingleOrDefault(), 
      SelectCarByIdQueryLoadOptions);

private static readonly DataContext SelectCarByIdQueryLoadOptions;

static CarRepository()
{
   SelectCarByIdQueryLoadOptions = new DataLoadOptions();
   SelectCarByIdQueryLoadOptions.LoadWith<Person>(car => car.Owner);
}

public Car SelectCarById(int id)
{
   return SelectCarByIdQuery(DataContext, id);
}

Better, but not perfect. We still need a static field for our LoadOptions instance and code to construct it in the static constructor. Ideally want all the code that constructs our compiled query in one place, including the code to setup the load options. We can do that by getting more functional and adding an overload for our Compile method that accepts a function to make the necessary calls to LoadWith/AssociateWith:

public static Func<TDataContext, TArg1, TResult> Compile<TDataContext, TArg1, TResult>(Expression<Func<TDataContext, TArg1, TResult>> query, Action<DataLoadOptions> loadOptionsBuilder)
   where TDataContext : DataContext
{
   var loadOptions = new DataLoadOptions();
   loadOptionsBuilder(loadOptions);
   return Compile(query, loadOptions);
}

Our repository code can now be rewritten to:

private static readonly Func<DataContext, int, Car>
   SelectCarByIdQuery = CompiledQueryWrapper.Compile<DataContext, int, Car>(
      (dataContext, id) => 
         (from car in dataContext.Cars
          where
             car.Id == id
          select car)
         .SingleOrDefault(), 
       loadOptions => loadOptions.LoadWith<Person>(car => car.Owner));

public Car SelectCarById(int id)
{
   return SelectCarByIdQuery(DataContext, id);
}

Much tidier! The load options are firmly tied to the compiled query.

The only downside to this approach is that we have to create 15 overloads of our CompiledQueryWrapper.Compile method to match the 15 overloads of CompiledQuery.Compile in .NET 4.0... (.NET 3.5 is slightly less work as it has less overloads) However given the core of the methods will be the same we can use T4 to generate the overloads for us. Find a ZIP containing just such a template and the generated code here. The code it generates is a bit 'nicer' than the example code above as it includes parameter checking and some overloads to use the empty load options from the DataContextExtensions in the first part of this series. Feel free to download and use as you please in your own projects. (Just leave the link to this article in the comments please!)

In the next part of this series I will look at some of the drawbacks with using load options and how it's easy to accidently have hundreds of queries run if you're not careful.

Warning! One thing to bear in mind with compiled queries is that they should only be used for fetching data! If you use compiled queries to get entities and then try to update or delete them you will get random errors due to bugs in LINQ to SQL (3.5 at least; haven't tried with .NET 4.0 yet) that stops object tracking working correctly with compiled queries. You have been warned!

Improving LINQ To SQL Performance Part 1 - Changing Load Options

Welcome to the first part of a new series of blog posts where I discuss methods I've used to improve performance when using LINQ to SQL. For the first part I will be discussed changing load options.

If you've used LINQ to SQL then you'l have come across load options. Load options let you specify that you want certain child data loaded along with the parent entities when querying a DataContext. For example if you had a Car entity with an Owner relationship that points to a Person entity then you could do the following to load the Person at the same time as the Car:

var loadOptions = new DataLoadOptions();
loadOptions.LoadWith<Car>(car => car.Owner);
myDataContext.LoadOptions = loadOptions;

You can do lots more with them such as only load certain sets of data, etc. I won't bore you with details you probably already know and can find out elsewhere on the interweb if not.

Now one of the problems with load options is that you can't change them for a DataContext instance once you've performed a query with that DataContext. The reason you're not allowed to change the options is to maintain a consistent view of the data, which is fair enough but can be a bit of a pain at times. Often when you come to improve the performance of your LINQ to SQL code you'll find one or two cases where you should use slightly different load options. I'm going to neatly sidestep the ethical debate of whether you should be changing the load options for a DataContext or not and just explain how to do it.

A quick peek under the hood of the DataContext class with Reflector shows that you just need to change the value of the private loadOptions field on your DataContext to change the load options. Simple enough! On top of that you should really call the private Freeze method on the LoadOptions to stop them being changed after being attached to the DataContext. I guess you could miss this call out if you wanted to be able to change the LoadOptions instance, however I think it's probably easier (and safer!) to just set the value again with some new load options. Especially if using compiled queries which are very picky about their load options. (See the next part of this series for more about compiled queries)

How do we change the value of the loadOptions field? Well we could use reflection but it's a little slow. A much better way is to use the new Expression.Assign method in .NET 4 that allow us to create a compiled assignment expression to set the private field. Apart from the initial hit of the compilation using assignment expressions is much, much faster than using reflection. The code to create such a function looks something like this:

// Define expressions for the two parameters.
var dataContext = Expression.Parameter(typeof(DataContext), "dataContext");
var loadOptions = Expression.Parameter(typeof(DataLoadOptions), "loadOptions");

// Define an expression to access the private field.
var field = Expression.Field(dataContext, "loadOptions");

// Define an expression to assign a value to the field. (This is the .NET 4 only bit)
var assign = Expression.Assign(field, loadOptions);

// Build a lambda for the assignment.
var lambda = Expression.Lambda<Action<DataContext, DataLoadOptions>>(assign, dataContext, loadOptions);

// Compile it.
Action<DataContext, DataLoadOptions> changeLoadOptions = lambda.Compile();

If you don't have the luxury of being able to use .NET 4 yet then you can always fall back to using reflection instead. A compiled expression can also be used to call the private Freeze method on the LoadOptions; that can be done in both .NET 3.5 and .NET 4.

Once we have these basic building blocks we can create a set of extension methods that allow us to change load options. Which is exactly what I've done for you here. Feel free to download and use as you please in your own projects. (Just leave the link to this article in the comments please!)

There are extension methods to change or remove (i.e. replace with an empty set) the load options for a data context:

// Change the load options.
var loadOptions = new DataLoadOptions();
loadOptions.LoadWith<Car>(car => car.Owner);
myDataContext.ChangeLoadOptions(loadOptions);

// Remove the load options.
myDataContext.RemoveLoadOptions();

There are also methods to temporarily change or remove the load options:

// Remove the load options for the duration of the using block.
using (myDataContext.TemporarilyRemoveLoadOptions())
{
   // Queries performed here will not use any load options.
}
// Load options will have been restored at this point.

I recommend you use these temporary methods to change your load options as then you can be quite clear in your code that you're only changing the options for a small period of time, e.g. one query that you need to optimise. Changing the load options willy nilly can easily lead to you forgetting to restore load options and having the wrong load options some of the time, i.e. everything the LINQ to SQL designers stopped happening by not allowing you to change the load options.

In the next part of this series I'll look at compiled queries and how you can use the extension methods above to make them much less of a pain to use.

A Data Annotations Aware Model Binder

The System.ComponentModel.DataAnnotations assembly added with .NET 3.5 SP1 is quite useful. It contains various attributes that allow you to specify various constraints on the properties of your class. Ideal for model classes; constraints on the data are kept inside your model, nice and succinctly. However ASP.NET MVC version 1 does not contain out of the box server side support for them. Version 2 will but I can't wait that long... Whilst you can use something like the excellent xVal for client side validation of data annotations being a good programmer you'll also want to have the same validation server side in case someone turns off JavaScript in their browser.

Sadly I haven't been able to find a good example of how to do this. The examples I have found tend to require manual calls to some method to perform the validation, such as this one in Steve Sanderson's blog about how to use xVal. I don't want to have to do an extra manual call for my validation; I might forget! The ideal would be a model binder that is aware of the data annotations; I could set it as the default binder and forget about it. The only example of a model binder I could find was in the ASP.NET site on CodePlex but that required a newer version of the DataAnnotations assembly (basically the .NET 4.0 one) which seemed like overkill to me. How hard can it be to write one?

Turns out not very... All you have to do is:

  1. Create a class that inherits from DefaultModelBinder.
  2. Override the OnPropertyValidating method. Your version should call the base method to get the errors from the default binder then look for any ValidationAttributes on the property, check if there are errors by calling the IsValid method and add them to the ModelState if there are. Something like this:
    protected override bool OnPropertyValidating(ControllerContext controllerContext, ModelBindingContext bindingContext, PropertyDescriptor propertyDescriptor, object value)
    {
    	// Run everything by the default implementation first before checking data annotations.
    	var isValid = base.OnPropertyValidating(controllerContext, bindingContext, propertyDescriptor, value);
    	
    	// Loop through any validation attributes.
    	foreach (var validationAttribute in propertyDescriptor.Attributes.OfType<ValidationAttribute>())
    	{
    		// Are we valid?
    		if (!validationAttribute.IsValid(value))
    		{
    			// No. Let's add the error to the model state and set our isValid status to false.
    			string key = CreateSubPropertyName(bindingContext.ModelName, propertyDescriptor.Name);
    			
    			// Add the error to the model state.
    			bindingContext.ModelState.AddModelError(key, validationAttribute.FormatErrorMessage(propertyDescriptor.DisplayName)));
    			isValid = false;
    		}
    	}
    	return isValid;
    }
  3. Replace the default model binder with an instance of your new one via the ModelBinders.Binders.DefaultBinder property.

Okay, it's not quite as simple as that. You also have to take into account any MetadataTypeAttributes that might be found on your model class. This attribute allows you to specify another type to put your metadata on. You can specify identical properties on this other type and decorate them with the DataAnnotations attributes instead. Pointless? Well yes, it should be really; why not just decorate your properties? However it comes in useful if you're working with auto-generated code that doesn't allow you to add attributes, such as the evil and crappy LINQ to SQL designer. By using the MetadataTypeAttribute you can create a partial version of your LINQ to SQL entity class, add the attribute to that and specify constraints for the auto generated properties in some other class. (Whether you should be using LINQ to SQL entities as your model is an ethical debate I am going nowhere near)

If you can't be bothered to do all this of course then please feel free to download one I made earlier. This binder takes into account MetadataTypeAttributes and I've even thrown in a demo project so you can see the thing in action. Enjoy!

The Order Of Things

Code that relies on the order of things generally annoys me. It tends to be weak, fragile code because someone somewhere will come along and change the order. Most of the time nothing to bad will happen. Names in a drop down will no longer be alphabetical, that sort of thing. But I've seen code 'oh crap yellow screen of death' break due to people relying on the order of data. The code they'd written depended on the order of items coming out of the database, however they didn't have an ORDER BY clause in their SQL. So one day someone inserted a record in the table, the data came back differently, their code couldn't handled that and a lovely NullReferenceException graced the front page of the website. Not good. If your code must depend on order then your code should make sure things are in order!

I found order behaviour in the wild yesterday, in the RSS Best Practices Profile. Whilst using the W3C's Feed Validator to check the markup of this site's RSS was up to scratch I came across the recommendation that 'all item elements should appear after all of the other elements in a channel.' Why? Perhaps it can be used as an optimisation for some RSS parsers, although if that's the case it should probably be compulsory rather than a recommendation because then something will depend on that fact and possibly break if that isn't the case in some feed. The only valid reason I can think of is that it makes reader the XML by hand slightly easier. If anyone knows the real answer please drop me a line; it's annoying the hell out of me...

Valid XHTML In Redirects

I've been using the Search Engine Optimization Toolkit add-in for IIS 7 to analyse this website. If you've not used it before I recommend you give it a try; it's very good for picking up random bugs, spurious URL routing and dodgy links that you'd never have spotted otherwise. For example it would report issues on this site whereby the same content was accessible by two different URLs. Turns out some links would have a trailing slash, some not. (Mainly due to ASP.NET MVC's ActionLink being rubbish and missing off trailing slashes. Why does it not return links in the format specified by the route???) The SEO toolkit picked them up and now they're hopefully all fixed.

One set of problems I had trouble fixing was the root URL to this site, http://www.mrkwatkins.co.uk/. Currently I only have a blog so that URL redirects to the root URL of my blog, http://www.mrkwatkins.co.uk/Blog/. However the SEO Toolkit would report errors with the content returned from the redirect. For example it would tell me that there was no <h1> tag in the content. Now whether or not the toolkit should be analysing the content returned from a redirect or not I have no idea. What I do know is that they were the last errors on my report and needed removing!

The redirect was done using the standard RedirectResult in ASP.NET MVC. This in turn uses the standard Response.Redirect from ASP.NET. Trouble is that Response.Redirect spits out the following content as well as the 302 status code and location HTTP header:

<html><head><title>Object moved</title></head><body>
<h2>Object moved to <a href="http://www.mrkwatkins.co.uk/Blog/">here</a>.</h2>
</body></html>

So to fix the problems we need to change this XHTML. Turns out that is pretty simple to do. You can return whatever content you like from the request as normal. All you need to do is set the status code to 302 using the Response.StatusCode property and set the Location HTTP header using the Response.RedirectLocation property. I've wrapped all this up in a custom ViewResult class:

using System.Web;
using System.Web.Mvc;
using KWatkins.Validation;

namespace KWatkins.MrKWatkins.Web.Mvc
{
    /// <summary>
    /// A <see cref="ViewResult" /> that redirects the user to another location; allows
    /// you to customize the XHTML returned by the redirection rather than use the
    /// standard (invalid XHTML) retured by <see cref="HttpResponse.Redirect(string)" />.
    /// </summary>
    /// <remarks>
    /// The redirect location is added to the <see cref="ViewRedirectResult.ViewData" />
    /// with the key specified by <see cref="ViewRedirectResult.RedirectLocationViewDataKey" />.
    /// </remarks>
    public sealed class ViewRedirectResult : ViewResult
    {
        /// <summary>
        /// The key used to store the <see cref="RedirectLocation" /> in the
        /// <see cref="ViewRedirectResult.ViewData" />.
        /// </summary>
        public const string RedirectLocationViewDataKey = "RedirectLocation";

        private readonly string redirectLocation;

        /// <summary>
        /// Initializes a new instance of the <see cref="ViewRedirectResult"/> class.
        /// </summary>
        /// <param name="redirectLocation">The redirect location.</param>
        public ViewRedirectResult(string redirectLocation)
        {
            Validate.Argument(redirectLocation, "redirectLocation").IsNotNull().IsNotEmpty();

            this.redirectLocation = redirectLocation;
            ViewData[RedirectLocationViewDataKey] = redirectLocation;
        }

        /// <summary>
        /// When called by the action invoker, renders the view to the response.
        /// </summary>
        /// <param name="context">The context within which the result is executed.</param>
        public override void ExecuteResult(ControllerContext context)
        {
            context.HttpContext.Response.StatusCode = 302;
            context.HttpContext.Response.RedirectLocation = redirectLocation;
            base.ExecuteResult(context);
        }

        /// <summary>
        /// Gets the redirect location.
        /// </summary>
        /// <value>The redirect location.</value>
        public string RedirectLocation
        {
            get
            {
                return redirectLocation;
            }
        }
    }
}

You can then setup a new Redirect method in your base controller to create a ViewRedirectResult for you:

/// <summary>
/// Returns a <see cref="ViewRedirectResult" /> that redirects to the specified URL.
/// </summary>
/// <param name="url">The URL to redirect to.</param>
/// <returns>The <see cref="ViewRedirectResult" /> that redirects to the specified URL.</returns>
protected static new ViewRedirectResult Redirect(string url)
{
    var result = new ViewRedirectResult(url)
                 {
                     ViewName = "Redirect"
                 };
    return result;
}

All that remains is to create the Redirect view. We need this to be valid XHTML with a few extras to keep the SEO Toolkit happy; we need a content type, a description and a <h1>:

<%@ Page Language="C#" Inherits="System.Web.Mvc.ViewPage" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>MrKWatkins - Object Moved</title>
        <meta http-equiv="Content-type" content="text/html;charset=UTF-8" />
        <meta name="description" content="This object has been moved." />
    </head>
    <body>
        <div>
            <h1>Object Moved</h1>
            <p>
                Object moved to <a href="<%= HttpUtility.HtmlAttributeEncode((string)ViewData["RedirectLocation"]) %>"><%= ViewData["RedirectLocation"] %></a>.
            </p>
        </div>
    </body>
</html>

Et voila. The content of your redirects is now valid XHTML and the spurious warnings from your SEO Toolkit report go away.