So that's it for fluent argument validation for the time being. I've got enough done to spit out version one, which is more than enough for my current needs. Afraid you don't get the source just yet... I plan to host the source on CodePlex eventually but I might just wait for version two for that...
"Version two?" I hear you say. "But what is wrong with version one?" Nothing, really. It suits me quite well. However whilst writing these articles (I'd actually written the library several months before...) I've thought of several ways to improve the thing...
I touched on this one in part three. Currently the following is perfectly acceptable:
public static void SomeMethod(int number)
{
Validate.Argument(number, "number")
.IsGreaterThan(5)
.IsGreaterThan(3)
.IsGreaterThan(3);
}
Now are you ever going to validate a number is greater than another number twice? Surely you'd just validate it against the higher of the two. Therefore should I allow more than one call to IsGreaterThan in a validation chain?
Okay, you might be validating against variables and therefore have to validate against two variables to check the argument is greater than both. But in this case I think we need a different method anyway, because one would want the exception message to express that the argument has to be greater than the variable rather than the current value of the variable. For example the following code:
public static int SomeProperty
{
get
{
return 5;
}
}
public static void SomeMethod(int number)
{
Validate.Argument(number, "number")
.IsGreaterThan(SomeProperty);
}
would give the error message "Value must be greater than 5." A more useful message would be "Value must be greater than SomeProperty." So perhaps we need another set of methods that allows you to specify something (other than a constant) to validate against?
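One possible shape for that, purely as a sketch: an overload that also takes a description of what the minimum represents, so the message can name it. `NamedComparison` and `minimumDescription` are hypothetical names for illustration and are not part of the library.

```csharp
using System;

// Hypothetical helper, not the library's API: the message names the thing
// being compared against rather than its current value.
public static class NamedComparison
{
    public static void IsGreaterThan(int argument, string parameterName, int minimum, string minimumDescription)
    {
        if (argument <= minimum)
        {
            throw new ArgumentOutOfRangeException(
                parameterName,
                argument,
                "Value must be greater than " + minimumDescription + ".");
        }
    }
}
```

Calling `NamedComparison.IsGreaterThan(number, "number", SomeProperty, "SomeProperty")` would then complain about SomeProperty rather than 5. Having to pass the name as a string is a bit rubbish, which is part of why this needs more thought.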
The English made by reading the Fluent validation could be a bit better. For example the following:
Validate.Argument(value, "value")
.IsNotNull()
.IsNotEmpty()
.StartsWith("Something");
would sound better as:
Validate.Argument(value, "value")
.IsNotNull()
.OrEmpty()
.AndStartsWith("Something");
Unless you specify a tolerance, the validation methods for imprecise numbers currently compare exactly, which basically means they're not very useful; in general they will fail for some valid cases. NUnit's constraint model, which is a great example of a good fluent interface, has a default tolerance to use in imprecise comparisons. I think my library should do that too...
I'd like to add more validation than just arguments, e.g. type parameter validation, IO validation, XML validation, etc.
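To show the sort of valid case that fails, here is a minimal sketch of a comparison with a default tolerance. `ToleranceComparer` and `DefaultTolerance` are names I've made up for the example, and the value 1e-9 is an arbitrary assumption rather than anything NUnit or my library actually uses.

```csharp
using System;

// Sketch only: names and the default tolerance value are assumptions.
public static class ToleranceComparer
{
    public const double DefaultTolerance = 1e-9;

    // Uses the default tolerance when the caller does not supply one.
    public static bool IsLessThanOrEqualTo(double value, double maximum)
    {
        return IsLessThanOrEqualTo(value, maximum, DefaultTolerance);
    }

    // Treats value as "less than or equal" if it overshoots the maximum
    // by no more than the (absolute) tolerance.
    public static bool IsLessThanOrEqualTo(double value, double maximum, double tolerance)
    {
        return value <= maximum + tolerance;
    }
}
```

With doubles, `0.1 + 0.2 <= 0.3` is false because the sum lands a hair above 0.3, whereas `ToleranceComparer.IsLessThanOrEqualTo(0.1 + 0.2, 0.3)` is true. An absolute tolerance is the simplest choice; a relative one would suit very large or very small values better.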
I'm sure I'll think of lots more for version two. But in the meantime version one is finished. Have a look at parts one, two, three and four if you haven't already. Feel free to download the library and use it in your own projects. If you do find the library useful then please ping me an email with details of the project you used it on and any ideas you have for improving it. And if you use it to make loads of cash then feel free to donate. No, in fact, feel compelled.
I've been creating a fluent argument validation library; see parts one, two and three if you need to catch up.
Generics are brilliant. Wonderful things. But there are a couple of things about them that annoy me. For example, why can't I do the following?
public class SomeClass
{
public void SomeMethod<T>(T value) where T : class...
public void SomeMethod<T>(T value) where T : struct...
}
I.e. use a different method overload depending on whether my type is a reference or a value type? A type cannot be both a reference and value type at the same time so surely we have two different method signatures? Yes, I know there are good reasons why not. But still... I want to be able to do this!
Another one that annoys me (and is relevant to this post!) is not being able to specify that my type must be an enum, i.e. something like:
public void Argument<T>(T value) where T : enum
I want to be able to create a validation method that checks if an enum value is defined. But at the same time I don't want to have people pass in any value type they like. Whilst I can quite happily check at runtime that a type is an enum, it just feels like something I should be doing at compile time...
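For comparison, the runtime check I'd rather avoid looks something like the sketch below. `EnumChecks` and `RequireDefined` are illustrative names, not the library's API, and the messages are made up.

```csharp
using System;

// Sketch of the runtime fallback: T is constrained to struct (the closest
// the constraint gets here) and the enum check only happens when called.
public static class EnumChecks
{
    public static void RequireDefined<T>(T value, string parameterName) where T : struct
    {
        if (!typeof(T).IsEnum)
        {
            // Only discovered at runtime, which is the annoying part.
            throw new ArgumentException("Type parameter T must be an enum.", parameterName);
        }

        if (!Enum.IsDefined(typeof(T), value))
        {
            throw new ArgumentOutOfRangeException(parameterName, value, "Value must be a defined enum value.");
        }
    }
}
```

So `EnumChecks.RequireDefined(5, "number")` compiles happily and only blows up when it runs. Note that `Enum.IsDefined` treats combined flag values as undefined unless the combination itself is a named member.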
Of course what makes this really annoying is that you can specify an enum as a type constraint in IL! Whilst I'm quite happy for C# to restrict me in some ways (e.g. in IL you can throw an object of any type whereas C# restricts it to Exception and sub-classes) this one I'm not happy about. Mainly because it affects me personally, I'll admit. But is that not a good enough reason?
Guess I could rewrite the thing in a language that does allow it. Nah. I like C#. I could write my own language that is an extension of C#. Whilst I'd like to do that one day I think it might be overkill for this... What if I decompiled the assembly the C# compiler generated and changed the offending line? That might work... C# would have to respect the type constraint even if it cannot be expressed in C#. Let's try that...
Turns out it works too. Almost. Turns out I bump into my first generic annoyance, that of methods with the same signature but different type constraints. So I cannot do the following pseudo-C#:
public class Validate
{
public void Argument<T>(T argument, string parameterName) where T : class...
public void Argument<T>(T argument, string parameterName) where T : enum...
}
Guess I'll just have to live with that one and create an EnumArgument method on my Validate class. So what IL do I need to rewrite? Well firstly I'll have to rewrite the initial generic method on my Validate class. Luckily IL method signatures are fairly readable. The C# method signature:
public static IEnum<T> EnumArgument<T>(T argument, string parameterName) where T : struct
Becomes:
.method public hidebysig static class KWatkins.Validation.Argument.IEnum`1<!!T>
EnumArgument<valuetype .ctor ([mscorlib]System.ValueType) T>(!!T argument, string parameterName) cil managed
The type constraints are quite complex compared to C#. The type has to be a value type, has to have a parameterless constructor and has to inherit from System.ValueType. C# hides the last two from us, as in C# we can infer that if something is a value type then it must have a parameterless constructor and inherit from System.ValueType. Anyway, it's simple enough to change this to have an enum type constraint:
.method public hidebysig static class KWatkins.Validation.Argument.IEnum`1<!!T>
EnumArgument<valuetype .ctor ([mscorlib]System.Enum) T>(!!T argument, string parameterName) cil managed
Done. Change ValueType to Enum and walk away. Nice and simple. I also have to rewrite the IL on the class that actually does the validation but that's basically the same replacement as above.
How do I rewrite the IL though? Luckily .NET comes with a tool called ildasm that disassembles any given assembly into IL. And then Visual Studio comes with a tool called ilasm that assembles IL back into an assembly. Therefore I can create a command line application that runs as a post build command for my validation assembly. It can call ildasm to disassemble the assembly, do some crude string replacement on the IL and then pass it back into ilasm to turn the thing back into an assembly.
And that's that. I now have a Validate.EnumArgument<T> method that will only accept an enum for its type parameter. And that's pretty much it for the vaguely interesting stuff in my argument validation library too. So I'll wrap things up for version one of the library in the next post.
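A rough sketch of what such a post build tool could look like is below. It assumes ildasm.exe and ilasm.exe are on the PATH and that the assembly is a DLL; `IlRewriter` is a made-up name, and the sketch ignores things a real tool would have to handle, such as embedded resources and strong-name signing.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Sketch of a post-build IL rewrite; assumes ildasm.exe and ilasm.exe
// can be found on the PATH. Not the actual tool used for the library.
public static class IlRewriter
{
    // The crude string replacement: swap the ValueType constraint for Enum.
    // Note this hits every "where T : struct" in the assembly, so a real
    // tool would want to be pickier about which methods it touches.
    public static string ConstrainToEnum(string il)
    {
        return il.Replace(
            "<valuetype .ctor ([mscorlib]System.ValueType) T>",
            "<valuetype .ctor ([mscorlib]System.Enum) T>");
    }

    public static void Rewrite(string assemblyPath)
    {
        string ilPath = Path.ChangeExtension(assemblyPath, ".il");

        // Disassemble, patch the text, then reassemble over the original.
        Run("ildasm.exe", "\"" + assemblyPath + "\" /out=\"" + ilPath + "\"");
        File.WriteAllText(ilPath, ConstrainToEnum(File.ReadAllText(ilPath)));
        Run("ilasm.exe", "\"" + ilPath + "\" /dll /output=\"" + assemblyPath + "\"");
    }

    private static void Run(string tool, string arguments)
    {
        var startInfo = new ProcessStartInfo(tool, arguments) { UseShellExecute = false };
        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException(tool + " exited with code " + process.ExitCode + ".");
            }
        }
    }
}
```

The post build command would then just call `IlRewriter.Rewrite` with the path of the freshly built assembly.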
In parts one and two I discussed the approach I'm going to take to create a fluent argument validation library. Now the implementation.
Well it turns out most of it is pretty straightforward. And a little boring. There are a couple of things that warrant comment, the first being numbers. There are a lot of number types in .NET. There are precise types: byte, three sizes of int and decimal. Plus of course their evil partners the unsigned ints and the signed byte. And then on top of that you have imprecise types, the single and double. For the sake of completeness I'm going to add validation methods for all of them. Which means a lot of code to write; numeric types don't have a common interface we could exploit, or anything like type classes we can operate on. How can we be lazy and write less code?
By auto-generating it of course. One of the best kept secrets in Visual Studio is the Text Template Transformation Toolkit, or just T4 for short. T4 is a code generator that builds code from templates. I first heard about it in a blog post by Scott Hanselman a while ago and it's nice to finally have a chance to use it!
But before we get generating, what sort of interfaces/methods will we need? Basic range checking, greater than, less than, etc., will be needed for all number types. I find myself having to validate numbers are positive a fair bit so I'm going to add quick methods to check for positive and negative. These are only necessary for the signed types; there is no point in checking to see if a uint is negative because it cannot be. It sounds like we immediately have two interfaces, one for signed types and one for unsigned. Both contain greater/less methods but the signed one also contains positive/negative methods.
As discussed in part one, one of the problems I had with some of the other fluent interfaces available is that they allow you to make nonsensical and/or pointless calls, e.g.:
public static void SomeMethod(int number)
{
Validate.Argument(number, "number")
.IsPositive()
.IsNegative()
.IsNegative();
}
A number cannot be both positive and negative, and there is no point in checking a number is negative twice. Therefore my interface needs to return the interface for validating unsigned types after a call to positive or negative. (Even for singles and doubles! So if I talk about unsigned doubles I am making sense. Honest.) Ideally we would also like to stop calls such as:
public static void SomeOtherMethod(int number)
{
Validate.Argument(number, "number")
.IsGreaterThan(5)
.IsGreaterThan(3)
.IsGreaterThan(3);
}
There is no point in checking if something is over both 3 and 5 as just checking 5 would be enough. Similarly checking 3 twice is not necessary. Unlike the positive/negative case we cannot enforce this at compile time, so I'm going to have to let this one slide. (Although I will come back to this in part five.)
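The compile time enforcement for positive/negative can be sketched as below. The interface and class names are illustrative rather than the library's real ones, I've used int instead of generics to keep it short, and treating zero as neither positive nor negative is my assumption.

```csharp
using System;

// Illustrative names only; the real library's interfaces may differ.
public interface IUnsignedNumber
{
    IUnsignedNumber IsGreaterThan(int minimum);
}

public interface ISignedNumber : IUnsignedNumber
{
    // Returning the narrower interface drops IsPositive/IsNegative
    // from the rest of the chain.
    IUnsignedNumber IsPositive();
    IUnsignedNumber IsNegative();
}

// One object implements both interfaces and hands itself back each time.
public sealed class Int32Validation : ISignedNumber
{
    private readonly int argument;
    private readonly string parameterName;

    public Int32Validation(int argument, string parameterName)
    {
        this.argument = argument;
        this.parameterName = parameterName;
    }

    public IUnsignedNumber IsPositive()
    {
        return Check(argument > 0, "Value must be positive.");
    }

    public IUnsignedNumber IsNegative()
    {
        return Check(argument < 0, "Value must be negative.");
    }

    public IUnsignedNumber IsGreaterThan(int minimum)
    {
        return Check(argument > minimum, "Value must be greater than " + minimum + ".");
    }

    private IUnsignedNumber Check(bool valid, string message)
    {
        if (!valid)
        {
            throw new ArgumentOutOfRangeException(parameterName, argument, message);
        }
        return this;
    }
}
```

With this, `.IsPositive().IsNegative()` simply does not compile, because IsPositive returns IUnsignedNumber. Nothing stops `.IsGreaterThan(5).IsGreaterThan(3)` though, as IsGreaterThan has to stay available for range checks.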
The only other methods I'm going to implement are for the imprecise types only. These will be overloads of all the methods so far but with a tolerance to allow us to check two imprecise numbers follow the condition within a given amount.
Now how does T4 come into play? Well I can define generic interfaces for all the methods but I will need concrete classes for each numeric type. These concrete classes basically have the same implementation (operator comparisons such as >, <=, etc.) so instead of duplicating the same code over and over I'll create a template that generates the methods for each numeric type. If we can define the various properties of our numeric types (e.g. signed/unsigned, precise/imprecise, etc.) then our template can query those definitions to work out what interfaces and methods need to be implemented.
Definitions of the numeric types could come in handy in future templates. Therefore I defined the details of the numeric types in a file called NumericTypeDefinitions.ttinclude, which looks something like:
<#
var numericTypes = new []
{
new
{
Name = "Byte",
Zero = "0",
CLSCompliant = true,
Signed = false,
Precise = true
},
new
{
Name = "Decimal",
Zero = "0m",
CLSCompliant = true,
Signed = true,
Precise = true
},
...
Sadly the syntax highlighting is mine; rather annoyingly VS2008 doesn't highlight .tt or .ttinclude files... We can include this file in our actual T4 template. Basic templates just have a blob of code. The include means our definitions will be added to this blob. (You can create custom generator classes for more complex templates) Our T4 file will look something like this: (the ASP.NETters amongst you will probably recognise some of the syntax)
<#@ template language="C#v3.5" #>
<#@ output extension="cs" #>
<#@ assembly name="System.Core.dll" #>
<#@ import namespace="System.Linq" #>
<#@ include file="NumericTypeDefinitions.ttinclude" #>
using System;
using KWatkins.Validation.Argument;
namespace KWatkins.Validation
{
public static partial class Validate
{
<#
// Partial part of Validate first.
foreach(var type in numericTypes.Where(t => t.Precise))
{
if (!type.CLSCompliant)
{
#>
[CLSCompliant(false)]
<#
}
#>
public static <#= GetInterfaceType(type.Signed, type.Name) #> Argument(<#= type.Name #> argument, string parameterName)
{
return new <#= type.Name #>Validation(argument, parameterName);
}
<#
}
#>
}
}
...
What's happening here? Well we're basically adding the method Argument(T argument, string parameterName) to our Validate class. The directives at the start of the template define what we need for the code in our template, e.g. the language our template uses, the assemblies and namespaces we want to use, etc. Then the template starts; the text in grey is the text that will appear in the generated .cs file. In the snippet above I am looping through all the precise numeric types. If they are not CLS compliant then I add a CLSCompliantAttribute to the method. I then spit out the method definition, which returns a validation class of the appropriate type, cast to an interface of the appropriate type. I'll spare you the rest of the details; you can have a look at the source when I've finished.
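If you don't have T4 to hand, the loop can be mimicked in plain C# to see roughly what gets emitted. This is only an approximation: `TemplateSketch` is a made-up name, the `GetInterfaceType` body is a guess (the template's real helper isn't shown), `ISignedPreciseNumber` is an assumed interface name, and the type list is cut down to three entries.

```csharp
using System;
using System.Linq;
using System.Text;

// Plain C# imitation of the template loop above; not the real template.
public static class TemplateSketch
{
    // Stand-in for the template's helper. The interface names are assumptions.
    private static string GetInterfaceType(bool signed, string name)
    {
        return (signed ? "ISignedPreciseNumber" : "IUnsignedPreciseNumber") + "<" + name + ">";
    }

    public static string Generate()
    {
        var numericTypes = new[]
        {
            new { Name = "Byte", CLSCompliant = true, Signed = false, Precise = true },
            new { Name = "SByte", CLSCompliant = false, Signed = true, Precise = true },
            new { Name = "Double", CLSCompliant = true, Signed = true, Precise = false }
        };

        var output = new StringBuilder();
        foreach (var type in numericTypes.Where(t => t.Precise))
        {
            if (!type.CLSCompliant)
            {
                output.AppendLine("[CLSCompliant(false)]");
            }

            output.AppendLine("public static " + GetInterfaceType(type.Signed, type.Name)
                + " Argument(" + type.Name + " argument, string parameterName)");
            output.AppendLine("{");
            output.AppendLine("    return new " + type.Name + "Validation(argument, parameterName);");
            output.AppendLine("}");
        }

        return output.ToString();
    }
}
```

`TemplateSketch.Generate()` returns one Argument overload each for Byte and SByte, with the SByte one marked `[CLSCompliant(false)]`, and skips Double because it isn't precise.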
Now the tests! How on Earth are we gonna test generated code? We could I suppose test the generator itself, i.e. test the output created by our T4 template is correct. But this doesn't actually test our final validation code! We could write a test for each generated method. But that would take ages besides being a bit rubbish. We could auto-generate our tests using a T4 template. Not a bad idea, but I personally prefer to have my test code as different to my code being tested as possible. So how to avoid writing a thousand separate tests for all our generated methods?
Use NUnit 2.5's excellent new generic test fixtures! NUnit 2.5 allows you to use generic classes for your test fixtures. You can then specify multiple TestFixtureAttributes, each with a different generic type. The fixture will be executed multiple times, once for each TestFixtureAttribute. Add a splash of reflection to create our T4 generated validation class and we can write test fixtures like this:
[TestFixture(typeof(Byte))]
[TestFixture(typeof(UInt16))]
[TestFixture(typeof(UInt32))]
[TestFixture(typeof(UInt64))]
public class UnsignedPreciseNumberTests<TNumber> : NumberTests<TNumber>
where TNumber : struct
{
[TestCase("5", "TestParameterName", "10",
ExpectedException = typeof(ArgumentOutOfRangeException),
ExpectedMessage = "Value must be greater than 10.\r\nParameter name: TestParameterName\r\nActual value was 5.")]
[TestCase("5", "TestParameterName", "5",
ExpectedException = typeof(ArgumentOutOfRangeException),
ExpectedMessage = "Value must be greater than 5.\r\nParameter name: TestParameterName\r\nActual value was 5.")]
[TestCase("5", "TestParameterName", "0")]
public void IsGreaterThan(string argument, string parameterName, string minimum)
{
IUnsignedPreciseNumber<TNumber> validator = (IUnsignedPreciseNumber<TNumber>)CreateValidationType(argument, parameterName);
validator.IsGreaterThan(ConvertFromString(minimum));
}
...
Here we are testing the IsGreaterThan method for the unsigned, precise number types. The test method gets an instance of the validation type using CreateValidationType, which does some reflective magic to create the correct type. It then calls the method using the values passed into the test via NUnit 2.5's TestCaseAttribute, which allows you to run the same test multiple times with different values.
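The reflective magic could look something like the sketch below. It is an assumption about the shape rather than the real base class: it relies on the generated classes being named after the numeric type plus "Validation" with a public (argument, parameterName) constructor, and `ByteValidation` here is just a stand-in so the example hangs together.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Stand-in for one of the T4 generated validation classes.
public sealed class ByteValidation
{
    public ByteValidation(byte argument, string parameterName)
    {
        Argument = argument;
        ParameterName = parameterName;
    }

    public byte Argument { get; private set; }
    public string ParameterName { get; private set; }
}

// Sketch of the shared fixture base; the real one may well differ.
public class NumberTests<TNumber> where TNumber : struct
{
    // Test case values arrive as strings so one attribute works for every type.
    public TNumber ConvertFromString(string value)
    {
        return (TNumber)Convert.ChangeType(value, typeof(TNumber), CultureInfo.InvariantCulture);
    }

    // Finds "<TypeName>Validation" by convention and constructs it.
    public object CreateValidationType(string argument, string parameterName)
    {
        string typeName = typeof(TNumber).Name + "Validation";
        Type validationType = typeof(ByteValidation).Assembly
            .GetTypes()
            .Single(t => t.Name == typeName);

        return Activator.CreateInstance(validationType, ConvertFromString(argument), parameterName);
    }
}
```

Passing strings into the test cases is what lets the same TestCaseAttribute values serve Byte, UInt16, UInt32 and UInt64 alike.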
So that is the numbers done. I urge you to have a play with T4 and NUnit 2.5; both are fantastic tools to have in your arsenal. In the next post I will get irritated with enumerations...
As covered in part one I have decided to create a fluent library for validating arguments as I wasn't happy with ones currently available. But what exactly do I want?
.Check() or .Validate() method)
Which gives a syntax something like this:
public static void SomeMethod(string stringValue, int intValue)
{
Validate.Argument(stringValue, "stringValue")
.IsNotNull()
.IsNotEmpty();
Validate.Argument(intValue, "intValue")
.IsPositive()
.IsLessThanOrEqualTo(5);
}
Why have I created a class called Validate with a method called Argument rather than a class called ValidateArgument? This is so I can expand the library to validate other things apart from arguments, all from the same starting point. For example I could expand it to do things like:
public static void SomeOtherMethod<TEnum>(TEnum enumValue, string directory) where TEnum : struct
{
Validate.TypeParameter(typeof(TEnum), "TEnum")
.IsEnum();
Validate.Directory(directory)
.Exists()
.IsEmpty();
}
So, how am I going to actually code this thing? Well if I'm going to chain these validations together then I will need to store the argument and parameter name somewhere. There are two obvious ways to do this:
IsNotNull() in a row for example. Therefore we are either going to need to create a new object when the methods in the chain change or create one object that implements multiple interfaces explicitly and then cast it to a different interface when the methods in the chain change. And should our objects be structs or classes?
Which way is best? Are any methods intrinsically worse for some reason? Performance is a good reason. Whilst I'm a firm believer that you should get your application working and tune performance afterwards (slow code is better than no code!) I also think prototyping is a good idea. If we can do a simple test at this stage then we might well find out that one approach is fundamentally slower than the others.
So I've knocked together a simple application to test each method and see which is quickest; feel free to play with the source code. It measures how long it takes to validate that a string argument is not null and not empty 25,000,000 times, compared against two not so fluent methods. On my PC the results are: (times in seconds)
| Method | Description | Debug | Release |
|---|---|---|---|
| Not as fluent, single method | Does both validations in a single call, i.e. IsNotNullOrEmpty(argument, parameterName); | 1.74 | 0.39 |
| Not as fluent, separate methods | Does one method call for each validation, passing in argument and parameter name to each method. | 2.66 | 0.69 |
| Create multiple classes | Creates a new class each time the methods in the chain change. | 6.05 | 2.47 |
| Create single class | Creates a single class and casts it to a new interface each time the methods in the chain change. | 3.00 | 1.00 |
| Create multiple structs | Creates a new struct each time the methods in the chain change. | 3.70 | 1.75 |
| Create single struct | Creates a single struct and casts it to a new interface each time the methods in the chain change. | 3.73 | 1.86 |
| ThreadStatic store | Stores the argument and parameter name in fields marked with the ThreadStaticAttribute; uses extension methods to build the chain. | 7.32 | 4.01 |
| ThreadLocal store | As above but uses a named data slot on the current thread to store the argument and parameter name. | 80.97 | 80.08 |
| ContextStatic store | Stores the argument and parameter name in fields marked with the ContextStaticAttribute. Not at all suitable for the job... It's here for no other reason than I'd not heard of this attribute until recently and wanted to see how it performed... | 50.25 | 51.93 |
So it looks like the single class approach wins. A few things to note though:
So I have an approach. In the next post I'll start to look at implementing it.
I like fluent interfaces. And I'm very pedantic about validating arguments. So I went looking for a fluent interface for validating arguments. And I found a few, but none were quite what I was looking for. So yup, I decided to write my own.
Before explaining how I did that though I would like to mention a couple of those and explain why they are not what I'm after; hopefully that will help to explain why I felt the need to write my own.
An excellent article on the Paint.NET blog gives details of one approach. Whilst taking care to optimise for the non-exceptional path (no objects are created unless validation fails) it didn't feel fluent enough for me. An example of the interface given is:
public static void Copy<T>(T[] dst, long dstOffset, T[] src, long srcOffset, long length)
{
Validate.Begin()
.IsNotNull(dst, "dst")
.IsNotNull(src, "src")
.Check()
.IsPositive(length)
.IsIndexInRange(dst, dstOffset, "dstOffset")
.IsIndexInRange(dst, dstOffset + length, "dstOffset + length")
.IsIndexInRange(src, srcOffset, "srcOffset")
.IsIndexInRange(src, srcOffset + length, "srcOffset + length")
.Check();
For me a fluent interface should allow you to specify something, and then let you chain a sequence of operations on that something. jQuery is an excellent example of a good fluent interface; you specify an element and then perform a chain of actions on that element. For argument validation I would expect us to specify an argument and then chain a sequence of validations on that argument. Which is why I don't feel this interface is fluent enough for me. It does not specify an argument at the start, but rather it specifies the arguments each time. Consider the following example of the above style:
public void DoSomething(int? argument)
{
// Check argument isn't null and is between 0 and 100.
Validate.Begin()
.IsNotNull(argument, "argument")
.IsGreaterThanOrEqualTo(argument, "argument", 0)
.IsLessThanOrEqualTo(argument, "argument", 100)
.Check();
Doesn't seem much point to me in chaining the methods if we keep having to specify the argument and parameter name each time.
Of course there are valid reasons for this; if we don't specify the argument and parameter name each time then we are going to have to store them somewhere and that means a tiny performance hit; you're gonna need to set a field or create an object or do something to store that information. For an application like Paint.NET that needs every ounce of performance that could well be a problem.
For my interface I'm willing to take a tiny performance hit to get the interface I want. If I have to create a short lived object or set a field then I'm not too fussed about that to be honest. I find there are a lot of people who would moan about such things. Sometimes for good reason. But more often they'll complain that I should use a byte instead of an int whilst at the same time think that boxing is a sport and an expression tree is a plant that smiles. If I've optimised every other area of my application and actually need to get a bit of extra performance then I'll be a happy man. And I'll just drop the argument check...
Roger Alsing has a good approach. Much more fluent; we only specify the argument once:
public static string ValidationFunc(int a, string b, DateTime c)
{
a.Require("a")
.IsGreaterThan(10);
b.Require("b")
.NotNull()
.NotEmpty()
.LongerThan(2)
.StartsWith("Ro");
I have two main problems with this approach:
a.Require()) so it can be used as a general validation method. But I would prefer to be explicit about the fact that we're validating an argument.
b.Require("b").NotNull().NotNull().NotNull().NotNull().NotNull().NotNull();
a.Require("a").IsGreaterThan(4).Require("b").NotNull();
.NET Junkie outlines a similar approach. The article talks about the relationship to Spec#, which gives some more info on why the interface is designed the way it is. I just want to validate my arguments, which are my pre-conditions, in a nice fluent way. As for post-conditions, my method code should be doing that!
Awesome but not released yet. 8o)
In the next post I'll start to plan out what I want from the fluent interface and start thinking about the best way to get it.