September 2008 Entries
CompiledQuery and Enumerating Query Results

More fun with compiled queries, this time when processing the results.  Firstly, the simple non-compiled query:

using (DataClasses1DataContext context = new DataClasses1DataContext())
{
   var results = context.Employees.Where(e => e.EmployeeID == 1);
 
   Console.WriteLine("Number of employees: {0}", results.Count());
   Console.WriteLine("First ID: {0}", results.First().EmployeeID);
}

Simple stuff, and it works as you'd expect.  We get both the number of employees and the Id of the first one.  Note that under the covers, SQL gets executed twice.  Perhaps not quite what you'd expect ;)

Here's the compiled version:

var compiledQuery = CompiledQuery.Compile((DataClasses1DataContext context) => context.Employees.Where(e => e.EmployeeID == 1));
 
using (DataClasses1DataContext context = new DataClasses1DataContext())
{
   var results = compiledQuery(context);
 
   Console.WriteLine("Number of employees: {0}", results.Count());
   Console.WriteLine("First ID: {0}", results.First().EmployeeID);
}

 

This one doesn't do what you'd expect. On the second Console.WriteLine(), instead of executing a suitable query, you get an InvalidOperationException saying that "The query results cannot be enumerated more than once".  It's pretty clear what it means, and the fix is simple - make sure you only enumerate the results once, using something like the ToList() method:

var results = compiledQuery(context).ToList();

With this, you only hit the DB once, and you can then look at the list of results as much as you like. 

The difference between the non-compiled and the compiled query is down to how the two different queries are processed.  In the non-compiled version, there's an expression tree floating around, which is lazily evaluated when the results are enumerated.  Because LINQ to SQL has the expression tree available, it can generate different SQL on each hit, so results.Count() generates a "select count(*) ..." statement and results.First() generates a "select top 1 ..." statement.

For compiled queries, the SQL is determined at the point you call CompiledQuery.Compile().  When the results are enumerated the first time, this pre-prepared SQL is executed and the results processed.  Note that since the SQL is already built, the thing that you are doing with the results doesn't influence the SQL.  So the call to "results.Count()" will execute a select of the entire dataset, which will then get enumerated and counted in the client.

Since repeatedly issuing the same SQL is unlikely to be what you want your app to be doing, the designers of LINQ to SQL quite wisely throw an exception if you try to do so.  Instead, you need to stick in the explicit ToList() to make it clear that you understand the behaviour.
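To play with the "enumerate once" behaviour without a database, here's a minimal sketch.  The OneShot<T> class is a made-up stand-in of mine and is not how LINQ to SQL implements its results; it just mimics a sequence that refuses a second enumeration, and shows why buffering with ToList() makes the problem go away.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a compiled query result: it may be enumerated only once.
public sealed class OneShot<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _source;
    private bool _consumed;

    public OneShot(IEnumerable<T> source) { _source = source; }

    public IEnumerator<T> GetEnumerator()
    {
        if (_consumed)
            throw new InvalidOperationException("The query results cannot be enumerated more than once.");
        _consumed = true;
        return _source.GetEnumerator();
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class OneShotDemo
{
    public static void Main()
    {
        var results = new OneShot<int>(new[] { 1 });

        // First pass over the sequence: fine
        Console.WriteLine("Number of items: {0}", results.Count());

        try
        {
            // Second pass over the same sequence: throws
            Console.WriteLine("First item: {0}", results.First());
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message);
        }

        // Buffer once, then look at the list as often as you like
        List<int> buffered = new OneShot<int>(new[] { 1 }).ToList();
        Console.WriteLine("Number of items: {0}", buffered.Count);
        Console.WriteLine("First item: {0}", buffered.First());
    }
}
```

Both Count() and First() have to walk the sequence here, which is exactly what happens with the compiled query results.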

At first glance it seems a shame that you can't just swap normal queries & compiled queries but hopefully you can see that the semantics are quite different between the two, at which point having different client code which respects these differences is more acceptable.

Last point - before you start getting excited about the number of SQL queries that the non-compiled version may be executing, and start scattering ToList() calls everywhere, make sure that you understand the consequences.  For the code above, ToList() is going to be a good thing, but if you consider a scenario where the Where() clause doesn't identify a single entry but instead hits perhaps many thousands of rows, then it may not be so good.  The first time we look at the results, we just want the count, which gets mapped to COUNT(*) in SQL, and the second time the First() method gets mapped to a TOP 1 clause in the SQL.  Although it would mean two hits on the database, it would likely be far better to do that than to bring back thousands of records for processing in the client.

As with most abstraction layers, LINQ offers a lot of benefits but it cannot be used without considered thought as to what is happening underneath.  I thoroughly recommend that when testing your LINQ queries you have the SQL profiler running so that you can see what's going on.  Also, don't forget the DataContext.Log property which lets you dump the SQL out to a TextWriter.  Using this, it would be quite possible to check within your unit tests that the DB interaction is running the way you expect, and also to spot when changes cause unexpected interactions.

Passing Predicates into Compiled Queries

I've recently been looking at generating LINQ predicates on the fly in a mapping layer between a set of business domain entities and a set of related, but different, database entities.  One of the problems that I've encountered is to do with the way in which predicates are handled when using CompiledQueries.

To start off, let's consider the easy non-compiled version.  Here's a method:

// Get an employee by a predicate
static void GetEmployee(Expression<Func<Employee, bool>> predicate)
{
   // Just perform the select, and output the results
   using (DataClasses1DataContext context = new DataClasses1DataContext())
   {
      var results = context.Employees.Where(predicate);
 
      Console.WriteLine("Number of employees: {0}", results.Count());
   }
}

The usage of this is nice and simple, and the sort of thing in LINQ examples all over the web:

GetEmployee(e => e.EmployeeID == 1);

Calling this does exactly what you'd expect.  My next step was to look at how this approach could be used with compiled queries.  I started with a simple method:

// Get an employee by a predicate, using a compiled expression
static void GetEmployeeCompiled(Expression<Func<Employee, bool>> predicate)
{
   // Compile the query
   var compiledQuery =
      CompiledQuery.Compile((DataClasses1DataContext context) => context.Employees.Where(predicate));
 
   // and using the compiled query, output the results.  This crashes :(
   using (DataClasses1DataContext context = new DataClasses1DataContext())
   {
      var results = compiledQuery(context);
 
      Console.WriteLine("Number of employees: {0}", results.Count());
   }
}

Obviously, this is pointless since it just recompiles the query every time.  But let's ignore that small fact - it should, after all, still work.  Alas, it doesn't. 

At the point where the results are enumerated, it explodes with a "NotSupportedException".  Specifically, it fails due to an "Unsupported overload used for query operator 'Where'.".  Looking at the expression tree that is being compiled, with some help from Reflector to see what LINQ is doing under the covers, it can be seen that the issue is down to how the predicate is included in the final query expression.

Remember, the compiler is not generating executable code here, it is just building a lambda expression.  When it sees the parameter to the Where() method, it has little choice but to "lift" this variable into its own class, and it is a member of this lifted class that is passed as a parameter to the Where() method.  Although the non-compiled version handles this just fine, it causes the CompiledQuery object to barf.  This is just the same as any other query that uses a local variable or parameter.
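You can see the lifting for yourself without going anywhere near LINQ to SQL.  The sketch below uses names of my own (ClosureDemo, Build, threshold) and a plain int rather than a predicate, but the mechanism is the same: the captured parameter shows up in the tree as a member access on a constant, where the constant is an instance of a compiler-generated class.

```csharp
using System;
using System.Linq.Expressions;

public static class ClosureDemo
{
    // "threshold" is captured by the lambda, so the compiler lifts it into a generated class
    public static Expression<Func<int, bool>> Build(int threshold)
    {
        return n => n > threshold;
    }

    // Dig out the right-hand side of the comparison, which is the lifted variable
    public static MemberExpression CapturedOperand(Expression<Func<int, bool>> lambda)
    {
        var comparison = (BinaryExpression)lambda.Body;
        return (MemberExpression)comparison.Right;
    }

    public static void Main()
    {
        MemberExpression captured = CapturedOperand(Build(5));

        // The member carries the name of the original parameter
        Console.WriteLine("Member: {0}", captured.Member.Name);

        // ...and it hangs off a constant holding the compiler-generated closure object
        Console.WriteLine("Owner node: {0}", captured.Expression.NodeType);
        Console.WriteLine("Owner type: {0}", captured.Expression.Type.Name);
    }
}
```

In the compiled query case the lifted member is the predicate itself, so Where() receives a member access where a quoted lambda would normally be.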

I've experimented with a number of ways of constructing the query that I'm trying to compile, but all ultimately end up with the same problem.  The solution I've found is a little nasty, but it does work.  If I've missed a cleaner way, then I'd love to hear about it!

Anyhow, the solution.  It is based on the fact that it is the act of "passing" the predicate into the Where() method that is the problem.  So the solution is to not pass in the predicate, but instead pass in some dummy predicate.  Then do some expression tree walking to swap out the dummy predicate for the real one.  The code looks like this:

// Get an employee by a predicate, using a compiled expression
static void GetEmployeeCompiled2(Expression<Func<Employee, bool>> predicate)
{
   // Setup the required query, using a dummy predicate (c => true)
   Expression<Func<DataClasses1DataContext, IEnumerable<Employee>>> compiledExpression =
      context => context.Employees.Where(c => true);
 
   // Dig out the dummy predicate from the expression tree created above
   Expression template = ((UnaryExpression)((MethodCallExpression)(compiledExpression.Body)).Arguments[1]).Operand;
 
   // Swap out the template for the predicate
   compiledExpression = (Expression<Func<DataClasses1DataContext, IEnumerable<Employee>>>) 
                                 ExpressionRewriter.Replace(compiledExpression, template, predicate);
 
   // Compile the query
   var compiledQuery = CompiledQuery.Compile(compiledExpression);
 
   // and using the compiled query, output the results.  This works :)
   using (DataClasses1DataContext context = new DataClasses1DataContext())
   {
      var results = compiledQuery(context);
 
      Console.WriteLine("Number of employees: {0}", results.Count());
   }
}

So the required query is itself stored as an expression, with a dummy predicate (c => true) used to get the correct "shape" of tree.  This predicate is then located and the expression tree is rewritten, swapping out the dummy predicate for the real one.  This new query expression then compiles and executes just fine.

For completeness, the ExpressionRewriter class is defined as:

class ExpressionRewriter : ExpressionVisitor
{
   static public Expression Replace(Expression tree, Expression toReplace, Expression replaceWith)
   {
      ExpressionRewriter rewriter = new ExpressionRewriter(toReplace, replaceWith);
 
      return rewriter.Visit(tree);
   }
 
   private readonly Expression _toReplace;
   private readonly Expression _replaceWith;
 
   private ExpressionRewriter(Expression toReplace, Expression replaceWith)
   {
      _toReplace = toReplace;
      _replaceWith = replaceWith;
   }
 
   protected override Expression Visit(Expression exp)
   {
      if (exp == _toReplace)
      {
         return _replaceWith;
      }
      return base.Visit(exp);
   }
}

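where the ExpressionVisitor base class can be found on MSDN.  From .NET 4 onwards an equivalent ExpressionVisitor ships in System.Linq.Expressions, although its Visit() method is public rather than protected.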
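If you want to try the swap trick without a database, the following is a minimal sketch under a few assumptions of mine: it uses the ExpressionVisitor built into .NET 4 and later, an in-memory IQueryable<int> in place of a LINQ to SQL table, and Expression.Compile() in place of CompiledQuery.Compile().  The NodeSwapper and SwapDemo names are made up for the illustration.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Replaces one specific node (matched by reference) with another
public sealed class NodeSwapper : ExpressionVisitor
{
    private readonly Expression _toReplace;
    private readonly Expression _replaceWith;

    public NodeSwapper(Expression toReplace, Expression replaceWith)
    {
        _toReplace = toReplace;
        _replaceWith = replaceWith;
    }

    public override Expression Visit(Expression node)
    {
        return node == _toReplace ? _replaceWith : base.Visit(node);
    }
}

public static class SwapDemo
{
    public static Func<IQueryable<int>, IQueryable<int>> Build(Expression<Func<int, bool>> predicate)
    {
        // The required query, with a dummy predicate to get the right shape of tree
        Expression<Func<IQueryable<int>, IQueryable<int>>> query =
            source => source.Where(n => true);

        // Where(source, Quote(lambda)): the dummy lambda is the operand of the quote
        var call = (MethodCallExpression)query.Body;
        Expression dummy = ((UnaryExpression)call.Arguments[1]).Operand;

        // Swap the dummy for the real predicate, then compile to a delegate
        var rewritten = (Expression<Func<IQueryable<int>, IQueryable<int>>>)
            new NodeSwapper(dummy, predicate).Visit(query);

        return rewritten.Compile();
    }

    public static void Main()
    {
        var evens = Build(n => n % 2 == 0);
        int[] result = evens(new[] { 1, 2, 3, 4 }.AsQueryable()).ToArray();

        Console.WriteLine(string.Join(",", result));
    }
}
```

Because the rewritten tree contains the real predicate as a quoted lambda, it has the same shape as a query written by hand, with no lifted member for the query provider to trip over.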

Unsafe code without the Unsafe keyword

I've been playing around with some code lately that uses dynamic method generation fairly extensively.  In the course of doing so, I've written the odd dodgy bit of IL out.  Interestingly, a couple of times I got some very strange results when assigning fields from one object to another - specifically, if I got the types mismatched I just got garbage in the destination rather than some form of Cast exception (which I'd expect the runtime to generate during execution) or Verification exception (which I'd expect when I finally surface my generated method through a call to DynamicMethod.CreateDelegate()).

Finally had some time today to take a closer look, and the results are very interesting and not at all clear from the documentation.  Specifically, if you create a dynamic method using the following constructor:

public DynamicMethod(
    string name,
    Type returnType,
    Type[] parameterTypes,
    Module m
)

and pass in "Assembly.GetExecutingAssembly().ManifestModule" for the module, then it appears that all type safety within the generated code is turned off.  i.e., you can pretty much assign anything to anything.  The following code, for example, enables you to dump the memory address of any reference type:

/// <summary>
/// Return a method that gives the memory address of any object
/// </summary>
static Func<object, int> Get_GetAddress_Method()
{
   DynamicMethod d = new DynamicMethod("", typeof (int), new Type[] {typeof (Object)},
                                       Assembly.GetExecutingAssembly().ManifestModule);
 
   ILGenerator ilGen = d.GetILGenerator();
 
   ilGen.Emit(OpCodes.Ldarg_0); // Load arg_0 onto the stack (of type object)
   ilGen.Emit(OpCodes.Ret);     // And return - note that the return type is an int...
 
   return (Func<object, int>)d.CreateDelegate(typeof(Func<object, int>));
}

You can use this in the following way:

Func<object, int> getAddress = Get_GetAddress_Method();
const string greeting = "Hello";
 
// Get the address of the "Hello" string
int x = getAddress(greeting);

x now contains the memory address of the string "Hello".  So what?  Well, you can also write a method like this:

/// <summary>
/// Return a method that "maps" any type to a particular memory location
/// </summary>
static Func<int, T> Get_ObjectAtAddress_Method<T>()
{
   DynamicMethod d = new DynamicMethod("", typeof (T), new Type[] {typeof (int)},
                                       Assembly.GetExecutingAssembly().ManifestModule);
 
   ILGenerator ilGen = d.GetILGenerator();
 
   ilGen.Emit(OpCodes.Ldarg_0);  // Load arg_0 onto the stack (of type int)
   ilGen.Emit(OpCodes.Ret);      // And return - note that the return type is T
 
   return (Func<int, T>)d.CreateDelegate(typeof(Func<int, T>));
}

This chap lets you take any memory address, and "pretend" that an object of type T resides there.  So you can do something like this:

Func<int, byte[]> getData = Get_ObjectAtAddress_Method<byte[]>();
 
// Get a byte array on the same location
byte[] data = getData(x);

where x is a memory location that you've acquired previously.  It doesn't matter if the type that really resides at address x is a byte[] or not.  This basically lets you get access to the whole address space within your AppDomain (and possibly the whole Win32 process) and write whatever you like into it. 

This seems plain wrong to me - I haven't specified the "unsafe" keyword anywhere, nor is this code built with the "Allow unsafe code" box checked.  Without jumping through those hoops, I should not be able to write code like this.  I'll concede that this only works in a full trust environment, but it still smells like a very serious hole in the type safety of .NET.  Interestingly, if you use the DynamicMethod constructor that doesn't take a Module parameter, then everything works as you'd expect - you are politely served a VerificationException when you try to compile the method.  According to the docs, the constructor overload that takes a module is only supposed to allow access to internals of the specified module, not to skip type safety.  I wonder if the implementation of DynamicMethod in that scenario is flawed.

Below is a big lump of code - it compiles and shows the issue quite clearly.  I'd be interested in your views on whether this is a bug or "by design". If the latter, what exactly was the scenario that they were designing for?

using System;
using System.Reflection;
using System.Reflection.Emit;
using System.Text;
 
namespace ConsoleApplication1
{
   class Program
   {
      static void Main()
      {
         // Get some methods generated...
         Func<object, int> getAddress = Get_GetAddress_Method();
         Func<int, byte[]> getData = Get_ObjectAtAddress_Method<byte[]>();
 
         const string greeting = "Hello";
 
         // Print the greeting
         Console.WriteLine(greeting);
 
         // Get the address of the "Hello" string
         int x = getAddress(greeting);
 
         // Get a byte array on the same location
         byte[] data = getData(x);
 
         // Change some data...
         SetString("Bye!!", data);
 
         // And display the greeting again (remember, strings are immutable...)
         Console.WriteLine(greeting);
 
         // And just to show it against other bits of the framework...
         Console.WriteLine(Assembly.GetExecutingAssembly().FullName);
 
         SetString("Hacked!", getData(getAddress(Assembly.GetExecutingAssembly().FullName)));
 
         Console.WriteLine(Assembly.GetExecutingAssembly().FullName);
      }
 
      /// <summary>
      /// Return a method that gives the memory address of any object
      /// </summary>
      static Func<object, int> Get_GetAddress_Method()
      {
         DynamicMethod d = new DynamicMethod("", typeof (int), new Type[] {typeof (Object)},
                                             Assembly.GetExecutingAssembly().ManifestModule);
 
         ILGenerator ilGen = d.GetILGenerator();
 
         ilGen.Emit(OpCodes.Ldarg_0); // Load arg_0 onto the stack (of type object)
         ilGen.Emit(OpCodes.Ret);     // And return - note that the return type is an int...
 
         return (Func<object, int>)d.CreateDelegate(typeof(Func<object, int>));
      }
 
      /// <summary>
      /// Return a method that "maps" any type to a particular memory location
      /// </summary>
      static Func<int, T> Get_ObjectAtAddress_Method<T>()
      {
         DynamicMethod d = new DynamicMethod("", typeof (T), new Type[] {typeof (int)},
                                             Assembly.GetExecutingAssembly().ManifestModule);
 
         ILGenerator ilGen = d.GetILGenerator();
 
         ilGen.Emit(OpCodes.Ldarg_0);  // Load arg_0 onto the stack (of type int)
         ilGen.Emit(OpCodes.Ret);      // And return - note that the return type is T
 
         return (Func<int, T>)d.CreateDelegate(typeof(Func<int, T>));
      }
 
      /// <summary>
      /// Little helper method to copy a string into a byte[]
      /// </summary>
      static void SetString(string requiredString, byte[] dest)
      {
         UnicodeEncoding encoder = new UnicodeEncoding();
         byte[] requiredBytes = encoder.GetBytes(requiredString);
 
         // Need to do the copy by hand, since Array.Copy bleats
         // about the dimensions of the destination.  No surprise really,
         // since the destination isn't really an array...
         for (int i = 0; i < requiredBytes.Length; i++)
         {
            dest[i] = requiredBytes[i];
         }
      }
   }
}

Debugging Services

We all know the problem - to debug a program (particularly if it's the startup procedures that you need to look at), you simply load the solution in Visual Studio and hit F5.  Except if it's a service.  With a service, you can't just run it but instead it needs to be launched via the Service Control Manager, which makes debugging its startup a real pain.

The solutions that I've used in the past have either been to have a command line option that lets the process launch as a regular process rather than being controlled by the SCM, or to have it sleep for a number of seconds when it is launched.  Either of these gives me a route to get a debugger attached before anything "interesting" happens.  But both of these also mean I've got code present that isn't going to be in the live system, and I much prefer to be debugging "the real thing" rather than some (albeit close) approximation.

But there's a third way, which I'd not heard of before - there is a registry setting for "Image File Execution Options" which, amongst other things, allows you to specify that when an app is launched (via CreateProcess(), which covers most scenarios), instead of just firing up the exe it instead runs the required debugger.  Nice :)

For more details, check out this blog on the subject.  There's also this entry which describes a few more of the settings that are available.

What's wrong with this code, part II

Following on from the previous post, here is the list of problems with the code snippet, as identified by the original author.

  1. Because the method uses the yield keyword, the compiler performs some magic and turns it into an enumerator.  Hence, nothing will actually get executed until someone calls MoveNext().  That includes the null check, which means that the exception won't be thrown until sometime later in the execution flow, where it may well be confusing.
  2. The condition on the while loop is probably more complex than necessary.  There are no prizes for trying to cram lots of code onto a single line - it won't run any faster and it's just harder to read.
  3. But the biggie is, as spotted by Dan, the fact that the code messes with the underlying stream.

The first of these is non-obvious, unless you know exactly what "yield" is doing under the covers - it's worth being aware of how things like that work, since their use does have an impact on how you should structure your code.
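The deferred null check is easy to demonstrate.  The sketch below is my own illustration rather than the original author's fix: it takes a TextReader as a plain parameter instead of being an extension method on StreamReader, and the method names are made up.  The usual cure is to split the method in two, so that the argument check runs straight away and only the loop lives in the iterator.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LinesDemo
{
    // Deferred: none of this body runs until the first MoveNext()
    public static IEnumerable<string> LazyLines(TextReader reader)
    {
        if (reader == null)
            throw new ArgumentNullException("reader");

        string line;
        while ((line = reader.ReadLine()) != null)
            yield return line;
    }

    // Eager: the check runs at the call, then the iterator does the work
    public static IEnumerable<string> EagerLines(TextReader reader)
    {
        if (reader == null)
            throw new ArgumentNullException("reader");

        return LinesIterator(reader);
    }

    private static IEnumerable<string> LinesIterator(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            yield return line;
    }

    public static void Main()
    {
        // No exception here, even though the argument is null
        IEnumerable<string> lazy = LazyLines(null);
        Console.WriteLine("LazyLines(null) returned without throwing");

        try
        {
            foreach (string line in lazy) { }
        }
        catch (ArgumentNullException)
        {
            Console.WriteLine("Lazy version threw during enumeration");
        }

        try
        {
            EagerLines(null);
        }
        catch (ArgumentNullException)
        {
            Console.WriteLine("Eager version threw at the call");
        }
    }
}
```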

The second is just good practice - you will only write the code once, but it will be read many times, so always write with readability in mind.

The third introduces all sorts of complications with regard to how a client would use this method.  The original article is here, and I encourage you to read it, since the author does a very good job of explaining why code like this smells so bad.

In particular, I think that careful consideration of the contract that you are expecting callers of your code to adhere to is very important - get that right (or at least, don't get it badly wrong), and the whole system is likely to benefit.

What's wrong with this code?

This is blatantly copied from another blog, but it's worth some consideration so I'm duplicating it here.  In this post, I'm just going to paste the code, and request answers to the "what's wrong with this" question - there are several things, so open the floodgates.

Of course, you can just cheat and google it, since I suspect pasting a lump of this code into google will give you the original article, but it kind of defeats the purpose of the exercise.  I'll blog the answers in a future blog, plus give a link to the source so that the original author gets the full credit.

public static class StreamReaderExtensions
{
    public static IEnumerable<string> Lines(this StreamReader reader)
    {
        if (reader == null)
            throw new ArgumentNullException("reader");
        reader.BaseStream.Seek(0, SeekOrigin.Begin);
        string line;
        while ((line = reader.ReadLine()) != null)
            yield return line;
    }
}

Naming Tests

Following on from Hadi's post, I found this recent blog on the same topic.  It says pretty much the same thing, and acts as a good reinforcement of the general point, which is to give your tests good names so that the next chap who looks at the code understands what the code being tested is supposed to do.  It's such a valuable addition to the project documentation.

Exactly which form you choose isn't really that important, providing that it is both descriptive and consistent.  What do you use for your test names?

More on the Chrome EULA

Google obviously read my blog* and caved in without a fight.  Clause 11 in the EULA has now been changed to:

11. Content license from you

11.1 You retain copyright and any other rights you already hold in Content which you submit, post or display on or through, the Services.

Much better.

* Ok, perhaps it wasn't just my blog that did it :)

Chrome EULA

There's an interesting clause in the EULA for Chrome:

11.1 You retain copyright and any other rights you already hold in Content which you submit, post or display on or through, the Services. By submitting, posting or displaying the content you give Google a perpetual, irrevocable, worldwide, royalty-free, and non-exclusive license to reproduce, adapt, modify, translate, publish, publicly perform, publicly display and distribute any Content which you submit, post or display on or through, the Services. This license is for the sole purpose of enabling Google to display, distribute and promote the Services and may be revoked for certain Services as defined in the Additional Terms of those Services.

where Services is defined as:

1.1 Your use of Google’s products, software, services and web sites (referred to collectively as the “Services” in this document and excluding any services provided to you by Google under a separate written agreement) is subject to the terms of a legal agreement between you and Google. “Google” means Google Inc., whose principal place of business is at 1600 Amphitheatre Parkway, Mountain View, CA 94043, United States. This document explains how the agreement is made up, and sets out some of the terms of that agreement

Chrome is, I believe, a Google product, and so falls into the definition of Services.  Hence, according to the EULA, Google can do pretty much anything with any information that you "submit, post or display".  I suspect that this is a mistake on their part, and that they've just cut'n'pasted a little too much from the EULAs for some of their other services.  It's certainly at odds with the privacy policy for Chrome.

However, if you're worried about such things, then I'd suggest that you don't use Chrome for anything sensitive.  Note that this licence only applies to the executable installation; if you download the source and build it yourself then you are covered by a regular Open Source licence.  It'll be interesting to see how long it takes them to re-word this...

U.S. Employer Identification Number (EIN)

Any non-US company that wishes to sell iPhone applications in the US via the AppStore needs to obtain an Employer Identification Number to complete the contract details with Apple.  The Apple site is not particularly helpful with regard to how this is achieved, and simply points you at the IRS Form SS-4 PDF.  You can fill this out and send it off, but my understanding is that if you do so it will be several weeks before you receive your EIN. 

Alternatively, try dialling this number: +1-215-516-6999 - I just did, and after a few minutes with a very helpful chap I received my EIN. Fast and easy, just how I like it :)

Google Chrome

No doubt this is old news already, but for those that haven't seen it there's a cartoon strip that describes the new Google browser here (although it's an odd format, it's actually quite a good read).  The browser itself is available for download here.  No Mac or Linux support yet, but supposedly that's on the way.

My first impressions are pretty good - it seems to be fairly snappy and correctly renders most of the pages that I use.  I like the UI - it's nicely uncluttered, and having the tabs right at the top seems to work well.

Pseudo-Predicates in Specifications

Following on from my previous blog entry in which I asserted that I don't normally bother just referencing other people's blogs, here's another reference :)

It's quite a long article, but well written and definitely makes its point.  To anyone writing specifications, and to anyone reading them (which I think covers just about everyone who's reading this!), it's well worth a read:

Tasty Beverages

It's talking about predicates in the context of security, but I think the lesson is actually broader than that - I don't think it's all that unusual to see pseudo-predicates in pretty much any form of specification, and the danger with them is that the human brain is pretty adept at filling in what it thinks is missing (hence, they can be hard to spot).  Alas, what the brain makes up isn't always what the author of the spec was thinking, leading to the wrong thing being developed.