For the last 3 or 4 weeks I've been spending some time helping out on the NHibernate project, with the end goal of my efforts being a full LINQ implementation. When I started this, I was a bit of a newbie to the NHibernate source code (i.e., I'd never looked at it), so I hadn't quite understood the nature of the nettle that I was about to grasp.
As most of you know, NHibernate has a full-featured query language, HQL, which is the primary way of executing queries. I started with the somewhat naive idea that all I needed to do was create a transform from LINQ expression trees to whatever internal representation of HQL NHibernate had. Of course, I wasn't expecting the transformation to be trivial, but I've done a lot with compilers & optimizers in the past so I was quite looking forward to it.
Turns out that the method used within NHibernate (and, indeed, Hibernate, from which NHibernate was ported) to parse HQL wasn't the sort of traditional lex / yacc / [insert your favorite parsing tool here] / AST approach that I was expecting; instead, it was just a ton of handwritten code. Now, I'm not saying that the code is bad by any means, but parsing a language in this way tends to create a parser that is both fragile and hard to understand. It was clear almost straight away that trying to turn LINQ expressions into the set of classes / flags / relationships etc. that the hand-crafted parser built was going to be a real pain. In particular, no one would ever *really* trust that the translation worked for all scenarios.
Another approach would be to take the LINQ expression and turn it into an HQL string. That wasn't considered for very long :)
The only sane route forward, and one that the whole NHibernate developer community was very keen on, was to get a proper parse engine built for HQL, which would create a well-defined AST during its processing. Any other language (be it LINQ or something else) could then target that AST.
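To make the "everything targets one AST" idea a bit more concrete, here's a toy sketch in Python. To be clear, none of this is NHibernate or Hibernate code: the node classes (`Query`, `Comparison`, `Property`, `Literal`) and the functions `parse_hql`, `from_builder` and `to_sql` are all names I've made up for illustration, the "parser" is a single regex that only understands one query shape, and the SQL output ignores mapping entirely. The point is just that two different front ends produce the same tree, and the back end only ever sees the tree.

```python
import re
from dataclasses import dataclass


# Hypothetical AST node types; the real parser's tree is far richer.
@dataclass(frozen=True)
class Property:
    alias: str
    name: str


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Comparison:
    op: str
    left: Property
    right: Literal


@dataclass(frozen=True)
class Query:
    entity: str
    alias: str
    where: Comparison


# Front end 1: a toy text parser that accepts only
# "from <Entity> <alias> where <alias>.<prop> <op> <integer>".
_HQL = re.compile(
    r"from\s+(\w+)\s+(\w+)\s+where\s+(\w+)\.(\w+)\s*(>=|<=|>|<|=)\s*(\d+)\s*$"
)


def parse_hql(text: str) -> Query:
    match = _HQL.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported query: {text!r}")
    entity, alias, prop_alias, prop, op, number = match.groups()
    return Query(
        entity,
        alias,
        Comparison(op, Property(prop_alias, prop), Literal(int(number))),
    )


# Front end 2: a programmatic builder, standing in for whatever would
# walk a LINQ expression tree. It builds the same nodes without any text.
def from_builder(entity: str, alias: str, prop: str, op: str, value: int) -> Query:
    return Query(entity, alias, Comparison(op, Property(alias, prop), Literal(value)))


# Back end: consumes only the AST, so it doesn't care which front end made it.
# Real SQL generation would map entities and properties to tables and columns.
def to_sql(query: Query) -> str:
    cond = query.where
    return (
        f"SELECT * FROM {query.entity} {query.alias} "
        f"WHERE {cond.left.alias}.{cond.left.name} {cond.op} {cond.right.value}"
    )


if __name__ == "__main__":
    from_text = parse_hql("from Animal a where a.Legs > 7")
    from_code = from_builder("Animal", "a", "Legs", ">", 7)
    print(from_text == from_code)
    print(to_sql(from_text))
```

Because both front ends end up with an identical `Query` value, anything you test against the AST (here, `to_sql`) gets tested once, which is exactly why translating to an AST is easier to trust than translating to a pile of parser-internal flags.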
Luckily for me, the Java boys had already been here, and the latest Hibernate source has a rather splendid ANTLR-based parser. That made my life easier - ANTLR can emit C#, so it's just a case of porting the code. I made the decision right at the start to use ANTLR 3 rather than ANTLR 2, which is what Hibernate uses; 3 looked to have better C# support, and had a large number of improvements that made it look much nicer. That has made the port less than trivial, but at least it makes it more interesting :)
So, after all that history, where are we? At a pretty good place, even if I do say so myself. I've currently got about half of the Java parser ported over (there's around 62kloc in the Java parser, just to give an idea of scale), and right at the end of Friday there was a "WooHoo" moment when it finally went end-to-end against the database for a fairly simple query ("from Animal a where a.Legs > 7", if you're that interested). If you want to take a look, the current code is in the uNhAddins repository up on Google Code. Ayende certainly seems happy with progress so far :)
Tasks for the next couple of weeks are to get the remainder of the parser ported and to get a ton of unit tests in place so that we can get some confidence in the quality of the port. After that, I get to the real meat of the project and start on the LINQ stuff.
Watch this space for more updates...