Sunday, April 26, 2009

On implementing an HTTP abstraction

We all have our little itches that need scratching. This is a narrative on one particular itch that I scratched this past week: POE::Filter::SimpleHTTP.

HTTP. If the crisscrossing, interconnected high-speed networks across the world were roads, HTTP would be the unkempt, ball-cap-wearing truckers transporting our goods.

Now I had a need for such a transport. Specifically, I needed to support HTTP in a POE::Filter that could easily be inserted into POE::Filter::Stackable. There were other solutions out there, but ultimately none of them did what I wanted: POE::Filter::HTTPD only covered things from a server context and only HTTP/0.9, and POE::Component::Client::HTTP was too complex to use when I only needed a filter. (Sure, I could have ripped the filters out for use, but then the project would have a dependency on the whole distribution when I only wanted the filters.) So I guess it was time to write one.

Can't be too hard, right? I mean, the world goes 'round with web browsers and servers, right? So I pulled up the RFC and started reading. This is doable, I thought to myself. There are some quirks when it comes to Transfer-Encoding, but nothing I couldn't handle.

But I should back up a little bit and give a bit more context on what I am writing, what tools I am using, and what the ultimate goal is.

POE is a framework for creating event-driven cooperative multitasking programs in Perl. It is quite possibly one of the best things invented since motion-triggered hand sanitizer dispensers. POE is Perl's answer to other languages' event-driven frameworks (Twisted, for example, in the Python world). POE makes it trivial to write large, complex applications that touch a number of domains. No more dumb select loops. No more event dispatch tables (by hand, that is). No more wondering how to merge multiple Net::* modules, each with their own event loops. Take a look at the evolution of a POE server for what I mean.

POE also provides a number of abstractions that are sweet, sweet sugar. POE::Filter is one of those. POE::Filter defines a simple API for taking in data and churning out transformations, objects, and whatever else you can imagine. And once you have one of these awesome little contraptions, you can plug it into a POE::Wheel (which defines the common semantics of setting up I/O watchers, etc.), and out pops your filtered data.

The idea is to implement your protocol abstractions inside Filters, stack them (via POE::Filter::Stackable [which is a filter itself with the same API]), and plug them into the framework.
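To make the Filter API a little more concrete, here is a minimal sketch of a line-oriented filter. It is not code from POE::Filter::SimpleHTTP, and to stay self-contained it does not inherit from POE::Filter; the class name My::LineFilter is made up for illustration. It only mirrors the method names POE::Filter documents: get_one_start(), get_one(), and put().

```perl
use strict;
use warnings;

package My::LineFilter;    # hypothetical name, for illustration only

sub new {
    my ($class) = @_;
    return bless { buffer => '' }, $class;
}

# Accept an array ref of raw chunks from the wire and buffer them.
sub get_one_start {
    my ($self, $chunks) = @_;
    $self->{buffer} .= join '', @$chunks;
    return;
}

# Return an array ref holding zero or one complete records.
sub get_one {
    my ($self) = @_;
    if ($self->{buffer} =~ s/\A(.*?)\x0d\x0a//s) {
        return [$1];
    }
    return [];
}

# Turn outgoing records back into wire-ready chunks.
sub put {
    my ($self, $records) = @_;
    return [ map { "$_\x0d\x0a" } @$records ];
}

package main;

my $filter = My::LineFilter->new;

# The second line arrives split across two chunks, as it might off a socket.
$filter->get_one_start([ "GET / HTTP/1.1\x0d\x0aHo", "st: example.org\x0d\x0a" ]);

while (my @records = @{ $filter->get_one }) {
    print "record: $records[0]\n";
}
```

The point of the shape is that get_one() never blocks: it either has a whole record buffered or it returns an empty list and waits for more chunks.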

Okay, so POE is Mecca. What else am I using? Moose. It is a bit unorthodox at the moment, especially in the POE world, but the world view is slowly being adjusted through exploration. Moose is a postmodern object system for Perl 5 that takes the tedium out of writing object-oriented Perl. It borrows all the best features from Perl 6, CLOS (LISP), Smalltalk, Java, BETA, OCaml, Ruby and more, while still keeping true to its Perl 5 roots. It pretty much takes your bland oatmeal and turns it into mouth-melting awesomeness unbound by the universal awesomeness maximum. There are efforts [1, 2] right now to bring Moose to POE in a meaningful way to hide some of the inherent complexity in using POE. And I guess I am now part of the revolution.

And finally to make my trifecta of standing on the shoulders of giants complete, I rely on a few modules from LWP, such as HTTP::Status, HTTP::Request, and HTTP::Response. These are pretty much key, as it means I don't have to implement my own objects to represent HTTP constructs.

And ultimately, this HTTP filter will be only a small part of the greater whole of an XMLRPC client/server implementation. I have a good chunk of the other pieces complete (POE::Filter::XML, POE::Filter::XML::RPC, POE::Component::PubSub), and what was left was the HTTP transport.

So I cracked open the RFC and started digging in. Now, in the POE::Filter context, your only goal is to turn line noise into something meaningful. That means that any socket manipulation should be left for another layer of your application. That said, HTTP/1.1 defines some behavior for dealing with keep-alives, etc., which is beyond the scope of the Filter. And so I pretty much ignored it for the Filter implementation.

The RFC defines an army of BNF expressions that represent the textual formation of the protocol itself. BNF is pretty clean, even if you have to jump around a lot in the document to follow a chain of symbols down to the root symbol. I chose to convert, by hand, the BNF to Perl regular expressions. Talk about a great mental exercise. BNF as a grammar has a number of similarities to Perl regular expressions anyhow, so it wasn't much of a stretch, but there are enough quirks to make it interesting.

As a side note, someone needs to write an Augmented BNF to Perl compatible regular expression translator.

Just to give a taste of some of the regex involved, take a look at the Regex module. Essentially, I built up my tokens and combined them into larger expressions making judicious use of non-capturing groups, character generators, and character exclusions via function.
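For a flavor of the approach without opening the module, here is a rough sketch of building small tokens with qr// and composing them into larger expressions. The variable names are mine, not the ones in the Regex module, and the patterns cover only a few RFC 2616 productions (token, HTTP-Version, Status-Line, and a simplified message-header).

```perl
use strict;
use warnings;

# Terminals, written as small compiled regexes.
my $digit = qr/[0-9]/;
my $sp    = qr/\x20/;
my $crlf  = qr/\x0d\x0a/;

# token = 1*<any CHAR except CTLs or separators>
my $token = qr/[^\x00-\x20\x7f-\xff()<>\@,;:\\"\/\[\]?={}]+/;

# HTTP-Version = "HTTP" "/" 1*DIGIT "." 1*DIGIT
my $http_version = qr/HTTP\/$digit+\.$digit+/;

# Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
# Non-capturing group around $digit so the {3} quantifier applies to it.
my $status_line =
    qr/\A($http_version)$sp((?:$digit){3})$sp([^\x0d\x0a]*)$crlf\z/;

# message-header = field-name ":" [ field-value ]
# Simplified: no folded (multi-line) header values.
my $header = qr/\A($token):[\x20\x09]*(.*?)[\x20\x09]*\z/s;

if ("HTTP/1.1 200 OK\x0d\x0a" =~ $status_line) {
    print "version=$1 code=$2 reason=$3\n";
}

if ("Transfer-Encoding: chunked" =~ $header) {
    print "name=$1 value=$2\n";
}
```

Each qr// object interpolates into the next as a self-contained group, which is what makes the bottom-up composition workable.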

One special thing to note is that I took a shortcut when it came to URIs and made use of the Regexp::Common::URI module. Only I introduced a bug when I did. I failed to notice that the Request-Line could take not only an HTTP URI, but also an absolute path. Regexp::Common::URI didn't include this (which I think is a bug), and so I had to implement, specifically, absolute paths from the URI RFC.
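The shape of that fix can be sketched as an alternation in the Request-URI position. The two URI patterns below are deliberately loose stand-ins of my own (not Regexp::Common::URI and not a faithful rendering of the URI RFC), and the method is simplified to uppercase letters; the only point is that the Request-Line has to accept an absolute path as well as an absolute URI.

```perl
use strict;
use warnings;

# Loose approximations, for illustration only.
my $absolute_uri = qr/[A-Za-z][A-Za-z0-9+.\-]*:\/\/[^\x00-\x20]+/;
my $abs_path     = qr/\/[^\x00-\x20?#]*(?:\?[^\x00-\x20#]*)?/;

# Request-Line = Method SP Request-URI SP HTTP-Version CRLF
# Request-URI  = "*" | absoluteURI | abs_path   (authority form omitted)
my $request_line =
    qr/\A([A-Z]+)\x20(\*|$absolute_uri|$abs_path)\x20(HTTP\/[0-9]+\.[0-9]+)\x0d\x0a\z/;

for my $line (
    "POST /RPC2 HTTP/1.1",
    "POST http://example.org/RPC2 HTTP/1.1",
    "POST RPC2 HTTP/1.1",
) {
    my $ok = "$line\x0d\x0a" =~ $request_line;
    printf "%-40s %s\n", $line, $ok ? "matches ($2)" : "rejected";
}
```

Without the abs_path branch, the first and most common form of request would be the one that gets rejected.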

But once the regex is complete, the rest of the module is mostly stepping through chunks of data delineated by network newlines (\x0d\x0a) and keeping track of where we are in the parsing process. That is, until we get to the Transfer-Encoding header processing.

The RFC is helpful in that it provides a skeleton of pseudo-code for processing transfer encodings. And that was a great starting point. Something that my Filter does that others don't is handle compression in the transfer encoding, not just dechunkifying. Unfortunately, this filter is rather simple for this first version, and so the data isn't spooled to disk to avoid OOM exceptions, but that will likely happen down the road.
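As a rough idea of what the dechunkifying step involves, here is a small sketch that follows the spirit of the RFC's pseudo-code. It is not the Filter's implementation: the function name dechunk is hypothetical, it assumes the whole chunked body is already in memory, it discards trailer headers, and it does nothing about compression.

```perl
use strict;
use warnings;

# Decode a complete chunked body. Returns the entity body as a string,
# or nothing if the input is malformed or truncated. Call in scalar context.
sub dechunk {
    my ($raw) = @_;
    my $body = '';
    pos($raw) = 0;

    # chunk = chunk-size [ chunk-extension ] CRLF chunk-data CRLF
    while ($raw =~ /\G([0-9A-Fa-f]+)(?:;[^\x0d\x0a]*)?\x0d\x0a/gc) {
        my $size = hex $1;

        if ($size == 0) {
            # last-chunk: skip any trailer headers up to the blank line
            return $raw =~ /\G(?:[^\x0d\x0a]+\x0d\x0a)*\x0d\x0a/gc
                ? $body
                : undef;
        }

        my $start = pos $raw;
        return if length($raw) < $start + $size + 2;              # truncated
        return if substr($raw, $start + $size, 2) ne "\x0d\x0a";  # bad framing

        $body .= substr $raw, $start, $size;
        pos($raw) = $start + $size + 2;    # continue after the chunk's CRLF
    }

    return;    # never saw a valid last-chunk
}

my $wire = "4\x0d\x0aWiki\x0d\x0a5\x0d\x0apedia\x0d\x0a0\x0d\x0a\x0d\x0a";
print dechunk($wire), "\n";    # prints "Wikipedia"
```

Building the whole body in one string is exactly the memory exposure mentioned above; spooling to disk would replace the string append with a write to a temporary file.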

Once we have all of the content decompressed (if it was compressed in the transfer), we store it into the appropriate HTTP::Message subclass and send it on its way.

So where does Moose fit into all of this? The Filter is actually a Moose::Object, and it provides a number of attributes for configuration at runtime. Also, because it is Moosy, my constructor is generated automagically for me based on what attributes I have defined. No more argument processing. It rocks.

Now here is where it gets cool. Astute readers will have noticed that if I use this Filter unmodified in an XMLRPC stack with the XML parsing filter directly after this one, it won't work. Filter::XML is expecting raw XML, not an HTTP::Message object. So we will have to inject a shim to strip the content out of the message, right? Not with Moose. With Moose, in my stackable implementation, I can subclass SimpleHTTP and apply advice around get_one() and put() to do just that. I don't have to write a container class to call the parent filter or deal with calling SUPER methods that may not resolve correctly if someone else subclasses me or whatever. So many headaches disappear.
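Here is a minimal sketch of that idea, assuming Moose is installed. None of these classes are the real ones: My::HTTPFilter and My::Message are stand-ins for SimpleHTTP and an HTTP::Message object, and My::StackableHTTP is a hypothetical subclass. Only the `has`, `extends`, and `around` sugar is actual Moose.

```perl
use strict;
use warnings;

package My::Message;          # stand-in for an HTTP::Message object
use Moose;
has content => (is => 'ro', isa => 'Str', required => 1);

package My::HTTPFilter;       # stand-in for the real filter
use Moose;

# The constructor is generated from this attribute declaration.
has queue => (is => 'ro', isa => 'ArrayRef', default => sub { [] });

sub get_one_start {
    my ($self, $chunks) = @_;
    push @{ $self->queue }, @$chunks;
    return;
}

# Yields message objects, which is what the next filter cannot digest.
sub get_one {
    my ($self) = @_;
    my $raw = shift @{ $self->queue };
    return [] unless defined $raw;
    return [ My::Message->new(content => $raw) ];
}

sub put {
    my ($self, $messages) = @_;
    return [ map { $_->content } @$messages ];
}

package My::StackableHTTP;    # hypothetical subclass used inside the stack
use Moose;
extends 'My::HTTPFilter';

# Unwrap on the way in: hand raw content to the next filter.
around get_one => sub {
    my ($orig, $self, @args) = @_;
    my $messages = $self->$orig(@args);
    return [ map { $_->content } @$messages ];
};

# Wrap on the way out: the parent expects message objects.
around put => sub {
    my ($orig, $self, $payloads) = @_;
    return $self->$orig([ map { My::Message->new(content => $_) } @$payloads ]);
};

package main;

my $filter = My::StackableHTTP->new;
$filter->get_one_start(['<methodCall/>']);
print $filter->get_one->[0], "\n";
```

The parent class is untouched and the subclass contains no SUPER:: calls; the modifier receives the original method as `$orig` and decides what to do around it.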

But with great magic come some quirks. Mainly, Stackable has a horrible bug: it calls UNIVERSAL::isa as a function to determine that Filters subclass from POE::Filter. Well, in the Moose world, I can't get a magical inline constructor if my parent has a constructor defined. And POE::Filter defines a silly constructor. So my Filter doesn't inherit from POE::Filter, and Stackable's check rejects it. So what do we do?

First, install UNIVERSAL::isa from the CPAN and use it. Second, provide our own isa() implementation that says, yes, we are a proper POE::Filter subclass. Now all of our automagical goodness works as expected.
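The difference between the two call styles is easy to see in a few lines of plain Perl. This sketch does not load the UNIVERSAL::isa module from the CPAN or POE itself; My::MooseyFilter is a hypothetical class that overrides isa() the way described above, and the two print statements show why the function-style call is the problem.

```perl
use strict;
use warnings;

package My::MooseyFilter;    # hypothetical; does not inherit from POE::Filter

sub new { return bless {}, shift }

# Claim POE::Filter ancestry; defer to the default check for anything else.
sub isa {
    my ($self, $class) = @_;
    return 1 if $class eq 'POE::Filter';
    return $self->SUPER::isa($class);
}

package main;

my $filter = My::MooseyFilter->new;

# Method call: goes through our override.
print "method:   ", ($filter->isa('POE::Filter') ? "yes" : "no"), "\n";

# Function call, the style Stackable uses: skips the override entirely.
print "function: ", (UNIVERSAL::isa($filter, 'POE::Filter') ? "yes" : "no"), "\n";
```

With stock Perl the first line says yes and the second says no. Loading UNIVERSAL::isa from the CPAN is what makes the function-style call respect an overridden isa() as well.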

Throw in some tests that cover our error states, double check the attributes, and make sure our dechunkifying works, write some pod, and push to the CPAN.

And that was it. A new release is on the horizon largely to fix the POD (it's ugly, I know), and to add some more tests (can't have enough of those, right?).

You can follow development of POE::Filter::SimpleHTTP here.

UPDATE:

The second link on the Moose + POE effort was supposed to link to Evan Carroll's MooseyPoopoe on github (http://github.com/EvanCarroll/MooseyPoopoe/tree/master)
