Plugins all the way down

When I set out to build a new Open Source webmail program last year, one of the main things I wanted to accomplish was a robust plugin system. In the past, I designed functionality not unlike the way WordPress plugins work (which is not unlike the way Squirrelmail plugins work, my first exposure to the concept). I think the strength of a plugin system really helps grow an Open Source project. Would WordPress be such a dominant CMS if the plugin ecosystem was not so vast? During its heyday, Squirrelmail had a vibrant add-on community, and a lot of new users were drawn to the project because of it. Many core developers of that project started out writing plugins, including me.

I decided to take a different approach this time around. What if instead of having a core flow of execution that plugins can add to or alter, everything was a plugin? If all the work needed to process a request was modular, it would provide a way for sites to customize almost anything about the application without having to hack a core file or implement a crazy workaround. It would also make it possible to customize the app simply by including the plugins you want. I decided to build the application this way, and while it turned out a teeny-tiny bit complicated, the results are pretty cool if I do say so myself. And I do.

The system is composed of three distinct components. The “framework” which processes a page request and provides high level constructs for things like session management and page routing. “modules sets” that provide specific types of functionality, such as an IMAP module set that let’s you access IMAP E-mail accounts, and individual “modules” inside of sets, that do a small piece of work. The framework provides an execution environment for module sets. Module sets assign modules to request identifiers and define allowed input and output types. Finally, individual modules do the work of processing page input and building page output. Let’s pretend that was not complicated, because there’s more!

Module sets contain two types of modules, “handler” modules, and “output” modules. Handler modules have access to the session, site configuration, user configuration, and white-listed/sanitized input values. Output modules only have access to the data produced by handler modules (and other output modules). A module set combines handler modules that process and take action on user input, with output modules that format the response to be sent to the browser. The result is a one way data flow through module execution, with default (though configurable) immutability. It’s kind of like the principles of React, without the icky JavaScript part.

Module sets can override each other and replace module assignments done by other sets, or insert a module before or after a module assigned by another set. There is only one required set, the “core” modules. Sites can override core functionality in this set, but it must be included for the others to work. Of course, some or all of it could be replaced with something completely site specific, which means you could change the program into virtually any type of web application you want, and still get the benefit of the lightweight framework.

One of the best things about this system is it requires module sets to white-list and type-cast user input. Modules cannot access standard PHP super globals – they must use the provided data framework, because those super globals are mercilessly unset before modules execute. Modules also have easy output escaping and string translating functionality built-in, so there are no complicated procedures to safely output untrusted data. Modules are designed to have a single purpose, so they end up being concise, which makes auditing and testing easier. Modules use inheritance to access data they need, there is no need to resort to global scope (the application framework and all the included module sets don’t use any PHP globals).

Developing modules can be tricky, so there is a built-in “debug mode” that makes it a lot easier to catch common mistakes and see immediate results. In debug mode, module assignments are dynamically determined – you don’t have to rebuild the configuration file for a newly created module to fire. Assets like CSS and JavaScript includes are un-minified and separately requested – so troubleshooting problems from a browser developer console is a lot easier. Debug mode also enables a constant stream of information to the PHP error log about each page request. Of course there is also a “hello world” module set included in the source, with loads of comments and examples.

To date, I have built more than 15 module sets using this system. So far, I think it’s pretty neat. It’s complicated, but it provides a structured way to modify the program with a safer-than-most API while maintaining a concise separation of concerns. It’s definitely more complex than a lot of plugin systems out there, but also more powerful with an eye on overall performance. At the very least, I think we can all agree that “module set” sounds way cooler than “plugin”, so that’s a plus.

Cypht: New Open Source Webmail

I suck at product announcements, probably because I have no perceptible marketing skills. Usually I just whip out a list of technical details that even I can’t edit because I fall asleep by the third bullet point – A list that significantly limits the number of people who care one iota about what I’m blathering on about. What was I blathering on about? That’s right, sucky product announcements. Let’s pile another one on the heap, shall we?

This is the official wiz-bang super exciting announcement for a NEW shiny NEW Open Source webmail program called “Cypht” (pronounced “sift”). Did I mention it was NEW? New does not necessarily mean better you say? Well, I had not thought about that before deciding to emphasize it. Seriously however, in the world of Open Source webmail, new is actually exciting (and subjective I guess since I have been working on this for like a year already).

Will this software disrupt the E-mail paradigm while streamlining workflow to maximize interpersonal communication channels using a ground breaking application stack and development process? Nope. From a user perspective it’s like a lot of web-based E-mail programs you are already familiar with. From a developer perspective the code is experimental, with a focus on a smart overall design that offsets some of the downfalls of building a complex program in PHP.

Wait, don’t go just because I said PHP. It’s not like the other PHP based webmail programs I have worked on, but it’s also not just another vanilla web-based E-mail client. One of the core ideas and most interesting features is the concept of viewing aggregated E-mail from multiple sources, without actually forwarding the messages to a central account. Let me esplain:

I have a few Gmail accounts. An old Yahoo E-mail. Throw in a few other domains I own and I have about 5 or so accounts I want to keep up with (to varying degrees). I don’t want to auto-forward them all to a single account, I like keeping my work E-mail separate from my personal messages. So I said to myself,

“Self, why not build a webmail application that does that? One that could give self fast access to a list of all the unread messages in all self’s inboxes from the last 2 days. Or something like that. Oh and search and stuff.”

And so it came to pass, that said webmail was created and continues to be tinkered with. I have droned on endlessly about this software on this blog in the past, so if you are one of the people who came here for the “Linux on a Macbook Pro” post (which is why most people come here) and clicked on the wrong link, this probably sounds familiar.

Since this is not really new, not even new to this blog, why the awful product announcement? Is it just an excuse to rehash old post material until someone pays attention to your stupid program? Maybe, but I think we are getting a little off-track, and for the record its my blog so I will be the one asking the questions around here partner.

Did I mention I like lists of technical details? Here comes one!

  • Really small page sizes, like the entire page + JS + CSS + Ajax requests is less than the Google home page. With normal browser caching and HTML5 local session storage the data transferred is reduced to a stupifyingly small amount.
  • An extra emphasis on security and privacy throughout the application. This is an Open Source program meant to be safe and useful, not an ad generating machine mining your E-mail.
  • It’s not your dad’s webmail! Unless you are one of my daughters in which case it is your dad’s webmail.
  • Valid HTML5 pages with accessibility friendly markup and mobile views.
  • A module system that is like a plugin system on crank. It’s all friggen modules!
  • At this point in development, Cypht is a decent E-mail and news reader with limited outbound message support (in this context limited means don’t use it).
  • A huge following on Github of 1, combined with a large developer base of 1. This thing is taking off I tell you!
  • A website with tons of impressive sounding acronyms like IMAP, POP3, SMTP, HHVM, RSS, SSL, PHP, 2FA, PBKDF2, AES, and more!
  • Other stuff!

Still here? I’m as surprised as you are. This concludes today’s craptacular product announcement. This blog will now return to its irregular schedule of posts about video games or goats or whatever.

Fun With PHPUnit

What is the first word that pops into your head when you think about unit-testing? I’m guessing “Fun”, amiright? Ok maybe not fun. Unless repeatedly slamming your head against a brick wall is your idea of fun. Trying to build comprehensive tests for existing code is an exercise in patience. Like the Olympics of patience. Sometimes when I come up for air and wipe the blood from my head (and wall), I realize that while I was toiling away in the unit-test dungeon, I stumbled on something useful. Maybe it’s a feature I had not explored before, or a solution to a tricky situation. Maybe it’s something that will help inject a little fun into your PHPUnit experience.

Before I start throwing out random ideas, let me say that I really like PHPUnit: lots of knobs, good documentation, active development. PHP is the Yoga instructor of programming languages, but that flexibility means it’s easy to write poor quality code. When code and unit-tests don’t get along, the right answer is to send the code to bed without dinner, and if it’s really bad, ground it for the rest of the week. We can blame the code all we want, but sometimes there is no choice but to tweak the system to find a way to make it all work.

One of the biggest obstacles in testing is state. Things like global variables, static values or class instances can easily create a situation in which a test passes when run alone, but fails when run as a part of a suite. There are other types of states to consider aside from just the disastrous PHP global namespace. Sessions for example. Or the state of data in a test db or on disk. Tests that fail intermittently are hard to debug, and usually it’s due to an overlooked state issue.

Tests should be as insulated and independent as possible, and PHPUnit has a feature called “runInSeperateProcess” that forces each test to run in its own PHP process. Using this feature is not as simple as it sounds. If you have a bootstrap file defined in your phpunit.xml file, and it includes any code that defines a constant, you will get an error about redefining said constant in your tests. What gives? I thought each test runs in its own process? It does, but from what I can determine (using the throw-it-against-the-wall method), only the test code itself is run per-process. Assuming that poorly substantiated statement to be true, here is a pattern that does work with process separation.

Process Separation

First the phpunit.xml file, WITHOUT a bootstrap

<phpunit strict="true" colors="true">
    <testsuite name="my_awesome_tests">
<file>./my_awesome_tests.php</file>

Then the my_awesome_tests.php file, with the bootstrap included in the setUp() method and the @runInSeperateProcess and @preserveGlobalState annotations.

<?php

class My_Awesome_Tests extends PHPUnit_Framework_TestCase {

    public function setUp() {
        require 'bootstrap.php';
    }

    /**
     * @preserveGlobalState disabled
     * @runInSeparateProcess
     */
    public function my_test_one() {
    }
    /**
     * @preserveGlobalState disabled
     * @runInSeparateProcess
     */
    public function my_test_two() {
    }
}
?>

Finally, the bootstrap looks something like this

<?php

/* all the things */
error_reporting(E_ALL | E_STRICT);

/* determine current absolute path used for require statements */
define('APP_PATH', dirname(dirname(__FILE__)).'/');

/* get mock objects */
require APP_PATH.'tests/mocks.php';

/* get the code we want to test */
require APP_PATH.'lib/framework.php';

/* get the stubs */
require APP_PATH.'tests/stubs.php';

?>

mocks.php contains stand-in objects used as arguments to methods we want to test. The stubs.php file contains wrappers around abstract classes and traits so we can test them more easily. One advantage of this pattern is it makes it possible to pre-define constants in the setUp() method before the code being tested is loaded, so a test can exercise a code path that triggers on a non-default constant value (Assuming the code being tested checks for an already defined constant). Since mocks are loaded before the code being tested, we can also leverage this to override things unfriendly to testing.

Overriding Stuff

It’s good practice to limit mocking and overriding to a minimum. The more code that is mocked out, the less actual code that is being tested. There are however some built-in PHP functions that simply don’t play nice, like setcookie or header or session_start or error_log or die – you get the idea. Using the pattern as described above in “Process Separation”, we can easily add some override behavior to deal with these problems (sadly this does require changes to the code being tested).

In our mocks.php file we create a class of all static methods for built-in functions that don’t play well with others.

class BuiltIns {
    public static function php_header($header_str) { return true; }
    public static function php_die($msg) { return true; }
    public static function php_error_log($str) { return true; }
}

In the code to be tested, we setup a mirror image of this class that runs the actual built in functions, but only loads if the class is not yet defined.

if (!class_exists('BuiltIns')) {
    class BuiltIns {
        public static function php_header($header_str) {
            return header($header_str);
        }
        public static function php_die($msg=false) {
            return die($msg);
        }
        public static function php_error_log($str) {
            return error_log($str);
        }
    }
}

Then we replace occurrences of these built-in functions in the code to be tested. So this:

if ($error_str) {
   error_log($error_str);
}

Becomes this:

if ($error_str) {
   Builtins::php_error_log($error_str);
}

WOOT! Now we don’t have to worry about an errand error_log spoiling our unit-test party. We can even do something useful in the mocked out versions, maybe a fake session_start() call can populate $_SESSION or another constant in setUp can toggle success or failure from the mocked out function. The sky is the limit people!

Coverage

I have only recently started to look at PHPUnit’s coverage options. When I first tried it out, it bailed with a cryptic message and I was sad. Some head scratching and a few apt-get installs later, I was blown away. The HTML coverage report is incredibly useful. By default it sucks up other included files, so if you are dealing with a big code-base it can be handy to limit coverage to just code you are actively testing. I like to define these limits and enable the coverage report in my phpunit.xml file with something like this:

    <filter>
        <whitelist addUncoveredFilesFromWhitelist="false" processUncoveredFilesFromWhitelist="false">
            <directory suffix=".php">../lib</directory>
            <exclude>
            <file>../lib/not_tested.php</file>
            </exclude>
        </whitelist>
    </filter>
    <logging>
        <log type="coverage-html" target="./coverage.html" charset="UTF-8" highlight="false" lowUpperBound="35" highLowerBound="70"/>
    </logging>

The report is comprehensive. It has summary charts for coverage, complexity, risk analysis, and even line by line detail that makes it brain-dead easy to see what your tests are hitting, and more importantly, what they are missing. Coverage alone does not make a good unit test, but it’s a great tool to help improve your tests.

Extending PHPUnit_Framework_TestCase

This post is really dragging on so I will leave you with one additional trick I came across building unit-tests for a billing system. We wanted a test suite that we could run across a variety of products, but the tests code would be nearly identical. Duplicating the tests for each product was a maintenance nightmare. We needed a way to run virtually the same test code across multiple products. Here is what I came up with:

Start by extending the PHPUnit_Framework_TestCase class with an abstract class. This will be where the actual test code lives.

<?php

abstract class Base_Tests extends PHPUnit_Framework_TestCase {
    public function test_something() {
    /* your normal test code goes here */
    }
}
?>

Next extend that class for each product you want to test, and define the product as a member variable in the setUp() method so it can be accessed from the test method scope:

<?php
class Product_A_Tests extends Base_Tests {
    public function setUp() {
        $this->product = 'product a';
    }
}
?>

PHPUnit will not try to run the tests in the abstract class, but it will find and run the tests in the classes that extend it. For each product you want to test, just extend the base class and define the product details in setUp().

I’m hardly a PHPUnit expert, and there are surely improvements to these ideas or even completely better ways to accomplish the same thing. All of these examples minus the last one were taken from a unit-test suite for some Open Source software I’m working on. You can see the still-in-progress test code at Github, and an example of the coverage report from PHPUnit at jasonmunro.net.

The Itch I Can’t Stop Scratching

My first official contribution to an Open Source software project was way back in 2002.  I was solving a problem for my employer, and ended up becoming a developer for the venerable Squirrelmail project. It was an exciting time. The community was vibrant, active, and surprisingly welcoming to a near-complete novice willing to get their hands dirty. Looking back at the code I wrote lo those many years ago makes me want to gouge my eyes out with red-hot sporks, but I can’t deny the impact contributing to that project had on both my mindset and career path. Since then my involvement in Open Source has waxed and waned, but has always remained. That seemingly innocent interaction sparked a lifelong interest in webmail applications, and I have been tinkering with them ever since.

After a brief 5-year stint writing mostly Python and C++ , I started working with PHP full-time again last May when I joined Automattic. I realized pretty soon after starting that my skills were rusty. Like PHP4 rusty. I needed to experiment with the latest-greatest the language had to offer, but in a safe way, and on my own terms. For the third time in my life, I decided to unleash yet another Open Source webmail client on the world. That surge of excitement you are not feeling at this point is totally understandable. Especially considering the code I wrote the first two times would best be stowed away in the “how not to write complex software” file.

I set out with a newly provisioned github repo, the enthusiasm of someone half my age, and some lofty goals:

  1. Build a client with combined views from multiple E-mail accounts, able to speak both IMAP and POP3, and flexible enough to merge other data sources
  2. Turn security up to 11. Perhaps 12
  3. Make it fast, compact, and compliant
  4. Utilize a modular system that all components outside the bare bones framework use. Like an uber-plugin system the whole app runs on
  5. Do all this while pushing myself to learn what great features new versions of PHP have to offer

To get started, I ferreted out and cleaned up the core IMAP, POP3, and SMTP routines from my last webmail project. While I was at it, I modernized the IMAP library to support some useful protocol extensions, and even built some unit tests *gasp*. These libs have been battle-tested against real world server idiosyncrasies for over a decade, so while they may not be ideal from a code design standpoint, they have an established record of compatibility. This is important when dealing with complex protocols that have a myriad of server implementations. I’m looking at you IMAP.

Next I set out to create a simple request and response processing framework – one that uses “modules” to do the real work of building the resulting page. The framework is lightweight (request processing uses on average 2MB of server memory), and leverages some nifty code features. With a framework in place, the next step was to start cranking out module sets for specific functionality. I started with core requirements like laying out the page content and logging in and out. Next I dove into IMAP, since it would be the primary protocol for E-mail access, and easily the most complicated data source to implement.

9 months later I am happy to say I have a pleasant to use E-mail and RSS reader including preliminary SMTP support for outbound mail (very preliminary). It’s easy on the server and the browser, and has some interesting features for combined content views. It is still very much a work in progress, but here are some highlights:

  • Super small pages with minimal server requests. A single page load only requires 3 HTTP requests with a combined response size of about 30KB (gzipped). Email and Feed data are populated via one parallel AJAX call per source, with response sizes of ~1KB. All interface icons are served inline with data urls to keep request count low.
  • Oodles of security features: TLS/STARTTLS support for all protocols; forced HTTPS for browser requests; secure HTTP-only session level cookies; AES compatible encryption for session and persistent data using unique keys; white-listed and typed user input; built-in HTTP POST nonce enforcement; HTTP header fingerprinting; easy output escaping; a two factor authentication module; probably more I’m forgetting.
  • Modules for IMAP, POP3, SMTP, RSS, and several other app components with more on the way. Modules can be enabled or disabled independently. The module system is super flexible and lends itself to some interesting customization options. It might even turn out to be too flexible.
  • Easy-to-extend session management including stock PHP session support and custom DB sessions. The DB session support is not a registered PHP session handler – it is a completely independent implementation.
  • Authentication is also easy to extend and already supports authenticating via IMAP, POP3, or an included PBKDF2 compliant database schema.
  • Database access is not required (unless used for authentication), but can be leveraged for session and persistent data storage with any PDO supported DB. Table definitions are included for Mysql and Postgresql.
  • Validated HTML5 output, including responsive views for mobile devices and HTLM5 local session storage for caching.
  • Lots of other boring technical details really neat stuff!

I could ramble on about this forever, better stop now before I get carried away. No post about half-done probably soon-to-be obsolete software is complete without at least one screenshot. Here is a look at the interface with a combined view of 9 different RSS feeds.

hm3_feedsIt’s not only been a great learning experience to work on this code, it’s been a lot of fun too. The repository is at http://github.com/jasonmunro/hm3/ for anyone who wants to take a look. Documentation is scarce and things are changing quickly, so if you do check it out, use caution :).

IMAP, I Stab At Thee

I have a love-hate relationship with IMAP (let’s just ignore for the moment how weird it is to have any sort of emotional engagement with a protocol specification). IMAP stands for “Internet Message Access Protocol” and it is a powerful way for client programs to read and manage the messages of an E-mail account. The 107 page definition as laid out in RFC 3501 is not trivial – it is a complex process that can be tricky to get right. This is especially true for client writers as the IMAP server requirements can be very flexible, which in turn requires clients to adapt to a variety of possible implementations. There are also a slew of extensions with their own RFCs, just to keep the water muddy.

My loathing adoration for IMAP started innocently enough. I was getting involved in Open Source software development and was contributing to the Squirrelmail project. Squirrelmail is a webmail program written in PHP, which happens to use IMAP for E-mail account access. My first official submission to this project was on February 20th, 2002, and was completely unrelated to the protocol. However as time wore on, I became more involved in the project. I started digging deeper into the code base. Deeper and deeper. This is where I came face to face with the beast I would wrestle with for the next decade. It was like having my very own virtual Balrog. “BUGS SHALL NOT PASS!“.

PHP has native functions to communicate with IMAP servers, but back in those days they were not always available, and they had a reputation for being unreliable. Squirrelmail had its own client library, using PHP to read and write to the listening socket of the IMAP server. Seems simple, no? Turns out it’s not, but I was fascinated with this aspect of the code. I started spending a lot of time digging into the RFC to ferret out the nuance of one situation or another. I subscribed to esoteric IMAP mailing lists and tried to keep up with the discussion. Soon I had a hard copy of the RFC with me everywhere I went. Not long after that, IMAP commands and responses started streaming through my dreams like ASCII dragons searching for the tender flesh of programmers to feast on.

Leveraging the arrogance of immaturity, I left the Squirrelmail project about a year later, and started my own much less popular webmail project called Hastymail, which I still maintain. I wrote an initial IMAP client implementation in 2003, then again as a part of a complete rewrite to “Hastymail2” in 2008.  Since that time I have just been bolting on features and fixing bugs (in that order), but I have been toying with a new idea recently. All I needed was an opportunity. As luck would have it, I found myself on a long plane trip with some battery life in my laptop, and more importantly that always elusive motivational impulse to jump into something I had not looked at in a long time.

My idea was to decouple the IMAP class from the core Hastymail code so that it could be easily used in any PHP program (hardly breaking new ground here I know, I will get to that). The class was reasonably self-contained, but it did directly use aspects of the Hastymail framework in places, and those areas would need to be marked as deleted then expunged (that’s an IMAP joke). This code was also PHP4 compatible, meaning it was only a “class” in the most basic sense of the word, if at all. I decided to beat it around the head with a PHP5 OOP stick for a while, to see if a better overall structure for the logic could be put in place without ripping too much apart.

I’m genuinely surprised to say it was a pretty successful effort. At some point in the process my mindset evolved from “let’s see if this can work” to “must finish at all costs before thinking about anything else“, and it became a near obsession to reach a particular point of completeness. As to the not breaking new ground issue: I’m well aware of that. I have no illusions that the world needs another PHP IMAP library, or even this one in particular. But it was fun to build, and there is a greater than zero chance that at least one other person on the planet might find it useful – if even as an example of what not to do – so why not share. You can see all the gory details in the newly created github repo.

https://github.com/jasonmunro/hm-imap

Even with all the recent re-factoring there is still plenty of code that has not felt the loving embrace of a developer’s debugger in many a year. I’m going to try to address those lonely lines in the coming months, but I can’t make any promises. A little voice in the back of my head keeps saying this could be the basis of a new generation of the Hastymail project, so we’ll see. On the other hand that same voice tells me that rainbows are government monitoring devices that can only be thwarted by wearing underwear made of pop-tarts, so the jury is still out on whether or not it can be trusted.