The IMAP BODYSTRUCTURE command, and a bug in the Gmail IMAP service

Now that is one catchy post title. Who DOESN’T like to discuss the nuances of the IMAP BODYSTRUCTURE command? I guess if there are other “E-mail geeks” out there like me – toiling away in the evening’s waning hours writing E-mail software for no good reason – they might find it mildly interesting. At best. The only saving grace to this entire post is the fact that I found what appears to be a legitimate bug in the Gmail IMAP service. Of course this means I have bested all of Google’s E-mail engineers at their own game. I expect penance in the form of exhalation of my prowess, and perhaps a competitive job offer. I doubt either will materialize, but I am prepared to wait.

In the meantime let’s get on with the boring. It’s no secret I have a love-hate relationship with the IMAP protocol, and one of its painfully wonderful features is the BODYSTRUCTURE command. This command returns a string representation of the structure of a MIME formatted message. What makes this command wonderful is that it provides access to all the message parts in a bandwidth limited fashion. What makes it painful is the fact that it’s a disaster to parse, like most IMAP responses, though this one really takes the cake. The BODYSTRUCTURE command gives a mail client enough information to determine the “message part ids” needed to access particular sections of the message. Unlike more simplistic protocols like POP3, we can use this information to selectively choose the message part we want to display, and only fetch that content without having to download the entire thing. Take that simplistic protocols like POP3!

MIME message structure can get really complicated, since message parts can be contained inside other message parts that are inside other message parts (etc). It is critical for a client to accurately represent the structure, and to properly assign the “message part ids” so they can be individually viewed or downloaded. This is the point at which I stumbled on a problem with the Gmail IMAP service. The message in question is a digest E-mail from the Bugtraq mailing list. This message is formatted as a “MESSAGE/DIGEST” MIME type. Digests generally have a text part summary of the included messages, then a list of RFC822 parts containing the original E-mails sent to the list (an RFC822 part is like a container for an entire E-mail message). The Bugtraq digest E-mail follows this pattern. The BODYSTRUCTURE response from Gmail’s IMAP interface for these types of messages appears correct, however the “message part ids” derived from it do not work – they are rejected as invalid.

At first I assumed I was doing something wrong, usually a solid assumption when you are working on an IMAP client. Or you are me. The BODYSTRUCTURE response parsing code in my client was ungainly as sin, so I took this opportunity to re-factor it into something slightly less ugly. Even with the improved code, the message part ids continued to fail, so I decided to copy the message to a local IMAP account using Dovecot to compare the results. Surprisingly, it appears that Gmail’s IMAP server is doing it wrong. The BODYSTRUCTURE response from both IMAP servers is identical. The way my client is parsing the responses and determining the message part ids is also identical. Attempting to access the individual message parts using these ids fails in Gmail, but works in Dovecot. Dun-dun-dun!

Combined with some deep diving in the form of casually skimming the IMAP and MIME RFC’s, I’m convinced this is a bug in Gmail’s IMAP service. Interestingly the Gmail web interface displays digest messages as one big text blob and dumps all the parts out in a row. This might be simpler for users, but wading through raw message text is cumbersome for large digests. For the I’m-sticking-it-to-Gmail-in-this-post record, it also violates the RFC recommendations for clients displaying complex message structure.

Honestly though, I love Gmail, and it’s great that they allow IMAP access. They have also added some neat extensions to the IMAP protocol, such as a Google-like search command that kicks the crap out of the default IMAP search. Since the BODYSTRUCTURE response from Gmail’s IMAP service is correct, I suspect the problem with message part ids not working is a relatively simple fix to their IMAP implementation. True to form, I suspect this without any clue as to the inner workings of the Gmail IMAP implementation.