Grow as a Drupal developer: deep dive into a bug

There are moments in your life as a developer when you ask yourself whether you are good enough and how you can improve. You may then decide to learn a new library or new tools, or to read a tech book or a few articles. Still, you don’t know if that’s the right approach. For me, the best way to improve my skills and grow as a developer is by contributing.
This is a series of articles where I explain the WHYs, HOWs and WHATs of contributing to Open Source projects like Drupal.
In this post, I’ll write about an amazing adventure I found myself in while trying to fix a nasty bug for a client’s website.
This is the story of Issue #2936032: Sites named with special characters cannot send mail.
The Issue
On a client website, automated emails (e.g. those sent on form submission) were sent correctly but never reached users of some mail providers, like Gmail, while delivery seemed to work for users of others, e.g. Outlook.
The returned error messages were not very useful for determining the source of the problem: they referred to a missing or malformed header value, even though that header appeared to be present and correct in the source of the e-mail.
After a quick search on Google, and the Drupal issues queue, I found an issue reporting a similar behaviour and error message:
Our site name is “Hello World, LLC” […], but this is rejected by Google with “Messages with multiple addresses in From: 550 5.7.1 header are not accepted.”
And a workaround:
I can confirm that removing the “, LLC” from our site name allows the messages to be sent.
I went back to the client website: the site name was Foo’s Bar. Could that single quote be the issue?
I removed it, and it worked! Emails were received correctly in every mail provider.
Initial Investigation
I started analysing the problem and providing feedback on the issue (comments #8, #9 and #10), including a first attempt at a patch to fix it. I was hoping for someone, anyone, to reply and give some more clues.
But no one came to the rescue.
I now had two options:
A. To hold my nose and use my initial, untested, unreviewed, unreliable patch.
B. To take a deep breath, read all the RFCs relevant to email syntax and provide a fully documented, reliable and tested solution.
If you’d choose option A) then you are probably reading the wrong blog post. :p
If you want to grow into a better developer, the right option is B).
The full story
TL;DR: The current implementation of Drupal\Core\Mail\MailManager::doMail() doesn’t produce a valid From/Sender/Reply-to header field if any of the ()<>[]:;@\,. characters are used in an ASCII string.
And Drupal makes this even worse, because while building the mail headers it HTML-encodes some characters. For example, a site name like mine, Foo’s Bar, is encoded to Foo&#039;s Bar, and the semicolon in that entity is forbidden and triggers the bug.
I ended up spending a full weekend reading a bunch of RFCs and then other RFCs superseding the first ones.
It was tough, I can’t deny it. But it was hugely rewarding. Below is the output of my research.
Originator fields
RFC 822 and RFC 2822 define originator fields (RFC-2822 section 3.6.2):
The originator fields of a message consist of the from field, the sender field (when applicable), and optionally the reply-to field. Those are “From”, “Sender” and “Reply-to” header fields.
The labels are called “field-name” (i.e. “From” is the field-name), and after the “:” (colon) we have the “field-body”, which for originator fields can be:
- from = “From:” mailbox-list CRLF
- sender = “Sender:” mailbox CRLF
- reply-to = “Reply-To:” address-list CRLF
What mailbox, mailbox-list and address-list are is defined in RFC-2822 section 3.4. You can read the whole specification, but in summary:
- mailbox = name-addr / addr-spec
- name-addr = [display-name] angle-addr
- angle-addr = “<” addr-spec “>”
- display-name = phrase
So, for example, the “From” header field can contain comma-separated instances of:
- test@example.org (addr-spec)
- <test@example.org> (angle-addr)
- Test <test@example.org> (display-name angle-addr)
In the last example, Test is the “display-name” component of our “From” field-body. It can only be a phrase, so it must follow the phrase syntax rules (see below).
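To make the three forms concrete, here is a minimal sketch that parses them with Python's standard email.utils module. Python is used purely for illustration (Drupal itself is PHP), and the addresses are the examples from the list above.

```python
from email.utils import getaddresses, parseaddr

# Each mailbox form from the list above, parsed into (display-name, addr-spec).
forms = [
    "test@example.org",         # addr-spec
    "<test@example.org>",       # angle-addr
    "Test <test@example.org>",  # display-name angle-addr
]
parsed = [parseaddr(form) for form in forms]
for form, (name, addr) in zip(forms, parsed):
    print(f"{form!r}: display-name={name!r}, addr-spec={addr!r}")

# A "From" field-body is a mailbox-list: comma-separated mailboxes.
mailbox_list = getaddresses(["test@example.org, Test <test@example.org>"])
print(mailbox_list)
```

Only the third form carries a display-name, and that is the component the rest of this post is concerned with.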
The phrase, word and atom components
Sections 3.2.4 to 3.2.6 of RFC-2822 tell us that a phrase is made of one or more whitespace-separated words, and that a word is either an atom or a quoted-string.
An atom can contain any ASCII alphanumeric characters as well as !#$%&'*+-/=?^_`{|}~.
So basically, it can NOT contain what the RFC defines as specials: ()<>[]:;@\,. or the " (double-quote) character.
It’s worth mentioning too that an atom cannot contain space or CTRL characters. If spaces are present, the parser simply reads the text as a phrase of multiple atoms. So spaces are not allowed inside an atom, but they can still be used between atoms. The result may be unexpected, however (e.g. when using a “Q” encoded-word with spaces, see below).
As an atom cannot contain specials such as a , (comma), the RFC lets us use them within a word if it is represented as a quoted-string, which is a string wrapped in double quotes:
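The atom rule can be checked mechanically. The following is a rough sketch, not Drupal's code: the helper names is_atom and find_specials are made up for this example, and the `&#039;` form of the apostrophe is an assumption about how the HTML-encoded site name looks.

```python
import re

# atext per RFC 2822 section 3.2.4: ASCII letters, digits and these symbols.
ATEXT = r"A-Za-z0-9!#$%&'*+\-/=?^_`{|}~"
ATOM_RE = re.compile(rf"[{ATEXT}]+")

# The RFC "specials", including the double-quote character.
SPECIALS = set('()<>[]:;@\\,."')


def is_atom(word: str) -> bool:
    """Return True if the word consists solely of atext characters."""
    return ATOM_RE.fullmatch(word) is not None


def find_specials(text: str) -> list[str]:
    """List the specials, in order of appearance, found in the text."""
    return [ch for ch in text if ch in SPECIALS]


# A raw apostrophe is plain atext, so this word is a valid atom.
print(is_atom("Foo's"))
# Assumed HTML-encoded form: "&" and "#" are atext, but ";" is a special.
print(is_atom("Foo&#039;s"), find_specials("Foo&#039;s Bar"))
print(find_specials("My Company, LTD"))
```

Under that assumption, the apostrophe itself is harmless; it is the semicolon introduced by the HTML encoding that breaks the atom.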
- NOT allowed: From: My Company, LTD <me@my-company.com>
- Allowed: From: "My Company, LTD" <me@my-company.com>
The only limitation with a quoted-string is that " (double-quote), \ (backslash) and CTRL characters can be used only if escaped, as \", \\ and \[CTRL] respectively.
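A quoted-string can be built with two replacements and a pair of double quotes. This is a minimal sketch under stated assumptions: quote_display_name and from_header are hypothetical helpers, the input is assumed to be ASCII, and control characters are not handled.

```python
def quote_display_name(name: str) -> str:
    """Wrap an ASCII display-name in a quoted-string (RFC 2822 section 3.2.5)."""
    # Escape backslashes first, so the backslashes added for quotes survive.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def from_header(name: str, address: str) -> str:
    """Build a From header line with a quoted display-name."""
    return f"From: {quote_display_name(name)} <{address}>"


header = from_header("My Company, LTD", "me@my-company.com")
print(header)
```

The order of the two replacements matters: escaping the double quotes first would cause the backslashes just added to be escaped a second time.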
What if our header field-body string has non-ASCII characters?
RFC-2822 allows only ASCII text in headers, but RFC 2047 (a revision of the original RFC-1342) helps us here: it defines the rules for non-ASCII mail headers, introducing encodings and encoded-words:
An “encoded-word” is a sequence of printable ASCII characters that begins with “=?”, ends with “?=”, and has two “?”s in between. It specifies a character set and an encoding method, and also includes the original text encoded as ASCII characters, according to the rules for that encoding method.
A mail composer that implements this specification will provide a means of inputting non-ASCII text in header fields, but will translate these fields (or appropriate portions of these fields) into encoded-words before inserting them into the message header.
And it continues:
[an “encoded-word” can be used] As a replacement for a “word” entity within a “phrase”, for example, one that precedes an address in a From, To, or Cc header.
So a phrase can now be a whitespace-delimited list of words or encoded-words.
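To see what an encoded-word looks like, here is a small sketch that builds a “B” (Base64) encoded-word by hand and decodes it again with Python's standard email.header module. The helper b_encoded_word is hypothetical, and the sketch ignores the RFC 2047 limit of 75 characters per encoded-word, so long names would need to be split into several encoded-words.

```python
import base64
from email.header import decode_header, make_header


def b_encoded_word(text: str, charset: str = "UTF-8") -> str:
    """Build a single "B" (Base64) encoded-word: =?charset?B?payload?="""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


# The comma is a special, but it disappears inside the Base64 payload.
encoded = b_encoded_word("Café, Ltd")
decoded = str(make_header(decode_header(encoded)))
print(encoded)
print(decoded)
```

The encoded form contains only printable ASCII, which is what makes it safe to place where a word is expected in a phrase.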
Let’s come back to our issue
If the display-name component of a “From” header (or similar):
- has only ASCII characters and no specials, it can be left as it is
- has only ASCII characters including any specials, it should be transformed to either a quoted-string (preferred) or a “Q” encoded-word (strongly discouraged, since specials are still not allowed!)
- has non-ASCII characters, it must be encoded either as a single encoded-word or as multiple ones, whichever is easier. As we “B” encode (Base64 encoding) the string, we don’t have problems if the string contains specials, as they will be safely encoded
The current implementation of Drupal\Core\Mail\MailManager::doMail() handles the last scenario perfectly, but leaves us uncovered in the first two. That explains our original bug: if a display-name made of ASCII characters includes specials, Gmail (and similar providers) refuse to process the email because of a malformed header.
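For comparison, Python's standard library applies the same three rules in email.utils.formataddr. The sketch below is only an illustration of the expected behaviour, not the Drupal fix: the address is made up, and `Foo&#039;s Bar` is the assumed HTML-encoded form of the site name.

```python
from email.utils import formataddr

ADDRESS = "me@example.org"  # hypothetical address, for illustration only

# Case 1: ASCII without specials is left as it is.
plain = formataddr(("Foo Bar", ADDRESS))
# A raw apostrophe is not a special, so this one is left alone too.
apostrophe = formataddr(("Foo's Bar", ADDRESS))
# Case 2: ASCII with specials (here ";") becomes a quoted-string.
quoted = formataddr(("Foo&#039;s Bar", ADDRESS))
# Case 3: non-ASCII text becomes an encoded-word.
encoded = formataddr(("Café Bar", ADDRESS))

for value in (plain, apostrophe, quoted, encoded):
    print("From:", value)
```

The second and third lines show the bug in miniature: the site name is fine with a raw apostrophe, and only needs quoting once the HTML encoding has added a semicolon.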
Conclusion
The investigation then continued with failing tests to prove the issue, and a patch to fix it. The final solution provided a helper rather than a one-off fix, so that any contrib module dealing with emails (e.g. Webform) would also benefit from the change.
I received some feedback and additional requests, which I addressed. I leave it to the curious reader to go and check the issue and, if you fancy, to review it and help move it forward.
These are my takeaways:
- I know more about email syntax
- I know more about Drupal mail handling
- I know more about the Swift Mailer PHP library
- I have a fix for my website issue
- And by extension every Drupal site can benefit from the fix
- The investigation results will be part of the web for whoever needs them, Drupalist or not, so they don’t have to re-read all the RFCs.
That’s it. This is in my opinion the best way to become a better developer: getting challenged with new issues, fully exploring the problems and providing your own documented solutions.
It’s about being part of a community, rather than not. It’s about being a user and a member, rather than just a consumer.
The next blog post will be about how to take the first steps towards contribution. Because you are not alone in this journey! A lot of developers have the same aims and the same fears, so the Drupal community already has all the tools you need. In the meantime, why not check out James Dodd’s 2017 post on Drupal contribution?
Stay tuned!
——
If you live in the UK, we are organising the first UK Distributed Drupal Sprint, with events happening in several locations. See the one closest to you or, if you want to join with your local community, get in touch!