Special character handling for email subject #64

Open
GevatterGaul opened this issue Jan 22, 2015 · 2 comments

Comments

@GevatterGaul

Currently the Message class assigns the subject as-is to the MailBase and then to the email.message message without any character escaping, in contrast to the message body. If the subject contains special characters such as ü, the message cannot be converted to a string and a UnicodeEncodeError is raised.
Tested under Python 3.2, using pyramid_mailer v0.14 and the DebugMailer.

Is there any reason that there is no character escaping of the subject?

A possible solution would be to use the email.header Header class like this:

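from email.header import Header

base = MailBase([
    ('To', ', '.join(self.recipients)),
    ('From', self.sender),
    # Header RFC 2047-encodes the subject when it contains non-ASCII characters
    ('Subject', Header(self.subject, 'utf-8')),
    ])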

Alternatively, a more appropriate encoding could be used, as determined by best_charset.
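To illustrate the idea of picking the charset per subject, here is a minimal sketch that uses only the standard library. `pick_charset` and `encode_subject` are hypothetical names made up for this example; `pick_charset` is a stand-in and is not pyramid_mailer's `best_charset`. It assumes that falling back from us-ascii to utf-8 is acceptable.

```python
from email.header import Header, decode_header, make_header


def pick_charset(text):
    """Hypothetical helper: use us-ascii when possible, otherwise utf-8."""
    try:
        text.encode("us-ascii")
    except UnicodeEncodeError:
        return "utf-8"
    return "us-ascii"


def encode_subject(subject):
    """Return an ASCII-only header value for the given subject text."""
    # Header.encode() emits RFC 2047 encoded-words for non-ASCII charsets
    # and leaves plain ASCII text untouched.
    return Header(subject, pick_charset(subject)).encode()


def decode_subject(value):
    """Reverse encode_subject, for checking the round trip."""
    return str(make_header(decode_header(value)))
```

A plain ASCII subject passes through unchanged, while a subject such as "Grüße" comes out as an encoded-word that `decode_subject` turns back into the original text.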

Unfortunately, one cannot pass a Header instance as the subject to the pyramid_mailer Message class, because the validate function of Message will fail.

By the way, the problem extends to To and From headers as well.
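For the To and From headers, only the display name may be encoded, not the address itself. The sketch below shows one way to do that with the standard library's `email.utils.formataddr`. It assumes Python 3.3 or later, where `formataddr` accepts a `charset` argument; `encode_address` and `decode_display_name` are hypothetical helper names, not part of pyramid_mailer.

```python
from email.header import decode_header, make_header
from email.utils import formataddr, parseaddr


def encode_address(display_name, addr_spec):
    """Build a header-safe 'Name <addr>' value."""
    # formataddr encodes the display name as an RFC 2047 encoded-word
    # only when it contains non-ASCII characters.
    return formataddr((display_name, addr_spec), charset="utf-8")


def decode_display_name(header_value):
    """Recover the human-readable display name from a header value."""
    name, _addr = parseaddr(header_value)
    return str(make_header(decode_header(name)))
```

With this approach an ASCII name such as "Bob" stays readable in the raw header, and a name such as "Jürgen Müller" is encoded while the address part is left as-is.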

@jvanasco
Contributor

jvanasco commented Oct 2, 2018

UTF-8 characters are allowed in email headers (see RFC 6532, https://tools.ietf.org/html/rfc6532) and should not be escaped.

This seems to work fine in the current version -- the subject is encoded as utf8 properly like so:

Subject: =?utf-8?b?xxxxxxxxxxxxxxxxxxxxxxx?=

It looks to me like we just need some tests to assert several test cases pass.
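As a rough idea of what such a test could check, the sketch below builds a message with the standard library only, serializes it, parses it back, and decodes the subject. It does not use pyramid_mailer's Message class; `build_message` and `roundtrip_subject` are hypothetical names for this example, and a real test would go through the mailer's own API instead.

```python
import email
from email.header import Header, decode_header, make_header
from email.mime.text import MIMEText


def build_message(subject, body):
    """Build a text message whose subject is RFC 2047-encoded."""
    msg = MIMEText(body, "plain", "utf-8")
    msg["Subject"] = Header(subject, "utf-8")
    return msg


def roundtrip_subject(subject):
    """Serialize a message, parse it again, and return the decoded subject."""
    raw = build_message(subject, "body").as_string()
    # The serialized form must be pure ASCII; this raises otherwise.
    raw.encode("ascii")
    parsed = email.message_from_string(raw)
    return str(make_header(decode_header(parsed["Subject"])))
```

A test case can then assert that `roundtrip_subject(s) == s` for a handful of subjects, both ASCII and non-ASCII.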

I started writing some and ran into this issue with Python 2: passing a unicode instance as the subject or body appears to work as intended. However, passing a str instance that contains UTF-8 data raises an exception.

Perhaps a flag/kwarg on the Message constructor could change how best_charset behaves? IMHO, if a body contains UTF-8 data it should be handled as UTF-8, rather than being encoded as us-ascii or raising an exception.
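One possible shape for that behavior is to normalize byte strings to text before any charset detection runs. The sketch below is written in Python 3 syntax, where `bytes` plays the role of the Python 2 str type; `to_text` is a hypothetical helper and the utf-8 default is an assumption, not something pyramid_mailer is known to do.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode byte strings, pass text through unchanged."""
    if isinstance(value, bytes):
        # Raises UnicodeDecodeError if the bytes are not valid in `encoding`.
        return value.decode(encoding)
    return value
```

Applying such a helper to the subject and body at construction time would mean the rest of the code only ever sees text, at the cost of assuming a single input encoding.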

Can @mmerickel or @tseaver give an opinion?

@jvanasco
Contributor

jvanasco commented Oct 3, 2018

I've begun working on this:

jvanasco@4497d07
