Discussion:
[mh-e:bugs] #486 Inconsistent behavior in replying to emails with UTF-8 (rfc2047) From fields
Jeff Morgenthaler
2017-01-13 16:48:37 UTC
---

**[bugs:#486] Inconsistent behavior in replying to emails with UTF-8 (rfc2047) From fields**

**Status:** unread
**Milestone:** Unassigned
**Labels:** UTF-8 rfc2047
**Created:** Fri Jan 13, 2017 04:48 PM UTC by Jeff Morgenthaler
**Last Updated:** Fri Jan 13, 2017 04:48 PM UTC
**Owner:** nobody
**Attachments:**

- [Initial.txt](https://sourceforge.net/p/mh-e/bugs/486/attachment/Initial.txt) (395 Bytes; text/plain)


Hi All,

Related to an mh-e-users post with the same subject last night, I attach an email which shows the inconsistent behavior in the "Replied:" fields. The sender's name is encoded in rfc2047 and, at least for a generic Debian 8.6 install with FSF Emacs 24.4.1, is never decoded in the draft buffer (it displays fine in the folder list). Failure to display the sender's name nicely in the draft buffer might itself be considered a bug, but I am more concerned about sending people their names back as garbled strings.

Reply 1: Plain ASCII in the message body, so the draft never goes into any special encoding. The rfc2047 string passes through to nmh send and is displayed properly on the other end.

Reply 2: Use mh-yank-cur-msg. That brought in some encoded text (the "From" name), which was correctly displayed by Emacs and caused the buffer to be saved in an encoded state. However, at that point, the "From" field was still just a garble-string in rfc2047 in the file (it was not translated into plain UTF-8 encoding). When the encoded file was sent to mh-mml-to-mime, that garble-string was "protected" by the =?us-ascii construct and ended up as an rfc2047 string on the other end (or in my Sent_Mail folder) that does not unwrap to the original UTF-8 meaning.
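
The failure in Reply 2 amounts to an already-encoded word being encoded a second time. The following is a minimal sketch of that effect using Python's standard `email.header` module, not the Emacs code path. The sender name is hypothetical, and the `=?us-ascii?B?...?=` wrapper is built by hand as an assumption about what the re-encoding looks like (the report does not show the exact bytes mh-mml-to-mime emits).

```python
import base64
from email.header import decode_header, make_header


def decode_once(value):
    """Decode the RFC 2047 encoded words in a header value, one pass only."""
    return str(make_header(decode_header(value)))


def encoded_word(text, charset):
    """Build a single base64 ("B") encoded word by hand."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return "=?%s?B?%s?=" % (charset, payload)


name = "Jürgen Müller"  # hypothetical sender name, not taken from the report

# What the incoming From field carries: one encoded word.
raw = encoded_word(name, "UTF-8")

# Reply 1: the raw word passes through untouched, so a single decode
# on the receiving side recovers the name.
reply1_seen = decode_once(raw)

# Reply 2 (assumed shape): the raw word is treated as plain ASCII text
# and wrapped in a second encoded word.
wrapped = encoded_word(raw, "us-ascii")

# A receiving mail client decodes once, so it is left with the inner
# encoded word rather than the name.
reply2_seen = decode_once(wrapped)

print("Reply 1 recovers the name:", reply1_seen == name)
print("Reply 2 shows the raw encoded word:", reply2_seen == raw)
```

Only a second decoding pass would recover the name from `wrapped`, which a conforming mail client has no reason to perform.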

Suggested solutions (which may cause more trouble with other things, which is why I am asking about this):

Solution 1: Call rfc2047-decode-region on the headers of the email as the draft buffer is first being created (as is apparently done for show). Then the names and other strings (e.g. in Subject) will display properly and mh-mml-to-mime will do the right thing when the message is sent. This would probably have the side effect that all messages with rfc2047 headers would end up with encoded bodies too, even if they are just ASCII. But that could be checked and prevented if someone cares. Note that scan of +drafts does the right thing in this case for the name, but show shows the raw UTF strings in the show buffer.

Solution 2: leave the stuff in the headers in raw rfc2047 and convince mh-mml-to-mime to not mess with them on the way out. Note that I tried tweaking rfc2047-header-encoding-alist to do this, but either I don't understand how to do that right, or it isn't connected to what needs to (not) happen here.

Solution 3 (which might already work in some non-FSF emacsen?): teach Emacs to recognize rfc2047 encoding and display it correctly, while leaving it in "raw" format in the underlying file. This would probably require tweaking mml-to-mime similar to Solution 2. I also have no clue if one buffer can support multiple encodings.

Note, people probably haven't seen this because generally the person's email program on the other end displays their name as they have set it up, not as you have sent it back to them. However, the current state of affairs does leave the "To:" field unintelligible in my Sent_Mail folder, so it would be nice to fix it somehow.

Thanks for any synapses devoted to the issue.

jpm




---

Sent from sourceforge.net because mh-e-***@lists.sourceforge.net is subscribed to https://sourceforge.net/p/mh-e/bugs/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/mh-e/admin/bugs/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.
Jeff Morgenthaler
2017-01-13 19:09:05 UTC
A one-line solution (plus comment) implements Solution 1: replace lines 1033 & 1034 in mh-comp.el with:

;; Cleanup possibly RFC2047 encoded header fields
(mh-decode-message-header)
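
To illustrate why decoding the header when the draft is created helps, here is a minimal sketch using Python's standard `email` package rather than mh-e itself. It assumes a hypothetical sender name and address, and it uses `email.utils.formataddr` as a stand-in for whatever encoding step runs at send time; it is not a model of mh-mml-to-mime.

```python
import base64
from email.header import decode_header, make_header
from email.utils import formataddr, parseaddr


def decode_once(value):
    """Decode the RFC 2047 encoded words in a header value, one pass only."""
    return str(make_header(decode_header(value)))


original = "Jürgen Müller"    # hypothetical sender name
address = "jm@example.org"    # hypothetical address

# Incoming From field: encoded display name followed by the address.
payload = base64.b64encode(original.encode("utf-8")).decode("ascii")
incoming = "=?UTF-8?B?%s?= <%s>" % (payload, address)

# Draft creation: decode the display name so the draft holds real text
# instead of the raw encoded word.
raw_name, parsed_address = parseaddr(incoming)
draft_name = decode_once(raw_name)

# Send time: the non-ASCII display name is encoded exactly once.
outgoing = formataddr((draft_name, parsed_address))

# Receiving side: one decode recovers the original name.
sent_name, sent_address = parseaddr(outgoing)
received_name = decode_once(sent_name)

print("Draft shows the readable name:", draft_name == original)
print("Outgoing header is pure ASCII:", outgoing.isascii())
print("Recipient recovers the name:", received_name == original)
```

Because the draft holds decoded text, the send-time encoder sees real non-ASCII characters and encodes them once, instead of wrapping an existing encoded word.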



Mike Kupfer
2017-01-27 01:53:19 UTC
- **status**: unread --> open
- **Comment**:

I have seen this issue, too, and Jeff's proposed fix makes the problem go away for me.

The concern about accidentally decoding text that should be left alone is legitimate, I think. But I think that applying this change (replace the call to #'mh-decode-message-subject with a call to #'mh-decode-message-header) would be an improvement over the current state of affairs.


