Numbered SGML entities in header addresses

felixs besteck455 at gmail.com
Tue Apr 9 11:11:59 UTC 2019


On Mon, Apr 08, 2019 at 08:40:09PM -0700, Ian Zimmerman wrote:
> On 2019-04-07 23:13, felixs wrote:
> 
> > > From: "Foo Bariì" <foo-baric at gmail.com>
> > > 
> > > where the entity refers to the character U0107 in Unicode code point
> > > space.  I would like to automatically see the correct glyph at least
> > > when it is in one of the visible headers.  Is there a display filtering
> > > feature in mutt that would allow me to do that (I don't mind if it
> > > requires a bit of configuration)?
> > 
> > And if you add
> > 
> > set charset="utf-8"
> > 
> > to your muttrc conf file?
> 
> That doesn't look at all plausible to me.  For one thing, UTF-8 is the
> systemwide default, meaning it ends up in my LANG and LC_ variables.  I
> am as sure as I can be about anything that mutt picks up those if the
> "charset" mutt variable is not set.

Yes, you are right: mutt reads the LC_* variables and is usually able to
display characters in UTF-8 if they select it. But in case of problems,
which I thought you might have, it may help to set charset explicitly.

> For another thing, why should it help?  Those ASCII characters are
> perfectly valid in the name part of a From header, and normally I expect
> mutt to show them to me as they are.  It is only in this case where some
> HTML-addled MUA decided to use them together to encode an ISO 8859-2
> character (_not_ UTF-8 or anything related) that I want a way to see the
> character really intended by the sender.

You asked for a "display filter" setting in mutt that would let you see
the "real" character, which is a character in the Unicode Character
Database. Even if the message you received was encoded in ISO-8859-2,
mutt would convert it to your display charset (usually UTF-8) when
opening the message, provided your LC_* variables are set correctly. Or,
as mentioned above, set charset explicitly to be sure. Please note that I
have not reproduced your issue, so I do not actually know why this
happens in your case. Do you have any more information?

To find out by other means what the intended character was, you could
use Python's chr() function. Since chr() takes an integer, you first have
to convert the hexadecimal code point into one.

chr(int('0x0107', base=16))
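
The same idea can be applied to a whole header value rather than a single
code point. What follows is only a sketch, assuming the name part contains
a numeric character reference such as &#263; or &#x107; (I have not seen
the raw header, so that exact form is an assumption). The function name
decode_entities and the sample address are made up for illustration; only
html.unescape() from the Python standard library is real.

```python
import html


def decode_entities(header_value: str) -> str:
    """Replace numeric (and named) character references with the
    characters they stand for, e.g. '&#263;' or '&#x107;' -> U+0107."""
    return html.unescape(header_value)


# Hypothetical header value; the entity form is an assumption.
sample = '"Foo Bari&#263;" <foo-baric@example.com>'
print(decode_entities(sample))
```

This only tells you what the sender's MUA meant. It does not change what
mutt shows in the index or pager.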

Maybe I can find some other way using just mutt's configuration options.
Patience, please. :-)

> Nevertheless, your suggestion should do no harm, so I'll try it and
> report back.

Ok.

Cheers,

felixs

