date: Wed, 11 Jun 2008 00:53:00 -0700,
group: microsoft.public.inetserver.iis.smtp_nntp
CDOSYS and Unicode Friendly Names
We are MSDN Universal Subscribers (munter@microsoft.com)
Thought I had a good handle on encoding Unicode emails using different
charsets (UTF-8, Big5, etc) and different encodings (quoted-printable,
base64). I can for instance create emails with Unicode characters in the text
or HTML body, as well as subject lines with Unicode, etc.
However, putting Unicode into the "friendly name" of the From property
(displayed sender) seems to be unsupported. This sample code is C# accessing
CDOSYS via COM interop.
CDO.Message msg = new CDO.MessageClass();
msg.From = "Testing 某事 ";
msg.Subject = "Something 某事";
msg.To = "jane.fonda@microsoft.com";
msg.BodyPart.Charset = "UTF-8"; // etc.
The subject appears to be correctly encoded when I inspect the resulting
email:
Subject: =?UTF-8?Q?Something_=E6=9F=90=E4=BA=8B?=
However, the From: field just contains ?? characters, i.e.
From: "Testing ??"
If I manually edit the email and replace this with
From: "=?UTF-8?Q?Testing_=E6=9F=90=E4=BA=8B?="
it turns up in my Inbox correctly.
Is there some reason why CDO doesn't encode the From header field
automatically? Do I have to do this sort of Q or B encoding myself or is
there some way to force CDO to do this?
Some references:
RFC 1522
As a replacement for a "word" entity within a "phrase", for example, one
that precedes an address in a From, To, or Cc header. The ABNF definition
for phrase from RFC 822 thus becomes:
phrase = 1*(encoded-word / word)
In this case the set of characters that may be used in a "Q"-encoded
encoded-word is restricted to: <upper and lower case ASCII letters, decimal
digits, "!", "*", "+", "-", "/", "=", and "_" (underscore, ASCII 95.)>. An
encoded-word that appears within a "phrase" MUST be separated from any
adjacent "word", "text" or "special" by linear-white-space.
Not sure what this means.
Setting Message Header Fields from
http://msdn.microsoft.com/en-us/library/ms988660(EXCHG.65).aspx
The most common message header fields are exposed as properties on the
IMessage interface. However, all header fields are accessible in the
IMessage.Fields collection. The header fields you can set by using this
collection reside in the urn:schemas:mailheader: and urn:schemas:httpmail:
namespaces.
Note: Make sure to encode non-US-ASCII characters when using the
urn:schemas:mailheader: namespace using the mechanism defined in Request for
Comments (RFC) 1522.
date: Wed, 11 Jun 2008 00:53:00 -0700
author: Muntz am
Re: CDOSYS and Unicode Friendly Names
> However, putting Unicode into the "friendly name" of the From property
> (displayed sender) seems to be unsupported.
Well, if you are quoting the 'display-name', it becomes a
'quoted-string'. You can't put an 'encoded-word' in a 'quoted-string'
[RFC 2047 following 5(3)].
So that's *one* thing you know for sure is non-RFC.
However, in theory you no longer need to escape spaces by wrapping the
display-name in DQ (because you can encode the spaces instead). So
that still leaves the question of whether an UNquoted display-name can
be an encoded-word. I happen to believe this is valid, although not
encouraged. However, its support by well-known mail readers is spotty.
Be careful of the "Be liberal in what you accept..." concept when
testing MUAs' ability to read your encoded headers. Some MUAs will
render blatantly non-RFC encoding, which is confusing. You should test
multiple vendors to decide whether certain encoding is "RFC but only
occasionally supported," "non-RFC but industry-standard" or "non-RFC
and occasionally supported."
I would also advise that you aim for B type for widest compatibility.
In this very specific case of using CDOSYS to construct the messages,
it does not surprise me that you would have additional problems during
message creation, since CDOSYS has always been shady about when you
are setting headers and when you are setting transport information.
For example, you are using the From property, which is more like a
macro that sets both header (RFC 822) and transport (RFC 821)
variables simultaneously. That could lead to data being cleansed of
encoding as it is "rounded down" to 821 compatibility. Try using
mailheader:from and mailheader:sender separately, with only the
addr-spec in the sender.
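
As a rough sketch of what an unquoted, B-encoded display-name looks like
(the helper name EncodeFrom and the address sender@example.com are made up
for illustration, and this has not been run against CDOSYS itself), something
along these lines builds the header value by hand:

```csharp
using System;
using System.Text;

static class EncodedWordSketch
{
    // Builds "=?UTF-8?B?...?= <addr-spec>" for a From header value.
    // Assumption: the display name is short enough that the result fits in a
    // single encoded-word (RFC 2047 limits each encoded-word to 75 characters).
    public static string EncodeFrom(string displayName, string addrSpec)
    {
        string b64 = Convert.ToBase64String(Encoding.UTF8.GetBytes(displayName));
        string encodedWord = "=?UTF-8?B?" + b64 + "?=";
        if (encodedWord.Length > 75)
            throw new ArgumentException("Display name must be split into several encoded-words.");

        // No surrounding double quotes: an encoded-word is not allowed
        // inside a quoted-string. A space separates it from the address.
        return encodedWord + " <" + addrSpec + ">";
    }

    static void Main()
    {
        // "\u67D0\u4E8B" is the two-character test string from the original post.
        Console.WriteLine(EncodeFrom("Testing \u67D0\u4E8B", "sender@example.com"));
    }
}
```

The resulting string is what you would then try assigning to the
urn:schemas:mailheader:from field instead of the From property.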
--Sandy
------------------------------------
Sanford Whiteman, Chief Technologist
Broadleaf Systems, a division of
Cypress Integrated Systems, Inc.
------------------------------------
date: Wed, 11 Jun 2008 22:10:45 -0400
author: Sanford Whiteman
Re: CDOSYS and Unicode Friendly Names
Thanks for your thoughts Sanford.
Yes, I was certainly suspicious that using Outlook as my
I'm starting to think about using the System.Net.Mail classes in .NET 2.0 as
a replacement for my current interop code. They expose a range of encoding
options for a range of MailMessage properties:
MailMessage.From is a MailAddress, which supports the address and display
name. The constructor for this also allows you to specify an encoding
(actually the charset).
MailAddress from = new MailAddress("jane.doe@hotmail.com", "Jane Doe", System.Text.Encoding.UTF8);
I haven't had a chance to play with this yet. I would be interested to know
whether this uses B or Q encoding, and again, how RFC compliant this
implementation is.
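
One way to find out without sending anything might be to point SmtpClient at
a pickup directory and read back the .eml file it writes. A minimal sketch,
assuming a scratch folder under the temp path (the class and method names
here are placeholders of mine, and I make no claim about what the output
will look like):

```csharp
using System;
using System.IO;
using System.Net.Mail;
using System.Text;

static class PickupInspect
{
    // Writes one message to a fresh scratch folder and returns the folder path.
    public static string WriteToPickup()
    {
        string dir = Path.Combine(Path.GetTempPath(), "pickup-" + Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(dir);

        // "\u67D0\u4E8B" is the same two-character test string as in the CDO sample.
        MailAddress from = new MailAddress("jane.doe@hotmail.com", "Jane \u67D0\u4E8B", Encoding.UTF8);
        MailMessage msg = new MailMessage(from, new MailAddress("jane.fonda@microsoft.com"));
        msg.Subject = "Something \u67D0\u4E8B";
        msg.SubjectEncoding = Encoding.UTF8;
        msg.Body = "test";

        // Nothing goes over the network: the message is saved as an .eml file.
        SmtpClient client = new SmtpClient();
        client.DeliveryMethod = SmtpDeliveryMethod.SpecifiedPickupDirectory;
        client.PickupDirectoryLocation = dir;
        client.Send(msg);
        return dir;
    }

    static void Main()
    {
        foreach (string file in Directory.GetFiles(WriteToPickup(), "*.eml"))
        {
            foreach (string line in File.ReadAllLines(file))
            {
                // Long headers may be folded; this only prints the first line of each.
                if (line.StartsWith("From:", StringComparison.OrdinalIgnoreCase) ||
                    line.StartsWith("Subject:", StringComparison.OrdinalIgnoreCase))
                {
                    Console.WriteLine(line);
                }
            }
        }
    }
}
```

The printed From: and Subject: lines should show directly whether B or Q
encoding was chosen and whether the display name ends up quoted.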
The possible snag though is that I am accessing some of the lower level CDO
fields to manipulate attached files and images and I wouldn't want to lose
this capability.
For instance:
foreach (IBodyPart bp in msg.Attachments)
{
    // File name of the attachment
    string name = bp.Fields["urn:schemas:httpmail:attachmentfilename"].Value.ToString();
    // etc.
    // Content-Disposition header of the attachment
    string disp = bp.Fields["urn:schemas:mailheader:content-disposition"].Value.ToString();
    // etc.
}
Might be stuck between a rock and a hard place...
But I'll also try your separate treatment of sender and from as well.
date: Thu, 12 Jun 2008 00:33:01 -0700
author: Muntz am