RTF problem with ZWNJ and ZWJ followed by Space
Originally posted to the newsgroup microsoft.word.public.conversions. This
group now appears to be defunct, as it is not in the list of groups at
http://www.microsoft.com/communities/newsgroups/en-us/default.aspx
When entering text in a Word (2007 / 2003) document, and entering a Unicode
Zero Width Non-Joiner / ZWNJ (0x200C) or a Zero Width Joiner / ZWJ (0x200D)
followed by a space, the space disappears after saving the document to RTF
and then re-opening it in Word.
Using these characters is sometimes necessary in Complex script languages,
such as Persian (which uses the ZWNJ to achieve the terminal form of a
character in the middle of a word) and Malayalam (which uses ZWJ in typing
the Chillus prior to Unicode 5.1).
This doesn't happen every time. If the ZWNJ / ZWJ is in its own RTF
grouping ("{\keywords text}") then the space is retained. If the formatting
of the ZWNJ / ZWJ is the same as the surrounding text, then the space
following the ZWNJ / ZWJ will be lost.
The loss of the space apparently happens because of a bug in the way the RTF
is written. RTF keywords are terminated by either (1) another \ character or
(2) a space. So when the \zwnj and \zwj keywords are written into a string
where there is no space, the RTF is written as follows:
...\'ed\'ed\zwnj\'ed\'ed...
This is fine. However, when there is a space, it is written like this:
...\'ed\'ed\zwnj \'ed\'ed...
This is not fine, because the space after \zwnj simply indicates the end of
the keyword. To indicate the need for a space, we need an additional space in
the RTF string.
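
To illustrate the point, here is a minimal Python sketch of how a reader
treats the space after a keyword. It is not Word's parser: the function name
decode_fragment is hypothetical, the cp1256 (Arabic) code page for the \'ed
bytes is an assumption, and only \'xx escapes, \zwnj and \zwj are handled.

```python
import re

# A keyword is a backslash, letters, an optional numeric parameter, and
# one optional space. That space is the delimiter and is consumed.
CONTROL_WORD = re.compile(r"\\([a-zA-Z]+)(-?\d+)? ?")
HEX_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")

# Only the two keywords discussed here are mapped to characters.
SPECIAL = {"zwnj": "\u200c", "zwj": "\u200d"}


def decode_fragment(rtf, codepage="cp1256"):
    """Decode a tiny RTF fragment to text (code page is an assumption)."""
    out = []
    i = 0
    while i < len(rtf):
        m = HEX_ESCAPE.match(rtf, i)
        if m:
            byte = bytes([int(m.group(1), 16)])
            out.append(byte.decode(codepage, errors="replace"))
            i = m.end()
            continue
        m = CONTROL_WORD.match(rtf, i)
        if m:
            # Unknown keywords produce no text in this sketch.
            out.append(SPECIAL.get(m.group(1), ""))
            i = m.end()
            continue
        out.append(rtf[i])  # ordinary character, including a real space
        i += 1
    return "".join(out)


one_space = decode_fragment(r"\'ed\'ed\zwnj \'ed\'ed")
two_spaces = decode_fragment(r"\'ed\'ed\zwnj  \'ed\'ed")
print(one_space.count(" "), two_spaces.count(" "))
```

With a single space after \zwnj the decoded text contains no space at all;
only the second space survives as text.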
I am interested in getting this bug confirmed, and then reporting this to
the Word programming team.
How do I go about that?
Paul Willies
====================================================
Hi Paul,
This sounds like one that came up once before, but to confirm that it's a
problem: what are the steps to reproduce this using one of the MS Office
supplied multilingual fonts, and what languages are enabled with the Office
Language Settings Tool? If you have two scenarios, one where it works the way
you're trying to go and one that doesn't, please provide both sets of repro
steps. Also, do you have a page/blog, etc. that you can provide a link to for
good/bad example documents?
Bob Buckland ?:-)
MS Office System Products MVP
====================================================
The steps to repeat are quite simple:
1. Open a blank document, select the Arabic keyboard, press Right Ctrl+Shift
to set RTL
2. Type on the keyboard 'a', run the macro below called "InsertZWNJ", type
space, type 'a' again
3. Save as a Word document for reference
4. Save to RTF. Close, and re-open. The space remains.
5. Save to RTF again with a different name. Close and re-open. The space is
gone.
6. Examine the RTF - the first version RTF document has \zwnj and space in
their own separate RTF groups - the second version has them running together
with "\zwnj " in the RTF, and the space is lost.
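
For step 6, a rough way to check the RTF source without reading it by eye is
sketched below in Python. It is only a heuristic based on the samples above:
since the no-space case is written as "\zwnj\'ed" with no delimiter at all,
a "\zwnj " or "\zwj " followed directly by a backslash is treated as a likely
lost space. The function name find_suspect_joiners is hypothetical.

```python
import re

# \zwnj or \zwj, exactly one space, then the backslash of the next token.
# The group form ({\zwnj}{ }) and the two-space form do not match.
SUSPECT = re.compile(r"\\(zwnj|zwj) (?=\\)")


def find_suspect_joiners(rtf_source):
    """Return (offset, keyword) pairs for each suspicious joiner."""
    return [(m.start(), m.group(1)) for m in SUSPECT.finditer(rtf_source)]


sample = r"...\'ed\'ed\zwnj \'ed\'ed..."
for offset, keyword in find_suspect_joiners(sample):
    print(f"possible lost space after \\{keyword} at offset {offset}")
```

A match does not prove a space was lost, so compare any hits against the
reference Word document from step 3.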
To download my sample documents, go to
http://pswillies.blogspot.com/2008/05/word-2003-2007-issue-rtf-problem-with.html
date: Tue, 13 May 2008 01:04:01 -0700
author: PaulWill tspam