date: Mon, 6 Oct 2008 14:02:13 -0700 (PDT),    group: microsoft.public.word.vba.general


determine document language   
Hi All,

I need code that can check which language a document was written in.
If you had a noise list of frequently used words for each language
(for English: the, at, for, with, him, etc.), you could try to match
these against the paragraphs in the text. Once a certain number of
matches is reached, the code could tell whether the doc is in English,
French, and so on.
Does anyone have an idea or some sample code?

Regards
Marco
date: Mon, 6 Oct 2008 14:02:13 -0700 (PDT)   author:   Co
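
A minimal sketch of the noise-list idea described in the question, not code from any poster in this thread. The function name GuessLanguageByStopwords, the MIN_HITS threshold, and the three tiny word lists are illustrative assumptions; real lists would need to be much longer, and words shared between languages will blur the result.

```vba
' Hypothetical helper: guesses a language by counting stopword hits.
' The word lists below are tiny illustrative samples, not Word's own lists.
Function GuessLanguageByStopwords(ByVal sText As String) As String
    Const MIN_HITS As Long = 3          ' assumed threshold, tune as needed
    Dim vNames As Variant, vLists As Variant
    Dim vTokens As Variant, vStop As Variant
    Dim dict As Object
    Dim i As Long, j As Long, nHits As Long, nBest As Long
    Dim sClean As String, sPunct As String

    vNames = Array("English", "French", "German")
    vLists = Array("the and of to with for at is", _
                   "le la les et des un une est", _
                   "der die das und ist nicht ein mit")

    ' Lower-case the text and turn common punctuation and breaks into spaces
    sClean = LCase$(sText)
    sPunct = ".,;:!?()""'" & vbCr & vbLf & vbTab
    For i = 1 To Len(sPunct)
        sClean = Replace(sClean, Mid$(sPunct, i, 1), " ")
    Next i
    vTokens = Split(sClean, " ")

    GuessLanguageByStopwords = "Unknown"
    For i = LBound(vNames) To UBound(vNames)
        ' Load this language's stopwords into a lookup table
        Set dict = CreateObject("Scripting.Dictionary")
        vStop = Split(vLists(i), " ")
        For j = LBound(vStop) To UBound(vStop)
            dict(vStop(j)) = True
        Next j

        ' Count how many tokens of the text are stopwords of this language
        nHits = 0
        For j = LBound(vTokens) To UBound(vTokens)
            If dict.Exists(vTokens(j)) Then nHits = nHits + 1
        Next j

        ' Keep the language with the most hits, if it clears the threshold
        If nHits >= MIN_HITS And nHits > nBest Then
            nBest = nHits
            GuessLanguageByStopwords = vNames(i)
        End If
    Next i
End Function
```

It could be called as GuessLanguageByStopwords(ActiveDocument.Content.Text); for long documents, passing only the first few paragraphs would probably be enough.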

Re: determine document language   
Hi Marco,

Check out the DetectLanguage Method in Word's VBA help file. There's even a working example there of how the method can be used.

-- 
Cheers
macropod
[MVP - Microsoft Word]
date: Tue, 7 Oct 2008 10:51:36 +1100   author:   macropod lid
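
A minimal sketch of how the DetectLanguage method mentioned above might be used, loosely following the pattern of the help-file example rather than reproducing it. The procedure name ReportDetectedLanguage is an assumption. Note that DetectLanguage changes the language formatting applied to the text, so it may be safer to try this on a copy of the document.

```vba
' Hypothetical example: let Word detect the language, then report the result.
Sub ReportDetectedLanguage()
    With ActiveDocument
        ' Text already marked as checked is skipped by DetectLanguage,
        ' so reset the flag to force a fresh pass
        .LanguageDetected = False
        .DetectLanguage

        If .Content.LanguageID = wdUndefined Then
            ' wdUndefined means the text does not share a single language ID
            MsgBox "More than one language is applied to the text."
        ElseIf .Content.LanguageID = wdEnglishUS Then
            MsgBox "This is a U.S. English document."
        Else
            MsgBox "Language ID applied to the document: " & .Content.LanguageID
        End If
    End With
End Sub
```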

Re: determine document language   
... and the method you want to implement in your macro (a list of "stopwords") 
is pretty much what Word uses out of the box, if you allow it to 
automatically detect the language. The difference is that Word does it 
locally rather than at the document level.
Somewhere in a Resource Kit, I've seen the lists of stopwords that 
Word employs for the different languages.

Regards,
Klaus
date: Tue, 7 Oct 2008 03:30:14 +0200   author:   Klaus Linke

Re: determine document language   
Hi Macropod,

> Check out the DetectLanguage Method in Word's VBA help file. There's even a working example there of how the method can be used.
>
The one problem with this (vs. using the LanguageID property of a Range or Style definition) is that it requires 
that the language be installed in the Windows Control Panel/Regional Settings AND recognized for Office. This works 
fine if you can be sure there will be a limited number of languages, but it will cause problems if Word can't find 
the language on the system.

Plus, don't forget that this actually works on a Range, and the document may contain multiple languages if what the 
user types isn't something in the dictionary, or is misspelled, or if the user pastes something (from the Internet, 
for example).

Personally, I think this method should *not* be used to determine in which language a document was written. But 
there are developers who use it in this manner.

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Jun 17 2005)
http://www.word.mvps.org

This reply is posted in the Newsgroup; please post any follow-up question or reply in the newsgroup and not by e-mail :-)
date: Tue, 07 Oct 2008 18:29:30 +0200   author:   Cindy M.

Re: determine document language   
Cindy,

I appreciate your comments.
How would you solve such a problem?

Marco
date: Tue, 7 Oct 2008 10:39:43 -0700 (PDT)   author:   Co

Re: determine document language   
> Cindy,
>
> appreciate your comments.
> How would you solve such a problem?


Not Cindy, but start with ActiveDocument.Content.LanguageID?
If that's wdUndefined (mixed languages), look further to see what language 
is applied to most of the text.

In my experience, the LanguageID is applied properly most of the time.
If you're sure you have docs in which it isn't, you could use the method you 
proposed originally... Maybe the stopword list I mentioned would come in 
handy.

Regards,
Klaus
date: Wed, 8 Oct 2008 12:18:09 +0200   author:   Klaus Linke
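
A minimal sketch of the two steps suggested above: check the LanguageID of the whole document first, and if it comes back as wdUndefined, tally which language is applied to most of the text. The function name DominantLanguageID and the paragraph-by-paragraph tally are assumptions for illustration; paragraphs that are themselves mixed report wdUndefined and are left out of the comparison.

```vba
' Hypothetical helper: returns the language ID applied to most of the text,
' or wdUndefined if no single language can be determined this way.
Function DominantLanguageID(ByVal doc As Document) As Long
    Dim para As Paragraph
    Dim dict As Object
    Dim vKey As Variant
    Dim nBest As Long

    ' Step 1: if one language is applied throughout, we are done
    DominantLanguageID = doc.Content.LanguageID
    If DominantLanguageID <> wdUndefined Then Exit Function

    ' Step 2: add up the character count per language ID, by paragraph
    Set dict = CreateObject("Scripting.Dictionary")
    For Each para In doc.Paragraphs
        vKey = para.Range.LanguageID
        dict(vKey) = dict(vKey) + Len(para.Range.Text)
    Next para

    ' Pick the language ID with the largest total, ignoring mixed paragraphs
    For Each vKey In dict.Keys
        If vKey <> wdUndefined And dict(vKey) > nBest Then
            nBest = dict(vKey)
            DominantLanguageID = vKey
        End If
    Next vKey
End Function
```

Walking every paragraph can be slow on long documents, and the result only reflects the language formatting that was applied, not the language the text is actually written in.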

Re: determine document language   
Klaus,

Is there a way to retrieve this Word stopword list for, say, English,
French, German, Dutch, and Italian?

Marco
date: Wed, 8 Oct 2008 04:27:18 -0700 (PDT)   author:   Co

Re: determine document language   
Hi Klaus,

> but start with ActiveDocument.Content.LanguageID?
> If that's wdUndefined (mixed languages), look further to see what language 
> is applied to most of the text.
>
Agreed.

If we're talking 2003 or 2007, I might then pick up XML property and parse 
through that, rather than "walk" the object model. 

Cindy Meister
date: Wed, 08 Oct 2008 13:41:06 +0200   author:   Cindy M.

Re: determine document language   
My memory probably deceived me: the stopword list in the ORK (Office 
Resource Kit) is a list of words that aren't indexed:
http://www.microsoft.com/downloads/details.aspx?FamilyID=74B29874-4F1E-4909-8CB3-6473CEC6EE0C&displaylang=en

Still, I'm pretty sure I saw a list of the words used by language 
autodetection...

If I find it, I'll post it.

Klaus
date: Wed, 8 Oct 2008 14:56:42 +0200   author:   Klaus Linke

Re: determine document language   
"Cindy M."  wrote:
> Hi Klaus,
>
>> but start with ActiveDocument.Content.LanguageID?
>> If that's wdUndefined (mixed languages), look further to see what
>> language is applied to most of the text.
>>
> Agreed.
>
> If we're talking 2003 or 2007, I might then pick up XML property and parse
> through that, rather than "walk" the object model.

True... Though if you have a list of languages you're interested in, getting 
the number of characters formatted with a certain language should be pretty 
quick using Find with

  Selection.Find.LanguageID = wdEnglishUS

and

With Selection
    While .Find.Execute
        nLang = nLang + .End - .Start
        .Collapse wdCollapseEnd
    Wend
End With

Regards,
Klaus
date: Wed, 8 Oct 2008 15:04:09 +0200   author:   Klaus Linke
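
A minimal sketch that wraps the Find idea above into a reusable function, using a Range instead of the Selection so the cursor does not move. The names CharsInLanguage and CompareLanguages are assumptions, and the two Exit Do guards are defensive additions against a search that stops advancing at the end of the document.

```vba
' Hypothetical helper: counts the characters formatted with a given language.
Function CharsInLanguage(ByVal doc As Document, _
                         ByVal langId As WdLanguageID) As Long
    Dim rng As Range
    Dim nDocEnd As Long

    nDocEnd = doc.Content.End
    Set rng = doc.Content
    With rng.Find
        .ClearFormatting
        .Text = ""                 ' search for formatting only
        .Format = True
        .Forward = True
        .Wrap = wdFindStop         ' stop at the end of the document
        .LanguageID = langId
    End With

    Do While rng.Find.Execute
        If rng.End = rng.Start Then Exit Do       ' guard: empty match
        CharsInLanguage = CharsInLanguage + (rng.End - rng.Start)
        If rng.End >= nDocEnd Then Exit Do        ' guard: reached the end
        rng.Collapse wdCollapseEnd                ' continue after the match
    Loop
End Function

' Prints one line per language ID with its character count
Sub CompareLanguages()
    Dim vLangs As Variant, v As Variant
    vLangs = Array(wdEnglishUS, wdFrench, wdGerman, wdDutch, wdItalian)
    For Each v In vLangs
        Debug.Print v, CharsInLanguage(ActiveDocument, v)
    Next v
End Sub
```

Regional variants such as wdEnglishUK have their own IDs, so they would need to be added to the list and summed if they should count as the same language.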

Re: determine document language   
Cindy,

What exactly do you mean by this:
"I might then pick up XML property and parse
through that, rather than "walk" the object model"

Could you give me an example?

Marco
date: Wed, 8 Oct 2008 09:10:05 -0700 (PDT)   author:   Co
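
The thread has no reply to this request. One possible reading of the suggestion, sketched under stated assumptions and not Cindy's actual code: read the document's WordprocessingML through the Range.XML property and tally the language codes found in w:lang elements. The procedure name TallyLangTagsInXml and the regular expression are assumptions. This counts tag occurrences, not amounts of text, and runs that inherit their language from a style carry no w:lang element of their own, so the result is only a rough signal.

```vba
' Hypothetical example: tally w:lang values found in the document's XML.
Sub TallyLangTagsInXml()
    Dim re As Object, matches As Object, m As Object
    Dim dict As Object, vKey As Variant
    Dim sXml As String, sLang As String

    ' Full WordprocessingML for the document body (Word 2003 or later)
    sXml = ActiveDocument.Content.XML(False)

    ' Capture the w:val attribute of every w:lang element
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = True
    re.Pattern = "<w:lang\b[^>]*\sw:val=""([^""]+)"""

    ' Count occurrences per language code, ignoring case
    Set dict = CreateObject("Scripting.Dictionary")
    Set matches = re.Execute(sXml)
    For Each m In matches
        sLang = UCase$(m.SubMatches(0))
        dict(sLang) = dict(sLang) + 1
    Next m

    ' List each language code with its count in the Immediate window
    For Each vKey In dict.Keys
        Debug.Print vKey, dict(vKey)
    Next vKey
End Sub
```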
