date: Wed, 11 Jun 2008 22:44:48 -0400,
group: microsoft.public.xml
LINQ to XML
How would I use LINQ to XML (vb .net) to extract just the text (not the
markup) of the following:
<description><![CDATA[<p><a
href="http://www.msnbc.msn.com/id/25085248/"><img align="left" border="0"
src="http://msnbcmedia.msn.com/j/msnbc/Components/ArtAndPhoto-Fronts/TECH/080610/g-080610-tec-tencent-4x3-10a.thumb.jpg"
alt="" style="margin:0 5px 5px 0" /></a>In an interview, David Hajdu, author
of the book "The Ten-Cent Plague: The Great Comic-Book Scare and How it
Changed America," discusses the rise and fall of the rise and fall of
comics, and how their unwelcome reception compares to today's criticism of
violent video games.</p><br clear="all" />]]></description>
In other words, I'm only interested in:
In an interview, David Hajdu, author of the book "The Ten-Cent Plague: The
Great Comic-Book Scare and How it Changed America," discusses the rise and
fall of the rise and fall of comics, and how their unwelcome reception
compares to today's criticism of violent video games.
Thanks.
date: Wed, 11 Jun 2008 22:44:48 -0400
author: Scott M. am
Re: LINQ to XML
Given that the data is in a CDATA section (not a good way to store mark-up),
it is treated as text. To turn it into XML, I think you'd have to extract
the whole of it, reload it into an XElement/XDocument, and then extract
the text.
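
The suggestion above could be sketched roughly as follows in VB.NET. This is only a minimal illustration, not tested against the real feed: the module name, the variable names, and the shortened sample markup are assumptions made up for the example, and it only works if the CDATA content happens to be well-formed XML once it is wrapped in a single root element.

```vb
Imports System
Imports System.Xml.Linq

Module CDataTextSketch
    Sub Main()
        ' Shortened, hypothetical stand-in for the <description> element
        ' from the question; the markup lives inside a CDATA section.
        Dim description As New XElement("description", _
            New XCData("<p><a href=""http://www.msnbc.msn.com/id/25085248/""></a>" & _
                       "In an interview, David Hajdu discusses comics.</p>" & _
                       "<br clear=""all"" />"))

        ' The CDATA content comes back as one plain string, markup included.
        Dim rawMarkup As String = description.Value

        ' Wrap the string in a single root element so that it parses as
        ' one well-formed XML fragment.
        Dim fragment As XElement = XElement.Parse("<data>" & rawMarkup & "</data>")

        ' XElement.Value concatenates the descendant text nodes only,
        ' so the tags and attributes are left out.
        Console.WriteLine(fragment.Value)
    End Sub
End Module
```

The "<data>" wrapper is there because the CDATA content has two top-level elements (the p and the br), and XElement.Parse expects exactly one root.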
--
Joe Fawcett (MVP - XML)
http://joe.fawcett.name
"Scott M." <smar@nospam.nospam> wrote in message
news:OGAElZDzIHA.6096@TK2MSFTNGP06.phx.gbl...
> How would I use LINQ to XML (vb .net) to extract just the text (not the
> markup) of the following:
>
> <description><![CDATA[<p><a
> href="http://www.msnbc.msn.com/id/25085248/"><img align="left" border="0"
> src="http://msnbcmedia.msn.com/j/msnbc/Components/ArtAndPhoto-Fronts/TECH/080610/g-080610-tec-tencent-4x3-10a.thumb.jpg"
> alt="" style="margin:0 5px 5px 0" /></a>In an interview, David Hajdu,
> author of the book "The Ten-Cent Plague: The Great Comic-Book Scare and
> How it Changed America," discusses the rise and fall of the rise and fall
> of comics, and how their unwelcome reception compares to today's criticism
> of violent video games.</p><br clear="all" />]]></description>
>
> In other words, I'm only interested in:
>
> In an interview, David Hajdu, author of the book "The Ten-Cent Plague: The
> Great Comic-Book Scare and How it Changed America," discusses the rise and
> fall of the rise and fall of comics, and how their unwelcome reception
> compares to today's criticism of violent video games.
>
> Thanks.
>
date: Thu, 12 Jun 2008 08:34:36 +0100
author: Joe Fawcett am
Re: LINQ to XML
Since when is a CDATA section not a good way to store markup?
"Joe Fawcett" <joefawcett@newsgroup.nospam> wrote in message
news:%23LehF7FzIHA.2220@TK2MSFTNGP06.phx.gbl...
> Given that the data is in a CDATA section (not a good way to store mark-up),
> it is treated as text. To turn it into XML, I think you'd have to extract
> the whole of it, reload it into an XElement/XDocument, and then extract
> the text.
>
>
> --
>
> Joe Fawcett (MVP - XML)
>
> http://joe.fawcett.name
>
> "Scott M." <smar@nospam.nospam> wrote in message
> news:OGAElZDzIHA.6096@TK2MSFTNGP06.phx.gbl...
>> How would I use LINQ to XML (vb .net) to extract just the text (not the
>> markup) of the following:
>>
>> <description><![CDATA[<p><a
>> href="http://www.msnbc.msn.com/id/25085248/"><img align="left" border="0"
>> src="http://msnbcmedia.msn.com/j/msnbc/Components/ArtAndPhoto-Fronts/TECH/080610/g-080610-tec-tencent-4x3-10a.thumb.jpg"
>> alt="" style="margin:0 5px 5px 0" /></a>In an interview, David Hajdu,
>> author of the book "The Ten-Cent Plague: The Great Comic-Book Scare and
>> How it Changed America," discusses the rise and fall of the rise and
>> fall of comics, and how their unwelcome reception compares to today's
>> criticism of violent video games.</p><br clear="all" />]]></description>
>>
>> In other words, I'm only interested in:
>>
>> In an interview, David Hajdu, author of the book "The Ten-Cent Plague:
>> The Great Comic-Book Scare and How it Changed America," discusses the
>> rise and fall of the rise and fall of comics, and how their unwelcome
>> reception compares to today's criticism of violent video games.
>>
>> Thanks.
>>
>
>
date: Thu, 12 Jun 2008 10:04:08 -0400
author: Scott M. am
Re: LINQ to XML
First, I tried this:
>>> Dim desc = item.Element("description").value
>>> Dim data = XElement.Parse("<data>" + desc.Value + "</data>")
>>> Console.WriteLine(data.Value)
And that produced the contents of the CDATA section (including markup).
But, now I've got it working as desired using this:
Dim cd As New XCData(item.Element("description").Value)
' Now use System.Xml and the XML DOM to get the text out of the
' CDATA section.
Dim data As New Xml.XmlDocument()
data.LoadXml("<data>" & cd.Value & "</data>")
Console.WriteLine(" {0}", data.InnerText)
Thanks for your help!
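
For what it's worth, the same idea applied to every item in a feed might look something like the sketch below. The feed content, the module name, and the TextOnly helper are all hypothetical and exist only for illustration. Since the markup comes from a third party and may not always be well-formed XML, the helper falls back to returning the raw string when parsing fails; that fallback is an assumption about what would be acceptable, not something established in this thread.

```vb
Imports System
Imports System.Xml
Imports System.Xml.Linq

Module FeedTextSketch
    ' Hypothetical helper: returns only the text of an HTML-like fragment,
    ' or the raw string unchanged if the fragment is not well-formed XML.
    Function TextOnly(ByVal rawMarkup As String) As String
        Try
            Return XElement.Parse("<data>" & rawMarkup & "</data>").Value
        Catch ex As XmlException
            ' XElement.Parse rejects markup such as an unclosed <br>.
            Return rawMarkup
        End Try
    End Function

    Sub Main()
        ' Made-up two-item feed; the second description is deliberately
        ' not well-formed so that the fallback path is exercised.
        Dim feed As XElement = XElement.Parse( _
            "<rss><channel>" & _
            "<item><description><![CDATA[<p>First <b>story</b>.</p>]]></description></item>" & _
            "<item><description><![CDATA[<p>Unclosed <br> tag]]></description></item>" & _
            "</channel></rss>")

        For Each item As XElement In feed.Descendants("item")
            Console.WriteLine(TextOnly(item.Element("description").Value))
        Next
    End Sub
End Module
```

If the feed regularly contains markup that is not well-formed, an HTML-tolerant parser would probably be a better fit than loading it as XML.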
"Martin Honnen" wrote in message
news:e8niBaJzIHA.5816@TK2MSFTNGP02.phx.gbl...
> Scott M. wrote:
>>> Here is how you can do it, using Joe's suggestion:
>>>
>>> XElement desc = XElement.Load(@"file.xml");
>>> XElement data = XElement.Parse("<data>" + desc.Value +
>>> "</data>");
>>> Console.WriteLine(data.Value);
>>
>> After tweaking, I got this code running, but it doesn't strip out the
>> markup, which is what I want.
>
> Can you show us your code? My sample works for me as you described,
> outputting the text without markup.
>
> --
>
> Martin Honnen --- MVP XML
> http://JavaScript.FAQTs.com/
date: Thu, 12 Jun 2008 10:26:57 -0400
author: Scott M. am
Re: LINQ to XML
Well, now that I've found the solution (and it wasn't that complex), I don't
see how this creates any kind of lasting "problem".
"Joe Fawcett" <joefawcett@newsgroup.nospam> wrote in message
news:u96ozFKzIHA.2384@TK2MSFTNGP02.phx.gbl...
> "Scott M." <smar@nospam.nospam> wrote in message
> news:OsuYwtJzIHA.2384@TK2MSFTNGP02.phx.gbl...
>> That's not true. First, this XML is coming from an MSNBC rss feed, so I
>> can't change what they are sending. But, more importantly, CDATA
>> sections are for *any* data that you want an XML parser to ignore the
>> markup of. For maximum flexibility, it makes sense to store this data in
>> a CDATA section when you don't know what the parser will be.
>>
>>
> Yes, but if you store it as CDATA then you have this sort of problem later
> when you do want it to be mark-up.
> I think it's better to store it as XHTML, though obviously that's not
> always possible if you are getting this from a third-party source.
>
>
> --
>
> Joe Fawcett (MVP - XML)
>
> http://joe.fawcett.name
>
>
>
date: Thu, 12 Jun 2008 12:09:30 -0400
author: Scott M. am