Ureader.com  
Microsoft software help and Community
date: Wed, 18 Jun 2008 22:46:22 -0700,    group: microsoft.public.inetsdk.programming.html_objmodel


Creating and querying   
I need to query some HTML, the kind of thing that is often called scraping. 
I am creating a document from a fragment in the clipboard. I have everything 
working except enumerating the elements, which I originally did to verify 
that the HTML is in the document. The problem is that I am not getting the 
first few elements. I am using IHTMLDOMNode::get_firstChild and 
IHTMLDOMNode::get_nextSibling to enumerate the elements.

The beginning of the input HTML is:

<TABLE cellSpacing=0 cellPadding=0 width="100%" bgColor=white border=0>
<TBODY>
<TR>
<TD vAlign=top><FONT size=1><A href="http://www.microsoft.com">Microsoft</A>

The beginning of the output is:

FONT
/TD
TD
A

I don't know where those elements are coming from, but they are not from the 
beginning of the document; the first element should be the table element. 
That is only the beginning of the data, however; much more follows it.

My code is as follows. Is there a reason why I am not getting the siblings 
starting from the table element?

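 MSHTML::IHTMLDocument2Ptr FromDocument;
 MSHTML::IHTMLDOMNodePtr AppendedNode;
 MSHTML::IHTMLDOMNodePtr ChildNode, SavedNode;
 MSHTML::IHTMLElementPtr FromBodyElement;
 MSHTML::IHTMLDOMNodePtr FromBodyNode;
 MSHTML::IHTMLElementPtr FromElement;
 MSHTML::IHTMLDOMNodePtr FromNode;
 std::string Name;
 BSTR bsName = NULL;
 std::ostringstream oss;
 HRESULT hr;
// FromDocument has been initialized using IPersistStreamInit::InitNew
hr = FromDocument->get_body(&FromBodyElement);
// Text (declared elsewhere) presumably holds the clipboard HTML fragment;
// it is parsed by assigning it to the innerHTML of a new DIV
hr = FromDocument->createElement(_bstr_t("Div"), &FromElement);
hr = FromElement->put_innerHTML(_bstr_t(Text.c_str()));
FromBodyNode = FromBodyElement;   // smart-pointer assignment does the QueryInterface
FromNode = FromElement;
hr = FromBodyNode->appendChild(FromNode, &AppendedNode);
hr = AppendedNode->get_firstChild(&ChildNode);
while (hr == S_OK && ChildNode != 0) {
    hr = ChildNode->get_nodeName(&bsName);
    if (FAILED(hr) || bsName == NULL)
        break;
    // _bstr_t(bsName, false) takes ownership, so the BSTR is freed after the copy
    Name = static_cast<const char*>(_bstr_t(bsName, false));
    bsName = NULL;
    oss << Name << '\n';
    SavedNode = ChildNode;   // keep the current node alive; &ChildNode releases it
    hr = SavedNode->get_nextSibling(&ChildNode);
    }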
date: Wed, 18 Jun 2008 22:46:22 -0700   author:   Sam Hobbs _change_social_to_socal

Re: Creating and querying   
I don't need the Div element that I appended to. I am now appending to the 
body element and calling get_firstChild on that. The problem I described in 
my original question still occurs.
date: Thu, 19 Jun 2008 05:52:32 -0700   author:   Sam Hobbs _change_social_to_socal
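
A note on the traversal itself, which may help when checking the output: a loop over get_firstChild and get_nextSibling visits only the direct children of one node, so nested elements such as TBODY, TR and TD are reached only by descending into each child. The sketch below shows the difference on a hypothetical in-memory tree. Node, Tree, directChildren, collectDescendants and buildSample are made-up names for illustration, not MSHTML types, and the tree shape (a DIV wrapping the TABLE from the question, with text nodes left out) is an assumption about how the fragment parses. It is not a diagnosis of the output shown in the question.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node; NOT an MSHTML type.
struct Node {
    std::string name;
    Node* firstChild = nullptr;
    Node* lastChild = nullptr;
    Node* nextSibling = nullptr;
};

// Owns every node so the raw pointers above stay valid.
class Tree {
public:
    Node* create(const std::string& name) {
        nodes_.push_back(std::make_unique<Node>());
        nodes_.back()->name = name;
        return nodes_.back().get();
    }
    // Appends child as the last child of parent.
    static void append(Node* parent, Node* child) {
        assert(parent != nullptr && child != nullptr);
        if (parent->lastChild != nullptr)
            parent->lastChild->nextSibling = child;
        else
            parent->firstChild = child;
        parent->lastChild = child;
    }
private:
    std::vector<std::unique_ptr<Node>> nodes_;
};

// Same shape as the loop in the question: direct children only.
std::vector<std::string> directChildren(const Node* parent) {
    std::vector<std::string> names;
    for (const Node* n = parent->firstChild; n != nullptr; n = n->nextSibling)
        names.push_back(n->name);
    return names;
}

// Depth-first walk: each child is recorded, then its own children are visited.
void collectDescendants(const Node* parent, std::vector<std::string>& names) {
    for (const Node* n = parent->firstChild; n != nullptr; n = n->nextSibling) {
        names.push_back(n->name);
        collectDescendants(n, names);
    }
}

// Assumed shape of the parsed fragment: DIV > TABLE > TBODY > TR > TD > FONT > A.
Node* buildSample(Tree& tree) {
    Node* root = tree.create("DIV");
    Node* parent = root;
    for (const char* name : {"TABLE", "TBODY", "TR", "TD", "FONT", "A"}) {
        Node* child = tree.create(name);
        Tree::append(parent, child);
        parent = child;
    }
    return root;
}
```

Calling directChildren on the node returned by buildSample yields only the TABLE entry, while collectDescendants yields all six nested names in document order. With MSHTML the same recursion would presumably call get_firstChild on each visited node before moving on with get_nextSibling.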
