com.gargoylesoftware.htmlunit.html
Class HTMLParser

java.lang.Object
  extended by com.gargoylesoftware.htmlunit.html.HTMLParser

public final class HTMLParser
extends java.lang.Object

SAX parser implementation that uses the NekoHTML HTMLConfiguration to parse HTML into an HtmlUnit-specific DOM (HU-DOM) tree.
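The SAX-to-tree step described above can be pictured with the following sketch. It swaps in the JDK's Swing HTML parser (`javax.swing.text.html.parser.ParserDelegator`) for NekoHTML, and the `SketchNode` and `EventTreeBuilder` classes are hypothetical stand-ins for HtmlUnit's DOM classes. Each start-tag event creates a node that is attached to the current parent and pushed on a stack, each end-tag event pops back to its matching element, and text becomes a leaf. This illustrates the general technique only, not HtmlUnit's implementation; the Swing parser follows an old HTML 3.2 DTD and may report implied tags.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Minimal tree node; a hypothetical stand-in for HtmlUnit's DomNode. */
final class SketchNode {
    final String name;   // tag name, or "#text" for text nodes
    final String text;   // only set for text nodes
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name, String text) {
        this.name = name;
        this.text = text;
    }

    /** Depth-first search for the first node with the given name. */
    SketchNode find(String tag) {
        if (name.equals(tag)) return this;
        for (SketchNode child : children) {
            SketchNode hit = child.find(tag);
            if (hit != null) return hit;
        }
        return null;
    }

    /** Concatenates the text of all descendant text nodes. */
    String textContent() {
        if (text != null) return text;
        StringBuilder sb = new StringBuilder();
        for (SketchNode child : children) sb.append(child.textContent());
        return sb.toString();
    }
}

/** Turns parser callbacks (SAX-style events) into a SketchNode tree. */
final class EventTreeBuilder extends HTMLEditorKit.ParserCallback {
    private final SketchNode root = new SketchNode("#document", null);
    private final Deque<SketchNode> open = new ArrayDeque<>();

    EventTreeBuilder() {
        open.push(root); // the document node is the initial parent
    }

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        SketchNode element = new SketchNode(t.toString(), null);
        open.peek().children.add(element); // attach to the current parent
        open.push(element);                // the new element becomes the parent
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        String name = t.toString();
        // Iterates from the innermost open element outwards.
        for (SketchNode n : open) {
            if (n.name.equals(name)) {
                // Discard any still-open descendants, then the match itself.
                while (open.pop() != n) {
                    // keep popping
                }
                return;
            }
        }
        // No matching open element: ignore the stray end tag.
    }

    @Override
    public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        // Empty elements such as <br> have no end tag, so they are never pushed.
        open.peek().children.add(new SketchNode(t.toString(), null));
    }

    @Override
    public void handleText(char[] data, int pos) {
        open.peek().children.add(new SketchNode("#text", new String(data)));
    }

    static SketchNode parse(String html) {
        EventTreeBuilder builder = new EventTreeBuilder();
        try {
            new ParserDelegator().parse(new StringReader(html), builder, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return builder.root;
    }
}
```

For example, `EventTreeBuilder.parse("<html><body><p>Hi</p></body></html>").find("p")` locates the `p` element, whose text content includes `Hi`.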

Version:
$Revision: 4871 $
Author:
Christian Sell, David K. Taylor, Chris Erskine, Ahmed Ashour, Marc Guillemot, Ethan Glasser-Camp, Sudhan Moghe

Field Summary
static java.lang.String XHTML_NAMESPACE
          XHTML namespace.
 
Method Summary
static IElementFactory getFactory(java.lang.String tagName)
           
static boolean getIgnoreOutsideContent()
          Deprecated. As of 2.6, without replacement (HtmlUnit tries to mimic browsers' behavior, and browsers don't ignore outside content).
static HtmlPage parse(WebResponse webResponse, WebWindow webWindow)
          Deprecated. As of version 2.6; use parseHtml(WebResponse, WebWindow) instead.
static void parseFragment(DomNode parent, java.lang.String source)
          Parses the HTML content from the given string into an object tree representation.
static HtmlPage parseHtml(WebResponse webResponse, WebWindow webWindow)
          Parses the HTML content from the specified WebResponse into an object tree representation.
static XHtmlPage parseXHtml(WebResponse webResponse, WebWindow webWindow)
          Parses the XHTML content from the specified WebResponse into an object tree representation.
static void setIgnoreOutsideContent(boolean ignoreOutsideContent)
          Deprecated. As of 2.6, without replacement (HtmlUnit tries to mimic browsers' behavior, and browsers don't ignore outside content).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

XHTML_NAMESPACE

public static final java.lang.String XHTML_NAMESPACE
XHTML namespace.

See Also:
Constant Field Values
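A quick way to see what the namespace means in practice: the sketch below parses markup with the JDK's namespace-aware DOM parser and checks whether the root element belongs to the XHTML namespace. It assumes `XHTML_NAMESPACE` holds the standard W3C URI `http://www.w3.org/1999/xhtml`; the `XhtmlNamespaceCheck` class is hypothetical and uses only `javax.xml.parsers`, not HtmlUnit.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

final class XhtmlNamespaceCheck {
    // Standard W3C XHTML namespace URI (assumed to match HTMLParser.XHTML_NAMESPACE).
    static final String XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

    /** Returns true if the root element of the given well-formed markup is in the XHTML namespace. */
    static boolean isXhtml(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness is off by default; without it getNamespaceURI() returns null.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return XHTML_NAMESPACE.equals(doc.getDocumentElement().getNamespaceURI());
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse markup as XML", e);
        }
    }
}
```

A root element declared with `xmlns="http://www.w3.org/1999/xhtml"` passes the check; a bare `<html>` without the declaration does not.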
Method Detail

setIgnoreOutsideContent

@Deprecated
public static void setIgnoreOutsideContent(boolean ignoreOutsideContent)
Deprecated. As of 2.6, without replacement (HtmlUnit tries to mimic browsers' behavior, and browsers don't ignore outside content).

Sets the flag that controls validation of HTML content outside the BODY and HTML tags. The flag is false by default to stay compatible with the current NekoHTML defaults.

Parameters:
ignoreOutsideContent - the boolean flag to set

getIgnoreOutsideContent

@Deprecated
public static boolean getIgnoreOutsideContent()
Deprecated. As of 2.6, without replacement (HtmlUnit tries to mimic browsers' behavior, and browsers don't ignore outside content).

Gets the state of the flag to ignore content outside the BODY and HTML tags.

Returns:
the current state

getFactory

public static IElementFactory getFactory(java.lang.String tagName)
Parameters:
tagName - an HTML element tag name
Returns:
a factory for creating HtmlElements representing the given tag
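The page does not describe how the lookup works. One plausible shape, shown here as a hypothetical sketch rather than HtmlUnit's code, is a static registry keyed by tag name with a fallback factory for unrecognized tags. `SketchElement`, `SketchElementFactory`, and `FactoryRegistry` are invented names, and the case-insensitive lookup is an assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for an element created by a factory. */
final class SketchElement {
    final String tagName;
    final boolean known; // false when created by the fallback factory

    SketchElement(String tagName, boolean known) {
        this.tagName = tagName;
        this.known = known;
    }
}

/** Hypothetical factory interface; not HtmlUnit's IElementFactory. */
interface SketchElementFactory {
    SketchElement create(String tagName);
}

final class FactoryRegistry {
    private static final Map<String, SketchElementFactory> FACTORIES = new HashMap<>();

    // Fallback used for any tag without a registered factory.
    private static final SketchElementFactory UNKNOWN = tag -> new SketchElement(tag, false);

    static {
        SketchElementFactory known = tag -> new SketchElement(tag, true);
        for (String tag : new String[] {"a", "div", "p", "table"}) {
            FACTORIES.put(tag, known);
        }
    }

    /** Looks up the factory for a tag name (case-insensitively, by assumption). */
    static SketchElementFactory getFactory(String tagName) {
        return FACTORIES.getOrDefault(tagName.toLowerCase(Locale.ROOT), UNKNOWN);
    }
}
```

Returning a fallback instead of `null` lets callers create an element for any tag without a null check, which suits a parser that must accept arbitrary markup.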

parseFragment

public static void parseFragment(DomNode parent,
                                 java.lang.String source)
                          throws org.xml.sax.SAXException,
                                 java.io.IOException
Parses the HTML content from the given string into an object tree representation.

Parameters:
parent - the parent for the new nodes
source - the (X)HTML to be parsed
Throws:
org.xml.sax.SAXException - if a SAX error occurs
java.io.IOException - if an IO error occurs
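For well-formed XML, the effect of parsing a string and appending the resulting nodes under an existing parent can be approximated with the JDK alone, as in this sketch. `FragmentAppender` is hypothetical: it wraps the source in a temporary root element so that several sibling nodes parse as one document, then imports each top-level node into the parent's document. Unlike `parseFragment`, which presumably goes through the same NekoHTML-based parser as the rest of this class, it rejects malformed HTML such as an unclosed `<br>`.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class FragmentAppender {
    /** Parses a well-formed XML fragment and appends its top-level nodes to parent. */
    static void appendFragment(Element parent, String source) {
        try {
            // Wrap the fragment so that multiple sibling nodes form a single document.
            String wrapped = "<fragment-root>" + source + "</fragment-root>";
            Document scratch = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wrapped)));
            Document owner = parent.getOwnerDocument();
            for (Node child = scratch.getDocumentElement().getFirstChild();
                 child != null;
                 child = child.getNextSibling()) {
                // Nodes from another document must be imported (deep copy) before appending.
                parent.appendChild(owner.importNode(child, true));
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse fragment", e);
        }
    }

    /** Creates an empty DOM document (helper for the usage example). */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, appending `<p>one</p>text<p>two</p>` to an empty `body` element yields three children: two `p` elements with a text node between them.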

parse

@Deprecated
public static HtmlPage parse(WebResponse webResponse,
                             WebWindow webWindow)
                      throws java.io.IOException
Deprecated. As of version 2.6; use parseHtml(WebResponse, WebWindow) instead.

Parses the HTML content from the given WebResponse into an object tree representation.

Parameters:
webResponse - the response data
webWindow - the web window into which the page is to be loaded
Returns:
the page object which is the root of the DOM tree
Throws:
java.io.IOException - if there is an IO error

parseHtml

public static HtmlPage parseHtml(WebResponse webResponse,
                                 WebWindow webWindow)
                          throws java.io.IOException
Parses the HTML content from the specified WebResponse into an object tree representation.

Parameters:
webResponse - the response data
webWindow - the web window into which the page is to be loaded
Returns:
the page object which is the root of the DOM tree
Throws:
java.io.IOException - if there is an IO error

parseXHtml

public static XHtmlPage parseXHtml(WebResponse webResponse,
                                   WebWindow webWindow)
                            throws java.io.IOException
Parses the XHTML content from the specified WebResponse into an object tree representation.

Parameters:
webResponse - the response data
webWindow - the web window into which the page is to be loaded
Returns:
the page object which is the root of the DOM tree
Throws:
java.io.IOException - if there is an IO error
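Choosing between `parseHtml` and `parseXHtml` usually depends on how the response was served. The sketch below assumes a caller dispatches on the MIME type: `text/html` for `parseHtml` and `application/xhtml+xml` for `parseXHtml`. `MarkupKind` and `MarkupDispatch` are hypothetical, and this mapping is an assumption, not a description of how HtmlUnit itself decides.

```java
import java.util.Locale;

/** Hypothetical classification of a response by its Content-Type header. */
enum MarkupKind { HTML, XHTML, OTHER }

final class MarkupDispatch {
    /** Maps a Content-Type value to the parser a caller might choose (assumed mapping). */
    static MarkupKind kindFor(String contentType) {
        if (contentType == null) {
            return MarkupKind.OTHER;
        }
        // Drop parameters such as "; charset=UTF-8"; MIME types compare case-insensitively.
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        switch (mime) {
            case "text/html":
                return MarkupKind.HTML;   // candidate for parseHtml
            case "application/xhtml+xml":
                return MarkupKind.XHTML;  // candidate for parseXHtml
            default:
                return MarkupKind.OTHER;
        }
    }
}
```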


Copyright © 2002-2011 Gargoyle Software Inc. All Rights Reserved.