org.htmlparser.visitors

Class TextExtractingVisitor


public class TextExtractingVisitor
extends NodeVisitor

Extracts text from a web page. Usage:

    Parser parser = new Parser(...);
    TextExtractingVisitor visitor = new TextExtractingVisitor();
    parser.visitAllNodesWith(visitor);
    String textInPage = visitor.getExtractedText();
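To illustrate the visitor mechanism behind the usage above, here is a minimal, self-contained sketch. It does not use the org.htmlparser classes: SketchNode, SketchVisitor, TagNode, TextNode and TextCollector are hypothetical stand-ins invented for this example, and the real TextExtractingVisitor may treat tags and whitespace differently. The sketch only shows the general idea: the traversal calls back into the visitor for each node, and the visitor appends the content of every text node to a buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types for illustration only; these are NOT the org.htmlparser classes.
class VisitorSketch {

    /** Hypothetical node: anything that can hand itself to a visitor. */
    interface SketchNode {
        void accept(SketchVisitor visitor);
    }

    /** Hypothetical visitor base with no-op callbacks, loosely modelled on NodeVisitor. */
    static class SketchVisitor {
        void visitTag(String name) { }
        void visitEndTag(String name) { }
        void visitStringNode(String text) { }
    }

    /** A run of plain text. */
    static final class TextNode implements SketchNode {
        private final String text;

        TextNode(String text) {
            this.text = text;
        }

        @Override
        public void accept(SketchVisitor visitor) {
            visitor.visitStringNode(text);
        }
    }

    /** A tag with child nodes; reports start tag, children, then end tag. */
    static final class TagNode implements SketchNode {
        private final String name;
        private final List<SketchNode> children = new ArrayList<>();

        TagNode(String name, SketchNode... kids) {
            this.name = name;
            for (SketchNode kid : kids) {
                children.add(kid);
            }
        }

        @Override
        public void accept(SketchVisitor visitor) {
            visitor.visitTag(name);
            for (SketchNode child : children) {
                child.accept(visitor);
            }
            visitor.visitEndTag(name);
        }
    }

    /** Collects the content of every text node and ignores tags. */
    static final class TextCollector extends SketchVisitor {
        private final StringBuilder buffer = new StringBuilder();

        @Override
        void visitStringNode(String text) {
            buffer.append(text);
        }

        String getExtractedText() {
            return buffer.toString();
        }
    }

    /** Walks the tree rooted at the given node and returns the collected text. */
    static String extract(SketchNode root) {
        TextCollector collector = new TextCollector();
        root.accept(collector);
        return collector.getExtractedText();
    }

    public static void main(String[] args) {
        // Roughly corresponds to <p>Hello, <b>world</b>!</p>
        SketchNode page = new TagNode("p",
                new TextNode("Hello, "),
                new TagNode("b", new TextNode("world")),
                new TextNode("!"));
        System.out.println(extract(page)); // prints "Hello, world!"
    }
}
```

The collector overrides only the text callback, just as a subclass of a visitor base class needs to override only the callbacks it cares about.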

Constructor Summary

TextExtractingVisitor()
Creates a node visitor that recurses into itself and its children.

Method Summary

String
getExtractedText()
void
visitEndTag(Tag tag)
Called for each Tag visited that is an end tag.
void
visitStringNode(Text stringNode)
Called for each StringNode visited.
void
visitTag(Tag tag)
Called for each Tag visited.

Methods inherited from class org.htmlparser.visitors.NodeVisitor

beginParsing, finishedParsing, shouldRecurseChildren, shouldRecurseSelf, visitEndTag, visitRemarkNode, visitStringNode, visitTag

Constructor Details

TextExtractingVisitor

public TextExtractingVisitor()
Creates a node visitor that recurses into itself and its children.

Method Details

getExtractedText

public String getExtractedText()
Returns the text extracted from the page.

visitEndTag

public void visitEndTag(Tag tag)
Called for each Tag visited that is an end tag.
Overrides:
visitEndTag in class NodeVisitor
Parameters:
tag - The end tag being visited.

visitStringNode

public void visitStringNode(Text stringNode)
Called for each StringNode visited.
Overrides:
visitStringNode in class NodeVisitor
Parameters:
stringNode - The string node being visited.

visitTag

public void visitTag(Tag tag)
Called for each Tag visited.
Overrides:
visitTag in class NodeVisitor
Parameters:
tag - The tag being visited.

HTML Parser is an open source library released under LGPL.