Package org.htmlparser.parserapplications

Example applications.

Class Summary

LinkExtractor: Extracts all the links from the given web page and prints them on standard output.
SiteCapturer: Saves a web site locally.
StringExtractor: Extracts plain-text strings from a web page.
WikiCapturer: Saves a WikiWikiWeb locally.

Link Extractor
Extract links/mail addresses from a web page.
org.htmlparser.parserapplications.LinkExtractor
bin/linkextractor http://website_url [-maillinks]
The optional -maillinks argument causes mailto: links to be printed.
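As a rough illustration of what link extraction with a mail-link switch involves, here is a minimal standard-library sketch. It is not the LinkExtractor implementation: it swaps the HTML Parser library for a simple regular expression, and the class name LinkSketch, the method extractLinks, and the mailOnly parameter are hypothetical. It also assumes that the switch restricts output to mailto: links, which the description above does not spell out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkSketch {
    // Matches the quoted href value of an <a ...> tag. A regex is only an
    // approximation; a real HTML parser copes with far messier markup.
    private static final Pattern HREF = Pattern.compile(
        "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns href values in document order.
    // Assumption: mailOnly == true keeps only mailto: links,
    // mailOnly == false keeps every link found.
    public static List<String> extractLinks(String html, boolean mailOnly) {
        List<String> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String link = m.group(2).trim();
            boolean isMail = link.regionMatches(true, 0, "mailto:", 0, 7);
            if (!mailOnly || isMail) {
                result.add(link);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.com/a\">A</a> "
                    + "<a href='mailto:bob@example.com'>Bob</a>";
        // Print one link per line, as a command-line tool would.
        for (String link : extractLinks(html, false)) {
            System.out.println(link);
        }
    }
}
```

Relative URLs are returned as written here; resolving them against the page address would need an extra step.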
String Extractor
Extract text from a web page.
org.htmlparser.parserapplications.StringExtractor
bin/stringextractor http://website_url [-links]
The optional -links argument causes hyperlinks to be shown within the text.
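The idea of reducing a page to text, optionally with hyperlinks shown inline, can be sketched with the standard library alone. This is not how StringExtractor works internally: the regular expressions stand in for the parser, the names TextSketch, extractText, and showLinks are hypothetical, and the "text [url]" rendering of a link is an assumed format chosen for the example.

```java
import java.util.regex.Pattern;

public class TextSketch {
    // An anchor with a quoted href: group 2 is the URL, group 3 the link text.
    private static final Pattern ANCHOR = Pattern.compile(
        "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1[^>]*>(.*?)</a\\s*>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Script and style blocks carry no readable text, so drop them whole.
    private static final Pattern SCRIPT_OR_STYLE = Pattern.compile(
        "<(script|style)\\b.*?</\\1\\s*>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Returns the visible text of the page on a single line.
    // Assumption: with showLinks, each link is rendered as "text [url]".
    // Character entities such as &amp; are left undecoded in this sketch.
    public static String extractText(String html, boolean showLinks) {
        String s = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        if (showLinks) {
            // Rewrite anchors before the generic tag stripping removes them.
            s = ANCHOR.matcher(s).replaceAll("$3 [$2]");
        }
        s = TAG.matcher(s).replaceAll(" ");
        return WHITESPACE.matcher(s).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<p>See <a href=\"http://example.com\">the site</a> now.</p>";
        System.out.println(extractText(html, false));
        System.out.println(extractText(html, true));
    }
}
```

Replacing each tag with a space keeps words in adjacent elements from running together, at the cost of an occasional stray space before punctuation.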
Site Capturer
Save a web site locally.
org.htmlparser.parserapplications.SiteCapturer
bin/sitecapturer http://source_website /target_directory/ [true|false]
The optional boolean argument determines whether resources such as images,
audio and video are to be captured.
Wiki Capturer
Save a wiki locally.
org.htmlparser.parserapplications.WikiCapturer
Subclass of SiteCapturer (see above) that eliminates specific Wiki pages.

HTML Parser is an open source library released under the LGPL.