A node factory based on the prototype pattern.
This factory uses the prototype pattern to generate new nodes.
These are cloned as needed to form new Text, Remark and Tag nodes.
Text and remark nodes are generated from prototypes accessed
via the textPrototype and remarkPrototype properties respectively.
Tag nodes are generated as follows:
Prototype tags, in the form of undifferentiated tags, are held in a hash
table. On a request for a tag, the attributes are examined for the name
of the tag to be created. If a prototype of that name has been registered
(exists in the hash table), it is cloned and the clone is given the
characteristics (Attributes, start and end position) of the requested tag.
In the case that no tag has been registered under that name,
a generic tag is created from the prototype accessed via the
tagPrototype property.
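The lookup-then-clone flow described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the library's implementation: SketchTag and SketchTagFactory are hypothetical stand-ins for the real Tag nodes and factory, and the uppercase lookup is inferred from the note on put() that ids should be uppercase.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a tag node; not the library's Tag interface.
class SketchTag implements Cloneable {
    final String kind;  // records which prototype a clone came from
    String name;        // characteristics copied onto each clone
    int start;
    int end;

    SketchTag(String kind) { this.kind = kind; }

    @Override
    public SketchTag clone() {
        try {
            return (SketchTag) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // unreachable: the class is Cloneable
        }
    }
}

// Hypothetical factory showing registry lookup with a generic fallback.
class SketchTagFactory {
    private final Map<String, SketchTag> registry = new HashMap<>();
    private final SketchTag genericPrototype = new SketchTag("generic");

    void register(String id, SketchTag prototype) {
        registry.put(id.toUpperCase(Locale.ROOT), prototype);
    }

    SketchTag createTag(String name, int start, int end) {
        // Assumption: names are matched in uppercase.
        SketchTag prototype = registry.get(name.toUpperCase(Locale.ROOT));
        if (prototype == null) {
            prototype = genericPrototype; // nothing registered under that name
        }
        SketchTag tag = prototype.clone(); // the prototype itself is never handed out
        tag.name = name;
        tag.start = start;
        tag.end = end;
        return tag;
    }
}
```

Because every request returns a clone, a registered prototype can be reused for any number of tags without its own state being modified.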
The hash table of registered tags can be automatically populated with
all the known tags from the org.htmlparser.tags package when
the factory is constructed, or it can start out empty and be populated
explicitly.
Here is an example of how to override all text issued from
Text.toPlainTextString(), in this case decoding (converting character
references), which illustrates the use of setting the text prototype:
    PrototypicalNodeFactory factory = new PrototypicalNodeFactory ();
    factory.setTextPrototype (
        // create an anonymous inner class that is a subclass of TextNode
        new TextNode () {
            public String toPlainTextString()
            {
                String original = super.toPlainTextString ();
                return (org.htmlparser.util.Translate.decode (original));
            }
        });
    Parser parser = new Parser ();
    parser.setNodeFactory (factory);
Here is an example of using a custom link tag, in this case just
printing the URL, which illustrates registering a tag:
    class PrintingLinkTag extends LinkTag
    {
        public void doSemanticAction ()
            throws ParserException
        {
            System.out.println (getLink ());
        }
    }
    PrototypicalNodeFactory factory = new PrototypicalNodeFactory ();
    factory.registerTag (new PrintingLinkTag ());
    Parser parser = new Parser ();
    parser.setNodeFactory (factory);
clear
public void clear()
Clean out the registry.
createRemarkNode
public Remark createRemarkNode(Page page, int start, int end)
Create a new remark node.
Specified by: createRemarkNode in interface NodeFactory
Parameters:
page - The page the node is on.
start - The beginning position of the remark.
end - The ending position of the remark.
Returns: A remark node comprising the indicated characters from the page.
createStringNode
public Text createStringNode(Page page, int start, int end)
Create a new string node.
Specified by: createStringNode in interface NodeFactory
Parameters:
page - The page the node is on.
start - The beginning position of the string.
end - The ending position of the string.
Returns: A text node comprising the indicated characters from the page.
createTagNode
public Tag createTagNode(Page page, int start, int end, Vector attributes)
Create a new tag node.
Note that the attributes vector contains at least one element,
which is the tag name (standalone attribute) at position zero.
This can be used to decide which type of node to create, or
gate other processing that may be appropriate.
Specified by: createTagNode in interface NodeFactory
Parameters:
page - The page the node is on.
start - The beginning position of the tag.
end - The ending position of the tag.
attributes - The attributes contained in this tag.
Returns: A tag node comprising the indicated characters from the page.
get
public Tag get(String id)
Gets a tag from the registry.
Parameters:
id - The name of the tag to return.
Returns: The tag registered under the id name, or null if none.
getRemarkPrototype
public Remark getRemarkPrototype()
Get the object that is cloned to generate remark nodes.
Returns: The prototype for Remark nodes.
getTagNames
public Set getTagNames()
Get the list of tag names.
Returns: The names of the tags currently registered.
getTagPrototype
public Tag getTagPrototype()
Get the object that is cloned to generate tag nodes.
Clones of this object are returned from
createTagNode(Page,int,int,Vector) when no
specific tag is found in the list of registered tags.
Returns: The prototype for Tag nodes.
getTextPrototype
public Text getTextPrototype()
Get the object that is cloned to generate text nodes.
Returns: The prototype for Text nodes.
put
public Tag put(String id, Tag tag)
Adds a tag to the registry.
Parameters:
id - The name under which to register the tag.
For proper operation, the id should be uppercase so it
will be matched by a Map lookup.
tag - The tag to be returned from a createTagNode(Page,int,int,Vector)
call.
Returns: The tag previously registered with that id if any,
or null if none.
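The return value and the uppercase requirement of put() can be illustrated with a small sketch. RegistryPutSketch is a hypothetical class that stores strings instead of real Tag prototypes, and its lookup() method assumes that requested names are uppercased before the Map lookup, as the note on the id parameter suggests.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical registry; values are plain strings, not real Tag prototypes.
class RegistryPutSketch {
    private final Map<String, String> registry = new HashMap<>();

    // Stores the id exactly as given and returns the previous entry, or null.
    String put(String id, String prototype) {
        return registry.put(id, prototype);
    }

    // Assumed lookup path: the requested name is uppercased first.
    String lookup(String requestedName) {
        return registry.get(requestedName.toUpperCase(Locale.ROOT));
    }
}
```

Under these assumptions an entry stored under "body" is never found, because the lookup asks for "BODY"; this is why the id passed to put() should already be uppercase.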
registerTag
public void registerTag(Tag tag)
Register a tag.
Registers the given tag under every id that the
tag has (i.e. all names returned by tag.getIds()).
For proper operation, the ids are converted to uppercase so
they will be matched by a Map lookup.
Parameters:
tag - The tag to register.
registerTags
public PrototypicalNodeFactory registerTags()
Register all known tags in the tag package.
Registers tags from the tag package by calling registerTag().
Returns: This node factory, as a convenience.
remove
public Tag remove(String id)
Remove a tag from the registry.
Parameters:
id - The name of the tag to remove.
Returns: The tag that was registered with that id, or null if none.
setRemarkPrototype
public void setRemarkPrototype(Remark remark)
Set the object to be used to generate remark nodes.
Parameters:
remark - The prototype for Remark nodes.
If null the prototype is set to the default (RemarkNode).
setTagPrototype
public void setTagPrototype(Tag tag)
Set the object to be used to generate tag nodes.
Clones of this object are returned from
createTagNode(Page,int,int,Vector) when no
specific tag is found in the list of registered tags.
Parameters:
tag - The prototype for Tag nodes.
If null the prototype is set to the default (TagNode).
setTextPrototype
public void setTextPrototype(Text text)
Set the object to be used to generate text nodes.
Parameters:
text - The prototype for Text nodes.
If null the prototype is set to the default (TextNode).
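The three prototype setters above share one rule: passing null restores the default prototype. A minimal sketch of that rule follows, assuming a hypothetical generic holder named PrototypeSlotSketch; the real factory keeps separate fields for its Remark, Tag and Text prototypes, and how it builds its defaults is not shown here.

```java
import java.util.function.Supplier;

// Hypothetical holder for one prototype with a null-resets-to-default setter.
class PrototypeSlotSketch<T> {
    private final Supplier<T> defaultSupplier;
    private T prototype;

    PrototypeSlotSketch(Supplier<T> defaultSupplier) {
        this.defaultSupplier = defaultSupplier;
        this.prototype = defaultSupplier.get();
    }

    T get() {
        return prototype;
    }

    // A null argument means "go back to the default prototype".
    void set(T candidate) {
        prototype = (candidate != null) ? candidate : defaultSupplier.get();
    }
}
```

With this rule the getter never returns null, so node creation can clone the prototype without a null check.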
unregisterTag
public void unregisterTag(Tag tag)
Unregister a tag.
Unregisters the given tag from every id the tag has.
The ids are converted to uppercase to undo the operation
of registerTag.
Parameters:
tag - The tag to unregister.
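How registerTag and unregisterTag mirror each other can be sketched as follows. MultiIdTagSketch and RegisterSketch are hypothetical stand-ins; only the getIds() name and the uppercase conversion come from the descriptions above, and the rest is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical tag that answers to several ids, loosely modelled on getIds().
class MultiIdTagSketch {
    private final String[] ids;

    MultiIdTagSketch(String... ids) {
        this.ids = ids;
    }

    String[] getIds() {
        return ids;
    }
}

// Hypothetical registry with symmetric register and unregister operations.
class RegisterSketch {
    final Map<String, MultiIdTagSketch> registry = new HashMap<>();

    void registerTag(MultiIdTagSketch tag) {
        for (String id : tag.getIds()) {
            registry.put(id.toUpperCase(Locale.ROOT), tag); // one entry per id
        }
    }

    void unregisterTag(MultiIdTagSketch tag) {
        for (String id : tag.getIds()) {
            registry.remove(id.toUpperCase(Locale.ROOT)); // same key transformation
        }
    }
}
```

Applying the same uppercase conversion in both directions is what lets unregisterTag find the keys that registerTag created.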