public final class ICUTokenizer
extends org.apache.lucene.analysis.Tokenizer
Words are broken across script boundaries, then segmented according to
the BreakIterator and token typing provided by the ICUTokenizerConfig.
See Also: ICUTokenizerConfig

| Constructor | Description |
|---|---|
| ICUTokenizer(Reader input) | Construct a new ICUTokenizer that breaks text into words from the given Reader. |
| ICUTokenizer(Reader input, ICUTokenizerConfig config) | Construct a new ICUTokenizer that breaks text into words from the given Reader, using a tailored BreakIterator configuration. |

| Modifier and Type | Method and Description |
|---|---|
| void | end() |
| boolean | incrementToken() |
| void | reset() |
| void | reset(Reader input) |
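
These four methods are normally called in a fixed order by a consumer: `reset()`, then `incrementToken()` until it returns false, then `end()`. The sketch below shows that call order with a JDK-only stand-in so it runs without Lucene on the classpath. Everything in it is hypothetical: `SketchTokenizer`, `term()`, `finalOffset()`, and `tokenizeAll` are made-up names, the stand-in buffers the whole Reader in `reset()`, and having `reset(Reader)` call `reset()` is an assumption of this sketch rather than a statement about ICUTokenizer.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in mirroring the four signatures above; not a Lucene class. */
final class SketchTokenizer {
    private final BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
    private Reader input;
    private String text = "";
    private String term;      // stands in for a term attribute
    private int finalOffset;  // stands in for the end-of-stream offset

    SketchTokenizer(Reader input) {
        this.input = input;
    }

    /** Rewinds state; in this sketch only, also buffers the whole Reader. */
    void reset() throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        for (int n = input.read(buf); n != -1; n = input.read(buf)) {
            sb.append(buf, 0, n);
        }
        text = sb.toString();
        words.setText(text);
        words.first();
        term = null;
        finalOffset = 0;
    }

    /** Swaps in a new Reader so one instance can be reused (assumed to rewind as well). */
    void reset(Reader newInput) throws IOException {
        input = newInput;
        reset();
    }

    /** Advances to the next word; returns false once the input is exhausted. */
    boolean incrementToken() throws IOException {
        int from = words.current();
        for (int to = words.next(); to != BreakIterator.DONE; from = to, to = words.next()) {
            String piece = text.substring(from, to);
            if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                term = piece;
                return true;
            }
        }
        term = null;
        return false;
    }

    /** End-of-stream bookkeeping, called once after incrementToken() returns false. */
    void end() throws IOException {
        finalOffset = text.length();
    }

    String term() { return term; }

    int finalOffset() { return finalOffset; }

    /** Consumer workflow: reset, loop on incrementToken, end; one instance is reused per input. */
    static List<String> tokenizeAll(String... inputs) {
        List<String> out = new ArrayList<>();
        SketchTokenizer t = new SketchTokenizer(new StringReader(""));
        try {
            for (String s : inputs) {
                t.reset(new StringReader(s));
                while (t.incrementToken()) {
                    out.add(t.term());
                }
                t.end();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

With Lucene on the classpath, the same call order would apply to an ICUTokenizer instance, with the token text read from an attribute obtained through `addAttribute` instead of the hypothetical `term()` accessor.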

Methods inherited from class org.apache.lucene.util.AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString

public ICUTokenizer(Reader input)
The default script-specific handling is used.
Parameters:
input - Reader containing text to tokenize.
See Also: DefaultICUTokenizerConfig

public ICUTokenizer(Reader input, ICUTokenizerConfig config)
Parameters:
input - Reader containing text to tokenize.
config - Tailored BreakIterator configuration.

public boolean incrementToken()
throws IOException
incrementToken in class org.apache.lucene.analysis.TokenStream
Throws: IOException

public void reset() throws IOException
reset in class org.apache.lucene.analysis.TokenStream
Throws: IOException

public void reset(Reader input) throws IOException
reset in class org.apache.lucene.analysis.Tokenizer
Throws: IOException

public void end() throws IOException
end in class org.apache.lucene.analysis.TokenStream
Throws: IOException

Copyright © 2000-2015 Apache Software Foundation. All Rights Reserved.