Package org.apache.lucene.analysis.cn
Class ChineseTokenizer
- java.lang.Object
  - org.apache.lucene.util.AttributeSource
    - org.apache.lucene.analysis.TokenStream
      - org.apache.lucene.analysis.Tokenizer
        - org.apache.lucene.analysis.cn.ChineseTokenizer
All Implemented Interfaces:
Closeable, AutoCloseable
@Deprecated public final class ChineseTokenizer extends org.apache.lucene.analysis.Tokenizer
Deprecated. Use StandardTokenizer instead, which has the same functionality. This tokenizer will be removed in Lucene 5.0.
Tokenizes Chinese text into individual Chinese characters. ChineseTokenizer and CJKTokenizer differ in how they split text into tokens.
For example, if the Chinese text "C1C2C3C4" is indexed:
- ChineseTokenizer returns the tokens C1, C2, C3, C4.
- CJKTokenizer returns the tokens C1C2, C2C3, C3C4.
As a result, the index created by CJKTokenizer is much larger.
The trade-off shows up at search time: ChineseTokenizer can match queries such as C1, C1C2, C1C3, C4C2, C1C2C3, ..., whereas CJKTokenizer cannot match those that are not made of adjacent character pairs from the indexed text, such as C1, C1C3, or C4C2.
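The contrast above can be reproduced with a small, self-contained sketch. It is not Lucene code: `unigrams` and `bigrams` are hypothetical helpers that split a string per character and into overlapping character pairs, and `matches` assumes AND semantics (a query matches when every query token appears among the indexed tokens). The characters 中华人民 stand in for C1C2C3C4 and are written as Unicode escapes so the source stays ASCII.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch contrasting unigram (ChineseTokenizer-style) and bigram (CJKTokenizer-style) splitting. */
public class UnigramVsBigram {

    /** One token per code point: C1C2C3C4 becomes C1, C2, C3, C4. */
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints().forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    /** Overlapping pairs: C1C2C3C4 becomes C1C2, C2C3, C3C4. Fewer than two characters yields no tokens here. */
    static List<String> bigrams(String text) {
        List<String> chars = unigrams(text);
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 1 < chars.size(); i++) {
            tokens.add(chars.get(i) + chars.get(i + 1));
        }
        return tokens;
    }

    /** Assumed AND semantics: every query token must occur among the indexed tokens. */
    static boolean matches(List<String> indexed, List<String> query) {
        return !query.isEmpty() && indexed.containsAll(query);
    }

    public static void main(String[] args) {
        String doc = "\u4E2D\u534E\u4EBA\u6C11"; // plays the role of C1C2C3C4
        String c1c3 = "\u4E2D\u4EBA";            // C1C3
        String c4c2 = "\u6C11\u534E";            // C4C2
        System.out.println(matches(unigrams(doc), unigrams(c1c3))); // true
        System.out.println(matches(bigrams(doc), bigrams(c1c3)));   // false
        System.out.println(matches(unigrams(doc), unigrams(c4c2))); // true
        System.out.println(matches(bigrams(doc), bigrams(c4c2)));   // false
    }
}
```

The unigram queries succeed because every character is indexed on its own; the bigrams C1C3 and C4C2 never occur among the indexed pairs, so the bigram lookups fail.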
- Version: 1.0
Constructor Summary
- ChineseTokenizer(Reader in): Deprecated.
- ChineseTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader in): Deprecated.
- ChineseTokenizer(org.apache.lucene.util.AttributeSource source, Reader in): Deprecated.
Method Summary
- void end(): Deprecated.
- boolean incrementToken(): Deprecated.
- void reset(): Deprecated.
- void reset(Reader input): Deprecated.
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
Constructor Detail

ChineseTokenizer
public ChineseTokenizer(Reader in)
Deprecated.

ChineseTokenizer
public ChineseTokenizer(org.apache.lucene.util.AttributeSource source, Reader in)
Deprecated.

ChineseTokenizer
public ChineseTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader in)
Deprecated.

Method Detail

incrementToken
public boolean incrementToken() throws IOException
Deprecated.
- Specified by: incrementToken in class org.apache.lucene.analysis.TokenStream
- Throws: IOException
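This page does not describe how incrementToken() classifies characters. The following is a hypothetical, string-based sketch of per-character splitting, not the Lucene implementation. It assumes that each character whose `Character.getType` is `OTHER_LETTER` (the category that covers CJK ideographs) becomes its own token; as a further assumption of the sketch, runs of upper- or lowercase letters and decimal digits are grouped into one lowercased word, and every other character acts as a separator.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of per-character splitting; not Lucene's source. */
public class CharSplitSketch {

    /** Works on UTF-16 chars, so supplementary ideographs (surrogate pairs) are treated as separators here. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (Character.getType(c)) {
                case Character.DECIMAL_DIGIT_NUMBER:
                case Character.LOWERCASE_LETTER:
                case Character.UPPERCASE_LETTER:
                    // Assumed behavior: letters and digits accumulate into one lowercased word.
                    word.append(Character.toLowerCase(c));
                    break;
                case Character.OTHER_LETTER:
                    // An ideograph ends any pending word and becomes a token by itself.
                    flush(word, tokens);
                    tokens.add(String.valueOf(c));
                    break;
                default:
                    // Whitespace, punctuation, and everything else only separate tokens.
                    flush(word, tokens);
                    break;
            }
        }
        flush(word, tokens);
        return tokens;
    }

    /** Emits the pending word, if any, and clears the buffer. */
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        // "Lucene" immediately followed by two ideographs, written as escapes
        List<String> tokens = tokenize("Lucene\u4E2D\u6587");
        System.out.println(tokens.size()); // 3
        System.out.println(tokens.get(0)); // lucene
    }
}
```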

end
public final void end()
Deprecated.
- Overrides: end in class org.apache.lucene.analysis.TokenStream

reset
public void reset() throws IOException
Deprecated.
- Overrides: reset in class org.apache.lucene.analysis.TokenStream
- Throws: IOException

reset
public void reset(Reader input) throws IOException
Deprecated.
- Overrides: reset in class org.apache.lucene.analysis.Tokenizer
- Throws: IOException
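The methods listed on this page fit the usual TokenStream consumer workflow: reset, call incrementToken() until it returns false, then call end(); reset(Reader) lets one instance be reused for the next document. The sketch below illustrates that call order with a hypothetical `ReusableCharTokenizer` that only mirrors these method names. It is not Lucene code: it emits one token per non-whitespace character, records a final offset in end() as an example of end-of-stream bookkeeping, and omits close() (available through Closeable).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the reset / incrementToken / end call order; not Lucene code. */
public class LifecycleSketch {

    static final class ReusableCharTokenizer {
        private Reader input;
        private int offset;            // chars consumed so far
        private String term;           // text of the current token
        private int finalOffset = -1;  // set by end()

        /** Points the tokenizer at a new Reader so one instance can serve many documents. */
        void reset(Reader newInput) {
            input = newInput;
            offset = 0;
            term = null;
            finalOffset = -1;
        }

        /** Advances to the next non-whitespace char; returns false when the input is exhausted. */
        boolean incrementToken() throws IOException {
            int c;
            while ((c = input.read()) != -1) {
                offset++;
                if (!Character.isWhitespace(c)) {
                    term = String.valueOf((char) c);
                    return true;
                }
            }
            term = null;
            return false;
        }

        /** End-of-stream step: records the offset just past the last char read. */
        void end() {
            finalOffset = offset;
        }

        String term() { return term; }

        int finalOffset() { return finalOffset; }
    }

    /** Consumes one document with a reused tokenizer: reset, incrementToken until false, end. */
    static List<String> analyze(ReusableCharTokenizer tok, String text) {
        List<String> out = new ArrayList<>();
        try {
            tok.reset(new StringReader(text));
            while (tok.incrementToken()) {
                out.add(tok.term());
            }
            tok.end();
        } catch (IOException e) {
            // StringReader does not fail in practice; wrap to keep callers simple.
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        ReusableCharTokenizer tok = new ReusableCharTokenizer();
        System.out.println(analyze(tok, "ab c")); // [a, b, c]
        System.out.println(tok.finalOffset());    // 4
        System.out.println(analyze(tok, "xy"));   // [x, y]
        System.out.println(tok.finalOffset());    // 2
    }
}
```

Reusing a single instance through reset(Reader) avoids allocating a new tokenizer per document, which is the kind of reuse this method exists for.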