Class CJKBigramFilter

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public final class CJKBigramFilter
    extends org.apache.lucene.analysis.TokenFilter
    Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer.

    CJK types are set by these tokenizers, but you can also use CJKBigramFilter(TokenStream, int) to explicitly control which of the CJK scripts are turned into bigrams.

    In all cases, all non-CJK input is passed through unmodified.
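    The behaviour described above can be sketched without Lucene. The example below is a minimal standalone illustration, not the library's implementation: the class BigramSketch and its helpers are hypothetical, a "CJK token" is assumed to be a single Han code point, and consecutive tokens are assumed to be adjacent in the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone illustration only: this is NOT the Lucene implementation.
final class BigramSketch {

    // Assumption: a "CJK token" here is a single Han code point, standing in
    // for the one-ideograph-per-token output of a tokenizer.
    static boolean isSingleHan(String token) {
        return !token.isEmpty()
            && token.codePointCount(0, token.length()) == 1
            && Character.UnicodeScript.of(token.codePointAt(0)) == Character.UnicodeScript.HAN;
    }

    // Joins each run of consecutive Han tokens into overlapping bigrams.
    // A lone Han token stays a unigram; every other token passes through unchanged.
    static List<String> bigrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            // Find the end (exclusive) of the run of Han tokens starting at i.
            int j = i;
            while (j < tokens.size() && isSingleHan(tokens.get(j))) {
                j++;
            }
            int run = j - i;
            if (run < 2) {
                // Non-CJK token (run == 0) or isolated CJK token (run == 1).
                out.add(tokens.get(i));
                i++;
            } else {
                // Emit every pair of neighbours inside the run.
                for (int k = i; k + 1 < j; k++) {
                    out.add(tokens.get(k) + tokens.get(k + 1));
                }
                i = j;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Four ideographs (U+4E2D U+534E U+4EBA U+6C11), one per token,
        // surrounded by two non-CJK tokens.
        List<String> in = Arrays.asList("lucene", "\u4e2d", "\u534e", "\u4eba", "\u6c11", "rocks");
        System.out.println(bigrams(in));
    }
}
```

    In this sketch a run of four ideographs yields three overlapping bigrams, while the two Latin tokens are returned as they were. Offsets, token types and the per-script flags are deliberately left out.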

    • Nested Class Summary

      • Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource

        org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static String DOUBLE_TYPE
      when we emit a bigram, it is then marked as this type
      static int HAN
      bigram flag for Han Ideographs
      static int HANGUL
      bigram flag for Hangul
      static int HIRAGANA
      bigram flag for Hiragana
      static int KATAKANA
      bigram flag for Katakana
      static String SINGLE_TYPE
      when we emit a unigram, it is then marked as this type
      • Fields inherited from class org.apache.lucene.analysis.TokenFilter

        input
    • Method Summary

      Modifier and Type Method Description
      boolean incrementToken()  
      void reset()  
      • Methods inherited from class org.apache.lucene.analysis.TokenFilter

        close, end
      • Methods inherited from class org.apache.lucene.util.AttributeSource

        addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
    • Constructor Detail

      • CJKBigramFilter

        public CJKBigramFilter(org.apache.lucene.analysis.TokenStream in,
                               int flags)
        Create a new CJKBigramFilter, specifying which writing systems should be bigrammed.
        Parameters:
        flags - bitwise OR of any of HAN, HIRAGANA, KATAKANA, HANGUL
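        How an OR'ed flag set selects scripts can be sketched as follows. This is a standalone illustration under stated assumptions: the class ScriptFlagsSketch, the helpers flagFor and shouldBigram, and the literal bit values are hypothetical stand-ins, and real code should reference the CJKBigramFilter constants instead.

```java
// Standalone illustration only: this is NOT the Lucene implementation.
final class ScriptFlagsSketch {

    // Assumption: illustrative bit values only. Real code should reference
    // CJKBigramFilter.HAN, HIRAGANA, KATAKANA and HANGUL rather than literals.
    static final int HAN = 1;
    static final int HIRAGANA = 2;
    static final int KATAKANA = 4;
    static final int HANGUL = 8;

    // Hypothetical helper: maps a code point to its script flag,
    // or 0 when the code point belongs to none of the four scripts.
    static int flagFor(int codePoint) {
        switch (Character.UnicodeScript.of(codePoint)) {
            case HAN:
                return HAN;
            case HIRAGANA:
                return HIRAGANA;
            case KATAKANA:
                return KATAKANA;
            case HANGUL:
                return HANGUL;
            default:
                return 0;
        }
    }

    // True when the script of the code point is enabled in the flag set.
    static boolean shouldBigram(int codePoint, int flags) {
        return (flagFor(codePoint) & flags) != 0;
    }

    public static void main(String[] args) {
        // Enable Han and Hangul only; Hiragana and Katakana stay disabled.
        int flags = HAN | HANGUL;
        System.out.println(shouldBigram(0x4E2D, flags)); // Han ideograph
        System.out.println(shouldBigram(0x3042, flags)); // Hiragana letter A
        System.out.println(shouldBigram(0xD55C, flags)); // Hangul syllable
        System.out.println(shouldBigram('a', flags));    // Latin letter
    }
}
```

        With the real class, the equivalent combination would be passed as the second constructor argument, for example CJKBigramFilter.HAN | CJKBigramFilter.HANGUL.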
    • Method Detail

      • incrementToken

        public boolean incrementToken()
                               throws IOException
        Specified by:
        incrementToken in class org.apache.lucene.analysis.TokenStream
        Throws:
        IOException
      • reset

        public void reset()
                   throws IOException
        Overrides:
        reset in class org.apache.lucene.analysis.TokenFilter
        Throws:
        IOException