Class CzechAnalyzer

    • Constructor Detail

      • CzechAnalyzer

        public CzechAnalyzer(org.apache.lucene.util.Version matchVersion)
        Builds an analyzer with the default stop words (getDefaultStopSet()).
        Parameters:
        matchVersion - Lucene version to match (see the class description above)
      • CzechAnalyzer

        public CzechAnalyzer(org.apache.lucene.util.Version matchVersion,
                             Set<?> stopwords)
        Builds an analyzer with the given stop words.
        Parameters:
        matchVersion - Lucene version to match (see the class description above)
        stopwords - a stopword set
      • CzechAnalyzer

        public CzechAnalyzer(org.apache.lucene.util.Version matchVersion,
                             Set<?> stopwords,
                             Set<?> stemExclusionTable)
        Builds an analyzer with the given stop words and a set of words to be excluded from stemming by the CzechStemFilter.
        Parameters:
        matchVersion - Lucene version to match (see the class description above)
        stopwords - a stopword set
        stemExclusionTable - a stemming exclusion set
      • CzechAnalyzer

        @Deprecated
        public CzechAnalyzer(org.apache.lucene.util.Version matchVersion,
                             String... stopwords)
        Deprecated.
        Builds an analyzer with the given stop words.
        Parameters:
        matchVersion - Lucene version to match (see the class description above)
        stopwords - the stop words
      • CzechAnalyzer

        @Deprecated
        public CzechAnalyzer(org.apache.lucene.util.Version matchVersion,
                             HashSet<?> stopwords)
        Deprecated.
        Builds an analyzer with the given stop words.
        Parameters:
        matchVersion - Lucene version to match (see the class description above)
        stopwords - a stopword set
      • CzechAnalyzer

        @Deprecated
        public CzechAnalyzer(org.apache.lucene.util.Version matchVersion,
                             File stopwords)
                      throws IOException
        Deprecated.
        Builds an analyzer with the stop words read from the given file.
        Parameters:
        matchVersion - Lucene version to match (see the class description above)
        stopwords - a file containing stopwords
        Throws:
        IOException
    • Method Detail

      • getDefaultStopSet

        public static final Set<?> getDefaultStopSet()
        Returns the set of default Czech stop words.
        Returns:
        the set of default Czech stop words
      • loadStopWords

        @Deprecated
        public void loadStopWords(InputStream wordfile,
                                  String encoding)
        Deprecated.
        Use WordlistLoader.getWordSet(Reader, String, Version) and CzechAnalyzer(Version, Set) instead.
        Loads the stop word set from a resource stream (file, database, ...).
        Parameters:
        wordfile - stream containing the word list
        encoding - Encoding used (win-1250, iso-8859-2, ...), null for default system encoding
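A minimal sketch of what loading stop words from a resource stream with an explicit encoding involves, using only the JDK. The class StopWordStreamSketch and its readWords method are hypothetical names for illustration and are not part of Lucene; the one-word-per-line format with blank lines skipped is also an assumption. A null encoding falls back to the platform default, mirroring the parameter description above.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not a Lucene class.
final class StopWordStreamSketch {

    // Reads one word per line (assumed format), trimming whitespace and
    // skipping blank lines. A null encoding means the platform default.
    static Set<String> readWords(InputStream wordfile, String encoding) throws IOException {
        Charset charset = (encoding == null)
                ? Charset.defaultCharset()
                : Charset.forName(encoding);
        Set<String> words = new LinkedHashSet<String>();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(wordfile, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // UTF-8 is used here because every JVM supports it; a Czech word list
        // in another encoding would pass that charset name instead.
        byte[] data = "\u017ee\nnebo\n\naby\n".getBytes(StandardCharsets.UTF_8);
        Set<String> words = readWords(new ByteArrayInputStream(data), "UTF-8");
        System.out.println(words.size() + " stop words loaded");
    }
}
```

A set built this way could then be passed to the CzechAnalyzer(Version, Set) constructor, which is the route the deprecation note recommends.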
      • createComponents

        protected org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName,
                                                                                                         Reader reader)
        Creates ReusableAnalyzerBase.TokenStreamComponents used to tokenize all the text in the provided Reader.
        Specified by:
        createComponents in class org.apache.lucene.analysis.ReusableAnalyzerBase
        Returns:
        ReusableAnalyzerBase.TokenStreamComponents built from a StandardTokenizer filtered with StandardFilter, LowerCaseFilter, StopFilter, and CzechStemFilter (the last only if the version is >= LUCENE_31). If the version is >= LUCENE_31 and a stem exclusion set was provided via CzechAnalyzer(Version, Set, Set), a KeywordMarkerFilter is added before CzechStemFilter.
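The order of the chain described above can be modelled in plain Java. This is only an illustration of the ordering (tokenize, lower-case, drop stop words, mark excluded terms, stem); it does not use the Lucene classes. CzechChainSketch, analyze and toyStem are hypothetical names, the tokenizer is a simple split on non-alphanumeric characters, and toyStem is a deliberately crude stand-in that is not the Czech stemming algorithm. The stemmingEnabled flag stands in for the "version is >= LUCENE_31" condition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of the filter order; not Lucene code.
final class CzechChainSketch {

    // Stand-in for CzechStemFilter: strips a single trailing vowel from
    // terms longer than three characters. NOT the real Czech stemmer.
    static String toyStem(String term) {
        int last = term.length() - 1;
        if (term.length() > 3 && "aeiouy".indexOf(term.charAt(last)) >= 0) {
            return term.substring(0, last);
        }
        return term;
    }

    static List<String> analyze(String text,
                                Set<String> stopwords,
                                Set<String> stemExclusions,
                                boolean stemmingEnabled) {
        List<String> terms = new ArrayList<String>();
        // 1. Tokenize (stand-in for StandardTokenizer and StandardFilter).
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            // 2. Lower-case (LowerCaseFilter).
            String term = raw.toLowerCase(Locale.ROOT);
            // 3. Drop stop words (StopFilter).
            if (stopwords.contains(term)) {
                continue;
            }
            // 4. Mark excluded terms as keywords (KeywordMarkerFilter).
            boolean keyword = stemExclusions.contains(term);
            // 5. Stem everything that is not marked (CzechStemFilter).
            if (stemmingEnabled && !keyword) {
                term = toyStem(term);
            }
            terms.add(term);
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<String>(Arrays.asList("a"));
        Set<String> keep = new HashSet<String>(Arrays.asList("praha"));
        // prints [praha, brn]
        System.out.println(analyze("Praha a Brno", stop, keep, true));
    }
}
```

Because the marker step runs after lower-casing in this model, the exclusion set is matched against lower-cased terms, so its entries are written in lower case here.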