Package org.apache.lucene.analysis
Class StopAnalyzer
- java.lang.Object
  - org.apache.lucene.analysis.Analyzer
    - org.apache.lucene.analysis.ReusableAnalyzerBase
      - org.apache.lucene.analysis.StopwordAnalyzerBase
        - org.apache.lucene.analysis.StopAnalyzer
All Implemented Interfaces:
Closeable, AutoCloseable
public final class StopAnalyzer extends StopwordAnalyzerBase
Filters LetterTokenizer with LowerCaseFilter and StopFilter.
You must specify the required Version compatibility when creating StopAnalyzer:
- As of 3.1, StopFilter correctly handles Unicode 4.0 supplementary characters in stopwords.
- As of 2.9, position increments are preserved.
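The chain described above can be modeled without Lucene. The sketch below is stdlib-only Java, not the Lucene API: `StopChainSketch`, `Token`, and the `preserveIncrements` flag are hypothetical names. It splits text into maximal runs of letters (tested per code point, so a supplementary character counts as one letter), lowercases each run, drops stop words, and, when `preserveIncrements` is true, carries the gap left by removed words into the next token's position increment. Treat it as an approximation of the documented behavior, not as Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Stdlib-only model of the documented chain; NOT Lucene code. All names are hypothetical.
final class StopChainSketch {

    // One surviving term plus its position increment relative to the previous surviving term.
    record Token(String term, int positionIncrement) {}

    static List<Token> analyze(String text, Set<String> stopWords, boolean preserveIncrements) {
        List<Token> out = new ArrayList<>();
        int removed = 0; // stop words dropped since the last emitted token
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (!Character.isLetter(cp)) {
                // Non-letters only separate tokens; step over the whole code point.
                i += Character.charCount(cp);
                continue;
            }
            // Collect a maximal run of letters, lowercasing per code point so that a
            // supplementary character (a surrogate pair) is handled as one unit.
            StringBuilder term = new StringBuilder();
            while (i < text.length()) {
                int c = text.codePointAt(i);
                if (!Character.isLetter(c)) {
                    break;
                }
                term.appendCodePoint(Character.toLowerCase(c));
                i += Character.charCount(c);
            }
            String t = term.toString();
            if (stopWords.contains(t)) {
                removed++; // drop the stop word but remember the hole it leaves
            } else {
                // With preserveIncrements, the next survivor absorbs the skipped positions.
                out.add(new Token(t, preserveIncrements ? 1 + removed : 1));
                removed = 0;
            }
        }
        return out;
    }
}
```

For "The quick brown fox is in the barn" with the stop set {the, is, in}, the surviving terms are quick, brown, fox, barn with increments 2, 1, 1, 4. Passing false instead gives every term an increment of 1, which is roughly what the pre-2.9 note implies.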
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase:
ReusableAnalyzerBase.TokenStreamComponents
Field Summary
Modifier and Type | Field | Description
static Set<?> | ENGLISH_STOP_WORDS_SET | An unmodifiable set containing some common English words that are not usually useful for searching.
Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase:
matchVersion, stopwords
Constructor Summary
Constructor | Description
StopAnalyzer(Version matchVersion) | Builds an analyzer which removes words in ENGLISH_STOP_WORDS_SET.
StopAnalyzer(Version matchVersion, File stopwordsFile) | Builds an analyzer with the stop words from the given file.
StopAnalyzer(Version matchVersion, Reader stopwords) | Builds an analyzer with the stop words from the given reader.
StopAnalyzer(Version matchVersion, Set<?> stopWords) | Builds an analyzer with the stop words from the given set.
Method Summary
Modifier and Type | Method | Description
protected ReusableAnalyzerBase.TokenStreamComponents | createComponents(String fieldName, Reader reader) | Creates ReusableAnalyzerBase.TokenStreamComponents used to tokenize all the text in the provided Reader.
Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase:
getStopwordSet, loadStopwordSet (three overloads)
Methods inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase:
initReader, reusableTokenStream, tokenStream
Methods inherited from class org.apache.lucene.analysis.Analyzer:
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setPreviousTokenStream
Field Detail
ENGLISH_STOP_WORDS_SET
public static final Set<?> ENGLISH_STOP_WORDS_SET
An unmodifiable set containing some common English words that are not usually useful for searching.
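Because the constant is unmodifiable, code that wants a larger list should copy it rather than add to it. The stdlib-only sketch below illustrates that pattern with a hypothetical `StopWordsConstantSketch` class: its word list is an illustrative subset, not the real contents of ENGLISH_STOP_WORDS_SET, and it uses `Set<String>` where the real field is declared `Set<?>`. Following the java.util convention, mutators on an unmodifiable set throw UnsupportedOperationException.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of an unmodifiable stop-word constant; NOT Lucene code.
final class StopWordsConstantSketch {

    // Illustrative subset only; the real constant has its own (larger) contents.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "of", "to");

    // Returns true if the shared constant rejects an add() call.
    static boolean rejectsMutation() {
        try {
            STOP_WORDS.add("lucene");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    // Builds a new, extended set without touching the shared constant.
    static Set<String> extend(Set<String> base, String... extra) {
        Set<String> copy = new HashSet<>(base);
        Collections.addAll(copy, extra);
        return Collections.unmodifiableSet(copy);
    }
}
```

The extended copy can then be handed to a set-based constructor, while the shared constant stays exactly as it was.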
Constructor Detail
StopAnalyzer
public StopAnalyzer(Version matchVersion)
Builds an analyzer which removes words in ENGLISH_STOP_WORDS_SET.
Parameters:
matchVersion - See the Version compatibility notes in the class description.
StopAnalyzer
public StopAnalyzer(Version matchVersion, Set<?> stopWords)
Builds an analyzer with the stop words from the given set.
Parameters:
matchVersion - See the Version compatibility notes in the class description.
stopWords - Set of stop words
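The default and set-based constructors amount to "use the built-in list" or "use the caller's list". The stdlib-only sketch below models both with a hypothetical `StopSetHolderSketch` class. It assumes the analyzer keeps its own immutable snapshot of the caller's set, so later changes to that set have no effect; that is a design choice of the sketch, so check StopwordAnalyzerBase in your Lucene version before relying on it.

```java
import java.util.Set;

// Hypothetical holder mirroring the two constructor shapes; NOT Lucene code.
final class StopSetHolderSketch {

    // Illustrative default; not the real ENGLISH_STOP_WORDS_SET.
    static final Set<String> DEFAULT_STOP_WORDS = Set.of("a", "an", "the");

    private final Set<String> stopWords;

    // Like StopAnalyzer(Version): fall back to a built-in list.
    StopSetHolderSketch() {
        this(DEFAULT_STOP_WORDS);
    }

    // Like StopAnalyzer(Version, Set): keep an immutable snapshot of the caller's set.
    StopSetHolderSketch(Set<String> stopWords) {
        this.stopWords = Set.copyOf(stopWords);
    }

    boolean isStopWord(String term) {
        return stopWords.contains(term);
    }
}
```

Taking a snapshot keeps the analyzer's behavior fixed after construction, which matters if the same instance is shared across threads.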
StopAnalyzer
public StopAnalyzer(Version matchVersion, File stopwordsFile) throws IOException
Builds an analyzer with the stop words from the given file.
Parameters:
matchVersion - See the Version compatibility notes in the class description.
stopwordsFile - File to load stop words from
Throws:
IOException
See Also:
WordlistLoader.getWordSet(Reader, Version)
StopAnalyzer
public StopAnalyzer(Version matchVersion, Reader stopwords) throws IOException
Builds an analyzer with the stop words from the given reader.
Parameters:
matchVersion - See the Version compatibility notes in the class description.
stopwords - Reader to load stop words from
Throws:
IOException
See Also:
WordlistLoader.getWordSet(Reader, Version)
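Both the File and the Reader constructors point to WordlistLoader.getWordSet(Reader, Version), which reads one word per line. The stdlib-only sketch below approximates that loading step with a hypothetical `StopWordListSketch` class; it is not Lucene's loader. Several details are assumptions of the sketch: whitespace around each word is trimmed, blank lines are skipped, the Reader is closed once exhausted, and files are decoded as UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stop-word list loader (one word per line); NOT Lucene's WordlistLoader.
final class StopWordListSketch {

    // Reads every line of the Reader; blank lines are skipped (a choice of this sketch).
    static Set<String> load(Reader source) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader in = new BufferedReader(source)) { // closes the Reader when done
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return Collections.unmodifiableSet(words);
    }

    // File variant: decodes as UTF-8 (an assumption) and delegates to load(Reader).
    static Set<String> load(Path file) throws IOException {
        return load(Files.newBufferedReader(file, StandardCharsets.UTF_8));
    }
}
```

Routing the file case through the Reader case keeps a single parsing path, which mirrors how both constructors refer to the same loader.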
Method Detail
createComponents
protected ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName, Reader reader)
Creates ReusableAnalyzerBase.TokenStreamComponents used to tokenize all the text in the provided Reader.
Specified by:
createComponents in class ReusableAnalyzerBase
Parameters:
fieldName - the name of the field whose content is passed to the ReusableAnalyzerBase.TokenStreamComponents sink as a reader
reader - the reader passed to the Tokenizer constructor
Returns:
ReusableAnalyzerBase.TokenStreamComponents built from a LowerCaseTokenizer filtered with StopFilter
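createComponents builds the tokenizer-plus-filter chain that ReusableAnalyzerBase then reuses for later Readers. The stdlib-only model below is hypothetical (`ReusableComponentsSketch`, `Source`, `Components`, and `createCount` are invented names, not Lucene types): the source stage reads the whole Reader and emits lowercase runs of letters, the sink stage drops stop words, and the chain is built on first use and then reused. The real base class stores its components through the inherited getPreviousTokenStream/setPreviousTokenStream methods; this single-threaded sketch simply keeps one cached field.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical model of reusable analysis components; NOT the Lucene API.
final class ReusableComponentsSketch {

    // Source stage: turns a Reader into terms.
    interface Source {
        List<String> tokenize(Reader reader) throws IOException;
    }

    // The built chain: a source plus the sink stage applied to its output.
    record Components(Source source, UnaryOperator<List<String>> sink) {
        List<String> run(Reader reader) throws IOException {
            return sink.apply(source.tokenize(reader));
        }
    }

    private final Set<String> stopWords;
    private Components cached; // built once, then reused for every new Reader
    int createCount = 0;       // how many times createComponents() ran

    ReusableComponentsSketch(Set<String> stopWords) {
        this.stopWords = Set.copyOf(stopWords);
    }

    // Plays the role of createComponents: lowercase letter tokenizer, then stop-word removal.
    Components createComponents() {
        createCount++;
        Source lowerCaseLetters = ReusableComponentsSketch::lowerCaseLetterTokens;
        UnaryOperator<List<String>> stopFilter = terms -> terms.stream()
                .filter(t -> !stopWords.contains(t))
                .collect(Collectors.toList());
        return new Components(lowerCaseLetters, stopFilter);
    }

    // Analyzes one Reader, building the components only on first use.
    List<String> analyze(Reader reader) throws IOException {
        if (cached == null) {
            cached = createComponents();
        }
        return cached.run(reader);
    }

    private static List<String> lowerCaseLetterTokens(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buf = new char[1024];
        for (int n; (n = reader.read(buf)) != -1; ) {
            text.append(buf, 0, n);
        }
        List<String> terms = new ArrayList<>();
        // \P{L}+ splits on runs of non-letters; Java regex matches by code point.
        for (String piece : text.toString().split("\\P{L}+")) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }
}
```

Analyzing "The Cat sat on a mat" and then "A dog, the end." with the stop set {the, a} yields [cat, sat, on, mat] and [dog, end], and createComponents runs only once.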