Class FSTLookup
- java.lang.Object
-
- org.apache.lucene.search.suggest.Lookup
-
- org.apache.lucene.search.suggest.fst.FSTLookup
-
@Deprecated public class FSTLookup extends Lookup
Deprecated.UseFSTCompletionLookup
instead.Finite state automata based implementation ofLookup
query suggestion/ autocomplete interface.Implementation details
The construction step in
build(TermFreqIterator)
works as follows:- A set of input terms (String) and weights (float) is given.
- The range of weights is determined and then all weights are discretized into a fixed set
of values (
buckets
). Note that this means that minor changes in weights may be lost during automaton construction. In general, this is not a big problem because the "priorities" of completions can be split into a fixed set of classes (even as rough as: very frequent, frequent, baseline, marginal). If you need exact, fine-grained weights, useTSTLookup
instead. - All terms in the input are preprended with a synthetic pseudo-character being the weight
of that term. For example a term
abc
with a discretized weight equal '1' would become1abc
. - The terms are sorted by their raw value of utf16 character values (including the synthetic term in front).
- A finite state automaton (
FST
) is constructed from the input. The root node has arcs labeled with all possible weights. We cache all these arcs, highest-weight first.
At runtime, in
lookup(CharSequence, boolean, int)
, the automaton is utilized as follows:- For each possible term weight encoded in the automaton (cached arcs from the root above), starting with the highest one, we descend along the path of the input key. If the key is not a prefix of a sequence in the automaton (path ends prematurely), we exit immediately. No completions.
- Otherwise, we have found an internal automaton node that ends the key. The entire subautomaton (all paths) starting from this node form the key's completions. We start the traversal of this subautomaton. Every time we reach a final state (arc), we add a single suggestion to the list of results (the weight of this suggestion is constant and equal to the root path we started from). The tricky part is that because automaton edges are sorted and we scan depth-first, we can terminate the entire procedure as soon as we collect enough suggestions the user requested.
- In case the number of suggestions collected in the step above is still insufficient, we proceed to the next (smaller) weight leaving the root node and repeat the same algorithm again.
Runtime behavior and performance characteristic
The algorithm described above is optimized for finding suggestions to short prefixes in a top-weights-first order. This is probably the most common use case: it allows presenting suggestions early and sorts them by the global frequency (and then alphabetically).
If there is an exact match in the automaton, it is returned first on the results list (even with by-weight sorting).
Note that the maximum lookup time for any prefix is the time of descending to the subtree, plus traversal of the subtree up to the number of requested suggestions (because they are already presorted by weight on the root level and alphabetically at any node level).
To order alphabetically only (no ordering by priorities), use identical term weights for all terms. Alphabetical suggestions are returned even if non-constant weights are used, but the algorithm for doing this is suboptimal.
"alphabetically" in any of the documentation above indicates utf16 codepoint order, nothing else.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.search.suggest.Lookup
Lookup.LookupPriorityQueue, Lookup.LookupResult
-
-
Field Summary
Fields Modifier and Type Field Description static String
FILENAME
Deprecated.Serialized automaton file name (storage).-
Fields inherited from class org.apache.lucene.search.suggest.Lookup
CHARSEQUENCE_COMPARATOR
-
-
Method Summary
All Methods Instance Methods Concrete Methods Deprecated Methods Modifier and Type Method Description void
build(TermFreqIterator tfit)
Deprecated.Builds up a new internalLookup
representation based on the givenTermFreqIterator
.Float
get(CharSequence key)
Deprecated.Get the (approximated) weight of a single key (if there is a perfect match for it in the automaton).boolean
load(InputStream input)
Deprecated.Discard current lookup data and load it from a previously saved copy.List<Lookup.LookupResult>
lookup(CharSequence key, boolean onlyMorePopular, int num)
Deprecated.Lookup autocomplete suggestions tokey
.boolean
store(OutputStream output)
Deprecated.Persist the constructed lookup data to a directory.
-
-
-
Field Detail
-
FILENAME
public static final String FILENAME
Deprecated.Serialized automaton file name (storage).- See Also:
- Constant Field Values
-
-
Method Detail
-
build
public void build(TermFreqIterator tfit) throws IOException
Deprecated.Description copied from class:Lookup
Builds up a new internalLookup
representation based on the givenTermFreqIterator
. The implementation might re-sort the data internally.- Specified by:
build
in classLookup
- Throws:
IOException
-
get
public Float get(CharSequence key)
Deprecated.Get the (approximated) weight of a single key (if there is a perfect match for it in the automaton).- Returns:
- Returns the approximated weight of the input key or
null
if not found.
-
lookup
public List<Lookup.LookupResult> lookup(CharSequence key, boolean onlyMorePopular, int num)
Deprecated.Lookup autocomplete suggestions tokey
.- Specified by:
lookup
in classLookup
- Parameters:
key
- The prefix to which suggestions should be sought.onlyMorePopular
- Return most popular suggestions first. This is the default behavior for this implementation. Setting it tofalse
has no effect (use constant term weights to sort alphabetically only).num
- At most this number of suggestions will be returned.- Returns:
- Returns the suggestions, sorted by their approximated weight first (decreasing) and then alphabetically (utf16 codepoint order).
-
store
public boolean store(OutputStream output) throws IOException
Deprecated.Description copied from class:Lookup
Persist the constructed lookup data to a directory. Optional operation.- Specified by:
store
in classLookup
- Parameters:
output
-OutputStream
to write the data to.- Returns:
- true if successful, false if unsuccessful or not supported.
- Throws:
IOException
- when fatal IO error occurs.
-
load
public boolean load(InputStream input) throws IOException
Deprecated.Description copied from class:Lookup
Discard current lookup data and load it from a previously saved copy. Optional operation.- Specified by:
load
in classLookup
- Parameters:
input
- theInputStream
to load the lookup data.- Returns:
- true if completed successfully, false if unsuccessful or not supported.
- Throws:
IOException
- when fatal IO error occurs.
-
-