query module

This module contains objects that query the search index. These query objects are composable to form complex query trees.

See also whoosh.qparser which contains code for parsing user queries into query objects.

Base classes

The following abstract base classes are subclassed to create the “real” query operations.

class whoosh.query.Query

Abstract base class for all queries.

Note that this base class implements __or__, __and__, and __sub__ to allow slightly more convenient composition of query objects:

>>> Term("content", u"a") | Term("content", u"b")
Or([Term("content", u"a"), Term("content", u"b")])

>>> Term("content", u"a") & Term("content", u"b")
And([Term("content", u"a"), Term("content", u"b")])

>>> Term("content", u"a") - Term("content", u"b")
And([Term("content", u"a"), Not(Term("content", u"b"))])
accept(fn)

Applies the given function to this query’s subqueries (if any) and then to this query itself:

def boost_phrases(q):
    if isinstance(q, Phrase):
        q.boost *= 2.0
    return q

myquery = myquery.accept(boost_phrases)

This method automatically creates copies of the nodes in the original tree before passing them to your function, so your function can change attributes on nodes without altering the original tree.

This method is less flexible than using Query.apply() (in fact it’s implemented using that method) but is often more straightforward.

all_terms(termset=None, phrases=True)

Returns a set of all terms in this query tree.

This method exists for backwards compatibility. For more flexibility use the Query.iter_all_terms() method instead, which simply yields the terms in the query.

Parameters:phrases – Whether to add words found in Phrase queries.
Return type:set
all_tokens(boost=1.0)

Returns an iterator of analysis.Token objects corresponding to all terms in this query tree. The Token objects will have the fieldname, text, and boost attributes set. If the query was built by the query parser, the Token objects will also have startchar and endchar attributes indexing into the original user query.

apply(fn)

If this query has children, calls the given function on each child and returns a new copy of this node with the new children returned by the function. If this is a leaf node, simply returns this object.

This is useful for writing functions that transform a query tree. For example, this function changes all Term objects in a query tree into Variations objects:

def term2var(q):
    if isinstance(q, Term):
        return Variations(q.fieldname, q.text)
    else:
        return q.apply(term2var)

q = And([Term("f", "alfa"),
         Or([Term("f", "bravo"),
             Not(Term("f", "charlie"))])])
q = term2var(q)

Note that this method does not automatically create copies of nodes. To avoid modifying the original tree, your function should call the Query.copy() method on nodes before changing their attributes.
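
The difference between apply() and accept() can be illustrated with a toy model. This is only a sketch: the Node class, its subs attribute, and the double_phrase function are hypothetical stand-ins, not Whoosh's actual classes or implementation.

```python
import copy


class Node:
    """Toy stand-in for a query node (hypothetical, not whoosh.query.Query)."""

    def __init__(self, name, children=(), boost=1.0):
        self.name = name
        self.subs = list(children)
        self.boost = boost

    def apply(self, fn):
        # Leaf node: return this object unchanged.
        if not self.subs:
            return self
        # Otherwise return a copy whose children are whatever fn returned.
        clone = copy.copy(self)
        clone.subs = [fn(child) for child in self.subs]
        return clone

    def accept(self, fn):
        # Copy this node, visit the children first, then the node itself.
        clone = copy.copy(self)
        clone = clone.apply(lambda child: child.accept(fn))
        return fn(clone)


def double_phrase(node):
    # Safe to mutate: accept() hands this function copies only.
    if node.name == "phrase":
        node.boost *= 2.0
    return node


tree = Node("and", [Node("phrase"), Node("term")])
boosted = tree.accept(double_phrase)
```

In this model the original tree keeps its boost of 1.0 because accept() copies each node before calling the function. A function passed directly to apply() would have to make that copy itself.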

children()

Returns an iterator of the subqueries of this object.

copy()

Deprecated, just use copy.deepcopy.

docs(searcher)

Returns an iterator of docnums matching this query.

>>> searcher = my_index.searcher()
>>> list(my_query.docs(searcher))
[10, 34, 78, 103]
Parameters:searcher – A whoosh.searching.Searcher object.
estimate_min_size(ixreader)

Returns an estimate of the minimum number of documents this query could potentially match.

estimate_size(ixreader)

Returns an estimate of how many documents this query could potentially match (for example, the estimated size of a simple term query is the document frequency of the term). It is permissible to overestimate, but not to underestimate.
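
The "overestimate but never underestimate" rule can be illustrated with plain document frequencies. The two helper functions below are hypothetical and only show one way to compute safe upper bounds; they are not taken from Whoosh.

```python
def estimate_and(doc_freqs):
    # An AND can match at most as many documents as its rarest subquery.
    return min(doc_freqs) if doc_freqs else 0


def estimate_or(doc_freqs, doc_count):
    # An OR can match at most the sum of its subqueries,
    # and never more documents than the index holds.
    return min(sum(doc_freqs), doc_count)


# Toy posting lists: term -> set of matching document numbers.
postings = {"alfa": {1, 2, 3}, "bravo": {3, 4}}
freqs = [len(docs) for docs in postings.values()]

actual_and = len(postings["alfa"] & postings["bravo"])
actual_or = len(postings["alfa"] | postings["bravo"])
```

Both estimates are at least as large as the true match counts, which is the property the method description asks for.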

existing_terms(ixreader, termset=None, reverse=False, phrases=True, expand=False)

Returns a set of all terms in this query tree that exist in the given ixreader.

This method exists for backwards compatibility. For more flexibility use the Query.iter_all_terms() method instead, which simply yields the terms in the query.

Parameters:
  • ixreader – A whoosh.reading.IndexReader object.
  • reverse – If True, this method adds missing terms rather than existing terms to the set.
  • phrases – Whether to add words found in Phrase queries.
  • expand – If True, queries that match multiple terms (such as Wildcard and Prefix) will return all matching expansions.
Return type:set

field()

Returns the field this query matches in, or None if this query does not match in a single field.

has_terms()

Returns True if this specific object represents a search for a specific term (as opposed to a pattern, as in Wildcard and Prefix) or terms (i.e., whether the replace() method does something meaningful on this instance).

is_leaf()

Returns True if this is a leaf node in the query tree, or False if this query has sub-queries.

is_range()

Returns True if this object searches for values within a range.

iter_all_terms()

Returns an iterator of (“fieldname”, “text”) pairs for all terms in this query tree.

>>> qp = qparser.QueryParser("text", myindex.schema)
>>> q = qp.parse("alfa bravo title:charlie")
>>> # List the terms in a query
>>> list(q.iter_all_terms())
[("text", "alfa"), ("text", "bravo"), ("title", "charlie")]
>>> # Get a set of all terms in the query that don't exist in the index
>>> r = myindex.reader()
>>> set(t for t in q.iter_all_terms() if t not in r)
set([("text", "alfa"), ("title", "charlie")])
>>> # All terms in the query that occur in fewer than 5 documents in
>>> # the index
>>> [t for t in q.iter_all_terms() if r.doc_frequency(t[0], t[1]) < 5]
[("title", "charlie")]
leaves()

Returns an iterator of all the leaf queries in this query tree as a flat series.

matcher(searcher)

Returns a Matcher object you can use to retrieve documents and scores matching this query.

Return type:whoosh.matching.Matcher
normalize()

Returns a recursively “normalized” form of this query. The normalized form removes redundancy and empty queries. This is called automatically on query trees created by the query parser, but you may want to call it yourself if you’re writing your own parser or building your own queries.

>>> q = And([And([Term("f", u"a"),
...               Term("f", u"b")]),
...               Term("f", u"c"), Or([])])
>>> q.normalize()
And([Term("f", u"a"), Term("f", u"b"), Term("f", u"c")])

Note that this returns a new, normalized query. It does not modify the original query “in place”.
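
To make "removes redundancy and empty queries" concrete, here is a minimal sketch on a toy representation, where a term is a ("term", fieldname, text) tuple and a compound query is a (kind, [subqueries]) pair. The representation and the normalize function are assumptions made for illustration; Whoosh's own normalization may apply further rules.

```python
def normalize(q):
    """Return a flattened, de-duplicated copy of a toy query, or None if empty."""
    if q[0] == "term":
        return q
    kind, subs = q
    flat = []
    for sub in subs:
        sub = normalize(sub)
        if sub is None:
            continue  # drop empty subqueries such as ("or", [])
        # Splice And-inside-And (or Or-inside-Or) into the parent.
        candidates = sub[1] if sub[0] == kind else [sub]
        for candidate in candidates:
            if candidate not in flat:  # drop duplicates
                flat.append(candidate)
    if not flat:
        return None
    if len(flat) == 1:
        return flat[0]  # a compound with one child is just that child
    return (kind, flat)


a, b, c = ("term", "f", "a"), ("term", "f", "b"), ("term", "f", "c")
original = ("and", [("and", [a, b]), c, ("or", [])])
result = normalize(original)
```

As in the real method, the input is left untouched and a new structure is returned.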

replace(fieldname, oldtext, newtext)

Returns a copy of this query with oldtext replaced by newtext (if oldtext was anywhere in this query).

Note that this returns a new query with the given text replaced. It does not modify the original query “in place”.

requires()

Returns a set of queries that are known to be required to match for the entire query to match. Note that other queries might also turn out to be required but not be determinable by examining the static query.

>>> a = Term("f", u"a")
>>> b = Term("f", u"b")
>>> And([a, b]).requires()
set([Term("f", u"a"), Term("f", u"b")])
>>> Or([a, b]).requires()
set([])
>>> AndMaybe(a, b).requires()
set([Term("f", u"a")])
>>> a.requires()
set([Term("f", u"a")])
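
The results above follow a simple pattern, sketched here on toy tuples. The tuple representation and the requires function are hypothetical; they reproduce only the four cases shown in the example and say nothing about other query types.

```python
def requires(q):
    """Toy version of the rule: which subqueries must match for q to match?"""
    kind = q[0]
    if kind == "term":
        return {q}  # a term requires itself
    if kind == "and":
        found = set()
        for sub in q[1]:
            found |= requires(sub)  # every AND clause is required
        return found
    if kind == "andmaybe":
        return requires(q[1])  # only the first clause is required
    return set()  # "or": no single clause is required


a = ("term", "f", "a")
b = ("term", "f", "b")
```
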
simplify(ixreader)

Returns a recursively simplified form of this query, where “second-order” queries (such as Prefix and Variations) are re-written into lower-level queries (such as Term and Or).
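
As an illustration of rewriting a “second-order” query, the sketch below expands a prefix into an Or of explicit terms using a sorted term list. The expand_prefix function and the tuple output are hypothetical; a real simplification would read the terms from the ixreader.

```python
from bisect import bisect_left


def expand_prefix(sorted_terms, prefix):
    """Rewrite a prefix into an explicit ("or", [terms]) structure."""
    # Jump to the first term that could start with the prefix.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # the list is sorted, so no later term can match
        matches.append(term)
    return ("or", [("term", t) for t in matches])


field_terms = ["compact", "compile", "computer", "con", "dog"]
expanded = expand_prefix(field_terms, "comp")
```
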

terms()

Yields zero or more (“fieldname”, “text”) pairs searched for by this query object. You can check whether a query object targets specific terms before you call this method using Query.has_terms().

To get all terms in a query tree, use Query.iter_all_terms().

tokens(boost=1.0)

Yields zero or more analysis.Token objects corresponding to the terms searched for by this query object. You can check whether a query object targets specific terms before you call this method using Query.has_terms().

The Token objects will have the fieldname, text, and boost attributes set. If the query was built by the query parser, the Token objects will also have startchar and endchar attributes indexing into the original user query.

To get all tokens for a query tree, use Query.all_tokens().

with_boost(boost)

Returns a COPY of this query with the boost set to the given value.

If a query type does not accept a boost itself, it will try to pass the boost on to its children, if any.

class whoosh.query.CompoundQuery(subqueries, boost=1.0)

Abstract base class for queries that combine or manipulate the results of multiple sub-queries.

class whoosh.query.MultiTerm

Abstract base class for queries that operate on multiple terms in the same field.

class whoosh.query.ExpandingTerm

Intermediate base class for queries such as FuzzyTerm and Variations that expand into multiple queries, but come from a single term.

class whoosh.query.WrappingQuery(child)

Query classes

class whoosh.query.Term(fieldname, text, boost=1.0)

Matches documents containing the given term (fieldname+text pair).

>>> Term("content", u"render")
class whoosh.query.Variations(fieldname, text, boost=1.0)

Query that automatically searches for morphological variations of the given word in the same field.

class whoosh.query.FuzzyTerm(fieldname, text, boost=1.0, maxdist=1, prefixlength=1, constantscore=True)

Matches documents containing words similar to the given term.

Parameters:
  • fieldname – The name of the field to search.
  • text – The text to search for.
  • boost – A boost factor to apply to scores of documents matching this query.
  • maxdist – The maximum edit distance from the given text.
  • prefixlength – The matched terms must share this many initial characters with ‘text’. For example, if text is “light” and prefixlength is 2, then only terms starting with “li” are checked for similarity.
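
How maxdist and prefixlength interact can be sketched in pure Python. The sketch assumes plain Levenshtein distance as the similarity measure, and the function names are hypothetical; the exact metric Whoosh uses is not stated here.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def fuzzy_candidates(lexicon, text, maxdist=1, prefixlength=1):
    # Only terms sharing the prefix are compared at all, which is
    # what makes a longer prefixlength cheaper.
    prefix = text[:prefixlength]
    return [t for t in lexicon
            if t.startswith(prefix) and edit_distance(t, text) <= maxdist]


lexicon = ["light", "lift", "might", "limit", "lights", "night"]
strict = fuzzy_candidates(lexicon, "light", maxdist=1, prefixlength=2)
loose = fuzzy_candidates(lexicon, "light", maxdist=1, prefixlength=0)
```

With prefixlength=2 only terms starting with “li” are considered, so “might” and “night” are never checked even though they are one edit away.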
class whoosh.query.Phrase(fieldname, words, slop=1, boost=1.0, char_ranges=None)

Matches documents containing a given phrase.

Parameters:
  • fieldname – the field to search.
  • words – a list of words (unicode strings) in the phrase.
  • slop – the number of words allowed between each “word” in the phrase; the default of 1 means the phrase must match exactly.
  • boost – a boost factor to apply to the raw score of documents matched by this query.
  • char_ranges – if a Phrase object is created by the query parser, it will set this attribute to a list of (startchar, endchar) pairs corresponding to the words in the phrase
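
The slop parameter can be read as the maximum distance between the positions of consecutive phrase words, so that 1 means the words must be adjacent. The sketch below encodes that reading; the phrase_matches function and the positions mapping are hypothetical and do not reflect how Whoosh matches phrases internally.

```python
def phrase_matches(positions, words, slop=1):
    """positions maps each word to the positions where it occurs in one document."""

    def extend(index, last_pos):
        if index == len(words):
            return True  # every word was placed in order
        for pos in positions.get(words[index], ()):
            # Each word must come after the previous one, at most slop away.
            if last_pos is None or 0 < pos - last_pos <= slop:
                if extend(index + 1, pos):
                    return True
        return False

    return extend(0, None)


# Positions for the document "the quick brown fox jumps".
doc = {"the": [0], "quick": [1], "brown": [2], "fox": [3], "jumps": [4]}
```
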
class whoosh.query.And(subqueries, boost=1.0)

Matches documents that match ALL of the subqueries.

>>> And([Term("content", u"render"),
...      Term("content", u"shade"),
...      Not(Term("content", u"texture"))])
>>> # You can also do this
>>> Term("content", u"render") & Term("content", u"shade")
class whoosh.query.Or(subqueries, boost=1.0, minmatch=0)

Matches documents that match ANY of the subqueries.

>>> Or([Term("content", u"render"),
...     And([Term("content", u"shade"), Term("content", u"texture")]),
...     Not(Term("content", u"network"))])
>>> # You can also do this
>>> Term("content", u"render") | Term("content", u"shade")
class whoosh.query.DisjunctionMax(subqueries, boost=1.0, tiebreak=0.0)

Matches all documents that match any of the subqueries, but scores each document using the maximum score from the subqueries.

class whoosh.query.Not(query, boost=1.0)

Excludes any documents that match the subquery.

>>> # Match documents that contain 'render' but not 'texture'
>>> And([Term("content", u"render"),
...      Not(Term("content", u"texture"))])
>>> # You can also do this
>>> Term("content", u"render") - Term("content", u"texture")
Parameters:
  • query – A Query object. The results of this query are excluded from the parent query.
  • boost – Boost is meaningless for excluded documents but this keyword argument is accepted for the sake of a consistent interface.
class whoosh.query.Prefix(fieldname, text, boost=1.0, constantscore=True)

Matches documents that contain any terms that start with the given text.

>>> # Match documents containing words starting with 'comp'
>>> Prefix("content", u"comp")
class whoosh.query.Wildcard(fieldname, text, boost=1.0, constantscore=True)

Matches documents that contain any terms that match a “glob” pattern. See the Python fnmatch module for information about globs.

>>> Wildcard("content", u"in*f?x")
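
To get a feel for which terms a glob pattern selects, the standard fnmatch module mentioned above can be applied to a list of terms. The expand_wildcard helper is hypothetical, and whether Whoosh supports every fnmatch feature (such as [seq] character sets) is not stated here.

```python
from fnmatch import fnmatchcase


def expand_wildcard(terms, pattern):
    # fnmatchcase compares case-sensitively on every platform.
    return [t for t in terms if fnmatchcase(t, pattern)]


field_terms = ["inbox", "index", "infix", "interfax"]
matched = expand_wildcard(field_terms, "in*f?x")
```

Here * stands for any run of characters (including none) and ? for exactly one character.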
class whoosh.query.Regex(fieldname, text, boost=1.0, constantscore=True)

Matches documents that contain any terms that match a regular expression. See the Python re module for information about regular expressions.

class whoosh.query.TermRange(fieldname, start, end, startexcl=False, endexcl=False, boost=1.0, constantscore=True)

Matches documents containing any terms in a given range.

>>> # Match documents where the indexed "id" field is greater than or equal
>>> # to 'apple' and less than or equal to 'pear'.
>>> TermRange("id", u"apple", u"pear")
Parameters:
  • fieldname – The name of the field to search.
  • start – Match terms equal to or greater than this.
  • end – Match terms equal to or less than this.
  • startexcl – If True, the range start is exclusive. If False, the range start is inclusive.
  • endexcl – If True, the range end is exclusive. If False, the range end is inclusive.
  • boost – Boost factor that should be applied to the raw score of results matched by this query.
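
The effect of startexcl and endexcl can be sketched with ordinary string comparison. The in_term_range function is a hypothetical illustration of the inclusive and exclusive bounds, not Whoosh code.

```python
def in_term_range(term, start, end, startexcl=False, endexcl=False):
    # Exclusive bounds use strict comparison, inclusive bounds allow equality.
    above = term > start if startexcl else term >= start
    below = term < end if endexcl else term <= end
    return above and below


ids = ["ant", "apple", "banana", "pear", "plum"]
inclusive = [t for t in ids if in_term_range(t, "apple", "pear")]
open_start = [t for t in ids if in_term_range(t, "apple", "pear", startexcl=True)]
open_end = [t for t in ids if in_term_range(t, "apple", "pear", endexcl=True)]
```
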
class whoosh.query.NumericRange(fieldname, start, end, startexcl=False, endexcl=False, boost=1.0, constantscore=True)

A range query for NUMERIC fields. Takes advantage of tiered indexing to speed up large ranges by matching at a high resolution at the edges of the range and a low resolution in the middle.

>>> # Match numbers from 10 to 5925 in the "number" field.
>>> nr = NumericRange("number", 10, 5925)
Parameters:
  • fieldname – The name of the field to search.
  • start – Match terms equal to or greater than this number. This should be a number type, not a string.
  • end – Match terms equal to or less than this number. This should be a number type, not a string.
  • startexcl – If True, the range start is exclusive. If False, the range start is inclusive.
  • endexcl – If True, the range end is exclusive. If False, the range end is inclusive.
  • boost – Boost factor that should be applied to the raw score of results matched by this query.
  • constantscore – If True, the compiled query returns a constant score (the value of the boost keyword argument) instead of actually scoring the matched terms. This gives a nice speed boost and won’t affect the results in most cases since numeric ranges will almost always be used as a filter.
class whoosh.query.DateRange(fieldname, start, end, startexcl=False, endexcl=False, boost=1.0, constantscore=True)

This is a very thin subclass of NumericRange that only overrides the initializer and __repr__() methods to work with datetime objects instead of numbers. Internally this object converts the datetime objects it’s created with to numbers and otherwise acts like a NumericRange query.

>>> DateRange("date", datetime(2010, 11, 3, 3, 0),
...           datetime(2010, 11, 3, 17, 59))
class whoosh.query.Every(fieldname=None, boost=1.0)

A query that matches every document containing any term in a given field. If you don’t specify a field, the query matches every document.

>>> # Match any documents with something in the "path" field
>>> q = Every("path")
>>> # Match every document
>>> q = Every()

The unfielded form (matching every document) is efficient.

The fielded form is more efficient than a prefix query with an empty prefix or a ‘*’ wildcard, but it can still be very slow on large indexes. It requires the searcher to read the full posting list of every term in the given field.

Instead of using this query it is much more efficient when you create the index to include a single term that appears in all documents that have the field you want to match.

For example, instead of this:

# Match all documents that have something in the "path" field
q = Every("path")

Do this when indexing:

# Add an extra field that indicates whether a document has a path
schema = fields.Schema(text=fields.TEXT, path=fields.ID, has_path=fields.ID)

# When indexing, set the "has_path" field based on whether the document
# has anything in the "path" field
writer.add_document(text=text_value1)
writer.add_document(text=text_value2, path=path_value2, has_path="t")

Then to find all documents with a path:

q = Term("has_path", "t")
Parameters:fieldname – the name of the field to match, or None or * to match all documents.
whoosh.query.NullQuery

Binary queries

class whoosh.query.Require(a, b)

Binary query that returns results from the first query that also appear in the second query, but only uses the scores from the first query. This lets you filter results without affecting scores.

class whoosh.query.AndMaybe(a, b)

Binary query that takes results from the first query. If and only if the same document also appears in the results from the second query, the score from the second query will be added to the score from the first query.

class whoosh.query.AndNot(a, b)

Binary boolean query of the form ‘a ANDNOT b’, where documents that match b are removed from the matches for a.

class whoosh.query.Otherwise(a, b)

A binary query that only matches the second clause if the first clause doesn’t match any documents.
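
The four binary queries differ in which documents survive and whose scores count. The sketch below models each subquery's results as a dict of document number to score; the function names and this representation are hypothetical and are meant only to mirror the descriptions above.

```python
def require(a, b):
    # Keep documents found by both, scored by the first query only.
    return {doc: score for doc, score in a.items() if doc in b}


def and_maybe(a, b):
    # Keep everything from the first query; add the second score when present.
    return {doc: score + b.get(doc, 0.0) for doc, score in a.items()}


def and_not(a, b):
    # Remove documents matched by the second query.
    return {doc: score for doc, score in a.items() if doc not in b}


def otherwise(a, b):
    # Fall back to the second query only when the first matches nothing.
    return dict(a) if a else dict(b)


first = {1: 1.0, 2: 2.0}
second = {2: 0.5, 3: 4.0}
```

With these inputs, require keeps only document 2 with score 2.0, while and_maybe keeps documents 1 and 2 and raises the score of document 2 to 2.5.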

Special queries

class whoosh.query.ConstantScoreQuery(child, score=1.0)

Wraps a query and uses a matcher that always gives a constant score to all matching documents. This is a useful optimization when you don’t care about scores from a certain branch of the query tree because it is simply acting as a filter. See also the AndMaybe query.

Exceptions

exception whoosh.query.QueryError

Error encountered while running a query.
