Package xappy :: Module indexerconnection :: Class IndexerConnection

Class IndexerConnection

object --+
         |
        IndexerConnection

A connection to the search engine for indexing.

Instance Methods

__init__(self, indexpath)
Create a new connection to the index.

set_max_mem_use(self, max_mem=None, max_mem_proportion=None)
Set the maximum memory to use.

add_field_action(self, fieldname, fieldtype, **kwargs)
Add an action to be performed on a field.

clear_field_actions(self, fieldname)
Clear all actions for the specified field.

get_fields_with_actions(self)
Get a list of field names which have actions defined.

process(self, document)
Process an UnprocessedDocument with the settings in this database.

add(self, document)
Add a new document to the search engine index.

replace(self, document)
Replace a document in the search engine index.

add_synonym(self, original, synonym, field=None, original_field=None, synonym_field=None)
Add a synonym to the index.

remove_synonym(self, original, synonym, field=None)
Remove a synonym from the index.

clear_synonyms(self, original, field=None)
Remove all synonyms for a word (or phrase).

add_subfacet(self, subfacet, facet)
Add a subfacet-facet relationship to the facet hierarchy.

remove_subfacet(self, subfacet)
Remove any existing facet hierarchy relationship for a subfacet.

get_subfacets(self, facet)
Get a list of subfacets of a facet.

set_facet_for_query_type(self, query_type, facet, association)
Set the association between a query type and a facet.

get_facets_for_query_type(self, query_type, association)
Get the set of facets associated with a query type.

set_metadata(self, key, value)
Set an item of metadata stored in the connection.

get_metadata(self, key)
Get an item of metadata stored in the connection.

delete(self, id)
Delete a document from the search engine index.

flush(self)
Apply recent changes to the database.

close(self)
Close the connection to the database.

get_doccount(self)
Count the number of documents in the database.

iterids(self)
Get an iterator which returns all the ids in the database.

get_document(self, id)
Get the document with the specified unique ID.

iter_synonyms(self, prefix='')
Get an iterator over the synonyms.

iter_subfacets(self)
Get an iterator over the facet hierarchy.

iter_facet_query_types(self, association)
Get an iterator over query types and their associated facets.

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Class Variables
  FacetQueryType_Preferred = 1
  FacetQueryType_Never = 2

Properties

Inherited from object: __class__

Method Details

__init__(self, indexpath)
(Constructor)

Create a new connection to the index.

There may only be one indexer connection for a particular database open at a given time. Therefore, if a connection to the database is already open, this will raise a xapian.DatabaseLockError.

If the database doesn't already exist, it will be created.

Overrides: object.__init__
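
Because only one indexer connection may be open per database, a caller may want to retry when the lock is held elsewhere. The following is a minimal sketch, not part of xappy: open_with_retry, open_connection and lock_error are hypothetical names, and the caller is assumed to pass in the constructor call and the exception class (for example xapian.DatabaseLockError).

```python
import time


def open_with_retry(open_connection, lock_error, attempts=5, delay=0.5):
    """Call open_connection() until it succeeds or the attempts run out.

    open_connection: zero-argument callable supplied by the caller, e.g.
        lambda: IndexerConnection('/path/to/db')   (illustrative path)
    lock_error: exception class raised while the database is locked,
        e.g. xapian.DatabaseLockError.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return open_connection()
        except lock_error:
            if attempt == attempts - 1:
                raise  # still locked after the final attempt
            time.sleep(delay)  # wait before trying again
```

Passing the exception class as a parameter keeps the helper independent of how xapian is imported in a given application.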

set_max_mem_use(self, max_mem=None, max_mem_proportion=None)

Set the maximum memory to use.

This call sets the amount of memory used to buffer changes. It affects the speed of indexing, but should not otherwise change the results of indexing.

Note: this is an approximate measure - the actual amount of memory used may exceed the specified amount. Also, note that future versions of xapian are likely to implement this differently, so this setting may be entirely ignored.

The absolute amount of memory to use (in bytes) may be set by setting max_mem. Alternatively, the proportion of the available memory may be set by setting max_mem_proportion (this should be a value between 0 and 1).

Setting too low a value will result in excessive flushing, and very slow indexing. Setting too high a value will result in excessive buffering, leading to swapping, and very slow indexing.

A reasonable default for max_mem_proportion for a system which is dedicated to indexing is probably 0.5: if other tasks are also being performed on the system, the value should be lowered.
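
The two ways of specifying the limit can be checked before the call is made. This is a sketch of a hypothetical helper, mem_use_kwargs, which is not part of xappy. It assumes that exactly one of the two arguments should be supplied and that a proportion must be greater than 0 and at most 1; the library itself may be more permissive.

```python
def mem_use_kwargs(max_mem=None, max_mem_proportion=None):
    """Validate the arguments and return them as a keyword dict.

    Assumption of this sketch: exactly one of the two values is given.
    """
    if (max_mem is None) == (max_mem_proportion is None):
        raise ValueError("give exactly one of max_mem or max_mem_proportion")
    if max_mem is not None:
        if max_mem <= 0:
            raise ValueError("max_mem must be a positive number of bytes")
        return {'max_mem': int(max_mem)}
    if not 0 < max_mem_proportion <= 1:
        raise ValueError("max_mem_proportion must be between 0 and 1")
    return {'max_mem_proportion': float(max_mem_proportion)}


# Intended use with an open connection (not executed here):
#   conn.set_max_mem_use(**mem_use_kwargs(max_mem_proportion=0.5))
```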

add_field_action(self, fieldname, fieldtype, **kwargs)

Add an action to be performed on a field.

Note that this change to the configuration will not be preserved on disk until the next call to flush().

clear_field_actions(self, fieldname)

Clear all actions for the specified field.

This does not report an error if there are already no actions for the specified field.

Note that this change to the configuration will not be preserved on disk until the next call to flush().

process(self, document)

Process an UnprocessedDocument with the settings in this database.

The resulting ProcessedDocument is returned.

Note that this processing will be automatically performed if an UnprocessedDocument is supplied to the add() or replace() methods of IndexerConnection. This method is exposed to allow the processing to be performed separately, which may be desirable if you wish to manually modify the processed document before adding it to the database, or if you want to split processing of documents from adding documents to the database for performance reasons.
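
Splitting processing from adding, as described above, might look like the following sketch. process_then_add and its modify hook are hypothetical names; only the process() and add() methods documented on this page are called on the connection.

```python
def process_then_add(conn, unprocessed_docs, modify=None):
    """Process each document explicitly, optionally adjust it, then add it.

    conn: an open IndexerConnection (or any object with process() and add()).
    modify: optional callable (hypothetical hook) which receives the
        processed document and may change it in place before it is added.
    Returns the list of ids reported by add().
    """
    added_ids = []
    for doc in unprocessed_docs:
        processed = conn.process(doc)  # same step add() would perform itself
        if modify is not None:
            modify(processed)
        added_ids.append(conn.add(processed))
    return added_ids
```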

add(self, document)

Add a new document to the search engine index.

If the document has an id set, and that id already exists in the database, an exception will be raised. Use the replace() method instead if you wish to overwrite documents.

Returns the id of the newly added document (making up a new unique ID if no id was set).

The supplied document may be an instance of UnprocessedDocument, or an instance of ProcessedDocument.

replace(self, document)

Replace a document in the search engine index.

If the document does not have an id set, an exception will be raised.

If the document has an id set, and the id does not already exist in the database, this method will have the same effect as add().
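
The difference between add() and replace() in how ids are handled can be modelled with a plain dictionary. This is a toy stand-in for illustration only: ToyIndex is hypothetical, it takes the id and the data as separate arguments rather than a document object, and the exception types (KeyError, ValueError) are placeholders, since this page does not say which exceptions xappy raises.

```python
import itertools


class ToyIndex(object):
    """Dictionary-backed model of the add()/replace() id rules."""

    def __init__(self):
        self.docs = {}
        self._counter = itertools.count()

    def _new_id(self):
        # Make up an id which is not in use yet.
        while True:
            candidate = '%x' % next(self._counter)
            if candidate not in self.docs:
                return candidate

    def add(self, doc_id, data):
        if doc_id is None:
            doc_id = self._new_id()        # no id set: invent a unique one
        elif doc_id in self.docs:
            raise KeyError(doc_id)         # id already present: refuse
        self.docs[doc_id] = data
        return doc_id

    def replace(self, doc_id, data):
        if doc_id is None:
            raise ValueError("replace() needs an id")
        self.docs[doc_id] = data           # overwrite, or add if absent
```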

add_synonym(self, original, synonym, field=None, original_field=None, synonym_field=None)

Add a synonym to the index.

  • original is the word or words which will be synonym expanded in searches (if multiple words are specified, each word should be separated by a single space).
  • synonym is a synonym for original.
  • field is the field which the synonym is specific to. If no field is specified, the synonym will be used for searches which are not specific to any particular field.
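
Since multiple words in original should be separated by a single space, it may help to normalise whitespace before the call. A minimal sketch follows; add_synonym_normalised is a hypothetical wrapper, not an xappy function, and it leaves the synonym argument untouched.

```python
def add_synonym_normalised(conn, original, synonym, field=None):
    """Collapse runs of whitespace in `original`, then call add_synonym().

    Returns the normalised form of `original` that was actually stored.
    """
    normalised = ' '.join(original.split())  # also strips leading/trailing space
    if not normalised:
        raise ValueError("original must contain at least one word")
    conn.add_synonym(normalised, synonym, field=field)
    return normalised
```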

remove_synonym(self, original, synonym, field=None)

Remove a synonym from the index.

  • original is the word or words which will be synonym expanded in searches (if multiple words are specified, each word should be separated by a single space).
  • synonym is a synonym for original.
  • field is the field which this synonym is specific to. If no field is specified, the synonym removed is the one used for searches which are not specific to any particular field.

clear_synonyms(self, original, field=None)

Remove all synonyms for a word (or phrase).

  • field is the field which the synonyms are specific to. If no field is specified, the synonyms removed are those used for searches which are not specific to any particular field.

add_subfacet(self, subfacet, facet)

Add a subfacet-facet relationship to the facet hierarchy.

Any existing relationship for that subfacet is replaced.

Raises a KeyError if either facet or subfacet is not a field, and an IndexerError if either facet or subfacet is not a facet field.

set_facet_for_query_type(self, query_type, facet, association)

Set the association between a query type and a facet.

The value of association must be one of IndexerConnection.FacetQueryType_Preferred, IndexerConnection.FacetQueryType_Never or None. A value of None removes any previously set association.

get_facets_for_query_type(self, query_type, association)

Get the set of facets associated with a query type.

Only those facets associated with the query type in the specified manner are returned; association must be one of IndexerConnection.FacetQueryType_Preferred or IndexerConnection.FacetQueryType_Never.

If the query type has no facets associated with it, None is returned.
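
Because None is returned when nothing is associated, callers that want to iterate or test membership may prefer an empty set. The helper below is a hypothetical convenience wrapper, sketched on the assumption that the non-None return value is an iterable of facet names.

```python
def facets_for(conn, query_type, association):
    """Return the associated facets as a set, using an empty set for None."""
    facets = conn.get_facets_for_query_type(query_type, association)
    if facets is None:
        return set()
    return set(facets)  # copy, so the caller may modify the result freely
```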

set_metadata(self, key, value)

Set an item of metadata stored in the connection.

The value supplied will be returned by subsequent calls to get_metadata() which use the same key.

Keys with a leading underscore are reserved for internal use - you should not use such keys unless you really know what you are doing.

This will store the value supplied in the database. It will not be visible to readers (i.e., search connections) until after the next flush.

The key is limited to about 200 characters (the same length as a term is limited to). The value can be several megabytes in size.

To remove an item of metadata, simply call this with a value parameter containing an empty string.

get_metadata(self, key)

Get an item of metadata stored in the connection.

This returns a value stored by a previous call to set_metadata.

If the value is not found, this will return the empty string.
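
Metadata values are strings, and an empty string means "not set", so structured settings need to be serialised. The sketch below stores them as JSON. save_settings, load_settings and clear_settings are hypothetical helpers; the choice of JSON is an assumption of this example, not something xappy prescribes.

```python
import json


def save_settings(conn, key, settings):
    """Store a JSON-serialisable value under `key`."""
    if key.startswith('_'):
        raise ValueError("keys with a leading underscore are reserved")
    # json.dumps() never yields an empty string, so a stored value can
    # always be told apart from "no value".
    conn.set_metadata(key, json.dumps(settings, sort_keys=True))


def load_settings(conn, key, default=None):
    """Return the stored value, or `default` if nothing is stored."""
    raw = conn.get_metadata(key)
    if not raw:
        return default
    return json.loads(raw)


def clear_settings(conn, key):
    """Remove the item by storing an empty string."""
    conn.set_metadata(key, '')
```

As noted above, values written this way only become visible to search connections after the next flush.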

delete(self, id)

Delete a document from the search engine index.

If the id does not already exist in the database, this method will have no effect (and will not report an error).

flush(self)

Apply recent changes to the database.

If an exception occurs, any changes since the last call to flush() may be lost.

close(self)

Close the connection to the database.

It is important to call this method before allowing the connection object to be garbage collected, because it will ensure that any un-flushed changes will be flushed. It also ensures that the connection is cleaned up promptly.

No other methods may be called on the connection after this has been called. (It is permissible to call close() multiple times, but only the first call will have any effect.)

If an exception occurs, the database will be closed, but changes since the last call to flush may be lost.
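
One way to make sure close() is always called is contextlib.closing from the standard library, which works with any object that has a close() method. The function name add_all_and_close below is hypothetical; this is a sketch of the pattern rather than a recommended xappy idiom.

```python
from contextlib import closing


def add_all_and_close(conn, documents):
    """Add every document, flush, and close the connection in all cases."""
    with closing(conn):  # conn.close() runs even if add() or flush() raises
        ids = [conn.add(doc) for doc in documents]
        conn.flush()     # explicit flush, so errors surface before close()
        return ids
```

The connection must not be used again after this function returns, since close() has been called on it.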

get_doccount(self)

Count the number of documents in the database.

This count reflects documents which have been added or removed but not yet flushed with flush().

iterids(self)

Get an iterator which returns all the ids in the database.

The unique ids are currently returned in binary lexicographical sort order, but this should not be relied on.

get_document(self, id)

Get the document with the specified unique ID.

Raises a KeyError if there is no such document. Otherwise, it returns a ProcessedDocument.
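
Combining iterids() with get_document() gives a way to walk every stored document. The generator below is a hypothetical helper; skipping ids that raise KeyError is a design choice of this sketch, intended to cover documents deleted while the iteration is in progress.

```python
def iter_documents(conn):
    """Yield the ProcessedDocument for each id returned by iterids()."""
    for doc_id in conn.iterids():
        try:
            yield conn.get_document(doc_id)
        except KeyError:
            # The document disappeared between listing and fetching.
            continue
```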

iter_synonyms(self, prefix='')

Get an iterator over the synonyms.

  • prefix: if specified, only synonym keys with this prefix will be returned.

The iterator returns 2-tuples. The first item is the key: a 2-tuple holding the term or terms which will be synonym expanded, followed by the fieldname specified (or None if no fieldname was given). The second item is a tuple of strings holding the synonyms for that key.

These return values are suitable for the dict() builtin, so you can write things like:

>>> from xappy import IndexerConnection
>>> conn = IndexerConnection('foo')
>>> conn.add_synonym('foo', 'bar')
>>> conn.add_synonym('foo bar', 'baz')
>>> conn.add_synonym('foo bar', 'foo baz')
>>> dict(conn.iter_synonyms())
{('foo', None): ('bar',), ('foo bar', None): ('baz', 'foo baz')}
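
The (original, fieldname) keys can also be regrouped, for instance into one mapping per field. The helper synonyms_by_field is a hypothetical name used for this sketch; it relies only on the tuple layout described above.

```python
def synonyms_by_field(conn, prefix=''):
    """Group synonyms as {fieldname: {original: (synonym, ...)}}.

    Synonyms which are not specific to a field end up under the key None.
    """
    grouped = {}
    for (original, fieldname), synonyms in conn.iter_synonyms(prefix):
        grouped.setdefault(fieldname, {})[original] = tuple(synonyms)
    return grouped
```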

iter_subfacets(self)

Get an iterator over the facet hierarchy.

The iterator returns 2-tuples, in which the first item is the subfacet and the second item is its parent facet.

The return values are suitable for the dict() builtin, for example:

>>> from xappy import IndexerConnection, FieldActions
>>> conn = IndexerConnection('db')
>>> conn.add_field_action('foo', FieldActions.FACET)
>>> conn.add_field_action('bar', FieldActions.FACET)
>>> conn.add_field_action('baz', FieldActions.FACET)
>>> conn.add_subfacet('foo', 'bar')
>>> conn.add_subfacet('baz', 'bar')
>>> dict(conn.iter_subfacets())
{'foo': 'bar', 'baz': 'bar'}

iter_facet_query_types(self, association)

Get an iterator over query types and their associated facets.

Only facets associated with the query types in the specified manner are returned; association must be one of IndexerConnection.FacetQueryType_Preferred or IndexerConnection.FacetQueryType_Never.

The iterator returns 2-tuples, in which the first item is the query type and the second item is the associated set of facets.

The return values are suitable for the dict() builtin, for example:

>>> from xappy import IndexerConnection, FieldActions
>>> conn = IndexerConnection('db')
>>> conn.add_field_action('foo', FieldActions.FACET)
>>> conn.add_field_action('bar', FieldActions.FACET)
>>> conn.add_field_action('baz', FieldActions.FACET)
>>> conn.set_facet_for_query_type('type1', 'foo', conn.FacetQueryType_Preferred)
>>> conn.set_facet_for_query_type('type1', 'bar', conn.FacetQueryType_Never)
>>> conn.set_facet_for_query_type('type1', 'baz', conn.FacetQueryType_Never)
>>> conn.set_facet_for_query_type('type2', 'bar', conn.FacetQueryType_Preferred)
>>> dict(conn.iter_facet_query_types(conn.FacetQueryType_Preferred))
{'type1': set(['foo']), 'type2': set(['bar'])}
>>> dict(conn.iter_facet_query_types(conn.FacetQueryType_Never))
{'type1': set(['bar', 'baz'])}
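
The two association kinds can be merged into a single table per query type. The function facet_associations and the 'preferred' and 'never' labels are hypothetical names chosen for this sketch; it uses only iter_facet_query_types() and the two class constants documented above.

```python
def facet_associations(conn):
    """Return {query_type: {'preferred': set, 'never': set}}."""
    kinds = (
        ('preferred', conn.FacetQueryType_Preferred),
        ('never', conn.FacetQueryType_Never),
    )
    table = {}
    for label, association in kinds:
        for query_type, facets in conn.iter_facet_query_types(association):
            entry = table.setdefault(
                query_type, {'preferred': set(), 'never': set()})
            entry[label] = set(facets)
    return table
```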