Package org.apache.lucene.facet.index
Class CategoryDocumentBuilder
- java.lang.Object
-
- org.apache.lucene.facet.index.CategoryDocumentBuilder
-
- Direct Known Subclasses:
EnhancementsDocumentBuilder
public class CategoryDocumentBuilder extends Object
A utility class that attaches CategoryPaths or CategoryAttributes to a given document using a taxonomy.
Construction can be done with either a given FacetIndexingParams or the default implementation, DefaultFacetIndexingParams.
A CategoryDocumentBuilder can be reused by repeatedly setting the categories and building the document. Categories are provided either as CategoryAttribute elements through setCategories(Iterable), or as CategoryPath elements through setCategoryPaths(Iterable).
Note that both setCategories(Iterable) and setCategoryPaths(Iterable) return this CategoryDocumentBuilder, allowing the following pattern: new CategoryDocumentBuilder(taxonomy, params).setCategories(categories).build(doc).
- WARNING: This API is experimental and might change in incompatible ways in the next release.
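The reuse-and-chain pattern described above can be illustrated with a minimal sketch. It does not use the Lucene classes: Doc and Builder are hypothetical stand-ins that only model the contract "each set method returns this builder, and build adds the pending fields to the document". The "facet:" prefix, and the assumption that each set call replaces the previously set categories, are illustrative choices and are not taken from this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins, NOT the Lucene classes. They model only the
// "setter returns this, build() consumes the pending state" contract.
final class FluentBuilderSketch {

    /** Stand-in for a document: just a list of field strings. */
    static final class Doc {
        final List<String> fields = new ArrayList<>();
    }

    /** Stand-in for CategoryDocumentBuilder. */
    static final class Builder {
        private final List<String> pending = new ArrayList<>();

        /** Replaces the pending categories (assumption) and returns this builder. */
        Builder setCategoryPaths(Iterable<String> paths) {
            pending.clear();
            for (String path : paths) {
                pending.add("facet:" + path); // illustrative field encoding
            }
            return this;
        }

        /** Adds the pending fields to the document and returns that document. */
        Doc build(Doc doc) {
            doc.fields.addAll(pending);
            return doc;
        }
    }

    public static void main(String[] args) {
        // One builder instance, reused for two documents.
        Builder builder = new Builder();
        Doc first = builder
                .setCategoryPaths(Arrays.asList("author/Mark Twain"))
                .build(new Doc());
        Doc second = builder
                .setCategoryPaths(Arrays.asList("year/1884", "genre/novel"))
                .build(new Doc());
        System.out.println(first.fields);
        System.out.println(second.fields);
    }
}
```

Returning the builder from each set method is what makes the one-line form possible; keeping a single instance avoids rebuilding per-document state that does not change between documents.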
-
-
Field Summary
Fields: modifier and type, field, description
protected Map<String,List<CategoryAttribute>> categoriesMap
protected ArrayList<org.apache.lucene.document.Field> fieldList
A list of fields which is filled at ancestors' construction and used during build(Document).
protected FacetIndexingParams indexingParams
Parameters to be used when indexing categories.
protected TaxonomyWriter taxonomyWriter
A TaxonomyWriter for adding categories and retrieving their ordinals.
-
Constructor Summary
Constructors: constructor, description
CategoryDocumentBuilder(TaxonomyWriter taxonomyWriter)
Creates a facets document builder with default facet indexing parameters.
See: CategoryDocumentBuilder(TaxonomyWriter, FacetIndexingParams)
CategoryDocumentBuilder(TaxonomyWriter taxonomyWriter, FacetIndexingParams params)
Creates a facets document builder with the given facet indexing parameters object.
-
Method Summary
Methods: modifier and type, method, description
org.apache.lucene.document.Document build(org.apache.lucene.document.Document doc)
Adds the fields created in one of the "set" methods to the document.
protected void fillCategoriesMap(Iterable<CategoryAttribute> categories)
Fills the categories map, which maps each field name to the list of categories that belong to it, according to this builder's FacetIndexingParams object.
protected CategoryListTokenizer getCategoryListTokenizer(org.apache.lucene.analysis.TokenStream categoryStream)
Gets a category list tokenizer (or a series of such tokenizers) to create the category list tokens.
protected CategoryTokenizer getCategoryTokenizer(org.apache.lucene.analysis.TokenStream categoryStream)
Gets a CategoryTokenizer to create the category tokens.
protected CountingListTokenizer getCountingListTokenizer(org.apache.lucene.analysis.TokenStream categoryStream)
Gets a CountingListTokenizer for creating the counting list token.
protected org.apache.lucene.analysis.TokenStream getParentsStream(CategoryAttributesStream categoryAttributesStream)
Gets a stream of categories that includes the parents, according to the policies defined in the indexing parameters.
CategoryDocumentBuilder setCategories(Iterable<CategoryAttribute> categories)
Sets the categories of the document builder from an Iterable of CategoryAttribute objects.
CategoryDocumentBuilder setCategoryPaths(Iterable<CategoryPath> categoryPaths)
Sets the categories of the document builder from an Iterable of CategoryPath objects.
-
-
-
Field Detail
-
taxonomyWriter
protected final TaxonomyWriter taxonomyWriter
A TaxonomyWriter for adding categories and retrieving their ordinals.
-
indexingParams
protected final FacetIndexingParams indexingParams
Parameters to be used when indexing categories.
-
fieldList
protected final ArrayList<org.apache.lucene.document.Field> fieldList
A list of fields which is filled at ancestors' construction and used during build(Document).
-
categoriesMap
protected Map<String,List<CategoryAttribute>> categoriesMap
-
-
Constructor Detail
-
CategoryDocumentBuilder
public CategoryDocumentBuilder(TaxonomyWriter taxonomyWriter) throws IOException
Creates a facets document builder with default facet indexing parameters.
See: CategoryDocumentBuilder(TaxonomyWriter, FacetIndexingParams)
- Parameters:
taxonomyWriter - the writer to which new categories are added and which translates known categories to ordinals
- Throws:
IOException
-
CategoryDocumentBuilder
public CategoryDocumentBuilder(TaxonomyWriter taxonomyWriter, FacetIndexingParams params) throws IOException
Creates a facets document builder with the given facet indexing parameters object.
- Parameters:
taxonomyWriter - the writer to which new categories are added and which translates known categories to ordinals
params - holds all parameters the indexing process should use, such as category-list parameters
- Throws:
IOException
-
-
Method Detail
-
setCategoryPaths
public CategoryDocumentBuilder setCategoryPaths(Iterable<CategoryPath> categoryPaths) throws IOException
Sets the categories of the document builder from an Iterable of CategoryPath objects.
- Parameters:
categoryPaths - an iterable of CategoryPath objects holding the categories (facets) that will be added to the document at build(Document)
- Returns:
This CategoryDocumentBuilder, to enable this one-line call: new CategoryDocumentBuilder(TaxonomyWriter).setCategoryPaths(Iterable).build(Document).
- Throws:
IOException
-
setCategories
public CategoryDocumentBuilder setCategories(Iterable<CategoryAttribute> categories) throws IOException
Sets the categories of the document builder from an Iterable of CategoryAttribute objects.
- Parameters:
categories - an iterable of CategoryAttribute objects holding the categories (facets) that will be added to the document at build(Document)
- Returns:
This CategoryDocumentBuilder, to enable this one-line call: new CategoryDocumentBuilder(TaxonomyWriter).setCategories(Iterable).build(Document).
- Throws:
IOException
-
getParentsStream
protected org.apache.lucene.analysis.TokenStream getParentsStream(CategoryAttributesStream categoryAttributesStream)
Gets a stream of categories that includes the parents, according to the policies defined in the indexing parameters.
- Parameters:
categoryAttributesStream - the input stream
- Returns:
The parents stream.
- See Also:
OrdinalPolicy (for the policy of adding category tokens for parents), PathPolicy (for the policy of adding category list tokens for parents)
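As a rough illustration of what "a stream of categories which includes the parents" means, the sketch below expands one category path into itself plus those ancestors that a policy admits. It is a standalone model, not the Lucene implementation: withParents, the slash-joined path strings, and the depth-based IntPredicate are hypothetical stand-ins for the token stream and for OrdinalPolicy / PathPolicy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical model of parent expansion; not the Lucene token stream.
final class ParentsStreamSketch {

    /**
     * Returns the full path followed by each ancestor whose depth the policy
     * accepts, from the deepest ancestor up to the top-level component.
     * The category itself is always included.
     */
    static List<String> withParents(List<String> components,
                                    IntPredicate includeParentAtDepth) {
        List<String> out = new ArrayList<>();
        int full = components.size();
        for (int depth = full; depth >= 1; depth--) {
            if (depth == full || includeParentAtDepth.test(depth)) {
                out.add(String.join("/", components.subList(0, depth)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> path = Arrays.asList("date", "2010", "March");
        // Policy admitting every ancestor.
        System.out.println(withParents(path, depth -> true));
        // Policy that skips the top-level component (depth 1).
        System.out.println(withParents(path, depth -> depth > 1));
    }
}
```

Expressing the policy as a predicate keeps the expansion loop unchanged when a different rule is plugged in for which parents to index.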
-
fillCategoriesMap
protected void fillCategoriesMap(Iterable<CategoryAttribute> categories) throws IOException
Fills the categories map, which maps each field name to the list of categories that belong to it, according to this builder's FacetIndexingParams object.
- Parameters:
categories - iterable over the category attributes
- Throws:
IOException
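The grouping that fillCategoriesMap describes (field name to the list of categories belonging to that field) can be sketched as follows. This is a simplified model under stated assumptions: categories are plain strings rather than CategoryAttribute objects, and a Function<String, String> stands in for the lookup that the indexing parameters would perform. The field names "facets_author" and "facets_default" are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of the field-name-to-categories grouping.
final class CategoriesMapSketch {

    /**
     * Groups categories by the field name that fieldNameFor assigns to each,
     * preserving first-seen field order and per-field insertion order.
     */
    static Map<String, List<String>> fillCategoriesMap(
            Iterable<String> categories,
            Function<String, String> fieldNameFor) {
        Map<String, List<String>> byField = new LinkedHashMap<>();
        for (String category : categories) {
            String field = fieldNameFor.apply(category);
            byField.computeIfAbsent(field, k -> new ArrayList<>()).add(category);
        }
        return byField;
    }

    public static void main(String[] args) {
        // Stand-in for the indexing parameters: authors get their own field.
        Function<String, String> params =
                c -> c.startsWith("author/") ? "facets_author" : "facets_default";
        System.out.println(fillCategoriesMap(
                Arrays.asList("author/Mark Twain", "year/1884", "author/Jane Austen"),
                params));
    }
}
```

With such a map in hand, a builder can create one field per map entry, which matches the shape of the categoriesMap field declared above.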
-
getCategoryListTokenizer
protected CategoryListTokenizer getCategoryListTokenizer(org.apache.lucene.analysis.TokenStream categoryStream)
Gets a category list tokenizer (or a series of such tokenizers) to create the category list tokens.
- Parameters:
categoryStream - a stream containing CategoryAttributes with the relevant data
- Returns:
The category list tokenizer (or series of tokenizers) to be used in creating category list tokens.
-
getCountingListTokenizer
protected CountingListTokenizer getCountingListTokenizer(org.apache.lucene.analysis.TokenStream categoryStream)
Gets a CountingListTokenizer for creating the counting list token.
- Parameters:
categoryStream - a stream containing CategoryAttributes with the relevant data
- Returns:
A counting list tokenizer to be used in creating the counting list token.
-
getCategoryTokenizer
protected CategoryTokenizer getCategoryTokenizer(org.apache.lucene.analysis.TokenStream categoryStream) throws IOException
Gets a CategoryTokenizer to create the category tokens. This method can be overridden to add more attributes to the category tokens.
- Parameters:
categoryStream - a stream containing CategoryAttributes with the relevant data
- Returns:
The CategoryTokenizer to be used in creating category tokens.
- Throws:
IOException
-
build
public org.apache.lucene.document.Document build(org.apache.lucene.document.Document doc)
Adds the fields created in one of the "set" methods to the document.
-
-