Class SweetSpotSimilarity

  • All Implemented Interfaces:
    Serializable

    public class SweetSpotSimilarity
    extends org.apache.lucene.search.DefaultSimilarity
    A similarity with a lengthNorm that provides for a "plateau" of equally good lengths, and tf helper functions.

    For lengthNorm, a global min/max can be specified to define the plateau of lengths that should all have a norm of 1.0. Below the min and above the max, the lengthNorm falls off as an inverse square-root function of the distance from the plateau (see computeLengthNorm).

    A per field min/max can be specified if different fields have different sweet spots.

    For tf, baselineTf and hyperbolicTf functions are provided, which subclasses can choose between.

    See Also:
    A Gnuplot file used to generate some of the visualizations referenced from each function, Serialized Form
    • Field Summary

      • Fields inherited from class org.apache.lucene.search.DefaultSimilarity

        discountOverlaps
      • Fields inherited from class org.apache.lucene.search.Similarity

        NO_DOC_ID_PROVIDED
    • Method Summary

      Modifier and Type Method Description
      float baselineTf​(float freq)
      Implemented as: (x <= min) ? base : sqrt(x+(base**2)-min) ...but with a special case check for 0.
      float computeLengthNorm​(String fieldName, int numTerms)
      Implemented as: 1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 ) .
      float computeNorm​(String fieldName, org.apache.lucene.index.FieldInvertState state)
      Implemented as state.getBoost() * lengthNorm(fieldName, numTokens), where numTokens excludes overlap tokens if discountOverlaps is true, either by default or for this specific field.
      float hyperbolicTf​(float freq)
      Uses a hyperbolic tangent function that allows for a hard max...
      void setBaselineTfFactors​(float base, float min)
      Sets the baseline and minimum function variables for baselineTf
      void setHyperbolicTfFactors​(float min, float max, double base, float xoffset)
      Sets the function variables for the hyperbolicTf function.
      void setLengthNormFactors​(int min, int max, float steepness)
      Sets the default function variables used by lengthNorm when no field specific variables have been set.
      void setLengthNormFactors​(String field, int min, int max, float steepness, boolean discountOverlaps)
      Sets the function variables used by lengthNorm for a specific named field.
      float tf​(int freq)
      Delegates to baselineTf
      • Methods inherited from class org.apache.lucene.search.DefaultSimilarity

        coord, getDiscountOverlaps, idf, queryNorm, setDiscountOverlaps, sloppyFreq, tf
      • Methods inherited from class org.apache.lucene.search.Similarity

        decodeNorm, decodeNormValue, encodeNorm, encodeNormValue, getDefault, getNormDecoder, idfExplain, idfExplain, idfExplain, lengthNorm, scorePayload, setDefault
    • Constructor Detail

      • SweetSpotSimilarity

        public SweetSpotSimilarity()
    • Method Detail

      • setBaselineTfFactors

        public void setBaselineTfFactors​(float base,
                                         float min)
        Sets the baseline and minimum function variables for baselineTf
        See Also:
        baselineTf(float)
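
        The effect of base and min can be illustrated with a minimal standalone sketch of the formula quoted in the method summary, (x <= min) ? base : sqrt(x+(base**2)-min). The class name BaselineTfSketch is hypothetical, and returning 0 for a frequency of 0 is an assumption about what the "special case check for 0" does.

```java
// Hypothetical standalone sketch; not the Lucene class itself.
final class BaselineTfSketch {
    private final float base;
    private final float min;

    BaselineTfSketch(float base, float min) {
        this.base = base;
        this.min = min;
    }

    /** (x <= min) ? base : sqrt(x + base^2 - min), with 0 mapped to 0 (assumed). */
    float baselineTf(float freq) {
        if (freq == 0.0f) {
            return 0.0f; // assumption: the special case for 0 yields 0
        }
        return (freq <= min)
                ? base
                : (float) Math.sqrt(freq + (base * base) - min);
    }
}
```

        With base = 1.5 and min = 5, every frequency from 1 to 5 scores 1.5, and the curve continues from that same value as a square root above 5, so there is no jump at the boundary.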
      • setHyperbolicTfFactors

        public void setHyperbolicTfFactors​(float min,
                                           float max,
                                           double base,
                                           float xoffset)
        Sets the function variables for the hyperbolicTf function.
        Parameters:
        min - the minimum tf value to ever be returned (default: 0.0)
        max - the maximum tf value to ever be returned (default: 2.0)
        base - the base value to be used in the exponential for the hyperbolic function (default: 1.3)
        xoffset - the midpoint of the hyperbolic function (default: 10.0)
        See Also:
        hyperbolicTf(float)
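
        This page does not spell out the hyperbolicTf formula, so the following is only a sketch of how the four parameters could interact, assuming a tanh-style curve built from powers of base, centred on xoffset and rescaled into the range [min, max]. The class name HyperbolicTfSketch, the exact expression, and the NaN fallback are all assumptions.

```java
// Hypothetical standalone sketch; the formula is assumed, not taken from this page.
final class HyperbolicTfSketch {

    static float hyperbolicTf(float freq, float min, float max,
                              double base, float xoffset) {
        double x = freq - xoffset;
        double up = Math.pow(base, x);
        double down = Math.pow(base, -x);
        // tanh-like term in (-1, 1); exactly 0 when freq == xoffset
        double tanhLike = (up - down) / (up + down);
        float result = min + (float) ((max - min) / 2.0 * (tanhLike + 1.0));
        // For very large freq, up overflows to Infinity and the ratio is NaN;
        // treat that as having reached the hard max.
        return Float.isNaN(result) ? max : result;
    }
}
```

        Under these assumptions and the documented defaults (min 0.0, max 2.0, base 1.3, xoffset 10.0), a frequency of 10 lands at the midpoint of the range and very large frequencies saturate at max.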
      • setLengthNormFactors

        public void setLengthNormFactors​(int min,
                                         int max,
                                         float steepness)
        Sets the default function variables used by lengthNorm when no field specific variables have been set.
        See Also:
        Similarity.lengthNorm(java.lang.String, int)
      • setLengthNormFactors

        public void setLengthNormFactors​(String field,
                                         int min,
                                         int max,
                                         float steepness,
                                         boolean discountOverlaps)
        Sets the function variables used by lengthNorm for a specific named field.
        Parameters:
        field - field name
        min - minimum value
        max - maximum value
        steepness - steepness of the curve
        discountOverlaps - if true, numOverlapTokens will be subtracted from numTokens; if false then numOverlapTokens will be assumed to be 0 (see DefaultSimilarity.computeNorm(String, FieldInvertState) for details).
        See Also:
        Similarity.lengthNorm(java.lang.String, int)
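
        The relationship between the global and per-field overloads can be pictured as a lookup with a fallback. The sketch below is a hypothetical illustration of that behaviour, not the class's actual storage: the names LengthNormFactorsSketch, Factors and factorsFor are invented, and the initial values are illustrative rather than confirmed defaults.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "per-field settings, else global settings".
final class LengthNormFactorsSketch {

    static final class Factors {
        final int min;
        final int max;
        final float steepness;
        final boolean discountOverlaps;

        Factors(int min, int max, float steepness, boolean discountOverlaps) {
            this.min = min;
            this.max = max;
            this.steepness = steepness;
            this.discountOverlaps = discountOverlaps;
        }
    }

    // Illustrative starting values only; not confirmed defaults.
    private Factors global = new Factors(1, 1, 0.5f, true);
    private final Map<String, Factors> perField = new HashMap<>();

    void setLengthNormFactors(int min, int max, float steepness) {
        global = new Factors(min, max, steepness, global.discountOverlaps);
    }

    void setLengthNormFactors(String field, int min, int max,
                              float steepness, boolean discountOverlaps) {
        perField.put(field, new Factors(min, max, steepness, discountOverlaps));
    }

    /** Field-specific factors if any were set, otherwise the global ones. */
    Factors factorsFor(String field) {
        Factors f = perField.get(field);
        return (f != null) ? f : global;
    }
}
```

        In this sketch a field such as "title" keeps its own sweet spot after it has been configured, while every other field follows whatever the three-argument overload last set.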
      • computeNorm

        public float computeNorm​(String fieldName,
                                 org.apache.lucene.index.FieldInvertState state)
        Implemented as state.getBoost() * lengthNorm(fieldName, numTokens), where numTokens excludes overlap tokens if discountOverlaps is true, either by default or for this specific field.
        Overrides:
        computeNorm in class org.apache.lucene.search.DefaultSimilarity
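
        The token-counting rule described above can be sketched in isolation. The parameters boost, length and numOverlap below stand in for values that would come from the FieldInvertState, and the lengthNorm used here is a plain 1/sqrt(numTokens) placeholder chosen for the example; both are assumptions, as is the class name ComputeNormSketch.

```java
// Hypothetical standalone sketch of boost * lengthNorm(numTokens).
final class ComputeNormSketch {

    /** Tokens that count toward the length, optionally ignoring overlaps. */
    static int effectiveTokens(int length, int numOverlap, boolean discountOverlaps) {
        return discountOverlaps ? length - numOverlap : length;
    }

    /** Placeholder length norm for the example only: 1/sqrt(numTokens). */
    static float lengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    static float computeNorm(float boost, int length, int numOverlap,
                             boolean discountOverlaps) {
        int numTokens = effectiveTokens(length, numOverlap, discountOverlaps);
        return boost * lengthNorm(numTokens);
    }
}
```

        For a field of 20 tokens of which 4 are overlaps, discounting leaves 16 tokens to be normalized, whereas without discounting all 20 count.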
      • computeLengthNorm

        public float computeLengthNorm​(String fieldName,
                                       int numTerms)
        Implemented as: 1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 ) .

        This degrades to 1/sqrt(x) when min and max are both 1 and steepness is 0.5.

        :TODO: potential optimization is to just flat out return 1.0f if numTerms is between min and max.

        See Also:
        setLengthNormFactors(int, int, float), An SVG visualization of this function
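
        To see the plateau concretely, the formula 1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 ) can be evaluated with a small standalone sketch. The class name SweetSpotLengthNormSketch is hypothetical, and min, max and steepness are passed as arguments here rather than looked up per field.

```java
// Hypothetical standalone sketch of the length norm formula quoted above.
final class SweetSpotLengthNormSketch {

    static float computeLengthNorm(int numTerms, int min, int max, float steepness) {
        // Inside [min, max] the two distances sum to exactly (max - min),
        // so the bracketed term is 0 and the norm is 1.0.
        int outside = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return (float) (1.0 / Math.sqrt(steepness * outside + 1.0f));
    }
}
```

        With min = 3, max = 5 and steepness = 0.5, lengths 3 through 5 all score 1.0, while lengths 1 and 7 (each two terms outside the plateau) both score 1/sqrt(3). With min = max = 1 and steepness = 0.5, a length of 16 scores 0.25, matching 1/sqrt(x).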
      • tf

        public float tf​(int freq)
        Delegates to baselineTf
        Overrides:
        tf in class org.apache.lucene.search.Similarity
        See Also:
        baselineTf(float)