Class BytesToNameCanonicalizer


  • public final class BytesToNameCanonicalizer
    extends Object
    This class is a caching symbol table implementation used for canonicalizing Names that are constructed directly from a byte-based input source.
    Author:
    Tatu Saloranta
    • Field Detail

      • MAX_TABLE_SIZE

        protected static final int MAX_TABLE_SIZE
        Let's not expand symbol tables past some maximum size; this should protect against OOMEs (OutOfMemoryErrors) caused by large documents with unique (~= random) names.
        Since:
        1.5
        See Also:
        Constant Field Values
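One way to honor a cap like MAX_TABLE_SIZE is sketched below. It assumes a simple open-addressing table of Strings that doubles its capacity as it fills and, when doubling would exceed the cap, clears itself rather than grow. The class BoundedSymbolTable, its 64-slot cap (MAX_SLOTS), and the clear-on-overflow policy are hypothetical illustrations, not this class's actual implementation.

```java
import java.util.Arrays;

// Hypothetical bounded symbol table: grows by doubling, but never past MAX_SLOTS.
// When growth would exceed the cap, it clears all entries instead (assumed policy).
final class BoundedSymbolTable {
    static final int MAX_SLOTS = 64; // hypothetical cap standing in for MAX_TABLE_SIZE

    private String[] slots = new String[8]; // length is always a power of two
    private int size;

    int size() { return size; }
    int capacity() { return slots.length; }

    /** Returns the canonical instance equal to s, adding s if absent. */
    String intern(String s) {
        int mask = slots.length - 1;
        int i = s.hashCode() & mask;
        while (slots[i] != null) {            // linear probing
            if (slots[i].equals(s)) return slots[i];
            i = (i + 1) & mask;
        }
        slots[i] = s;
        size++;
        if (size * 4 > slots.length * 3) {    // load factor above 0.75
            grow();
        }
        return s;
    }

    private void grow() {
        int newLen = slots.length * 2;
        if (newLen > MAX_SLOTS) {
            // Cap reached: drop everything rather than grow without bound.
            Arrays.fill(slots, null);
            size = 0;
            return;
        }
        String[] old = slots;
        slots = new String[newLen];
        size = 0;
        for (String s : old) {
            if (s != null) intern(s);         // rehash into the larger array
        }
    }
}
```

Clearing trades lost cache hits for bounded memory: after a reset, names are simply re-added as they are seen again, so lookups stay correct.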
    • Method Detail

      • makeChild

        public BytesToNameCanonicalizer makeChild(boolean canonicalize,
                                                  boolean intern)
        Parameters:
        intern - Whether canonical symbol Strings should be interned or not
      • release

        public void release()
        Method called by the using code to indicate it is done with this instance. This lets the instance merge accumulated changes into its parent (if needed), safely and efficiently, without the calling code having to know about the parent.
      • size

        public int size()
      • maybeDirty

        public boolean maybeDirty()
        Method called to quickly check whether a child symbol table may have gotten additional entries. Used to decide whether a child table should be merged into the shared table.
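The makeChild/release/maybeDirty lifecycle can be sketched as a shared parent table that hands out private snapshots and merges them back on release. This is a minimal sketch assuming HashMap-backed storage; SharedTable, ChildTable, canonicalize, and the "larger child replaces parent" merge rule are all hypothetical, not taken from BytesToNameCanonicalizer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shared (root) table; each child starts from a snapshot of its contents.
final class SharedTable {
    private Map<String, String> symbols = new HashMap<>();

    synchronized ChildTable makeChild() {
        return new ChildTable(this, new HashMap<>(symbols));
    }

    // Assumed merge rule: a child began as a copy of the parent's snapshot, so if it
    // has more entries it can replace the parent's state. Entries merged by other
    // children since that snapshot may be dropped; this sketch accepts that tradeoff.
    synchronized void mergeChild(Map<String, String> childSymbols) {
        if (childSymbols.size() > symbols.size()) {
            symbols = new HashMap<>(childSymbols);
        }
    }

    synchronized int size() { return symbols.size(); }
}

// Hypothetical per-caller child table: not thread-safe, used by one caller at a time.
final class ChildTable {
    private final SharedTable parent;
    private final Map<String, String> symbols;
    private boolean dirty; // set once this child adds anything beyond its snapshot

    ChildTable(SharedTable parent, Map<String, String> snapshot) {
        this.parent = parent;
        this.symbols = snapshot;
    }

    String canonicalize(String name) {
        String existing = symbols.get(name);
        if (existing != null) return existing;
        symbols.put(name, name);
        dirty = true;
        return name;
    }

    boolean maybeDirty() { return dirty; }

    int size() { return symbols.size(); }

    // Caller signals it is done; the child merges back only if it may have changed.
    void release() {
        if (maybeDirty()) {
            parent.mergeChild(symbols);
            dirty = false;
        }
    }
}
```

Because each child is used by a single caller, only makeChild and the merge need synchronization; lookups inside the child run without locking.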
      • getEmptyName

        public static Name getEmptyName()
      • findName

        public Name findName(int firstQuad)
        Finds and returns the name matching the specified symbol, if such a name already exists in the table; if not, returns null.

        Note: separate methods exist to optimize the common case of short element/attribute names (4 or fewer ASCII characters).

        Parameters:
        firstQuad - int32 containing the first 4 bytes of the name; if the whole name is shorter than 4 bytes, it is padded with zero bytes in front (zero MSBs, i.e. right-aligned)
        Returns:
        Name matching the symbol passed, or null if the table does not yet contain it
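The right-aligned layout described for firstQuad can be illustrated with a small packing helper. This sketch assumes the name is given as UTF-8 bytes; QuadPacking and packQuad are hypothetical names, not part of this API.

```java
import java.nio.charset.StandardCharsets;

final class QuadPacking {
    // Packs bytes[offset, offset + len) into one int, most significant byte first.
    // A name shorter than 4 bytes ends up with zero bytes in its high positions
    // (zero MSBs, i.e. right-aligned).
    static int packQuad(byte[] bytes, int offset, int len) {
        if (len < 1 || len > 4) {
            throw new IllegalArgumentException("len must be 1..4: " + len);
        }
        int q = 0;
        for (int i = 0; i < len; i++) {
            q = (q << 8) | (bytes[offset + i] & 0xFF); // mask avoids sign extension
        }
        return q;
    }

    public static void main(String[] args) {
        byte[] id = "id".getBytes(StandardCharsets.UTF_8);
        System.out.printf("0x%08X%n", packQuad(id, 0, id.length)); // prints 0x00006964
    }
}
```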
      • findName

        public Name findName(int firstQuad,
                             int secondQuad)
        Finds and returns the name matching the specified symbol, if such a name already exists in the table; if not, returns null.

        Note: separate methods exist to optimize the common case of relatively short element/attribute names (8 or fewer ASCII characters).

        Parameters:
        firstQuad - int32 containing the first 4 bytes of the name.
        secondQuad - int32 containing bytes 5 through 8 of the name; if the name is shorter than 8 bytes, it is padded with up to 3 zero bytes in front (zero MSBs, i.e. right-aligned)
        Returns:
        Name matching the symbol passed, or null if the table does not yet contain it
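For the two-quad variant, the first quad always holds bytes 1 through 4 and the second holds the remaining 1 to 4 bytes, right-aligned. A sketch under the same assumptions (UTF-8 input; TwoQuadPacking and packTwoQuads are hypothetical names):

```java
import java.nio.charset.StandardCharsets;

final class TwoQuadPacking {
    // Big-endian, right-aligned packing of 1..4 bytes into an int.
    private static int pack(byte[] b, int off, int len) {
        int q = 0;
        for (int i = 0; i < len; i++) {
            q = (q << 8) | (b[off + i] & 0xFF);
        }
        return q;
    }

    // Returns {firstQuad, secondQuad} for a name of 5..8 bytes.
    static int[] packTwoQuads(byte[] name) {
        if (name.length < 5 || name.length > 8) {
            throw new IllegalArgumentException("expected 5..8 bytes, got " + name.length);
        }
        int first = pack(name, 0, 4);                // bytes 1-4, always full
        int second = pack(name, 4, name.length - 4); // bytes 5..8, zero-padded in front
        return new int[] { first, second };
    }

    public static void main(String[] args) {
        int[] q = packTwoQuads("version".getBytes(StandardCharsets.UTF_8));
        System.out.printf("0x%08X 0x%08X%n", q[0], q[1]); // prints 0x76657273 0x00696F6E
    }
}
```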
      • findName

        public Name findName(int[] quads,
                             int qlen)
        Finds and returns the name matching the specified symbol if such a name already exists in the table; if not, creates a Name object, adds it to the table, and returns it.

        Note: this is the general-purpose method that can be called for names of any length. However, if the name is shorter than 9 bytes, it is preferable to call one of the versions optimized for short names.

        Parameters:
        quads - Array of int32s, each of which contains 4 bytes of the encoded name
        qlen - Number of int32s in quads to use, starting from index 0
        Returns:
        Name matching the symbol passed (or constructed for it)
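The find-or-create contract of the general method can be sketched with a HashMap keyed on the first qlen quads. All of the following are assumptions: GeneralNameTable and QuadKey are hypothetical names, the method returns a String in place of Name, and decoding assumes UTF-8 names without NUL bytes whose final quad is zero-padded in front in the same way as in the short-name variants.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical general-purpose table: find the name for quads[0..qlen), or create and add it.
final class GeneralNameTable {
    // Immutable key holding a private copy of the used quads, so later reuse of
    // the caller's array cannot corrupt stored entries.
    private static final class QuadKey {
        final int[] quads;
        QuadKey(int[] quads) { this.quads = quads; }
        @Override public boolean equals(Object o) {
            return o instanceof QuadKey && Arrays.equals(quads, ((QuadKey) o).quads);
        }
        @Override public int hashCode() { return Arrays.hashCode(quads); }
    }

    private final Map<QuadKey, String> names = new HashMap<>();

    String findName(int[] quads, int qlen) {
        QuadKey key = new QuadKey(Arrays.copyOf(quads, qlen)); // ignore entries past qlen
        return names.computeIfAbsent(key, k -> decode(k.quads)); // create only on a miss
    }

    int size() { return names.size(); }

    // Assumes UTF-8, no NUL bytes in names, and a right-aligned, zero-padded last quad.
    private static String decode(int[] quads) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < quads.length; i++) {
            boolean started = (i != quads.length - 1); // skip padding only in the last quad
            for (int shift = 24; shift >= 0; shift -= 8) {
                int b = (quads[i] >>> shift) & 0xFF;
                if (!started && b == 0) continue;
                started = true;
                out.write(b);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Copying the used prefix into the key lets a caller keep reusing one quads buffer across lookups without corrupting entries already in the table.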
      • addName

        public Name addName(String symbolStr,
                            int q1,
                            int q2)
        Since:
        1.6.0
      • addName

        public Name addName(String symbolStr,
                            int[] quads,
                            int qlen)
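Since the short-name findName variants return null on a miss, a caller would pair them with addName, decoding the bytes into symbolStr only when needed. A sketch of that pattern, assuming a hypothetical ShortNameTable whose findName/addName mirror the documented two-quad signatures but return String instead of Name, plus a hypothetical lookupOrAdd helper:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical table mirroring the two-quad lookup/add pair: findName returns null
// on a miss, and the caller decodes the name itself before calling addName.
final class ShortNameTable {
    private final Map<Long, String> names = new HashMap<>();

    // Both quads fit losslessly into one long key.
    private static long key(int q1, int q2) {
        return ((long) q1 << 32) | (q2 & 0xFFFFFFFFL);
    }

    String findName(int q1, int q2) {
        return names.get(key(q1, q2));
    }

    String addName(String symbolStr, int q1, int q2) {
        String prev = names.putIfAbsent(key(q1, q2), symbolStr);
        return (prev != null) ? prev : symbolStr; // keep the first instance canonical
    }

    // Caller-side pattern: look up first; decode and add only on a miss.
    String lookupOrAdd(int q1, int q2, Supplier<String> decoder) {
        String name = findName(q1, q2);
        return (name != null) ? name : addName(decoder.get(), q1, q2);
    }
}
```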
      • calcHash

        public static final int calcHash(int firstQuad)
      • calcHash

        public static final int calcHash(int firstQuad,
                                         int secondQuad)
      • calcHash

        public static final int calcHash(int[] quads,
                                         int qlen)
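The three calcHash overloads take one quad, two quads, or a prefix of a quad array. One natural property for such overloads is that they agree, so a name hashes identically whichever overload computes it. The sketch below assumes a multiply-by-31 combination followed by xor-shift folding; QuadHash is a hypothetical class, and this mixing scheme is an illustration, not necessarily what BytesToNameCanonicalizer computes.

```java
// Hypothetical quad hashing: combine quads polynomially, then fold high bits
// downward so that masking with a small table size still sees all input bytes.
final class QuadHash {
    private static int fold(int h) {
        h ^= (h >>> 16); // mix the high 16 bits into the low 16
        h ^= (h >>> 8);  // and the next byte down
        return h;
    }

    static int calcHash(int firstQuad) {
        return fold(firstQuad);
    }

    static int calcHash(int firstQuad, int secondQuad) {
        return fold(firstQuad * 31 + secondQuad);
    }

    // Requires qlen >= 1; matches the one- and two-quad overloads for qlen 1 and 2.
    static int calcHash(int[] quads, int qlen) {
        int h = quads[0];
        for (int i = 1; i < qlen; i++) {
            h = h * 31 + quads[i];
        }
        return fold(h);
    }
}
```

Folding matters because a table index is typically taken from the low bits (hash & (size - 1)); without it, names that differ only in their high-order bytes could land in the same slot.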