org.apache.pdfbox.pdfparser
Class BaseParser

java.lang.Object
  extended by org.apache.pdfbox.pdfparser.BaseParser
Direct Known Subclasses:
ConformingPDFParser, PDFObjectStreamParser, PDFParser, PDFStreamParser, PDFXrefStreamParser, VisualSignatureParser

public abstract class BaseParser
extends java.lang.Object

This class contains parsing logic that is shared by its subclasses, such as PDFParser and PDFStreamParser.

Version:
$Revision: 1.61 $
Author:
Ben Litchfield

Field Summary
static java.lang.String DEF
          This is a string constant that will be used for comparisons.
protected  COSDocument document
          This is the document that will be parsed.
static byte[] ENDOBJ
          This is a byte array that will be used for comparisons.
static byte[] ENDSTREAM
          This is a byte array that will be used for comparisons.
protected static boolean FORCE_PARSING
          Default value of the forceParsing flag.
protected  boolean forceParsing
          Flag to skip malformed or otherwise unparseable input where possible.
protected  PushBackInputStream pdfSource
          This is the stream that will be read from.
static java.lang.String PROP_PUSHBACK_SIZE
          System property that defines the size of the push back buffer.
 
Constructor Summary
  BaseParser()
          Default constructor.
protected BaseParser(byte[] input)
          Constructor.
  BaseParser(java.io.InputStream input)
          Constructor.
  BaseParser(java.io.InputStream input, boolean forceParsingValue)
          Constructor.
 
Method Summary
protected  boolean isClosing()
          This will tell if the next character is a closing bracket (close of PDF array).
protected  boolean isClosing(int c)
          This will tell if the given character is a closing bracket (close of PDF array).
protected  boolean isEndOfName(char ch)
          Determine if a character terminates a PDF name.
protected  boolean isEOL()
          This will tell if the next byte to be read is an end of line byte.
protected  boolean isEOL(int c)
          This will tell if the given byte is an end of line byte.
protected  boolean isWhitespace()
          This will tell if the next byte is whitespace or not.
protected  boolean isWhitespace(int c)
          This will tell if the given byte is whitespace or not.
protected  COSBoolean parseBoolean()
          This will parse a boolean object from the stream.
protected  COSArray parseCOSArray()
          This will parse a PDF array object.
protected  COSDictionary parseCOSDictionary()
          This will parse a PDF dictionary.
protected  COSName parseCOSName()
          This will parse a PDF name from the stream.
protected  COSStream parseCOSStream(COSDictionary dic, RandomAccess file)
          This will read a COSStream from the input stream.
protected  COSString parseCOSString(boolean isDictionary)
          This will parse a PDF string.
protected  COSBase parseDirObject()
          This will parse a direct object from the stream.
protected  java.lang.String readExpectedString(java.lang.String theString)
          This will read bytes until the end of line marker occurs and check them against the expected string.
protected  int readInt()
          This will read an integer from the stream.
protected  java.lang.String readLine()
          This will read bytes until the first end of line marker occurs.
protected  java.lang.String readString()
          This will read the next string from the stream.
protected  java.lang.String readString(int length)
          This will read the next string from the stream up to a certain length.
 void setDocument(COSDocument doc)
          Set the document for this stream.
protected  void skipSpaces()
          This will skip all spaces and comments that are present.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PROP_PUSHBACK_SIZE

public static final java.lang.String PROP_PUSHBACK_SIZE
System property that defines the size of the push back buffer.

See Also:
Constant Field Values
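
As a rough illustration of how a system property can size a push back buffer, the sketch below reads an integer property and uses it to construct a `java.io.PushbackInputStream`. The property key `example.pushBackSize`, the fallback of 65536 bytes, and the class name are assumptions made for illustration only. The real key is the value of `PROP_PUSHBACK_SIZE`, which this page does not show, and PDFBox wraps its input in its own `PushBackInputStream` class rather than the JDK one used here.

```java
import java.io.InputStream;
import java.io.PushbackInputStream;

final class PushbackSizeExample {
    // Hypothetical property key, for illustration only.
    static final String EXAMPLE_PROP = "example.pushBackSize";
    // Assumed fallback used when the property is absent or invalid.
    static final int DEFAULT_SIZE = 65536;

    /** Returns the configured buffer size, falling back to DEFAULT_SIZE. */
    static int configuredSize() {
        // Integer.getInteger returns the default if the property is unset or not a number.
        int size = Integer.getInteger(EXAMPLE_PROP, DEFAULT_SIZE);
        return size > 0 ? size : DEFAULT_SIZE;
    }

    /** Wraps the input in a push back stream with the configured buffer size. */
    static PushbackInputStream wrap(InputStream in) {
        return new PushbackInputStream(in, configuredSize());
    }
}
```

With this sketch, the size would be set on the command line with something like `-Dexample.pushBackSize=131072`.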

ENDSTREAM

public static final byte[] ENDSTREAM
This is a byte array that will be used for comparisons.


ENDOBJ

public static final byte[] ENDOBJ
This is a byte array that will be used for comparisons.


DEF

public static final java.lang.String DEF
This is a string constant that will be used for comparisons.

See Also:
Constant Field Values

FORCE_PARSING

protected static final boolean FORCE_PARSING
Default value of the forceParsing flag.


pdfSource

protected PushBackInputStream pdfSource
This is the stream that will be read from.


document

protected COSDocument document
This is the document that will be parsed.


forceParsing

protected final boolean forceParsing
Flag to skip malformed or otherwise unparseable input where possible.

Constructor Detail

BaseParser

public BaseParser()
Default constructor.


BaseParser

public BaseParser(java.io.InputStream input,
                  boolean forceParsingValue)
           throws java.io.IOException
Constructor.

Parameters:
input - The input stream to read the data from.
forceParsingValue - flag to skip malformed or otherwise unparseable input where possible
Throws:
java.io.IOException - If there is an error reading the input stream.
Since:
Apache PDFBox 1.3.0

BaseParser

public BaseParser(java.io.InputStream input)
           throws java.io.IOException
Constructor.

Parameters:
input - The input stream to read the data from.
Throws:
java.io.IOException - If there is an error reading the input stream.

BaseParser

protected BaseParser(byte[] input)
              throws java.io.IOException
Constructor.

Parameters:
input - The array to read the data from.
Throws:
java.io.IOException - If there is an error reading the byte data.

Method Detail

setDocument

public void setDocument(COSDocument doc)
Set the document for this stream.

Parameters:
doc - The current document.

parseCOSDictionary

protected COSDictionary parseCOSDictionary()
                                    throws java.io.IOException
This will parse a PDF dictionary.

Returns:
The parsed dictionary.
Throws:
java.io.IOException - If there is an error reading the stream.

parseCOSStream

protected COSStream parseCOSStream(COSDictionary dic,
                                   RandomAccess file)
                            throws java.io.IOException
This will read a COSStream from the input stream.

Parameters:
dic - The dictionary that goes with this stream.
file - The file to write the stream to when reading.
Returns:
The parsed PDF stream.
Throws:
java.io.IOException - If there is an error reading the stream.

parseCOSString

protected COSString parseCOSString(boolean isDictionary)
                            throws java.io.IOException
This will parse a PDF string.

Parameters:
isDictionary - indicates if the stream is a dictionary or not
Returns:
The parsed PDF string.
Throws:
java.io.IOException - If there is an error reading from the stream.

parseCOSArray

protected COSArray parseCOSArray()
                          throws java.io.IOException
This will parse a PDF array object.

Returns:
The parsed PDF array.
Throws:
java.io.IOException - If there is an error parsing the stream.

isEndOfName

protected boolean isEndOfName(char ch)
Determine if a character terminates a PDF name.

Parameters:
ch - The character
Returns:
true if the character terminates a PDF name, otherwise false.
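
A minimal sketch of such a check is given below. This page does not list the characters that `isEndOfName` actually tests, so the sketch assumes the white-space and delimiter characters defined in ISO 32000-1:2008; the class name is hypothetical and the real method may accept a different set.

```java
final class NameTerminatorExample {
    /**
     * Returns true for white-space and delimiter characters, which
     * (under the assumption stated above) end a PDF name token.
     */
    static boolean isEndOfName(char ch) {
        switch (ch) {
            // White-space characters: NUL, HT, LF, FF, CR, SP.
            case 0x00: case 0x09: case 0x0A: case 0x0C: case 0x0D: case 0x20:
            // Delimiter characters.
            case '(': case ')': case '<': case '>':
            case '[': case ']': case '{': case '}':
            case '/': case '%':
                return true;
            default:
                return false;
        }
    }
}
```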

parseCOSName

protected COSName parseCOSName()
                        throws java.io.IOException
This will parse a PDF name from the stream.

Returns:
The parsed PDF name.
Throws:
java.io.IOException - If there is an error reading from the stream.

parseBoolean

protected COSBoolean parseBoolean()
                           throws java.io.IOException
This will parse a boolean object from the stream.

Returns:
The parsed boolean object.
Throws:
java.io.IOException - If an IO error occurs during parsing.

parseDirObject

protected COSBase parseDirObject()
                          throws java.io.IOException
This will parse a direct object from the stream.

Returns:
The parsed object.
Throws:
java.io.IOException - If there is an error during parsing.

readString

protected java.lang.String readString()
                               throws java.io.IOException
This will read the next string from the stream.

Returns:
The string that was read from the stream.
Throws:
java.io.IOException - If there is an error reading from the stream.

readExpectedString

protected java.lang.String readExpectedString(java.lang.String theString)
                                       throws java.io.IOException
This will read bytes until the end of line marker occurs and check them against the expected string.

Parameters:
theString - The next expected string in the stream.
Returns:
The characters between the current position and the end of the line.
Throws:
java.io.IOException - If there is an error reading from the stream or theString does not match what was read.

readString

protected java.lang.String readString(int length)
                               throws java.io.IOException
This will read the next string from the stream up to a certain length.

Parameters:
length - The length to stop reading at.
Returns:
The string that was read from the stream, between 0 and length characters long.
Throws:
java.io.IOException - If there is an error reading from the stream.

isClosing

protected boolean isClosing()
                     throws java.io.IOException
This will tell if the next character is a closing bracket (close of PDF array).

Returns:
true if the next byte is ']', false otherwise.
Throws:
java.io.IOException - If an IO error occurs.

isClosing

protected boolean isClosing(int c)
This will tell if the given character is a closing bracket (close of PDF array).

Parameters:
c - The character to check against the closing bracket
Returns:
true if c is ']', false otherwise.

readLine

protected java.lang.String readLine()
                             throws java.io.IOException
This will read bytes until the first end of line marker occurs. Note: if you later unread the results of this function, you'll need to add a newline character to the end of the string.

Returns:
The characters between the current position and the end of the line.
Throws:
java.io.IOException - If there is an error reading from the stream.
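
The note about unreading can be illustrated with a small sketch built on the JDK's `java.io.PushbackInputStream`. The class and the `unreadLine` helper are hypothetical, and treating CR LF as a single marker is an assumption; the point is only that the returned line no longer carries its end of line marker, so one has to be appended before pushing the bytes back.

```java
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

final class ReadLineUnreadExample {
    /** Reads up to the first end of line marker, which is consumed but not returned. */
    static String readLine(PushbackInputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n' && c != '\r') {
            sb.append((char) c);
        }
        // Assumption: CR LF counts as one marker, so swallow the LF as well.
        if (c == '\r') {
            int next = in.read();
            if (next != '\n' && next != -1) {
                in.unread(next);
            }
        }
        return sb.toString();
    }

    /** Pushes a line back, restoring the line break that readLine dropped. */
    static void unreadLine(PushbackInputStream in, String line) throws IOException {
        in.unread((line + "\n").getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

The push back buffer must be large enough to hold the line plus the added newline, otherwise `unread` throws an `IOException`.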

isEOL

protected boolean isEOL()
                 throws java.io.IOException
This will tell if the next byte to be read is an end of line byte.

Returns:
true if the next byte is 0x0A or 0x0D.
Throws:
java.io.IOException - If there is an error reading from the stream.
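
The no-argument variants of these checks look at the next byte without consuming it. A minimal sketch of that pattern, assuming a JDK `java.io.PushbackInputStream` in place of PDFBox's own stream class, reads one byte, pushes it back, and delegates to the `int` overload. The class name and the `peek` helper are hypothetical.

```java
import java.io.IOException;
import java.io.PushbackInputStream;

final class PeekEolExample {
    /** True if c is LF (0x0A) or CR (0x0D). */
    static boolean isEOL(int c) {
        return c == 0x0A || c == 0x0D;
    }

    /** Returns the next byte without consuming it, or -1 at end of stream. */
    static int peek(PushbackInputStream in) throws IOException {
        int c = in.read();
        if (c != -1) {
            in.unread(c);
        }
        return c;
    }

    /** True if the next byte to be read is an end of line byte. */
    static boolean isEOL(PushbackInputStream in) throws IOException {
        return isEOL(peek(in));
    }
}
```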

isEOL

protected boolean isEOL(int c)
This will tell if the given byte is an end of line byte.

Parameters:
c - The character to check against end of line
Returns:
true if c is 0x0A or 0x0D.

isWhitespace

protected boolean isWhitespace()
                        throws java.io.IOException
This will tell if the next byte is whitespace or not.

Returns:
true if the next byte in the stream is a whitespace character.
Throws:
java.io.IOException - If there is an error reading from the stream.

isWhitespace

protected boolean isWhitespace(int c)
This will tell if the given byte is whitespace or not. These values are specified in Table 1 (page 12) of ISO 32000-1:2008.

Parameters:
c - The character to check against whitespace
Returns:
true if c is a whitespace character.
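
For reference, Table 1 of ISO 32000-1:2008 lists six white-space characters: NUL, HORIZONTAL TAB, LINE FEED, FORM FEED, CARRIAGE RETURN and SPACE. A check against that table could be sketched as follows; the class name is hypothetical and this is not necessarily how PDFBox implements the method.

```java
final class PdfWhitespaceExample {
    /** True if c is one of the six white-space characters in Table 1 of ISO 32000-1. */
    static boolean isWhitespace(int c) {
        return c == 0x00   // NUL
            || c == 0x09   // HORIZONTAL TAB
            || c == 0x0A   // LINE FEED
            || c == 0x0C   // FORM FEED
            || c == 0x0D   // CARRIAGE RETURN
            || c == 0x20;  // SPACE
    }
}
```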

skipSpaces

protected void skipSpaces()
                   throws java.io.IOException
This will skip all spaces and comments that are present.

Throws:
java.io.IOException - If there is an error reading from the stream.
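
One way to skip both white space and comments is sketched below, again using the JDK's `java.io.PushbackInputStream` as a stand-in. It assumes that a comment starts with `%` and runs to the end of the line, as defined in ISO 32000-1:2008; the class name is hypothetical and the actual PDFBox implementation may differ in detail.

```java
import java.io.IOException;
import java.io.PushbackInputStream;

final class SkipSpacesExample {
    static boolean isWhitespace(int c) {
        return c == 0x00 || c == 0x09 || c == 0x0A || c == 0x0C || c == 0x0D || c == 0x20;
    }

    /** Consumes white space and '%' comments, leaving the stream at the next token. */
    static void skipSpaces(PushbackInputStream in) throws IOException {
        int c = in.read();
        while (c != -1 && (isWhitespace(c) || c == '%')) {
            if (c == '%') {
                // A comment runs to the end of the line.
                c = in.read();
                while (c != -1 && c != '\n' && c != '\r') {
                    c = in.read();
                }
            } else {
                c = in.read();
            }
        }
        // Push back the first byte that is neither white space nor part of a comment.
        if (c != -1) {
            in.unread(c);
        }
    }
}
```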

readInt

protected int readInt()
               throws java.io.IOException
This will read an integer from the stream.

Returns:
The integer that was read from the stream.
Throws:
java.io.IOException - If there is an error reading from the stream.
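
A minimal sketch of reading an integer token from a push back stream follows. It assumes that leading white space is skipped, that an optional sign is allowed, and that malformed input is reported as an `IOException`; none of these details are stated on this page, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.PushbackInputStream;

final class ReadIntExample {
    static boolean isWhitespace(int c) {
        return c == 0x00 || c == 0x09 || c == 0x0A || c == 0x0C || c == 0x0D || c == 0x20;
    }

    /** Reads an optionally signed decimal integer, leaving the following byte unread. */
    static int readInt(PushbackInputStream in) throws IOException {
        int c = in.read();
        // Skip leading white space (assumption, see the lead-in).
        while (isWhitespace(c)) {
            c = in.read();
        }
        StringBuilder digits = new StringBuilder();
        if (c == '+' || c == '-') {
            digits.append((char) c);
            c = in.read();
        }
        while (c >= '0' && c <= '9') {
            digits.append((char) c);
            c = in.read();
        }
        // The byte that ended the number belongs to the next token.
        if (c != -1) {
            in.unread(c);
        }
        try {
            return Integer.parseInt(digits.toString());
        } catch (NumberFormatException e) {
            throw new IOException("Expected an integer, found '" + digits + "'", e);
        }
    }
}
```

For input such as `  42 0 obj`, two successive calls in this sketch return 42 and 0, and a third call fails because `obj` is not a number.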