java.lang.Object
  org.apache.pdfbox.pdfparser.BaseParser
    org.apache.pdfbox.pdfparser.PDFParser
      org.apache.pdfbox.pdfparser.NonSequentialPDFParser
public class NonSequentialPDFParser extends PDFParser
A PDFParser that first reads startxref and the xref tables to learn which objects are valid, and then parses only those objects. This brings it closer to a conforming parser than the sequential reading done by PDFParser.

This class can be used as a PDFParser replacement. parse() must be called first, before page objects can be retrieved, e.g. via getPDDocument().

This class is a much enhanced version of QuickParser, presented in PDFBOX-1104 by Jeremy Villalobos.
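
The idea of reading startxref before anything else can be illustrated with a small standalone sketch. It is not PDFBox code: the class StartXrefLocator and its methods are hypothetical names introduced here, and the sketch works on an in-memory byte array rather than a file. It searches only the trailing bytes of the data for the last 'startxref' keyword and returns the offset written after it, which is where a parser of this kind would go on to read the xref table.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of PDFBox: locates the xref offset announced near the end of a PDF. */
class StartXrefLocator {

    private static final byte[] MARKER = "startxref".getBytes(StandardCharsets.ISO_8859_1);

    /**
     * Searches the last lookupRange bytes of pdf for the final "startxref"
     * keyword and returns the decimal offset that follows it, or -1 if no
     * keyword (or no number after it) is found in that range.
     */
    static long findXrefOffset(byte[] pdf, int lookupRange) {
        int from = Math.max(0, pdf.length - lookupRange);
        // Scan backwards so that the last marker in the file wins.
        for (int i = pdf.length - MARKER.length; i >= from; i--) {
            if (matchesAt(pdf, i)) {
                return readNumber(pdf, i + MARKER.length);
            }
        }
        return -1;
    }

    private static boolean matchesAt(byte[] data, int pos) {
        for (int k = 0; k < MARKER.length; k++) {
            if (data[pos + k] != MARKER[k]) {
                return false;
            }
        }
        return true;
    }

    private static long readNumber(byte[] data, int pos) {
        // Skip the whitespace between the keyword and the number.
        while (pos < data.length
                && (data[pos] == ' ' || data[pos] == '\t' || data[pos] == '\r' || data[pos] == '\n')) {
            pos++;
        }
        long value = 0;
        boolean seenDigit = false;
        while (pos < data.length && data[pos] >= '0' && data[pos] <= '9') {
            value = value * 10 + (data[pos] - '0');
            seenDigit = true;
            pos++;
        }
        return seenDigit ? value : -1;
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.4\n...objects...\nxref\n...\nstartxref\n1234\n%%EOF\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(findXrefOffset(pdf, 64)); // prints 1234
    }
}
```

In this sketch, a lookup range that is too small to reach the keyword simply yields -1.
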
Field Summary | |
---|---|
static String | SYSPROP_EOFLOOKUPRANGE |
static String | SYSPROP_PARSEMINIMAL |
Fields inherited from class org.apache.pdfbox.pdfparser.PDFParser |
---|
xrefTrailerResolver |
Fields inherited from class org.apache.pdfbox.pdfparser.BaseParser |
---|
DEF, document, ENDOBJ, ENDSTREAM, FORCE_PARSING, forceParsing, pdfSource |
Constructor Summary | |
---|---|
NonSequentialPDFParser(File file, RandomAccess raBuf) | Constructs a parser for the given file, using the given buffer for temporary storage. |
NonSequentialPDFParser(File file, RandomAccess raBuf, String decryptionPassword) | Constructs a parser for the given file, using the given buffer for temporary storage and the given password for decryption. |
NonSequentialPDFParser(String filename) | Constructs a parser for the given file, using a memory buffer. |
Method Summary | | |
---|---|---|
PDPage | getPage(int pageNr) | Returns the requested page with all of its objects loaded. |
int | getPageNumber() | Returns the number of pages in the document. |
PDDocument | getPDDocument() | Returns the PD document that was parsed. |
SecurityHandler | getSecurityHandler() | Returns the security handler of the document, or null if the document is not encrypted or parse() has not been called. |
void | parse() | Parses the stream and populates the COSDocument object. |
protected COSStream | parseCOSStream(COSDictionary dic, RandomAccess file) | Reads a COSStream from the input stream, using the length attribute in the dictionary. |
void | setEOFLookupRange(int byteCount) | Sets how many trailing bytes of the PDF file are searched for the EOF marker and the 'startxref' marker. |
Methods inherited from class org.apache.pdfbox.pdfparser.PDFParser |
---|
getDocument, getFDFDocument, isContinueOnError, parseStartXref, parseTrailer, parseXrefStream, parseXrefTable, setTempDirectory |
Methods inherited from class org.apache.pdfbox.pdfparser.BaseParser |
---|
isClosing, isClosing, isEndOfName, isEOL, isEOL, isWhitespace, isWhitespace, parseBoolean, parseCOSArray, parseCOSDictionary, parseCOSName, parseCOSString, parseDirObject, readExpectedString, readInt, readLine, readString, readString, setDocument, skipSpaces |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String SYSPROP_PARSEMINIMAL
public static final String SYSPROP_EOFLOOKUPRANGE
Constructor Detail |
---|
public NonSequentialPDFParser(String filename) throws IOException
Constructs a parser for the given file, using a memory buffer.
Parameters:
filename - the filename of the PDF to be parsed
Throws:
IOException - if something went wrong

public NonSequentialPDFParser(File file, RandomAccess raBuf) throws IOException
Constructs a parser for the given file, using the given buffer for temporary storage.
Parameters:
file - the PDF to be parsed
raBuf - the buffer to be used for parsing
Throws:
IOException - if something went wrong

public NonSequentialPDFParser(File file, RandomAccess raBuf, String decryptionPassword) throws IOException
Constructs a parser for the given file, using the given buffer for temporary storage.
Parameters:
file - the PDF to be parsed
raBuf - the buffer to be used for parsing
decryptionPassword - password to be used for decryption
Throws:
IOException - if something went wrong

Method Detail |
---|
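public void setEOFLookupRange(int byteCount)
Sets how many trailing bytes of the PDF file are searched for the EOF marker and the 'startxref' marker. If not set, the default value DEFAULT_TRAIL_BYTECOUNT is used.
If the system property SYSPROP_EOFLOOKUPRANGE is defined, its value is applied on initialization but can be overwritten later by calling this method.
Parameters:
byteCount - number of trailing bytes

public void parse() throws IOException
Parses the stream and populates the COSDocument object.
Overrides:
parse in class PDFParser
Throws:
IOException - if there is an error reading from the stream or corrupt data is found

public SecurityHandler getSecurityHandler()
Returns the security handler of the document, or null if the document is not encrypted or parse() has not been called.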
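
The precedence described for setEOFLookupRange (built-in default, then system property at initialization, then an explicit call) can be sketched as follows. This is an illustration only, not the PDFBox implementation: the class LookupRangeConfig, the getter, the property name string, the default of 2048 bytes, and the rule for rejecting very small values are all assumptions made for the sketch.

```java
/** Hypothetical sketch, not PDFBox code: a lookup range initialised from a system property. */
class LookupRangeConfig {

    // Assumed values for this sketch; the real property name and default may differ.
    static final String SYSPROP_EOFLOOKUPRANGE = "example.pdfparser.eofLookupRange";
    static final int DEFAULT_TRAIL_BYTECOUNT = 2048;

    private int readTrailBytes = DEFAULT_TRAIL_BYTECOUNT;

    LookupRangeConfig() {
        // Apply the system property once, at initialization.
        String raw = System.getProperty(SYSPROP_EOFLOOKUPRANGE);
        if (raw != null) {
            try {
                setEOFLookupRange(Integer.parseInt(raw.trim()));
            } catch (NumberFormatException e) {
                // Malformed property value: keep the default.
            }
        }
    }

    /** Overrides whatever the default or the system property supplied. */
    void setEOFLookupRange(int byteCount) {
        // Assumption: ignore ranges too small to hold both markers.
        if (byteCount > 15) {
            readTrailBytes = byteCount;
        }
    }

    /** Hypothetical accessor, added so the effective value can be inspected. */
    int getEOFLookupRange() {
        return readTrailBytes;
    }

    public static void main(String[] args) {
        System.setProperty(SYSPROP_EOFLOOKUPRANGE, "4096");
        LookupRangeConfig config = new LookupRangeConfig();
        System.out.println(config.getEOFLookupRange()); // value taken from the property
        config.setEOFLookupRange(8192);
        System.out.println(config.getEOFLookupRange()); // value set explicitly afterwards
    }
}
```
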
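public PDDocument getPDDocument() throws IOException
Returns the PD document that was parsed.
Overrides:
getPDDocument in class PDFParser
Throws:
IOException - if there is an error getting the document

public int getPageNumber() throws IOException
Returns the number of pages in the document.
Throws:
IOException - if PAGES or another required object is missing

public PDPage getPage(int pageNr) throws IOException
Returns the requested page with all of its objects loaded.
Parameters:
pageNr - zero-based page index; valid values run from 0 up to, but not including, the number of pages
Throws:
IOException - if something went wrong

protected COSStream parseCOSStream(COSDictionary dic, RandomAccess file) throws IOException
Reads a COSStream from the input stream, using the length attribute in the dictionary.
Overrides:
parseCOSStream in class BaseParser
Parameters:
dic - dictionary that goes with this stream
file - file to write the stream to when reading
Throws:
IOException - if an error occurred reading the stream, such as a problem reading the length attribute, a stream that does not end with 'endstream' after the data has been read, or a stream that is too short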
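
The failure cases listed for parseCOSStream follow from trusting the length attribute. A minimal sketch of that approach, using only the JDK, is shown below. The class LengthBasedStreamReader and its method are hypothetical and do not reproduce the PDFBox implementation: the length is passed in directly instead of being read from a COSDictionary, and the body is returned as a byte array instead of being written to a RandomAccess buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch, not PDFBox code: reads a stream body of a known length. */
class LengthBasedStreamReader {

    private static final byte[] KEYWORD = "endstream".getBytes(StandardCharsets.ISO_8859_1);

    /**
     * Reads exactly length bytes from in, then checks that the data is
     * followed by the 'endstream' keyword (after optional line breaks).
     */
    static byte[] readStreamBody(InputStream in, int length) throws IOException {
        if (length < 0) {
            throw new IOException("Invalid stream length: " + length);
        }
        byte[] body = new byte[length];
        int off = 0;
        while (off < length) {
            int n = in.read(body, off, length - off);
            if (n < 0) {
                throw new IOException("Stream too short: expected " + length + " bytes, got " + off);
            }
            off += n;
        }
        // Skip the end-of-line bytes that may separate the data from the keyword.
        int b = in.read();
        while (b == '\r' || b == '\n') {
            b = in.read();
        }
        // Compare the following bytes with the keyword, one at a time.
        for (int k = 0; k < KEYWORD.length; k++) {
            if (b != KEYWORD[k]) {
                throw new IOException("Stream does not end with 'endstream' after " + length + " bytes");
            }
            if (k < KEYWORD.length - 1) {
                b = in.read();
            }
        }
        return body;
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "hello\nendstream".getBytes(StandardCharsets.ISO_8859_1);
        byte[] body = readStreamBody(new ByteArrayInputStream(raw), 5);
        System.out.println(new String(body, StandardCharsets.ISO_8859_1)); // prints hello
    }
}
```

Passing a length of 3 for the same input fails the 'endstream' check, and a length larger than the available data fails with the "too short" error.
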