public class PDFMarkedContentExtractor extends PDFStreamEngine
Constructor | Description |
---|---|
PDFMarkedContentExtractor() | Instantiate a new PDFMarkedContentExtractor object. |
PDFMarkedContentExtractor(java.lang.String encoding) | Constructor. |
Modifier and Type | Method | Description |
---|---|---|
void | beginMarkedContentSequence(COSName tag, COSDictionary properties) | |
void | endMarkedContentSequence() | |
java.util.List<PDMarkedContent> | getMarkedContents() | |
void | processPage(PDPage page) | This will initialise and process the contents of the stream. |
protected void | processTextPosition(TextPosition text) | This will process a TextPosition object and add the text to the list of characters on a page. |
protected void | showGlyph(Matrix textRenderingMatrix, PDFont font, int code, java.lang.String unicode, Vector displacement) | This method was originally written by Ben Litchfield for PDFStreamEngine. |
void | xobject(PDXObject xobject) | |
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from class PDFStreamEngine:
addOperator, applyTextAdjustment, beginText, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getResources, getTextLineMatrix, getTextMatrix, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, registerOperatorProcessor, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showForm, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator
public PDFMarkedContentExtractor() throws java.io.IOException
Throws:
java.io.IOException
public PDFMarkedContentExtractor(java.lang.String encoding) throws java.io.IOException
Parameters:
encoding - The encoding that the output will be written in.
Throws:
java.io.IOException
public void beginMarkedContentSequence(COSName tag, COSDictionary properties)
public void endMarkedContentSequence()
public void xobject(PDXObject xobject)
protected void processTextPosition(TextPosition text)
Parameters:
text - The text to process.
public java.util.List<PDMarkedContent> getMarkedContents()
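The callbacks listed above carry no description on this page. Marked-content sequences in a PDF content stream can nest, so one plausible way to picture how begin/end callbacks, text processing, and `getMarkedContents()` fit together is a stack of currently open sequences. The sketch below is a simplified, library-free model under that assumption; `MarkedNode`, `MarkedContentModel`, and `processText` are hypothetical stand-ins for illustration and are not taken from the PDFBox source.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for PDMarkedContent: a tag plus its nested contents.
class MarkedNode {
    final String tag;
    // Holds text runs (String) and nested MarkedNode objects, in stream order.
    final List<Object> contents = new ArrayList<>();

    MarkedNode(String tag) {
        this.tag = tag;
    }
}

// Hypothetical model of the begin/end callbacks; an assumption, not the real class.
class MarkedContentModel {
    private final List<MarkedNode> roots = new ArrayList<>();   // top-level sequences
    private final Deque<MarkedNode> open = new ArrayDeque<>();  // currently open sequences

    void beginMarkedContentSequence(String tag) {
        MarkedNode node = new MarkedNode(tag);
        if (open.isEmpty()) {
            roots.add(node);                 // not nested: becomes a top-level entry
        } else {
            open.peek().contents.add(node);  // nested: attach to the innermost open sequence
        }
        open.push(node);
    }

    void endMarkedContentSequence() {
        if (!open.isEmpty()) {
            open.pop();                      // tolerate an unbalanced end marker
        }
    }

    // Simplified analogue of processTextPosition: text outside any sequence is dropped.
    void processText(String text) {
        if (!open.isEmpty()) {
            open.peek().contents.add(text);
        }
    }

    List<MarkedNode> getMarkedContents() {
        return roots;
    }
}
```

In this model, beginning a `P` sequence, adding text, then beginning and ending a nested `Span` yields a single top-level node whose contents hold the text followed by the nested node. With the class documented here, the analogous calls would be `processPage(page)` followed by `getMarkedContents()`.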
public void processPage(PDPage page) throws java.io.IOException
Overrides:
processPage in class PDFStreamEngine
Parameters:
page - the page to process
Throws:
java.io.IOException - if there is an error accessing the stream.
protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, java.lang.String unicode, Vector displacement) throws java.io.IOException
Overrides:
showGlyph in class PDFStreamEngine
Parameters:
textRenderingMatrix - the current text rendering matrix, Trm
font - the current font
code - internal PDF character code for the glyph
unicode - the Unicode text for this glyph, or null if the PDF does not provide it
displacement - the displacement (i.e. advance) of the glyph in text space
Throws:
java.io.IOException - if the glyph cannot be processed
Copyright © 2002–2018. All rights reserved.