Class PdfReader

    • Field Detail

      • pageInhCandidates

        static final PdfName[] pageInhCandidates
      • vpnames

        static final PdfName[] vpnames
      • vpints

        static final int[] vpints
      • endstream

        static final byte[] endstream
      • endobj

        static final byte[] endobj
      • xref

        protected int[] xref
      • objStmMark

        protected java.util.HashMap objStmMark
      • newXrefType

        protected boolean newXrefType
      • xrefObj

        private java.util.ArrayList xrefObj
      • acroFormParsed

        protected boolean acroFormParsed
      • encrypted

        public boolean encrypted
      • rebuilt

        protected boolean rebuilt
      • freeXref

        protected int freeXref
      • tampered

        protected boolean tampered
      • lastXref

        protected int lastXref
      • eofPos

        protected int eofPos
      • pdfVersion

        protected char pdfVersion
      • password

        protected byte[] password
      • ownerPasswordUsed

        public boolean ownerPasswordUsed
      • strings

        protected java.util.ArrayList strings
      • sharedStreams

        protected boolean sharedStreams
      • consolidateNamedDestinations

        protected boolean consolidateNamedDestinations
      • remoteToLocalNamedDestinations

        protected boolean remoteToLocalNamedDestinations
      • rValue

        protected int rValue
      • pValue

        protected int pValue
      • objNum

        private int objNum
      • objGen

        private int objGen
      • fileLength

        private int fileLength
      • hybridXref

        private boolean hybridXref
      • lastXrefPartial

        private int lastXrefPartial
      • partial

        private boolean partial
      • encryptionError

        private boolean encryptionError
      • appendable

        private boolean appendable
        Holds value of property appendable.
      • readDepth

        private int readDepth
    • Constructor Detail

      • PdfReader

        protected PdfReader()
      • PdfReader

        public PdfReader​(java.lang.String filename)
                  throws java.io.IOException
        Reads and parses a PDF document.
        Parameters:
        filename - the file name of the document
        Throws:
        java.io.IOException - on error
      • PdfReader

        public PdfReader​(java.lang.String filename,
                         byte[] ownerPassword)
                  throws java.io.IOException
        Reads and parses a PDF document.
        Parameters:
        filename - the file name of the document
        ownerPassword - the password to read the document
        Throws:
        java.io.IOException - on error
      • PdfReader

        public PdfReader​(byte[] pdfIn)
                  throws java.io.IOException
        Reads and parses a PDF document.
        Parameters:
        pdfIn - the byte array with the document
        Throws:
        java.io.IOException - on error
      • PdfReader

        public PdfReader​(byte[] pdfIn,
                         byte[] ownerPassword)
                  throws java.io.IOException
        Reads and parses a PDF document.
        Parameters:
        pdfIn - the byte array with the document
        ownerPassword - the password to read the document
        Throws:
        java.io.IOException - on error
      • PdfReader

        public PdfReader​(java.net.URL url)
                  throws java.io.IOException
        Reads and parses a PDF document.
        Parameters:
        url - the URL of the document
        Throws:
        java.io.IOException - on error
      • PdfReader

        public PdfReader​(java.net.URL url,
                         byte[] ownerPassword)
                  throws java.io.IOException
        Reads and parses a PDF document.
        Parameters:
        url - the URL of the document
        ownerPassword - the password to read the document
        Throws:
        java.io.IOException - on error
      • PdfReader

        public PdfReader​(java.io.InputStream is,
                         byte[] ownerPassword)
                  throws java.io.IOException
        Reads and parses a PDF document.
        Parameters:
        is - the InputStream containing the document. The stream is read to the end but is not closed
        ownerPassword - the password to read the document
        Throws:
        java.io.IOException - on error
      • PdfReader

        public PdfReader​(java.io.InputStream is)
                  throws java.io.IOException
        Reads and parses a PDF document.
        Parameters:
        is - the InputStream containing the document. The stream is read to the end but is not closed
        Throws:
        java.io.IOException - on error
      • PdfReader

        public PdfReader​(RandomAccessFileOrArray raf,
                         byte[] ownerPassword)
                  throws java.io.IOException
        Reads and parses a PDF document. Unlike the other constructors, only the xref is read into memory. The reader is said to be working in "partial" mode because parts of the PDF are read only as needed. The PDF is left open but may be closed at any time with PdfReader.close(); it is reopened automatically when needed.
        Parameters:
        raf - the document location
        ownerPassword - the password or null for no password
        Throws:
        java.io.IOException - on error
    • Method Detail

      • getSafeFile

        public RandomAccessFileOrArray getSafeFile()
                                            throws java.io.IOException
        Gets a new file instance of the original PDF document.
        Returns:
        a new file instance of the original PDF document
        Throws:
        java.io.IOException
      • getPdfReaderInstance

        protected PdfReaderInstance getPdfReaderInstance​(PdfWriter writer)
                                                  throws java.io.IOException
        Throws:
        java.io.IOException
      • getNumberOfPages

        public int getNumberOfPages()
        Gets the number of pages in the document.
        Returns:
        the number of pages in the document
      • getCatalog

        public PdfDictionary getCatalog()
        Returns the document's catalog. This dictionary is not a copy; any changes will be reflected in the catalog.
        Returns:
        the document's catalog
      • getAcroForm

        public PRAcroForm getAcroForm()
        Returns the document's acroform, if it has one.
        Returns:
        the document's acroform
      • getPageRotation

        public int getPageRotation​(int index)
        Gets the page rotation. This value can be 0, 90, 180 or 270.
        Parameters:
        index - the page number. The first page is 1
        Returns:
        the page rotation
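        The 0/90/180/270 contract can be illustrated with a minimal sketch. It assumes the raw /Rotate value is a multiple of 90 that may be negative or larger than 360; RotationSketch and normalize are hypothetical names, not part of PdfReader.

```java
final class RotationSketch {
    /** Maps a raw /Rotate value onto the range [0, 360). */
    static int normalize(int rawRotate) {
        int r = rawRotate % 360;      // Java's % keeps the sign of the dividend
        return r < 0 ? r + 360 : r;   // shift negative results into range
    }

    public static void main(String[] args) {
        System.out.println(normalize(-90));
        System.out.println(normalize(450));
    }
}
```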
      • getPageRotation

        public int getPageRotation​(PdfDictionary page)
      • getPageSizeWithRotation

        public Rectangle getPageSizeWithRotation​(int index)
        Gets the page size, taking rotation into account. This is a Rectangle built from the value of the /MediaBox key with the rotation from the /Rotate key applied.
        Parameters:
        index - the page number. The first page is 1
        Returns:
        a Rectangle
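        As a rough illustration of what "taking rotation into account" means for the dimensions, the sketch below swaps width and height for quarter turns. It is a simplification under stated assumptions: RotatedSizeSketch and displayedSize are hypothetical names, and a real Rectangle may carry more state than the two numbers shown here.

```java
final class RotatedSizeSketch {
    /** Returns {width, height} as displayed once the rotation is applied. */
    static float[] displayedSize(float width, float height, int rotation) {
        int r = ((rotation % 360) + 360) % 360;       // normalize to [0, 360)
        boolean quarterTurn = (r == 90 || r == 270);  // only these swap the sides
        return quarterTurn
                ? new float[] {height, width}
                : new float[] {width, height};
    }

    public static void main(String[] args) {
        float[] size = displayedSize(595f, 842f, 90);
        System.out.println(size[0] + " x " + size[1]);
    }
}
```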
      • getPageSizeWithRotation

        public Rectangle getPageSizeWithRotation​(PdfDictionary page)
        Gets the rotated page size from a page dictionary.
        Parameters:
        page - the page dictionary
        Returns:
        the rotated page size
      • getPageSize

        public Rectangle getPageSize​(int index)
        Gets the page size without taking rotation into account. This is the value of the /MediaBox key.
        Parameters:
        index - the page number. The first page is 1
        Returns:
        the page size
      • getPageSize

        public Rectangle getPageSize​(PdfDictionary page)
        Gets the page size from a page dictionary.
        Parameters:
        page - the page dictionary
        Returns:
        the page size
      • getCropBox

        public Rectangle getCropBox​(int index)
        Gets the crop box without taking rotation into account. This is the value of the /CropBox key. The crop box is the part of the document to be displayed or printed. It is usually the same as the media box but may be smaller. If the page doesn't have a crop box, the page size is returned.
        Parameters:
        index - the page number. The first page is 1
        Returns:
        the crop box
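        The fallback described for getCropBox can be sketched with a plain map standing in for the page dictionary. This is only an illustration: CropBoxSketch, cropBoxOrPageSize and the map-based page model are assumptions, not the library's data structures.

```java
import java.util.HashMap;
import java.util.Map;

final class CropBoxSketch {
    /** Returns the /CropBox entry, or the /MediaBox entry when no crop box exists. */
    static float[] cropBoxOrPageSize(Map<String, float[]> page) {
        float[] crop = page.get("CropBox");
        return crop != null ? crop : page.get("MediaBox");
    }

    public static void main(String[] args) {
        Map<String, float[]> page = new HashMap<>();
        page.put("MediaBox", new float[] {0, 0, 612, 792});
        System.out.println(cropBoxOrPageSize(page).length);
    }
}
```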
      • getBoxSize

        public Rectangle getBoxSize​(int index,
                                    java.lang.String boxName)
        Gets the box size. Allowed names are: "crop", "trim", "art", "bleed" and "media".
        Parameters:
        index - the page number. The first page is 1
        boxName - the box name
        Returns:
        the box rectangle or null
      • getInfo

        public java.util.HashMap getInfo()
        Returns the content of the document information dictionary as a HashMap of String.
        Returns:
        content of the document information dictionary
      • getNormalizedRectangle

        public static Rectangle getNormalizedRectangle​(PdfArray box)
        Normalizes a Rectangle so that llx and lly are smaller than urx and ury.
        Parameters:
        box - the original rectangle
        Returns:
        a normalized Rectangle
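        The normalization rule can be sketched on a bare four-element array. The sketch assumes the box is given as {x1, y1, x2, y2}; NormalizeRectSketch and normalize are hypothetical names, and the real method takes a PdfArray and returns a Rectangle.

```java
final class NormalizeRectSketch {
    /** Takes {x1, y1, x2, y2} and returns {llx, lly, urx, ury}. */
    static float[] normalize(float[] box) {
        return new float[] {
            Math.min(box[0], box[2]),   // llx: smaller x
            Math.min(box[1], box[3]),   // lly: smaller y
            Math.max(box[0], box[2]),   // urx: larger x
            Math.max(box[1], box[3])    // ury: larger y
        };
    }

    public static void main(String[] args) {
        float[] r = normalize(new float[] {612, 792, 0, 0});
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]);
    }
}
```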
      • readPdf

        protected void readPdf()
                        throws java.io.IOException
        Throws:
        java.io.IOException
      • readPdfPartial

        protected void readPdfPartial()
                               throws java.io.IOException
        Throws:
        java.io.IOException
      • equalsArray

        private boolean equalsArray​(byte[] ar1,
                                    byte[] ar2,
                                    int size)
      • readDecryptedDocObj

        private void readDecryptedDocObj()
                                  throws java.io.IOException
        Throws:
        java.io.IOException
      • getPdfObjectRelease

        public static PdfObject getPdfObjectRelease​(PdfObject obj)
        Parameters:
        obj -
        Returns:
        a PdfObject
      • getPdfObject

        public static PdfObject getPdfObject​(PdfObject obj)
        Reads a PdfObject resolving an indirect reference if needed.
        Parameters:
        obj - the PdfObject to read
        Returns:
        the resolved PdfObject
      • getPdfObjectRelease

        public static PdfObject getPdfObjectRelease​(PdfObject obj,
                                                    PdfObject parent)
        Reads a PdfObject, resolving an indirect reference if needed. If the reader was opened in partial mode, the object will be released to save memory.
        Parameters:
        obj - the PdfObject to read
        parent -
        Returns:
        a PdfObject
      • getPdfObject

        public static PdfObject getPdfObject​(PdfObject obj,
                                             PdfObject parent)
        Parameters:
        obj -
        parent -
        Returns:
        a PdfObject
      • getPdfObjectRelease

        public PdfObject getPdfObjectRelease​(int idx)
        Parameters:
        idx -
        Returns:
        a PdfObject
      • getPdfObject

        public PdfObject getPdfObject​(int idx)
        Parameters:
        idx -
        Returns:
        a PdfObject
      • resetLastXrefPartial

        public void resetLastXrefPartial()
      • releaseLastXrefPartial

        public void releaseLastXrefPartial()
      • releaseLastXrefPartial

        public static void releaseLastXrefPartial​(PdfObject obj)
        Parameters:
        obj -
      • setXrefPartialObject

        private void setXrefPartialObject​(int idx,
                                          PdfObject obj)
      • readPages

        protected void readPages()
                          throws java.io.IOException
        Throws:
        java.io.IOException
      • findOrphanPages

        private PdfArray findOrphanPages()
      • readDocObjPartial

        protected void readDocObjPartial()
                                  throws java.io.IOException
        Throws:
        java.io.IOException
      • readSingleObject

        protected PdfObject readSingleObject​(int k)
                                      throws java.io.IOException
        Throws:
        java.io.IOException
      • readOneObjStm

        protected PdfObject readOneObjStm​(PRStream stream,
                                          int idx)
                                   throws java.io.IOException
        Throws:
        java.io.IOException
      • dumpPerc

        public double dumpPerc()
        Returns:
        the percentage of the cross reference table that has been read
      • readDocObj

        protected void readDocObj()
                           throws java.io.IOException
        Throws:
        java.io.IOException
      • checkPRStreamLength

        private void checkPRStreamLength​(PRStream stream)
                                  throws java.io.IOException
        Throws:
        java.io.IOException
      • readObjStm

        protected void readObjStm​(PRStream stream,
                                  IntHashtable map)
                           throws java.io.IOException
        Throws:
        java.io.IOException
      • killIndirect

        public static PdfObject killIndirect​(PdfObject obj)
        Eliminates the reference to the object, freeing the memory used by it and clearing the xref entry.
        Parameters:
        obj - the object. If it's an indirect reference it will be eliminated
        Returns:
        the object or the already erased dereferenced object
      • ensureXrefSize

        private void ensureXrefSize​(int size)
      • readXref

        protected void readXref()
                         throws java.io.IOException
        Throws:
        java.io.IOException
      • readXrefSection

        protected PdfDictionary readXrefSection()
                                         throws java.io.IOException
        Throws:
        java.io.IOException
      • readXRefStream

        protected boolean readXRefStream​(int ptr)
                                  throws java.io.IOException
        Throws:
        java.io.IOException
      • rebuildXref

        protected void rebuildXref()
                            throws java.io.IOException
        Throws:
        java.io.IOException
      • readDictionary

        protected PdfDictionary readDictionary()
                                        throws java.io.IOException
        Throws:
        java.io.IOException
      • readArray

        protected PdfArray readArray()
                              throws java.io.IOException
        Throws:
        java.io.IOException
      • readPRObject

        protected PdfObject readPRObject()
                                  throws java.io.IOException
        Throws:
        java.io.IOException
      • FlateDecode

        public static byte[] FlateDecode​(byte[] in)
        Decodes a stream that has the FlateDecode filter.
        Parameters:
        in - the input data
        Returns:
        the decoded data
      • decodePredictor

        public static byte[] decodePredictor​(byte[] in,
                                             PdfObject dicPar)
        Parameters:
        in -
        dicPar -
        Returns:
        a byte array
      • FlateDecode

        public static byte[] FlateDecode​(byte[] in,
                                         boolean strict)
        A helper to FlateDecode.
        Parameters:
        in - the input data
        strict - true to read a correct stream; false to try to read a corrupted stream
        Returns:
        the decoded data
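        One way a strict/lenient switch like this could work is sketched below with java.util.zip. It is an assumption-laden illustration, not the library's code: FlateSketch and inflate are hypothetical names, and the choice to return null in strict mode and the partial output in lenient mode is a design assumption made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.InflaterInputStream;

final class FlateSketch {
    /** Inflates zlib data; on error returns null (strict) or what was decoded so far (lenient). */
    static byte[] inflate(byte[] in, boolean strict) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Lenient mode reads one byte at a time, so everything decoded before
        // the first error has already been copied into 'out'.
        byte[] buf = new byte[strict ? 4096 : 1];
        try (InflaterInputStream zip =
                 new InflaterInputStream(new ByteArrayInputStream(in))) {
            int n;
            while ((n = zip.read(buf)) >= 0) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            // Truncated or corrupt input surfaces here as an IOException subclass.
            return strict ? null : out.toByteArray();
        }
    }

    public static void main(String[] args) {
        byte[] broken = {0x78, (byte) 0x9c};   // zlib header with no payload
        System.out.println(inflate(broken, true) == null);
    }
}
```

        A caller would typically try strict mode first and fall back to lenient mode only when the strict attempt fails.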
      • ASCIIHexDecode

        public static byte[] ASCIIHexDecode​(byte[] in)
        Decodes a stream that has the ASCIIHexDecode filter.
        Parameters:
        in - the input data
        Returns:
        the decoded data
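        For readers unfamiliar with the filter, here is a minimal sketch of ASCII hex decoding as the PDF format describes it: pairs of hex digits, whitespace ignored, '>' as the end-of-data marker, and a trailing odd digit padded with 0. AsciiHexSketch and its methods are hypothetical names; error handling in the real method may differ.

```java
import java.io.ByteArrayOutputStream;

final class AsciiHexSketch {
    static byte[] decode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int high = -1;                              // pending high nibble, -1 when none
        for (byte b : in) {
            int ch = b & 0xff;
            if (ch == '>') break;                   // end-of-data marker
            if (isPdfWhitespace(ch)) continue;
            int digit = Character.digit(ch, 16);
            if (digit < 0) {
                throw new IllegalArgumentException("Illegal hex character: " + ch);
            }
            if (high < 0) {
                high = digit;
            } else {
                out.write((high << 4) | digit);
                high = -1;
            }
        }
        if (high >= 0) out.write(high << 4);        // odd digit count: pad with 0
        return out.toByteArray();
    }

    private static boolean isPdfWhitespace(int ch) {
        return ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32;
    }

    public static void main(String[] args) {
        byte[] data = decode("48 65 6C 6c 6F>".getBytes(java.nio.charset.StandardCharsets.US_ASCII));
        System.out.println(new String(data, java.nio.charset.StandardCharsets.US_ASCII));
    }
}
```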
      • ASCII85Decode

        public static byte[] ASCII85Decode​(byte[] in)
        Decodes a stream that has the ASCII85Decode filter.
        Parameters:
        in - the input data
        Returns:
        the decoded data
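        The following is a minimal sketch of ASCII85 decoding as the PDF format describes it: five characters in the range '!' to 'u' encode four bytes, 'z' abbreviates four zero bytes, and "~>" ends the data. Ascii85Sketch and its methods are hypothetical names, and the strictness of the error checks is an assumption of this example.

```java
import java.io.ByteArrayOutputStream;

final class Ascii85Sketch {
    static byte[] decode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int[] group = new int[5];
        int count = 0;
        for (byte b : in) {
            int ch = b & 0xff;
            if (ch == '~') break;                   // start of the "~>" end marker
            if (ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32) continue;
            if (ch == 'z' && count == 0) {          // shorthand for four zero bytes
                for (int i = 0; i < 4; i++) out.write(0);
                continue;
            }
            if (ch < '!' || ch > 'u') {
                throw new IllegalArgumentException("Illegal ASCII85 character: " + ch);
            }
            group[count++] = ch - '!';
            if (count == 5) {
                emit(group, 4, out);
                count = 0;
            }
        }
        if (count == 1) throw new IllegalArgumentException("Dangling ASCII85 character");
        if (count > 1) {
            for (int i = count; i < 5; i++) group[i] = 84;   // pad with 'u'
            emit(group, count - 1, out);                     // n chars yield n-1 bytes
        }
        return out.toByteArray();
    }

    /** Converts one base-85 group to a 32-bit value and writes its leading bytes. */
    private static void emit(int[] group, int byteCount, ByteArrayOutputStream out) {
        long value = 0;
        for (int digit : group) value = value * 85 + digit;
        if (value > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("ASCII85 group overflows 32 bits");
        }
        for (int k = 0; k < byteCount; k++) {
            out.write((int) (value >> (24 - 8 * k)) & 0xff);
        }
    }

    public static void main(String[] args) {
        byte[] data = decode("87cURD]i,\"Ebo80~>".getBytes(java.nio.charset.StandardCharsets.US_ASCII));
        System.out.println(new String(data, java.nio.charset.StandardCharsets.US_ASCII));
    }
}
```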
      • LZWDecode

        public static byte[] LZWDecode​(byte[] in)
        Decodes a stream that has the LZWDecode filter.
        Parameters:
        in - the input data
        Returns:
        the decoded data
      • isRebuilt

        public boolean isRebuilt()
        Checks if the document had errors and was rebuilt.
        Returns:
        true if rebuilt.
      • getPageN

        public PdfDictionary getPageN​(int pageNum)
        Gets the dictionary that represents a page.
        Parameters:
        pageNum - the page number. 1 is the first
        Returns:
        the page dictionary
      • getPageNRelease

        public PdfDictionary getPageNRelease​(int pageNum)
        Parameters:
        pageNum -
        Returns:
        a Dictionary object
      • releasePage

        public void releasePage​(int pageNum)
        Parameters:
        pageNum -
      • resetReleasePage

        public void resetReleasePage()
      • getPageOrigRef

        public PRIndirectReference getPageOrigRef​(int pageNum)
        Gets the page reference to this page.
        Parameters:
        pageNum - the page number. 1 is the first
        Returns:
        the page reference
      • getPageContent

        public byte[] getPageContent​(int pageNum,
                                     RandomAccessFileOrArray file)
                              throws java.io.IOException
        Gets the contents of the page.
        Parameters:
        pageNum - the page number. 1 is the first
        file - the location of the PDF document
        Returns:
        the content
        Throws:
        java.io.IOException - on error
      • getPageContent

        public byte[] getPageContent​(int pageNum)
                              throws java.io.IOException
        Gets the contents of the page.
        Parameters:
        pageNum - the page number. 1 is the first
        Returns:
        the content
        Throws:
        java.io.IOException - on error
      • killXref

        protected void killXref​(PdfObject obj)
      • setPageContent

        public void setPageContent​(int pageNum,
                                   byte[] content)
        Sets the contents of the page.
        Parameters:
        content - the new page content
        pageNum - the page number. 1 is the first
      • setPageContent

        public void setPageContent​(int pageNum,
                                   byte[] content,
                                   int compressionLevel)
        Sets the contents of the page.
        Parameters:
        content - the new page content
        pageNum - the page number. 1 is the first
        Since:
        2.1.3 (the method already existed without param compressionLevel)
      • getStreamBytes

        public static byte[] getStreamBytes​(PRStream stream,
                                            RandomAccessFileOrArray file)
                                     throws java.io.IOException
        Gets the content of a stream, applying the required filters.
        Parameters:
        stream - the stream
        file - the location where the stream is
        Returns:
        the stream content
        Throws:
        java.io.IOException - on error
      • getStreamBytes

        public static byte[] getStreamBytes​(PRStream stream)
                                     throws java.io.IOException
        Gets the content of a stream, applying the required filters.
        Parameters:
        stream - the stream
        Returns:
        the stream content
        Throws:
        java.io.IOException - on error
      • getStreamBytesRaw

        public static byte[] getStreamBytesRaw​(PRStream stream,
                                               RandomAccessFileOrArray file)
                                        throws java.io.IOException
        Gets the content of a stream as is, without applying any filter.
        Parameters:
        stream - the stream
        file - the location where the stream is
        Returns:
        the stream content
        Throws:
        java.io.IOException - on error
      • getStreamBytesRaw

        public static byte[] getStreamBytesRaw​(PRStream stream)
                                        throws java.io.IOException
        Gets the content of a stream as is, without applying any filter.
        Parameters:
        stream - the stream
        Returns:
        the stream content
        Throws:
        java.io.IOException - on error
      • eliminateSharedStreams

        public void eliminateSharedStreams()
        Eliminates shared streams if they exist.
      • isTampered

        public boolean isTampered()
        Checks if the document was changed.
        Returns:
        true if the document was changed, false otherwise
      • setTampered

        public void setTampered​(boolean tampered)
        Sets the tampered state. A tampered PdfReader cannot be reused in PdfStamper.
        Parameters:
        tampered - the tampered state
      • getMetadata

        public byte[] getMetadata()
                           throws java.io.IOException
        Gets the XML metadata.
        Returns:
        the XML metadata
        Throws:
        java.io.IOException - on error
      • getLastXref

        public int getLastXref()
        Gets the byte address of the last xref table.
        Returns:
        the byte address of the last xref table
      • getXrefSize

        public int getXrefSize()
        Gets the number of xref objects.
        Returns:
        the number of xref objects
      • getEofPos

        public int getEofPos()
        Gets the byte address of the %%EOF marker.
        Returns:
        the byte address of the %%EOF marker
      • getPdfVersion

        public char getPdfVersion()
        Gets the PDF version. Only the last version char is returned. For example version 1.4 is returned as '4'.
        Returns:
        the PDF version
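        The "last version char" convention can be illustrated with a minimal sketch that reads it from a file header. The sketch assumes a header of the form "%PDF-M.m" with single-digit major and minor numbers; PdfVersionSketch and lastVersionChar are hypothetical names.

```java
final class PdfVersionSketch {
    private static final String PREFIX = "%PDF-";

    /** Returns the minor version character, for example '4' for "%PDF-1.4". */
    static char lastVersionChar(String header) {
        int minorIndex = PREFIX.length() + 2;       // skips the major digit and the dot
        if (!header.startsWith(PREFIX) || header.length() <= minorIndex) {
            throw new IllegalArgumentException("Not a PDF header: " + header);
        }
        return header.charAt(minorIndex);
    }

    public static void main(String[] args) {
        System.out.println(lastVersionChar("%PDF-1.4"));
    }
}
```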
      • isEncrypted

        public boolean isEncrypted()
        Returns true if the PDF is encrypted.
        Returns:
        true if the PDF is encrypted
      • getPermissions

        public int getPermissions()
        Gets the encryption permissions. It can be used directly in PdfWriter.setEncryption().
        Returns:
        the encryption permissions
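        The returned int is a bit field. The sketch below decodes it using the bit positions of the PDF permission flags (bits 3 to 6 and 9 to 12, counting from 1). PermissionBitsSketch, describe and the label strings are hypothetical; check the bit meanings against the PDF specification before relying on them.

```java
import java.util.ArrayList;
import java.util.List;

final class PermissionBitsSketch {
    // MASKS and LABELS are indexed in step with each other.
    private static final int[] MASKS = {
        1 << 2, 1 << 3, 1 << 4, 1 << 5, 1 << 8, 1 << 9, 1 << 10, 1 << 11
    };
    private static final String[] LABELS = {
        "print", "modify contents", "copy", "modify annotations",
        "fill in forms", "screen readers", "assembly", "high-quality print"
    };

    /** Lists the permissions whose bits are set in the given value. */
    static List<String> describe(int permissions) {
        List<String> granted = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((permissions & MASKS[i]) != 0) {
                granted.add(LABELS[i]);
            }
        }
        return granted;
    }

    public static void main(String[] args) {
        System.out.println(describe(4 | 16));
    }
}
```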
      • is128Key

        public boolean is128Key()
        Returns true if the PDF has 128-bit key encryption.
        Returns:
        true if the PDF has 128-bit key encryption
      • getTrailer

        public PdfDictionary getTrailer()
        Gets the trailer dictionary.
        Returns:
        the trailer dictionary
      • equalsn

        static boolean equalsn​(byte[] a1,
                               byte[] a2)
      • getFontName

        static java.lang.String getFontName​(PdfDictionary dic)
      • getSubsetPrefix

        static java.lang.String getSubsetPrefix​(PdfDictionary dic)
      • shuffleSubsetNames

        public int shuffleSubsetNames()
        Finds all the font subsets and changes the prefixes to some random values.
        Returns:
        the number of font subsets altered
      • createFakeFontSubsets

        public int createFakeFontSubsets()
        Finds all the fonts not subset but embedded and marks them as subset.
        Returns:
        the number of fonts altered
      • getNamedDestination

        public java.util.HashMap getNamedDestination()
        Gets all the named destinations as a HashMap. The key is the name and the value is the destinations array.
        Returns:
        all the named destinations
      • getNamedDestination

        public java.util.HashMap getNamedDestination​(boolean keepNames)
        Gets all the named destinations as a HashMap. The key is the name and the value is the destinations array.
        Parameters:
        keepNames - true if you want the keys to be real PdfNames instead of Strings
        Returns:
        all the named destinations
        Since:
        2.1.6
      • getNamedDestinationFromNames

        public java.util.HashMap getNamedDestinationFromNames()
        Gets the named destinations from the /Dests key in the catalog as a HashMap. The key is the name and the value is the destinations array.
        Returns:
        the named destinations
      • getNamedDestinationFromNames

        public java.util.HashMap getNamedDestinationFromNames​(boolean keepNames)
        Gets the named destinations from the /Dests key in the catalog as a HashMap. The key is the name and the value is the destinations array.
        Parameters:
        keepNames - true if you want the keys to be real PdfNames instead of Strings
        Returns:
        the named destinations
        Since:
        2.1.6
      • getNamedDestinationFromStrings

        public java.util.HashMap getNamedDestinationFromStrings()
        Gets the named destinations from the /Names key in the catalog as a HashMap. The key is the name and the value is the destinations array.
        Returns:
        the named destinations
      • removeFields

        public void removeFields()
        Removes all the fields from the document.
      • removeAnnotations

        public void removeAnnotations()
        Removes all the annotations and fields from the document.
      • iterateBookmarks

        private void iterateBookmarks​(PdfObject outlineRef,
                                      java.util.HashMap names)
      • makeRemoteNamedDestinationsLocal

        public void makeRemoteNamedDestinationsLocal()
        Replaces remote named links with local destinations that have the same name.
        Since:
        5.0
      • convertNamedDestination

        private boolean convertNamedDestination​(PdfObject obj,
                                                java.util.HashMap names)
        Converts a remote named destination (GoToR) into a local named destination if there's a corresponding name.
        Parameters:
        obj - an annotation that needs to be screened for links to external named destinations.
        names - a map with names of local named destinations
        Since:
        iText 5.0
      • consolidateNamedDestinations

        public void consolidateNamedDestinations()
        Replaces all the local named links with the actual destinations.
      • replaceNamedDestination

        private boolean replaceNamedDestination​(PdfObject obj,
                                                java.util.HashMap names)
      • close

        public void close()
        Closes the reader.
      • removeUnusedNode

        protected void removeUnusedNode​(PdfObject obj,
                                        boolean[] hits)
      • removeUnusedObjects

        public int removeUnusedObjects()
        Removes all the unreachable objects.
        Returns:
        the number of indirect objects removed
      • getAcroFields

        public AcroFields getAcroFields()
        Gets a read-only version of AcroFields.
        Returns:
        a read-only version of AcroFields
      • getJavaScript

        public java.lang.String getJavaScript​(RandomAccessFileOrArray file)
                                       throws java.io.IOException
        Gets the global document JavaScript.
        Parameters:
        file - the document file
        Returns:
        the global document JavaScript
        Throws:
        java.io.IOException - on error
      • getJavaScript

        public java.lang.String getJavaScript()
                                       throws java.io.IOException
        Gets the global document JavaScript.
        Returns:
        the global document JavaScript
        Throws:
        java.io.IOException - on error
      • selectPages

        public void selectPages​(java.lang.String ranges)
        Selects the pages to keep in the document. The pages are described as ranges. The page ordering can be changed but no page repetitions are allowed. Note that it may be very slow in partial mode.
        Parameters:
        ranges - the comma separated ranges as described in SequenceList
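        To make the idea of comma-separated ranges concrete, here is a deliberately simplified sketch. It handles only single page numbers and "a-b" ranges (including descending ones) and drops repeated pages; the full syntax accepted by SequenceList is richer and is not reproduced here. PageRangeSketch and expand are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class PageRangeSketch {
    /** Expands ranges such as "1-3, 7" into page numbers within 1..pageCount. */
    static List<Integer> expand(String ranges, int pageCount) {
        Set<Integer> pages = new LinkedHashSet<>();   // keeps order, drops repeats
        for (String part : ranges.split(",")) {
            String token = part.trim();
            if (token.isEmpty()) continue;
            int dash = token.indexOf('-');
            int start = Integer.parseInt(dash < 0 ? token : token.substring(0, dash).trim());
            int end = dash < 0 ? start : Integer.parseInt(token.substring(dash + 1).trim());
            int step = start <= end ? 1 : -1;         // "5-3" walks backwards
            for (int p = start; p != end + step; p += step) {
                if (p >= 1 && p <= pageCount) {
                    pages.add(p);                     // pages outside the document are skipped
                }
            }
        }
        return new ArrayList<>(pages);
    }

    public static void main(String[] args) {
        System.out.println(expand("1-3, 7", 10));
    }
}
```

        The resulting list has the same shape as the argument of the List-based selectPages overload: page numbers in the desired order, without repetitions.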
      • selectPages

        public void selectPages​(java.util.List pagesToKeep)
        Selects the pages to keep in the document. The pages are described as a List of Integer. The page ordering can be changed but no page repetitions are allowed. Note that it may be very slow in partial mode.
        Parameters:
        pagesToKeep - the pages to keep in the document
      • getSimpleViewerPreferences

        public int getSimpleViewerPreferences()
        Returns a bitset representing the PageMode and PageLayout viewer preferences. Doesn't return any information about the ViewerPreferences dictionary.
        Returns:
        an int that contains the Viewer Preferences.
      • isAppendable

        public boolean isAppendable()
        Getter for property appendable.
        Returns:
        Value of property appendable.
      • setAppendable

        public void setAppendable​(boolean appendable)
        Setter for property appendable.
        Parameters:
        appendable - New value of property appendable.
      • isNewXrefType

        public boolean isNewXrefType()
        Getter for property newXrefType.
        Returns:
        Value of property newXrefType.
      • getFileLength

        public int getFileLength()
        Getter for property fileLength.
        Returns:
        Value of property fileLength.
      • isHybridXref

        public boolean isHybridXref()
        Getter for property hybridXref.
        Returns:
        Value of property hybridXref.
      • removeUsageRights

        public void removeUsageRights()
        Removes any usage rights that this PDF may have. Only Adobe can grant usage rights and any PDF modification with iText will invalidate them. Invalidated usage rights may confuse Acrobat and it's advisable to remove them altogether.
      • isOpenedWithFullPermissions

        public final boolean isOpenedWithFullPermissions()
        Checks if the document was opened with the owner password so that the end application can decide what level of access restrictions to apply. If the document is not encrypted, it will return true.
        Returns:
        true if the document was opened with the owner password or if it's not encrypted, false if the document was opened with the user password
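        The rule stated above reduces to a small truth table, sketched here with two booleans standing in for the reader's state. FullPermissionsSketch and openedWithFullPermissions are hypothetical names used only for illustration.

```java
final class FullPermissionsSketch {
    /** True when the document is unencrypted or was opened with the owner password. */
    static boolean openedWithFullPermissions(boolean encrypted, boolean ownerPasswordUsed) {
        return !encrypted || ownerPasswordUsed;
    }

    public static void main(String[] args) {
        System.out.println(openedWithFullPermissions(true, false));
    }
}
```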
      • getCryptoMode

        public int getCryptoMode()
      • isMetadataEncrypted

        public boolean isMetadataEncrypted()
      • computeUserPassword

        public byte[] computeUserPassword()