PoDoFo
0.9.6
#include <PdfParser.h>
Public Member Functions | |
PdfParser (PdfVecObjects *pVecObjects) | |
PdfParser (PdfVecObjects *pVecObjects, const char *pszFilename, bool bLoadOnDemand=true) | |
PdfParser (PdfVecObjects *pVecObjects, const char *pBuffer, long lLen, bool bLoadOnDemand=true) | |
PdfParser (PdfVecObjects *pVecObjects, const PdfRefCountedInputDevice &rDevice, bool bLoadOnDemand=true) | |
virtual | ~PdfParser () |
void | ParseFile (const char *pszFilename, bool bLoadOnDemand=true) |
void | ParseFile (const char *pBuffer, long lLen, bool bLoadOnDemand=true) |
void | ParseFile (const PdfRefCountedInputDevice &rDevice, bool bLoadOnDemand=true) |
bool | QuickEncryptedCheck (const char *pszFilename) |
int | GetNumberOfIncrementalUpdates () const |
const PdfVecObjects * | GetObjects () const |
EPdfVersion | GetPdfVersion () const |
const char * | GetPdfVersionString () const |
const PdfObject * | GetTrailer () const |
bool | GetLoadOnDemand () const |
bool | IsLinearized () const |
size_t | GetFileSize () const |
bool | GetEncrypted () const |
const PdfEncrypt * | GetEncrypt () const |
PdfEncrypt * | TakeEncrypt () |
void | SetPassword (const std::string &sPassword) |
bool | IsStrictParsing () const |
void | SetStrictParsing (bool bStrict) |
virtual bool | GetNextToken (const char *&pszToken, EPdfTokenType *peType=NULL) |
bool | IsNextToken (const char *pszToken) |
pdf_long | GetNextNumber () |
void | GetNextVariant (PdfVariant &rVariant, PdfEncrypt *pEncrypt) |
Static Public Member Functions | |
static bool | GetIgnoreBrokenObjects () |
static void | SetIgnoreBrokenObjects (bool bBroken) |
static long | GetMaxObjectCount () |
static void | SetMaxObjectCount (long nMaxObjects) |
static PODOFO_NOTHROW bool | IsWhitespace (const unsigned char ch) |
static PODOFO_NOTHROW bool | IsDelimiter (const unsigned char ch) |
static PODOFO_NOTHROW bool | IsRegular (const unsigned char ch) |
static PODOFO_NOTHROW bool | IsPrintable (const unsigned char ch) |
static PODOFO_NOTHROW int | GetHexValue (const unsigned char ch) |
Protected Member Functions | |
void | FindToken (const char *pszToken, const long lRange) |
void | FindToken2 (const char *pszToken, const long lRange, size_t searchEnd) |
void | ReadDocumentStructure () |
void | HasLinearizationDict () |
void | MergeTrailer (const PdfObject *pTrailer) |
void | ReadTrailer () |
void | ReadXRef (pdf_long *pXRefOffset) |
void | ReadXRefContents (pdf_long lOffset, bool bPositionAtEnd=false) |
void | ReadXRefSubsection (pdf_int64 &nFirstObject, pdf_int64 &nNumObjects) |
void | ReadXRefStreamContents (pdf_long lOffset, bool bReadOnlyTrailer) |
void | ReadObjects () |
void | ReadObjectsInternal () |
void | ReadObjectFromStream (int nObjNo, int nIndex) |
bool | IsPdfFile () |
void | CheckEOFMarker () |
void | GetNextVariant (const char *pszToken, EPdfTokenType eType, PdfVariant &rVariant, PdfEncrypt *pEncrypt) |
EPdfDataType | DetermineDataType (const char *pszToken, EPdfTokenType eType, PdfVariant &rVariant) |
void | ReadDictionary (PdfVariant &rVariant, PdfEncrypt *pEncrypt) |
void | ReadArray (PdfVariant &rVariant, PdfEncrypt *pEncrypt) |
void | ReadString (PdfVariant &rVariant, PdfEncrypt *pEncrypt) |
void | ReadHexString (PdfVariant &rVariant, PdfEncrypt *pEncrypt) |
void | ReadHexString (std::vector< char > &rVecBuffer) |
void | ReadName (PdfVariant &rVariant) |
void | QuequeToken (const char *pszToken, EPdfTokenType eType) |
Additional Inherited Members | |
static const unsigned int | HEX_NOT_FOUND = std::numeric_limits<unsigned int>::max() |
PdfParser reads a PDF file into memory. The file can be modified in memory and written back using the PdfWriter class. Most PDF features are supported.
PoDoFo::PdfParser::PdfParser ( PdfVecObjects * pVecObjects )
PoDoFo::PdfParser::PdfParser ( PdfVecObjects * pVecObjects, const char * pszFilename, bool bLoadOnDemand = true )
Create a new PdfParser object, then open a PDF file and parse it into memory.
pVecObjects | vector to write the parsed PdfObjects to |
pszFilename | filename of the file which is going to be parsed |
bLoadOnDemand | If true, each object is read from the file the first time it is accessed. If false, all objects are read immediately. Loading on demand is faster if you do not need the complete PDF file in memory. |
This might throw a PdfError( ePdfError_InvalidPassword ) exception if a password is required to read this PDF. Call SetPassword() with the correct password in this case.
PoDoFo::PdfParser::PdfParser ( PdfVecObjects * pVecObjects, const char * pBuffer, long lLen, bool bLoadOnDemand = true )
Create a new PdfParser object, then open a PDF file held in a memory buffer and parse it into memory.
pVecObjects | vector to write the parsed PdfObjects to |
pBuffer | buffer containing a PDF file in memory |
lLen | length of the buffer containing the PDF file |
bLoadOnDemand | If true, each object is read from the file the first time it is accessed. If false, all objects are read immediately. Loading on demand is faster if you do not need the complete PDF file in memory. |
This might throw a PdfError( ePdfError_InvalidPassword ) exception if a password is required to read this PDF. Call SetPassword() with the correct password in this case.
PoDoFo::PdfParser::PdfParser ( PdfVecObjects * pVecObjects, const PdfRefCountedInputDevice & rDevice, bool bLoadOnDemand = true )
Create a new PdfParser object, then open a PDF file from an input device and parse it into memory.
pVecObjects | vector to write the parsed PdfObjects to |
rDevice | read from this PdfRefCountedInputDevice |
bLoadOnDemand | If true, each object is read from the file the first time it is accessed. If false, all objects are read immediately. Loading on demand is faster if you do not need the complete PDF file in memory. |
This might throw a PdfError( ePdfError_InvalidPassword ) exception if a password is required to read this PDF. Call SetPassword() with the correct password in this case.
PoDoFo::PdfParser::~PdfParser () [virtual]
Delete the PdfParser and all PdfObjects.
void PoDoFo::PdfParser::CheckEOFMarker () [protected]
Checks for the existence of the %%EOF marker at the end of the file. When strict mode is off, it also attempts to set up the parser to ignore any garbage after the last %%EOF marker. Raises an error if there is a problem with the marker.
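The end-of-file check described above can be sketched in standalone C++17. This is not PoDoFo's CheckEOFMarker(): the helper FindLogicalEnd is hypothetical and assumes the whole file is available as a std::string_view. It locates the last %%EOF marker; in strict mode only CR/LF may follow it, otherwise any trailing bytes are treated as garbage to ignore.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper, not part of PoDoFo.
// Returns the offset just past the last "%%EOF" marker, i.e. the logical end
// of the document, or std::nullopt if no acceptable marker exists.
std::optional<std::size_t> FindLogicalEnd(std::string_view file, bool strict)
{
    constexpr std::string_view kMarker = "%%EOF";
    const std::size_t pos = file.rfind(kMarker);
    if (pos == std::string_view::npos)
        return std::nullopt;  // no end-of-file marker at all

    const std::size_t end = pos + kMarker.size();
    if (strict) {
        // Strict mode: only end-of-line characters may follow the marker.
        for (std::size_t i = end; i < file.size(); ++i)
            if (file[i] != '\r' && file[i] != '\n')
                return std::nullopt;
    }
    // Non-strict mode: anything after `end` is treated as garbage to ignore.
    return end;
}
```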
void PoDoFo::PdfParser::FindToken ( const char * pszToken, const long lRange ) [protected]
Searches backwards from the end of the file and tries to find a token. The current file position is set right after the token.
pszToken | a token to find |
lRange | range in bytes in which to search, beginning at the end of the file |
void PoDoFo::PdfParser::FindToken2 ( const char * pszToken, const long lRange, size_t searchEnd ) [protected]
Searches backwards from the specified position of the file and tries to find a token. The current file position is set right after the token.
pszToken | a token to find |
lRange | range in bytes in which to search, beginning at the specified position of the file |
searchEnd | the file position from which the backward search starts |
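A minimal sketch of the backward search performed by FindToken() and FindToken2(), assuming the file is held in memory as a std::string_view. FindTokenBackwards is a hypothetical name, not PoDoFo code; calling it with searchEnd equal to the file size corresponds to FindToken(), and a smaller searchEnd to FindToken2(). It returns the offset right after the token, i.e. where the parser would be positioned.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper, not part of PoDoFo.
// Searches backwards for `token` inside the window of at most `range` bytes
// that ends at `searchEnd`, and returns the offset right after the last match.
std::optional<std::size_t> FindTokenBackwards(std::string_view file,
                                              std::string_view token,
                                              std::size_t range,
                                              std::size_t searchEnd)
{
    searchEnd = std::min(searchEnd, file.size());
    const std::size_t start = searchEnd > range ? searchEnd - range : 0;

    // Only bytes in [start, searchEnd) are considered.
    const std::string_view window = file.substr(start, searchEnd - start);
    const std::size_t pos = window.rfind(token);
    if (pos == std::string_view::npos)
        return std::nullopt;
    return start + pos + token.size();
}
```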
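const PdfEncrypt * PoDoFo::PdfParser::GetEncrypt () const [inline]
bool PoDoFo::PdfParser::GetEncrypted () const [inline]
size_t PoDoFo::PdfParser::GetFileSize () const [inline]
bool PoDoFo::PdfParser::GetIgnoreBrokenObjects () [inline, static]
bool PoDoFo::PdfParser::GetLoadOnDemand () const [inline]
long PoDoFo::PdfParser::GetMaxObjectCount () [inline, static]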
int PoDoFo::PdfParser::GetNumberOfIncrementalUpdates () const [inline]
Retrieve the number of incremental updates that have been applied to the last parsed PDF file.
0 means no update has been applied.
const PdfVecObjects * PoDoFo::PdfParser::GetObjects () const [inline]
Get a pointer to the sorted internal objects vector.
EPdfVersion PoDoFo::PdfParser::GetPdfVersion () const [inline]
Get the file format version of the PDF as an EPdfVersion value.
const char * PoDoFo::PdfParser::GetPdfVersionString () const
Get the file format version of the PDF as a string.
const PdfObject * PoDoFo::PdfParser::GetTrailer () const [inline]
Get the trailer dictionary, which can be written unmodified to a PDF file.
void PoDoFo::PdfParser::HasLinearizationDict () [protected]
Checks whether this PDF is linearized or not. Initializes the linearization dictionary on success.
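How a linearization dictionary can be detected is sketched below as a heuristic; it is not PoDoFo's HasLinearizationDict(). It relies on the PDF specification's rule that the linearization parameter dictionary is the first indirect object in the file and lies within its first 1024 bytes. The name LooksLinearized and the plain substring search are assumptions; a real parser tokenizes the object instead.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical heuristic, not part of PoDoFo.
// Checks whether the first indirect object ("N G obj" ... "endobj") within
// the first `maxScan` bytes contains a /Linearized key.
bool LooksLinearized(std::string_view file, std::size_t maxScan = 1024)
{
    const std::string_view head = file.substr(0, maxScan);
    const std::size_t objStart = head.find(" obj");
    if (objStart == std::string_view::npos)
        return false;  // no indirect object in the scanned range
    const std::size_t objEnd = head.find("endobj", objStart);
    if (objEnd == std::string_view::npos)
        return false;  // first object does not end within the scanned range
    const std::string_view firstObj = head.substr(objStart, objEnd - objStart);
    return firstObj.find("/Linearized") != std::string_view::npos;
}
```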
bool PoDoFo::PdfParser::IsLinearized () const [inline]
bool PoDoFo::PdfParser::IsPdfFile () [protected]
Checks the magic number at the start of the PDF file and sets the m_ePdfVersion member to the version of the PDF file.
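The magic-number check can be sketched as follows, assuming (unlike lenient readers) that the header starts at byte 0. ReadPdfHeaderVersion is a hypothetical helper, not PoDoFo's IsPdfFile(); it returns the version string that follows %PDF-, such as "1.4", and leaves out the mapping to an EPdfVersion value.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper, not part of PoDoFo.
// Returns the version after the "%PDF-" magic number (e.g. "1.4"), or
// std::nullopt if the data does not start with a PDF header.
std::optional<std::string> ReadPdfHeaderVersion(std::string_view file)
{
    constexpr std::string_view kMagic = "%PDF-";
    if (file.substr(0, kMagic.size()) != kMagic)
        return std::nullopt;

    // Expect "<digit>.<digit>" immediately after the magic number.
    const std::string_view ver = file.substr(kMagic.size(), 3);
    auto isDigit = [](char c) { return c >= '0' && c <= '9'; };
    if (ver.size() != 3 || !isDigit(ver[0]) || ver[1] != '.' || !isDigit(ver[2]))
        return std::nullopt;
    return std::string(ver);
}
```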
bool PoDoFo::PdfParser::IsStrictParsing () const [inline]
void PoDoFo::PdfParser::MergeTrailer ( const PdfObject * pTrailer ) [protected]
Merge the information of this trailer object into the parser's main trailer object.
pTrailer | take the keys to merge from this dictionary. |
void PoDoFo::PdfParser::ParseFile ( const char * pBuffer, long lLen, bool bLoadOnDemand = true )
Open a PDF file and parse it.
pBuffer | buffer containing a PDF file in memory |
lLen | length of the buffer containing the PDF file |
bLoadOnDemand | If true, each object is read from the file the first time it is accessed. If false, all objects are read immediately. Loading on demand is faster if you do not need the complete PDF file in memory. |
This might throw a PdfError( ePdfError_InvalidPassword ) exception if a password is required to read this PDF. Call SetPassword() with the correct password in this case.
void PoDoFo::PdfParser::ParseFile ( const char * pszFilename, bool bLoadOnDemand = true )
Open a PDF file and parse it.
pszFilename | filename of the file which is going to be parsed |
bLoadOnDemand | If true, each object is read from the file the first time it is accessed. If false, all objects are read immediately. Loading on demand is faster if you do not need the complete PDF file in memory. |
This might throw a PdfError( ePdfError_InvalidPassword ) exception if a password is required to read this PDF. Call SetPassword() with the correct password in this case.
void PoDoFo::PdfParser::ParseFile ( const PdfRefCountedInputDevice & rDevice, bool bLoadOnDemand = true )
Open a PDF file and parse it.
rDevice | the input device to read from |
bLoadOnDemand | If true, each object is read from the file the first time it is accessed. If false, all objects are read immediately. Loading on demand is faster if you do not need the complete PDF file in memory. |
This might throw a PdfError( ePdfError_InvalidPassword ) exception if a password is required to read this PDF. Call SetPassword() with the correct password in this case.
bool PoDoFo::PdfParser::QuickEncryptedCheck ( const char * pszFilename )
Quick method to detect secured PDF files, i.e. a PDF with an /Encrypt key in the trailer dictionary.
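A rough heuristic version of such a quick check is sketched below; it is not PoDoFo's QuickEncryptedCheck(). LooksEncrypted is a hypothetical name, and it only scans the last tailBytes of the file for an /Encrypt key, so it can be fooled, for example, by a trailer that lies further from the end of the file.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical heuristic, not part of PoDoFo.
// Reports whether an /Encrypt key occurs in the last `tailBytes` bytes of
// the file, where the final trailer (or xref stream dictionary) normally is.
bool LooksEncrypted(std::string_view file, std::size_t tailBytes = 4096)
{
    const std::size_t start = file.size() > tailBytes ? file.size() - tailBytes : 0;
    return file.substr(start).find("/Encrypt") != std::string_view::npos;
}
```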
void PoDoFo::PdfParser::ReadDocumentStructure () [protected]
Reads the xref sections and trailers of the file into memory in the correct order, taking care of linearized PDF files.
void PoDoFo::PdfParser::ReadObjectFromStream ( int nObjNo, int nIndex ) [protected]
Read the object with index nIndex from the object stream nObjNo and push it onto the objects vector m_vecOffsets.
All objects are read from this stream and the stream object is freed from memory. Further calls that try to read from the same stream do nothing.
nObjNo | object number of the stream object |
nIndex | index of the object which should be parsed |
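The layout of an object stream, which ReadObjectFromStream() has to decode, can be illustrated with a standalone sketch based on the PDF 1.5 object stream format: the decoded data begins with N pairs of integers (object number and offset), and each offset is relative to the byte given by the /First entry. SplitObjectStream is hypothetical, not PoDoFo code; it takes the already decompressed data plus /N and /First, and splits out every object at once, mirroring the note that all objects are read from the stream.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of PoDoFo.
// Returns (object number, object text) pairs. Offsets in the header are
// relative to `first`; the spec requires them in increasing order, so each
// object ends where the next one begins.
std::vector<std::pair<long, std::string>>
SplitObjectStream(const std::string& data, int n, std::size_t first)
{
    std::istringstream header(data.substr(0, first));
    std::vector<std::pair<long, std::size_t>> index;
    for (int i = 0; i < n; ++i) {
        long objNum = 0;
        std::size_t offset = 0;
        if (!(header >> objNum >> offset))
            throw std::runtime_error("truncated object stream header");
        index.emplace_back(objNum, offset);
    }

    std::vector<std::pair<long, std::string>> objects;
    for (int i = 0; i < n; ++i) {
        const std::size_t begin = first + index[i].second;
        const std::size_t end = (i + 1 < n) ? first + index[i + 1].second : data.size();
        if (begin > end || end > data.size())
            throw std::runtime_error("bad object offset in object stream");
        objects.emplace_back(index[i].first, data.substr(begin, end - begin));
    }
    return objects;
}
```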
void PoDoFo::PdfParser::ReadObjects () [protected]
Reads all objects from the PDF into memory from the offsets listed in m_vecOffsets.
If required, an encryption object is set up first.
The actual reading happens in ReadObjectsInternal(), either when no encryption is required or once a correct encryption object has been initialized by SetPassword().
void PoDoFo::PdfParser::ReadObjectsInternal () [protected]
Reads all objects from the PDF into memory from the offsets listed in m_vecOffsets.
Requires a correctly set-up PdfEncrypt object with the correct password.
This method is called from ReadObjects() or SetPassword().
void PoDoFo::PdfParser::ReadTrailer () [protected]
Read the trailer dictionary at the end of the file.
void PoDoFo::PdfParser::ReadXRef ( pdf_long * pXRefOffset ) [protected]
Looks for a startxref entry at the current file position and saves its byte offset to pXRefOffset.
pXRefOffset | store the byte offset of the xref section into this variable. |
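A minimal sketch of locating the last cross-reference section: in a PDF file the keyword startxref is followed by the byte offset of the last xref section. Unlike ReadXRef(), which reads at the current file position, the hypothetical ReadStartXRef below searches the whole in-memory file for the last startxref (roughly FindToken() followed by ReadXRef()). It is not PoDoFo code and does not guard against overflow of very long numbers.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper, not part of PoDoFo.
// Returns the byte offset written after the last "startxref" keyword.
std::optional<long long> ReadStartXRef(std::string_view file)
{
    constexpr std::string_view kKeyword = "startxref";
    std::size_t pos = file.rfind(kKeyword);
    if (pos == std::string_view::npos)
        return std::nullopt;
    pos += kKeyword.size();

    // Skip PDF white-space characters: NUL, HT, LF, FF, CR and space.
    constexpr std::string_view kWhitespace("\0\t\n\f\r ", 6);
    while (pos < file.size() && kWhitespace.find(file[pos]) != std::string_view::npos)
        ++pos;

    // Parse the decimal offset (no overflow check in this sketch).
    long long offset = 0;
    bool haveDigit = false;
    while (pos < file.size() && file[pos] >= '0' && file[pos] <= '9') {
        offset = offset * 10 + (file[pos] - '0');
        haveDigit = true;
        ++pos;
    }
    if (!haveDigit)
        return std::nullopt;
    return offset;
}
```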
void PoDoFo::PdfParser::ReadXRefContents ( pdf_long lOffset, bool bPositionAtEnd = false ) [protected]
Reads the xref table from a PDF file. If there is no xref table, ReadXRefStreamContents() is called.
lOffset | read the table from this offset |
bPositionAtEnd | if true, the xref table is not read, but the file stream is positioned directly after the table, which allows reading a following trailer dictionary. |
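void PoDoFo::PdfParser::ReadXRefStreamContents ( pdf_long lOffset, bool bReadOnlyTrailer ) [protected]
Reads the contents of an XRef stream object.
lOffset | read the stream from this offset |
bReadOnlyTrailer | if true, only the trailer information is read; the contents of the xref stream are skipped over and not parsed |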
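Cross-reference stream rows can be decoded with a short standalone sketch, assuming the stream data has already been decompressed and the /W array read from its dictionary. Per the PDF specification each row holds three big-endian fields whose widths are given by /W, and a zero-width type field defaults to type 1. DecodeXRefStream and XRefStreamEntry are hypothetical names, not PoDoFo code, and mapping rows to object numbers via /Index is omitted.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical type, not part of PoDoFo.
struct XRefStreamEntry {
    std::uint64_t type;    // 0 = free, 1 = uncompressed object, 2 = in object stream
    std::uint64_t field2;  // next free object, byte offset, or object stream number
    std::uint64_t field3;  // generation number, or index within the object stream
};

// Hypothetical helper: decodes decompressed xref stream data using the /W widths.
std::vector<XRefStreamEntry> DecodeXRefStream(const std::vector<std::uint8_t>& data,
                                              const std::array<int, 3>& w)
{
    const std::size_t rowSize = w[0] + w[1] + w[2];
    if (rowSize == 0 || data.size() % rowSize != 0)
        throw std::runtime_error("xref stream length is not a multiple of the row size");

    std::vector<XRefStreamEntry> entries;
    for (std::size_t pos = 0; pos < data.size();) {
        std::array<std::uint64_t, 3> fields{};
        for (int f = 0; f < 3; ++f)
            for (int b = 0; b < w[f]; ++b)
                fields[f] = (fields[f] << 8) | data[pos++];  // big-endian
        if (w[0] == 0)
            fields[0] = 1;  // absent type field defaults to type 1
        entries.push_back({fields[0], fields[1], fields[2]});
    }
    return entries;
}
```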
void PoDoFo::PdfParser::ReadXRefSubsection ( pdf_int64 & nFirstObject, pdf_int64 & nNumObjects ) [protected]
Read an xref subsection.
Throws ePdfError_NoXref if the number of objects read was not the number specified by the subsection header (as passed in ‘nNumObjects’).
nFirstObject | object number of the first object |
nNumObjects | how many objects should be read from this section |
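A classic xref-table subsection can be sketched as below. ParseXRefSubsection and XRefEntry are hypothetical, not PoDoFo code; the sketch reads whitespace-separated tokens instead of the fixed 20-byte records the PDF specification prescribes, numbers the entries from the subsection header, and throws std::runtime_error (standing in for ePdfError_NoXref) when fewer entries than announced can be read.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical type, not part of PoDoFo.
struct XRefEntry {
    std::int64_t objectNumber;
    std::int64_t offset;  // byte offset for 'n' entries, next free object for 'f'
    int generation;
    bool inUse;           // true for 'n' entries, false for 'f' entries
};

// Hypothetical helper: reads "first count" and then `count` entries "offset gen n|f".
std::vector<XRefEntry> ParseXRefSubsection(std::istream& in)
{
    std::int64_t first = 0;
    std::int64_t count = 0;
    if (!(in >> first >> count))
        throw std::runtime_error("missing xref subsection header");

    std::vector<XRefEntry> entries;
    for (std::int64_t i = 0; i < count; ++i) {
        std::int64_t offset = 0;
        int generation = 0;
        char kind = 0;
        if (!(in >> offset >> generation >> kind) || (kind != 'n' && kind != 'f'))
            throw std::runtime_error("xref subsection has fewer entries than announced");
        entries.push_back({first + i, offset, generation, kind == 'n'});
    }
    return entries;
}
```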
void PoDoFo::PdfParser::SetIgnoreBrokenObjects ( bool bBroken ) [inline, static]
Specify if the parser should ignore broken objects, i.e. XRef entries that do not point to valid objects.
Default is to ignore broken objects and to not throw an exception if one is found.
bBroken | if true, broken objects will be ignored |
void PoDoFo::PdfParser::SetMaxObjectCount ( long nMaxObjects ) [inline, static]
Specify the maximum number of objects the parser should read. An exception is thrown if the document contains more objects than this. Use this to avoid problems with very large documents containing millions of objects, which can use 500 MB of working set and spend 15 minutes in Load() before throwing an out-of-memory exception.
By default, the maximum object count is set to 8388607, which is the maximum number of indirect objects according to the PDF specification.
nMaxObjects | set max number of objects |
void PoDoFo::PdfParser::SetPassword ( const std::string & sPassword )
If you try to open an encrypted PDF file that requires a password, PoDoFo throws a PdfError( ePdfError_InvalidPassword ) exception.
If you get such an exception, set the password that should be used to open the PDF.
The usual approach is to ask the user for the password and set it using this method.
PdfParser will immediately continue to read the PDF file.
sPassword | a user or owner password which can be used to open an encrypted PDF file. If the password is invalid, a PdfError( ePdfError_InvalidPassword ) exception is thrown. |
void PoDoFo::PdfParser::SetStrictParsing ( bool bStrict ) [inline]
Enable or disable strict parsing mode. Strict parsing is disabled by default.
If you enable strict parsing, PoDoFo will fail on a few more common PDF defects. Note that PoDoFo's parser is already very strict by default and does not recover from, e.g., incorrect XRef tables.
bStrict | new setting for strict parsing mode. |
PdfEncrypt * PoDoFo::PdfParser::TakeEncrypt () [inline]
Returns the encryption object from the parser. The internal handle is set to NULL and ownership of the object passes to the caller.
Only call this if you need to keep the encryption object beyond the lifetime of the parser; it must be called before the parser is deleted.