Class PreflightParser
java.lang.Object
org.apache.pdfbox.pdfparser.BaseParser
org.apache.pdfbox.pdfparser.COSParser
org.apache.pdfbox.pdfparser.PDFParser
org.apache.pdfbox.preflight.parser.PreflightParser
-
Field Summary
Fields
Modifier and Type, Field, Description
protected PreflightContext ctx
protected DataSource dataSource
static final Charset encoding
    Defines a one-byte encoding that has no specific encoding in the UTF-8 charset.
protected PreflightDocument preflightDocument
protected ValidationResult validationResult
Fields inherited from class org.apache.pdfbox.pdfparser.COSParser
EOF_MARKER, fileLen, initialParseDone, OBJ_MARKER, securityHandler, source, SYSPROP_EOFLOOKUPRANGE, SYSPROP_PARSEMINIMAL, TMP_FILE_PREFIX, xrefTrailerResolver
Fields inherited from class org.apache.pdfbox.pdfparser.BaseParser
A, ASCII_CR, ASCII_LF, B, D, DEF, document, E, ENDOBJ_STRING, ENDSTREAM_STRING, J, M, N, O, R, S, STREAM_STRING, T
-
Constructor Summary
Constructors
Constructor, Description
PreflightParser(File file)
    Constructor.
PreflightParser(File file, ScratchFile scratch)
    Constructor.
PreflightParser(String filename)
    Constructor.
PreflightParser(String filename, ScratchFile scratch)
    Constructor.
PreflightParser(DataSource dataSource)
    Constructor.
PreflightParser(DataSource dataSource, ScratchFile scratch)
    Constructor.
-
Method Summary
Modifier and Type, Method, Description
protected void addValidationError
    Add the error to the ValidationResult.
protected void addValidationErrors
protected void checkEndstreamKeyWord
    'endstream' must be preceded by an EOL.
protected void checkPdfHeader()
    Check that the PDF header matches the rules of the PDF/A specification.
protected void checkStreamKeyWord
    'stream' must be followed by <CR><LF> or only <LF>.
protected void createContext()
    Create a validation context.
protected void createPdfADocument(Format format, PreflightConfiguration config)
protected static ValidationResult createUnknownErrorResult
    Create an instance of ValidationResult with a ValidationError(UNKNOWN_ERROR).
getPDDocument
    This will get the PD document that was parsed.
getPreflightDocument
protected void initialParse
    The initial parse will first parse only the trailer, the xrefstart and all xref tables to have a pointer (offset) to all the PDF's objects.
protected int lastIndexOf(char[] pattern, byte[] buf, int endOff)
    Searches for the last appearance of the pattern within the buffer.
private boolean nextIsEOL
void parse()
    This will parse the stream and populate the COSDocument object.
void parse(Format format)
    Parse the given file and check if it is a conforming file according to the given format.
void parse(Format format, PreflightConfiguration config)
    Parse the given file and check if it is a conforming file according to the given format.
protected COSArray parseCOSArray
    This will parse a PDF array object.
protected COSName parseCOSName
    This will parse a PDF name from the stream.
protected COSStream parseCOSStream(COSDictionary dic)
    Wraps COSParser.parseCOSStream(org.apache.pdfbox.cos.COSDictionary) to check rules on the 'stream' and 'endstream' keywords.
protected COSString parseCOSString()
    Check that the hexadecimal string contains an even number of hexadecimal characters.
protected COSBase parseDirObject()
    Calls BaseParser.parseDirObject() and checks the limit range for Float, Integer and the number of Dictionary entries.
protected COSBase parseObjectDynamically(long objNr, int objGenNr, boolean requireExistingNotCompressedObj)
    This will parse the next object from the stream and add it to the local state.
protected boolean parseXrefTable(long startByteOffset)
    Same method as COSParser.parseXrefTable(long), with additional controls: an EOL is mandatory after the 'xref' keyword, the cross-reference subsection header uses a single white space as separator, and so on.
Methods inherited from class org.apache.pdfbox.pdfparser.COSParser
checkPages, getAccessPermission, getDocument, getEncryption, getStartxrefOffset, isCatalog, isLenient, parseDictObjects, parseFDFHeader, parseObjectDynamically, parsePDFHeader, parseTrailerValuesDynamically, parseXref, rebuildTrailer, retrieveTrailer, setEOFLookupRange, setLenient
Methods inherited from class org.apache.pdfbox.pdfparser.BaseParser
isClosing, isClosing, isDigit, isDigit, isEndOfName, isEOL, isEOL, isSpace, isSpace, isWhitespace, isWhitespace, parseBoolean, parseCOSDictionary, readExpectedChar, readExpectedString, readExpectedString, readGenerationNumber, readInt, readLine, readLong, readObjectNumber, readString, readString, readStringNumber, skipSpaces, skipWhiteSpaces
-
Field Details
-
encoding
Defines a one-byte encoding that has no specific encoding in the UTF-8 charset. This avoids unexpected errors when the encoding is Cp5816.
-
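The motivation for the `encoding` field can be illustrated with a minimal sketch. It assumes ISO-8859-1 as the one-byte charset (this page does not name the charset behind `encoding`), and `OneByteEncodingSketch`, `roundTrip` and `allByteValues` are hypothetical names. With a one-byte charset every byte value decodes to exactly one char and encodes back unchanged, while a UTF-8 round trip rewrites bytes that are not valid UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class OneByteEncodingSketch {
    // Assumption: ISO-8859-1 stands in for the parser's one-byte charset.
    static final Charset ONE_BYTE = StandardCharsets.ISO_8859_1;

    /** Decodes raw bytes to a String and encodes them back with the same charset. */
    static byte[] roundTrip(byte[] raw, Charset charset) {
        return new String(raw, charset).getBytes(charset);
    }

    /** Builds an array that holds every possible byte value exactly once. */
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }
}
```

Round-tripping `allByteValues()` through `ONE_BYTE` returns an identical array, which is the property a byte-oriented parser relies on.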
dataSource
-
validationResult
-
preflightDocument
-
ctx
-
-
Constructor Details
-
PreflightParser(File file)
Constructor.
- Parameters:
file
- Throws:
IOException - if there is a reading error.
-
PreflightParser(File file, ScratchFile scratch)
Constructor.
- Parameters:
file
scratch
- Throws:
IOException - if there is a reading error.
-
PreflightParser(String filename)
Constructor.
- Parameters:
filename
- Throws:
IOException - if there is a reading error.
-
PreflightParser(String filename, ScratchFile scratch)
Constructor.
- Parameters:
filename
scratch
- Throws:
IOException - if there is a reading error.
-
PreflightParser(DataSource dataSource)
Constructor. This one is slower than the file and the filename constructors, because a temporary file will be created.
- Parameters:
dataSource - the data source
- Throws:
IOException - if there is a reading error.
-
PreflightParser(DataSource dataSource, ScratchFile scratch)
Constructor. This one is slower than the file and the filename constructors, because a temporary file will be created.
- Parameters:
dataSource - the data source
scratch
- Throws:
IOException - if there is a reading error.
-
-
Method Details
-
createUnknownErrorResult
Create an instance of ValidationResult with a ValidationError(UNKNOWN_ERROR).
- Returns: the ValidationResult instance.
-
addValidationError
Add the error to the ValidationResult. If the validationResult is null, an instance is created, and the isWarning flag of the ValidationError determines whether the ValidationResult is flagged as valid.
- Parameters:
error
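The lazy creation described for `addValidationError` can be sketched with stand-in types. Everything in this block is hypothetical (`LazyResultSketch`, `SketchError`, `SketchResult`); the real `ValidationError` and `ValidationResult` classes are not reproduced here. How later errors affect the valid flag is not stated on this page, so that branch is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class LazyResultSketch {
    /** Hypothetical stand-in for ValidationError. */
    static final class SketchError {
        final String code;
        final boolean warning;

        SketchError(String code, boolean warning) {
            this.code = code;
            this.warning = warning;
        }
    }

    /** Hypothetical stand-in for ValidationResult. */
    static final class SketchResult {
        boolean valid;
        final List<SketchError> errors = new ArrayList<>();

        SketchResult(boolean valid) {
            this.valid = valid;
        }
    }

    private SketchResult result; // stays null until the first error arrives

    void addError(SketchError error) {
        if (result == null) {
            // First error: a warning leaves the result valid, anything else does not.
            result = new SketchResult(error.warning);
        } else if (!error.warning) {
            // Assumption: a later non-warning error also clears the valid flag.
            result.valid = false;
        }
        result.errors.add(error);
    }

    SketchResult getResult() {
        return result;
    }
}
```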
-
addValidationErrors
-
parse
Description copied from class: PDFParser
This will parse the stream and populate the COSDocument object. This will close the keystore stream when it is done parsing.
- Overrides: parse in class PDFParser
- Throws:
InvalidPasswordException - If the password is incorrect.
IOException - If there is an error reading from the stream or corrupt data is found.
-
parse
Parse the given file and check if it is a conforming file according to the given format.
- Parameters:
format - format that the document should follow (default Format.PDF_A1B)
- Throws:
IOException
-
parse
Parse the given file and check if it is a conforming file according to the given format.
- Parameters:
format - format that the document should follow (default Format.PDF_A1B)
config - Configuration bean that will be used by the PreflightDocument. If null, the format is used to determine the default configuration.
- Throws:
IOException
-
createPdfADocument
- Throws:
IOException
-
createContext
protected void createContext()
Create a validation context. This context is set on the PreflightDocument.
-
getPDDocument
Description copied from class: PDFParser
This will get the PD document that was parsed. When you are done with this document you must call close() on it to release resources.
- Overrides: getPDDocument in class PDFParser
- Returns: The document at the PD layer.
- Throws:
IOException - If there is an error getting the document.
-
getPreflightDocument
- Throws:
IOException
-
initialParse
Description copied from class: PDFParser
The initial parse will first parse only the trailer, the xrefstart and all xref tables to have a pointer (offset) to all the PDF's objects. It can handle linearized PDFs, which will have an xref at the end pointing to an xref at the beginning of the file. Last, the root object is parsed.
- Overrides: initialParse in class PDFParser
- Throws:
InvalidPasswordException - If the password is incorrect.
IOException - If something went wrong.
-
checkPdfHeader
protected void checkPdfHeader()
Check that the PDF header matches the rules of the PDF/A specification.
The first line (offset 0) must be a comment with the PDF version (version 1.0 does not conform to the PDF/A specification).
The second line must be a comment with at least 4 bytes greater than 0x80.
-
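The two header rules listed for `checkPdfHeader` can be expressed as a small standalone sketch. It is not the parser's implementation: `PdfHeaderSketch` and its methods are hypothetical, the accepted version range (1.1 to 1.9) is an assumption since the page only excludes 1.0, and the byte threshold follows the wording of this page literally. The sketch also assumes the four high bytes come directly after the `%`.

```java
import java.util.regex.Pattern;

final class PdfHeaderSketch {
    // Assumption: versions 1.1 to 1.9 pass; this page only says that 1.0 does not conform.
    private static final Pattern FIRST_LINE = Pattern.compile("%PDF-1\\.[1-9]");

    /** First line: a comment carrying the PDF version, without its line ending. */
    static boolean isFirstLineValid(String firstLine) {
        return FIRST_LINE.matcher(firstLine).matches();
    }

    /** Second line: '%' followed by four bytes that are each greater than 0x80. */
    static boolean isSecondLineValid(byte[] secondLine) {
        if (secondLine.length < 5 || secondLine[0] != '%') {
            return false;
        }
        for (int i = 1; i <= 4; i++) {
            // Mask to read the byte as an unsigned value before comparing.
            if ((secondLine[i] & 0xFF) <= 0x80) {
                return false;
            }
        }
        return true;
    }
}
```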
parseXrefTable
Same method as COSParser.parseXrefTable(long), with additional controls: an EOL is mandatory after the 'xref' keyword, the cross-reference subsection header uses a single white space as separator, and so on.
- Overrides: parseXrefTable in class COSParser
- Parameters:
startByteOffset - the offset to start at
- Returns: false on parsing error
- Throws:
IOException - If an IO error occurs.
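The two controls named for `parseXrefTable` can be illustrated on plain strings. This is a sketch under assumptions, not the parser's code: `XrefSyntaxSketch` and its methods are hypothetical, and treating CR, LF or CR LF as the accepted EOL after `xref` is an assumption.

```java
import java.util.regex.Pattern;

final class XrefSyntaxSketch {
    // Two unsigned integers separated by exactly one space, for example "0 6".
    private static final Pattern SUBSECTION_HEADER = Pattern.compile("\\d+ \\d+");

    /** True when the data starts with 'xref' immediately followed by CR or LF. */
    static boolean isXrefKeywordFollowedByEol(String data) {
        // "xref\r" also covers the CR LF case, since CR comes first.
        return data.startsWith("xref\r") || data.startsWith("xref\n");
    }

    /** True when a subsection header line uses a single space as separator. */
    static boolean isSubsectionHeaderValid(String line) {
        return SUBSECTION_HEADER.matcher(line).matches();
    }
}
```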
-
parseCOSStream
Wraps COSParser.parseCOSStream(org.apache.pdfbox.cos.COSDictionary) to check rules on the 'stream' and 'endstream' keywords. See checkStreamKeyWord() and checkEndstreamKeyWord().
- Overrides: parseCOSStream in class COSParser
- Parameters:
dic - dictionary that goes with this stream.
- Returns: parsed PDF stream.
- Throws:
IOException - if an error occurred reading the stream, such as problems reading the length attribute, the stream not ending with 'endstream' after the data is read, or the stream being too short.
-
checkStreamKeyWord
'stream' must be followed by <CR><LF> or only <LF>.
- Throws:
IOException
-
checkEndstreamKeyWord
'endstream' must be preceded by an EOL.
- Throws:
IOException
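The rules for `checkStreamKeyWord` and `checkEndstreamKeyWord` can be sketched as two checks on a byte buffer. The class and method names are hypothetical, and accepting either CR or LF as the byte before `endstream` is an assumption, because this page only says "an EOL".

```java
final class StreamKeywordSketch {
    /** afterKeyword is the index of the first byte following the 'stream' keyword. */
    static boolean streamFollowedByEol(byte[] data, int afterKeyword) {
        if (afterKeyword < 0 || afterKeyword >= data.length) {
            return false;
        }
        if (data[afterKeyword] == '\n') {
            return true; // <LF> alone
        }
        // <CR> is only accepted when <LF> follows it directly.
        return data[afterKeyword] == '\r'
                && afterKeyword + 1 < data.length
                && data[afterKeyword + 1] == '\n';
    }

    /** keywordStart is the index of the first byte of the 'endstream' keyword. */
    static boolean endstreamPrecededByEol(byte[] data, int keywordStart) {
        if (keywordStart <= 0 || keywordStart > data.length) {
            return false;
        }
        byte previous = data[keywordStart - 1];
        // Assumption: CR or LF counts as EOL (CR LF ends in LF and is covered too).
        return previous == '\n' || previous == '\r';
    }
}
```

A lone `<CR>` after `stream` is rejected here, matching the rule that only `<CR><LF>` or `<LF>` may follow the keyword.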
-
nextIsEOL
- Throws:
IOException
-
parseCOSArray
Description copied from class: BaseParser
This will parse a PDF array object.
- Overrides: parseCOSArray in class BaseParser
- Returns: The parsed PDF array.
- Throws:
IOException - If there is an error parsing the stream.
-
parseCOSName
Description copied from class: BaseParser
This will parse a PDF name from the stream.
- Overrides: parseCOSName in class BaseParser
- Returns: The parsed PDF name.
- Throws:
IOException - If there is an error reading from the stream.
-
parseCOSString
Check that the hexadecimal string contains an even number of hexadecimal characters. Once that is done, reset the offset to the beginning of the string and call BaseParser.parseCOSString().
- Overrides: parseCOSString in class BaseParser
- Returns: The parsed PDF string.
- Throws:
IOException - If there is an error reading from the stream.
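The even-count rule that `parseCOSString` enforces on hexadecimal strings can be sketched as a pure function. `HexStringSketch` is a hypothetical name, the input is assumed to be the text between `<` and `>`, and skipping white space inside the string is an assumption that this page does not confirm.

```java
final class HexStringSketch {
    /** body is the text between '<' and '>' of a PDF hexadecimal string. */
    static boolean hasEvenHexDigitCount(String body) {
        int digits = 0;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            boolean hex = (c >= '0' && c <= '9')
                    || (c >= 'a' && c <= 'f')
                    || (c >= 'A' && c <= 'F');
            if (hex) {
                digits++;
            } else if (!Character.isWhitespace(c)) {
                return false; // neither a hexadecimal digit nor white space
            }
        }
        return digits % 2 == 0;
    }
}
```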
-
parseDirObject
Calls BaseParser.parseDirObject() and checks the limit range for Float, Integer and the number of Dictionary entries.
- Overrides: parseDirObject in class BaseParser
- Returns: The parsed object.
- Throws:
IOException - if there is an error during parsing.
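The kind of range check described for `parseDirObject` might look like the sketch below. The numeric limits are assumptions: they are the architectural limits commonly cited for PDF 1.4 and are not stated on this page. `ImplementationLimitsSketch` and its members are hypothetical names.

```java
final class ImplementationLimitsSketch {
    // Assumed limits; this page does not list the values the parser enforces.
    static final long MAX_INTEGER = 2147483647L;
    static final long MIN_INTEGER = -2147483648L;
    static final double MAX_REAL = 32767.0;
    static final int MAX_DICT_ENTRIES = 4095;

    static boolean isIntegerInRange(long value) {
        return value >= MIN_INTEGER && value <= MAX_INTEGER;
    }

    static boolean isRealInRange(double value) {
        // The same bound applies to positive and negative reals.
        return Math.abs(value) <= MAX_REAL;
    }

    static boolean isDictionarySizeAllowed(int entryCount) {
        return entryCount <= MAX_DICT_ENTRIES;
    }
}
```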
-
parseObjectDynamically
protected COSBase parseObjectDynamically(long objNr, int objGenNr, boolean requireExistingNotCompressedObj) throws IOException
Description copied from class: COSParser
This will parse the next object from the stream and add it to the local state. It's reduced to parsing an indirect object.
- Overrides: parseObjectDynamically in class COSParser
- Parameters:
objNr - object number of object to be parsed
objGenNr - object generation number of object to be parsed
requireExistingNotCompressedObj - if true the object to be parsed must be defined in xref (comment: null objects may be missing from xref) and it must not be a compressed object within object stream (this is used to circumvent being stuck in a loop in a malicious PDF)
- Returns: the parsed object (which is also added to document object)
- Throws:
IOException - If an IO error occurs.
-
lastIndexOf
protected int lastIndexOf(char[] pattern, byte[] buf, int endOff)
Description copied from class: COSParser
Searches for the last appearance of the pattern within the buffer. The lookup starts before endOff and goes back to 0.
- Overrides: lastIndexOf in class COSParser
- Parameters:
pattern - pattern to search for
buf - buffer to search pattern in
endOff - offset (exclusive) where lookup starts at
- Returns: start offset of pattern within buffer or -1 if pattern could not be found
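The contract of `lastIndexOf` can be shown with a straightforward backward scan. This is a minimal sketch of the documented behaviour, not the library's implementation; `LastIndexOfSketch` is a hypothetical name, and clamping `endOff` to the buffer length and returning -1 for an empty pattern are assumptions.

```java
final class LastIndexOfSketch {
    /**
     * Returns the start offset of the last occurrence of pattern that ends
     * before endOff (exclusive), or -1 when there is none.
     */
    static int lastIndexOf(char[] pattern, byte[] buf, int endOff) {
        if (pattern.length == 0) {
            return -1; // assumption: an empty pattern is never reported as found
        }
        int end = Math.min(endOff, buf.length);
        // Try each candidate start position, from the rightmost one down to 0.
        for (int start = end - pattern.length; start >= 0; start--) {
            boolean match = true;
            for (int i = 0; i < pattern.length; i++) {
                if ((buf[start + i] & 0xFF) != pattern[i]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return start;
            }
        }
        return -1;
    }
}
```

Lowering `endOff` below the end of the last match makes the scan report the previous occurrence, which is how a caller can walk backwards through repeated markers.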
-