Class StreamScanner
- java.lang.Object
  - com.ctc.wstx.io.WstxInputData
    - com.ctc.wstx.sr.StreamScanner
- All Implemented Interfaces:
InputConfigFlags, ParsingErrorMsgs, InputProblemReporter
- Direct Known Subclasses:
BasicStreamReader, MinimalDTDReader
public abstract class StreamScanner extends WstxInputData implements InputProblemReporter, InputConfigFlags, ParsingErrorMsgs
Abstract base class that defines some basic functionality that all Woodstox reader classes (main XML reader, DTD reader) extend from.
-
-
Field Summary
Fields
Modifier and Type Field Description
static char
CHAR_CR_LF_OR_NULL
Last (highest) char code of the three, LF, CR and NULL
protected static char
CHAR_FIRST_PURE_TEXT
Character that allows quick check of whether a char can potentially be some kind of markup, WRT input stream processing; has to contain linefeeds, &, < and > (">" only matters when quoting text, as part of "]]>")
protected static char
CHAR_LOWEST_LEGAL_LOCALNAME_CHAR
First character in Unicode (ie one with lowest id) that is legal as part of a local name (all valid name chars minus ':').
static int
INT_CR_LF_OR_NULL
protected boolean
mAllowXml11EscapedCharsInXml10
Flag that indicates whether all escaped chars are accepted in XML 1.0.
protected Map<String,IntEntity>
mCachedEntities
Cache of internal character entities.
protected boolean
mCfgNsEnabled
If true, Reader is namespace aware, and should do basic checks (usually enforcing limitations on having colons in names)
protected boolean
mCfgReplaceEntities
note: left non-final on purpose: sub-class may need to modify the default value after construction.
protected boolean
mCfgTreatCharRefsAsEntities
Flag for whether or not character references should be treated as entities
protected ReaderConfig
mConfig
Copy of the configuration object passed by the factory.
protected int
mCurrDepth
This is the current depth of the input stack (same as what input element stack would return as its depth).
protected EntityDecl
mCurrEntity
Entity reference stream currently points to.
protected String
mCurrName
Local full name for the event, if it has one (note: element events do NOT use this variable; those names are stored in element stack): target for processing instructions.
protected String
mDocInputEncoding
Input stream encoding, if known (passed in, or determined by auto-detection); null if not.
protected String
mDocXmlEncoding
Character encoding from xml declaration, if any; null if no declaration, or it didn't specify encoding.
protected int
mDocXmlVersion
XML version as declared by the document; one of the constants from XmlConsts (like XmlConsts.XML_V_10).
protected int
mEntityExpansionCount
Number of times a parsed general entity has been expanded; used for (optionally) limiting number of expansion to guard against denial-of-service attacks like "Billion Laughs".
protected XMLResolver
mEntityResolver
Custom resolver used to handle external entities that are to be expanded by this reader (external param/general entity expander)
protected WstxInputSource
mInput
Currently active input source; contains link to parent (nesting) input sources, if any.
protected int
mInputTopDepth
protected char[]
mNameBuffer
Temporary buffer used if local name can not be just directly constructed from input buffer (name is on a boundary or such).
protected boolean
mNormalizeLFs
Flag that indicates whether linefeeds in the input data are to be normalized or not.
protected WstxInputSource
mRootInput
Top-most input source this reader can use; due to input source chaining, this is not necessarily the root of all input; for example, external DTD subset reader's root input still has original document input as its parent.
protected int
mTokenInputCol
Column on input row that current token starts; 0-based (although in the end it'll be converted to 1-based)
protected int
mTokenInputRow
Input row on which current token starts, 1-based
protected long
mTokenInputTotal
Total number of characters read before start of current token.
-
Fields inherited from class com.ctc.wstx.io.WstxInputData
CHAR_NULL, CHAR_SPACE, INT_NULL, INT_SPACE, MAX_UNICODE_CHAR, mCurrInputProcessed, mCurrInputRow, mCurrInputRowStart, mInputBuffer, mInputEnd, mInputPtr, mXml11
-
Fields inherited from interface com.ctc.wstx.cfg.InputConfigFlags
CFG_ALLOW_XML11_ESCAPED_CHARS_IN_XML10, CFG_AUTO_CLOSE_INPUT, CFG_CACHE_DTDS, CFG_CACHE_DTDS_BY_PUBLIC_ID, CFG_COALESCE_TEXT, CFG_INTERN_NAMES, CFG_INTERN_NS_URIS, CFG_LAZY_PARSING, CFG_NAMESPACE_AWARE, CFG_NORMALIZE_LFS, CFG_PRESERVE_LOCATION, CFG_REPLACE_ENTITY_REFS, CFG_REPORT_CDATA, CFG_REPORT_PROLOG_WS, CFG_SUPPORT_DTD, CFG_SUPPORT_DTDPP, CFG_SUPPORT_EXTERNAL_ENTITIES, CFG_TREAT_CHAR_REFS_AS_ENTS, CFG_VALIDATE_AGAINST_DTD, CFG_XMLID_TYPING, CFG_XMLID_UNIQ_CHECKS
-
Fields inherited from interface com.ctc.wstx.cfg.ParsingErrorMsgs
SUFFIX_EOF_EXP_NAME, SUFFIX_IN_ATTR_VALUE, SUFFIX_IN_CDATA, SUFFIX_IN_CLOSE_ELEMENT, SUFFIX_IN_COMMENT, SUFFIX_IN_DEF_ATTR_VALUE, SUFFIX_IN_DOC, SUFFIX_IN_DTD, SUFFIX_IN_DTD_EXTERNAL, SUFFIX_IN_DTD_INTERNAL, SUFFIX_IN_ELEMENT, SUFFIX_IN_ENTITY_REF, SUFFIX_IN_EPILOG, SUFFIX_IN_NAME, SUFFIX_IN_PROC_INSTR, SUFFIX_IN_PROLOG, SUFFIX_IN_TEXT, SUFFIX_IN_XML_DECL
-
-
Constructor Summary
Constructors
Modifier Constructor Description
protected
StreamScanner(WstxInputSource input, ReaderConfig cfg, XMLResolver res)
Constructor used when creating a complete new (main-level) reader that does not share its input buffers or state with another reader.
-
Method Summary
Modifier and Type Method Description
protected void
_reportProblem(XMLReporter rep, String probType, String msg, Location loc)
protected void
_reportProblem(XMLReporter rep, org.codehaus.stax2.validation.XMLValidationProblem prob)
protected void
closeAllInput(boolean force)
protected WstxException
constructFromIOE(IOException ioe)
Construct and return an XMLStreamException to throw as a result of a failed Typed Access operation (but one not caused by a Well-Formedness Constraint or Validation Constraint problem)
protected XMLStreamException
constructLimitViolation(String type, long limit)
protected WstxException
constructNullCharException()
protected WstxException
constructWfcException(String msg)
protected boolean
ensureInput(int minAmount)
Method called to make sure current main-level input buffer has at least specified number of characters available consecutively, without having to call loadMore().
protected char[]
expandBy50Pct(char[] buf)
protected EntityDecl
expandEntity(String id, boolean allowExt, Object extraArg)
Helper method that will try to expand a parsed entity (parameter or generic entity).
protected abstract EntityDecl
findEntity(String id, Object arg)
Abstract method for sub-classes to implement, for finding a declared general or parsed entity.
protected int
fullyResolveEntity(boolean allowExt)
Method that does full resolution of an entity reference, be it character entity, internal entity or external entity, including updating of input buffers, and depending on whether result is a character entity (or one of 5 pre-defined entities), returns char in question, or null character (code 0) to indicate it had to change input source.
ReaderConfig
getConfig()
WstxInputSource
getCurrentInput()
Returns current input source this source uses.
org.codehaus.stax2.XMLStreamLocation2
getCurrentLocation()
protected EntityDecl
getIntEntity(int ch, char[] originalChars)
Returns an entity (possibly from cache) for the argument character using the encoded representation in mInputBuffer[entityStartPos ... mInputPtr-1].
protected WstxInputLocation
getLastCharLocation()
Method that returns location of the last character returned by this reader; that is, location "one less" than the currently pointed to location.
abstract Location
getLocation()
Returns location of last properly parsed token; as per StAX specs, apparently needs to be the end of current event, which is the same as the start of the following event (or EOF if that's next).
protected char[]
getNameBuffer(int minSize)
protected int
getNext()
protected int
getNextAfterWS()
Method that will skip through zero or more white space characters, and return either the character following white space, or -1 to indicate EOF (end of the outermost input source).
protected char
getNextChar(String errorMsg)
protected char
getNextCharAfterWS(String errorMsg)
protected char
getNextCharFromCurrent(String errorMsg)
Similar to getNextChar(java.lang.String), but will not read more characters from parent input source(s) if the current input source doesn't have more content.
protected char
getNextInCurrAfterWS(String errorMsg)
protected char
getNextInCurrAfterWS(String errorMsg, char c)
protected URL
getSource()
org.codehaus.stax2.XMLStreamLocation2
getStartLocation()
protected String
getSystemId()
protected abstract void
handleIncompleteEntityProblem(WstxInputSource closing)
protected abstract void
handleUndeclaredEntity(String id)
This method gets called if a declaration for an entity was not found in entity expanding mode (enabled by default for xml reader, always enabled for dtd reader).
protected void
initInputSource(WstxInputSource newInput, boolean isExt, String entityId)
Method called when an entity has been expanded (new input source has been created).
protected int
inputInBuffer()
protected boolean
loadMore()
Method that will try to read one or more characters from currently open input sources; closing input sources if necessary.
protected boolean
loadMore(String errorMsg)
protected boolean
loadMoreFromCurrent()
protected boolean
loadMoreFromCurrent(String errorMsg)
protected void
markLF()
protected void
markLF(int inputPtr)
protected String
parseEntityName(char c)
protected String
parseFNameForError()
Method called to read in full name, including unlimited number of namespace separators (':'), for the purpose of displaying name in an error message.
protected String
parseFullName()
Method that will parse 'full' name token; what full means depends on whether reader is namespace aware or not.
protected String
parseFullName(char c)
protected String
parseFullName2(int start, int hash)
protected String
parseLocalName(char c)
Method that will parse name token (roughly equivalent to XML specs; although a bit more lenient for more efficient handling); either uri prefix, or local name.
protected String
parseLocalName2(int start, int hash)
Second part of name token parsing; called when name can continue past input buffer end (so only part was read before calling this method to read the rest).
protected String
parsePublicId(char quoteChar, String errorMsg)
Simple parsing method that parses public ids, which are generally used in entities (from DOCTYPE declaration to internal/external subsets).
protected String
parseSystemId(char quoteChar, boolean convertLFs, String errorMsg)
Simple parsing method that parses system ids, which are generally used in entities (from DOCTYPE declaration to internal/external subsets).
protected void
parseUntil(TextBuffer tb, char endChar, boolean convertLFs, String errorMsg)
protected int
peekNext()
Similar to getNext(), but does not advance pointer in input buffer.
protected void
pushback()
Method to push back last character read; can only be called once, that is, no more than one char can be guaranteed to be successfully returned.
void
reportProblem(String probType, String format, Object arg, Object arg2)
void
reportProblem(Location loc, String probType, String format, Object arg, Object arg2)
void
reportValidationProblem(String msg)
void
reportValidationProblem(String msg, int severity)
void
reportValidationProblem(String format, Object arg, Object arg2)
void
reportValidationProblem(Location loc, String msg)
void
reportValidationProblem(org.codehaus.stax2.validation.XMLValidationProblem prob)
Note: this is the base implementation used for implementing ValidationContext
protected int
resolveCharOnlyEntity(boolean checkStd)
Method called to resolve character entities, and only character entities (except that pre-defined char entities -- amp, apos, lt, gt, quot -- MAY be "char entities" in this sense, depending on arguments).
protected EntityDecl
resolveNonCharEntity()
Reverse of resolveCharOnlyEntity(boolean); will only resolve entity if it is NOT a character entity (or pre-defined 'generic' entity; amp, apos, lt, gt or quot).
protected int
resolveSimpleEntity(boolean checkStd)
Method that tries to resolve a character entity, or (if caller so specifies), a pre-defined internal entity (lt, gt, amp, apos, quot).
protected boolean
skipCRLF(char c)
Method called when a CR has been spotted in input; checks if next char is LF, and if so, skips it.
protected int
skipFullName(char c)
Note: does not check for number of colons, amongst other things.
protected void
throwFromIOE(IOException ioe)
protected void
throwFromStrE(XMLStreamException strex)
protected void
throwInvalidSpace(int i)
protected WstxException
throwInvalidSpace(int i, boolean deferErrors)
protected void
throwLazyError(Exception e)
Method called to report an error, when caller's signature only allows runtime exceptions to be thrown.
protected void
throwNullChar()
protected void
throwNullParent(WstxInputSource curr)
void
throwParseError(String msg)
void
throwParseError(String format, Object arg, Object arg2)
Throws generic parse error with specified message and current parsing location.
protected void
throwUnexpectedChar(int i, String msg)
protected void
throwUnexpectedEOB(String msg)
Similar to throwUnexpectedEOF(java.lang.String), but only indicates ending of an input block.
protected void
throwUnexpectedEOF(String msg)
WstxException
throwWfcException(String msg, boolean deferErrors)
protected String
tokenTypeDesc(int type)
protected void
verifyLimit(String type, long maxValue, long currentValue)
-
Methods inherited from class com.ctc.wstx.io.WstxInputData
copyBufferStateFrom, findIllegalNameChar, findIllegalNmtokenChar, getCharDesc, isNameChar, isNameChar, isNameStartChar, isNameStartChar, isSpaceChar
-
-
-
-
Field Detail
-
CHAR_CR_LF_OR_NULL
public static final char CHAR_CR_LF_OR_NULL
Last (highest) char code of the three, LF, CR and NULL
- See Also:
- Constant Field Values
-
INT_CR_LF_OR_NULL
public static final int INT_CR_LF_OR_NULL
- See Also:
- Constant Field Values
-
CHAR_FIRST_PURE_TEXT
protected static final char CHAR_FIRST_PURE_TEXT
Character that allows quick check of whether a char can potentially be some kind of markup, WRT input stream processing; has to contain linefeeds, &, < and > (">" only matters when quoting text, as part of "]]>")
- See Also:
- Constant Field Values
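The quick check described for CHAR_FIRST_PURE_TEXT can be sketched as a single comparison against a threshold. This is only an illustration: the class name PureTextCheck is hypothetical, and the threshold used here (one past '>') is an assumption chosen to satisfy the description above; this page does not state the constant's actual value.

```java
/** Hypothetical sketch of a "can this char be markup?" threshold check. */
class PureTextCheck {
    // Assumed threshold: LF, CR, '&', '<' and '>' all sort below it,
    // so any char at or above it cannot start or end markup.
    static final char FIRST_PURE_TEXT = (char) ('>' + 1);

    /** Counts leading chars that pass the quick check and need no closer look. */
    static int countPureTextPrefix(String s) {
        int i = 0;
        while (i < s.length() && s.charAt(i) >= FIRST_PURE_TEXT) {
            i++;
        }
        return i;
    }
}
```

Characters below the threshold (spaces and digits included) are not necessarily markup; they simply fall through to a slower, more precise check.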
-
CHAR_LOWEST_LEGAL_LOCALNAME_CHAR
protected static final char CHAR_LOWEST_LEGAL_LOCALNAME_CHAR
First character in Unicode (ie one with lowest id) that is legal as part of a local name (all valid name chars minus ':'). Used for doing quick check for local name end; usually name ends in a whitespace or equals sign.
- See Also:
- Constant Field Values
-
mConfig
protected final ReaderConfig mConfig
Copy of the configuration object passed by the factory. Contains immutable settings for this reader (or in case of DTD parsers, reader that uses it)
-
mCfgNsEnabled
protected final boolean mCfgNsEnabled
If true, Reader is namespace aware, and should do basic checks (usually enforcing limitations on having colons in names)
-
mCfgReplaceEntities
protected boolean mCfgReplaceEntities
note: left non-final on purpose: sub-class may need to modify the default value after construction.
-
mCurrName
protected String mCurrName
Local full name for the event, if it has one (note: element events do NOT use this variable; those names are stored in element stack): target for processing instructions.
Currently used for proc. instr. target, and entity name (at least when current entity reference is null).
Note: this variable is generally not cleared, since it comes from a symbol table, ie. this won't be the only reference.
-
mInput
protected WstxInputSource mInput
Currently active input source; contains link to parent (nesting) input sources, if any.
-
mRootInput
protected final WstxInputSource mRootInput
Top-most input source this reader can use; due to input source chaining, this is not necessarily the root of all input; for example, external DTD subset reader's root input still has original document input as its parent.
-
mEntityResolver
protected XMLResolver mEntityResolver
Custom resolver used to handle external entities that are to be expanded by this reader (external param/general entity expander)
-
mCurrDepth
protected int mCurrDepth
This is the current depth of the input stack (same as what input element stack would return as its depth). It is used to enforce input scope constraints for nesting of elements (for xml reader) and dtd declaration (for dtd reader) with regards to input block (entity expansion) boundaries.
Basically this value is compared to mInputTopDepth, which indicates what was the depth at the point where the currently active input scope/block was started.
-
mInputTopDepth
protected int mInputTopDepth
-
mEntityExpansionCount
protected int mEntityExpansionCount
Number of times a parsed general entity has been expanded; used for (optionally) limiting number of expansion to guard against denial-of-service attacks like "Billion Laughs".
- Since:
- 4.3
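A minimal sketch of how such a counter can cap entity expansion, assuming a configurable maximum. The class ExpansionLimiter, its method names and the exception type are hypothetical and are not part of Woodstox.

```java
/** Hypothetical stand-in for a reader that caps general entity expansions. */
class ExpansionLimiter {
    private final long maxExpansions; // assumed configurable limit
    private int expansionCount;       // plays the role of mEntityExpansionCount

    ExpansionLimiter(long maxExpansions) {
        this.maxExpansions = maxExpansions;
    }

    /** Call once per expansion; fails as soon as the cap is exceeded. */
    void onEntityExpanded(String entityName) {
        expansionCount++;
        if (expansionCount > maxExpansions) {
            throw new IllegalStateException("Entity expansion limit ("
                    + maxExpansions + ") exceeded while expanding &" + entityName + ";");
        }
    }

    int count() {
        return expansionCount;
    }
}
```

Because nested entity definitions multiply the number of expansions, counting each expansion and failing early keeps a "Billion Laughs" style document from consuming unbounded time and memory.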
-
mNormalizeLFs
protected boolean mNormalizeLFs
Flag that indicates whether linefeeds in the input data are to be normalized or not. Xml specs mandate that the line feeds are only normalized when they are from the external entities (main doc, external general/parsed entities), so normalization has to be suppressed when expanding internal general/parsed entities.
-
mAllowXml11EscapedCharsInXml10
protected boolean mAllowXml11EscapedCharsInXml10
Flag that indicates whether all escaped chars are accepted in XML 1.0.
- Since:
- 5.2
-
mNameBuffer
protected char[] mNameBuffer
Temporary buffer used if local name can not be just directly constructed from input buffer (name is on a boundary or such).
-
mTokenInputTotal
protected long mTokenInputTotal
Total number of characters read before start of current token. Since big (gigabyte-sized) inputs are possible, this needs to be a long, unlike pointers and sizes related to in-memory buffers.
-
mTokenInputRow
protected int mTokenInputRow
Input row on which current token starts, 1-based
-
mTokenInputCol
protected int mTokenInputCol
Column on input row that current token starts; 0-based (although in the end it'll be converted to 1-based)
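The three token-position fields can be illustrated with a small sketch that derives them from an input string and a token start offset. The class TokenPosition and its members are hypothetical; the sketch only treats '\n' as a line break and is not how Woodstox maintains these fields incrementally.

```java
/** Hypothetical illustration of total offset, 1-based row and 0-based column. */
class TokenPosition {
    final long total; // chars read before the token start (a long, as inputs can be huge)
    final int row;    // 1-based row of the token start
    final int col;    // 0-based column of the token start

    TokenPosition(long total, int row, int col) {
        this.total = total;
        this.row = row;
        this.col = col;
    }

    /** Computes the position of the token that starts at index tokenStart. */
    static TokenPosition at(String input, int tokenStart) {
        int row = 1;
        int rowStart = 0;
        for (int i = 0; i < tokenStart; i++) {
            if (input.charAt(i) == '\n') {
                row++;
                rowStart = i + 1;
            }
        }
        return new TokenPosition(tokenStart, row, tokenStart - rowStart);
    }

    /** Column as it would be reported to callers: converted to 1-based. */
    int reportedColumn() {
        return col + 1;
    }
}
```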
-
mDocInputEncoding
protected String mDocInputEncoding
Input stream encoding, if known (passed in, or determined by auto-detection); null if not.
-
mDocXmlEncoding
protected String mDocXmlEncoding
Character encoding from xml declaration, if any; null if no declaration, or it didn't specify encoding.
-
mDocXmlVersion
protected int mDocXmlVersion
XML version as declared by the document; one of the constants from XmlConsts (like XmlConsts.XML_V_10).
-
mCachedEntities
protected Map<String,IntEntity> mCachedEntities
Cache of internal character entities.
-
mCfgTreatCharRefsAsEntities
protected boolean mCfgTreatCharRefsAsEntities
Flag for whether or not character references should be treated as entities
-
mCurrEntity
protected EntityDecl mCurrEntity
Entity reference stream currently points to.
-
-
Constructor Detail
-
StreamScanner
protected StreamScanner(WstxInputSource input, ReaderConfig cfg, XMLResolver res)
Constructor used when creating a complete new (main-level) reader that does not share its input buffers or state with another reader.
-
-
Method Detail
-
getConfig
public ReaderConfig getConfig()
- Since:
- 5.2
-
getLastCharLocation
protected WstxInputLocation getLastCharLocation()
Method that returns location of the last character returned by this reader; that is, location "one less" than the currently pointed to location.
-
getSource
protected URL getSource() throws IOException
- Throws:
IOException
-
getSystemId
protected String getSystemId()
-
getLocation
public abstract Location getLocation()
Returns location of last properly parsed token; as per StAX specs, apparently needs to be the end of current event, which is the same as the start of the following event (or EOF if that's next).
- Specified by:
getLocation in interface InputProblemReporter
-
getStartLocation
public org.codehaus.stax2.XMLStreamLocation2 getStartLocation()
-
getCurrentLocation
public org.codehaus.stax2.XMLStreamLocation2 getCurrentLocation()
-
throwWfcException
public WstxException throwWfcException(String msg, boolean deferErrors) throws WstxException
- Throws:
WstxException
-
throwParseError
public void throwParseError(String msg) throws XMLStreamException
- Specified by:
throwParseError in interface InputProblemReporter
- Throws:
XMLStreamException
-
throwParseError
public void throwParseError(String format, Object arg, Object arg2) throws XMLStreamException
Throws generic parse error with specified message and current parsing location.
Note: public access only because core code in other packages needs to access it.
- Specified by:
throwParseError in interface InputProblemReporter
- Throws:
XMLStreamException
-
reportProblem
public void reportProblem(String probType, String format, Object arg, Object arg2) throws XMLStreamException
- Throws:
XMLStreamException
-
reportProblem
public void reportProblem(Location loc, String probType, String format, Object arg, Object arg2) throws XMLStreamException
- Specified by:
reportProblem in interface InputProblemReporter
- Throws:
XMLStreamException
-
_reportProblem
protected void _reportProblem(XMLReporter rep, String probType, String msg, Location loc) throws XMLStreamException
- Throws:
XMLStreamException
-
_reportProblem
protected void _reportProblem(XMLReporter rep, org.codehaus.stax2.validation.XMLValidationProblem prob) throws XMLStreamException
- Throws:
XMLStreamException
-
reportValidationProblem
public void reportValidationProblem(org.codehaus.stax2.validation.XMLValidationProblem prob) throws XMLStreamException
Note: this is the base implementation used for implementing ValidationContext
- Specified by:
reportValidationProblem in interface InputProblemReporter
- Throws:
XMLStreamException
-
reportValidationProblem
public void reportValidationProblem(String msg, int severity) throws XMLStreamException
- Throws:
XMLStreamException
-
reportValidationProblem
public void reportValidationProblem(String msg) throws XMLStreamException
- Specified by:
reportValidationProblem in interface InputProblemReporter
- Throws:
XMLStreamException
-
reportValidationProblem
public void reportValidationProblem(Location loc, String msg) throws XMLStreamException
- Throws:
XMLStreamException
-
reportValidationProblem
public void reportValidationProblem(String format, Object arg, Object arg2) throws XMLStreamException
- Specified by:
reportValidationProblem in interface InputProblemReporter
- Throws:
XMLStreamException
-
constructWfcException
protected WstxException constructWfcException(String msg)
-
constructFromIOE
protected WstxException constructFromIOE(IOException ioe)
Construct and return an XMLStreamException to throw as a result of a failed Typed Access operation (but one not caused by a Well-Formedness Constraint or Validation Constraint problem)
-
constructNullCharException
protected WstxException constructNullCharException()
-
throwUnexpectedChar
protected void throwUnexpectedChar(int i, String msg) throws WstxException
- Throws:
WstxException
-
throwNullChar
protected void throwNullChar() throws WstxException
- Throws:
WstxException
-
throwInvalidSpace
protected void throwInvalidSpace(int i) throws WstxException
- Throws:
WstxException
-
throwInvalidSpace
protected WstxException throwInvalidSpace(int i, boolean deferErrors) throws WstxException
- Throws:
WstxException
-
throwUnexpectedEOF
protected void throwUnexpectedEOF(String msg) throws WstxException
- Throws:
WstxException
-
throwUnexpectedEOB
protected void throwUnexpectedEOB(String msg) throws WstxException
Similar to throwUnexpectedEOF(java.lang.String), but only indicates ending of an input block. Used when reading a token that can not span input block boundaries (ie. can not continue past end of an entity expansion).
- Throws:
WstxException
-
throwFromIOE
protected void throwFromIOE(IOException ioe) throws WstxException
- Throws:
WstxException
-
throwFromStrE
protected void throwFromStrE(XMLStreamException strex) throws WstxException
- Throws:
WstxException
-
throwLazyError
protected void throwLazyError(Exception e)
Method called to report an error, when caller's signature only allows runtime exceptions to be thrown.
-
tokenTypeDesc
protected String tokenTypeDesc(int type)
-
getCurrentInput
public final WstxInputSource getCurrentInput()
Returns current input source this source uses.
Note: public only because some implementations are in a different package.
-
inputInBuffer
protected final int inputInBuffer()
-
getNext
protected final int getNext() throws XMLStreamException
- Throws:
XMLStreamException
-
peekNext
protected final int peekNext() throws XMLStreamException
Similar to getNext(), but does not advance pointer in input buffer.
Note: this method only peeks within current input source; it does not close it and check nested input source (if any). This is necessary when checking keywords, since they can never cross input block boundary.
- Throws:
XMLStreamException
-
getNextChar
protected final char getNextChar(String errorMsg) throws XMLStreamException
- Throws:
XMLStreamException
-
getNextCharFromCurrent
protected final char getNextCharFromCurrent(String errorMsg) throws XMLStreamException
Similar to getNextChar(java.lang.String), but will not read more characters from parent input source(s) if the current input source doesn't have more content. This is often needed to prevent "runaway" content, such as comments that start in an entity but do not have matching close marker inside entity; XML specification specifically states such markup is not legal.
- Throws:
XMLStreamException
-
getNextAfterWS
protected final int getNextAfterWS() throws XMLStreamException
Method that will skip through zero or more white space characters, and return either the character following white space, or -1 to indicate EOF (end of the outermost input source).
- Throws:
XMLStreamException
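The contract of getNextAfterWS() (skip white space, then return the next char or -1 at EOF) can be sketched over a single in-memory buffer. The class WhitespaceSkipper is hypothetical; unlike the real method, it does not reload buffers, move between nested input sources or track line numbers.

```java
/** Hypothetical single-buffer sketch of "skip white space, return next char or -1". */
class WhitespaceSkipper {
    private final char[] buf;
    private int ptr;

    WhitespaceSkipper(String input) {
        this.buf = input.toCharArray();
    }

    /** Returns the first non-white-space char at or after the pointer, or -1 at end of input. */
    int getNextAfterWS() {
        while (ptr < buf.length) {
            char c = buf[ptr++];
            // XML white space: space, tab, LF, CR
            if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
                return c;
            }
        }
        return -1;
    }
}
```

Returning an int rather than a char is what leaves room for the -1 EOF marker.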
-
getNextCharAfterWS
protected final char getNextCharAfterWS(String errorMsg) throws XMLStreamException
- Throws:
XMLStreamException
-
getNextInCurrAfterWS
protected final char getNextInCurrAfterWS(String errorMsg) throws XMLStreamException
- Throws:
XMLStreamException
-
getNextInCurrAfterWS
protected final char getNextInCurrAfterWS(String errorMsg, char c) throws XMLStreamException
- Throws:
XMLStreamException
-
skipCRLF
protected final boolean skipCRLF(char c) throws XMLStreamException
Method called when a CR has been spotted in input; checks if next char is LF, and if so, skips it. Note that next character has to come from the current input source, to qualify; it can never come from another (nested) input source.
- Returns:
- True, if passed in char is '\r' and next one is '\n'.
- Throws:
XMLStreamException
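A sketch of the CR/LF check described for skipCRLF(char), assuming all input sits in one buffer. The class CrLfSkipper and its fields are hypothetical; the real method may also need to load more input from the current source before it can look at the next char.

```java
/** Hypothetical single-buffer sketch of skipping the LF that follows a CR. */
class CrLfSkipper {
    final char[] buf;
    int ptr; // index of the next unread char

    CrLfSkipper(String input, int ptr) {
        this.buf = input.toCharArray();
        this.ptr = ptr;
    }

    /** If c is CR and the next unread char is LF, consumes the LF and returns true. */
    boolean skipCRLF(char c) {
        if (c == '\r' && ptr < buf.length && buf[ptr] == '\n') {
            ptr++;
            return true;
        }
        return false;
    }
}
```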
-
markLF
protected final void markLF()
-
markLF
protected final void markLF(int inputPtr)
-
pushback
protected final void pushback()
Method to push back last character read; can only be called once, that is, no more than one char can be guaranteed to be successfully returned.
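A one-character pushback over a buffer amounts to moving the read pointer back by one. The sketch below shows that idea next to simple getNext() and peekNext() equivalents; PushbackCursor is a hypothetical class and ignores buffer reloads and nested input sources.

```java
/** Hypothetical buffer cursor with read, peek and single-char pushback. */
class PushbackCursor {
    private final char[] buf;
    private int ptr;

    PushbackCursor(String input) {
        this.buf = input.toCharArray();
    }

    /** Reads and consumes the next char, or returns -1 at end of input. */
    int getNext() {
        return ptr < buf.length ? buf[ptr++] : -1;
    }

    /** Returns the next char without consuming it, or -1 at end of input. */
    int peekNext() {
        return ptr < buf.length ? buf[ptr] : -1;
    }

    /** Un-reads the last char by stepping the pointer back once. */
    void pushback() {
        if (ptr == 0) {
            throw new IllegalStateException("Nothing to push back");
        }
        --ptr;
    }
}
```

Only one step back is safe in general because, once a buffer has been reloaded, earlier characters may no longer be in memory.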
-
initInputSource
protected void initInputSource(WstxInputSource newInput, boolean isExt, String entityId) throws XMLStreamException
Method called when an entity has been expanded (new input source has been created). Needs to initialize location information and change active input source.
- Parameters:
entityId - Name of the entity being expanded
- Throws:
XMLStreamException
-
loadMore
protected boolean loadMore() throws XMLStreamException
Method that will try to read one or more characters from currently open input sources; closing input sources if necessary.
- Returns:
- true if reading succeeded (or may succeed), false if we reached EOF.
- Throws:
XMLStreamException
-
loadMore
protected final boolean loadMore(String errorMsg) throws XMLStreamException
- Throws:
XMLStreamException
-
loadMoreFromCurrent
protected boolean loadMoreFromCurrent() throws XMLStreamException
- Throws:
XMLStreamException
-
loadMoreFromCurrent
protected final boolean loadMoreFromCurrent(String errorMsg) throws XMLStreamException
- Throws:
XMLStreamException
-
ensureInput
protected boolean ensureInput(int minAmount) throws XMLStreamException
Method called to make sure current main-level input buffer has at least specified number of characters available consecutively, without having to call loadMore(). It can only be called when input comes from main-level buffer; further, call can shift content in input buffer, so caller has to flush any data still pending. In short, caller has to know exactly what it's doing. :-)
Note: method does not check for any other input sources than the current one -- if current source can not fulfill the request, a failure is indicated.
- Returns:
- true if there's now enough data; false if not (EOF)
- Throws:
XMLStreamException
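The buffer shifting mentioned for ensureInput(int) can be sketched as follows: unread characters are moved to the front of the buffer, then the tail is refilled until enough contiguous characters are available. BufferEnsurer and its members are hypothetical, and the sketch reads from a plain java.io.Reader rather than a Woodstox input source.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical sketch of guaranteeing N contiguous chars in a fixed-size buffer. */
class BufferEnsurer {
    private final Reader in;
    private final char[] buf;
    private int ptr; // next unread char
    private int end; // one past the last valid char

    BufferEnsurer(Reader in, int bufSize) {
        this.in = in;
        this.buf = new char[bufSize];
    }

    /** Returns true if at least minAmount unread chars are now contiguous in the buffer. */
    boolean ensureInput(int minAmount) throws IOException {
        if (minAmount > buf.length) {
            return false; // request can never fit in this buffer
        }
        if (end - ptr >= minAmount) {
            return true;
        }
        // Shift unread content to the start; anything before ptr is lost,
        // which is why a caller must have flushed pending data first.
        int remaining = end - ptr;
        System.arraycopy(buf, ptr, buf, 0, remaining);
        ptr = 0;
        end = remaining;
        while (end < minAmount) {
            int n = in.read(buf, end, buf.length - end);
            if (n < 0) {
                return false; // EOF before enough data arrived
            }
            end += n;
        }
        return true;
    }

    /** Consumes one char; call only after ensureInput confirmed it is available. */
    char next() {
        return buf[ptr++];
    }
}
```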
-
closeAllInput
protected void closeAllInput(boolean force) throws XMLStreamException
- Throws:
XMLStreamException
-
throwNullParent
protected void throwNullParent(WstxInputSource curr)
- Parameters:
curr - Input source currently in use
-
resolveSimpleEntity
protected int resolveSimpleEntity(boolean checkStd) throws XMLStreamException
Method that tries to resolve a character entity, or (if caller so specifies), a pre-defined internal entity (lt, gt, amp, apos, quot). It will succeed iff:
- Entity in question is a simple character entity (either one of 5 pre-defined ones, or using decimal/hex notation), AND
- Entity fits completely inside current input buffer.
Note: On entry we are guaranteed there are at least 3 more characters in this buffer; otherwise we shouldn't be called.
- Parameters:
checkStd - If true, will check pre-defined internal entities (gt, lt, amp, apos, quot); if false, will only check actual character entities.
- Returns:
- (Valid) character value, if entity is a character reference, and could be resolved from current input buffer (does not span buffer boundary); null char (code 0) if not (either non-char entity, or spans input buffer boundary).
- Throws:
XMLStreamException
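The return convention above (a character value on success, 0 otherwise) can be sketched for a buffer whose pointer sits just after the '&'. SimpleEntityResolver is a hypothetical class; it does not check that the resolved value is a legal XML character, and it does not advance or restore any input pointer the way the real method must.

```java
/** Hypothetical sketch of resolving "#65;", "#x41;" or "lt;" style references in a buffer. */
class SimpleEntityResolver {
    /**
     * @param buf      input buffer
     * @param ptr      index of the char right after '&'
     * @param checkStd whether the five pre-defined entities are also resolved
     * @return resolved code point, or 0 if the reference is not a simple one
     *         or its terminating ';' is not inside the buffer
     */
    static int resolve(char[] buf, int ptr, boolean checkStd) {
        int end = ptr;
        while (end < buf.length && buf[end] != ';') {
            end++;
        }
        if (end >= buf.length || end == ptr) {
            return 0; // spans the buffer boundary, or empty name
        }
        String name = new String(buf, ptr, end - ptr);
        if (name.charAt(0) == '#') {
            boolean hex = name.length() > 1 && name.charAt(1) == 'x';
            int radix = hex ? 16 : 10;
            int start = hex ? 2 : 1;
            if (start >= name.length()) {
                return 0; // no digits at all
            }
            int value = 0;
            for (int i = start; i < name.length(); i++) {
                int digit = Character.digit(name.charAt(i), radix);
                if (digit < 0) {
                    return 0; // not a digit in this radix
                }
                value = value * radix + digit;
                if (value > 0x10FFFF) {
                    return 0; // beyond the Unicode range
                }
            }
            return value;
        }
        if (checkStd) {
            switch (name) {
                case "lt":   return '<';
                case "gt":   return '>';
                case "amp":  return '&';
                case "apos": return '\'';
                case "quot": return '"';
                default:     return 0;
            }
        }
        return 0;
    }
}
```

Using 0 as the "not resolved" marker works because the null character is never a legal resolved value in XML content.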
-
resolveCharOnlyEntity
protected int resolveCharOnlyEntity(boolean checkStd) throws XMLStreamException
Method called to resolve character entities, and only character entities (except that pre-defined char entities -- amp, apos, lt, gt, quot -- MAY be "char entities" in this sense, depending on arguments). Otherwise it is to return the null char; if so, the input pointer will point to the same point as when method entered (char after ampersand), plus the ampersand itself is guaranteed to be in the input buffer (so caller can just push it back if necessary).
Most often this method is called when reader is not to expand non-char entities automatically, but to return them as separate events.
Main complication here is that we need to do 5-char lookahead. This is problematic if chars are on input buffer boundary. This is ok for the root level input buffer, but not for some nested buffers. However, according to XML specs, such split entities are actually illegal... so we can throw an exception in those cases.
- Parameters:
checkStd - If true, will check pre-defined internal entities (gt, lt, amp, apos, quot) as character entities; if false, will only check actual 'real' character entities.
- Returns:
- (Valid) character value, if entity is a character reference, and could be resolved from current input buffer (does not span buffer boundary); null char (code 0) if not (either non-char entity, or spans input buffer boundary).
- Throws:
XMLStreamException
-
resolveNonCharEntity
protected EntityDecl resolveNonCharEntity() throws XMLStreamException
Reverse of resolveCharOnlyEntity(boolean); will only resolve entity if it is NOT a character entity (or pre-defined 'generic' entity; amp, apos, lt, gt or quot). Only used in cases where entities are to be separately returned unexpanded (in non-entity-replacing mode); which means it's never called from dtd handler.
- Throws:
XMLStreamException
-
fullyResolveEntity
protected int fullyResolveEntity(boolean allowExt) throws XMLStreamException
Method that does full resolution of an entity reference, be it character entity, internal entity or external entity, including updating of input buffers. Depending on whether the result is a character entity (or one of the 5 pre-defined entities), it returns the char in question, or the null character (code 0) to indicate it had to change input source.
- Parameters:
allowExt
- If true, is allowed to expand external entities (expanding text); if false, is not (expanding attribute value).
- Returns:
- Either single-character replacement (which is NOT to be reparsed), or null char (0) to indicate expansion is done via input source.
- Throws:
XMLStreamException
-
getIntEntity
protected EntityDecl getIntEntity(int ch, char[] originalChars)
Returns an entity (possibly from cache) for the argument character using the encoded representation in mInputBuffer[entityStartPos ... mInputPtr-1].
-
expandEntity
protected EntityDecl expandEntity(String id, boolean allowExt, Object extraArg) throws XMLStreamException
Helper method that will try to expand a parsed entity (parameter or generic entity).
Note: called by sub-classes (DTD parser), so it needs to be protected.
- Parameters:
id
- Name of the entity being expanded
allowExt
- Whether external entities can be expanded or not; if not, and the entity to expand is an external one, an exception will be thrown
- Throws:
XMLStreamException
-
findEntity
protected abstract EntityDecl findEntity(String id, Object arg) throws XMLStreamException
Abstract method for sub-classes to implement, for finding a declared general or parsed entity.
- Parameters:
id
- Identifier of the entity to find
arg
- Optional argument passed from caller; needed by DTD reader.
- Throws:
XMLStreamException
-
handleUndeclaredEntity
protected abstract void handleUndeclaredEntity(String id) throws XMLStreamException
This method gets called if a declaration for an entity was not found in entity expanding mode (enabled by default for the XML reader, always enabled for the DTD reader).
- Throws:
XMLStreamException
-
handleIncompleteEntityProblem
protected abstract void handleIncompleteEntityProblem(WstxInputSource closing) throws XMLStreamException
- Throws:
XMLStreamException
-
parseLocalName
protected String parseLocalName(char c) throws XMLStreamException
Method that will parse a name token (roughly equivalent to the XML specs, although a bit more lenient for more efficient handling); either URI prefix or local name.
Much of the complexity in this method has to do with the intention to avoid any character copies. In the optimal case the algorithm would be fairly simple. However, this only works if all data is already in the input buffer; if not, a copy has to be made halfway through parsing, and that complicates things.
One thing to note is that String returned has been canonicalized and (if necessary) added to symbol table. It can thus be compared against other such (usually id) Strings, with simple equality operator.
- Parameters:
c
- First character of the name; not yet checked for validity
- Returns:
- Canonicalized name String (which may have length 0, if EOF or non-name-start char encountered)
- Throws:
XMLStreamException
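The note about canonicalized Strings being comparable with the simple equality operator can be illustrated with a small sketch. The class SymbolTableSketch below is hypothetical and only stands in for the idea of a symbol table; it is not the Woodstox symbol table and makes no claim about how that one is implemented.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a symbol table; not Woodstox code. */
final class SymbolTableSketch {
    private final Map<String, String> symbols = new HashMap<>();

    /**
     * Returns the canonical String for buf[start .. start+len-1],
     * adding it to the table if it has not been seen before.
     */
    String canonicalize(char[] buf, int start, int len) {
        String candidate = new String(buf, start, len);
        String existing = symbols.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }
}
```

Two names read from different buffer positions then resolve to the same object, so `a == b` holds for them, while a String that never went through the table still needs `equals`.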
-
parseLocalName2
protected String parseLocalName2(int start, int hash) throws XMLStreamException
Second part of name token parsing; called when the name can continue past the end of the input buffer (so only part was read before calling this method to read the rest).
Note that this isn't heavily optimized, on the assumption that it's not called very often.
- Throws:
XMLStreamException
-
parseFullName
protected String parseFullName() throws XMLStreamException
Method that will parse 'full' name token; what full means depends on whether reader is namespace aware or not. If it is, full name means local name with no namespace prefix (PI target, entity/notation name); if not, name can contain arbitrary number of colons. Note that element and attribute names are NOT parsed here, so actual namespace prefix separation can be handled properly there.
Similar to parseLocalName(char), much of the complexity stems from trying to avoid copying name characters from the input buffer.
Note that the returned String will be canonicalized, similar to parseLocalName(char), but without separating prefix/local name.
- Returns:
- Canonicalized name String (which may have length 0, if EOF or non-name-start char encountered)
- Throws:
XMLStreamException
-
parseFullName
protected String parseFullName(char c) throws XMLStreamException
- Throws:
XMLStreamException
-
parseFullName2
protected String parseFullName2(int start, int hash) throws XMLStreamException
- Throws:
XMLStreamException
-
parseFNameForError
protected String parseFNameForError() throws XMLStreamException
Method called to read in full name, including unlimited number of namespace separators (':'), for the purpose of displaying name in an error message. Won't do any further validations, and parsing is not optimized: main need is just to get more meaningful error messages.
- Throws:
XMLStreamException
-
parseEntityName
protected final String parseEntityName(char c) throws XMLStreamException
- Throws:
XMLStreamException
-
skipFullName
protected int skipFullName(char c) throws XMLStreamException
Note: does not check for number of colons, amongst other things. Main idea is to skip through what superficially seems like a valid id, nothing more. This is only done when really skipping through something we do not care about at all: not even whether names/ids would be valid (for example, when ignoring internal DTD subset).
- Returns:
- Length of skipped name.
- Throws:
XMLStreamException
-
parseSystemId
protected final String parseSystemId(char quoteChar, boolean convertLFs, String errorMsg) throws XMLStreamException
Simple parsing method that parses system ids, which are generally used in entities (from DOCTYPE declaration to internal/external subsets).
NOTE: returned String is not canonicalized, on assumption that external ids may be longish, and are not shared all that often, as they are generally just used for resolving paths, if anything.
Also note that this method is not heavily optimized, as it's not likely to be a bottleneck for parsing.
- Throws:
XMLStreamException
-
parsePublicId
protected final String parsePublicId(char quoteChar, String errorMsg) throws XMLStreamException
Simple parsing method that parses public ids, which are generally used in entities (from DOCTYPE declaration to internal/external subsets).
As per XML specs, the contents are actually normalized.
NOTE: returned String is not canonicalized, on assumption that external ids may be longish, and are not shared all that often, as they are generally just used for resolving paths, if anything.
Also note that this method is not heavily optimized, as it's not likely to be a bottleneck for parsing.
- Throws:
XMLStreamException
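A minimal sketch of what normalizing a public id can look like, assuming the normalization meant here is the XML 1.0 rule for public identifiers: each run of white space becomes a single space, and leading and trailing white space is removed. The class PublicIdSketch and its method are hypothetical; how parsePublicId itself performs this step is not shown in this documentation.

```java
/** Hypothetical illustration of public id normalization; not Woodstox code. */
final class PublicIdSketch {
    /**
     * Collapses each run of XML white space (space, tab, CR, LF) into a
     * single space and drops leading and trailing white space.
     */
    static String normalizePublicId(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        boolean pendingSpace = false;
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                // Remember the gap, but only once real content has been seen,
                // so that leading white space is never emitted.
                pendingSpace = sb.length() > 0;
            } else {
                if (pendingSpace) {
                    sb.append(' ');
                    pendingSpace = false;
                }
                sb.append(c);
            }
        }
        // A pending space at the end is simply never written out,
        // which removes trailing white space.
        return sb.toString();
    }
}
```

The separator is written lazily, only when the next non-space character arrives, so no trimming pass over the result is needed.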
-
parseUntil
protected final void parseUntil(TextBuffer tb, char endChar, boolean convertLFs, String errorMsg) throws XMLStreamException
- Throws:
XMLStreamException
-
getNameBuffer
protected final char[] getNameBuffer(int minSize)
-
expandBy50Pct
protected final char[] expandBy50Pct(char[] buf)
-
verifyLimit
protected void verifyLimit(String type, long maxValue, long currentValue) throws XMLStreamException
- Throws:
XMLStreamException
-
constructLimitViolation
protected XMLStreamException constructLimitViolation(String type, long limit) throws XMLStreamException
- Throws:
XMLStreamException
-
-