Class XMLInputFactory2

java.lang.Object
javax.xml.stream.XMLInputFactory
org.codehaus.stax2.XMLInputFactory2
All Implemented Interfaces:
XMLStreamProperties

public abstract class XMLInputFactory2 extends XMLInputFactory implements XMLStreamProperties
Extension of XMLInputFactory that adds some convenience factory methods as new standard properties that conforming stream reader factory and instance implementations need to recognize, and preferably support. There are also some profile-based configuration methods which allow implementations to set proper goal-based values for custom properties.
NOTE: although actual values for the property names are visible, implementations should try to use the symbolic constants defined here instead, to avoid typos.
Version:
3.0 01/21/2007
  • Field Details

    • P_REPORT_PROLOG_WHITESPACE

      public static final String P_REPORT_PROLOG_WHITESPACE
      Whether reader will generate 'ignorable white space' events during prolog and epilog (before and after the main XML root element); if true, will generate those events; if false, will just ignore white space in these parts of the parsed document.

      Turning this feature off may give slight performance improvement, although usually effect should be negligible. This option is usually only turned on when round-trip output should be as similar to input as possible.

      Default value for this setting is implementation dependant.

      See Also:
    • P_REPORT_CDATA

      public static final String P_REPORT_CDATA
      Whether cursor-based reader will ever generate CDATA events; if true, CDATA events may be generated for non-coalesced CDATA sections. If false, all CDATA sections are reported as CHARACTERS types. It may still be possible for event methods to distinguish between underlying type, but event type code will be reported as CHARACTERS.

      State of property does not have any effect on performance.

      Default value for this setting is implementation dependant.

      See Also:
    • P_LAZY_PARSING

      public static final String P_LAZY_PARSING
      Whether stream readers are allowed to do lazy parsing, meaning to parse minimal part of the event when XMLStreamReader.next() is called, and only parse the rest as needed (or skip remainder of no extra information is needed). Alternative to lazy parsing is called "eager parsing", and is what most xml parsers use by default.

      Enabling lazy parsing can improve performance for tasks where number of textual events are skipped. The downside is that not all well-formedness problems are reported when XMLStreamReader.next() is called, but only when the rest of event are read or skipped.

      Default value for this setting is implementation dependant.

      See Also:
    • P_INTERN_NAMES

      public static final String P_INTERN_NAMES
      Whether name symbols (element, attribute, entity and notation names, namespace prefixes) stream reader returns are guaranteed to have been String.intern()ed. Interning generally makes access faster (both internal and externally), and saves memory, but can add some overhead for processing. It may also be problematic for large symbol spaces; especially if xml content has unbounded value space for names.

      Default value for this setting is implementation dependant. Additionally implementations may have use different default for different types of stream readers.

      See Also:
    • P_INTERN_NS_URIS

      public static final String P_INTERN_NS_URIS
      Whether namespace URIs stream reader returns are guaranteed to have been String.intern()ed. Interning can make access by fully-qualified name faster as well as save memory, but it can also add some overhead when encountering a namespace URI for the first time.

      Default value for this setting is implementation dependant.

      See Also:
    • P_PRESERVE_LOCATION

      public static final String P_PRESERVE_LOCATION
      Property that determines whether stream reader instances are required to try to keep track of the parser Location in the input documents.

      When turned on, the stream reader should try to do its best to keep track of the locations, to be able to properly create XMLEvent objects with accurate Location information. Similarly, implementation should keep track of the location for error reporting purposes, and include this information within XMLStreamException instances.

      When turned off, implementations are allowed to optimize things, and only keep/pass partial Location information, or even none at all. Implementations are still encouraged to keep some location information for error reporting purposes, even if they do not maintain accurate XMLEvent locations, or exact byte/character offsets.

      Default value for this setting is true.

      See Also:
    • P_AUTO_CLOSE_INPUT

      public static final String P_AUTO_CLOSE_INPUT
      Whether stream reader is to close the underlying input source (input stream, reader) when stream reader is closed. Basic StAX2 specification mandates this feature to be set to false by default (for sources that are passed by the application).

      Note: if set to true, readers are also allowed (but not required) to close the underlying input source when they do not need it any more, for example when encountering EOF, or when throwing an unrecoverable parsing exception

      See Also:
    • P_DTD_OVERRIDE

      public static final String P_DTD_OVERRIDE
      Property used to specify the source for DTD external subset to use instead of DTD specified by the XML document itself (if any).
      See Also:
  • Constructor Details

    • XMLInputFactory2

      protected XMLInputFactory2()
  • Method Details

    • createXMLEventReader

      public abstract XMLEventReader2 createXMLEventReader(URL src) throws XMLStreamException
      Factory method that allows for parsing a document accessible via specified URL. Note that URL may refer to all normal URL accessible resources, from files to web- and ftp-accessible documents.
      Throws:
      XMLStreamException
    • createXMLEventReader

      public abstract XMLEventReader2 createXMLEventReader(File f) throws XMLStreamException
      Convenience factory method that allows for parsing a document stored in the specified file.
      Throws:
      XMLStreamException
    • createXMLStreamReader

      public abstract XMLStreamReader2 createXMLStreamReader(URL src) throws XMLStreamException
      Factory method that allows for parsing a document accessible via specified URL. Note that URL may refer to all normal URL accessible resources, from files to web- and ftp-accessible documents.
      Throws:
      XMLStreamException
    • createXMLStreamReader

      public abstract XMLStreamReader2 createXMLStreamReader(File f) throws XMLStreamException
      Convenience factory method that allows for parsing a document stored in the specified file.
      Throws:
      XMLStreamException
    • configureForXmlConformance

      public abstract void configureForXmlConformance()
      Method to call to make Reader created conform as closely to XML standard as possible, doing all checks and transformations mandated by the XML specification (linefeed conversions, attr value normalizations).

      Regarding the default StAX property settings, implementations are suggested to do following:

      • Enable SUPPORT_DTD property.
      • Enable IS_NAMESPACE_AWARE
      • Enable IS_REPLACING_ENTITY_REFERENCES
      • Enable IS_SUPPORTING_EXTERNAL_ENTITIES
      All the other standard settings should be left as is.

      In addition, implementations should set implementation-dependant settings appropriately, to be as strict as possible with regards to XML specification mandated checks and transformations.

    • configureForConvenience

      public abstract void configureForConvenience()
      Method to call to make Reader created be as "convenient" to use as possible; ie try to avoid having to deal with some of things like segmented text chunks. This may incur some slight performance penalties, but should not affect XML conformance.

      Regarding the default StAX property settings, implementations are suggested to do following:

      • Enable IS_COALESCING (text coalescing)
      • Enable IS_REPLACING_ENTITY_REFERENCES
      • Disable P_REPORT_PROLOG_WHITESPACE (StAX2); so that the application need not skip possible SPACE (and COMMENT, PROCESSING_INSTRUCTION) events.
      • Enable P_REPORT_ALL_TEXT_AS_CHARACTERS (StAX2)
      • Enable P_PRESERVE_LOCATION (StAX2)
      All the other standard settings should be left as is.

      In addition, implementations should set implementation-dependant settings appropriately as well.

    • configureForSpeed

      public abstract void configureForSpeed()
      Method to call to make the Reader created be as fast as possible reading documents, especially for long-running processes where caching is likely to help. This means reducing amount of information collected (ignorable white space in prolog/epilog, accurate Location information for Event API), and possibly even including simplifying handling of XML-specified transformations (skip attribute value and text linefeed normalization). Potential downsides are somewhat increased memory usage (for full-sized input buffers), and reduced XML conformance (will not do some of transformations).

      Regarding the default StAX property settings, implementations are suggested to do following:

      • Disable IS_COALESCING (text coalescing)
      • Disable P_PRESERVE_LOCATION (StAX2)
      • Disable P_REPORT_PROLOG_WHITESPACE (StAX2)
      • Enable P_INTERN_NAMES (StAX2)
      • Enable P_INTERN_NS_URIS (StAX2)
      All the other standard settings should be left as is.

      In addition, implementations should set implementation-dependant settings appropriately as well.

    • configureForLowMemUsage

      public abstract void configureForLowMemUsage()
      Method to call to minimize the memory usage of the stream/event reader; both regarding Objects created, and the temporary memory usage during parsing. This generally incurs some performance penalties, due to using smaller input buffers.

      Regarding the default StAX property settings, implementations are suggested to do following:

      • Disable IS_COALECING (text coalescing, can force longer internal result buffers to be used)
      • Disable P_PRESERVE_LOCATION (StAX) to reduce temporary memory usage.
      All the other standard settings should be left as is.

      In addition, implementations should set implementation-dependant settings appropriately so that the memory usage is minimized.

    • configureForRoundTripping

      public abstract void configureForRoundTripping()
      Method to call to make Reader try to preserve as much of input formatting as possible, so that round-tripping would be as lossless as possible. This means that the matching writer should be able to reproduce output as closely matching input format as possible (most implementations won't be able to provide 100% vis-a-vis; white space between attributes is generally lost, as well as use of character entities).

      Regarding the default StAX property settings, implementations are suggested to do following:

      • Disable IS_COALESCING (to prevent CDATA and Text segments from getting combined)
      • Disable IS_REPLACING_ENTITY_REFERENCES to allow for preserving explicitly declared general entity references (that is, there is no way to preserve character entities, or pre-defined entities like 'gt', 'lt', 'amp', 'apos' and 'quot').
      • Disable P_REPORT_ALL_TEXT_AS_CHARACTERS (StAX2) (so that CDATA sections are not reported as 'normal' text)
      • Enable P_REPORT_PROLOG_WHITESPACE (StAX2)
      All the other standard settings should be left as is.

      In addition, implementations should set implementation-dependant settings appropriately as well.