Public XPath API

The package includes classes and functions that implement XPath selectors, parsers, tokens, contexts, and schema proxies.

XPath selectors

select(root, path, namespaces=None, parser=None, **kwargs)

XPath selector function that applies a path expression to the root Element.

Parameters
  • root – an Element or ElementTree instance.

  • path – the XPath expression.

  • namespaces – a dictionary that maps namespace prefixes to URIs.

  • parser – the parser class to use, XPath2Parser by default.

  • kwargs – other optional parameters for the parser instance or the dynamic context.

Returns

a list of XPath nodes, or a basic type for expressions based on a function or a literal.

iter_select(root, path, namespaces=None, parser=None, **kwargs)

Creates an XPath selector generator that applies a path expression to the root Element.

Parameters
  • root – an Element or ElementTree instance.

  • path – the XPath expression.

  • namespaces – a dictionary that maps namespace prefixes to URIs.

  • parser – the parser class to use, XPath2Parser by default.

  • kwargs – other optional parameters for the parser instance or the dynamic context.

Returns

a generator of the XPath expression results.

class Selector(path, namespaces=None, parser=None, **kwargs)

XPath selector class. Create an instance of this class to apply the same XPath expression to several targets.

Parameters
  • path – the XPath expression.

  • namespaces – a dictionary that maps namespace prefixes to URIs.

  • parser – the parser class to use, XPath2Parser by default.

  • kwargs – other optional parameters for the XPath parser instance.

Variables
  • path (str) – the XPath expression.

  • parser (XPath1Parser or XPath2Parser) – the parser instance.

  • root_token (XPathToken) – the root of the token tree compiled from path.

namespaces

A dictionary that maps namespace prefixes to URIs.

select(root, **kwargs)

Applies the instance’s XPath expression to the root Element.

Parameters
  • root – an Element or ElementTree instance.

  • kwargs – other optional parameters for the XPath dynamic context.

Returns

a list of XPath nodes, or a basic type for expressions based on a function or a literal.

iter_select(root, **kwargs)

Creates an XPath selector generator that applies the instance’s XPath expression to the root Element.

Parameters
  • root – an Element or ElementTree instance.

  • kwargs – other optional parameters for the XPath dynamic context.

Returns

a generator of the XPath expression results.

XPath parsers

class XPath1Parser(namespaces=None, strict=True, *args, **kwargs)

XPath 1.0 expression parser class. Provide a namespaces dictionary to map the namespace prefixes used inside expressions to URIs. If strict is set to False, the parser also accepts QNames in extended format, as Python’s ElementPath library does.

Parameters
  • namespaces – a dictionary that maps namespace prefixes to URIs.

  • strict – if set to False, the parser also accepts QNames in extended format, as Python’s ElementPath library does. Default is True.

DEFAULT_NAMESPACES = {'xml': 'http://www.w3.org/XML/1998/namespace'}

The default prefix-to-namespace associations of the XPath class. In each instance they are updated with the entries passed in the namespaces argument.

version = '1.0'

The XPath version string.

default_namespace

The default namespace. For XPath 1.0 this value is always None because the default namespace is ignored (see https://www.w3.org/TR/1999/REC-xpath-19991116/#node-tests).

Helper methods for defining token classes:

classmethod axis(symbol, reverse_axis=False, bp=80)

Registers a token for a symbol that represents an XPath axis.

classmethod function(symbol, nargs=None, label='function', bp=90)

Registers a token class for a symbol that represents an XPath callable object. By default the callable is registered with the label function, but a different label can be provided.

class XPath2Parser(namespaces=None, variable_types=None, strict=True, compatibility_mode=False, default_collation=None, default_namespace=None, function_namespace=None, xsd_version=None, schema=None, base_uri=None, document_types=None, collection_types=None, default_collection_type='node()*')

XPath 2.0 expression parser class. This is the default parser used by XPath selectors. A parser instance also represents the XPath static context. With variable_types you can pass a dictionary with the types of the in-scope variables. Provide a namespaces dictionary to map the namespace prefixes used inside expressions to URIs. If strict is set to False, the parser also accepts QNames in extended format, as Python’s ElementPath library does. The remaining arguments are specific to XPath 2.0.

Parameters
  • namespaces – a dictionary that maps namespace prefixes to URIs.

  • variable_types – a dictionary with the static context’s in-scope variable types. It defines the associations between variables and static types.

  • strict – if set to False, the parser also accepts QNames in extended format, as Python’s ElementPath library does. Default is True.

  • compatibility_mode – if set to True the parser instance works with XPath 1.0 compatibility rules.

  • default_namespace – the default namespace to apply to unprefixed names. By default no namespace is applied (empty namespace ‘’).

  • function_namespace – the default namespace to apply to unprefixed function names. By default the namespace “http://www.w3.org/2005/xpath-functions” is used.

  • schema – the schema proxy class or instance to use for type, attribute and element lookups. If an AbstractSchemaProxy subclass is provided, a schema proxy instance is built without the optional argument, which results in a mapping of XSD builtin types only. If it is not provided, the schema-related XPath 2.0 expressions cannot be used.

  • base_uri – an absolute URI that may be provided, used when necessary to resolve relative URIs.

  • default_collation – the default string collation to use. If not set the environment’s default locale setting is used.

  • document_types – statically known documents, as a dictionary that maps absolute URIs to types. Used for type checking when calling the fn:doc function with a sequence of URIs. The default type of a document is ‘document-node()’.

  • collection_types – statically known collections, as a dictionary that maps absolute URIs to types. Used for type checking when calling the fn:collection function with a sequence of URIs. The default type of a collection is ‘node()*’.

  • default_collection_type – the type of the sequence of nodes that would result from calling the fn:collection function with no arguments. Default is ‘node()*’.

XPath tokens

class XPathToken(parser, value=None)

Base class for XPath tokens.

evaluate(context=None)

Default evaluation method for XPath tokens.

Parameters

context – The XPath dynamic context.

select(context=None)

Select operator that generates XPath results.

Parameters

context – The XPath dynamic context.

Context manipulation helpers:

get_argument(context, index=0, required=False, default_to_context=False, default=None, cls=None, promote=None)

Gets the argument value of a function or constructor token. A zero-length sequence is converted to None. If the function has no argument, the context’s item is returned, provided that the dynamic context is not None.

Parameters
  • context – the dynamic context.

  • index – the index of the argument to get, the first by default.

  • required – if set to True, missing or empty sequence arguments are not allowed.

  • default_to_context – if set to True then the item of the dynamic context is returned when the argument is missing.

  • default – the default value returned when the argument is an empty sequence. If not provided, None is returned.

  • cls – if a type is provided, a type check is performed on the item.

  • promote – a class or a tuple of classes whose instances are promoted to cls.

atomization(context=None)

Helper method for value atomization of a sequence.

Ref: https://www.w3.org/TR/xpath20/#id-atomization

Parameters

context – the XPath dynamic context.

get_atomized_operand(context=None)

Get the atomized value for an XPath operator.

Parameters

context – the XPath dynamic context.

Returns

the atomized value of a single-item sequence, or None if the sequence is empty.

iter_comparison_data(context)

Generates pairs of comparison data for the general comparison of sequences. With an XPath 2.0 parser different sequences may be generated, depending on the compatibility mode setting.

Ref: https://www.w3.org/TR/xpath20/#id-general-comparisons

Parameters

context – the XPath dynamic context.

get_operands(context, cls=None)

Returns the operands for a binary operator. Float arguments are converted to decimal if the other argument is a Decimal instance.

Parameters
  • context – the XPath dynamic context.

  • cls – if a type is provided, a type check is performed on the item.

Returns

a pair of values representing the operands. If any operand is not available, a (None, None) pair is returned.

get_results(context)

Returns formatted XPath results.

Parameters

context – the XPath dynamic context.

Returns

a list or a simple datatype when the result is a single simple type generated by a literal or function token.

select_results(context)

Generates formatted XPath results.

Parameters

context – the XPath dynamic context.

adjust_datetime(context, cls)

XSD datetime adjust function helper.

Parameters
  • context – the XPath dynamic context.

  • cls – the XSD datetime subclass to use.

Returns

an empty list if the only argument is the empty sequence, otherwise the adjusted XSD datetime instance.

use_locale(collation)

A context manager for applying a locale setting for string comparison within a code block.

Schema context methods:

select_xsd_nodes

add_xsd_type

get_xsd_type

get_typed_node

Data accessor helpers:

data_value

boolean_value

string_value

number_value

schema_node_value

Error management helper:

error(code, message_or_error=None)

Returns an XPath error instance related to a code. An XPath/XQuery/XSLT error code is an alphanumeric token starting with four uppercase letters and ending with four digits.

Parameters
  • code – the error code as QName or string.

  • message_or_error – an optional custom additional message.
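The shape of an error code described above can be checked with a small standard-library sketch. The helper name is hypothetical, and the check covers only the unprefixed form of the code, not a QName with a prefix:

```python
import re

# Shape described above: four uppercase letters followed by four digits.
ERROR_CODE_PATTERN = re.compile(r'[A-Z]{4}[0-9]{4}')


def looks_like_error_code(code: str) -> bool:
    """Hypothetical helper: True if code has the shape of an error code."""
    return ERROR_CODE_PATTERN.fullmatch(code) is not None


print(looks_like_error_code('XPST0003'))  # True
print(looks_like_error_code('xpst0003'))  # False
```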

XPath contexts

class XPathContext(root, namespaces=None, item=None, position=1, size=1, axis=None, variables=None, current_dt=None, timezone=None, documents=None, collections=None, default_collection=None)

The XPath dynamic context. The static context is provided by the parser.

Usually a dynamic context instance is created by providing only the root element. The variables argument is needed if the XPath expression refers to in-scope variables. The other optional arguments are needed only if a specific position in the context is required, and should be used only with a clear understanding of their meaning.

Parameters
  • root – the root of the XML document. It can be an ElementTree instance or an Element.

  • namespaces – a dictionary that maps namespace prefixes to URIs, used when namespace information is not available within document and element nodes. This can be useful when the dynamic context has additional namespaces and root is an Element or an ElementTree instance of the standard library.

  • item – the context item. A None value means that the context is positioned on the document node.

  • position – the current position of the node within the input sequence.

  • size – the number of items in the input sequence.

  • axis – the active axis. Used to decide when to apply the default axis (‘child’ axis).

  • variables – a dictionary of context variables that maps a QName to a value.

  • current_dt – the current dateTime of the implementation, including an explicit timezone.

  • timezone – the implicit timezone to use when a date, time, or dateTime value does not have a timezone.

  • documents – available documents, as a dictionary that maps absolute URIs to document nodes. Used by the function fn:doc.

  • collections – available collections, as a dictionary that maps absolute URIs to sequences of nodes. Used by the function fn:collection.

  • default_collection – the sequence of nodes used when fn:collection is called with no arguments.

class XPathSchemaContext(root, namespaces=None, item=None, position=1, size=1, axis=None, variables=None, current_dt=None, timezone=None, documents=None, collections=None, default_collection=None)

The XPath dynamic context base class for schema-bound parsers. Use this class as the dynamic context for schema instances in order to perform schema-based type checking during the static analysis phase. Don’t use it as the dynamic context for XML instances.

XML Schema proxy

The XPath 2.0 parser can be interfaced with an XML Schema processor through a schema proxy. An XMLSchemaProxy class is defined for interfacing schemas created with the xmlschema package. This class is based on an abstract class AbstractSchemaProxy, that can be used for implementing concrete interfaces to other types of XML Schema processors.

class AbstractSchemaProxy(schema, base_element=None)

Abstract class for defining schema proxies.

Parameters
  • schema – a schema instance that implements the AbstractEtreeElement interface.

  • base_element – the schema element used as base item for static analysis. It must implement the AbstractXsdElement interface.

bind_parser(parser)

Binds a parser instance to the schema proxy, adding the constructors of the schema’s atomic types. This method can be redefined in a concrete proxy to optimize schema bindings.

Parameters

parser – a parser instance.

get_context()

Gets a context instance for the static analysis phase.

Returns

an XPathSchemaContext instance.

find(path, namespaces=None)

Find a schema element or attribute using an XPath expression.

Parameters
  • path – an XPath expression that selects an element or an attribute node.

  • namespaces – an optional mapping from namespace prefix to namespace URI.

Returns

The first matching schema component, or None if there is no match.

abstract get_type(qname)

Gets the XSD global type from the schema’s scope. A concrete implementation must return an object that implements the AbstractXsdType interface, or None if the global type is not found.

Parameters

qname – the fully qualified name of the type to retrieve.

Returns

an object that represents an XSD type or None.

abstract get_attribute(qname)

Gets the XSD global attribute from the schema’s scope. A concrete implementation must return an object that implements the AbstractXsdAttribute interface, or None if the global attribute is not found.

Parameters

qname – the fully qualified name of the attribute to retrieve.

Returns

an object that represents an XSD attribute or None.

abstract get_element(qname)

Gets the XSD global element from the schema’s scope. A concrete implementation must return an object that implements the AbstractXsdElement interface, or None if the global element is not found.

Parameters

qname – the fully qualified name of the element to retrieve.

Returns

an object that represents an XSD element or None.

abstract is_instance(obj, type_qname)

Returns True if obj is an instance of the XSD global type, False if not.

Parameters
  • obj – the instance to be tested.

  • type_qname – the fully qualified name of the type used to test the instance.

abstract cast_as(obj, type_qname)

Converts obj to the Python type associated with an XSD global type. A concrete implementation must raise a ValueError or TypeError in case of a decoding error, or a KeyError if the type is not bound to the schema’s scope.

Parameters
  • obj – the instance to be cast.

  • type_qname – the fully qualified name of the type used to convert the instance.

abstract iter_atomic_types()

Returns an iterator over the non-builtin atomic types defined in the schema’s scope. A concrete implementation must yield objects that implement the AbstractXsdType interface.

abstract get_primitive_type(xsd_type)

Returns the type at the base of the definition of an XSD type. For an atomic type it is effectively the primitive type. For a list it is the primitive type of the item. For a union it is the base union type. For a complex type it is xs:anyType.

Parameters

xsd_type – an XSD type instance.

Returns

an XSD type instance.

XPath nodes

XPath nodes are processed using a set of namedtuple classes. Tuple-based processing was chosen for speed, and because these tuples are only temporary containers: the final results are stripped of the intermediate tuples.

AttributeNode

A namedtuple-based type for processing XPath attributes.

Parameters
  • name – the attribute name.

  • value – the string value of the attribute, or an XSD attribute when XPath is applied on a schema.

alias of elementpath.xpath_nodes.Attribute

TextNode

A namedtuple-based type for processing XPath text nodes. A text node is the elem.text value if this is not None, otherwise the element doesn’t have a text node.

Parameters

value – the string value.

alias of elementpath.xpath_nodes.Text

class TypedAttribute(attribute, type, value)

A namedtuple-based type for processing typed-value attributes.

Parameters
  • attribute – the origin AttributeNode tuple.

  • type – the reference XSD type.

  • value – the decoded value.

class TypedElement(elem, type, value)

A namedtuple-based type for processing typed-value elements.

Parameters
  • elem – the origin element. Can be an Element, or an XSD element when XPath is applied on a schema.

  • type – the reference XSD type.

  • value – the decoded value. Can be None for empty or element-only elements.

NamespaceNode

A namedtuple-based type for processing XPath namespaces.

Parameters
  • prefix – the namespace prefix.

  • uri – the namespace URI.

alias of elementpath.xpath_nodes.Namespace

XPath regular expressions

translate_pattern(pattern, flags=0, xsd_version='1.0', back_references=True, lazy_quantifiers=True, anchors=True)

Translates a regular expression pattern to a Python regex pattern. With default options the translator processes XPath 2.0/XQuery 1.0 regex patterns. For XML Schema patterns set all boolean options to False.

Parameters
  • pattern – the source XML Schema regular expression.

  • flags – regex flags as represented by Python’s re module.

  • xsd_version – apply the regex rules of a specific XSD version, ‘1.0’ by default.

  • back_references – if True supports back-references and capturing groups.

  • lazy_quantifiers – if True supports lazy quantifiers (*?, +?).

  • anchors – if True supports ^ and $ anchors, otherwise the translated pattern is anchored to its boundaries and anchors are treated as normal characters.

Exception classes

exception ElementPathError(message, code=None, token=None)

Base exception class for the elementpath package.

Parameters
  • message – the message related to the error.

  • code – an optional error code.

  • token – an optional token instance related with the error.

exception MissingContextError(message, code=None, token=None)

Raised when the dynamic context is required to evaluate the XPath expression.

exception RegexError

Error in a regular expression or in a character class specification. This exception is derived from Exception base class and is raised only by the regex subpackage.

There are also other exceptions, each derived from both the base exception ElementPathError and a Python built-in exception:

exception ElementPathKeyError(message, code=None, token=None)
exception ElementPathLocaleError(message, code=None, token=None)
exception ElementPathNameError(message, code=None, token=None)
exception ElementPathOverflowError(message, code=None, token=None)
exception ElementPathSyntaxError(message, code=None, token=None)
exception ElementPathTypeError(message, code=None, token=None)
exception ElementPathValueError(message, code=None, token=None)
exception ElementPathZeroDivisionError(message, code=None, token=None)