Package nltk_lite :: Package contrib :: Package mit :: Package six863 :: Package semantics :: Module featurelite
[hide private]
[frames] | no frames]

Module featurelite

source code

This module provides utilities for treating Python dictionaries as feature structures. Specifically, it contains the unify function, which can be used to merge the properties of two dictionaries, and the Variable class, which holds an unknown value to be used in unification.

A feature structure is a mapping from feature names to feature values, where:

However, feature structures are not a specialized class; they are represented by dictionaries, or more generally by anything that responds to the has_key method. The YAML representation can be used to create and display feature structures intuitively:

>>> f1 = yaml.load('''
... A:
...   B: b
...   D: d
... ''')
>>> f2 = yaml.load('''
... A:
...   C: c
...   D: d
... ''')
>>> print show(unify(f1, f2))
A:
  B: b
  C: c
  D: d

Feature structures are typically used to represent partial information about objects. A feature name that is not mapped to a value stands for a feature whose value is unknown (not a feature without a value). Two feature structures that represent (potentially overlapping) information about the same object can be combined by unification. When two inconsistant feature structures are unified, the unification fails and raises an error.

Features can be specified using feature paths, or tuples of feature names that specify paths through the nested feature structures to a value.

Feature structures may contain reentrant feature values. A reentrant feature value is a single feature value that can be accessed via multiple feature paths. Unification preserves the reentrance relations imposed by both of the unified feature structures. After unification, any extensions to a reentrant feature value will be visible using any of its feature paths.

Feature structure variables are encoded using the Variable class. The scope of a variable is determined by the bindings used when the structure including that variable is unified. Bindings can be reused between unifications to ensure that variables with the same name get the same value.

Classes [hide private]
  UnificationFailure
An exception that is raised when two values cannot be unified.
  FeatureI
  _FORWARD
_FORWARD is a singleton value, used in unification as a flag that a value has been forwarded to another object.
  Variable
A Variable is an object that can be used in unification to hold an initially unknown value.
  SubstituteBindingsI
An interface for classes that can perform substitutions for feature variables.
  SubstituteBindingsMixin
Functions [hide private]
 
makevar(varname)
Given a variable representation such as ?x, construct a corresponding Variable object.
source code
bool
isMapping(obj)
Determine whether to treat a given object as a feature structure.
source code
 
show(data)
Works like yaml.dump(), but with output suited for doctests.
source code
 
object_to_features(obj) source code
 
variable_representer(dumper, var)
Output variables in YAML as ?name.
source code
 
variable_constructor(loader, node)
Recognize variables written as ?name in YAML.
source code
 
_copy_and_bind(feature, bindings, memo=None)
Make a deep copy of a feature structure, preserving reentrance using the memo dictionary.
source code
 
substitute_bindings(feature, bindings)
Replace variables in a feature structure with their bound values.
source code
 
apply(feature, bindings)
Replace variables in a feature structure with their bound values.
source code
object (probably a mapping)
unify(feature1, feature2, bindings1=None, bindings2=None, memo=None, fail=None, trace=0)
In general, the 'unify' procedure takes two values, and either returns a value that provides the information provided by both values, or fails if that is impossible.
source code
 
_destructively_unify(feature1, feature2, bindings1, bindings2, memo, fail, depth=0)
Attempt to unify self and other by modifying them in-place.
source code
 
_do_unify(feature1, feature2, bindings1, bindings2, memo, fail, depth=0)
Do the actual work of _destructively_unify when the result isn't memoized.
source code
 
_apply_forwards(feature, visited)
Replace any feature structure that has a forward pointer with the target of its forward pointer (to preserve reentrance).
source code
 
_lookup_values(mapping, visited, remove=True)
The unification procedure creates _bound variables_, which are Variable objects that have been assigned a value.
source code
 
_apply_forwards_to_bindings(bindings)
Replace any feature structures that have been forwarded by their new identities.
source code
 
test()
Run unit tests on unification.
source code
Function Details [hide private]

isMapping(obj)

source code 

Determine whether to treat a given object as a feature structure. The test is whether it responds to has_key. This can be overridden if the object includes an attribute or method called _no_feature.

Parameters:
  • obj (object) - The object to be tested
Returns: bool
True iff the object can be treated as a feature structure

show(data)

source code 

Works like yaml.dump(), but with output suited for doctests. Flow style is always off, and there is no blank line at the end.

_copy_and_bind(feature, bindings, memo=None)

source code 

Make a deep copy of a feature structure, preserving reentrance using the memo dictionary. Meanwhile, variables are replaced by their bound values, if these values are already known, and variables with unknown values are given placeholder bindings.

unify(feature1, feature2, bindings1=None, bindings2=None, memo=None, fail=None, trace=0)

source code 

In general, the 'unify' procedure takes two values, and either returns a value that provides the information provided by both values, or fails if that is impossible.

These values can have any type, but fall into a few general cases:

  • Values that respond to has_key represent feature structures. The unify procedure will recurse into them and unify their inner values.
  • Variables represent an unknown value, and are handled specially. The values assigned to variables are tracked using bindings.
  • None represents the absence of information.
  • Any other value is considered a base value. Base values are compared to each other with the == operation.

The value 'None' represents the absence of any information. It specifies no properties and acts as the identity in unification. >>> unify(3, None) 3

>>> unify(None, 'fish')
'fish'

A base value unifies with itself, but not much else. >>> unify(True, True) True

>>> unify([1], [1])
[1]
>>> unify('a', 'b')
Traceback (most recent call last):
    ...
UnificationFailure

When two mappings (representing feature structures, and usually implemented as dictionaries) are unified, any chain of keys that accesses a value in either mapping will access an equivalent or more specific value in the unified mapping. If this is not possible, UnificationFailure is raised.

>>> f1 = dict(A=dict(B='b'))
>>> f2 = dict(A=dict(C='c'))
>>> unify(f1, f2) == dict(A=dict(B='b', C='c'))
True

The empty dictionary specifies no features. It unifies with any mapping. >>> unify({}, dict(foo='bar')) {'foo': 'bar'}

>>> unify({}, True)
Traceback (most recent call last):
    ...
UnificationFailure

Representing dictionaries in YAML form is useful for making feature structures readable:

>>> f1 = yaml.load("number: singular")
>>> f2 = yaml.load("person: 3")
>>> print show(unify(f1, f2))
number: singular
person: 3
>>> f1 = yaml.load('''
... A:
...   B: b
...   D: d
... ''')
>>> f2 = yaml.load('''
... A:
...   C: c
...   D: d
... ''')
>>> print show(unify(f1, f2))
A:
  B: b
  C: c
  D: d

Variables are names for unknown values. Variables are assigned values that will make unification succeed. The values of variables can be reused in later unifications if you provide a dictionary of _bindings_ from variables to their values. >>> bindings = {} >>> print unify(Variable('x'), 5, bindings) 5

>>> print bindings
{'x': 5}
>>> print unify({'a': Variable('x')}, {}, bindings)
{'a': 5}

The same variable name can be reused in different binding dictionaries without collision. In some cases, you may want to provide two separate binding dictionaries to unify -- one for each feature structure, so their variables do not collide.

In the following examples, two different feature structures use the variable ?x to require that two values are equal. The values assigned to ?x are consistent within each structure, but would be inconsistent if every ?x had to have the same value.

>>> f1 = yaml.load('''
... a: 1
... b: 1
... c: ?x
... d: ?x
... ''')
>>> f2 = yaml.load('''
... a: ?x
... b: ?x
... c: 2
... d: 2
... ''')
>>> bindings1 = {}
>>> bindings2 = {}
>>> print show(unify(f1, f2, bindings1, bindings2))
a: 1
b: 1
c: 2
d: 2
>>> print bindings1
{'x': 2}
>>> print bindings2
{'x': 1}

Feature structures can involve _reentrant_ values, where multiple feature paths lead to the same value. This is represented by the features having the same Python object as a value. (This kind of identity can be tested using the is operator.)

Unification preserves the properties of reentrance. So if a reentrant value is changed by unification, it is changed everywhere it occurs, and it is still reentrant. Reentrant features can even form cycles; these cycles can now be printed through the current YAML library.

>>> f1 = yaml.load('''
... A: &1                # &1 defines a reference in YAML...
...   B: b
... E:
...   F: *1              # and *1 uses the previously defined reference.
... ''')
>>> f1['E']['F']['B']
'b'
>>> f1['A'] is f1['E']['F']
True
>>> f2 = yaml.load('''
... A:
...   C: c
... E:
...   F:
...     D: d
... ''')
>>> f3 = unify(f1, f2)
>>> print show(f3)
A: &id001
  B: b
  C: c
  D: d
E:
  F: *id001
>>> f3['A'] is f3['E']['F']    # Showing that the reentrance still holds.
True

This unification creates a cycle: >>> f1 = yaml.load(''' ... F: &1 {} ... G: *1 ... ''') >>> f2 = yaml.load(''' ... F: ... H: &2 {} ... G: *2 ... ''') >>> f3 = unify(f1, f2) >>> print f3 {'G': {'H': {...}}, 'F': {'H': {...}}} >>> print f3['F'] is f3['G'] True >>> print f3['F'] is f3['G']['H'] True >>> print f3['F'] is f3['G']['H']['H'] True

A cycle can also be created using variables instead of reentrance. Here we supply a single set of bindings, so that it is used on both sides of the unification, making ?x mean the same thing in both feature structures.

>>> f1 = yaml.load('''
... F:
...   H: ?x
... ''')
>>> f2 = yaml.load('''
... F: ?x
... ''')
>>> f3 = unify(f1, f2, {})
>>> print f3
{'F': {'H': {...}}}
>>> print f3['F'] is f3['F']['H']
True
>>> print f3['F'] is f3['F']['H']['H']
True

Two sets of bindings can be provided because the variable names on each side of the unification may be unrelated. An example involves unifying the following two structures, which each require that two values are equivalent, and happen to both use ?x to express that requirement.

>>> f1 = yaml.load('''
... a: 1
... b: 1
... c: ?x
... d: ?x
... ''')
>>> f2 = yaml.load('''
... a: ?x
... b: ?x
... c: 2
... d: 2
... ''')
>>> bindings1 = {}
>>> bindings2 = {}
>>> # We could avoid defining two empty dictionaries by simply using the
>>> # defaults, with unify(f1, f2) -- but we want to be able to examine
>>> # the bindings afterward.
>>> print show(unify(f1, f2, bindings1, bindings2))
a: 1
b: 1
c: 2
d: 2
>>> print bindings1
{'x': 2}
>>> print bindings2
{'x': 1}

If a variable is unified with another variable, the two variables are _aliased_ to each other; they share the same value, similarly to reentrant feature structures. This is represented in a set of bindings as one variable having the other as its value. >>> f1 = yaml.load(''' ... a: ?x ... b: ?x ... ''') >>> f2 = yaml.load(''' ... b: ?y ... c: ?y ... ''') >>> bindings = {} >>> print show(unify(f1, f2, bindings)) a: &id001 ?y b: *id001 c: *id001 >>> print bindings {'x': ?y}

Reusing the same variable bindings ensures that appropriate bindings are made after the fact: >>> bindings = {} >>> f1 = {'a': Variable('x')} >>> f2 = unify(f1, {'a': {}}, bindings) >>> f3 = unify(f2, {'b': Variable('x')}, bindings) >>> print show(f3) a: &id001 {} b: *id001 >>> print bindings {'x': {}}

Parameters:
  • feature1 (object (probably a mapping)) - The first object to be unified.
  • feature2 (object (probably a mapping)) - The second object to be unified.
  • bindings1 (dict or None) - The variable bindings associated with the first object.
  • bindings2 (dict or None) - The variable bindings associated with the second object, if these are distinct from bindings1.
Returns: object (probably a mapping)
The result of unifying the two objects.

_destructively_unify(feature1, feature2, bindings1, bindings2, memo, fail, depth=0)

source code 

Attempt to unify self and other by modifying them in-place. If the unification succeeds, then self will contain the unified value, and the value of other is undefined. If the unification fails, then a UnificationFailure is raised, and the values of self and other are undefined.

_lookup_values(mapping, visited, remove=True)

source code 

The unification procedure creates _bound variables_, which are Variable objects that have been assigned a value. Bound variables are not useful in the end result, however, so they should be replaced by their values.

This procedure takes a mapping, which may be a feature structure or a binding dictionary, and replaces bound variables with their values.

If the dictionary is a binding dictionary, then 'remove' should be set to True. This ensures that unbound, unaliased variables are removed from the dictionary. If the variable name 'x' is mapped to the unbound variable ?x, then, it should be removed. This is not done with features, because a feature named 'x' can of course have a variable ?x as its value.