Module ycoe
Reads tokens from the York-Toronto-Helsinki Parsed Corpus of
Old English Prose (YCOE), a 1.5 million word syntactically-
annotated corpus of Old English prose texts. The corpus is
distributed by the Oxford Text Archive: http://www.ota.ahds.ac.uk/
The YCOE corpus is divided into 100 files, each representing
an Old English prose text. The tags used within each text comply
with the YCOE standard: http://www-users.york.ac.uk/~lang22/YCOE/YcoeHome.htm
Output of the reader is as follows:
Raw:
['+D+atte',
'on',
'o+dre',
'wisan',
'sint',
'to',
'manianne',
'+da',
'unge+dyldegan',
',',
'&',
'on',
'o+dre',
'+da',
'ge+dyldegan',
'.']
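The raw tokens keep the corpus's ASCII spelling, in which a plus sign followed by a letter appears to stand for an Old English character. The sketch below decodes such tokens into Unicode. The escape table is an assumption based on the usual YCOE conventions and is not stated on this page; `YCOE_ESCAPES` and `decode_token` are hypothetical names, not part of this module.

```python
# Assumed YCOE ASCII escapes for Old English letters (not confirmed by this page).
YCOE_ESCAPES = {
    "+a": "æ", "+A": "Æ",
    "+d": "ð", "+D": "Ð",
    "+t": "þ", "+T": "Þ",
}


def decode_token(token):
    """Replace each known two-character escape with its Unicode letter."""
    for escape, letter in YCOE_ESCAPES.items():
        token = token.replace(escape, letter)
    return token


# Tokens taken from the raw output shown above.
raw_tokens = ['+D+atte', 'on', 'o+dre', 'wisan', 'sint', 'to',
              'manianne', '+da', 'unge+dyldegan']
decoded = [decode_token(t) for t in raw_tokens]
print(decoded)
```

Tokens without an escape, such as 'on' or 'wisan', pass through unchanged.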
Tagged:
[('+D+atte', 'C'),
('on', 'P'),
('o+dre', 'ADJ'),
('wisan', 'N'),
('sint', 'BEPI'),
('to', 'TO'),
('manianne', 'VB^D'),
('+da', 'D^N'),
('unge+dyldegan', 'ADJ^N'),
(',', ','),
('&', 'CONJ'),
('on', 'P'),
('o+dre', 'ADJ'),
('+da', 'D^N'),
('ge+dyldegan', 'ADJ^N'),
('.', '.')]
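Several tags in the tagged output carry a suffix after a caret, for example 'D^N' and 'VB^D'. A minimal sketch of separating the base category from that suffix follows. It assumes that the suffix is a case marker (N for nominative, D for dative), which this page does not spell out; `split_tag` is a hypothetical helper, not part of this module.

```python
from collections import Counter


def split_tag(tag):
    """Split 'ADJ^N' into ('ADJ', 'N'); a tag without '^' gets None as its suffix."""
    base, sep, suffix = tag.partition("^")
    return base, (suffix if sep else None)


# The first nine pairs of the tagged output shown above.
tagged = [('+D+atte', 'C'), ('on', 'P'), ('o+dre', 'ADJ'), ('wisan', 'N'),
          ('sint', 'BEPI'), ('to', 'TO'), ('manianne', 'VB^D'), ('+da', 'D^N'),
          ('unge+dyldegan', 'ADJ^N')]

# Count tokens per base category, ignoring the assumed case suffix.
base_counts = Counter(split_tag(tag)[0] for _, tag in tagged)
print(base_counts['ADJ'])
```

Counting on the base category groups 'ADJ' and 'ADJ^N' together, which is often what a frequency table needs.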
Bracket Parse:
(CP-THT: (C: '+D+atte') (IP-SUB: (IP-SUB-0: (PP: (P: 'on') (NP: (ADJ: 'o+dre') (N: 'wisan')))
(BEPI: 'sint') (IP-INF: (TO: 'to') (VB^D: 'manianne') (NP: '*-1')) (NP-NOM-1: (D^N: '+da')
(ADJ^N: 'unge+dyldegan'))) (,: ',') (CONJP: (CONJ: '&') (IPX-SUB-CON=0: (PP: (P: 'on')
(NP: (ADJ: 'o+dre'))) (NP-NOM: (D^N: '+da') (ADJ^N: 'ge+dyldegan'))))) (.: '.')),
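The leaves of the bracket parse correspond to the (word, tag) pairs of the tagged output, plus empty elements such as '*-1'. The sketch below recovers those pairs from the printed form of a tree with a regular expression. It works on the text shown above rather than on the tree objects the reader returns, and it assumes that empty elements always start with '*'. `LEAF` and `leaves` are hypothetical names.

```python
import re

# Matches one leaf of the form (TAG: 'word') in the printed tree.
LEAF = re.compile(r"\(([^\s()]+): '([^']*)'\)")


def leaves(parse_text, skip_traces=True):
    """Return (word, tag) pairs for every leaf, in left-to-right order."""
    pairs = [(word, tag) for tag, word in LEAF.findall(parse_text)]
    if skip_traces:
        # Assumption: empty elements such as '*-1' always start with '*'.
        pairs = [(w, t) for w, t in pairs if not w.startswith("*")]
    return pairs


# A fragment of the bracket parse shown above.
fragment = ("(IP-INF: (TO: 'to') (VB^D: 'manianne') (NP: '*-1')) "
            "(NP-NOM-1: (D^N: '+da') (ADJ^N: 'unge+dyldegan'))")
print(leaves(fragment))
```

Phrase nodes such as IP-INF are skipped because they are followed by another bracket rather than by a quoted word.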
Chunk Parse:
[(S:
('C', '+D+atte')
(PP: ('P', 'on') ('ADJ', 'o+dre') ('N', 'wisan'))
('BEPI', 'sint') ('TO', 'to') ('VB^D', 'manianne')
(NP: ('NP', '*-1')) ('D^N', '+da') ('ADJ^N', 'unge+dyldegan') (',', ',') ('CONJ', '&')
(PP: ('P', 'on') ('ADJ', 'o+dre')) ('D^N', '+da') ('ADJ^N', 'ge+dyldegan') ('.', '.'))]
raw(files=['coprefcura.o2', 'cosolsat2', 'coprefsolilo', 'comarvel.o23', ...)
tagged(files=['coprefcura.o2', 'cosolsat2', 'coprefsolilo', 'comarvel.o23', ...)
chunked(files=['coprefcura.o2', 'cosolsat2', 'coprefsolilo', 'comarvel.o23', ...,
chunk_types=('NP'),
top_node='S',
partial_match=True,
collapse_partials=True,
cascade=True)
bracket_parse(files=['coprefcura.o2', 'cosolsat2', 'coprefsolilo', 'comarvel.o23', ...)
_chunk_parse(files,
chunk_types,
top_node,
partial_match,
collapse_partials,
cascade)
item_name = {'coadrian.o34': 'Adrian and Ritheus', 'coaelhom.o ...
items = ['coprefcura.o2', 'cosolsat2', 'coprefsolilo', 'comarv ...
Reads files from a given list and converts them via the
conversion_function.
item_name
- Value:
{'coadrian.o34': 'Adrian and Ritheus',
'coaelhom.o3': '\xc6lfric, Supplemental Homilies',
'coaelive.o3': '\xc6lfrics Lives of Saints',
'coalcuin': 'Alcuin De virtutibus et vitiis',
'coalex.o23': 'Alexanders Letter to Aristotle',
'coapollo.o3': 'Apollonius of Tyre',
'coaugust': 'Augustine',
'cobede.o2': 'Bedes History of the English Church',
...
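The item_name dictionary maps a file identifier to the title of the text it contains. As a small illustration, the sketch below builds a readable label for an identifier. It copies only a few entries from the value shown above, and `describe` is a hypothetical helper, not part of this module.

```python
# A few entries copied from the item_name value shown above.
item_name = {
    'coadrian.o34': 'Adrian and Ritheus',
    'coalcuin': 'Alcuin De virtutibus et vitiis',
    'coapollo.o3': 'Apollonius of Tyre',
    'coaugust': 'Augustine',
}


def describe(item):
    """Return 'Title (identifier)', with a placeholder for unknown identifiers."""
    return "%s (%s)" % (item_name.get(item, "unknown title"), item)


for item in ['coaugust', 'coapollo.o3']:
    print(describe(item))
```

Using dict.get with a default keeps the helper from raising KeyError when an identifier is missing from the mapping.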
items
Reads files from a given list and converts them via the
conversion_function. The result can be raw or tagged.
- Value:
['coprefcura.o2',
'cosolsat2',
'coprefsolilo',
'comarvel.o23',
'cochdrul',
'coalex.o23',
'colawwllad.o4',
'cocathom1.o3',
...
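The description above outlines a read-then-convert pattern: each file in a list is read, and a conversion function decides whether the result is raw or tagged. The sketch below illustrates that pattern only and is not this module's implementation. `read_items`, `to_raw`, `to_tagged`, the in-memory `fake_corpus` and its word/TAG text format are all hypothetical stand-ins.

```python
def read_items(files, conversion_function, open_item):
    """Yield the converted content of each named item, in the order given."""
    for name in files:
        yield conversion_function(open_item(name))


# Hypothetical in-memory stand-in for corpus files, using an invented
# word/TAG format; the real corpus files are structured differently.
fake_corpus = {
    'coprefcura.o2': '+D+atte/C on/P',
    'cosolsat2': 'sint/BEPI',
}


def to_raw(text):
    """Keep only the word of each word/TAG token."""
    return [token.rsplit('/', 1)[0] for token in text.split()]


def to_tagged(text):
    """Turn each word/TAG token into a (word, tag) pair."""
    return [tuple(token.rsplit('/', 1)) for token in text.split()]


files = ['coprefcura.o2', 'cosolsat2']
print(list(read_items(files, to_raw, fake_corpus.__getitem__)))
print(list(read_items(files, to_tagged, fake_corpus.__getitem__)))
```

Passing the conversion function as an argument lets one reading loop serve both the raw and the tagged output.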