Here is a template for the format of a POS-tagged corpus file:
#!FORMAT=POS_1
Insert header information, if any, below the format line above, which must be the first line in the file.
<text>
WORD1/TAG WORD2/TAG WORD3/TAG ..... ./.
WORD1/TAG WORD2/TAG WORD3/TAG ..... ?/.
.....
</text>
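The template above can be read with a few lines of Python. This is only a minimal sketch, not part of the search tool itself: the function names parse_tagged_line and read_pos_text are hypothetical, the sample sentence is assembled for illustration, and the sketch assumes one sentence per line and that the tag follows the last slash in each token.

```python
def parse_tagged_line(line):
    """Split one line of WORD/TAG tokens into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Assumption: the tag follows the LAST slash, so "./." gives (".", ".").
        word, sep, tag = token.rpartition("/")
        if not sep:
            raise ValueError(f"token has no tag: {token!r}")
        pairs.append((word, tag))
    return pairs


def read_pos_text(lines):
    """Collect the tagged sentences found between <text> and </text>."""
    sentences = []
    inside = False
    for raw in lines:
        line = raw.strip()
        if line == "<text>":
            inside = True
        elif line == "</text>":
            inside = False
        elif inside and line:
            sentences.append(parse_tagged_line(line))
    return sentences


# Illustrative input only; the header line is skipped because it is outside <text>.
sample = """#!FORMAT=POS_1
<text>
I/PRO shal/MD not/NEG conne/MD0 wel/ADV goo/VB thyder/ADV ./.
</text>
""".splitlines()

sentences = read_pos_text(sample)
```

Each sentence comes back as a list of (word, tag) pairs, which is a convenient shape for checking a file by hand before running queries on it.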
Every word in a POS-tagged file after the initial "
(MD0 exists)
will find this sentence:
/~* I shal not conne wel goo thyder ./. (ID CMREYNAR,14.261) *~/ /* 4 MD0 conne */ ( (PRO I) (MD shal) (NEG not) (MD0 conne) (ADV wel) (VB goo) (ADV thyder) )
((PRO iDominates he) AND (FP iDominates ane))
finds this sentence:
/~* Sythen he ledes +tam by +tar ane, (CMROLLEP,118.978) *~/ /* 2 PRO he, 7 FP ane */ ( (ADV Sythen) (PRO he) (VBP ledes) (8 PRO +tam) (10 P by) (12 PRO$ +tar) (13 FP ane) (. ,) )
Notice that "iDominates" describes the relationship between a POS tag and its associated text (e.g., "FP" and "ane").
The following query:
query: (as iPrecedes sone) AND (sone iPrecedes P)
finds this sentence:
/~* and as sone as he myght he toke his horse . (CMMALORY,206.3401) *~/ /* 2 as, 3 sone, 4 P as */ ( (CONJ and) (ADVR as) (ADV sone) (P as) (PRO he) (MD myght) (PRO he) (VBD toke) (PRO$ his) (N horse) (. .) )
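In a flat POS-tagged sentence, the two iPrecedes conditions in this query amount to three adjacent tokens. The sketch below illustrates that reading and is not the tool's own implementation. The names matches and immediately_precedes_chain are hypothetical, and two assumptions are made: a pattern may match either the word or the tag, and "sone" refers to the same token in both conditions.

```python
def matches(token, pattern):
    """Assumption: a pattern matches a token if it equals its word or its tag."""
    word, tag = token
    return pattern in (word, tag)


def immediately_precedes_chain(tokens, patterns):
    """Return 0-based start indices where adjacent tokens match patterns in order."""
    width = len(patterns)
    return [
        start
        for start in range(len(tokens) - width + 1)
        if all(matches(tokens[start + k], p) for k, p in enumerate(patterns))
    ]


# The sentence from the output above, as (word, tag) pairs.
sentence = [
    ("and", "CONJ"), ("as", "ADVR"), ("sone", "ADV"), ("as", "P"),
    ("he", "PRO"), ("myght", "MD"), ("he", "PRO"), ("toke", "VBD"),
    ("his", "PRO$"), ("horse", "N"), (".", "."),
]

# (as iPrecedes sone) AND (sone iPrecedes P), read as one chain.
hits = immediately_precedes_chain(sentence, ["as", "sone", "P"])
```

The chain starts at 0-based index 1, which corresponds to the 1-based position 2 reported for "as" in the output comment.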
Neighborhood takes three arguments: two words or tags, and a number. It searches for sentences in which the two words or tags occur within that number of words of one another. For instance, this query:
query: (whoreson Neighborhood 2 wilt)
will return all tokens in the corpus in which the word "whoreson" is within two words of the word "wilt", such as the following sentence:
/~* why thou whoreson when wilt thou be maried? (DELONEY,79.296) *~/ /* 3 whoreson, 5 wilt */ ( (WADV why) (PRO thou) (N whoreson) (WADV when) (MD wilt) (PRO thou) (BE be) (VAN maried) (. ?) ) (ID DELONEY,79.296))
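The Neighborhood test can be pictured as a distance check between token positions. The following is a rough sketch under stated assumptions, not the tool's actual algorithm: the function name neighborhood is hypothetical, "within n words" is taken to mean that the positions differ by at most n (which agrees with the example, where "whoreson" is word 3 and "wilt" is word 5), and a pattern may match either a word or a tag.

```python
def neighborhood(tokens, a, b, n):
    """Return True if a token matching a lies within n positions of one matching b."""
    def matches(token, pattern):
        word, tag = token
        return pattern in (word, tag)

    pos_a = [i for i, tok in enumerate(tokens) if matches(tok, a)]
    pos_b = [i for i, tok in enumerate(tokens) if matches(tok, b)]
    # Assumption: distance is the absolute difference of the two positions.
    return any(i != j and abs(i - j) <= n for i in pos_a for j in pos_b)


# The sentence from the output above, as (word, tag) pairs.
sentence = [
    ("why", "WADV"), ("thou", "PRO"), ("whoreson", "N"), ("when", "WADV"),
    ("wilt", "MD"), ("thou", "PRO"), ("be", "BE"), ("maried", "VAN"), ("?", "."),
]

found = neighborhood(sentence, "whoreson", "wilt", 2)
too_narrow = neighborhood(sentence, "whoreson", "wilt", 1)
```

With a window of 2 the sentence matches; with a window of 1 it does not, because one word ("when") separates the two search terms.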
(VB precedes N)
finds this case:
/~* thenne have ye cause to make myghty werre upon hym. (CMMALORY,2.25) *~/ /* 6 VB make, 8 N werre */ ( (ADV thenne) (HV have) (PRO ye) (N cause) (TO to) (VB make) (ADJ myghty) (N werre) (P upon) (PRO hym) (. .) ) (ID CMMALORY,2.25))
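Unlike iPrecedes, precedes does not require the two items to be adjacent: in the sentence above, "myghty" stands between "make" and "werre". A minimal sketch of that looser check follows. The function name precedes is hypothetical, only tags are compared, and the sketch assumes that any earlier position counts as preceding.

```python
def precedes(tokens, first_tag, second_tag):
    """Return 0-based (i, j) pairs with i < j, tokens[i] tagged first_tag
    and tokens[j] tagged second_tag."""
    return [
        (i, j)
        for i, (_, tag_i) in enumerate(tokens)
        if tag_i == first_tag
        for j in range(i + 1, len(tokens))
        if tokens[j][1] == second_tag
    ]


# The sentence from the output above, as (word, tag) pairs.
sentence = [
    ("thenne", "ADV"), ("have", "HV"), ("ye", "PRO"), ("cause", "N"),
    ("to", "TO"), ("make", "VB"), ("myghty", "ADJ"), ("werre", "N"),
    ("upon", "P"), ("hym", "PRO"), (".", "."),
]

pairs = precedes(sentence, "VB", "N")
```

The only pair is (5, 7) in 0-based indexing, that is, words 6 and 8, which agrees with the "6 VB make, 8 N werre" comment in the output. The earlier noun "cause" is not paired with "make" because it comes before the verb.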