Here is a template for the format of a POS-tagged corpus file:
#!FORMAT=POS_1
Insert header information, if any, below the format line above, which must be the first line in the file.
<text>
WORD1/TAG WORD2/TAG WORD3/TAG ..... ./.
WORD1/TAG WORD2/TAG WORD3/TAG ..... ?/.
.....
</text>
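To make the template concrete, here is a minimal Python sketch that reads a file in this format into (word, tag) pairs. It is not part of CorpusSearch: the function name parse_pos_file is hypothetical, and the sketch assumes one sentence per line between <text> and </text>, with the tag following the last slash of each token.

```python
def parse_pos_file(text):
    """Return (format_line, sentences) for a POS-tagged file given as a string.

    Each sentence is a list of (word, tag) pairs.
    Assumptions: the format line is the first line, each sentence sits on one
    line between <text> and </text>, and the tag follows the LAST slash.
    """
    lines = text.splitlines()
    format_line = lines[0] if lines else ""
    sentences = []
    in_text = False
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "<text>":
            in_text = True
        elif stripped == "</text>":
            in_text = False
        elif in_text and stripped:
            pairs = []
            for token in stripped.split():
                # Split on the last slash so that "./." becomes (".", ".")
                word, _, tag = token.rpartition("/")
                pairs.append((word, tag))
            sentences.append(pairs)
    return format_line, sentences


sample = """#!FORMAT=POS_1
<text>
why/WADV thou/PRO whoreson/N when/WADV wilt/MD thou/PRO be/BE maried/VAN ?/.
</text>"""

fmt, sents = parse_pos_file(sample)
```

Header lines placed between the format line and <text> are skipped by this sketch, since it only collects lines inside the <text> block.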
Every word in a POS-tagged file after the initial "
Notice that "iDominates" describes the relationship between a POS tag and its
associated text (e.g., "FP" and "ane").
Neighborhood takes three arguments: two words or tags and a number. It searches for sentences in which
the two words/tags occur within a certain number of words of one another. For instance, the query
"(whoreson Neighborhood 2 wilt)" returns all tokens in the corpus in which the word "whoreson" is within
two words of the word "wilt".
Search functions
The query file for searching a POS-tagged corpus looks much like that for a parsed
corpus. The node boundary, however, is always $ROOT. CorpusSearch treats POS-tagged files as
containing sentences parsed with a completely flat structure, with every word/tag pair as an
immediate daughter of the root node. The tag for a word is treated as its mother, so that a query
like "(N iDoms king)" returns sentences containing the word/tag pair "king/N". Because of the
flat structure of a POS-tagged file, many CorpusSearch functions cannot be used. Below is a
list of those that are ordinarily appropriate. The function "Neighborhood" works only on POS-tagged
files.
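The flat structure described above can be pictured with a short Python sketch. It only illustrates the idea and says nothing about how CorpusSearch builds its trees internally; the function name to_flat_tree is hypothetical.

```python
def to_flat_tree(pairs):
    """Render (word, tag) pairs as a flat bracketed tree.

    Every word/tag pair becomes an immediate daughter of the root,
    with the tag acting as the mother of the word.
    """
    leaves = " ".join(f"({tag} {word})" for word, tag in pairs)
    return f"( {leaves} )"


pairs = [("king", "N"), (".", ".")]
tree = to_flat_tree(pairs)
print(tree)  # ( (N king) (. .) )
```

In this picture, a query such as "(N iDoms king)" matches because the leaf "(N king)" has the tag N directly above the word "king".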
Exists (variant: exists)
Exists searches for a POS tag or text anywhere in the sentence. For instance, this query:
(MD0 exists)
finds this sentence:
/~*
I shal not conne wel goo thyder ./. (ID CMREYNAR,14.261)
*~/
/*
4 MD0 conne
*/
( (PRO I) (MD shal) (NEG not) (MD0 conne) (ADV wel) (VB goo) (ADV thyder) )
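As a rough illustration of what Exists checks, the sketch below tests whether a given string occurs as either a tag or a word in a sentence. The function name exists and the exact-match comparison are assumptions made for this example; they are not taken from the CorpusSearch implementation.

```python
def exists(sentence, item):
    """Return True if `item` equals a word or a tag anywhere in the sentence."""
    return any(item == word or item == tag for word, tag in sentence)


sentence = [
    ("I", "PRO"), ("shal", "MD"), ("not", "NEG"), ("conne", "MD0"),
    ("wel", "ADV"), ("goo", "VB"), ("thyder", "ADV"),
]

found_tag = exists(sentence, "MD0")     # matches the tag of "conne"
found_word = exists(sentence, "conne")  # matches the word itself
```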
iDominates (variants: idominates, iDoms, idoms)
iDominates means "immediately dominates". That is, x immediately dominates y if y is a
child of x. So this query:
((PRO iDominates he) AND (FP iDominates ane))
finds this sentence:
/~*
Sythen he ledes +tam by +tar ane,
(CMROLLEP,118.978)
*~/
/*
2 PRO he, 7 FP ane
*/
( (ADV Sythen) (PRO he) (VBP ledes) (PRO +tam) (P by) (PRO$ +tar) (FP ane) (. ,) )
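For a POS-tagged sentence, the iDominates test reduces to asking whether a given tag is attached to a given word. The following Python sketch shows that reading; the function name idominates is hypothetical, and the 1-based positions it returns are an assumption chosen to resemble the "2 PRO he, 7 FP ane" line in the output above.

```python
def idominates(sentence, tag, word):
    """Return the 1-based positions where `tag` is the tag of `word`."""
    return [
        position
        for position, (w, t) in enumerate(sentence, start=1)
        if w == word and t == tag
    ]


sentence = [
    ("Sythen", "ADV"), ("he", "PRO"), ("ledes", "VBP"), ("+tam", "PRO"),
    ("by", "P"), ("+tar", "PRO$"), ("ane", "FP"), (",", "."),
]

# ((PRO iDominates he) AND (FP iDominates ane))
pro_hits = idominates(sentence, "PRO", "he")
fp_hits = idominates(sentence, "FP", "ane")
sentence_matches = bool(pro_hits) and bool(fp_hits)
```

Note that "+tam" also carries the tag PRO but is not reported, because the query names the word "he" specifically.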
iPrecedes (variants: iprecedes, iPres, ipres)
This function is true if and only if its first argument immediately precedes
its second argument in the text/tag string.
For instance:
query: (as iPrecedes sone) AND (sone iPrecedes P)
finds this sentence:
/~*
and as sone as he myght he toke his horse .
(CMMALORY,206.3401)
*~/
/*
2 as, 3 sone, 4 P as
*/
( (CONJ and) (ADVR as) (ADV sone) (P as) (PRO he) (MD myght) (PRO he) (VBD toke) (PRO$ his) (N horse) (. .) )
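The immediate-precedence test can be sketched in Python by comparing each word/tag pair with the one directly after it. The names matches and iprecedes are hypothetical, and the sketch makes two simplifying assumptions: an argument matches a pair if it equals either the word or the tag, and the two halves of the AND are evaluated independently rather than being tied to the same occurrence of "sone".

```python
def matches(pair, item):
    """True if `item` equals the word or the tag of a (word, tag) pair."""
    word, tag = pair
    return item == word or item == tag


def iprecedes(sentence, x, y):
    """True if some pair matching `x` is directly followed by one matching `y`."""
    return any(
        matches(first, x) and matches(second, y)
        for first, second in zip(sentence, sentence[1:])
    )


sentence = [
    ("and", "CONJ"), ("as", "ADVR"), ("sone", "ADV"), ("as", "P"),
    ("he", "PRO"), ("myght", "MD"), ("he", "PRO"), ("toke", "VBD"),
    ("his", "PRO$"), ("horse", "N"), (".", "."),
]

# (as iPrecedes sone) AND (sone iPrecedes P)
result = iprecedes(sentence, "as", "sone") and iprecedes(sentence, "sone", "P")
```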
Neighborhood (variant: neighborhood)
For instance:
query: (whoreson Neighborhood 2 wilt)
finds the following sentence:
/~*
why thou whoreson when wilt thou be maried?
(DELONEY,79.296)
*~/
/*
3 whoreson, 5 wilt
*/
( (WADV why) (PRO thou) (N whoreson) (WADV when) (MD wilt) (PRO thou) (BE be) (VAN maried) (. ?)
(ID DELONEY,79.296))
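One way to read the Neighborhood test is as a distance check between word positions. The Python sketch below takes that reading; it is an assumption, not the CorpusSearch implementation. The names matches and neighborhood are hypothetical, and "within n words" is interpreted here as a difference in position of at most n, which is consistent with the positions 3 and 5 reported above for a distance of 2.

```python
def matches(pair, item):
    """True if `item` equals the word or the tag of a (word, tag) pair."""
    word, tag = pair
    return item == word or item == tag


def neighborhood(sentence, x, n, y):
    """True if a pair matching `x` lies within `n` positions of one matching `y`."""
    x_positions = [i for i, pair in enumerate(sentence) if matches(pair, x)]
    y_positions = [i for i, pair in enumerate(sentence) if matches(pair, y)]
    return any(
        i != j and abs(i - j) <= n
        for i in x_positions
        for j in y_positions
    )


sentence = [
    ("why", "WADV"), ("thou", "PRO"), ("whoreson", "N"), ("when", "WADV"),
    ("wilt", "MD"), ("thou", "PRO"), ("be", "BE"), ("maried", "VAN"), ("?", "."),
]

within_two = neighborhood(sentence, "whoreson", 2, "wilt")
within_one = neighborhood(sentence, "whoreson", 1, "wilt")
```

Because the check uses the absolute difference of positions, the order of the two arguments does not matter in this sketch.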
Precedes (variants: precedes, Pres, pres)
"x precedes y" means "x comes before y in the sentence but perhaps not immediately".
So this query:
(VB precedes N)
finds this case:
/~*
thenne have ye cause to make myghty werre upon hym.
(CMMALORY,2.25)
*~/
/*
6 VB make, 8 N werre
*/
( (ADV thenne) (HV have) (PRO ye) (N cause) (TO to) (VB make) (ADJ myghty) (N werre) (P upon)
(PRO hym) (. .)
(ID CMMALORY,2.25))
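The difference from iPrecedes is that any number of words may intervene. A minimal Python sketch of this looser test follows; the names matches and precedes are hypothetical, and matching an argument against either the word or the tag is an assumption made for the example.

```python
def matches(pair, item):
    """True if `item` equals the word or the tag of a (word, tag) pair."""
    word, tag = pair
    return item == word or item == tag


def precedes(sentence, x, y):
    """True if a pair matching `x` occurs anywhere before one matching `y`."""
    for i, pair in enumerate(sentence):
        if matches(pair, x) and any(matches(later, y) for later in sentence[i + 1:]):
            return True
    return False


sentence = [
    ("thenne", "ADV"), ("have", "HV"), ("ye", "PRO"), ("cause", "N"),
    ("to", "TO"), ("make", "VB"), ("myghty", "ADJ"), ("werre", "N"),
    ("upon", "P"), ("hym", "PRO"), (".", "."),
]

vb_before_n = precedes(sentence, "VB", "N")  # "make" ... "werre"
p_before_vb = precedes(sentence, "P", "VB")  # "upon" comes after "make"
```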