Revisions are implemented by standard CS queries that are supplemented with indices linking nodes in the query to revision instructions. The revision-related indices (henceforth, “flags”) are enclosed in curly brackets. Here is the general idea:
query: ({x}A function B) AND (C function {y}D)
revise{x}: info
revise{y}: info
The curly brackets distinguish flags from same-instance indices, which are enclosed in square brackets. N.B.: Contrary to what one might expect, flags must follow any same-instance indices; in other words, curly brackets follow square brackets. The following example illustrates the proper order of the two types of indices.
node: IP*
copy_corpus: t
query: ([1]NP iDoms [2]{1}NP)
      AND ([2]NP iDomsFirst DF)
append_label{1}: -PART
The above query transforms the old structure below into the corresponding new structure:
Old: (NP (Q beaucoup) (NP (DF de) (NCPL livres)))
New: (NP (Q beaucoup) (NP-PART (DF de) (NCPL livres)))
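The ordering rule can be made concrete with a small argument parser. The block below is a minimal Python sketch, assuming a simplified argument syntax inferred from the examples in this chapter (an optional `[n]`, then an optional `{n}`, then a label). `parse_argument` is a hypothetical helper for illustration, not part of CorpusSearch.

```python
import re

# Sketch only: a hypothetical, simplified grammar for one query argument,
# inferred from this chapter's examples: optional [n] same-instance index,
# then optional {n} revision flag, then the node label.
ARGUMENT = re.compile(r"^(?:\[(\d+)\])?(?:\{(\d+)\})?(\S+)$")
FLAG_FIRST = re.compile(r"^\{\d+\}\[\d+\]")  # the wrong order: {n}[n]


def parse_argument(token):
    """Split an argument such as '[2]{1}NP' into (same_instance, flag, label)."""
    if FLAG_FIRST.match(token):
        raise ValueError(f"flag must follow the same-instance index: {token}")
    match = ARGUMENT.match(token)
    if match is None:
        raise ValueError(f"unparsable argument: {token!r}")
    same, flag, label = match.groups()
    return (int(same) if same else None, int(flag) if flag else None, label)


print(parse_argument("[2]{1}NP"))  # (2, 1, 'NP')
print(parse_argument("{1}P+D-P"))  # (None, 1, 'P+D-P')
```

Under this sketch, writing `{1}[2]NP` is rejected outright rather than silently misread as a label beginning with `[2]`.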
Note the "copy_corpus: t" line in the preamble of the query, which yields an output file containing a full copy of the input as modified by the revision query. If this line is omitted or commented out, the output contains only the tokens that match the query, as modified by the specified revisions. Leaving the line out can be useful when developing and testing complex revision queries, as it lets the developer home in on the relevant tokens.
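The effect of the copy_corpus switch on what gets written out can be modelled in a few lines. This is a minimal Python sketch under stated assumptions: trees are plain strings, and the hypothetical `matches` and `revise` callables stand in for CorpusSearch's query matching and revision machinery.

```python
# Sketch only: a hypothetical model of the copy_corpus switch, not
# CorpusSearch's implementation.
def run_revision(trees, matches, revise, copy_corpus=True):
    """Matching tokens are always output (revised); non-matching tokens are
    output unchanged only when copy_corpus is on."""
    output = []
    for tree in trees:
        if matches(tree):
            output.append(revise(tree))
        elif copy_corpus:
            output.append(tree)
    return output


def is_partitive(tree):
    # Crude string test standing in for ([1]NP iDoms [2]{1}NP) AND ...
    return "(NP (DF de)" in tree


def mark_partitive(tree):
    # Stands in for append_label{1}: -PART
    return tree.replace("(NP (DF de)", "(NP-PART (DF de)")


corpus = [
    "(NP (Q beaucoup) (NP (DF de) (NCPL livres)))",
    "(PP (P+D-P dos) (NP (ADJ-P grandes) (N-P homens)))",
]
print(len(run_revision(corpus, is_partitive, mark_partitive, copy_corpus=True)))   # 2
print(len(run_revision(corpus, is_partitive, mark_partitive, copy_corpus=False)))  # 1
```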
Another example (this one without same-instance indices) comes from the history of the Tycho Brahe Corpus of historical Portuguese. Originally, portmanteau items like "dos" were treated as one word, but it was later decided to split such items into two pieces: in the case at hand, a preposition "d@" and a determiner "@os".
Old: (PP (P+D-P dos) (NP (ADJ-P grandes) (N-P homens)))
New: (PP (P d@) (NP (D-P @os) (ADJ-P grandes) (N-P homens)))
The change can be implemented with the following query file:
node: IP*
copy_corpus: t
query: (PP iDoms {1}P+D-P)
      AND (P+D-P iDoms {2}dos)
      AND (P+D-P iPrecedes NP)
      AND (P+D-P hasSister NP)
      AND (NP iDomsFirst {3}*)
replace_label{1}: P
replace_label{2}: d@
add_leaf_before{3}: (D-P @os)
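To see what the three revision instructions do to the tree, here is a Python sketch that performs the same edit by hand. It assumes a tree is held as nested lists `[label, child, ...]`, where a child is either another list or a word string; `parse`, `to_str`, and `split_portmanteau` are hypothetical helpers, whereas CorpusSearch itself locates the nodes from the query.

```python
import re


# Sketch only: hypothetical helpers, not CorpusSearch's implementation.
def parse(s):
    """Parse a bracketed tree such as '(NP (D a) (N man))' into nested lists."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1                      # skip ")"
        return [label] + children

    return node()


def to_str(tree):
    """Serialize nested lists back to one-line bracketed notation."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join([tree[0]] + [to_str(c) for c in tree[1:]]) + ")"


def split_portmanteau(pp):
    """Hand-coded equivalent of the query file above, applied to one PP."""
    for i in range(1, len(pp) - 1):
        head, sister = pp[i], pp[i + 1]
        if head == ["P+D-P", "dos"] and isinstance(sister, list) and sister[0] == "NP":
            head[0] = "P"                     # replace_label{1}: P
            head[1] = "d@"                    # replace_label{2}: d@ (the word)
            sister.insert(1, ["D-P", "@os"])  # add_leaf_before{3}: (D-P @os)
    return pp


old = parse("(PP (P+D-P dos) (NP (ADJ-P grandes) (N-P homens)))")
print(to_str(split_portmanteau(old)))
# (PP (P d@) (NP (D-P @os) (ADJ-P grandes) (N-P homens)))
```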
Suppose you have a query where the same node is mentioned several times. You may be tempted to flag the node every time it appears in the query, as below:
WRONG!
query: (NP* iDoms [1]{1}Q)
      AND (NP* iDoms [2]{2}Q)
      AND ([1]{1}Q iPrecedes [2]{2}Q)
add_internal_node{1, 2}: QP
The problem with this is that CorpusSearch only needs to have the arguments flagged once, and repeating the flags just increases the possibility of error (for instance, the same flag might wind up referring to two different nodes). For this reason, CorpusSearch ignores repeated flags, and issues a warning when they are encountered. The above query produces these WARNING messages:
WARNING! Subsequent flag {1} has been ignored.
WARNING! Subsequent flag {2} has been ignored.
This version of the query is preferred:
query: (NP* iDoms [1]{1}Q)
      AND (NP* iDoms [2]{2}Q)
      AND ([1]Q iPrecedes [2]Q)
add_internal_node{1, 2}: QP
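The "first flag wins, later ones are ignored with a warning" behaviour can be sketched as a single scan over the query line. This is a minimal Python illustration, assuming numeric flags of the form `{n}` and that only the query line is scanned (revision lines such as `replace_label{1}:` would otherwise count as repeats); `check_flags` is a hypothetical name and only reproduces the warning text shown above.

```python
import re

FLAG = re.compile(r"\{(\d+)\}")


# Sketch only: hypothetical helper, not CorpusSearch's implementation.
def check_flags(query):
    """Keep the first occurrence of each flag; warn about every later one."""
    seen = set()
    messages = []
    for match in FLAG.finditer(query):
        flag = match.group(1)
        if flag in seen:
            messages.append(f"WARNING! Subsequent flag {{{flag}}} has been ignored.")
        else:
            seen.add(flag)
    return messages


wrong = ("(NP* iDoms [1]{1}Q) AND (NP* iDoms [2]{2}Q) "
         "AND ([1]{1}Q iPrecedes [2]{2}Q)")
for line in check_flags(wrong):
    print(line)
# WARNING! Subsequent flag {1} has been ignored.
# WARNING! Subsequent flag {2} has been ignored.
```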
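The simplest way to change a tree is to change labels, leaving the structure intact. CS has the following label-changing revision functions, each illustrated by an example below:
replace_label: replaces the label of the flagged node with the given string
append_label: adds the given string to the end of the label
prepend_label: adds the given string to the beginning of the label
pre_crop_label: removes the part of the label up to and including the given string
post_crop_label: removes the given string and the rest of the label after it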
node: IP*
query: ({1}NP-ACC iDoms N*)
replace_label{1}: BULLWINKLE
( (IP-MAT (NP-SBJ (PRO You)) (MD must) (NEG not) (VB exspecte) (NP-ACC (Q no) (ADJ greate) (NS matters)) (NP-TMP (D this) (N time)) (. ,)) (ID KNYVETT-1630,87.25))
( (IP-MAT (NP-SBJ (PRO You)) (MD must) (NEG not) (VB exspecte) (BULLWINKLE (Q no) (ADJ greate) (NS matters)) (NP-TMP (D this) (N time)) (. ,)) (ID KNYVETT-1630,87.25))
node: $ROOT
query: ({1}WPRO iDoms what|What)
      AND (WPRO iPrecedes IP*)
append_label{1}: -THAT
( (IP-MAT (CONJ but) (CP-QUE (WNP-1 (WPRO what)) (IP-SUB (NP-TMP *T*-1) (NP-SBJ (PRO I)) (MD shall) (VB returne) (NP-DIR (N home)))) (NP-SBJ (PRO I)) (BEP am) (ADJP (NP-MSR (D a) (Q little)) (ADJ doubtfull)) (. .)) (ID KNYVETT-1630,94.268))
/~* but what I shall returne home I am a little doubtfull. (KNYVETT-1630,94.268) *~/
/* 1 IP-MAT: 6 WPRO, 7 what, 8 IP-SUB */
( (IP-MAT (CONJ but) (CP-QUE (WNP-1 (WPRO-THAT what)) (IP-SUB (NP-TMP *T*-1) (NP-SBJ (PRO I)) (MD shall) (VB returne) (NP-DIR (N home)))) (NP-SBJ (PRO I)) (BEP am) (ADJP (NP-MSR (D a) (Q little)) (ADJ doubtfull)) (. .)) (ID KNYVETT-1630,94.268))
node: $ROOT
ignore_nodes: null
query: ([1]{1}, iDoms [2],)
      AND ([1], iPres *-PRN)
      AND (*-PRN iPres [3],)
      AND ([3]{2}, iDoms [4],)
prepend_label{1}: PRN-
prepend_label{2}: PRN-
( (IP-MAT (CONJ &) (NP-SBJ (PRO$ my) (NS horsses)) (, ,) (IP-MAT-PRN (NP-SBJ (PRO I)) (VBP thinke)) (, ,) (MD $wil) (BE $be) (CODE {TEXT:wilbe}) (VBN gone) (PP (P to) (NP (N morrowe))) (. ,)) (ID KNYVETT-1630,93.228))
/~* & my horsses, I thinke, $wil $be gone to morrowe, (KNYVETT-1630,93.228) *~/
/* 1 IP-MAT: 9 ,, 10 ,, 11 IP-MAT-PRN, 17 ,, 18 , */
( (IP-MAT (CONJ &) (NP-SBJ (PRO$ my) (NS horsses)) (PRN-, ,) (IP-MAT-PRN (NP-SBJ (PRO I)) (VBP thinke)) (PRN-, ,) (MD $wil) (BE $be) (CODE {TEXT:wilbe}) (VBN gone) (PP (P to) (NP (N morrowe))) (. ,)) (ID KNYVETT-1630,93.228))
node: $ROOT
query: (ADVP* iDoms {1}ADV+*)
pre_crop_label{1}: +
( (IP-MAT (CONJ &) (NP-SBJ (Q many)) (VBD lost) (NP-ACC (PRO$ ther) (NS lifes)) (PP (PP (P aboute) (NP (D the) (NS Teames))) (CONJP (CONJ &) (ADVP-LOC (ADV+WADV elsewher)))) (. .)) (ID KNYVETT-1630,87.21))
/~* & many lost ther lifes aboute the Teames & elsewher. (KNYVETT-1630,87.21) *~/
/* 1 IP-MAT: 26 ADVP-LOC, 27 ADV+WADV */
( (IP-MAT (CONJ &) (NP-SBJ (Q many)) (VBD lost) (NP-ACC (PRO$ ther) (NS lifes)) (PP (PP (P aboute) (NP (D the) (NS Teames))) (CONJP (CONJ &) (ADVP-LOC (WADV elsewher)))) (. .)) (ID KNYVETT-1630,87.21))
This query:
node: IP*
query: ({1}NP-ACC iDoms N*)
post_crop_label{1}: -
append_label{1}: -OBJ
( (IP-MAT (NP-SBJ (PRO You)) (MD must) (NEG not) (VB exspecte) (NP-ACC (Q no) (ADJ greate) (NS matters)) (NP-TMP (D this) (N time)) (. ,)) (ID KNYVETT-1630,87.25))
( (IP-MAT (NP-SBJ (PRO You)) (MD must) (NEG not) (VB exspecte) (NP-OBJ (Q no) (ADJ greate) (NS matters)) (NP-TMP (D this) (N time)) (. ,)) (ID KNYVETT-1630,87.25))
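The five label functions amount to simple string operations, which the following Python sketch spells out. It assumes that the crop functions act on the first occurrence of the given string and that several instructions on the same flag apply in the order listed; the second assumption is at least consistent with the last example, where cropping and then appending turns NP-ACC into NP-OBJ (the reverse order would give NP). `revise_label` and `revise_in_order` are hypothetical names.

```python
# Sketch only: hypothetical helpers, not CorpusSearch's implementation.
def revise_label(label, function, arg):
    """Apply one label-changing revision function to a label string."""
    if function == "replace_label":
        return arg
    if function == "append_label":
        return label + arg
    if function == "prepend_label":
        return arg + label
    if function == "pre_crop_label":
        # Assumption: crop up to and including the FIRST occurrence of arg.
        i = label.find(arg)
        return label if i < 0 else label[i + len(arg):]
    if function == "post_crop_label":
        # Assumption: crop from the FIRST occurrence of arg to the end.
        i = label.find(arg)
        return label if i < 0 else label[:i]
    raise ValueError(f"unknown label function: {function}")


def revise_in_order(label, instructions):
    """Apply (function, arg) pairs one after another, in the order given."""
    for function, arg in instructions:
        label = revise_label(label, function, arg)
    return label


print(revise_label("ADV+WADV", "pre_crop_label", "+"))  # WADV
print(revise_in_order("NP-ACC", [("post_crop_label", "-"),
                                 ("append_label", "-OBJ")]))  # NP-OBJ
```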
Revisions that change the structure of a tree, rather than just its labels, can result in an illegal tree, that is, a tree with crossing branches or a tree containing an internal node with no leaf descendants (a pollarded tree?). In such a case, a warning is given and the tree is not changed.
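The "warn and leave the tree unchanged" rule can be sketched as check-then-commit: apply the revision to a copy, test the copy, and keep the original if the test fails. This minimal Python sketch assumes trees are nested lists `[label, child, ...]`; in that representation crossing branches cannot occur at all, so only the "no leaf descendants" condition is checked. `is_legal` and `apply_revision` are hypothetical names.

```python
import copy
import warnings


# Sketch only: hypothetical helpers, not CorpusSearch's implementation.
def is_legal(tree):
    """True if this node and every internal node below it has a child; by
    induction, every internal node then has at least one word beneath it."""
    if len(tree) < 2:
        return False
    return all(is_legal(child) for child in tree[1:] if isinstance(child, list))


def apply_revision(tree, revision):
    """Run `revision` on a deep copy; return the original if the result is illegal."""
    candidate = copy.deepcopy(tree)
    revision(candidate)
    if not is_legal(candidate):
        warnings.warn("revision would create an illegal tree; tree left unchanged")
        return tree
    return candidate


pp = ["PP", ["P", "Unto"], ["NP", ["D", "that"]]]


def drop_determiner(t):
    del t[2][1]           # leaves (NP) with no leaf descendants


def add_leaf(t):
    t.insert(1, ["X", "BULLWINKLE"])


print(apply_revision(pp, drop_determiner) is pp)  # True (plus a warning)
```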
node: IP*
query: (PP iDoms {1}P)
add_leaf_before{1}: (X BULLWINKLE)
add_leaf_after{1}: (Q ROCKY)
( (IP-MAT (PP (P Unto) (NP (D that))) (NP-SBJ (PRO they) (QP (Q all))) (ADVP (ADV well)) (VBD accordyd)) (ID CMMALORY,5.110) )
/~* BULLWINKLE Unto ROCKY that they all well accordyd (CMMALORY,5.110) *~/
/* 1 IP-MAT: 2 PP, 3 P */
( (IP-MAT (PP (X BULLWINKLE) (P Unto) (Q ROCKY) (NP (D that))) (NP-SBJ (PRO they) (QP (Q all))) (ADVP (ADV well)) (VBD accordyd)) (ID CMMALORY,5.110))
node: IP*
query: (NP iDoms {1}D)
move_up_node{1}:
( (IP-MAT (ADVP-TMP (ADV Thenne)) (PP (P in) (NP (Q all) (N haste))) (VBD came) (NP-SBJ (NPR Uther)) (PP (P with) (NP (D a) (ADJ grete) (N hoost)))) (ID CMMALORY,3.37))
( (IP-MAT (ADVP-TMP (ADV Thenne)) (PP (P in) (NP (Q all) (N haste))) (VBD came) (NP-SBJ (NPR Uther)) (PP (P with) (D a) (NP (ADJ grete) (N hoost)))) (ID CMMALORY,3.37))
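Here is a sketch of what move_up_node does in the example above. It assumes nested lists and position-based arguments (CorpusSearch finds the node from the query instead). The landing rule, before the old parent for a first child and after it for a last child, is an assumption consistent with this example. Moving a medial child or an only child is refused, since that would cross branches or leave an empty node. `move_up_node` here is a hypothetical Python function, not the CorpusSearch instruction itself.

```python
import re


# Sketch only: hypothetical helpers, not CorpusSearch's implementation.
def parse(s):
    """Parse a bracketed tree such as '(NP (D a) (N man))' into nested lists."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return [label] + children

    return node()


def to_str(tree):
    if isinstance(tree, str):
        return tree
    return "(" + " ".join([tree[0]] + [to_str(c) for c in tree[1:]]) + ")"


def move_up_node(grandparent, p, c):
    """Move grandparent[p][c] up one level, next to its former parent.
    Position 0 of every list holds the label, so children start at 1."""
    parent = grandparent[p]
    last = len(parent) - 1
    if last == 1:
        raise ValueError("only child: its parent would be left empty")
    if c == 1:                                  # first child: lands before parent
        grandparent.insert(p, parent.pop(c))
    elif c == last:                             # last child: lands after parent
        grandparent.insert(p + 1, parent.pop(c))
    else:
        raise ValueError("medial child: moving it up would cross branches")
    return grandparent


pp = parse("(PP (P with) (NP (D a) (ADJ grete) (N hoost)))")
print(to_str(move_up_node(pp, 2, 1)))
# (PP (P with) (D a) (NP (ADJ grete) (N hoost)))
```

Applying the same function twice, first to Q and then to ADJ, reproduces the move_up_nodes{1, 2} example that follows.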
node: IP*
query: ({1}Q iprecedes {2}ADJ)
move_up_nodes{1, 2}:
( (IP-MAT (NP-SBJ (PRO You)) (MD must) (NEG not) (VB exspecte) (NP-ACC (Q no) (ADJ greate) (NS matters)) (NP-TMP (D this) (N time)) (. ,)) (ID KNYVETT-1630,87.25))
( (IP-MAT (NP-SBJ (PRO You)) (MD must) (NEG not) (VB exspecte) (Q no) (ADJ greate) (NP-ACC (NS matters)) (NP-TMP (D this) (N time)) (. ,)) (ID KNYVETT-1630,87.25))
node: IP*
query: ({1}MD HasSister {2}VB)
add_internal_node{1, 2}: MDVP
( (IP-MAT-SPE (' ') (NP-VOC (N Sir)) (, ,) (' ') (IP-MAT-PRN (VBD said) (NP-SBJ (NPR Ulfius))) (, ,) (' ') (NP-SBJ (PRO he)) (MD wille) (NEG not) (VB dwelle) (NP-MSR (ADJ long)) (E_S .) (' ')) (ID CMMALORY,3.66))
( (IP-MAT-SPE (' ') (NP-VOC (N Sir)) (, ,) (' ') (IP-MAT-PRN (VBD said) (NP-SBJ (NPR Ulfius))) (, ,) (' ') (NP-SBJ (PRO he)) (MDVP (MD wille) (NEG not) (VB dwelle)) (NP-MSR (ADJ long)) (E_S .) (' ')) (ID CMMALORY,3.66))
To add an internal node spanning just one existing node, list the same index twice. For instance, this query:
query: (IP* iDoms {1}BE*)
add_internal_node{1, 1}: VP
( (IP-MAT-SPE (CONJ but) (ADVP (ADV truly)) (NP-VOC (N gossip)) (NP-SBJ (PRO you)) (BEP are) (ADJP (ADJ welcome)) (. ,)) (ID DELONEY,69.9))
/~* but truly gossip you are welcome, (DELONEY,69.9) *~/
/* 1 IP-MAT-SPE: 1 IP-MAT-SPE, 13 BEP */
( (IP-MAT-SPE (CONJ but) (ADVP (ADV truly)) (NP-VOC (N gossip)) (NP-SBJ (PRO you)) (VP (BEP are)) (ADJP (ADJ welcome)) (. ,)) (ID DELONEY,69.9))
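add_internal_node{i, j} wraps a run of sisters, from the node flagged i through the node flagged j, in a new node, and listing one index twice wraps a single node. A minimal Python sketch, assuming nested lists and position-based arguments in place of flags; `add_internal_node` here is a hypothetical function that mirrors the two examples above.

```python
import re


# Sketch only: hypothetical helpers, not CorpusSearch's implementation.
def parse(s):
    """Parse a bracketed tree such as '(NP (D a) (N man))' into nested lists."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return [label] + children

    return node()


def to_str(tree):
    if isinstance(tree, str):
        return tree
    return "(" + " ".join([tree[0]] + [to_str(c) for c in tree[1:]]) + ")"


def add_internal_node(parent, first, last, label):
    """Wrap sisters parent[first..last] (inclusive) in a new node `label`."""
    if not 1 <= first <= last < len(parent):
        raise ValueError("flagged nodes must be sisters, first before last")
    parent[first:last + 1] = [[label] + parent[first:last + 1]]
    return parent


ip = parse("(IP-MAT-SPE (NP-SBJ (PRO he)) (MD wille) (NEG not) (VB dwelle) (NP-MSR (ADJ long)))")
print(to_str(add_internal_node(ip, 2, 4, "MDVP")))
# (IP-MAT-SPE (NP-SBJ (PRO he)) (MDVP (MD wille) (NEG not) (VB dwelle)) (NP-MSR (ADJ long)))
```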
If the indicated leaf is an only child, a warning is given and the tree is not changed.
This query:
node: IP*
ignore_nodes: null
query: (NP* iDoms {1}\**)
delete_leaf{1}:
( (CP-QUE-SPE (INTJP (INTJ Tush)) (NP-VOC (N woman)) (, ,) (WNP-1 (WPRO what)) (IP-SUB-SPE (NP-ACC *T*-1) (VBP talke) (NP-SBJ (PRO you)) (PP (P of) (NP (D that)))) (. ?)) (ID DELONEY,70.40))
/~* Tush woman, what talke you of that? (DELONEY,70.40) *~/
/* 13 IP-SUB-SPE: 14 NP-ACC, 15 *T*-1 */
( (CP-QUE-SPE (INTJP (INTJ Tush)) (NP-VOC (N woman)) (, ,) (WNP-1 (WPRO what)) (IP-SUB-SPE (VBP talke) (NP-SBJ (PRO you)) (PP (P of) (NP (D that)))) (. ?)) (ID DELONEY,70.40))
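In the example above, deleting the leaf *T*-1 removes the whole (NP-ACC *T*-1) pair. The Python sketch below models a leaf that way, as a `[LABEL, word]` pair, and refuses to delete a leaf that is an only child, as the text describes. The nested-list representation, the position argument, and the name `delete_leaf` are assumptions for illustration.

```python
import re
import warnings


# Sketch only: hypothetical helpers, not CorpusSearch's implementation.
def parse(s):
    """Parse a bracketed tree such as '(NP (D a) (N man))' into nested lists."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return [label] + children

    return node()


def to_str(tree):
    if isinstance(tree, str):
        return tree
    return "(" + " ".join([tree[0]] + [to_str(c) for c in tree[1:]]) + ")"


def delete_leaf(parent, i):
    """Delete the (LABEL word) pair at parent[i] unless it is an only child."""
    leaf = parent[i]
    if not (isinstance(leaf, list) and len(leaf) == 2 and isinstance(leaf[1], str)):
        raise ValueError("delete_leaf expects a (LABEL word) pair")
    if len(parent) == 2:
        warnings.warn("leaf is an only child; tree left unchanged")
        return parent
    del parent[i]
    return parent


ip = parse("(IP-SUB-SPE (NP-ACC *T*-1) (VBP talke) (NP-SBJ (PRO you)) (PP (P of) (NP (D that))))")
print(to_str(delete_leaf(ip, 1)))
# (IP-SUB-SPE (VBP talke) (NP-SBJ (PRO you)) (PP (P of) (NP (D that))))
```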
This query:
node: FRAG*
query: ({1}ADVP* iDoms ADV*)
delete_node{1}:
( (FRAG-SPE (WNP (WPRO What)) (ADVP-TMP (ADV neuer)) (NP (D a) (ADJ great) (N belly)) (ADVP (ADV yet)) (. ?)) (ID DELONEY,69.5))
/~* What neuer a great belly yet? (DELONEY,69.5) *~/
/* 1 FRAG-SPE: 5 ADVP-TMP, 6 ADV
   1 FRAG-SPE: 15 ADVP, 16 ADV */
( (FRAG-SPE (WNP (WPRO What)) (ADV neuer) (NP (D a) (ADJ great) (N belly)) (ADV yet) (. ?)) (ID DELONEY,69.5))
This query:
node: IP*
query: ({1}CONJP* iDoms CONJ*)
delete{1}:
( (IP-MAT (NP-SBJ (PRO I)) (VBP hear) (CP-THT (C 0) (IP-SUB (NP-SBJ (NP (N Lady) (N Banbery)) (CONJP-1 (CONJ and) (NP (D y=e=) (N Wardon) (PP (P of) (NP (NPR All) (NPRS Souls)))))) (BEP is) (ADJP (ADJ dead)))) (. .)) (ID ALHATTON,2,242.21))
( (IP-MAT (NP-SBJ (PRO I)) (VBP hear) (CP-THT (C 0) (IP-SUB (NP-SBJ (NP (N Lady) (N Banbery))) (BEP is) (ADJP (ADJ dead)))) (. .)) (ID ALHATTON,2,242.21))
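The last two examples differ in what happens to the material below the flagged node: delete_node removes only the node and keeps its children in its place, while delete removes the node together with everything it dominates. A minimal Python sketch of that contrast, assuming nested lists and position arguments; both function names are hypothetical stand-ins for the CorpusSearch instructions.

```python
import re


# Sketch only: hypothetical helpers, not CorpusSearch's implementation.
def parse(s):
    """Parse a bracketed tree such as '(NP (D a) (N man))' into nested lists."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return [label] + children

    return node()


def to_str(tree):
    if isinstance(tree, str):
        return tree
    return "(" + " ".join([tree[0]] + [to_str(c) for c in tree[1:]]) + ")"


def delete_node(parent, i):
    """Remove parent[i] but splice its children into its place."""
    node = parent[i]
    if len(node) == 2 and isinstance(node[1], str):
        raise ValueError("cannot splice out a (LABEL word) pair")
    parent[i:i + 1] = node[1:]
    return parent


def delete(parent, i):
    """Remove parent[i] together with everything it dominates."""
    del parent[i]
    return parent


frag = parse("(FRAG-SPE (WNP (WPRO What)) (ADVP-TMP (ADV neuer)) "
             "(NP (D a) (ADJ great) (N belly)) (ADVP (ADV yet)) (. ?))")
delete_node(frag, 2)   # ADVP-TMP has one child, so later positions do not shift
delete_node(frag, 4)
print(to_str(frag))
# (FRAG-SPE (WNP (WPRO What)) (ADV neuer) (NP (D a) (ADJ great) (N belly)) (ADV yet) (. ?))
```

When the deleted node has several children, the positions of its later sisters shift, so in a hand-written loop like this one it is safer to work from right to left.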
Revision queries can also combine coding strings. For instance, in a study of relative clauses it might be useful to examine correlations between the properties of a relative clause and those of its head noun phrase. In the example below, the properties of the relative clause and of the noun phrase are captured in the coding strings CODING-CP-REL* and CODING-NP*. The query concatenates the coding string specified by the index {2} to the one specified by the index {1}. Concatenation in the other order (concat{1, 2}) is of course also possible.
node: $ROOT
copy_corpus: t
query: (NP* iDoms CODING-NP*)
      AND (CODING-NP* iDoms [1]{2}.*)
      AND (NP* iDoms CP-REL*)
      AND (CP-REL* iDoms CODING-CP-REL*)
      AND (CODING-CP-REL* iDoms [2]{1}.*)
concat{2, 1}:
As an example, here's an input sentence:
( (IP-MAT (CONJ And) (ADVP-TMP (ADV than)) (NP-SBJ (PRO he)) (VBD seyde) (, ,) (' ') (CP-QUE-SPE (NP-VOC (NPR Sir) (NPR Melyas)) (, ,) (WNP (WPRO who)) (IP-SUB-SPE (HVP hath) (VBN wounded) (NP-OB1 (PRO you)))) (E_S ?)) (ID CMMALORY,645.4103))
query: (CP* iDoms {1}WNP*)
      AND (CP* iDoms IP-SUB*)
      AND (IP-SUB* iDomsFirst {2}.*)
trace_before{1, 2}: (NP-SBJ *T*)
( (IP-MAT (CONJ And) (ADVP-TMP (ADV than)) (NP-SBJ (PRO he)) (VBD seyde) (, ,) (' ') (CP-QUE-SPE (NP-VOC (NPR Sir) (NPR Melyas)) (, ,) (WNP-1 (WPRO who)) (IP-SUB-SPE (NP-SBJ *T*-1) (HVP hath) (VBN wounded) (NP-OB1 (PRO you)))) (E_S ?)) (ID CMMALORY,645.4103))
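In the output above, trace_before{1, 2} coindexes node {1} (WNP becomes WNP-1) and inserts the trace, carrying the same index, immediately before node {2}. The Python sketch below assumes nested lists and position arguments. It also assumes the new index is one higher than the largest index already in the tree; the example tree has none, so it gets 1. That indexing rule is an assumption, not a description of how CorpusSearch chooses indices, and all names are hypothetical.

```python
import re

INDEX = re.compile(r"-(\d+)$")   # e.g. WNP-1, *T*-1 (but not NP-OB1)


# Sketch only: hypothetical helpers, not CorpusSearch's implementation.
def parse(s):
    """Parse a bracketed tree such as '(NP (D a) (N man))' into nested lists."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return [label] + children

    return node()


def to_str(tree):
    if isinstance(tree, str):
        return tree
    return "(" + " ".join([tree[0]] + [to_str(c) for c in tree[1:]]) + ")"


def max_index(tree):
    """Largest numeric index found on any label or word in the tree (0 if none)."""
    best = 0
    for item in tree:
        if isinstance(item, list):
            best = max(best, max_index(item))
        else:
            m = INDEX.search(item)
            if m:
                best = max(best, int(m.group(1)))
    return best


def trace_before(root, wh, target_parent, pos, trace_label="NP-SBJ", trace="*T*"):
    """Coindex `wh` and insert (trace_label trace-n) at target_parent[pos]."""
    n = max_index(root) + 1
    wh[0] = f"{wh[0]}-{n}"
    target_parent.insert(pos, [trace_label, f"{trace}-{n}"])
    return root


cp = parse("(CP-QUE-SPE (NP-VOC (NPR Sir) (NPR Melyas)) (, ,) (WNP (WPRO who)) "
           "(IP-SUB-SPE (HVP hath) (VBN wounded) (NP-OB1 (PRO you))))")
print(to_str(trace_before(cp, cp[3], cp[4], 1)))
```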
Here's an example input sentence:
( (IP-MAT-SPE (NP (ADJ Good)) (NP-VOC (N Gossip)) (VB *) (, ,) (ADVP (ADV now)) (PP (P by) (NP (PRO$ my) (ADV truely))) (NP-SBJ (PRO I)) (BEP am) (ADJP (ADJ glad) (IP-INF-SPE (TO to) (VB see) (NP-ACC (PRO you)) (PP (P in) (NP (N health))))) (. .)) (ID DELONEY,69.3))
query: ({2}PP iDoms NP)
      AND (NP iDoms {1}PRO*)
move_to{1, 2}:
( (IP-MAT-SPE (NP (ADJ Good)) (NP-VOC (N Gossip)) (VB *) (, ,) (ADVP (ADV now)) (PP (P by) (PRO$ my) (NP (ADV truely))) (NP-SBJ (PRO I)) (BEP am) (ADJP (ADJ glad) (IP-INF-SPE (TO to) (VB see) (NP-ACC (PRO you)) (PP (P in) (NP (N health))))) (. .)) (ID DELONEY,69.3))
( (IP-MAT (D the) (NP-SBJ (ADJ basic) (N problem)) (BEP is) (NP-OB1 (D this)) (E_S .)))
query: ({1}D iPrecedes {2}NP*)
      AND (D hasSister NP*)
extend_span{2, 1}:
( (IP-MAT (NP-SBJ (D the) (ADJ basic) (N problem)) (BEP is) (NP-OB1 (D this)) (E_S .)))
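In this example, extend_span{2, 1} makes NP-SBJ (flag {2}) absorb its preceding sister D (flag {1}). A minimal Python sketch follows, assuming nested lists, position arguments, and that the node named first in the instruction grows to take in every sister from itself up to the node named second. Those rules go beyond what this single example shows, and `extend_span` here is a hypothetical function.

```python
import re


# Sketch only: hypothetical helpers, not CorpusSearch's implementation.
def parse(s):
    """Parse a bracketed tree such as '(NP (D a) (N man))' into nested lists."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return [label] + children

    return node()


def to_str(tree):
    if isinstance(tree, str):
        return tree
    return "(" + " ".join([tree[0]] + [to_str(c) for c in tree[1:]]) + ")"


def extend_span(parent, target, edge):
    """Grow parent[target] so that it also dominates the sisters up to parent[edge]."""
    node = parent[target]
    if edge < target:                       # absorb preceding sisters
        node[1:1] = parent[edge:target]
        del parent[edge:target]
    elif edge > target:                     # absorb following sisters
        node.extend(parent[target + 1:edge + 1])
        del parent[target + 1:edge + 1]
    return parent


ip = parse("(IP-MAT (D the) (NP-SBJ (ADJ basic) (N problem)) (BEP is) (NP-OB1 (D this)) (E_S .))")
print(to_str(extend_span(ip, 2, 1)))
# (IP-MAT (NP-SBJ (D the) (ADJ basic) (N problem)) (BEP is) (NP-OB1 (D this)) (E_S .))
```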