Contents of this chapter:

revision feature
don't repeat flags
label changes
    replace_label
    append_label
    prepend_label
    pre_crop_label
    post_crop_label
structural changes
    add_leaf_before
    add_leaf_after
    move_up_node
    move_up_nodes
    add_internal_node
    delete_leaf
    delete_node
    delete_subtree
    concat
    trace_before
    move_to
    extend_span

revision feature

Revisions are implemented by standard CorpusSearch (CS) queries that are supplemented with indices linking nodes in the query to revision instructions. The revision-related indices (henceforth, “flags”) are enclosed in curly brackets. Here is the general idea:

query: ({x}A function B) AND (C function {y}D)
revise{x}: info
revise{y}: info

The curly brackets around flags distinguish them from the indices that are relevant for same-instance, which are enclosed in square brackets. N.B.: Contrary to what one might expect, flags must follow any same-instance indices. In other words, curly brackets follow square brackets. The proper order of the two types of indices is illustrated in the following example.

node: IP*
copy_corpus: t

query:     ([1]NP iDoms [2]{1}NP)
       AND ([2]NP iDomsFirst DF)

append_label{1}: -PART

The above query transforms the old structure below into the corresponding new structure:

Old:     (NP (Q beaucoup)
             (NP (DF de) (NCPL livres)))

New:     (NP (Q beaucoup)
             (NP-PART (DF de) (NCPL livres)))

Note the "copy_corpus: t" line in the preamble of the query, which yields an output file containing a full copy of the input as modified by the revision query. If this line is omitted or commented out, the output contains only the tokens that match the query, as modified by the specified revisions. Omitting the line can be useful when developing and testing complex revision queries, as it allows the developer to home in on the relevant tokens.
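The ordering rule for the two kinds of indices (square brackets first, then curly brackets) can be made concrete with a small sketch. The Python below is not part of CorpusSearch: `parse_argument` and the `ARGUMENT` pattern are hypothetical names introduced for illustration. The helper splits a single query argument into its same-instance index, its flag, and its label, and rejects the reversed order. How CorpusSearch itself reacts to a misordered argument is not stated here, so the error raised is an assumption of the sketch.

```python
import re

# Hypothetical pattern: optional [same-instance index], then optional {flag},
# then the label itself. Not taken from the CorpusSearch source.
ARGUMENT = re.compile(r"^(?:\[(\w+)\])?(?:\{(\w+)\})?(.+)$")


def parse_argument(arg):
    """Split one query argument into (same_instance_index, flag, label)."""
    match = ARGUMENT.match(arg)
    if match is None:
        raise ValueError("empty argument")
    same_instance, flag, label = match.groups()
    # A square bracket left over at the start of the label means the flag
    # was written before the same-instance index.
    if flag is not None and label.startswith("["):
        raise ValueError("curly brackets must follow square brackets: " + arg)
    return same_instance, flag, label


print(parse_argument("[2]{1}NP"))  # same-instance index, flag, and label
print(parse_argument("{3}*"))      # flag and label only
```

Indices are returned as strings, and an absent index is returned as `None`.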

Another example (this one without same-instance indices) comes from the history of the Tycho Brahe Corpus of historical Portuguese. Originally, portmanteau items like "dos" were treated as one word, but it was later decided to split such items into two pieces: in the case at hand, a preposition "d@" and a determiner "@os".

Old:     (PP (P+D-P dos)
             (NP (ADJ-P grandes)
                 (N-P homens)))

New:     (PP (P d@)
             (NP (D-P @os)
                 (ADJ-P grandes)
                 (N-P homens)))

The change can be implemented with the following query file:

node: IP*
copy_corpus: t

query:     (PP iDoms {1}P+D-P)
       AND (P+D-P iDoms {2}dos)
       AND (P+D-P iPrecedes NP) 
       AND (P+D-P hasSister NP) 
       AND (NP iDomsFirst {3}*)

replace_label{1}: P
replace_label{2}: d@
add_leaf_before{3}: (D-P @os)

don't repeat flags

Suppose you have a query where the same node is mentioned several times. You may be tempted to flag the node every time it appears in the query, as below:

WRONG!

query:     (NP* iDoms [1]{1}Q)
       AND (NP* iDoms [2]{2}Q)
       AND ([1]{1}Q iPrecedes [2]{2}Q)
add_internal_node{1, 2}: QP

The problem with this is that CorpusSearch only needs to have the arguments flagged once, and repeating the flags just increases the possibility of error (for instance, the same flag might wind up referring to two different nodes). For this reason, CorpusSearch ignores repeated flags, and issues a warning when they are encountered. The above query produces these WARNING messages:

WARNING!  Subsequent flag {1} has been ignored.

WARNING!  Subsequent flag {2} has been ignored.

This version of the query is preferred:

query:     (NP* iDoms [1]{1}Q)
       AND (NP* iDoms [2]{2}Q)
       AND ([1]Q iPrecedes [2]Q)
add_internal_node{1, 2}: QP
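Before running a long query, it can help to check it for repeated flags. The following is a minimal Python sketch, not a CorpusSearch feature: `repeated_flags` is a hypothetical helper that scans the text of a query for curly-bracket flags and reports any that occur more than once.

```python
import re
from collections import Counter


def repeated_flags(query):
    """Return the flags that occur more than once in a query string, sorted."""
    counts = Counter(re.findall(r"\{(\w+)\}", query))
    return sorted(flag for flag, n in counts.items() if n > 1)


wrong = """(NP* iDoms [1]{1}Q)
       AND (NP* iDoms [2]{2}Q)
       AND ([1]{1}Q iPrecedes [2]{2}Q)"""

preferred = """(NP* iDoms [1]{1}Q)
       AND (NP* iDoms [2]{2}Q)
       AND ([1]Q iPrecedes [2]Q)"""

print(repeated_flags(wrong))      # both flags are repeated
print(repeated_flags(preferred))  # no repeats
```

Only the query itself should be passed in. The revision commands (such as add_internal_node{1, 2}) legitimately mention the flags again and would otherwise be counted as repeats.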

label changes

The simplest way to change a tree is to change labels, leaving the structure intact. CS has the following label-changing revision functions:

replace_label
replace_label{x}: new_label

append_label
append_label{x}: label_to_append

prepend_label
prepend_label{x}: label_to_prepend

pre_crop_label
pre_crop_label{x}: label_to_crop

post_crop_label
post_crop_label{x}: label_to_crop

replace_label

This query:
node: IP*
query:  ({1}NP-ACC iDoms N*)

replace_label{1}: BULLWINKLE
applied to this input:
( (IP-MAT (NP-SBJ (PRO You))
          (MD must)
          (NEG not)
          (VB exspecte)
          (NP-ACC (Q no) (ADJ greate) (NS matters))
          (NP-TMP (D this) (N time))
          (. ,)) (ID KNYVETT-1630,87.25))
produces this output:
( (IP-MAT (NP-SBJ (PRO You))
          (MD must)
          (NEG not)
          (VB exspecte)
          (BULLWINKLE (Q no) (ADJ greate) (NS matters))
          (NP-TMP (D this) (N time))
          (. ,))
  (ID KNYVETT-1630,87.25))

append_label

This appends the given label to the flagged argument. This query:
node: $ROOT

query: ({1}WPRO iDoms what|What) AND (WPRO iPrecedes IP*)

append_label{1}: -THAT
applied to this input:
( (IP-MAT (CONJ but)
          (CP-QUE (WNP-1 (WPRO what))
                  (IP-SUB (NP-TMP *T*-1)
                          (NP-SBJ (PRO I))
                          (MD shall)
                          (VB returne)
                          (NP-DIR (N home))))
          (NP-SBJ (PRO I))
          (BEP am)
          (ADJP (NP-MSR (D a) (Q little))
                (ADJ doubtfull))
          (. .)) (ID KNYVETT-1630,94.268))
produces this output:
/~*
but what I shall returne home I am a little doubtfull.
(KNYVETT-1630,94.268)
*~/
/*
1 IP-MAT:  6 WPRO, 7 what, 8 IP-SUB
*/


( (IP-MAT (CONJ but)
          (CP-QUE (WNP-1 (WPRO-THAT what))
                  (IP-SUB (NP-TMP *T*-1)
                          (NP-SBJ (PRO I))
                          (MD shall)
                          (VB returne)
                          (NP-DIR (N home))))
          (NP-SBJ (PRO I))
          (BEP am)
          (ADJP (NP-MSR (D a) (Q little))
                (ADJ doubtfull))
          (. .))
  (ID KNYVETT-1630,94.268))

prepend_label

This prepends the given label to the flagged argument. This query:
node: $ROOT
ignore_nodes: null
query: ([1]{1}, iDoms [2],) AND ([1], iPres *-PRN)
       AND (*-PRN iPres [3],) AND ([3]{2}, iDoms [4],)

prepend_label{1}: PRN-
prepend_label{2}: PRN-
applied to this input:
( (IP-MAT (CONJ &)
          (NP-SBJ (PRO$ my) (NS horsses))
          (, ,)
          (IP-MAT-PRN (NP-SBJ (PRO I))
                      (VBP thinke))
          (, ,)
          (MD $wil)
          (BE $be)
          (CODE {TEXT:wilbe})
          (VBN gone)
          (PP (P to)
              (NP (N morrowe)))
          (. ,)) (ID KNYVETT-1630,93.228))
produces this output:
/~*
& my horsses, I thinke, $wil $be gone to morrowe,
(KNYVETT-1630,93.228)
*~/
/*
1 IP-MAT:  9 ,, 10 ,, 11 IP-MAT-PRN, 17 ,, 18 ,
*/


( (IP-MAT (CONJ &)
          (NP-SBJ (PRO$ my) (NS horsses))
          (PRN-, ,)
          (IP-MAT-PRN (NP-SBJ (PRO I))
                      (VBP thinke))
          (PRN-, ,)
          (MD $wil)
          (BE $be)
          (CODE {TEXT:wilbe})
          (VBN gone)
          (PP (P to)
              (NP (N morrowe)))
          (. ,))
  (ID KNYVETT-1630,93.228))

pre_crop_label

This crops the beginning of the label, up to and including the given character. This query:
node: $ROOT

query: (ADVP* iDoms {1}ADV+*)

pre_crop_label{1}: +
applied to this input:
( (IP-MAT (CONJ &)
          (NP-SBJ (Q many))
          (VBD lost)
          (NP-ACC (PRO$ ther) (NS lifes))
          (PP (PP (P aboute)
                  (NP (D the) (NS Teames)))
              (CONJP (CONJ &)
                     (ADVP-LOC (ADV+WADV elsewher))))
          (. .)) (ID KNYVETT-1630,87.21))
results in this output:
/~*
& many lost ther lifes aboute the Teames & elsewher.
(KNYVETT-1630,87.21)
*~/
/*
1 IP-MAT:  26 ADVP-LOC, 27 ADV+WADV
*/

( (IP-MAT (CONJ &)
          (NP-SBJ (Q many))
          (VBD lost)
          (NP-ACC (PRO$ ther) (NS lifes))
          (PP (PP (P aboute)
                  (NP (D the) (NS Teames)))
              (CONJP (CONJ &)
                     (ADVP-LOC (WADV elsewher))))
          (. .))
  (ID KNYVETT-1630,87.21))

post_crop_label

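This crops the end of the label, starting at the indicated character. In the example below, post_crop_label first reduces NP-ACC to NP, and append_label then adds -OBJ.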

This query:

node: IP*
query:  ({1}NP-ACC iDoms N*)

post_crop_label{1}: -
append_label{1}: -OBJ
applied to this input:
( (IP-MAT (NP-SBJ (PRO You))
          (MD must)
          (NEG not)
          (VB exspecte)
          (NP-ACC (Q no) (ADJ greate) (NS matters))
          (NP-TMP (D this) (N time))
          (. ,)) (ID KNYVETT-1630,87.25))
produces this output:
( (IP-MAT (NP-SBJ (PRO You))
          (MD must)
          (NEG not)
          (VB exspecte)
          (NP-OBJ (Q no) (ADJ greate) (NS matters))
          (NP-TMP (D this) (N time))
          (. ,))
  (ID KNYVETT-1630,87.25))
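The five label-changing functions amount to simple string operations on a node label. The Python sketch below models them on plain strings and replays the label changes from the examples in this section. It is an illustration, not CorpusSearch code. One assumption is labeled in the comments: when the crop character occurs more than once in a label, the sketch crops at the first occurrence, a case that the examples here do not cover.

```python
def replace_label(label, new_label):
    """The old label is discarded entirely."""
    return new_label


def append_label(label, suffix):
    return label + suffix


def prepend_label(label, prefix):
    return prefix + label


def pre_crop_label(label, char):
    """Drop everything up to and including char.

    Assumption: the first occurrence of char is used.
    """
    i = label.find(char)
    return label if i < 0 else label[i + len(char):]


def post_crop_label(label, char):
    """Drop everything from char to the end of the label.

    Assumption: the first occurrence of char is used.
    """
    i = label.find(char)
    return label if i < 0 else label[:i]


print(replace_label("NP-ACC", "BULLWINKLE"))
print(append_label("WPRO", "-THAT"))
print(prepend_label(",", "PRN-"))
print(pre_crop_label("ADV+WADV", "+"))
print(append_label(post_crop_label("NP-ACC", "-"), "-OBJ"))
```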

structural changes

CS has the following structure-changing revision functions. Use them with care, and always keep a backup copy of your original file.

add_leaf_before
add_leaf_before{x}: (pos text)

add_leaf_after
add_leaf_after{x}: (pos text)

move_up_node
move_up_node{x}:

move_up_nodes
move_up_nodes{x, y}:

add_internal_node
add_internal_node{x, y}: new_label

delete_leaf
delete_leaf{x}:

delete_node
delete_node{x}:

delete_subtree
delete_subtree{x}:

concat
concat{x,y}:

trace_before
trace_before{x,y}:

move_to
move_to{x,y}:

extend_span
extend_span{x,y}:

It is possible for the above changes to result in an illegal tree, that is, a tree with crossing branches, or a tree containing an internal node with no leaf descendants (a pollarded tree?). In such a case, a warning is given and the tree is not changed.
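One of the two illegal outcomes, an internal node with no leaf descendants, is easy to test for. The sketch below is a Python illustration under stated assumptions, not CorpusSearch's own check: a tree is modeled as nested lists, where a leaf is a `[part_of_speech, text]` pair and an internal node is `[label, child, child, ...]`. Crossing branches cannot arise in such a nested representation, so only the leafless case is covered.

```python
def is_leaf(node):
    """A leaf is a [part_of_speech, text] pair."""
    return len(node) == 2 and isinstance(node[1], str)


def has_leafless_node(node):
    """True if some internal node in the tree has no leaf descendants."""
    if is_leaf(node):
        return False
    children = node[1:]
    if not children:
        return True  # an internal node with nothing below it
    return any(has_leafless_node(child) for child in children)


legal = ["PP", ["P", "with"], ["NP", ["D", "a"], ["N", "hoost"]]]
illegal = ["PP", ["P", "with"], ["NP"]]  # NP has lost all its children

print(has_leafless_node(legal))
print(has_leafless_node(illegal))
```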

add_leaf_before, add_leaf_after

This query:
node: IP*
query:  (PP iDoms {1}P)

add_leaf_before{1}: (X BULLWINKLE)
add_leaf_after{1}: (Q ROCKY)
applied to this input:
( (IP-MAT (PP (P Unto)
              (NP (D that)))
          (NP-SBJ (PRO they)
                  (QP (Q all)))
          (ADVP (ADV well))
          (VBD accordyd))
  (ID CMMALORY,5.110) )
produces this output:
/~*
BULLWINKLE Unto ROCKY that they all well accordyd
(CMMALORY,5.110)
*~/
/*
1 IP-MAT:  2 PP, 3 P
*/

( (IP-MAT (PP (X BULLWINKLE)
              (P Unto)
              (Q ROCKY)
              (NP (D that)))
          (NP-SBJ (PRO they)
                  (QP (Q all)))
          (ADVP (ADV well))
          (VBD accordyd))
  (ID CMMALORY,5.110))
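The effect of the two commands can be sketched as list insertion next to the flagged node. The Python below is an illustration only, assuming a nested-list tree in which a leaf is a `[part_of_speech, text]` pair. The names `locate`, `add_leaf_before`, and `add_leaf_after` are helpers defined for this sketch, and the flagged node is identified by object identity.

```python
def locate(node, target):
    """Return (parent, index) of target within the tree, or None."""
    for i, child in enumerate(node[1:], start=1):
        if child is target:
            return node, i
        if not isinstance(child, str):
            found = locate(child, target)
            if found is not None:
                return found
    return None


def add_leaf_before(tree, target, leaf):
    found = locate(tree, target)
    if found is None:
        raise ValueError("flagged node not found")
    parent, i = found
    parent.insert(i, leaf)


def add_leaf_after(tree, target, leaf):
    found = locate(tree, target)
    if found is None:
        raise ValueError("flagged node not found")
    parent, i = found
    parent.insert(i + 1, leaf)


p = ["P", "Unto"]
tree = ["IP-MAT", ["PP", p, ["NP", ["D", "that"]]], ["VBD", "accordyd"]]

add_leaf_before(tree, p, ["X", "BULLWINKLE"])
add_leaf_after(tree, p, ["Q", "ROCKY"])
print(tree[1])
```

The tree is abbreviated from the example above. The new leaves become sisters of the flagged node, on either side of it.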

move_up_node

This query:
node: IP*
query:  (NP iDoms {1}D)

move_up_node{1}:
applied to this input:
( (IP-MAT (ADVP-TMP (ADV Thenne))
          (PP (P in)
              (NP (Q all) (N haste)))
          (VBD came)
          (NP-SBJ (NPR Uther))
          (PP (P with)
              (NP (D a) (ADJ grete) (N hoost))))
   (ID CMMALORY,3.37))
produces this output:
( (IP-MAT (ADVP-TMP (ADV Thenne))
          (PP (P in)
              (NP (Q all) (N haste)))
          (VBD came)
          (NP-SBJ (NPR Uther))
          (PP (P with)
              (D a)
              (NP (ADJ grete) (N hoost))))
  (ID CMMALORY,3.37))
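Notice that the direction of movement is constrained by word order: in the example above, D is the first child of NP, so it moves up to a position immediately before NP. If the node to move is a middle or only child, a warning is given and the tree is not changed.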
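The word-order constraint can be sketched as follows. This Python illustration assumes a nested-list tree (a leaf is a `[part_of_speech, text]` pair) and is not CorpusSearch's implementation. The landing site for a last child, immediately after the old parent, is an assumption made by symmetry with the first-child case shown in the example.

```python
def move_up_node(grandparent, parent, child):
    """Make child a sister of its parent; return False if the move is refused."""
    children = parent[1:]
    if len(children) < 2:
        return False  # only child: the parent would be left without leaves
    where = next((i for i, k in enumerate(grandparent) if k is parent), None)
    if where is None:
        return False
    if children[0] is child:
        parent.pop(1)
        grandparent.insert(where, child)       # lands just before the parent
    elif children[-1] is child:
        parent.pop()
        grandparent.insert(where + 1, child)   # assumption: lands just after
    else:
        return False  # middle child: moving it would cross branches
    return True


d = ["D", "a"]
adj = ["ADJ", "grete"]
np = ["NP", d, adj, ["N", "hoost"]]
pp = ["PP", ["P", "with"], np]

print(move_up_node(pp, np, d))
print(pp)
```

After the first move, `adj` is the first child of the NP and could be moved in turn. A node that sits between two sisters is refused.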

move_up_nodes

This query:
node: IP*
query:  ({1}Q iprecedes {2}ADJ)

move_up_nodes{1, 2}:
applied to this input:
( (IP-MAT (NP-SBJ (PRO You))
          (MD must)
          (NEG not)
          (VB exspecte)
          (NP-ACC (Q no) (ADJ greate) (NS matters))
          (NP-TMP (D this) (N time))
          (. ,)) (ID KNYVETT-1630,87.25))
produces this output:
( (IP-MAT (NP-SBJ (PRO You))
          (MD must)
          (NEG not)
          (VB exspecte)
          (Q no)
          (ADJ greate)
          (NP-ACC (NS matters))
          (NP-TMP (D this) (N time))
          (. ,))
  (ID KNYVETT-1630,87.25))
If the indicated move would leave an internal node with no leaf descendants, a warning is given and the tree is not changed.

add_internal_node

This query:
node: IP*
query:  ({1}MD HasSister {2}VB)

add_internal_node{1, 2}: MDVP
applied to this input:
( (IP-MAT-SPE (' ')
              (NP-VOC (N Sir))
              (, ,)
              (' ')
              (IP-MAT-PRN (VBD said)
                          (NP-SBJ (NPR Ulfius)))
              (, ,)
              (' ')
              (NP-SBJ (PRO he))
              (MD wille)
              (NEG not)
              (VB dwelle)
              (NP-MSR (ADJ long))
              (E_S .)
              (' '))
  (ID CMMALORY,3.66))
produces this output:
( (IP-MAT-SPE (' ')
              (NP-VOC (N Sir))
              (, ,)
              (' ')
              (IP-MAT-PRN (VBD said)
                          (NP-SBJ (NPR Ulfius)))
              (, ,)
              (' ')
              (NP-SBJ (PRO he))
              (MDVP (MD wille) (NEG not) (VB dwelle))
              (NP-MSR (ADJ long))
              (E_S .)
              (' '))
  (ID CMMALORY,3.66))
If the addition of the indicated node would produce crossing branches in the tree, a warning is given and the tree is not changed.

To add an internal node spanning just one existing node, list the same index twice. For instance, this query:

query: (IP* iDoms {1}BE*)

add_internal_node{1, 1}: VP
applied to this input:
( (IP-MAT-SPE (CONJ but)
              (ADVP (ADV truly))
              (NP-VOC (N gossip))
              (NP-SBJ (PRO you))
              (BEP are)
              (ADJP (ADJ welcome))
              (. ,))
  (ID DELONEY,69.9))
produces this output:
/~*
but truly gossip you are welcome,
(DELONEY,69.9)
*~/
/*
1 IP-MAT-SPE:  1 IP-MAT-SPE, 13 BEP
*/
( (IP-MAT-SPE (CONJ but)
              (ADVP (ADV truly))
              (NP-VOC (N gossip))
              (NP-SBJ (PRO you))
              (VP (BEP are))
              (ADJP (ADJ welcome))
              (. ,))
  (ID DELONEY,69.9))
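In list terms, add_internal_node wraps a run of adjacent sisters in a new node. The Python sketch below illustrates this on a nested-list tree (a leaf is a `[part_of_speech, text]` pair). It is a simplified model, not CorpusSearch code: it handles only the case where both flagged nodes are daughters of the same parent, and it refuses anything else.

```python
def child_index(parent, node):
    """Position of node among parent's entries, found by identity, or None."""
    for i, child in enumerate(parent):
        if child is node:
            return i
    return None


def add_internal_node(parent, first, last, label):
    """Wrap the sisters from first through last in a new node called label."""
    i, j = child_index(parent, first), child_index(parent, last)
    if i is None or j is None:
        return False  # simplification: both nodes must be daughters of parent
    lo, hi = min(i, j), max(i, j)
    parent[lo:hi + 1] = [[label] + parent[lo:hi + 1]]
    return True


md, neg, vb = ["MD", "wille"], ["NEG", "not"], ["VB", "dwelle"]
ip = ["IP-MAT-SPE", ["NP-SBJ", ["PRO", "he"]], md, neg, vb,
      ["NP-MSR", ["ADJ", "long"]]]
add_internal_node(ip, md, vb, "MDVP")
print(ip[2])

# Passing the same node twice wraps just that node.
bep = ["BEP", "are"]
ip2 = ["IP-MAT-SPE", ["NP-SBJ", ["PRO", "you"]], bep, ["ADJP", ["ADJ", "welcome"]]]
add_internal_node(ip2, bep, bep, "VP")
print(ip2[2])
```

Everything between the two flagged nodes (here, the NEG) ends up under the new node as well.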

delete_leaf

The argument specified in the query can match either a part of speech or text node: in either case, the entire part-of-speech/text pair is deleted.

If the indicated leaf is an only child, a warning is given and the tree is not changed.

This query:

node: IP*
ignore_nodes: null
query: (NP* iDoms {1}\**)

delete_leaf{1}:
applied to this input:
( (CP-QUE-SPE (INTJP (INTJ Tush))
              (NP-VOC (N woman))
              (, ,)
              (WNP-1 (WPRO what))
              (IP-SUB-SPE (NP-ACC *T*-1)
                          (VBP talke)
                          (NP-SBJ (PRO you))
                          (PP (P of)
                              (NP (D that))))
              (. ?)) (ID DELONEY,70.40))
produces this output:
/~*
Tush woman, what talke you of that?
(DELONEY,70.40)
*~/
/*
13 IP-SUB-SPE:  14 NP-ACC, 15 *T*-1
*/

( (CP-QUE-SPE (INTJP (INTJ Tush))
              (NP-VOC (N woman))
              (, ,)
              (WNP-1 (WPRO what))
              (IP-SUB-SPE (VBP talke)
                          (NP-SBJ (PRO you))
                          (PP (P of)
                              (NP (D that))))
              (. ?))
  (ID DELONEY,70.40))

delete_node

This is what syntacticians call "pruning". An internal node is deleted, but its descendants remain.

This query:

node: FRAG*

query: ({1}ADVP* iDoms ADV*)

delete_node{1}:
applied to this input:
( (FRAG-SPE (WNP (WPRO What))
            (ADVP-TMP (ADV neuer))
            (NP (D a) (ADJ great) (N belly))
            (ADVP (ADV yet))
            (. ?)) (ID DELONEY,69.5))
yields this output:
/~*
What neuer a great belly yet?
(DELONEY,69.5)
*~/
/*
1 FRAG-SPE:  5 ADVP-TMP, 6 ADV
1 FRAG-SPE:  15 ADVP, 16 ADV
*/

( (FRAG-SPE (WNP (WPRO What))
            (ADV neuer)
            (NP (D a) (ADJ great) (N belly))
            (ADV yet)
            (. ?))
  (ID DELONEY,69.5))

delete_subtree

This deletes the indicated node and all its descendants.

This query:

node: IP*
query:  ({1}CONJP* iDoms CONJ*)

delete_subtree{1}:
applied to this input:
( (IP-MAT (NP-SBJ (PRO I))
          (VBP hear)
          (CP-THT (C 0)
                  (IP-SUB (NP-SBJ (NP (N Lady) (N Banbery))
                                  (CONJP-1 (CONJ and)
                                           (NP (D y=e=)
                                               (N Wardon)
                                               (PP (P of)
                                                   (NP (NPR All) (NPRS Souls))))))
                          (BEP is)
                          (ADJP (ADJ dead))))
          (. .)) (ID ALHATTON,2,242.21))
results in this output:
( (IP-MAT (NP-SBJ (PRO I))
          (VBP hear)
          (CP-THT (C 0)
                  (IP-SUB (NP-SBJ (NP (N Lady) (N Banbery)))
                          (BEP is)
                          (ADJP (ADJ dead))))
          (. .))
  (ID ALHATTON,2,242.21))
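The three deletion commands differ in what they remove and what they leave behind. The Python sketch below contrasts them on nested-list trees (a leaf is a `[part_of_speech, text]` pair), using abbreviated versions of the examples above. It is a simplified model, not CorpusSearch code. The refusal in `delete_subtree` for an only child is an assumption derived from the general rule that a revision may not leave an internal node without leaf descendants.

```python
def is_leaf(node):
    return len(node) == 2 and isinstance(node[1], str)


def child_index(parent, node):
    """Position of node among parent's entries, found by identity, or None."""
    for i, child in enumerate(parent):
        if child is node:
            return i
    return None


def delete_leaf(parent, leaf):
    """Remove a part-of-speech/text pair, unless it is an only child."""
    i = child_index(parent, leaf)
    if i is None or not is_leaf(leaf) or len(parent) == 2:
        return False
    del parent[i]
    return True


def delete_node(parent, node):
    """Prune an internal node: its children take its place."""
    i = child_index(parent, node)
    if i is None or is_leaf(node):
        return False
    parent[i:i + 1] = node[1:]
    return True


def delete_subtree(parent, node):
    """Remove a node together with everything below it."""
    i = child_index(parent, node)
    if i is None or len(parent) == 2:
        return False  # assumption: the parent may not be left childless
    del parent[i]
    return True


trace = ["NP-ACC", "*T*-1"]
ip_sub = ["IP-SUB-SPE", trace, ["VBP", "talke"], ["NP-SBJ", ["PRO", "you"]]]
delete_leaf(ip_sub, trace)

advp = ["ADVP-TMP", ["ADV", "neuer"]]
frag = ["FRAG-SPE", ["WNP", ["WPRO", "What"]], advp, [".", "?"]]
delete_node(frag, advp)

conjp = ["CONJP-1", ["CONJ", "and"], ["NP", ["N", "Wardon"]]]
np_sbj = ["NP-SBJ", ["NP", ["N", "Lady"], ["N", "Banbery"]], conjp]
delete_subtree(np_sbj, conjp)

print(ip_sub)
print(frag)
print(np_sbj)
```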

concat

The command "concat" concatenates the words dominated by two part-of-speech tags. It is useful primarily and perhaps only for concatenating coding strings, which formally are orthographic words dominated by the tag CODING-*, where * is some node label (e.g., IP-MAT, NP-OB1, etc.). Coding strings are constrained to contain information about the structures dominated by the node boundary where they are inserted, but "concat" allows information from different parts of a sentence to appear in the same coding string.

For instance, it might be useful in a study of relative clauses to study correlations between the properties of a relative clause and of its head noun phrase. In the example below, the properties of the relative clause and of the noun phrase are captured in the coding strings CODING-CP-REL* and CODING-NP*. The query concatenates the coding string specified by the index {2} to the one specified by the index {1}. Concatenation in the other order ({1, 2}) is of course also possible.

node: $ROOT

copy_corpus: t

query:     (NP* iDoms CODING-NP*)
       AND (CODING-NP* iDoms [1]{2}.*)
       AND (NP* iDoms CP-REL*)
       AND (CP-REL* iDoms CODING-CP-REL*)
       AND (CODING-CP-REL* iDoms [2]{1}.*)

concat{2, 1}:

Note: The coding strings themselves are specified by the expression ".*" and are distinguished from one another in the usual way by the indices in square brackets. Omitting the indices in square brackets leads to a failure of the "concat" query, since it is impossible for the same node to have more than one mother. For further discussion of square bracket indices, see the discussion of same instance.

trace_before

The command "trace_before" adds a trace before the node flagged {2} and at the same time coindexes the trace with the node flagged {1}.

As an example, here's an input sentence:

( (IP-MAT (CONJ And)
          (ADVP-TMP (ADV than))
          (NP-SBJ (PRO he))
          (VBD seyde)
          (, ,)
          (' ')
          (CP-QUE-SPE (NP-VOC (NPR Sir) (NPR Melyas))
                      (, ,)
                      (WNP (WPRO who))
                      (IP-SUB-SPE (HVP hath)
                                  (VBN wounded)
                                  (NP-OB1 (PRO you))))
          (E_S ?)) (ID CMMALORY,645.4103))
Apply this query:
query: (CP* iDoms {1}WNP*) AND
       (CP* iDoms IP-SUB*) AND
       (IP-SUB* iDomsFirst {2}.*)

trace_before{1, 2}: (NP-SBJ *T*)
and get this output sentence:
( (IP-MAT (CONJ And)
          (ADVP-TMP (ADV than))
          (NP-SBJ (PRO he))
          (VBD seyde)
          (, ,)
          (' ')
          (CP-QUE-SPE (NP-VOC (NPR Sir) (NPR Melyas))
                      (, ,)
                      (WNP-1 (WPRO who))
                      (IP-SUB-SPE (NP-SBJ *T*-1)
                                  (HVP hath)
                                  (VBN wounded)
                                  (NP-OB1 (PRO you))))
          (E_S ?)) (ID CMMALORY,645.4103))
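The two effects of trace_before, inserting the trace and coindexing it with its antecedent, can be sketched in Python on a nested-list tree (a leaf is a `[part_of_speech, text]` pair). This is an illustration, not CorpusSearch code. How CorpusSearch chooses the index is not described here, so the sketch takes the index as an explicit `index` parameter, which is an assumption, and it also assumes the antecedent does not already carry an index.

```python
def trace_before(antecedent, parent, target, trace, index):
    """Insert an indexed trace before target and index the antecedent to match.

    antecedent: the node flagged {1}; parent and target: the node flagged {2}
    and its mother; trace: a (part_of_speech, text) pair such as
    ("NP-SBJ", "*T*"); index: supplied by the caller (an assumption).
    """
    where = next((i for i, k in enumerate(parent) if k is target), None)
    if where is None:
        return False
    pos, text = trace
    antecedent[0] = f"{antecedent[0]}-{index}"
    parent.insert(where, [pos, f"{text}-{index}"])
    return True


wnp = ["WNP", ["WPRO", "who"]]
hvp = ["HVP", "hath"]
ip_sub = ["IP-SUB-SPE", hvp, ["VBN", "wounded"], ["NP-OB1", ["PRO", "you"]]]
cp = ["CP-QUE-SPE", wnp, ip_sub]

trace_before(wnp, ip_sub, hvp, ("NP-SBJ", "*T*"), 1)
print(cp)
```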

move_to

The command "move_to" moves a source node (flagged {1}) to become a daughter of a target node (flagged {2}). Any combination of source and target may be used, as long as the result is a legitimate tree.

Here's an example input sentence:

( (IP-MAT-SPE (NP (ADJ Good))
              (NP-VOC (N Gossip))
              (VB *)
              (, ,)
              (ADVP (ADV now))
              (PP (P by)
                  (NP (PRO$ my) (ADV truely)))
              (NP-SBJ (PRO I))
              (BEP am)
              (ADJP (ADJ glad)
                    (IP-INF-SPE (TO to)
                                (VB see)
                                (NP-ACC (PRO you))
                                (PP (P in)
                                    (NP (N health)))))
              (. .))
 (ID DELONEY,69.3))
Apply this query:
query:  ({2}PP iDoms NP) AND (NP iDoms {1}PRO*)

move_to{1, 2}:
and get this output sentence:
( (IP-MAT-SPE (NP (ADJ Good))
              (NP-VOC (N Gossip))
              (VB *)
              (, ,)
              (ADVP (ADV now))
              (PP (P by)
                  (PRO$ my)
                  (NP (ADV truely)))
              (NP-SBJ (PRO I))
              (BEP am)
              (ADJP (ADJ glad)
                    (IP-INF-SPE (TO to)
                                (VB see)
                                (NP-ACC (PRO you))
                                (PP (P in)
                                    (NP (N health)))))
              (. .))
 (ID DELONEY,69.3))

extend_span

The command "extend_span" extends the span of some constituent over an immediately adjacent sister. Here's an example input sentence:
( (IP-MAT (D the)
          (NP-SBJ (ADJ basic)
                  (N problem))
          (BEP is)
          (NP-OB1 (D this))
          (E_S .)))
Apply this query (note the order of the arguments in "extend_span"):
      query: ({1}D iPrecedes {2}NP*) AND
             (D hasSister NP*)

      extend_span{2, 1}:
and get this output sentence:
( (IP-MAT (NP-SBJ (D the)
                  (ADJ basic)
                  (N problem))
          (BEP is)
          (NP-OB1 (D this))
          (E_S .)))
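The argument order of extend_span (the constituent to be extended first, the sister to be absorbed second) can be sketched as follows. The Python below models the operation on a nested-list tree (a leaf is a `[part_of_speech, text]` pair). It is an illustration under that assumption, not CorpusSearch code, and `child_index` is a helper defined for the sketch.

```python
def is_leaf(node):
    return len(node) == 2 and isinstance(node[1], str)


def child_index(parent, node):
    """Position of node among parent's entries, found by identity, or None."""
    for i, child in enumerate(parent):
        if child is node:
            return i
    return None


def extend_span(parent, constituent, sister):
    """Pull an immediately adjacent sister inside constituent."""
    c, s = child_index(parent, constituent), child_index(parent, sister)
    if c is None or s is None or abs(c - s) != 1 or is_leaf(constituent):
        return False  # not adjacent sisters, or nothing to extend
    del parent[s]
    if s < c:
        constituent.insert(1, sister)  # sister was on the left: new first child
    else:
        constituent.append(sister)     # sister was on the right: new last child
    return True


d = ["D", "the"]
np_sbj = ["NP-SBJ", ["ADJ", "basic"], ["N", "problem"]]
ip = ["IP-MAT", d, np_sbj, ["BEP", "is"], ["NP-OB1", ["D", "this"]], ["E_S", "."]]

extend_span(ip, np_sbj, d)
print(ip)
```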