Contents of this chapter:
- What is CorpusDraw?
- Input to CorpusDraw
- The CorpusDraw graphical user interface
  - the tree display window
  - the text window
  - editing buttons
  - display buttons
- Output of CorpusDraw
CorpusDraw displays the tree structures assigned to sentences in a parsed corpus and allows an annotator to edit these trees in the course of corpus construction or revision. It can also be used to display parse trees for presentation purposes.
CorpusDraw accepts the following command line arguments: a source file, a command file, and a file of legal tags.
A source file is any file that contains parsed, labelled sentences. This could be a file from the Penn Parsed Corpora of Historical English or from another parsed corpus.
The command file contains a query describing a structure that a sentence must match in order to be displayed by CorpusDraw. Using such a command file allows the annotator to view only those sentences that are relevant to a given editing change being made to the corpus.
If CorpusDraw is given a file of legal tags, it will constrain node labels, both phrasal and part-of-speech tags, to come from the list in this file. This constraint prevents the accidental introduction of ill-formed labels.
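The effect of this constraint can be illustrated with a minimal sketch. It is not CorpusDraw's own implementation: the function name `illegal_labels`, the sample tags, and the use of exact string matching are all assumptions made for illustration.

```python
def illegal_labels(labels, legal_tags):
    """Return the labels that do not appear in the legal tag list, in input order."""
    legal = set(legal_tags)
    return [label for label in labels if label not in legal]


# Hypothetical tag list and node labels, for illustration only.
legal_tags = ["IP-MAT", "NP-SBJ", "N", "VBD"]
print(illegal_labels(["NP-SBJ", "NP-SJB", "VBD"], legal_tags))
```

Here the mistyped label "NP-SJB" is the only one reported, which is the kind of accidental ill-formed label the legal tags file is meant to catch.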
To create a file of legal tags, the following lines should be inserted into a command file with the name "tags.q":
corpus_encoding: UTF-8
make_tag_list: t
The corpus_encoding line should be changed if the corpus uses an encoding other than UTF-8. The command file must not contain anything else. When CorpusSearch is run on the entire parsed corpus with this command file, a file with the name "tags.tag" is created. This is the legal tags file used by CorpusDraw.
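For annotators who prefer to generate the command file from a script, the following is a minimal sketch that writes "tags.q" with exactly the two lines given above. The function name `write_tags_query` and its parameters are hypothetical; running CorpusSearch on the resulting file is still a separate step.

```python
from pathlib import Path


def write_tags_query(directory=".", encoding_name="UTF-8"):
    """Write a tags.q command file containing only the two required lines."""
    content = f"corpus_encoding: {encoding_name}\nmake_tag_list: t\n"
    path = Path(directory) / "tags.q"
    # The command file itself is plain text; nothing else is written to it.
    path.write_text(content, encoding="utf-8")
    return path
```

Calling `write_tags_query()` creates "tags.q" in the current directory. A different value can be passed as `encoding_name` if the corpus is not encoded in UTF-8.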
The CorpusDraw GUI is intended to be largely self-explanatory. The display contains the following parts: the tree display window, the text window, the editing buttons, and the display buttons.
The editing buttons allow the annotator to change node labels, to move nodes and their descendants around in the tree, to coindex nodes, and to add empty categories of the various types specified in the legal tags file. CorpusDraw will not permit the annotator to accidentally change the order of words in the sentence or to delete any of them.
The actions controlled by the editing buttons can also be triggered by the use of shortcuts, both keystrokes and mouse clicks. Some of these require a sequence of keystrokes or clicks. A current list of these shortcuts can be found in the next section of the manual.
When CorpusDraw is displaying a file with the name "foo.psd" and the file is saved after changes are made, the saved file has the name "foo.psd.new". Saving under a new name leaves the original file in place, so unwanted changes can easily be discarded.
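The naming convention makes it straightforward to script the choice between keeping and discarding an editing session. The sketch below is not part of CorpusDraw. The helper names `saved_copy`, `discard_changes`, and `accept_changes` are hypothetical, and it assumes that the saved copy sits next to the original file.

```python
from pathlib import Path


def saved_copy(original):
    """Return the path of the saved file: the original name plus '.new'."""
    original = Path(original)
    return original.with_name(original.name + ".new")


def discard_changes(original):
    """Delete the saved copy, leaving the original file as it was."""
    saved_copy(original).unlink(missing_ok=True)


def accept_changes(original):
    """Replace the original file with the saved copy."""
    saved_copy(original).replace(original)
```

For example, `accept_changes("foo.psd")` overwrites "foo.psd" with "foo.psd.new", while `discard_changes("foo.psd")` removes "foo.psd.new" and leaves "foo.psd" unchanged.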