Contents of this chapter:

What is CorpusDraw?
Input to CorpusDraw
source file(s)
command file
file of legal tags
The CorpusDraw graphical user interface
the tree display window
the text window
editing buttons
display buttons
Output of CorpusDraw

What is CorpusDraw?

CorpusDraw displays the tree structures assigned to sentences in a parsed corpus and allows an annotator to edit these trees in the course of corpus construction or revision. It can also be used to display parse trees for presentation purposes.

Input to CorpusDraw

CorpusDraw accepts the following command line arguments:

  1. an optional specification of structural constraints on what sentences to display (command file).
  2. the corpus file to display (source file).

In addition, CorpusDraw will read in a file of legal syntactic and part-of-speech tags, if one is supplied. The corpus file, command file, and CorpusSearch program itself must reside in different directories. The recommended directory configuration has a root corpus directory with three sister subdirectories: one for CorpusSearch itself, one for the corpus source files, and one for the command files and the file of legal tags. When starting CorpusDraw, the current directory should normally be the root directory of the corpus, with the path to the corpus file being worked on specified on the command line.

source file

A source file is any file that contains parsed, labelled sentences. This could be a file from the Penn Parsed Corpora of Historical English or from another parsed corpus.
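As an illustration of what "parsed, labelled sentences" means, the following Python sketch reads one labelled bracketing into a nested structure and recovers the words of the sentence. It is not part of CorpusDraw. The sample sentence, the function names parse_tree and words, and the simplified format (no outer wrapper or ID node) are assumptions made for the example.

```python
def parse_tree(text):
    """Parse one labelled bracketing into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "a node must start with '('"
        pos += 1
        label = tokens[pos]  # the syntactic or part-of-speech tag
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())       # a daughter node
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ')'
        return (label, children)

    return node()


def words(tree):
    """Return the words of the sentence in their original order."""
    _label, children = tree
    out = []
    for child in children:
        if isinstance(child, tuple):
            out.extend(words(child))
        else:
            out.append(child)
    return out


# Hypothetical, simplified sample sentence.
sample = "(IP-MAT (NP-SBJ (PRO I)) (VBD saw) (NP-OB1 (D the) (N dog)))"
tree = parse_tree(sample)
print(tree[0])      # label of the root node
print(words(tree))  # the text of the sentence
```

The nested tuples correspond to what the tree display window draws, and the word list corresponds to what the text window shows.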

command file

The command file contains a query, which describes a structure that a sentence must contain in order to be displayed by CorpusDraw. Using a command file allows the annotator to view only those sentences that are relevant to a particular editing change being made to the corpus.
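The effect of such a filter can be pictured with the Python sketch below. It is only a crude stand-in and does not use CorpusSearch query syntax: it keeps a sentence whenever some node label starts with a given prefix. The sample sentences and the names node_labels and has_label are hypothetical.

```python
import re


def node_labels(tree_text):
    """Return every node label, i.e. the token right after each '('."""
    return re.findall(r"\(\s*([^\s()]+)", tree_text)


def has_label(tree_text, prefix):
    """True if any node label in the tree starts with the given prefix."""
    return any(label.startswith(prefix) for label in node_labels(tree_text))


# Hypothetical, simplified sample sentences.
sentences = [
    "(IP-MAT (NP-SBJ (PRO I)) (VBD slept))",
    "(IP-MAT (NP-SBJ (PRO I)) (VBD saw) (NP-OB1 (PRO it)))",
]

# Only sentences that contain a direct object would be shown for editing.
to_display = [s for s in sentences if has_label(s, "NP-OB1")]
print(to_display)
```

A real query can express structural relations between nodes, which this label test cannot.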

file of legal tags

If CorpusDraw is given a file of legal tags, it will constrain node labels, both phrasal and part-of-speech tags, to come from the list in this file. This constraint prevents the accidental introduction of ill-formed labels.
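The constraint can be sketched in Python as follows. This is not CorpusDraw's own code. It assumes, purely for illustration, that the legal tags are listed one per line; the actual layout of the tags file may differ. The names load_legal_tags and check_label are hypothetical.

```python
def load_legal_tags(text):
    """Collect the non-empty lines of the tag list into a set.

    Assumption: one tag per line. The real file layout may differ.
    """
    return {line.strip() for line in text.splitlines() if line.strip()}


def check_label(label, legal_tags):
    """Return the label if it is legal, otherwise refuse the edit."""
    if label not in legal_tags:
        raise ValueError(f"illegal tag: {label!r}")
    return label


# Hypothetical tag list.
legal = load_legal_tags("IP-MAT\nNP-SBJ\nNP-OB1\nPRO\nVBD\n")

check_label("NP-SBJ", legal)       # accepted
try:
    check_label("NP-SBJJ", legal)  # a typing slip is rejected
except ValueError as err:
    print(err)
```

Checking every relabelling against a closed set is what keeps a mistyped label from entering the corpus unnoticed.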

To create a file of legal tags, the following lines should be inserted into a command file with the name "tags.q":

corpus_encoding: UTF-8

make_tag_list: t

The corpus_encoding line should be changed if the corpus uses an encoding other than UTF-8. The command file must not contain anything else. When CorpusSearch is run on the entire parsed corpus with this command file, a file with the name "tags.tag" is created. This is the legal tags file used by CorpusDraw.

The CorpusDraw graphical user interface

The CorpusDraw GUI is intended to be largely self-explanatory. The display contains the following parts: the tree display window, the text window, the editing buttons, and the display buttons.

The scroll bars at the bottom and on the right edge of the tree display window allow different parts of the tree to be centered in the window. The same can be accomplished by clicking, in the text window, on the word that the user wishes to place in the center of the display. The arrows at the left of the editing button row move the display from one sentence to the next.

The editing buttons allow the annotator to change node labels, to move nodes and their descendants around in the tree, to coindex nodes, and to add empty categories of the various types specified in the legal tags file. CorpusDraw will not permit the annotator to accidentally change the order of the words in the sentence or to delete any of them.

The actions controlled by the editing buttons can also be triggered by the use of shortcuts, both keystrokes and mouse clicks. Some of these require a sequence of keystrokes or clicks. A current list of these shortcuts can be found in the next section of the manual.

Output of CorpusDraw

When CorpusDraw is displaying a file with the name "foo.psd" and the file is saved after changes have been made, the saved file has the name "foo.psd.new". This change in name guarantees that the changes can easily be discarded.
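The naming convention can be expressed as a short Python sketch. The helper names output_path and discard_edits are hypothetical and are not part of CorpusDraw. The sketch assumes that discarding the edits amounts to deleting the ".new" file.

```python
from pathlib import Path


def output_path(source):
    """Return the name under which an edited source file is saved."""
    source = Path(source)
    return source.with_name(source.name + ".new")


def discard_edits(source):
    """Throw away saved edits by deleting the '.new' file, if it exists.

    Assumption: the file that was opened is left as it was.
    """
    output_path(source).unlink(missing_ok=True)  # Python 3.8 or later


print(output_path("foo.psd").name)
```

To accept the edits instead, the annotator would rename the ".new" file over the original by hand.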