CorpusDraw displays the tree structures assigned to sentences in a parsed corpus and allows an annotator to edit these trees in the course of corpus construction or revision. It can also be used to display parse trees for presentation purposes.
CorpusDraw is a module within the CorpusSearch program. On any computer where CorpusSearch has been downloaded and installed, CorpusDraw is also available. It has been used extensively under Linux and MacOS X but has not been tested under Windows.
CorpusDraw is invoked with the following command, where "/FOO" represents the path to the directory containing the CorpusSearch .jar file:
% java -classpath /FOO/CS.jar drawtree/CorpusDraw
An alias to this command can be included in a .cshrc or .bashrc file, as described for CorpusSearch itself in the installation chapter for the program.
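As a minimal sketch, such an alias might look like this in a .bashrc file. The alias name CD and the variable name CS_JAR are assumptions chosen for illustration (by analogy with the CS command used later in this chapter), and "/FOO" remains a placeholder for the actual installation directory.

```shell
# Directory placeholder "/FOO" must be replaced with the real location of CS.jar.
CS_JAR=/FOO/CS.jar

# Hypothetical alias name; any unused name will do.
alias CD="java -classpath $CS_JAR drawtree/CorpusDraw"
```

The alias takes effect in new shells, or in the current one after the .bashrc file is re-read. A .cshrc file uses a different alias syntax.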
CorpusDraw accepts a command file and a source file as its command line arguments:
A source file is any file that contains parsed, labelled sentences. This could be a file from the Penn Parsed Corpora of Historical English or from another parsed corpus.
The command file contains a query, which describes a structure that a sentence must match in order to be displayed by CorpusDraw. The use of such a command file allows the annotator to view only those sentences relevant to a given editing change being implemented on the corpus.
In order to prevent the accidental introduction of ill-formed labels, CorpusDraw can be given a file of all allowed tags (both phrasal and part-of-speech).
The file of allowed or "legal" tags is generated from existing parsed files by using a query file with the following command as its content:
It is possible to add a line concerning the font encoding of the corpus to the query file, as in the following sample, but that information is perhaps better included in the preferences file.
Like any other query file, the legal tags query file should have a .q extension.
Here is an example of how the legal tags query would be invoked:
CS queries/legaltags.q parsed/DONE/*.psd
The legal tags creation query outputs a file with the same basename as the query file, but with the .tag extension. (It also outputs a spurious empty .out file which should be discarded.)
Caution: The corpus tag set and hence the .tag file must not contain any tags that consist of or begin with a hyphen or a colon, since these characters function as delimiters.
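A quick check for offending tags can be sketched in shell as below. This is only an illustration under a loud assumption: it treats the .tag file as listing one tag per line, which this chapter does not confirm. The function name check_tag_file is hypothetical.

```shell
# Hypothetical helper: report tags that begin with a hyphen or a colon.
# ASSUMPTION: the .tag file holds one tag per line; adapt the pattern otherwise.
check_tag_file() {
    # grep exits with status 0 when at least one line matches the pattern.
    if grep -n '^[[:space:]]*[-:]' "$1"; then
        echo "warning: $1 contains tags beginning with '-' or ':'" >&2
        return 1
    fi
    return 0
}
```

Calling check_tag_file with the name of a .tag file prints any offending lines with their line numbers and returns a non-zero status if there are any.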
When generated, the .tag file is placed into the same directory as its associated .q file. In order for CorpusDraw to read it, however, it must be moved to the directory from which CorpusDraw is invoked (ordinarily the directory above the one containing the parsed files). Note that CorpusDraw expects the directory from which it is invoked, the directory containing the parsed files, and the queries directory to be distinct. If they aren't, CorpusDraw will issue a warning.
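The move described above can be sketched in shell as follows. The function name install_tag_file is hypothetical. The sketch assumes the layout of the earlier example (a queries directory below the directory from which CorpusDraw is invoked) and also assumes that the spurious empty .out file is written next to the query file; adjust the paths if your setup differs.

```shell
# Hypothetical helper: move the generated .tag file into the current directory,
# which is assumed to be the directory from which CorpusDraw will be invoked.
# Usage: install_tag_file queries/legaltags.q
install_tag_file() {
    query=$1
    base=${query%.q}          # e.g. queries/legaltags

    if [ ! -f "$base.tag" ]; then
        echo "no tag file found at $base.tag" >&2
        return 1
    fi
    mv "$base.tag" .

    # ASSUMPTION: the empty .out file sits beside the query file.
    # It is removed only if it exists and is empty.
    if [ -f "$base.out" ] && [ ! -s "$base.out" ]; then
        rm "$base.out"
    fi
    return 0
}
```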
When invoked, CorpusDraw automatically looks for a .tag file in the directory from which it was invoked. On opening the display, it shows a message containing the name of the .tag file when it succeeds, or a warning if there is a problem with the .tag file (as when it contains illegal characters; see above). If no message is displayed, no .tag file has been read, and tag editing is not constrained.
The CorpusDraw GUI is intended to be largely self-explanatory. The display, which can be seen by clicking here, contains the following parts:
The editing buttons allow the annotator:
The actions controlled by the editing buttons can also be triggered by the use of shortcuts, both keystrokes and mouse clicks. Some of these require a sequence of keystrokes or clicks. A current list of these shortcuts can be found in the last section of this chapter.
When CorpusDraw is displaying a file with the name "foo.psd" and the file is saved after changes have been made, the saved file has the name "foo.psd.new". Because the edits are written under a new name, they can easily be discarded.
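Keeping or discarding the saved edits then comes down to ordinary file operations. The following is a minimal sketch; the function names accept_changes and discard_changes are hypothetical and not part of CorpusDraw.

```shell
# Hypothetical helpers for the ".new" file written when CorpusDraw saves.

# Keep the edits: replace the original file with the saved copy.
accept_changes() {
    mv "$1.new" "$1"
}

# Throw the edits away: delete the saved copy and leave the original as it was.
discard_changes() {
    rm "$1.new"
}
```

For example, after reviewing the edits with "diff foo.psd foo.psd.new", either "accept_changes foo.psd" or "discard_changes foo.psd" finishes the job. Since accepting overwrites the original file, a backup copy may be advisable first.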