CorpusSearch 2 is a Java program that supports research in corpus linguistics. It is useful both for constructing syntactically annotated (parsed) corpora and for searching them. By running CorpusSearch on an appropriately annotated corpus, a user can automatically:
- find and count lexical and syntactic configurations of any complexity
- correct systematic errors
- code the linguistic features of corpus sentences for later statistical analysis
Both the input and output files of CorpusSearch are ordinary text files, with syntactic annotations in the Penn-Treebank format.
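
To make the file format concrete, here is a minimal sketch of what counting over Penn-Treebank-style bracketing can look like. It is not CorpusSearch code and does not use CorpusSearch's query language: the class name TreebankLabelCount, the method countLabels, and the sample sentence are all hypothetical, invented for illustration. The sketch assumes Java 8 or later and treats the token immediately after each opening parenthesis as a node label.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of CorpusSearch.
public class TreebankLabelCount {

    /**
     * Counts node labels in a Penn-Treebank-style bracketed string.
     * A label is the token that immediately follows an opening parenthesis;
     * an opening parenthesis with no label (a bare wrapper) is skipped.
     */
    public static Map<String, Integer> countLabels(String tree) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        int i = 0;
        while (i < tree.length()) {
            if (tree.charAt(i) == '(') {
                int start = i + 1;
                int end = start;
                // Read up to the next whitespace or parenthesis.
                while (end < tree.length()
                        && !Character.isWhitespace(tree.charAt(end))
                        && tree.charAt(end) != '('
                        && tree.charAt(end) != ')') {
                    end++;
                }
                if (end > start) {
                    counts.merge(tree.substring(start, end), 1, Integer::sum);
                }
                i = end;
            } else {
                i++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented sample sentence in bracketed form.
        String tree = "(S (NP (D the) (N dog)) (VP (V chased) (NP (D a) (N cat))))";
        System.out.println(countLabels(tree));
        // prints {S=1, NP=2, D=2, N=2, VP=1, V=1}
    }
}
```

Because the annotation is plain text, even a simple scan like this can read it. A real search over syntactic configurations would need to build the tree structure rather than only count labels.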
CorpusSearch 2 runs on any operating system that supports Java, including Linux, Macintosh, Unix and Windows. It requires Java 2, version 1.3 or later. In addition to being downloadable from this site, CorpusSearch is distributed with the Penn-Helsinki Parsed Corpora of Historical English.