Contents of this chapter:

paths and shells
invoking CorpusSearch
your query/output directory

paths and shells

In the description below, we assume that CorpusSearch is installed in a top-level directory and that the other files are in similarly simple locations. To put the program and other files into convenient locations, and to define aliases that make running the program easier, you will need to learn something about how paths work on your system. This is especially important if you are running CorpusSearch on a multiuser machine. The documentation for your operating system will contain a complete discussion of how paths work. The syntax differs somewhat across operating systems, though the concepts are the same.

If you are using a unix-derived system, including linux and Mac OS X, you should also learn something about what shells are and how they work. We assume here that you are using the c-shell, but you may find yourself in a Bourne-type shell such as bash, where the syntax of path specification and other matters is a bit different. Again, documentation on shells is widely available.

invoking CorpusSearch

CorpusSearch now comes in a single jar file called "CS.jar", which must be installed in an appropriate directory (folder) of your computer. We will assume that you have installed CS.jar into the directory "FOO", which is at the root level of your hard disk. CorpusSearch can then be invoked by typing the following line at a command line prompt (here "%>") in a terminal window:

%>java -classpath /FOO/CS.jar csearch/CorpusSearch

A terminal window can be obtained under any flavor of unix/linux by launching an xterm under X11. On a Macintosh running OS X, it can also be obtained by launching the Terminal program. Under Windows, depending on the version of the operating system you are running, use Start>Run>cmd or Start>Run>command to launch the appropriate window. The -classpath switch can be left out if your shell initialization file or equivalent specifies the classpath.
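
As a minimal sketch of that last point, assuming a bash-style shell and the /FOO/CS.jar location used above, the classpath could be set once in an initialization file such as ~/.bashrc. The Java launcher consults the CLASSPATH environment variable when no -classpath switch is given.

```shell
# Assumed location of the jar file, as in the examples in this chapter.
# The c-shell equivalent would be: setenv CLASSPATH /FOO/CS.jar
export CLASSPATH=/FOO/CS.jar

# With CLASSPATH set, the switch can be dropped from the invocation:
#   java csearch/CorpusSearch
```

A -classpath switch given on the command line takes precedence over this variable, so the longer form of the command continues to work.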

Because Windows path syntax differs slightly from unix path syntax, you must invoke CorpusSearch under Windows with the following line, assuming that you have installed it in a directory "FOO" at the top of the C:\ drive:

%>java -classpath "C:\FOO\CS.jar" csearch/CorpusSearch

Note that, under Windows, the direction of the slashes changes between the class path and the command invocation.

To save typing, the following alias can be entered into your .cshrc file if you are running any variant of the c-shell on any unix system, including Mac OS X. An equivalent form exists for the bash shell.

alias CS 'java -classpath /FOO/CS.jar csearch/CorpusSearch'

If you put CorpusSearch anywhere but in a top-level directory, or if you install it on a multi-user machine, you must include the entire path to the CS.jar file in any command that invokes it.
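
The bash form of the alias mentioned above can be sketched as follows, assuming the same /FOO/CS.jar location. It would normally go into an initialization file such as ~/.bashrc rather than .cshrc.

```shell
# bash equivalent of the c-shell alias; bash requires "=" between the
# alias name and its definition, with no spaces around it.
alias CS='java -classpath /FOO/CS.jar csearch/CorpusSearch'
```

The alias becomes available in new terminal windows, or in the current one after the initialization file is re-read (for example with "source ~/.bashrc").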

your query/output directory

Let us assume that you have a corpus in the directory "corpus" at the root of your hard disk. Make a new sister directory of "corpus"; you might call it "corpus_stuff". This directory will hold your query files (ending with ".q") and your output files (ending with ".out").

Here's a CorpusSearch command using the query file "inversion.q", run from a directory called "corpus_stuff" on a unix machine:

%>CS inversion.q ../corpus/*

This command will search the entire corpus (because of the "/*" after "corpus"). The output will appear in a file called "inversion.out" in the corpus_stuff directory.
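
On unix, the "*" is expanded by the shell into a list of file names before the program starts, so the same pattern can be used to preview which files a search would receive. The sketch below illustrates this in a scratch directory; the layout mirrors the one described above, and the corpus file names (text1.psd, text2.psd) are hypothetical stand-ins.

```shell
# Build a throwaway copy of the directory layout; file names are hypothetical.
base=$(mktemp -d)
mkdir "$base/corpus" "$base/corpus_stuff"
touch "$base/corpus/text1.psd" "$base/corpus/text2.psd"

# Work from the query/output directory, as in the command above.
cd "$base/corpus_stuff"

# The shell replaces the wildcard with every file in ../corpus.
echo ../corpus/*
# prints: ../corpus/text1.psd ../corpus/text2.psd
```

Replacing "echo" with "CS inversion.q" in a real corpus_stuff directory gives the search command shown above.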

Be patient; a search of a million-word corpus takes a few minutes, depending on the complexity of the query. To run a search in the background under unix, write "&" at the end of your command:

%>CS inversion.q ../corpus/* &