Since these notes are generic to most workstations, it is impossible to be completely specific about the locations of files, etc. You will want to know where the nwchem and parallel executables reside. You may want to add this location to your shell's executable search path.
You may need to be aware of where the code expects to find the standard basis set library. The location is fixed at compile time and should have been set to something appropriate for your site. Without any special configuration, NWChem looks for the standard library in the source directory tree, which means that moving the source tree may confuse an existing executable. For most installations, however, we configure the library to live in the same directory as the nwchem and parallel executables. If NWChem cannot locate the standard basis library, you can easily work around the problem by using the "file" option on the basis set entry for each library basis set you need.
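For example, if the library files lived in /usr/local/nwchem/libraries (a hypothetical path), the basis input might look something like the following sketch; check the basis set section of the User's Manual for the exact syntax of the "file" option:

```
basis
  * library 6-31g file /usr/local/nwchem/libraries
end
```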
Since workstations vary widely in how much memory is available, the defaults may not be appropriate to your situation. It is advisable to check the defaults given in the manual and if necessary adjust them in the input deck. Remember that the memory specification is per process, so if you set the limit to 32 MB and run four processes on the machine, you'll use 128 MB in total.
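For example, a memory directive near the top of the input deck could request roughly 32 MB per process. This is a sketch only; the exact keywords, units, and the heap/stack/global subdivisions are described in the memory section of the manual:

```
memory 32 mb
```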
Single process execution is easy -- just invoke nwchem with the name of the input file as an argument: "nwchem input.nw".
The "parallel" command is part of the TCGMSG (message passing) package, which NWChem uses to run jobs in parallel. The following description is largely cribbed from the TCGMSG README file (which can be found in the NWChem source tree). Explanations specific to NWChem follow it.
An auxiliary "process group" (aka PROCGRP) file controls the parallel execution. It is usually named with a ".p" suffix. The PROCGRP file can contain multiple lines, and comments are denoted by a "#" sign. Non-comment lines consist of the following fields, separated by white space:
userid      The username on the machine that will be executing the process.

hostname    The hostname of the machine to execute this process. If it is the same machine on which parallel was invoked, the name must match the value returned by the command hostname. If it is a remote machine, it must allow remote execution from this machine (see the man pages for rlogin and rsh).

nslave      The total number of copies of this process to be executing on the specified machine. Only 'clusters' of identical processes specified in this fashion can use shared memory to communicate. If no shared memory is supported on the machine then only the value one (1) is valid (e.g. on the Cray).

executable  Full path name on the host of the image to execute. If this is the local machine then a local path will suffice.

workdir     Full path name on the host of the directory to work in. Processes execute a chdir() to this directory before returning from pbegin(). If specified as a '.' then remote processes will use the login directory on that machine and local processes will use the current directory of parallel (i.e. the directory from which parallel was invoked).
    harrison boys   3 /home/harrison/c/ipc/testf.x /tmp      # my sun 4
    harrison dirac  3 /home/harrison/c/ipc/testf.x /tmp      # ron's sun4
    harrison eyring 8 /usr5/harrison/c/ipc/testf.x /scratch  # alliant fx/8
The above PROCGRP file would put processes 0-2 on boys (executing testf.x in /tmp), 3-5 on dirac (executing testf.x in /tmp) and 6-13 on eyring (executing testf.x in /scratch). Processes on each machine use shared memory to communicate with each other, sockets otherwise.
To run NWChem using the parallel command, the command line is "parallel procgrp input.nw". The first argument to parallel is the name of the PROCGRP file; parallel automatically appends ".p", so in this case it would look for a file named "procgrp.p". A common convention is to name the PROCGRP file "nwchem.p", but remember that the actual executable to be invoked is specified within the PROCGRP file, not on the parallel command line. Remaining arguments to parallel are passed to the program being invoked, so here we pass it the NWChem input deck "input.nw".
Execution on remote workstations is initiated using the rsh/rexec protocol. Users must have remote execution privileges enabled for parallel to work; this requires that the master workstation's hostname appear in each slave's .rhosts file (see the man page rsh(1)).
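For instance, with a (hypothetical) master workstation named boys and username gg502, each slave machine's ~/.rhosts would need an entry of the form:

```
boys gg502
```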
If you have a multiprocessor workstation, you can run multiple processes using the shared memory regions. In this case, your PROCGRP file would have a single (non-comment) line such as

    gg502 bohr 12 /disk1/gg502/hpcci/nwchem .

which would run twelve processes of nwchem working in the current directory ("." in Unix shorthand).
To run on a network of workstations, your PROCGRP file will have multiple lines, generally one for each machine in your cluster. You must be able to rsh to each username/host you specify. Processes on different lines communicate via TCP/IP sockets. It is also possible to run multiple processes on individual nodes in a cluster; in this case, processes sharing a single host communicate internally by shared memory and with processes on other machines via sockets.
The Global Array Toolkit sets up a process on each node in the cluster to act as a data server, to facilitate answering off-processor requests for data. Consequently, the number of processes specified on each line of the PROCGRP file should be one greater than the number of compute processes you want on that machine. For example, the PROCGRP file

    gg502 bohr 8 /disk1/gg502/hpcci/nwchem /disk1/gg502/wrk
    gg502 coho 4 /scr/gg502/hpcci/nwchem /scr/gg502/wrk

would start 7 compute processes on bohr and 3 on coho, along with one data server on each, for a total of twelve processes.
When running in parallel on workstations, NWChem uses shared memory and may use semaphores. If the run terminates abnormally (errors not trapped by the code itself, interrupted by the user, etc.) it may not release these resources back to the system. These are global resources, and it is possible for you and/or other users to exhaust them.
To see if you have any of these resources allocated, use the command "ipcs". You will see a table subdivided into "Message Queues", "Shared Memory" and "Semaphores". The second column lists an id number which you can use to remove your claim on them with the "ipcrm" command. For example, user gg502 would deallocate his resources described by the following output of the ipcs command
    IPC status from fermi as of Wed Aug 9 15:50:32 1995
    T     ID     KEY        MODE         OWNER   GROUP
    Message Queues:
    Shared Memory:
    m    600  0x00000000  --rw-rw-rw-  d3g681     101
    m   1302  0x00000000  --rw-------   gg502     101
    m    903  0x00000000  --rw-------   gg502     101
    m   1104  0x00000000  --rw-------   gg502     101
    m    306  0x00000000  --rw-------   gg502     101
    m    107  0x00000000  --rw-------   gg502     101
    m      9  0x00000000  --rw-------   gg502     101
    Semaphores:
    s    131  0x00000000  --ra-------   gg502     101
    s     92  0x00000000  --ra-------   gg502     101
    s    113  0x00000000  --ra-------   gg502     101
    s     35  0x00000000  --ra-------   gg502     101

by using the command
    ipcrm -m 1302 -m 903 -m 1104 -m 306 -m 107 -m 9 -s 131 -s 92 -s 113 -s 35

A script (ipcreset) to simplify this procedure is provided by TCGMSG (see its README file).
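The core of such a cleanup can be sketched as a small shell function. This is a sketch only: the function name build_ipcrm_args is made up here, and it assumes the SysV-style ipcs column layout shown above (type in column 1, id in column 2, owner in column 5), which varies between systems; check ipcs(1) on your machine.

```shell
# build_ipcrm_args: read ipcs output on stdin and print ipcrm
# arguments for the named user's shared-memory segments ("m" lines)
# and semaphores ("s" lines).
build_ipcrm_args() {
    awk -v u="$1" '
        $1 == "m" && $5 == u { printf "-m %s ", $2 }  # shared memory ids
        $1 == "s" && $5 == u { printf "-s %s ", $2 }  # semaphore ids
    '
}

# Typical use, removing your own leftover resources:
#   ipcrm $(ipcs | build_ipcrm_args "$(id -un)")
```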
You will not be able to remove someone else's resources unless you have sufficient privilege (i.e. root access).
Note that running multiple processes on a single-processor machine is only useful for debugging purposes. You'll get faster turnaround with production jobs by running them as a single process if you have only one processor available.
Running multiple processes on a single machine via a multi-line PROCGRP file (forcing them to communicate over sockets) is likewise less efficient than using the shared memory facilities with a one-line PROCGRP file specifying the desired number of processes.
Heterogeneous clusters (where all nodes are not the same type of hardware) are not generally supported by the current release of the Global Array Toolkit, and consequently by NWChem. If all machines involved use the same representation for data (big vs little endian, IEEE or other floating point, etc.) it will probably work, but otherwise it will not.