NWChem - computational chemistry on parallel computers


Notes on running NWChem on the MHPCC IBM SP2

Much of the information below has been extracted from information available on the MHPCC web pages and in the IBM SP man pages by Jeff Nichols (ja_nichols@pnl.gov).

For NWChem support, mail nwchem-support@emsl.pnl.gov or visit the NWChem homepage.


Useful addresses at MHPCC

Note - Lon Waters is the designated support contact for chemistry and physics at MHPCC.

General SP information

MHPCC uses LoadLeveler for scheduling batch use of the machine. You log on to one of the SP2 interactive nodes (tsunami.sp2.mhpcc.edu) and from there proceed to launch

  1. sequential interactive jobs, just as if you were logged on to an individual IBM RS/6000 workstation,
  2. interactive parallel jobs sharing the interactive pool resources, using procedures described below, or
  3. batch parallel jobs via LoadLeveler, which will also be described below.

You need to know just a few facts and commands to get going.

Each node of the SP has a POWER2 CPU with varying amounts of physical memory and local scratch disk (mounted as /localscratch); see the table below. The O/S and I/O buffers consume about 17 MB (estimate), and the NWChem executable is about 7 MB. MHPCC provides temporary disk space for all users in two locations:

/localscratch
Approximately 250 Megabytes on each node. This temporary file space provides the best I/O performance because it is local to each node.
/scratch1, /scratch2, /scratch3, /scratch4
Approximately 2 Gigabytes on each partition, shared across all nodes. I/O is performed over the network -- not as efficient as /localscratch temp space but much larger.
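
For example, a quick way to see how much space is free on these areas before starting a run (df -k reports sizes in kilobytes):

   df -k /localscratch                            # node-local scratch on this node
   df -k /scratch1 /scratch2 /scratch3 /scratch4  # shared scratch partitions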

You should note that MHPCC will regularly remove "old" files from the temporary directories. File removal is based upon the last time the file was used/accessed. The schedule for when files are removed is subject to change. Currently, the schedule is:

Thus, in order for useful files to be saved by default and to make sure that scratch files with high bandwidth requirements are in /localscratch, you should always

There are only a couple of commands that will give you per-node activity information: "jmstat" and "jm_status -Pv". These commands tell you what is active on the SP (by job and by node). Examples of the information retrieved from these commands are given below.

fr2n07% jmstat
  Job started   Nodes    PID     Title    User
Mar_25_09:49:52    1    17407 LoadLeveler vnatoli
Mar_25_09:49:54    1    18609 LoadLeveler tang
Mar_25_09:50:16    1    17805 LoadLeveler apsteffe
Mar_25_09:50:20    1    21931 LoadLeveler jjyoh
Mar_25_09:50:27    1    22109 LoadLeveler apsteffe
Mar_25_09:50:27    1    22407 LoadLeveler swilke
Mar_25_09:50:54    3    23759 LoadLeveler kairys
Mar_25_09:51:25    1    20436 LoadLeveler vnatoli
Mar_25_09:53:35    1    22047 LoadLeveler apsteffe
Mar_25_09:55:53    1    19344 LoadLeveler petrisor
Mar_25_09:55:56    1    22998 LoadLeveler petrisor
Mar_25_10:15:47   16    22545 LoadLeveler daypn
Mar_25_10:15:47    8    14361 LoadLeveler ansaria
Mar_25_10:45:32    1    17596 LoadLeveler mgomez
Mar_25_10:45:32    1    19436 LoadLeveler sinkovit
Mar_25_10:45:39    1    21947 LoadLeveler keesh
Mar_25_10:46:04    1    15970 LoadLeveler mgomez
Mar_25_13:05:44   32    17034 LoadLeveler rlee
Mar_25_13:25:35    5    18803 LoadLeveler hyun
Mar_25_14:24:19    8    16199 LoadLeveler zhong
Mar_25_14:53:32   64    17314 LoadLeveler calhoun
Mar_25_15:03:04    1    19428 LoadLeveler gardnerk
Mar_25_15:14:55    8    15820 LoadLeveler mws
Mar_25_15:15:59    8    16138 LoadLeveler ansaria
Mar_25_15:25:13   16    21248 LoadLeveler bogusz
Mar_25_15:33:43    1    20652 LoadLeveler gardnerk
Mar_25_15:35:45    3    19149 LoadLeveler kairys

fr2n07% jm_status -Pv
Pool 0:    Free_for_all_pool
  Subpool: GENERAL
    Node:  fr1n05.mhpcc.edu
    Node:  fr1n06.mhpcc.edu
    Node:  fr1n07.mhpcc.edu
    Node:  fr1n08.mhpcc.edu
    Node:  fr1n09.mhpcc.edu
    Node:  fr1n10.mhpcc.edu
    Node:  fr1n11.mhpcc.edu
    Node:  fr1n12.mhpcc.edu
    Node:  fr1n13.mhpcc.edu
    Node:  fr1n14.mhpcc.edu
    Node:  fr1n15.mhpcc.edu
    Node:  fr1n16.mhpcc.edu
    Node:  fr2install1.mhpcc.edu
    Node:  fr2n04.mhpcc.edu
    Node:  fr2n05.mhpcc.edu
    Node:  fr2n06.mhpcc.edu
    Node:  fr2n07.mhpcc.edu
    Node:  fr2n08.mhpcc.edu
    Node:  fr2n09.mhpcc.edu
    Node:  fr2n10.mhpcc.edu
    Node:  fr2n11.mhpcc.edu
    Node:  fr2n12.mhpcc.edu
    Node:  fr2n13.mhpcc.edu
    Node:  fr2n14.mhpcc.edu
    Node:  fr2n15.mhpcc.edu
    Node:  fr2n16.mhpcc.edu
Pool 1:    LoadLeveler
  Subpool: BATCH
    Node:  fr3install1.mhpcc.edu
      Job 108: time_allocated=Mon_Mar_25_10:45:32_1996
        description=LoadLeveler
        requestor=sinkovit requestor_pid=19436
        requestor_node=fr3n01.mhpcc.edu
        Adapter type=ETHERNET
        Usage: cpu=SHARED adapter=SHARED
        virtual task ids: 0 
    Node:  fr3n02.mhpcc.edu
...

MHPCC SP2 configuration

The configuration of the MHPCC SP2 as of 22 March 1996

----------------------------------------------------------------------------
                               LOCAL-
                    NODE  MEM SCRATCH MIN   MAX  TIME             
 CLASS/USE   #NODES TYPE  MB     GB   PROC  PROC LIMIT  FRAMES
----------------------------------------------------------------------------
 bigmem         5   wide  1024   2.0    1     5   8 hr  28 (n07-n15)
----------------------------------------------------------------------------
 large        128   thin   128   1.0   64   128   8 hr  5,6,19,20,23,24,25,26
                1   thin   128   1.0   64   128   8 hr  18
----------------------------------------------------------------------------
 medium        32   wide   256   2.0    8    64   4 hr  16,21,22,27
               48   thin    64   .25    8    64   4 hr  10,11,12
----------------------------------------------------------------------------
 long           8   wide   256   2.0    1    32  24 hr  15
               15   thin   128   1.0    1    32  24 hr  18
               16   thin    64   .25    1    32  24 hr  9
                8   thin    64   .25    1    32  24 hr  3   (n01-n08)
----------------------------------------------------------------------------
 small_long    16   thin   128   1.0    1     8   8 hr  17
----------------------------------------------------------------------------
 small_short   16   thin    64   .25    1     8   2 hr  4
                8   wide   256   1.0    1     8   2 hr  7
                8   thin    64   .25    1     8   2 hr  3   (n09-n16)
----------------------------------------------------------------------------
 Interactive   27   thin    64   .25   n/a   n/a  n/a   1,2 
 Only        
----------------------------------------------------------------------------
 Staff         16   thin    64   .25   n/a   n/a  n/a   29
 (reserved)
----------------------------------------------------------------------------
 Training      16   thin    64   .25   n/a   n/a  n/a   30
 (reserved)
----------------------------------------------------------------------------

Reserved Nodes: 
  fr1n01,n02,n03,n04
  fr2n02
  fr8n01,n03,n05,n07,n09,n11,n13,n15
  fr28n01, fr28n03, fr28n05
  fr29n01 - n16
  fr30n01 - n16
  fr13n01,n03,n05,n07,n09,n11,n13,n15
  fr14n01,n03,n05,n07,n09,n11,n13,n15

Node Sharing Among Classes:
  fr17n01-fr17n16 are shared between small_long (primary) and small_short
  fr28n07-fr28n15 are shared between bigmem (primary) and medium

Running interactive parallel jobs

Interactive parallel jobs are executed using IBM's Parallel Operating Environment (POE). This is IBM's environment for developing and running distributed-memory parallel Fortran, C, or C++ programs.

Executing Parallel Programs Using POE

In order to execute a parallel program, you need to:

  1. Set your path to include the necessary POE executables.
  2. Create a .rhosts file.
  3. Compile and link the program using one of the POE compile scripts.
  4. Set up your execution environment by setting the necessary POE environment variables.
  5. Create a host list file (optional).
  6. Invoke the executable.

1. Setting Your Path

The POE executables are usually located in the directory /usr/lpp/poe/bin. There may be symbolic links pointing to them from /usr/bin or some other location. This may vary from system to system.

To determine whether the POE executables are in your path, use a command such as "which poe" (csh) or "whence poe" (ksh). If the poe executable cannot be found, you will need to include the directory /usr/lpp/poe/bin in your path. For example:

   set path = ($path /usr/lpp/poe/bin)

This can be done by typing the command at the Unix prompt or by adding it to one of your startup files (.cshrc, .profile, or .login).

If the POE executables are found with the which or whence command, you do not need to change your path.

2. Creating a .rhosts File

Copy the .rhosts file supplied in the NWChem contrib directory (see Running NWChem below) to your home directory.
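
For example, using the contrib directory named under Running NWChem below (save any existing .rhosts first):

   if (-e ~/.rhosts) cp ~/.rhosts ~/.rhosts.bak      # keep a copy of any existing file
   cp /u/nichols/nwchem/contrib/ibm_sp@mhpcc/.rhosts ~/.rhosts
   chmod 600 ~/.rhosts                               # rsh will not honor a .rhosts writable by others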

3. Compiling and Linking a Parallel Program

POE includes three compile scripts that automatically link in the necessary POE libraries and then call the native IBM Fortran, C, or C++ compiler (xlf, cc, CC): mpxlf for Fortran, mpcc for C, and mpCC for C++. All three compile scripts accept -ip or -us as options. The -ip flag causes the IP CSS (communication subsystem) library to be statically bound with the executable; communication during execution will then use the Internet Protocol. The -us flag causes the US CSS library to be statically bound with the executable; this library uses the User Space protocol for dedicated use of the high-performance switch adapter. If neither flag is given, no CSS library is linked at compile time; instead, one is linked dynamically at run time, and the library chosen is determined by the MP_EUILIB environment variable.

Options include all valid options available with the native compiler (xlf, cc, CC). There are numerous compile options available with the IBM Fortran, C and C++ compilers, many of which can dramatically improve performance. Users are advised to consult the IBM documentation (e.g., man pages) for details.
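
A minimal sketch of the compile step (the source file names are illustrative; -O and -o are the usual native compiler options for optimization and naming the executable):

   mpxlf -us -O -o myprog myprog.f    # Fortran, User Space CSS bound statically
   mpcc  -ip -O -o myprog myprog.c    # C, IP CSS bound statically
   mpCC      -O -o myprog myprog.C    # C++, CSS library chosen at run time via MP_EUILIB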

4. Setting Environment Variables

There are many environment variables and command-line flags that you can set to influence the operation of the PE tools and the execution of parallel programs. A complete discussion and list of the PE environment variables can be found in the IBM AIX Parallel Environment Operation and Use manual. They are also reviewed (in less detail) in the POE man page.

Environment variables may be set on the shell command line or placed within your shell's "dot" files (.cshrc, .profile). Alternatively, they may be put into a file that is "sourced" prior to execution. The relevant variables are found in the accompanying .cshrc (which can be included in your own).

PE environment variables can be overridden by supplying the appropriate flag when the program executable is invoked. See the POE man page for details.
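
A minimal sketch of the kind of settings involved (csh syntax; the values shown are illustrative -- the accompanying .cshrc contains the recommended defaults):

   setenv MP_PROCS    4          # number of parallel tasks
   setenv MP_RMPOOL   0          # let the Resource Manager choose nodes from pool 0
   setenv MP_HOSTFILE NULL       # no explicit host list file
   setenv MP_EUILIB   ip         # communication subsystem: ip or us
   setenv MP_CPU_USE  multiple   # share the CPU with other jobs on the node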

5. The Host List File

The host list file is not required if you let the Resource Manager allocate the nodes your job uses (MP_HOSTFILE set to NULL or ""); this is the preferred approach. Sophisticated users who wish to choose nodes themselves should consult the appropriate documentation.
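
For reference, a host list file is simply a text file naming one node per line (node names as in the jm_status output above). As a sketch, an assumed file ~/host.list containing

   fr2n05.mhpcc.edu
   fr2n06.mhpcc.edu
   fr2n07.mhpcc.edu
   fr2n08.mhpcc.edu

would be selected with

   setenv MP_HOSTFILE ~/host.list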

6. Invoking the Executable

Once the environment is set up and the executables are created, invoking them is relatively easy.

For single-program multiple-data (SPMD) programs, simply issue the name of the executable, specifying any command-line flags that may be required. Command-line flags may be used to temporarily override any MP_* environment variables that have been set. See the POE man page for a complete listing of flags.
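
For example, an SPMD executable might be launched on nodes from the interactive pool like this (a.out stands for your program; -procs and -rmpool override MP_PROCS and MP_RMPOOL):

   a.out -procs 8 -rmpool 0        # the POE runtime parses the flags and starts 8 tasks
   poe a.out -procs 8 -rmpool 0    # equivalent, invoking poe explicitly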

Interactive jobs are straightforward if you use NFS input, output, and data files. If you wish to use local file systems (which is more efficient), it gets a bit more complicated. Perl scripts written by MHPCC staff (Lon Waters) make the interactive job scripts resemble LoadLeveler (batch queueing) scripts. These are discussed in the context of running NWChem after the LoadLeveler discussion below.

CPU and Communications Adapter Usage

POE jobs typically require both CPU and communications adapter resources. The manner in which a job uses these two resources affects both job performance and whether or not other users can run jobs on the same node.

CPU Usage: may be either "unique" or "multiple"

Communications Adapter Usage: may be either "shared" or "dedicated"

Best performance is usually obtained running with US communications when CPU use is "unique" and the adapter is "dedicated".

The best "good neighbor" policy is for all POE jobs to run with IP communications using the defaults of CPU use "multiple" and adapter "shared".

For the most part, users do not need to change the default settings for CPU usage and communications adapter usage. One instance where the default might be changed concerns the use of US communications in an interactive pool of nodes shared by many users. In this case, it might be considered "good neighbor" policy to set MP_CPU_USE to multiple so that other IP jobs can also run.
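
For example, "good neighbor" settings for a shared interactive pool might look like this (csh syntax; MP_ADAPTER_USE is the companion POE variable governing adapter usage -- check the POE man page for your installation):

   setenv MP_CPU_USE     multiple   # let other jobs use the CPU on the same node
   setenv MP_ADAPTER_USE shared     # share the switch adapter (IP communication)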

Running batch parallel jobs via LoadLeveler

LoadLeveler is IBM's batch job scheduling application. It provides the facility for building, submitting, and processing batch jobs within a network of machines, and it attempts to match job requirements with the best available machine resources. It can schedule serial or parallel (PVMe, PVM, MPL, MPI) jobs, and it provides a graphical user interface, xloadl, for job submission and monitoring.

The entire collection of machines available for LoadLeveler scheduling is called a "pool". Every machine in the pool has one or more LoadLeveler daemons running on it. There is one Central Manager machine for the LoadLeveler pool whose principal function is to coordinate LoadLeveler-related activities on all machines in the pool: maintaining status information on all machines and jobs, deciding where jobs should be run, and so on. Other machines in the pool may be used to submit jobs, execute jobs, or schedule submitted jobs (in cooperation with the Central Manager).

Every LoadLeveler job must be submitted through a job command file; LoadLeveler will not directly accept executable (a.out) files. Only after defining a job command file may a user submit the job for scheduling and execution. A job command file resembles a shell script in appearance: it contains LoadLeveler statement lines with LoadLeveler keywords that describe the job to run, comment lines (not executed), and, if desired, csh, ksh, or sh command lines. LoadLeveler keywords specify job information such as the executable name, class (queue), resource requirements, input/output files, number of processors required, job type (serial, parallel, pvm3), and so on.
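
A minimal sketch of what such a job command file might look like (the keyword values here are illustrative; the supplied LLnwchem script is the working example for NWChem):

   #!/bin/csh
   # @ job_type       = parallel
   # @ class          = medium
   # @ min_processors = 8
   # @ max_processors = 8
   # @ output         = myjob.out
   # @ error          = myjob.err
   # @ notification   = never
   # @ queue
   #
   # ordinary shell commands follow and run when the job is dispatched
   poe ./a.out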

With the LoadLeveler GUI (xloadl), jobs can be submitted by using the "File" menu on any of the 3 xloadl windows.

In addition to submitting the job, there are LoadLeveler commands available to monitor and change characteristics of the job. For example,

llstatus
Shows you information about the machines in the LoadLeveler pool.
llsubmit
Submits a LoadLeveler job command file for scheduling and execution.
llq
Gets information about jobs.
llcancel
Cancels a job.
llhold
Holds/releases a job.
llprio
Changes the priority of a job.
xloadl
Invokes LoadLeveler's graphical user interface.
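
Typical usage from the command line (the job name and id shown are illustrative; llq reports the id to use with llcancel):

   llsubmit LLnwchem        # submit a job command file
   llq                      # list queued and running jobs
   llstatus                 # show the state of machines in the pool
   llcancel fr2n07.123.0    # cancel a job by its id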

Running NWChem

The directory /u/nichols/nwchem/contrib/ibm_sp@mhpcc contains several files:

ibm_sp@mhpcc.html
this file.
LLnwchem
a LoadLeveler script for running NWChem in batch.
intnwchem
a script for running nwchem interactively (under development).
poesubmit
a perl script for submitting LoadLeveler jobs interactively (under development).
examples
a directory with example inputs and outputs for NWChem.
.rhosts
a file containing the 400 IP names associated with the MHPCC.
.cshrc
a file containing the appropriate environment variable defaults.

Append or copy the dot files (.rhosts and .cshrc) to the corresponding dot files in your login directory. Copy the scripts for running NWChem (LLnwchem and intnwchem) into your login directory.

Copy one of the example input files into your login directory (e.g., /u/nichols/nwchem/contrib/ibm_sp@mhpcc/examples/scf_h2o.nw - input for a conventional SCF calculation on water).
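
One way to do the whole setup from the command line (a sketch; adjust the file choices to taste):

   set contrib = /u/nichols/nwchem/contrib/ibm_sp@mhpcc
   cat $contrib/.cshrc  >> ~/.cshrc     # append the supplied environment defaults
   cat $contrib/.rhosts >> ~/.rhosts    # append the MHPCC node names
   cp  $contrib/LLnwchem $contrib/intnwchem ~/
   cp  $contrib/examples/scf_h2o.nw ~/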

Running NWChem in batch

Modify LLnwchem as appropriate (e.g., change "nichols" to your user id, etc.), then submit the job to the SP using LoadLeveler:
    "llsubmit LLnwchem"

Running NWChem interactively

1. Using NFS file systems

If your input, output, and data files all reside on NFS file systems, simply invoke the executable with the desired POE flags, for example:

  /u/nichols/nwchem/bin/SP1/nwchem scf_h2o.nw >& scf_h2o.out -rmpool 0 -procs 4

2. Using local file systems

This is under development and currently not supported (4/10/96). Modify intnwchem as appropriate (e.g., change "nichols" to your user id, etc.), then launch the job using the perl script poesubmit:

  "poesubmit intnwchem"

Troubleshooting

This section under construction.

NWChem support

Report NWChem problems, suggestions, feedback, etc., to nwchem-support@emsl.pnl.gov or use the WWW support form.

There is a mailing list for NWChem users that can be used for announcements and discussion among users. To subscribe, send email to majordomo@emsl.pnl.gov with the body "subscribe nwchem-users". You can do this with the following command:

     echo subscribe nwchem-users | mail majordomo@emsl.pnl.gov

To post a message to the mailing list send it to nwchem-users@emsl.pnl.gov.

Acknowledgements

Lots of help from MHPCC support, specifically Lon Waters.
Prepared by JA Nichols. Email: nwchem-support@emsl.pnl.gov.