UC/ANL TeraGrid Guide

The UC/ANL TeraGrid cluster includes 62 IBM Itanium 2 compute nodes and 96 IBM Xeon visualization nodes. Each node runs SuSE Linux Enterprise Server 8 ("SLES8"), and the nodes are interconnected by Myricom's Myrinet network. Jobs are scheduled and run using a combination of the Moab scheduler and the Torque (PBS) batch system.

Guide Topics:

  1. Help and Information
  2. Hardware
  3. Software
  4. System Access
  5. File Storage and Transfer
  6. File System Usage
  7. Porting MPI Programs
  8. Compiling: Numerical Libraries
  9. Running: Batch Jobs
  10. Debugging Programs
  11. Optimizing Programs
  12. References

Help and Information

Help from TeraGrid Consultants:

Please send email to help@teragrid.org, or call 866-336-2357 (toll free), to report problems or ask questions related to any TeraGrid systems or services.  Please see the TeraGrid Help Desk Web page for more information.

User News and Information:

Information about system downtimes, maintenance periods, upgrades, etc., is available at http://news.teragrid.org. Updates can be received via email or via the website. Users can manage their news subscriptions to receive information only on the platforms of interest.

Hardware

Compute resource hardware specification:

  • Number of compute nodes: 62 (2 processors per node)
  • CPU: Itanium 2 "Madison" (1.3 and 1.5 GHz)
  • Memory: 4 GB physical per node
  • Network: Myrinet for communication; Gigabit and Fast Ethernet for management
  • Disk space: ~60 GB local scratch
  • Operating system: Linux 2.4.21 (SuSE ia64 v8.0)

Visualization resource hardware specification:

  • Number of visualization nodes: 96 (2 processors per node)
  • CPU: Intel Xeon (2.40 GHz, 512 KB cache)
  • Memory: 4 GB physical per node
  • Network: Myrinet for communication; Gigabit and Fast Ethernet for management
  • Disk space: ~130 GB local scratch
  • Operating system: Linux 2.4.21 (SuSE x86 v8.0)

Software

The current software stack as of 3/1/2004 on the UC/ANL TeraGrid cluster includes:

  • SuSE Linux Enterprise Server V8 "SLES8" gold
  • Intel C++ 7.1 & 8.0
  • Intel Fortran 7.1 & 8.0
  • Intel MKL 5.2
  • Globus 2.4.3 plus patches
  • Condor-G
  • Mpich-gm 1.2.5..10
  • Torque 1.1.0
  • Moab 4.0.4
  • OpenSSH/SSL 3.1
  • Python w/XML and pyXML

The available grid services (inside Globus) are:

  • Globus 2.4.3 plus patches
  • gsi-openssh 1.9
  • gsi-ncftp 3.0.3
  • Condor-G (NMI 2.1 binary bundle) + gahp_server 6.4.7 rebuild
  • Myproxy 1.4

SoftEnv (http://www.teragrid.org/docs/softenv/), a system designed to make it easier for users to define what applications they want to use, is installed in /usr/local/apps/softenv-1.4.2.

System Access

To log in to the UC/ANL TeraGrid cluster, use the hostname tg-login.uc.teragrid.org. Connect with ssh in either of these forms:

ssh username@tg-login.uc.teragrid.org
or
ssh -l username tg-login.uc.teragrid.org

More information about Secure Shell (SSH) and NPACI security policy may be found at the NPACI Security site.

Computational grid users may also use X.509 certificates for authentication.

File Storage and Transfer

Each user has several areas of disk space for storing files for immediate use on the UC/ANL TeraGrid cluster. Some of these areas have limits on size or on how long files may remain on disk.

Filesystem (environment variable): characteristics

  • /home/<your_username> ($TG_CLUSTER_HOME): limited disk space per user; regular backups
  • /scratch/local/<your_username> ($TG_NODE_SCRATCH): node-local scratch space shared among all users; no backups; 7-day purge policy
  • /scratch/pvfs/<your_username> ($TG_CLUSTER_SCRATCH, $TG_CLUSTER_PVFS): global PVFS cluster scratch space shared among all users; no backups; 7-day purge policy
  • /disks/scratchgpfs1/<your_username> ($TG_CLUSTER_PFS, $TG_CLUSTER_GPFS): global GPFS high-performance cluster scratch space shared among all users; no backups; 7-day purge policy
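To check where these areas point in your own session, you can print the variables listed above. The function name `show_tg_storage` is hypothetical; it only reads the environment variables named in the table and reports any that are unset.

```shell
# Hypothetical helper: print each TeraGrid storage variable from the table above.
# It only reads the environment; unset variables are reported, never created.
show_tg_storage() {
    for var in TG_CLUSTER_HOME TG_NODE_SCRATCH TG_CLUSTER_SCRATCH \
               TG_CLUSTER_PFS TG_CLUSTER_PVFS TG_CLUSTER_GPFS; do
        # Indirect lookup that works in plain POSIX sh (no ${!var} needed).
        eval "val=\${$var-}"
        if [ -n "$val" ]; then
            printf '%s=%s\n' "$var" "$val"
        else
            printf '%s is not set\n' "$var"
        fi
    done
}
```

Run `show_tg_storage` after logging in; on a correctly configured account every line should show a path.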

There are several ways to transfer files to the TeraGrid. From Unix systems, secure copy (scp) is recommended. The following example copies a file from a local machine to the UC/ANL TeraGrid cluster (replace the file names, username, and directory with your own):

% scp original_file username@tg-login.uc.teragrid.org:/to_dir/copied_file
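To copy a whole directory rather than a single file, scp's `-r` option copies recursively. The wrapper below is a hypothetical convenience function, not an installed command; `TG_USER` is an assumed variable holding your TeraGrid username (it falls back to `$USER`).

```shell
# Hypothetical wrapper around scp -r (recursive copy) to the UC/ANL login node.
# ASSUMPTION: TG_USER holds your TeraGrid username; it falls back to $USER.
# usage: tg_put REMOTE_DIR SOURCE...
tg_put() {
    if [ "$#" -lt 2 ]; then
        echo "usage: tg_put REMOTE_DIR SOURCE..." >&2
        return 2
    fi
    dest=$1
    shift
    # -r copies directories recursively; plain files are copied as usual.
    scp -r "$@" "${TG_USER:-$USER}@tg-login.uc.teragrid.org:$dest"
}
```

For example, `tg_put /to_dir my_project` copies the local directory `my_project` into `/to_dir` on the cluster.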

To use secure copy from Windows platforms, download a copy of WinSCP (freeware). Other software packages for file transfer from Windows platforms are listed at the SDSC Security site.

TeraGrid users may also move entire directory structures from one system to another via the UC/ANL archival storage system. The following example illustrates a directory move from a local machine to the UC/ANL TeraGrid cluster via HPSS:

  1. create a copy of the local directory with tar (the time to do this depends upon the sizes and number of files, etc.):
    % tar -cf name-of-your-tar-file .

  2. compress tar file with gzip (this step may not be necessary if your tar file is small):
    % gzip name-of-your-tar-file

    This creates a compressed tar file with the name name-of-your-tar-file.gz

  3. access HPSS from local machine with HSI (client binaries are available for download at the HSI site):
    % hsi

  4. store the compressed file in HPSS:
    % put name-of-your-tar-file.gz

  5. log in to SDSC's TeraGrid cluster and access HPSS with HSI:
    % ssh tg-login.sdsc.teragrid.org
    % hsi

  6. download compressed tar file from HPSS:
    % get name-of-your-tar-file.gz

  7. uncompress tar file:
    % gunzip name-of-your-tar-file.gz

  8. move tar file to the desired location on SDSC's TeraGrid cluster and untar:
    % tar -xf name-of-your-tar-file
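Steps 1-2 and 7-8 can each be collapsed into a single command, because GNU tar can compress and decompress with gzip directly through its `-z` option. The two functions below are a minimal sketch with hypothetical names; the HSI `put` and `get` steps in between are unchanged. Writing the archive outside the directory being archived also keeps tar from trying to add its own output file.

```shell
# Hypothetical helpers for the HPSS recipe above (GNU tar assumed for -z).
# pack_dir SRC_DIR ARCHIVE: write ARCHIVE (.tar.gz) holding SRC_DIR's contents.
pack_dir() {
    # -C changes into the source directory first, so archive paths are relative.
    tar -czf "$2" -C "$1" .
}

# unpack_dir ARCHIVE DEST_DIR: extract ARCHIVE into DEST_DIR (created if needed).
unpack_dir() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}
```

On the local machine, `pack_dir . ../name-of-your-tar-file.gz` replaces steps 1-2; on the cluster, after the HSI `get`, `unpack_dir name-of-your-tar-file.gz target_dir` replaces steps 7-8.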

More detailed information on HPSS commands can be found at: http://www.npaci.edu/HPSS.

The Storage Resource Broker, a data management tool, may also be used to store large TeraGrid data sets across distributed, heterogeneous storage systems. More information is available in the "TeraGrid Archival and Data Services" document.

File System Usage

PVFS is designed for high-performance parallel I/O, not as a general-purpose file system. Please do not untar application source code into it, build or compile applications in it, execute applications from it, or use it to store and process small files. The environment variables $TG_CLUSTER_SCRATCH and $TG_CLUSTER_PVFS both point to PVFS.

GPFS is designed for high-performance parallel I/O and also behaves as a general-purpose file system. Our GPFS scratch space may be used for temporary storage of software and data. The environment variables $TG_CLUSTER_PFS and $TG_CLUSTER_GPFS both point to GPFS scratch.

If you need a general-purpose file system, please use your home directory ($TG_CLUSTER_HOME), node-local scratch ($TG_NODE_SCRATCH), or cluster-wide GPFS scratch ($TG_CLUSTER_GPFS).
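Because PVFS is unsuitable for builds and small files, a build script can stop early when it is run inside PVFS scratch. The guard below is a hypothetical sketch: it relies only on the $TG_CLUSTER_PVFS variable described above and does a plain path-prefix check, so it does not resolve symbolic links.

```shell
# Hypothetical guard: succeed (return 0) if DIR lies inside $TG_CLUSTER_PVFS.
# A plain string-prefix test; symbolic links are not resolved.
in_pvfs() {
    dir=${1:-$PWD}
    [ -n "${TG_CLUSTER_PVFS-}" ] || return 1
    case "$dir/" in
        "${TG_CLUSTER_PVFS%/}"/*) return 0 ;;
    esac
    return 1
}

# Call at the top of a build script: fail if the current directory is on PVFS.
refuse_pvfs_build() {
    if in_pvfs "$PWD"; then
        echo "Do not build on PVFS; use \$TG_CLUSTER_HOME or \$TG_CLUSTER_GPFS." >&2
        return 1
    fi
    return 0
}
```

A build script could begin with `refuse_pvfs_build || exit 1`.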

Porting MPI Programs

If you have an existing MPI-based parallel application already running on a distributed-memory platform:

  • Copy your application files to the SDSC TeraGrid local disk space: either your $HOME directory or the $WORK area associated with your user account, /work/username, where "username" is your NPACI user name. If this directory does not exist, you must create it yourself. Note that the /work area is purged regularly and is not backed up; store files you need long term in your $HOME directory or on HPSS.

  • Recompile your source code for the TeraGrid system, for example with:
      mpicc [options] file.c	(C and C++)
      mpif90 [options] file.f	(fixed-form Fortran source code)
    

The following compilers are available on the UC/ANL TeraGrid cluster:

  • Intel: ecc (C, C++), efc (Fortran 77/90). Default compiler on the UC/ANL TeraGrid system; run "ecc -help" (or "efc -help") for usage information.
  • GNU: gcc (C), g++ (C++), g77 (Fortran 77). Code built with these compilers generally runs slower than code built with the Intel compilers.
  • mpich-gm: mpicc (MPI C, C++), mpif90 (MPI Fortran 77/90). Default MPI compiler; uses the Myrinet network for communication.
  • mpich-tcp: mpicc (MPI C, C++), mpif90 (MPI Fortran 77/90). Installed in the /usr/local/apps/mpich-tcp directory; recommended when you need to debug the communication (network) part of a program.
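To build with the TCP version of MPICH instead of the default Myrinet (mpich-gm) one, you can put its wrappers first on your PATH for the current shell. This is a sketch: it assumes the wrappers live in a `bin` subdirectory of /usr/local/apps/mpich-tcp, which the list above does not state, and the function name is hypothetical.

```shell
# Hypothetical helper: make the mpich-tcp wrappers (mpicc, mpif90, mpirun)
# take precedence in this shell.
# ASSUMPTION: they are installed under $MPICH_TCP_DIR/bin.
MPICH_TCP_DIR=${MPICH_TCP_DIR:-/usr/local/apps/mpich-tcp}

use_mpich_tcp() {
    # Prepend, so the mpich-tcp wrappers are found before the mpich-gm ones.
    PATH="$MPICH_TCP_DIR/bin:$PATH"
    export PATH
}
```

After `use_mpich_tcp`, `which mpicc` should point into /usr/local/apps/mpich-tcp; start a new login shell to return to mpich-gm.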

Compiling: Numerical Libraries

Intel's Math Kernel Library (MKL) contains most of the LAPACK and FFT routines. Users are encouraged to call these routines where applicable instead of writing their own, because they generally run faster and have been tested for accuracy and correctness. The following math libraries are available on the UC/ANL TeraGrid cluster:

  • Intel Math Kernel Library: contains BLAS, LAPACK, and 1D and 2D FFT routines. It is installed in the /usr/local/apps/intel/mkl/lib/64 directory. Link it into a Fortran program with "-L/usr/local/apps/intel/mkl/lib/64 -lmkl_lapack -lmkl_ipf -lguide -lpthread".
  • Netlib LAPACK: built with the Intel compilers. Link it with "-L/users/dchoi/LAPACK -llapack -lblas" (LAPACK before BLAS, because LAPACK calls BLAS routines).
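The MKL link line above can be kept in a variable so every build uses the same libraries. A minimal sketch, assuming the Intel Fortran driver `efc` from the compiler list; the function name and the `-O2` level are illustrative choices, not site requirements.

```shell
# Link flags for Intel MKL, copied from the text above.
MKL_LIBDIR=/usr/local/apps/intel/mkl/lib/64
MKL_LIBS="-L$MKL_LIBDIR -lmkl_lapack -lmkl_ipf -lguide -lpthread"

# Hypothetical helper: compile Fortran sources and link them against MKL.
# usage: build_with_mkl OUTPUT SOURCE...
build_with_mkl() {
    out=$1
    shift
    # Libraries go after the sources so the linker can resolve their references.
    # $MKL_LIBS is intentionally unquoted so it splits into separate flags.
    efc -O2 -o "$out" "$@" $MKL_LIBS
}
```

For example, `build_with_mkl solver solver.f` produces the executable `solver`.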

HDF software* (version; C compiler; Fortran compiler; software key):

  • HDF4 4.2r0: icc, ifort, +hdf4
  • HDF5 1.6.2: icc, ifort, +hdf5
  • HDF5 1.6.2: gcc, ifort, +phf5
  • PHDF5 1.6.2: icc, ifort, +phdf5
PHDF5 1.6.2 icc ifort +phdf5

*Remarks:
  1. No parallel file system.
  2. HDF5 requires a Fortran 90 compiler and has not been tested yet.

Running: Batch Jobs - PBS

PBS is a batch-processing system; jobs submitted to it are scheduled by the Moab scheduler to maximize processing throughput. A small pool of development nodes is available from 9 AM to 5 PM, Monday through Friday. To use these nodes, PBS jobs must request 2 hours or less of wall time and at most 8 nodes.

All submitted jobs must specify a project, either with the '-A <projectid>' command-line argument or with the '#PBS -A <projectid>' directive in the PBS script. To list your projects, use the 'tgprojects' command.

Some PBS commands and their functions are as follows:

  • Submit a batch job to a queue: qsub [list of qsub options] script_name ("man qsub" for more options)
  • Start an interactive session on ia64 compute nodes: qsub -I -A <your_project> -V -l walltime=00:30:00,nodes=2:ia64-compute:ppn=2
    This places you on one of the ia64 compute nodes. The interactive nodes are released when you exit the node or when the wall time limit (30 minutes in this example) is reached.
  • Start an interactive session on ia32 compute nodes: qsub -I -A <your_project> -V -l walltime=00:30:00,nodes=2:ia32-compute:ppn=2
    This places you on one of the ia32 compute nodes. The interactive nodes are released when you exit the node or when the wall time limit (30 minutes in this example) is reached.
  • Display the status of PBS batch jobs: qstat -a ("man qstat" for more options)
  • Delete (cancel) a queued job: qdel PBS_JOBID
  • Show all running jobs on the system: qstat -r
  • Show detailed information about a job: qstat -f PBS_JOBID
  • Show all queues on the system: qstat -q
  • Show queue limits for all queues: qstat -Q
  • Show summary information about the server: qstat -B
  • Show node status: pbsnodes -a
  • Suppress pre- and post-job output: touch $HOME/.pbsquiet

The following is an example of a PBS batch script, followed by a line-by-line explanation:

#!/bin/csh
#PBS -q dque
#PBS -N my_job
#PBS -A <projectid>
#PBS -l nodes=10:ia64-compute:ppn=2
#PBS -l walltime=0:50:00
#PBS -o file.out
#PBS -e file.err
#PBS -V
cd /work/username
mpirun -v -machinefile $PBS_NODEFILE -np 20 ./a.out

  1. run the script with the C shell (csh)
  2. use the queue called "dque"
  3. set the job name to "my_job"
  4. charge the job to project <projectid> (required for all jobs; see 'tgprojects')
  5. request 10 nodes with 2 processors per node
  6. reserve the requested nodes for 50 minutes
  7. write standard output to a file called "file.out"
  8. write standard error to a file called "file.err"
  9. export all my environment variables to the job
  10. change to my working directory
  11. run my parallel job as 20 processes (10 nodes x 2 processors per node) on the nodes PBS lists in $PBS_NODEFILE
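The process count passed to mpirun with `-np` should equal the number of nodes times the processors per node (10 x 2 = 20 in the example). One way to keep these consistent is to generate the script from those two numbers. The function below is a hypothetical sketch that writes a script shaped like the example, including the project directive that every job needs; the queue, job name, output files, and executable are fixed as in the example.

```shell
# Hypothetical generator for a csh PBS script like the example above.
# usage: make_pbs_script NODES PPN WALLTIME PROJECT > my_job.csh
make_pbs_script() {
    nodes=$1 ppn=$2 walltime=$3 project=$4
    np=$((nodes * ppn))   # total MPI processes = nodes x processors per node
    # \$PBS_NODEFILE is escaped so the literal variable name ends up in the script.
    cat <<EOF
#!/bin/csh
#PBS -q dque
#PBS -N my_job
#PBS -A $project
#PBS -l nodes=$nodes:ia64-compute:ppn=$ppn
#PBS -l walltime=$walltime
#PBS -o file.out
#PBS -e file.err
#PBS -V
cd /work/username
mpirun -v -machinefile \$PBS_NODEFILE -np $np ./a.out
EOF
}
```

For example, `make_pbs_script 10 2 0:50:00 <projectid> > my_job.csh` followed by `qsub my_job.csh`.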

Batch Queues

Currently, only one queue, "dque", is available for all jobs.

Debugging Programs

TotalView will be available for serial and parallel code debugging on this system. In the interim, use GNU gdb or ddd.

To prepare your program for debugging with TotalView (when it becomes available) or with gdb, compile with the -g option. For example:

    mpicc -g do_mpi.c -o do_mpi

Documentation for TotalView is available at http://www.etnus.com/Products/TotalView/index.html.
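Until TotalView is installed, gdb can print a backtrace from a crashing program without an interactive session. A minimal sketch: the wrapper name is hypothetical, while `-batch`, `-ex`, and `--args` are standard gdb options.

```shell
# Hypothetical wrapper: run PROGRAM under gdb non-interactively and print a
# backtrace when it stops (for example, on a segmentation fault).
# usage: gdb_backtrace ./program [program arguments...]
gdb_backtrace() {
    gdb -batch -ex run -ex bt --args "$@"
}
```

Compile with -g first so the backtrace shows source files and line numbers. This approach suits a serial program or a single process; following all ranks of an MPI job is what a parallel debugger such as TotalView is for.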

Optimizing Programs

Timing

The following routines and commands can measure the run time of a whole program or of segments of code: gettimeofday and getrusage (C), cpu_time (Fortran), and times (OS).
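For whole-program timing from the shell, `time` reports elapsed, user, and system time. The sketch below instead records wall-clock seconds around a command, which is handy inside a batch script. It assumes a `date` that supports the `+%s` format (GNU and BSD date do), and the function name is hypothetical.

```shell
# Hypothetical helper: run a command and report its wall-clock time in whole
# seconds on stderr. Returns the command's own exit status.
# ASSUMPTION: date supports the +%s (seconds since the epoch) format.
wall_time() {
    start=$(date +%s)
    "$@"
    rc=$?
    end=$(date +%s)
    echo "elapsed: $((end - start)) s" >&2
    return $rc
}
```

In a PBS script, for example, `wall_time mpirun -machinefile $PBS_NODEFILE -np 20 ./a.out` writes the elapsed time to the job's standard error file.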

Profiling

GNU gprof gives a quick function-level profile. Compile with "-qp -g" when using the Intel compilers, then run the binary, which writes the profile data to "gmon.out". Finally, run gprof on the binary and gmon.out to view the report.
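The gprof steps above can be collected into one script. A sketch assuming the Intel C driver `icc` (substitute `ecc` or `efc` as appropriate); the function name is hypothetical, `-qp -g` are the flags named above, and gprof writes its report to standard output, which is saved to a text file here.

```shell
# Hypothetical helper: build with profiling, run once, and save a gprof report.
# usage: profile_c SOURCE.c PROGRAM [program arguments...]
profile_c() {
    src=$1
    prog=$2
    shift 2
    icc -qp -g -O2 -o "$prog" "$src" || return
    # Running the instrumented binary writes gmon.out in the current directory.
    "./$prog" "$@" || return
    gprof "./$prog" gmon.out > "$prog.profile.txt"
}
```

For example, `profile_c mycode.c mycode` leaves the report in `mycode.profile.txt`.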

Intel compiler optimization and other flags:

  • -O0: disables all optimization
  • -O1: reorders functions; inlines functions marked with the _inline keyword
  • -O2: -O1 plus inline expansion, constant propagation, predication, speculation, and software pipelining
  • -O3: -O2 plus loop unrolling and memory-hierarchy optimizations (data prefetching, loop and register blocking, linear loop transformations, scalar replacement)
  • -Op: improves the predictability of applications by enabling only those optimizations that preserve numerical accuracy and the floating-point state of the machine
  • -Oi: enables inline expansion of intrinsic or standard library functions
  • -Ob0: disables inline expansion
  • -Ow: tells the compiler that no aliasing occurs within function bodies but might occur across function calls
  • -mp: maintains floating-point precision (disables some optimizations)

 

References

 


