Building Software on the ACCRE Cluster
CPU Architecture Optimization
Compute nodes on the ACCRE cluster are heterogeneous in terms of CPU architecture (and also RAM and local disk space). Some compute nodes contain processors that are 4-5 years old, while others use processors that are less than a year old. The same goes for cluster gateways.
This presents some challenges when it comes to building software, as newer processors can use instructions that are unsupported by older processors. As a result, programs that are built from source code on a newer gateway may not run successfully on a compute node or gateway with an older processor. “Illegal Instruction” error messages are likely to occur in this scenario.
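To see why a binary fails on an older node, it can help to compare the instruction-set flags the kernel reports on the build machine and on the machine where the program crashed. The following is a minimal sketch, assuming a Linux node where /proc/cpuinfo is readable; the function name cpu_has_flag is made up for illustration and is not an ACCRE-provided tool.

```shell
#!/bin/bash
# cpu_has_flag FLAG [CPUINFO_FILE]
# Succeeds (exit status 0) if FLAG appears in the first "flags" line of
# the cpuinfo file. The file defaults to /proc/cpuinfo (Linux-specific).
cpu_has_flag() {
    local flag="$1"
    local cpuinfo="${2:-/proc/cpuinfo}"
    # Take the first "flags" line, split it into one word per line, and
    # require an exact match so that "avx" does not match "avx2".
    grep -m1 '^flags' "$cpuinfo" | tr ' \t' '\n\n' | grep -qx -- "$flag"
}
```

For example, running `cpu_has_flag avx2 && echo "AVX2 supported"` on both the build node and the failing node shows whether AVX2 is the instruction set that differs between them.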
In order to allow users to build software for the different CPU architectures on the cluster, we provide one node per architecture in the debug partition. You can request a specific architecture from SLURM by using the following directive:
The architectures shown above are listed in order from oldest to newest. If you would like to build software interactively on the cluster, you can use an salloc command like the following to request a node with a specific CPU architecture, which determines the instruction set your program is built for:
salloc --partition=debug --constraint=westmere --ntasks=1 --mem=2G --time=30:00
If you know the exact commands needed to build the software, you can also submit a batch job with the
If you would like the ability to run your program on any compute node on the cluster, you will want to build on the oldest CPU microarchitecture available, which as of August 2018 is Westmere.
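A batch build that targets the oldest architecture might look like the sketch below. It reuses the options from the salloc example above as #SBATCH directives; the build commands and the install path are hypothetical placeholders, so substitute the steps your software actually needs.

```shell
#!/bin/bash
#SBATCH --partition=debug
#SBATCH --constraint=westmere
#SBATCH --ntasks=1
#SBATCH --mem=2G
#SBATCH --time=30:00

# Record which node and CPU the build ran on, for later troubleshooting.
echo "Building on $(hostname)"
grep -m1 'model name' /proc/cpuinfo

# Hypothetical build steps; replace with the commands for your software.
# ./configure --prefix="$HOME/apps/myapp-westmere"
# make && make install
```

Keeping the architecture name in the install path makes it easier to tell later which node type a given build was produced on.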
If performance is very important to you (especially through vectorization instructions like AVX2), you can try building on a more recent architecture (e.g. Haswell supports AVX2). Just be sure to request only nodes with Haswell or Skylake processors for jobs that use this program. Note that this may lead to longer queue times, as you are shrinking the pool of resources eligible to run your job.
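As a sketch of how a job could be restricted to the newer nodes, the batch header below uses SLURM's OR operator in the constraint. The feature names haswell and skylake are assumptions that mirror the lowercase westmere used in the salloc example, and the program name is hypothetical.

```shell
#!/bin/bash
# ASSUMPTION: "haswell" and "skylake" are the feature names defined on the
# cluster. In SLURM, "|" between features means any one of them is acceptable.
#SBATCH --constraint="haswell|skylake"
#SBATCH --ntasks=1
#SBATCH --mem=2G
#SBATCH --time=30:00

echo "Running on $(hostname)"

# Hypothetical program that was built on a Haswell node:
# ./my_program
```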
Another important note is that all ACCRE-managed software made available through Lmod is built for every CPU architecture on the cluster. Specifically, for each Lmod package, we build a version of the package for each CPU architecture, and the appropriate version is chosen at runtime (based on what node your job runs on). If you are interested in designing a similar setup for your own application, feel free to open a helpdesk ticket with us.
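For your own application, one possible approach is a small wrapper that picks a build based on the CPU flags of the node it lands on. This is only a sketch and not a description of how the ACCRE Lmod setup works: the function names, the directory layout under APP_ROOT, and the mapping from flags to architecture names are all assumptions made for illustration.

```shell
#!/bin/bash
# detect_arch [CPUINFO_FILE]
# Prints a coarse architecture label based on CPU feature flags.
# ASSUMPTION: avx512f implies a Skylake-class server CPU, avx2 implies
# Haswell or newer, and everything else falls back to the Westmere build.
detect_arch() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    local flags
    # Pad with spaces so each flag can be matched as a whole word.
    flags=" $(grep -m1 '^flags' "$cpuinfo" | cut -d: -f2 | tr '\t' ' ') "
    case "$flags" in
        *" avx512f "*) echo skylake ;;
        *" avx2 "*)    echo haswell ;;
        *)             echo westmere ;;
    esac
}

# Hypothetical layout: one build of the application per architecture,
# e.g. $APP_ROOT/westmere/bin/myapp and $APP_ROOT/haswell/bin/myapp.
APP_ROOT="${APP_ROOT:-$HOME/apps/myapp}"

# Replace the wrapper process with the build matching this node.
run_myapp() {
    exec "$APP_ROOT/$(detect_arch)/bin/myapp" "$@"
}
```

Falling back to the oldest build when no newer flag is found keeps the wrapper safe on nodes it does not recognize, at the cost of some performance there.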
Transition to CentOS 7
If you have built your own software to run on the cluster, you will likely need to re-build it when transitioning to the CentOS 7 environment. ACCRE staff are available to assist if needed.
The following tips may make this process easier. They also apply to modules you have installed for Python, Perl, R, and Ruby:
- Be aware of what machine you are logged into when re-building your application (see the “CPU Architecture Optimization” section above for more details).
- Create a new installation directory for your CentOS 7 version of the application, and don’t delete your old installation directory until you have verified that everything works as expected in CentOS 7. This will enable you to continue running production jobs in CentOS 6 while building your updated software in CentOS 7.
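One way to keep the two installations apart is to derive the install directory from the OS release of the machine you are building on. The sketch below assumes a Red Hat style release file at /etc/redhat-release; the function names and the $HOME/apps layout are hypothetical.

```shell
#!/bin/bash
# os_major [RELEASE_FILE]
# Prints the major OS version (e.g. 6 or 7) parsed from a release line
# such as "CentOS Linux release 7.5.1804 (Core)".
os_major() {
    local release_file="${1:-/etc/redhat-release}"
    sed -n 's/.*release \([0-9][0-9]*\).*/\1/p' "$release_file" | head -n1
}

# install_prefix APP [RELEASE_FILE]
# Hypothetical layout: each OS build gets its own directory, so the
# CentOS 6 install stays untouched while the CentOS 7 build is tested.
install_prefix() {
    local app="$1"
    local release_file="${2:-/etc/redhat-release}"
    echo "$HOME/apps/${app}-el$(os_major "$release_file")"
}
```

On a CentOS 7 machine, `install_prefix myapp` would then point at a directory ending in `myapp-el7`, which can be passed to your build as its installation prefix.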