Building Software on the ACCRE Cluster
Compute nodes on the ACCRE cluster are heterogeneous in terms of CPU architecture (and also RAM and local disk space). Some compute nodes contain processors that are 4-5 years old, while others use processors that are less than a year old. The same goes for cluster gateways.
This presents some challenges when it comes to building software, as newer processors can use instructions that are unsupported by older processors. As a result, programs that are built from source code on a newer gateway may not run successfully on a compute node or gateway with an older processor. “Illegal Instruction” error messages are likely to occur in this scenario. This is a typical error when running locally built R libraries.
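To see which instruction sets a given gateway or compute node supports, one option is to inspect the "flags" line of /proc/cpuinfo. The sketch below assumes a Linux x86 node. The function name list_simd_flags and the short list of flags it checks are illustrative choices, not part of any ACCRE tooling.

```shell
# Print which of a few common SIMD instruction-set flags the CPU advertises.
# Reads /proc/cpuinfo by default; a different file can be passed for testing.
list_simd_flags() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Split the first "flags" line into one token per line.
    tokens="$(grep -m1 '^flags' "$cpuinfo" | tr ' \t' '\n\n')"
    found=""
    for flag in sse4_2 avx avx2 avx512f; do
        # -x requires a whole-line match, so "avx" does not match "avx2".
        if printf '%s\n' "$tokens" | grep -qx -- "$flag"; then
            found="${found:+$found }$flag"
        fi
    done
    echo "$found"
}
```

Running list_simd_flags on the machine where you built your software and again on the node where it fails can show whether the build host advertises a flag (for example avx2) that the failing node lacks.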
In order to allow users to build software for the different CPU architectures on the cluster, we provide one node per architecture in the debug partition. You can request a specific architecture from SLURM by using the following directive:
#SBATCH --constraint=westmere|sandybridge|haswell|skylake
This line means that only nodes with at least one of the specified features will be used. The architectures shown above are listed in order from oldest to newest. Currently all of the ACCRE shared gateway machines (gw341 to gw346) are sandybridge, so if you build your application (for example, R libraries) on the shared gateways, you can exclude the westmere nodes:
#SBATCH --constraint=sandybridge|haswell|skylake
In the above example, SLURM will place the job on nodes that satisfy any one of the listed features. For an array job, this means that different subjobs may land on nodes with different CPU architectures. If you want all subjobs in the array job to use the same CPU architecture, you can add this line to the SLURM script:
#SBATCH --constraint=[westmere|sandybridge|haswell|skylake]
This line means only one of the options should be used for all allocated nodes.
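As a rough sketch of how this directive fits into a complete array job, the script below logs the node name and CPU model for each subjob, so you can check afterwards which architecture each one actually landed on. The job name, the array range, and the helper function report_node are assumptions made for illustration. SLURM_ARRAY_TASK_ID and SLURMD_NODENAME are environment variables that SLURM sets inside a running job.

```shell
#!/bin/bash
#SBATCH --job-name=arch-check
#SBATCH --array=1-4
#SBATCH --constraint=[westmere|sandybridge|haswell|skylake]

# Print one line describing where this array task is running.
# Falls back to placeholder values when run outside of SLURM.
report_node() {
    task="${SLURM_ARRAY_TASK_ID:-none}"
    node="${SLURMD_NODENAME:-$(uname -n)}"
    # First "model name" entry from /proc/cpuinfo, if that file exists.
    model="$(sed -n 's/^model name[[:space:]]*:[[:space:]]*//p' /proc/cpuinfo 2>/dev/null | head -n 1)"
    echo "task=${task} node=${node} cpu=${model:-unknown}"
}

report_node
```

Comparing the cpu= field across the output files of the subjobs is a quick way to confirm that the constraint behaved as you intended.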
If you would like the ability to run your program on any compute node on the cluster, you will want to build on the oldest CPU microarchitecture available, which as of August 2018 is Westmere.
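One way to do this is to submit the build itself as a job to the debug partition with a Westmere constraint. The sketch below is a minimal example under that assumption. The helper build_and_tag and the file name build-info.txt are hypothetical, and "make" stands in for whatever build command your software uses. Recording the build host makes it easier to tell later which architecture a binary was built on.

```shell
#!/bin/bash
#SBATCH --partition=debug
#SBATCH --constraint=westmere

# Run a build command and, if it succeeds, record which node it ran on.
# Usage: build_and_tag <tagfile> <build command...>
build_and_tag() {
    tagfile="$1"
    shift
    # Stop without writing the tag file if the build command fails.
    "$@" || return 1
    echo "built_on=$(uname -n) date=$(date +%Y-%m-%d)" > "$tagfile"
}

# Hypothetical usage inside the job script:
# build_and_tag build-info.txt make
```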
If performance is very important to you (especially through vectorization instructions like AVX2), you can try building on a more recent architecture (e.g. Haswell supports AVX2). Just be sure to then only request nodes with Haswell or Skylake processors for jobs making use of this program. Note that this may lead to longer queue times as you are effectively shrinking the pool of resources eligible to run your job.
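For a program built with AVX2, a job script might combine the narrower constraint with a small guard, so that a misplaced job stops with a readable message instead of an "Illegal Instruction" crash. This is only a sketch. The function run_if_cpu_has, the CPUINFO override variable, and the program name in the usage comment are all hypothetical.

```shell
#!/bin/bash
#SBATCH --constraint=haswell|skylake

# Run a command only if the node's CPU advertises the required flag.
# CPUINFO can be set to another file for testing; it defaults to /proc/cpuinfo.
# Usage: run_if_cpu_has <flag> <command...>
run_if_cpu_has() {
    required="$1"
    shift
    if grep -m1 '^flags' "${CPUINFO:-/proc/cpuinfo}" 2>/dev/null \
        | tr ' \t' '\n\n' | grep -qx -- "$required"; then
        "$@"
    else
        echo "error: CPU on $(uname -n) lacks '$required'; not running: $*" >&2
        return 1
    fi
}

# Hypothetical usage inside the job script:
# run_if_cpu_has avx2 ./my_avx2_program input.dat
```

The guard does not replace the --constraint line. It only reports the problem clearly if the script is ever run on a node that lacks the required instructions.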