ACCRE CentOS 7 Environment and Changelog
In summer 2018, ACCRE will transition the cluster operating system from CentOS 6 to CentOS 7. During this process, two parallel environments will be available: one running CentOS 6 and the other running CentOS 7. Over the course of several months, compute resources will gradually move from the old environment to the new one as ACCRE staff work with each group on its transition.
If users/groups rely on packages (e.g. R and Python packages) that were built and installed into their own private ACCRE directories, these will likely need to be rebuilt for the new environment. ACCRE staff are available to assist. Alternatively, using Singularity containers will eliminate the need to rebuild packages.
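While both environments are in use, one way to keep privately built packages from colliding is to keep a separate directory per OS release. The following is only a minimal sketch: it assumes the usual `/etc/redhat-release` wording ("CentOS release 6.x (Final)" on CentOS 6, "CentOS Linux release 7.x (Core)" on CentOS 7), and the `~/pkgs/centosN` layout and the helper names are hypothetical, not an ACCRE convention.

```python
import re
from pathlib import Path


def centos_major(release_text):
    """Return the major release number found in a redhat-release string, or None."""
    match = re.search(r"release\s+(\d+)", release_text)
    return int(match.group(1)) if match else None


def private_lib_dir(release_text, base="~/pkgs"):
    """Build a per-OS directory name such as ~/pkgs/centos7 (hypothetical layout)."""
    major = centos_major(release_text)
    if major is None:
        raise ValueError("could not determine the CentOS major release")
    return f"{base}/centos{major}"


if __name__ == "__main__":
    # Only runs on systems that actually provide this file.
    release_file = Path("/etc/redhat-release")
    if release_file.exists():
        print(private_lib_dir(release_file.read_text()))
```

The resulting path could then be used as the install target when rebuilding R or Python packages, so that CentOS 6 and CentOS 7 builds never share a directory.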
To access the CentOS 7 environment, type the following command:
Your ACCRE credentials will work in both environments. Note, however, that there is a new process for password changes (see below), and if you change your password in the new environment it will not be updated in the old environment.
We have used this opportunity to incorporate numerous improvements to the ACCRE environment, which are detailed below. These changes are not tied to the operating system upgrade, per se, but were less disruptive to incorporate into the new environment.
- Move SLURM state files to dedicated solid state drives for improved scheduler responsiveness [1]
- Upgrade SLURM to latest stable release
- pkginfo – all ACCRE-managed packages now available exclusively through Lmod
- Automatically load the optimized version of each package via Lmod, leading to better performance, especially for vectorized applications [2]
- Transition authentication/authorization system to internal LDAP system; password changes may be performed from anywhere on the cluster with the “accre_password” command (users no longer need to log in to the auth server to update their passwords)
- Incorporate memory (i.e. RAM) limitations for users via control groups on shared gateway machines [3]
- Make compute nodes accessible via ssh to users who have a job running on that node
- Expand debug queue (one node for each CPU architecture)
- Enhance SSH security
[1] One recurring problem on the existing cluster is slow response times (or eventually socket timeout errors) from SLURM commands like sbatch and squeue. We have moved the storage location of SLURM job state files from GPFS to dedicated solid state drives, which offer superior I/O performance. We expect this change to improve SLURM responsiveness considerably.
[2] Previously, Lmod packages were built for older CPU architectures to ensure compatibility with the multiple generations of CPUs available on the cluster. Lmod is now CPU architecture aware and will load the version of a package that was built with instructions optimized for the architecture of the node it is loaded on.
[3] Occasionally a user will exhaust the RAM on a gateway, which affects other users with sessions on that gateway as well as users attempting to log in to it.
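The architecture-aware loading described in note 2 can be pictured as choosing the most optimized build whose required CPU features are all present on the current node. The sketch below is illustrative only and is not how Lmod itself is implemented: the build names, the flag sets in `BUILDS`, and both helper functions are assumptions made for the example.

```python
# Hypothetical build names, ordered from most to least optimized, together
# with the CPU flags each build is assumed to require.
BUILDS = [
    ("avx512", {"avx512f"}),
    ("avx2", {"avx2", "fma"}),
    ("avx", {"avx"}),
    ("generic", set()),
]


def parse_cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def pick_build(cpu_flags, builds=BUILDS):
    """Return the first (most optimized) build whose required flags are all present."""
    for name, required in builds:
        if required <= cpu_flags:
            return name
    raise LookupError("no compatible build found")
```

On a Linux node, the flags could be read with `parse_cpu_flags(open("/proc/cpuinfo").read())` and passed to `pick_build`. The `generic` entry, which requires no flags, plays the role of the compatibility build that every node can run.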