ACCRE CentOS 7 Environment and Changelog

Posted on Friday, May 25, 2018 in Cluster Status Notice.

In Summer 2018, ACCRE will be transitioning the cluster operating system from CentOS 6 to CentOS 7. During this process, two parallel environments will be available: one running CentOS 6 and the other running CentOS 7. Compute resources will be moved gradually from the old environment to the new one over the course of several months as ACCRE staff work with each group on the transition.

If users/groups rely on packages (e.g. R and Python packages) that were built and installed into their own private ACCRE directories, these will likely need to be rebuilt for the new environment. ACCRE staff are available to assist. Alternatively, using Singularity containers will eliminate the need to rebuild packages.
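
If you would like to try containers, a workflow along these lines can carry your software between the two environments unchanged. The image name, output filename, and script below are illustrative (pull filenames vary by Singularity version), not a prescribed ACCRE recipe:

# Pull a CentOS 7 base image from Docker Hub (output filename varies by Singularity version)
singularity pull docker://centos:7

# Run your application inside the container; packages installed in the image
# do not need to be rebuilt when the host operating system changes
singularity exec centos-7.simg Rscript my_analysis.R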

To access the CentOS 7 environment, type the following command:

ssh <vunetid>@login7.accre.vanderbilt.edu

Your ACCRE credentials will work in both environments. Note, however, that there is a new process for password changes (see below), and if you change your password in the new environment it will not be updated in the old environment.

Changelog

We have used this opportunity to incorporate numerous improvements to the ACCRE environment, which are detailed below. These changes are not tied to the operating system upgrade, per se, but were less disruptive to incorporate into the new environment.

  • Move SLURM state files to dedicated solid state drives for improved scheduler responsiveness [1]
  • Upgrade SLURM to latest stable release
  • Remove setpkgs/pkginfo – all ACCRE-managed packages now available exclusively through Lmod (see the first example after this list)
  • Automatically load the optimized version of a package via Lmod, leading to better performance, especially for vectorized applications [2]
  • Transition authentication/authorization system to internal LDAP system; password changes may be performed from anywhere on the cluster with the “accre_password” command (users no longer need to log in to the auth server to update their passwords)
  • Incorporate memory (i.e. RAM) limitations for users via control groups on shared gateway machines [3]
  • Make compute nodes accessible via ssh when a user has a job running on the node (see the second example after this list)
  • Expand debug queue (one node for each CPU architecture)
  • Remove rsh
  • Enhance SSH security
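
For users coming from setpkgs, the basic Lmod workflow looks roughly like the following (the package name is illustrative, not a statement of what is installed):

# Search for available versions of a package
module spider gcc

# Load a package and list what is currently loaded
module load gcc
module list

# Unload everything when you want a clean environment
module purge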
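
To take advantage of ssh access to compute nodes, a session might look like this (the node name is hypothetical, and access is only permitted while your job is running there):

# Find the node(s) where your jobs are running
squeue -u $USER

# Connect to a node shown in the NODELIST column
ssh cn404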

[1] One recurring problem on the existing cluster has been slow response times (or, eventually, socket timeout errors) from SLURM commands like sbatch and squeue. We have moved the storage location of SLURM job state files from GPFS to dedicated solid state drives, which offer superior I/O performance. We expect this change to improve SLURM responsiveness considerably.
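
In slurm.conf terms, this change amounts to pointing the scheduler's state directory at an SSD-backed local path. The path below is illustrative, not ACCRE's actual configuration:

# slurm.conf (fragment): StateSaveLocation controls where slurmctld writes
# job state; a local SSD path avoids the latency of a parallel filesystem
StateSaveLocation=/var/spool/slurmctld/state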

[2] Previously, Lmod packages were built for old CPU architectures to ensure compatibility with the multiple generations of CPU architectures available on the cluster. Lmod is now CPU architecture aware and will load the version of a package that was built with instructions optimized for the architecture it is running on.
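
One common way sites implement this kind of architecture awareness is to prepend an architecture-specific directory to MODULEPATH at login. This is a plausible sketch, not ACCRE's confirmed mechanism, and the paths are hypothetical:

# Hypothetical login snippet: expose the module tree matching this node's CPU
ARCH=$(uname -m)   # a real setup would likely detect the CPU family more precisely
export MODULEPATH=/accre/modules/$ARCH:$MODULEPATH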

[3] Occasionally users will run gateways out of RAM, which impacts other users with sessions on that gateway as well as users attempting to log in to it.
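
If you want to see what limit applies to your own gateway session, you can inspect it through the cgroup (v1) filesystem used on CentOS 7. The exact cgroup path depends on how the limits are configured, so the path below is a placeholder:

# Show which memory cgroup your shell belongs to
grep memory /proc/self/cgroup

# Read the limit for that cgroup (substitute the path reported above)
cat /sys/fs/cgroup/memory/<cgroup-path>/memory.limit_in_bytes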

