Scheduled downtime is now complete
Update, 8/27/2020: We have completed all in-scope upgrades, maintenance items, and testing for this downtime ahead of schedule, so we will restore access to the cluster and helpdesk tonight at 8pm. Details on how to use the new functionality provided by SLURM v20 will be posted once we have reviewed our test results.
We thank you for your continued use and contributions that have fueled ACCRE’s organic growth for over 10 years.
Update, 8/24/2020: This is a reminder that our second and last scheduled downtime for the year starts tonight at midnight. The cluster will be offline for three days and ACCRE accounts will be disabled. The following services will remain online during that period:
- ACCRE networked storage (aka DORS fileshares)
- SciLo archive storage
ACCRE networked storage refers to the NFS and SMB shares migrated from DORS storage that users can mount on their desktops, not the ACCRE cluster storage, which is available only to cluster resources. Configuration changes will be made as part of our effort to address the ongoing sluggishness of networked storage.
We will upgrade the Slurm scheduler from version 18.08 to 20.02. Version 18.08 was released two years ago and its support will end soon, so the upgrade is a necessary step for ACCRE. The upgrade will not require any changes to existing Slurm scripts. Version 20.02 contains several bug fixes and also includes some new features:
- full support for the GPU cards
- centralized configuration management for the Slurm client
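For users who want to confirm which Slurm release a login node reports once access is restored, the following is a minimal sketch rather than an official ACCRE tool. It assumes that `sinfo --version` prints a string of the form `slurm 20.02.4`; the helper names `parse_slurm_version` and `is_upgraded` are hypothetical and exist only for this illustration.

```python
import re
import subprocess


def parse_slurm_version(text):
    """Extract (major, minor) from a string such as 'slurm 20.02.4'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def is_upgraded(text, minimum=(20, 2)):
    """Return True if the reported version is at least `minimum` (20.02 by default)."""
    return parse_slurm_version(text) >= minimum


if __name__ == "__main__":
    try:
        # Assumption: the Slurm client tools are on PATH on this machine.
        result = subprocess.run(
            ["sinfo", "--version"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"could not query Slurm: {exc}")
    else:
        status = "upgraded" if is_upgraded(result.stdout) else "older release"
        print(result.stdout.strip(), "->", status)
```

The parsing is kept separate from the `sinfo` call so that it can be checked on a machine without Slurm installed.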
Update, 8/24/2020: During the downtime we will upgrade Slurm from version 18.08 to 20.02. Version 18.08 was released two years ago and its support will end soon, so the upgrade is a necessary step for ACCRE. Version 20.02 is the current active version of Slurm. It contains several fixes for bugs in previous versions, including the fix for the bug that caused the ACCRE cluster to collapse this January. Version 20.02 also includes a couple of new features, such as full support for GPU cards and the “configless” mode for Slurm, under which the configuration file no longer needs to be stored on each compute node.
Update, 8/18/2020: This is a reminder that next week we will be having our second and last scheduled downtime for the year. The cluster will be offline for three days starting on the 25th and ACCRE accounts will be disabled. The following services will remain online during that period:
- networked ACCRE storage (i.e. legacy DORS fileshares)
- SciLo archive storage
We will be upgrading the SLURM scheduler, so please check our website next week for details about the new version.
The next scheduled maintenance window is currently set for August 25-27th. Items within the scope of work:
- SLURM upgrade
- network maintenance and saturation test
- GPFS monitoring
- hypervisor upgrade for infrastructure services
- power distribution upgrade
This will not result in a shutdown of all ACCRE services. Users who mount ACCRE storage (i.e. legacy DORS customers) will continue to have access to their files during this period.
Also, the tentative dates for the next maintenance window are January 12-14th, 2021.