I/O delays affecting jobs resolved; Visualization Portal back online
Dec. 10, 2021—Update, 12/13/2021: These issues have been resolved. Please open up a helpdesk ticket if you run into any issues. Update, 12/10/2021: The Visualization Portal is being taken offline for emergency preventive security updates. You can check the status of the service here. We have received reports of compute jobs failing due to I/O delays during...
New software stack “2020b” is available on the cluster
Sep. 27, 2021—The new software stack 2020b is available on the ACCRE cluster. It includes the following features: GCC version 10.2; new Intel compiler and MKL libraries, etc., released in 2020; R 4.0.5; Python 3.8.6 and Python 2.7.18; SciPy-bundle/2020.11 (which includes the newer version of numpy); etc. The 2020b software stack is built based on the GCC...
ACCRE Downtime scheduled for September 17-18 has been completed
Aug. 27, 2021—Update, 9/18: We have finished all the work within the scope of this scheduled downtime and successfully completed all the system tests. All ACCRE systems are now available. You can monitor system availability here; please report any odd behavior via our helpdesk. Our next scheduled downtime will, exceptionally, last only 2 days and fall on...
May/June 2021 storage issues resolved; /scratch and Slurm are back to normal use
Jun. 1, 2021—We have verified that /scratch is performing at the same level it was prior to the downtime. You may resume using it as you normally would and submit jobs to Slurm without using the special AtRiskUnstableEnvironment reservation. Prior to the downtime, /scratch was operating with only 1 of the 2 controllers in...
Some programs on “maxwell” and “pascal” GPU nodes may need to be upgraded following May downtime
May 4, 2021—We are going to upgrade the underlying operating system on the GPU nodes during the next downtime, two weeks from now. During testing, we found that the OpenMPI associated with GCC 5.4 (OpenMPI/1.10.3) does not work with the newly installed driver, so if the system gets upgraded then the 1.10.3 version of the MPI...
Cluster access restored following downtime on May 18-20
May 3, 2021—The May 2021 scheduled downtime has concluded. Updates on the subsequent storage issues have been moved to a separate post. Update, 5/21/2021 5pm: While we wait for the results of the vendor’s analysis of the system events, we are restoring job submissions to the cluster so they can run over the weekend. Jobs will be...
Info on /scratch outages from 4/14 and 4/23
Apr. 14, 2021—Update, 4/23 10pm: The /scratch storage subsystem is recovered. You may resume any affected jobs and report any issues via our helpdesk. Update, 4/23 7pm: The /scratch storage subsystem was unable to gracefully handle the controller failure and the remaining controller began losing connections to some of the LUNs. We are doing a clean shutdown...
Ask ACCRE: How do I map my home directory on ACCRE as a network drive on my computer?
Apr. 12, 2021—By mapping ACCRE as a networked drive, we can move files from our computer to ACCRE, and from ACCRE to our computer, as if it were a drive on our computer. Here are instructions for mapping ACCRE as a networked drive on Windows and macOS. Mapping ACCRE to a computer running Windows We will be...
Improvements to the ACCRE onboarding process, including training classes offered on demand
Mar. 18, 2021—Starting today, our training classes for new ACCRE users will be offered in a new online, on-demand format hosted on RedCap. This applies to all three classes – Intro to Unix, Intro to the Cluster and Intro to Slurm – and will replace the three-day series of classes we have held in the...
/data and /home interruption, MATLAB jobs interrupted
Feb. 23, 2021—Update, 2/24: In a separate issue, we have updated the license file for MATLAB, and a couple of jobs have probably died in the process. Please check the output of your MATLAB jobs and try again if this was the case. The cluster experienced a momentary loss in connectivity to /home and /data at approximately 4:07pm...