MATLAB license file to be updated; some jobs may fail
Jan. 28, 2022—We are going to install MATLAB 2021a and 2021b on ACCRE, which requires updating the license file on the license server. In the process we will restart the license daemon, which may cause some MATLAB jobs to fail.
ACCRE systems back online early from scheduled downtime
Jan. 6, 2022—Update, 1/12: Our systems are back online and ready for use. Please open a support ticket if you notice any issues. We will be taking our systems offline for our scheduled downtime on January 11-13. This will include all gateways, all storage systems, and all compute clusters. The scope of the work includes: firmware updates...
I/O delays affecting jobs resolved; Visualization Portal back online
Dec. 10, 2021—Update, 12/13/2021: These issues have been resolved. Please open up a helpdesk ticket if you run into any issues. Update, 12/10/2021: The Visualization Portal is being taken offline for emergency preventive security updates. You can check the status of the service here. We have received reports of compute jobs failing due to I/O delays during...
Sai Medury joins ACCRE as Associate System Administrator
Nov. 10, 2021—Sai Medury joined ACCRE in November 2021. He helps configure, test, and troubleshoot systems at ACCRE. He also works closely with the Centre for Structural Biology (CSB) and helps maintain CSB systems at ACCRE. He is currently completing his Ph.D. in Computational Science at the University of Tennessee at Chattanooga and has a Master of...
/scratch restored following outage this morning
Nov. 1, 2021—Update, 11/1/2021 3pm: The /scratch storage sub-system has been remounted across the cluster and the public gateways. There are a few custom gateways that will need to be rebooted and that will be coordinated with the respective groups. One of the components for the /scratch storage sub-system entered a bad state over the weekend. Our...
Storage issues: cluster available for normal use
Oct. 3, 2021—Update, 10/25/2021: The cluster will be available for normal use at 10:30am this morning. The system remained stable and error-free over the weekend. We were also able to catch up on tape backup operations. Update, 10/22/2021 1pm: Based on input we received from the vendor last night and comparing the available options, we...
New software stack “2020b” is available on the cluster
Sep. 27, 2021—The new software stack 2020b is available on the ACCRE cluster. It includes the following features: GCC version 10.2; new Intel compiler and MKL libraries, etc., released in 2020; R 4.0.5; Python 3.8.6 and Python 2.7.18; SciPy-bundle/2020.11 (which includes the newer version of numpy), etc. The 2020b software stack is built based on the GCC...
ACCRE Downtime scheduled for September 17-18 has been completed
Aug. 27, 2021—Update, 9/18: We have finished all the work within the scope of this scheduled downtime and successfully completed all the system tests. All ACCRE systems are now available. You can monitor system availability here; please report any odd behavior via our helpdesk. Our next scheduled downtime will exceptionally only last 2 days and fall on...
ACCRE welcomes Mark Keever, executive director of research IT for Vanderbilt
Aug. 4, 2021—Mark Keever is the Executive Director of Research IT for Vanderbilt University. Mark comes to Vanderbilt from Oregon State University, where he was the Director of Digital Research Infrastructure, and Co-PI on an NSF cyberinfrastructure datacenter. Prior experience from Georgia Tech contributes to his 20 years of research computing experience.
May/June 2021 storage issues resolved; /scratch and Slurm are back to normal use
Jun. 1, 2021—We have verified that /scratch is performing at the same level it was prior to the downtime. You may resume using it as you normally would, and you may submit jobs to Slurm without using the special AtRiskUnstableEnvironment reservation. Prior to the downtime /scratch was operating with only 1 of the 2 controllers in...