
Cluster Status Notice Category

[Resolved] /scratch and /data are back online following weekend maintenance

Jan. 24, 2019—Update, 2/12/2019: /scratch and /data are back online and we are now accepting new jobs. We were never able to get the maintenance command to run successfully, but we were able to verify (with IBM’s assistance) the integrity of /scratch and /data, which is great news and means we will not need to take another...



Final Steps for CentOS 7 Upgrade

Jan. 10, 2019—Update, Jan 25: The CentOS 6 login is now closed. Original post below… It has been a long journey, but we are almost to the end! Please see below for a schedule of the final systems to be upgraded to CentOS 7. Note this schedule does not include a handful of custom/private gateways that still...



[Resolved] Full cluster downtime on Wednesday, Dec 19 starting at 6am; make sure to log out and halt any running processes before downtime starts

Dec. 10, 2018—Update, 12/20/2018: The GPU driver upgrade on all Maxwell and Pascal nodes in the CentOS 7 cluster is now complete and the nodes are available to host jobs. Thank you for your patience. Update, 12/19/2018: The cluster is now back online and accessible for normal use again, with the exception of the GPU nodes. We...



[Resolved] Problems with GPFS; logins and jobs may be affected

Nov. 7, 2018—Update, 3pm: /home is back online. Please check your jobs’ output very carefully as it is likely that many will need to be re-run, especially if they were performing I/O to or from /home. Jobs performing I/O to /scratch, /data, or /dors may have survived. Please open a helpdesk ticket with us if you have...



[Resolved] Cluster unresponsive or sluggish for some users

Oct. 18, 2018—Update, 2:30pm: All clear – if you notice anything unusual, submit a helpdesk ticket as always. We’re receiving reports from users this morning about the cluster being unresponsive or sluggish. We are investigating the issue and will have an update soon. Thanks!



[Resolved] /scratch and /data partially offline

Oct. 11, 2018—Update, 10/16/2018: /scratch and /data are now 100% available again and we have successfully upgraded the cache in all of our GPFS storage appliances. Please open a helpdesk ticket with us if you have any questions or notice anything unusual. Update, 10/15/2018, 5pm: We are still working to get parts of /scratch and /data back...



[Resolved] Cluster inaccessible

Oct. 1, 2018—Update, 2pm: The cluster is accessible again. Thank you for bearing with us as we sorted through these issues. We were able to successfully upgrade the cache on three of our six GPFS storage appliances. The third one is where we encountered issues. A hardware problem prevented us from booting the device cleanly until an...



[Resolved] Network maintenance ongoing; some login issues and/or stale file handles possible

Sep. 27, 2018—Update, 10/1/2018: This has been resolved. We have been experiencing some network issues for the last few days that have impacted logins and caused occasional stale file handle errors. Later today, we will be performing some maintenance that we hope will resolve this issue or at least further isolate the root cause. This work (which...



[Resolved] Sporadic stale file handles & potential error messages when logging in to CentOS 7 environment

Sep. 18, 2018—Update, 9/21/2018: It appears this issue has been resolved. Please open a helpdesk ticket if you experience any problems. This morning we began sporadically seeing stale file handle messages in our CentOS 7 environment, which may have also impacted logins to login7.accre.vanderbilt.edu. We initially thought this was due to a bad hard drive on one of...



CentOS 7 Upgrade: login.accre.vanderbilt.edu now points to CentOS 7, login6.accre.vanderbilt.edu available temporarily if needed

Aug. 13, 2018—Update, 12/20/2018: GPUs are now available for CentOS 7. See: "GPU Nodes Available for Testing in CentOS 7 Environment" and "[Resolved] Full cluster downtime on Wednesday, Dec 19 starting at 6am; make sure to log out and halt any running processes before downtime starts". Update, 9/28/2018: We have now pointed login.accre.vanderbilt.edu to the CentOS 7 environment....
