Storage issues with /scratch and networked storage

Posted on Friday, November 20, 2020 in Website.

At around 10 a.m. this morning we received alerts for the /scratch storage sub-system and, subsequently, for the networked storage sub-system. /scratch unmounted on 57 compute nodes and 20 GPU nodes, as well as on the gateways. An investigation of all three sub-systems (/data, /scratch, networked storage) showed that a few LUNs were unavailable because three of the servers were in a bad state. Those have been cleared and all three storage sub-systems have been stabilized. We continue to keep a close eye on them and are looking into any performance issues.