GPFS briefly unmounted at 8:02am on Wednesday; check output of jobs running at the time

Posted on Wednesday, July 24, 2019 in Website.

At approximately 8:02 AM today, a command was issued to remove a misconfigured compute node from the GPFS cluster. Because of the nature of the configuration problem and the timing of the command, most or all of the ACCRE gateway and compute nodes unmounted GPFS for approximately one minute.

We are still working to determine the precise cause of this event. In the meantime, we have updated our internal policies and documentation so that the conditions that caused this interruption should not recur.

If you had jobs running at 8:02 AM today that were accessing /home, /scratch, /data, or /dors, those jobs may have died. If they did not, please check their output very carefully.

All filesystem mounts have recovered from this problem and are working normally. We apologize for any interruption or delay this may have caused in your work.

If you have any questions about this, please do not hesitate to open a Help Desk ticket with us. Thank you…