I/O delays affecting jobs resolved; Visualization Portal back online
Update, 12/13/2021: These issues have been resolved. Please open a helpdesk ticket if you run into any further problems.
Update, 12/10/2021: The Visualization Portal is being taken offline for emergency preventive security updates. You can check the status of the service here.
We have received reports of compute jobs failing due to I/O delays during at least three separate periods in as many days. The delays appear to be limited to the DORS sub-system of the GPFS storage. We are reviewing the usage logs to identify the cause, but our initial assessment is that the affected jobs access a large number of files in a pattern that our data tiering policies do not anticipate. If your jobs have failed repeatedly, please open a helpdesk ticket describing the event, the time frame, and the path of the files you are using.