/data and /home interruption, MATLAB jobs interrupted
Feb. 23, 2021—Update, 2/24: In a separate issue, we updated the MATLAB license file, and a few jobs probably died in the process. Please check the output of your MATLAB jobs and resubmit any that failed. The cluster experienced a momentary loss of connectivity to /home and /data at approximately 4:07pm...
Networked storage refusing to connect
Feb. 8, 2021—Further updates will be added to the downtime announcement. Update, 2/9 1pm: To avoid two general downtimes in quick succession, ACCRE will begin both the GPFS fix and the planned maintenance items at noon tomorrow, Feb. 10. Please plan accordingly and open a helpdesk ticket if you need any assistance...
ACCRE networked storage connectivity
Dec. 11, 2020—Update, 12/12: We were able to identify the source of the problem that was preventing the export services from operating normally. An update was applied just after noon today, and it has cleared the errors that were causing the NFS service to crash. The workstations of one of the groups that use the service got into...
February 2021 maintenance: All ACCRE systems are back online
Dec. 11, 2020—Update, 2/12: We have finished all the work in scope for the planned 3-day downtime and restored the network storage service. All our production and ACCREx systems are back online. New hardware for /scratch has been installed, and full network redundancy between the rooms in the data center has been reestablished. We did not shut down...
Storage issues with /scratch and networked storage
Nov. 20, 2020—At around 10am this morning we received alerts for the /scratch storage sub-system and subsequently for the networked storage sub-system. /scratch unmounted on 57 compute nodes and 20 GPU nodes as well as the gateways. An investigation of all three sub-systems (/data, /scratch, networked storage) showed that a few LUNs were unavailable due to three...
[Resolved] Downtime for /scratch storage on Wednesday 11/4 from 6am to 2pm CT
Oct. 27, 2020—Update, 11/4 2pm: The /scratch storage was successfully repaired during this morning's specially scheduled downtime. Please feel free to resume activities that use /scratch and report any issues you find. One of the GPFS disk groups of the /scratch storage had a drive failure, which is not unusual. We replaced the failed drive...
Network issues resolved and network storage stabilized; emergency update to GPFS applied
Sep. 20, 2020—9/22 3:05pm: ACCRE’s networked storage (aka DORS) has stabilized. We are working with the hardware vendor to determine why one of the controllers for that subsystem performed a service halt instead of a graceful failover to the redundant controller. Please submit a helpdesk ticket for any new issues you might experience. 9/22 9:26am: This...
Staff Spotlight: Lindsey Fox
Sep. 2, 2020—Lindsey holds an M.S. in Engineering Science with an emphasis in Geology from the University of Mississippi. She has worked as a GIS Analyst in Vanderbilt’s Civil and Environmental Engineering department and most recently as GIS Coordinator to the campus community through the Vanderbilt Library. Lindsey has actively contributed to research teams investigating topics ranging from transportation to...
Staff Spotlight: Ashley Brammer
Aug. 31, 2020—Ashley is a graduate of Belmont University and has a diverse background in business operations management. Along with her role at ACCRE, Ashley also serves as an Administrative Specialist and Executive Secretary in the Department of Physics & Astronomy. When not working, she loves her animals, sewing dog sweaters, collecting anything vintage, and playing everything...
Increased user demand for ACCRE resources is generating a high volume of support tickets
Aug. 24, 2020—Increased user demand for ACCRE resources is generating a high volume of support tickets. Please bear with us as we work diligently to resolve user issues. You can view current ticket volume here.